Semi-Supervised End-to-End Speech Recognition

Case ID:
C15375

Unmet Need

The speech and voice recognition market is projected to reach 25 B by 2025, growing at a 20% CAGR. Software that converts speech to text quickly and with a low error rate has been of great interest. Specifically, there is a need for speech and text encoders that can be trained on speech-only or text-only datasets, without parallel speech and text data.

Technology Overview

The inventors have developed a novel semi-supervised method for end-to-end automatic speech recognition (ASR). It can exploit large unpaired speech and text datasets, which require much less human effort to create than paired speech-to-text datasets. The method combines speech-to-text and text-to-text mappings through a shared network, so that learning to reconstruct the unpaired text data improves the speech-to-text mapping in a semi-supervised, end-to-end manner. In experiments on the Wall Street Journal dataset, the proposed semi-supervised training reduced the character error rate from 15.8% to 14.4%, a larger reduction than conventional language model integration achieved.
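The result above is reported as character error rate (CER). CER is conventionally computed as the character-level edit distance between the reference transcript and the recognizer output, divided by the reference length. The sketch below is illustrative only and is not the inventors' evaluation code; the function names are hypothetical. On the figures quoted above, the change from 15.8% to 14.4% is 1.4 points absolute, or roughly 8.9% relative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical transcript pair, for illustration only.
    print(character_error_rate("speech", "speach"))
    # Figures quoted in the overview: 15.8% baseline, 14.4% after training.
    print(round(100 * relative_reduction(15.8, 14.4), 1))
```

Note that CER can exceed 100% when the hypothesis contains many inserted characters, because the denominator is the reference length only.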

Stage of Development

Prototype proof-of-concept testing has been completed.

Patent Information:
Title: TRAINING APPARATUS, TRAINING METHOD, AND TRAINING PROGRAM
App Type: ORD: Ordinary Utility
Country: Japan
Serial No.: 2019-159953
Patent No.: -
File Date: 9/2/2019
Issued Date: -
Expire Date: -
Patent Status: Pending
For Information, Contact:
Andrew Wichmann
wichmann@jhu.edu
410-614-0300
2017 - 2022 © Johns Hopkins Technology Ventures. All Rights Reserved.