Unmet Need
The speech and voice recognition market is projected to grow at a 20% CAGR through 2025, reaching 25 B. Software that converts between speech and text quickly and with a low error rate has been of great interest. Specifically, there is a need for speech and text encoders that can be trained on speech-only or text-only datasets, without parallel speech and text data.
Technology Overview
The inventors have developed a novel semi-supervised method for end-to-end automatic speech recognition (ASR). It can exploit large unpaired speech and text datasets, which require much less human effort to create than paired speech-to-text datasets. The method combines speech-to-text and text-to-text mappings through a shared network, so that the speech-to-text mapping improves by learning to reconstruct the unpaired text data in a semi-supervised, end-to-end manner. On the Wall Street Journal dataset, the proposed semi-supervised training reduced the character error rate from 15.8% to 14.4%, a larger reduction than conventional language model integration achieved.
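For readers unfamiliar with the metric, the sketch below shows how a character error rate (CER) is commonly computed and what the reported figures mean in relative terms. It assumes the standard definition of CER (character-level edit distance divided by the reference length), which this summary does not spell out. The function names are illustrative and are not taken from the inventors' software.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER under the assumed standard definition: edits / reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# Hypothetical transcript pair, for illustration only.
cer = character_error_rate("hello world", "hallo wrld")

# Figures reported in this summary: 15.8% down to 14.4%.
rel = relative_reduction(15.8, 14.4)
print(f"example CER: {cer:.3f}, relative reduction: {rel:.1%}")
```

Under this definition, the reported drop from 15.8% to 14.4% is 1.4 percentage points absolute, or roughly 8.9% relative to the baseline.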
Stage of Development
Prototype proof-of-concept testing has been completed.