Semi-Supervised End-to-End Speech Recognition

Case ID:

C15375

Unmet Need

The speech and voice recognition market is set to reach $25 billion, growing at a 20% CAGR through 2025. There is strong interest in software that converts speech to text quickly and with a low error rate. Specifically, there is a need for speech and text encoders that work with speech-only or text-only datasets, without parallel speech and text data.

Technology Overview

The inventors have developed a novel semi-supervised method for end-to-end automatic speech recognition (ASR). It can exploit large unpaired speech and text datasets, which require much less human effort to create than paired speech-to-text datasets. The method combines speech-to-text and text-to-text mappings through a shared network, so the speech-to-text mapping improves by learning to reconstruct the unpaired text data in a semi-supervised, end-to-end manner. On the Wall Street Journal dataset, the proposed semi-supervised training reduced the character error rate from 15.8% to 14.4%, a larger reduction than conventional language model integration achieved.
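To put the reported figures in context, the sketch below shows how a character error rate is conventionally computed (character-level edit distance divided by reference length) and how the relative reduction between two rates can be derived. This is a minimal illustration, not the inventors' evaluation code; the function names `edit_distance`, `character_error_rate`, and `relative_reduction` are hypothetical and chosen only for this example.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error rate removed by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy transcripts (illustrative only, not taken from any dataset).
    print(character_error_rate("speech", "speach"))
    # Error rates quoted in the text above, in percent.
    print(relative_reduction(15.8, 14.4))
```

Applied to the figures quoted above, the drop from 15.8% to 14.4% is an absolute reduction of 1.4 percentage points, or roughly 8.9% relative to the baseline.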

Stage of Development

Prototype proof-of-concept testing has been completed.

Patent Information: