Cepstrum Interpolation towards Robust Speech Recognition over the Phone

H. Zhang and J. Xu (Singapore)


Automatic Speech Recognition, Acoustic Modeling, Acoustic Adaptation, Bayesian Learning.


Even state-of-the-art Automatic Speech Recognition (ASR) systems degrade noticeably when speech is transmitted over telephone lines, so improving ASR robustness in noisy channel environments is critical for many real applications. The challenge is that such network environments change from moment to moment and differ widely in signal-to-noise ratio (SNR), stationarity, and spectral structure. Previous adaptation methods with complex parameterization cannot track these channel-related variations reliably within a single utterance, so an online adaptation method designed specifically for noisy channel environments is needed. In this paper, a prototype library is built from a large amount of channel-contaminated data to describe acoustic similarities. The pre-calculated statistics of this library make fast and reliable channel selection possible. Furthermore, a Bayesian learning scheme is developed to compensate for channel distortion dynamically through linear interpolation across the library. In our experiments, the new method yields a 10% relative reduction in Word Error Rate (WER) compared with conventional Maximum Likelihood Linear Regression (MLLR).
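The abstract does not give the actual formulation, so the following is only a minimal sketch of what a posterior-weighted linear interpolation across a channel prototype library might look like. Every name here (`interpolate_channel_bias`, `compensate`, `proto_means`, `proto_vars`) is hypothetical, and the modeling choices (one diagonal Gaussian per prototype, a uniform prior, compensation by subtracting the interpolated cepstral offset) are assumptions for illustration, not the authors' method.

```python
import numpy as np


def interpolate_channel_bias(frames, proto_means, proto_vars, priors=None):
    """Estimate a channel offset as a posterior-weighted mix of prototypes.

    frames:      (T, D) cepstral vectors observed so far in the utterance
    proto_means: (K, D) assumed pre-calculated mean of each channel prototype
    proto_vars:  (K, D) assumed pre-calculated diagonal variance of each prototype
    priors:      (K,) prior probability of each prototype (uniform if None)
    Returns (weights, bias) with shapes (K,) and (D,).
    """
    frames = np.asarray(frames, dtype=float)
    proto_means = np.asarray(proto_means, dtype=float)
    proto_vars = np.asarray(proto_vars, dtype=float)
    num_protos = proto_means.shape[0]
    if priors is None:
        priors = np.full(num_protos, 1.0 / num_protos)

    # Log-likelihood of all frames under each prototype's diagonal Gaussian.
    diff = frames[:, None, :] - proto_means[None, :, :]  # shape (T, K, D)
    log_lik = -0.5 * np.sum(
        diff ** 2 / proto_vars + np.log(2.0 * np.pi * proto_vars),
        axis=(0, 2),
    )  # shape (K,)

    # Bayes rule in the log domain, shifted by the max for numerical stability.
    log_post = np.log(priors) + log_lik
    log_post -= np.max(log_post)
    weights = np.exp(log_post)
    weights /= np.sum(weights)

    # Linear interpolation across the library.
    bias = weights @ proto_means
    return weights, bias


def compensate(frames, bias):
    """Subtract the interpolated channel offset from every frame."""
    return np.asarray(frames, dtype=float) - bias
```

In this sketch the weights can be recomputed as more frames of the utterance arrive, which is one way an estimate could follow a changing channel; whether the paper updates its interpolation weights this way is not stated in the abstract.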
