Normalization on Subband Temporal Envelopes for Large Vocabulary Continuous Speech Recognition

X. Lu, M. Unoki, and S. Nakamura (Japan)

Keywords

Temporal modulation, modulation normalization, modulation transfer function, noise-robust speech recognition.

Abstract

We previously proposed a robust feature extraction algorithm for speech recognition in reverberant environments, based on normalization of the subband temporal envelopes (STEs). In that algorithm, the STEs of clean and reverberated speech are both transformed to a reference space via modulation transfer functions (MTFs). We confirmed its effectiveness on the AURORA-2J corpus with simulated reverberant environments. However, the AURORA-2J task involves only a few acoustic models (10 digits), so it did not require much attention to the problem of losing discriminative information among acoustic models during feature extraction. Large vocabulary continuous speech recognition (LVCSR), in contrast, involves more than tens of thousands of acoustic models (tri-phone models), and preserving the discriminative information among them is important in robust feature processing. Therefore, in this study, we tested the proposed algorithm on an LVCSR task in simulated reverberant environments. Furthermore, we extended the algorithm to LVCSR under additive noise conditions. Mel frequency cepstral coefficients with mean and variance normalization served as the baseline. Experimental results showed that the proposed algorithm yielded relative improvements of 10.34% and 19.17% on the LVCSR task for reverberation and additive noise conditions, respectively.
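
For readers unfamiliar with the baseline, mean and variance normalization of cepstral features can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: it assumes per-utterance statistics, and the function name, the `eps` guard, and the list-of-frames input layout are hypothetical choices made for the example.

```python
import math


def mean_variance_normalize(frames, eps=1e-8):
    """Scale each feature dimension to zero mean and unit variance.

    frames: list of equal-length lists of floats, one list per frame
            (for example, one vector of cepstral coefficients per frame).
    eps:    small constant (an assumption of this sketch) that avoids
            division by zero when a dimension is constant.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Statistics are computed over all frames of one utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]


# Two frames with two feature dimensions each.
normalized = mean_variance_normalize([[1.0, 10.0], [3.0, 30.0]])
```

After normalization, every dimension has approximately zero mean and unit variance across the utterance, which removes stationary channel offsets and scale differences before the features reach the recognizer.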
