W.M. Liu, V.J. Rivas Bastante, F. Romero Rodriguez, N.W.D. Evans, and J.S.D. Mason (UK)
ASR (automatic speech recognition), segmentation, morphological filtering.
This paper examines the separation of speech signals from additive noise using a recently proposed signal-noise segmentation approach based on statistical properties of the spectrogram [1,2]. Competitive ASR results were reported in [3] despite using only crude spectrogram shape information, suggesting that the approach identifies regions of different signal dominance with high reliability and might be robust down to negative SNRs. This paper extends these early results in two directions. The first extension investigates the contribution of spectrogram shapes plus magnitudes versus shapes alone: the same ASR experiments as in [3] are repeated, but this time with magnitude information recovered in regions deemed to contain speech. Results show consistent improvement for all SNRs down to -5 dB. The second extension relates to computational efficiency: a modified one-pass version of the originally iterative process is proposed, in which an optimal final stopping condition is deduced empirically for each SNR. This is found to reduce computational time significantly (by factors ranging from 7 to 18) whilst improving ASR accuracy.
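The distinction between "shapes alone" and "shapes plus magnitudes" can be illustrated with a minimal sketch. It is not the segmentation method of [1,2]: the function names, the toy spectrogram, and in particular the fixed-threshold mask are assumptions introduced here purely for illustration, standing in for whatever segmentation decides which time-frequency cells contain speech.

```python
# Illustrative sketch only. The paper's segmentation is based on statistical
# properties of the spectrogram; the fixed threshold below is a hypothetical
# stand-in used just to obtain a binary speech/noise mask.

def threshold_mask(spectrogram, threshold):
    """Hypothetical segmentation: mark a cell as speech if it exceeds threshold."""
    return [[value > threshold for value in row] for row in spectrogram]

def shape_only(mask):
    """Shapes alone: keep only where speech is deemed present (1.0) or not (0.0)."""
    return [[1.0 if is_speech else 0.0 for is_speech in row] for row in mask]

def shape_plus_magnitude(spectrogram, mask, floor=0.0):
    """Shapes plus magnitudes: keep the original magnitude inside speech regions
    and replace everything else with a floor value (an assumed choice)."""
    return [
        [value if is_speech else floor for value, is_speech in zip(spec_row, mask_row)]
        for spec_row, mask_row in zip(spectrogram, mask)
    ]

# Toy magnitude spectrogram: rows are frequency bins, columns are time frames.
toy_spectrogram = [
    [0.1, 2.0, 3.0],
    [0.2, 0.3, 4.0],
]
toy_mask = threshold_mask(toy_spectrogram, threshold=1.0)
shapes = shape_only(toy_mask)
shapes_and_magnitudes = shape_plus_magnitude(toy_spectrogram, toy_mask)
```

In this sketch both representations share the same mask, so any difference in downstream ASR features would come only from the magnitudes restored inside the regions deemed to contain speech.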