S.-A. Selouani, H. Tolba, and D. O'Shaughnessy (Canada)
Speech Recognition, Formants, Ear-model, MFCCs, Multi-Stream Paradigm, HMMs
This paper introduces a multi-stream paradigm of acoustic front-end processing. It aims to improve the performance of Hidden Markov Model (HMM)-based Automatic Speech Recognition (ASR) systems by combining features based on human audition with features derived from the Fourier power spectrum. Auditory-based acoustic distinctive cues are merged with classical MFCCs and formants to constitute a new multivariate feature vector. Experiments using the HMM Toolkit (HTK) and a subset of the TIMIT database were carried out to test the effectiveness of this new acoustic representation. Results showed that an improvement of about 6.5% in word recognition accuracy can be achieved when the multi-stream approach, N-mixture tri-phone models, and a bigram language model are used.
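As a rough illustration of the feature merging described in the abstract, the sketch below concatenates per-frame MFCCs, formant frequencies, and auditory-based cues into a single multivariate feature vector. The stream dimensions and the function name are illustrative assumptions, not the paper's exact front-end configuration.

```python
import numpy as np

def merge_streams(mfcc, formants, auditory_cues):
    """Concatenate per-frame feature streams into one multivariate vector.

    mfcc          : (n_frames, n_mfcc)   e.g. cepstral coefficients per frame
    formants      : (n_frames, n_form)   e.g. first few formant frequencies
    auditory_cues : (n_frames, n_cues)   ear-model-based distinctive cues

    All dimensions here are placeholder assumptions for illustration.
    """
    assert mfcc.shape[0] == formants.shape[0] == auditory_cues.shape[0]
    return np.hstack([mfcc, formants, auditory_cues])

# Example with random placeholder data for 100 frames.
rng = np.random.default_rng(0)
features = merge_streams(rng.standard_normal((100, 12)),
                         rng.standard_normal((100, 3)),
                         rng.standard_normal((100, 7)))
print(features.shape)  # (100, 22)
```

In an HTK-style setup, a vector assembled this way would serve as the observation sequence on which the HMMs are trained and evaluated.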