A Perceptual Masking based Feature Set for Speech Recognition

Padmalochini Umakanthan and Kaliappan Gopalan

Keywords

Psychoacoustics, speech recognition, dynamic time warping, cepstral features

Abstract

This paper proposes a set of features for speech recognition based on the psychoacoustic masking phenomenon of the human auditory system. The features are computed as the difference between the spectral energy of each speech frame of an utterance and its global masking threshold in each of 17 bands. In a keyword spotting experiment that used dynamic time warping (DTW) for feature matching, the proposed features demonstrated the viability of this perceptually significant feature set. For multisyllabic words, the proposed set and mel frequency cepstral coefficients (MFCCs) performed equally well, while for monosyllabic words the proposed set outperformed MFCCs.
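The two stages named in the abstract, per-band masking features and DTW matching, can be illustrated with a minimal sketch. It assumes that per-frame band energies and global masking thresholds (both in dB, 17 bands per frame) have already been computed. The masking-threshold model itself is not shown. The function names, the Euclidean frame distance, and the unconstrained DTW recursion are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

NUM_BANDS = 17  # number of bands per frame, as stated in the abstract

Frame = Sequence[float]


def masking_features(band_energy_db: Sequence[Frame],
                     masking_threshold_db: Sequence[Frame]) -> List[List[float]]:
    """Hypothetical helper: per-frame, per-band energy minus global masking threshold (dB).

    Assumes both inputs are already computed; the threshold model is not shown.
    """
    if len(band_energy_db) != len(masking_threshold_db):
        raise ValueError("energy and threshold must have the same number of frames")
    features = []
    for energy, threshold in zip(band_energy_db, masking_threshold_db):
        if len(energy) != NUM_BANDS or len(threshold) != NUM_BANDS:
            raise ValueError(f"expected {NUM_BANDS} bands per frame")
        # Positive values: band energy above the masking threshold (audible).
        features.append([e - t for e, t in zip(energy, threshold)])
    return features


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Hypothetical helper: classic DTW with Euclidean local cost.

    Allowed steps are (i-1, j), (i, j-1) and (i-1, j-1), with no slope or band constraint.
    """
    n, m = len(a), len(b)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]  # accumulated cost matrix
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # frame-to-frame distance
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


if __name__ == "__main__":
    energy = [[60.0] * NUM_BANDS, [70.0] * NUM_BANDS]
    threshold = [[50.0] * NUM_BANDS, [55.0] * NUM_BANDS]
    template = masking_features(energy, threshold)
    # A time-stretched copy of the template (first frame repeated) still matches exactly.
    stretched = [template[0], template[0], template[1]]
    print(dtw_distance(template, stretched))
```

For keyword spotting, one might compute `dtw_distance` between a keyword template and each candidate segment and accept the best match below some threshold. The abstract does not specify the segmentation strategy or the decision threshold.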
