H. Schramm, X. Aubert, B. Bakker, C. Meyer, and H. Ney (Germany)
Automatic speech recognition, spontaneous speech, pronunciation modeling, speaking-rate modeling, speaking-rate compensation, filled-pause modeling
In this paper we present a technique for improved acoustic and pronunciation modeling of speech variability of different origins. To represent the different speech variability classes more precisely, the method applies class-specific acoustic and pronunciation modeling and recombines the class-specific models using a lexicon-based, word-level model combination technique. We provide a theoretical framework for the word-level model combination that incorporates alternative pronunciations and acoustic models in a weighted sum of acoustic probabilities. In general, this technique may be used to model various kinds of speech variability. As a first step, however, we applied it only to variability related to rate of speech and filled pauses. On a highly spontaneous real-life medical dictation task, we observed a 12% relative improvement in word error rate.
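The weighted sum of acoustic probabilities mentioned above can be illustrated with a minimal sketch. It assumes each word has a set of variants, each pairing one alternative pronunciation with one class-specific acoustic model, and that the variant weights sum to one. The function name, the variant labels, and the numbers are hypothetical and are not taken from the paper, whose exact formulation may differ.

```python
import math


def combine_word_log_likelihood(log_likelihoods, weights):
    """Log of a weighted sum of per-variant acoustic probabilities.

    Both arguments are dicts keyed by a hypothetical variant label, here a
    (pronunciation, acoustic_model_class) tuple.
    """
    if set(log_likelihoods) != set(weights):
        raise ValueError("variants and weights must cover the same keys")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")

    # log(weight * probability) for every variant with non-zero weight
    terms = [
        math.log(weights[v]) + log_likelihoods[v]
        for v in weights
        if weights[v] > 0.0
    ]
    # log-sum-exp keeps the sum stable when log-likelihoods are very negative
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Illustrative numbers only; they are not taken from the paper.
scores = {
    ("canonical", "normal_rate"): math.log(0.2),
    ("reduced", "fast_rate"): math.log(0.4),
}
priors = {
    ("canonical", "normal_rate"): 0.5,
    ("reduced", "fast_rate"): 0.5,
}
word_log_likelihood = combine_word_log_likelihood(scores, priors)
```

Working in the log domain is a design choice of this sketch: acoustic likelihoods over many frames are typically far too small to sum directly in floating point.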