Modeling Spontaneous Speech Variability in Professional Dictation

H. Schramm, X. Aubert, B. Bakker, C. Meyer, and H. Ney (Germany)

Keywords

Automatic speech recognition, spontaneous speech, pronunciation modeling, speaking rate modeling, speaking rate compensation, filled-pause modeling

Abstract

In this paper we present a technique for improved acoustic and pronunciation modeling of speech variability arising from different sources. To represent the different classes of speech variability more precisely, the method applies class-specific acoustic and pronunciation modeling and recombines the specific models using a lexicon-based word-level model combination technique. A theoretical framework for the word-level model combination is provided that incorporates alternative pronunciations and acoustic models in a weighted sum of acoustic probabilities. This technique may in general be used to model various kinds of speech variability. In a first step, however, we applied it only to variability related to rate of speech and filled pauses. On a highly spontaneous real-life medical dictation task, we observed a 12% relative improvement in word error rate.
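
The abstract does not state the combination formula itself. As a rough illustration only, the following sketch assumes that each word has a set of variants (a pronunciation paired with a class-specific acoustic model), that each variant carries a non-negative weight, and that the weights for a word sum to one. Under that assumption the word-level acoustic probability is the weighted sum p(x | w) = sum_v c_v * p(x | v, w), computed here in the log domain for numerical stability. The function names and the example weights are hypothetical and are not taken from the paper.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def combine_word_log_likelihood(variants):
    """Combine variant scores for one word into a single log-likelihood.

    variants: list of (weight, log_likelihood) pairs, one per
    pronunciation / acoustic-model variant (hypothetical layout).
    Weights are assumed to be non-negative and to sum to one.
    """
    total = sum(w for w, _ in variants)
    if not math.isclose(total, 1.0, rel_tol=1e-9):
        raise ValueError("variant weights must sum to 1")
    # log( sum_v c_v * p(x | v, w) ), skipping zero-weight variants
    return logsumexp([math.log(w) + ll for w, ll in variants if w > 0])


# Hypothetical example: two speaking-rate variants of the same word.
variants = [
    (0.5, math.log(0.2)),  # e.g. a "slow" variant
    (0.5, math.log(0.4)),  # e.g. a "fast" variant
]
combined = combine_word_log_likelihood(variants)
```

In this example the combined probability is 0.5 * 0.2 + 0.5 * 0.4 = 0.3. A decoder that instead keeps only the best-scoring variant would replace the sum with a maximum; the abstract describes a weighted sum, which is what the sketch follows.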

From Proceeding (444) Signal and Image Processing - 2004
