Improving the Readability of Class Lecture Automatic Speech Recognition Results using Multiple Hypotheses

Y. Fujii; K. Yamamoto; S. Nakagawa

doi:10.2316/P.2010.678-091

Y. Fujii, K. Yamamoto, and S. Nakagawa (Japan)

Keywords

improving readability, confusion network, automatic speech recognition, classroom lecture speech

Abstract

This paper presents a method for improving the readability of Automatic Speech Recognition (ASR) results for class lectures, which have so far been difficult for humans to understand even in the absence of recognition errors. This is because speech in a class lecture is relatively casual and contains many ill-formed utterances with filled pauses, restarts, and so on. Recently there has been extensive research on paraphrasing and correcting recognition results. However, research on improving the readability of recognition results has focused mainly on manually transcribed texts rather than on ASR results. Owing to the many domain-specific words and the casual speaking style, even state-of-the-art recognizers achieve only a 30-50% word error rate (WER) on class lecture speech. In this paper, we propose a novel method that uses multiple hypotheses of the ASR results to improve the readability of the recognition results. Experimental results show that the output of the proposed method most closely resembles the manually paraphrased text, and a subjective test shows that the proposed method improves the readability of the ASR results even under erroneous conditions where the WER is as high as 37.7%.
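For readers unfamiliar with the metric, the WER figures quoted in the abstract follow the standard definition: the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below is only an illustration of that definition and is not taken from the paper; the function name `word_error_rate` is hypothetical, and whitespace tokenization is an assumption (Japanese lecture transcripts would normally be segmented into words or morphemes first).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    # Assumption: both strings are already segmented into space-separated words.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions are needed against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.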

From Proceeding (678) Signal Processing, Pattern Recognition and Applications - 2010