M. Kim, Y.R. Oh, and H.K. Kim (Korea)
Non-native speech recognition, pronunciation variation modeling, multiple pronunciation dictionary, confusability measure
This paper addresses efficient pronunciation variation modeling for non-native automatic speech recognition (ASR), where non-native speech differs from native speech mainly in pronunciation. To improve the performance of non-native ASR, a multiple pronunciation dictionary built with an indirect data-driven approach is first proposed. However, this approach enlarges the dictionary and therefore the search space for ASR decoding. We therefore propose a method for optimizing the size of the multiple pronunciation dictionary by removing confusable pronunciation variants. To this end, a confusability measure based on the Levenshtein distance between two different pronunciation variants is also proposed. In addition, the number of phonemes in each pronunciation variant is used to optimize the dictionary size. To investigate the effect of the proposed approach on ASR performance, English is selected as the target language, and English utterances spoken by Koreans are used as non-native speech. Continuous non-native ASR experiments show that the ASR system using the optimized multiple pronunciation dictionary reduces the average word error rate by 13.53% and the computational complexity by 21.10%, in relative terms, compared with the system using the multiple pronunciation dictionary without optimization.
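As a rough illustration of the pruning idea described above, the following Python sketch computes the Levenshtein distance between phoneme sequences and removes variants that are too close to another word's pronunciation. The abstract does not give the exact confusability formula, so everything beyond the Levenshtein distance itself is an assumption made for illustration: the normalization by the longer variant's phoneme count, the threshold value, the rule that the first pronunciation of each word is canonical and always kept, and the names `confusability` and `prune_variants`.

```python
from typing import Dict, List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance (insert, delete, substitute; unit cost) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        curr = [i]
        for j, pb in enumerate(b, 1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def confusability(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical measure: 1.0 for identical variants, 0.0 for fully different.

    Assumption: the distance is normalized by the phoneme count of the
    longer variant; the paper's actual formula may differ.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def prune_variants(lexicon: Dict[str, List[List[str]]],
                   threshold: float = 0.8) -> Dict[str, List[List[str]]]:
    """Drop non-canonical variants that are confusable with another word.

    Assumption: the first pronunciation of each word is canonical and is
    always kept; a later variant is removed when its confusability with the
    canonical pronunciation of any other word reaches the threshold.
    """
    pruned: Dict[str, List[List[str]]] = {}
    for word, prons in lexicon.items():
        kept = [prons[0]]
        for variant in prons[1:]:
            clash = any(
                confusability(variant, other_prons[0]) >= threshold
                for other, other_prons in lexicon.items()
                if other != word
            )
            if not clash:
                kept.append(variant)
        pruned[word] = kept
    return pruned


# Toy lexicon with made-up non-native variants (not data from the paper).
lexicon = {
    "light": [["L", "AY", "T"], ["R", "AY", "T"]],
    "right": [["R", "AY", "T"], ["R", "AY", "T", "UH"]],
}
optimized = prune_variants(lexicon, threshold=0.8)
```

In this toy lexicon, the second variant of "light" is identical to the canonical pronunciation of "right", so it is removed, while the second variant of "right" is far enough from "light" to be kept. A smaller dictionary of this kind is what reduces the decoder's search space.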