C. Li, R. Venkateswarlu, and Y. Xu (Singapore)
Speaker recognition, End-point detection, Noisy speech, Three-step.
Many papers have addressed speaker recognition and its importance and convenience in real applications [1]. Finding the endpoints of speech is very important for speaker recognition. In a noise-free environment [2], this is not difficult with traditional methods such as short-time energy and zero-crossing rate. However, when speakers give their utterances naturally in a real-life environment, the system's performance degrades significantly because of additional noise and pauses. This paper presents a robust and efficient endpoint detection algorithm for noisy environments. In noisy conditions where the endpoints cannot be determined precisely, one possible method is to shift the reference model along the test utterance (relaxed start-end point DTW [3]). We propose a method called `three-step endpoint (3S-EP)', which yields an improvement of nearly 20% in EER (Equal Error Rate) over the baseline system.
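For readers unfamiliar with the traditional methods mentioned above, the following is a minimal sketch of endpoint detection based on short-time energy and zero-crossing rate. It is not the 3S-EP method proposed in the paper, whose details are not given in this abstract. The function names, frame sizes, and thresholds (`energy_ratio`, `zcr_threshold`, `max_extend`) are illustrative assumptions.

```python
import math

def frames_of(samples, frame_len, hop):
    """Split the signal into overlapping frames (incomplete tail is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_endpoints(samples, frame_len=200, hop=100,
                     energy_ratio=0.1, zcr_threshold=0.25, max_extend=3):
    """Return (start_sample, end_sample) of the detected speech, or None.

    All default values are assumed for illustration only.
    """
    frames = frames_of(samples, frame_len, hop)
    if not frames:
        return None
    energies = [short_time_energy(f) for f in frames]
    peak = max(energies)
    if peak == 0:
        return None  # completely silent input
    # Step 1: frames whose energy exceeds a fraction of the peak count as speech.
    threshold = energy_ratio * peak
    active = [i for i, e in enumerate(energies) if e > threshold]
    first, last = active[0], active[-1]
    # Step 2: extend each boundary by a few frames while the zero-crossing
    # rate stays high, to keep low-energy unvoiced sounds at the edges.
    steps = 0
    while (first > 0 and steps < max_extend
           and zero_crossing_rate(frames[first - 1]) > zcr_threshold):
        first -= 1
        steps += 1
    steps = 0
    while (last < len(frames) - 1 and steps < max_extend
           and zero_crossing_rate(frames[last + 1]) > zcr_threshold):
        last += 1
        steps += 1
    return first * hop, last * hop + frame_len

# Synthetic example: silence, a 200 Hz tone sampled at 8 kHz, then silence.
signal = ([0.0] * 1000
          + [math.sin(2 * math.pi * 200 * n / 8000) for n in range(2000)]
          + [0.0] * 1000)
print(detect_endpoints(signal))
```

On clean input such as this synthetic signal, the energy threshold alone locates the tone. With additive noise, both the energy floor and the zero-crossing rate of non-speech frames rise, which is the failure mode that motivates a more robust approach.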