Improving Speaker Detection in Multi-speaker Utterances through Automatic Purification of Training Data

D.C. Smith and D. Richman (USA)


Keywords: Speech processing, speaker detection, training data purification, Gaussian mixture models, ROC curves, score normalization.


This article is concerned with the automatic purification of data used to train statistical models for automatic speaker detection. It is assumed that the data available for training a model to detect a particular speaker of interest (SOI) is contaminated by utterances from at least one other speaker. Our approach consists of three steps: (1) build a Gaussian mixture model (GMM) for the SOI, training on the contaminated training files; (2) score consecutive segments of these training files with this GMM; (3) build a new, purified GMM from the highest-scoring segments. We apply our method to a set of SOIs from the Switchboard I corpus (using summed conversation sides) and show that the purified GMMs are significantly more accurate than the contaminated GMMs at detecting the presence of the SOIs in test data known to contain multi-speaker utterances. This evaluation is text-independent, and no assumptions are made about the identity of, or relationship between, the non-SOIs in the training and testing data.
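The three-step purification procedure can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each training file has already been converted to a matrix of per-frame feature vectors (for example cepstral features) and that segments are consecutive, non-overlapping blocks of frames. A segment's score is taken to be its mean per-frame log-likelihood under the contaminated GMM. The GMM is a diagonal-covariance model trained from scratch with EM. The names `fit_diag_gmm`, `frame_log_likelihood` and `purify`, and the parameters `seg_len`, `keep_fraction` and `n_components`, are hypothetical choices made for illustration.

```python
import numpy as np


def _logsumexp_rows(a):
    # Numerically stable log(sum(exp(a), axis=1)) for a 2-D array.
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]


def _component_log_densities(X, weights, means, variances):
    # log w_k + log N(x | mu_k, diag(var_k)) for every frame and component.
    diff = X[:, None, :] - means[None, :, :]                   # (n, k, d)
    log_det = np.log(2 * np.pi * variances).sum(axis=1)        # (k,)
    mahal = (diff ** 2 / variances[None, :, :]).sum(axis=2)    # (n, k)
    return np.log(weights)[None, :] - 0.5 * (log_det[None, :] + mahal)


def frame_log_likelihood(X, gmm):
    # Per-frame log-likelihood of X under a (weights, means, variances) GMM.
    return _logsumexp_rows(_component_log_densities(X, *gmm))


def fit_diag_gmm(X, n_components=4, n_iter=50, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to frames X (n_frames x n_dims) with EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Farthest-point initialisation of the means (a simple heuristic).
    idx = [int(rng.integers(n))]
    for _ in range(1, n_components):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    means = X[idx].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = _component_log_densities(X, weights, means, variances)
        resp = np.exp(log_p - _logsumexp_rows(log_p)[:, None])
        # M-step: re-estimate weights, means and floored diagonal variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, var_floor)
    return weights, means, variances


def purify(frames, seg_len=100, keep_fraction=0.5, n_components=4):
    # Step 1: train a GMM on all of the (contaminated) training frames.
    contaminated = fit_diag_gmm(frames, n_components)
    # Step 2: score consecutive, non-overlapping segments with that GMM.
    n_seg = len(frames) // seg_len
    segments = [frames[i * seg_len:(i + 1) * seg_len] for i in range(n_seg)]
    scores = np.array([frame_log_likelihood(s, contaminated).mean() for s in segments])
    # Step 3: retrain on the highest-scoring segments only.
    n_keep = max(1, int(round(keep_fraction * n_seg)))
    kept = sorted(int(i) for i in np.argsort(scores)[::-1][:n_keep])
    purified = fit_diag_gmm(np.concatenate([segments[i] for i in kept]), n_components)
    return purified, kept
```

The sketch depends on the SOI contributing most of the training material. In that case the contaminated GMM is dominated by the SOI, and segments spoken by other speakers tend to receive lower scores. The hypothetical `keep_fraction` parameter sets the trade-off between discarding contaminated segments and keeping enough SOI data to train the purified model.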
