H. Higuchi, Y. Sagawa, and N. Sugie (Japan)
Intelligent Information Systems, Sound Source Separation, Spectrogram, Onset, Offset, Image Processing
We propose a method for separating two nearly simultaneous speech utterances recorded by a pair of microphones. First, a spectrogram is generated from the signal recorded at each microphone. The onsets and offsets of frequency components are extracted as features using image processing techniques. Correspondences between the features of the two spectrograms are then determined, and the inter-microphone time differences are extracted. Frequency components sharing common onset/offset occurrences and a common time difference are grouped together as originating from one of the speech signals. A set of band-pass filters is prepared for each group of frequency components. Finally, each separated speech signal is extracted by applying the corresponding set of band-pass filters to the signal recorded by one microphone. Experiments were conducted on a mixture of a male and a female speech sound, both consisting of Japanese vowels. The evaluation results demonstrated that the proposed method achieves reasonably good separation.
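The grouping idea can be illustrated with a toy sketch. The paper groups frequency components by common onset/offset occurrences and inter-microphone time difference; the sketch below is not the authors' method — it substitutes a simple cross-spectrum phase estimate for the onset/offset image processing, and pure tones for speech. All signal parameters (sample rate, tone frequencies, delays) are illustrative assumptions.

```python
import numpy as np

fs, n_fft, hop = 8000, 512, 256

def stft(x):
    """Short-time Fourier transform -> complex array (freq_bins, frames)."""
    w = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * w for i in range(0, len(x) - n_fft, hop)]
    return np.array([np.fft.rfft(f) for f in frames]).T

# Two synthetic "voices" (pure tones stand in for speech harmonics),
# mixed at two microphones with different inter-microphone delays.
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 440 * t)    # source 1: no inter-mic delay
s2 = np.sin(2 * np.pi * 1320 * t)   # source 2: delayed at microphone 2
delay = 2                           # samples
mic1 = s1 + s2
mic2 = s1 + np.roll(s2, delay)

X1, X2 = stft(mic1), stft(mic2)
freqs = np.fft.rfftfreq(n_fft, 1 / fs)

# Estimate a per-bin inter-microphone time difference from the phase of
# the frame-averaged cross-spectrum (positive = arrives later at mic 2).
cross = (X2 * np.conj(X1)).mean(axis=1)
phase = np.angle(cross)
tdoa = np.zeros_like(freqs)
tdoa[1:] = -phase[1:] / (2 * np.pi * freqs[1:])  # seconds; bin 0 undefined

# Group bins by time difference: near zero -> source 1, else -> source 2.
# The boolean mask plays the role of the paper's set of band-pass filters.
group2 = np.abs(tdoa * fs) > delay / 2

# The dominant bin of each group should correspond to the right tone.
mag = np.abs(X1).mean(axis=1)
bin1 = np.argmax(mag * ~group2)
bin2 = np.argmax(mag * group2)
```

Applying each boolean mask to the mic-1 spectrogram and inverting the STFT would yield the two separated signals; the paper's onset/offset correspondence step serves the same purpose as the phase-based delay estimate here, but is more robust for broadband, overlapping speech.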