H. Sato, Y. Mitsukura, M. Fukumi, and N. Akamatsu (Japan)
emotional speech classification, emotional space map, prosody, pitch, the length of utterance time, neural network
Prosodic characteristics obtained from speech are important elements for characterizing human emotion. In this paper, we focus in particular on the fundamental frequency (pitch) and the length of utterance time (tempo) extracted from the prosodic characteristics. A new emotional speech classification method is proposed that analyzes these parameters obtained from emotional speech using a neural network (NN). In the present research, human emotions are classified broadly into four patterns: neutral, angry, sad, and joyful. We use two methods for pitch extraction: cepstrum analysis and our proprietary IC, which combines analog and digital processing. The inputs to the NN are the time-varying emotional pitch patterns. Computer simulations show that the NN can achieve emotional speech classification and generate an emotional space map by learning each emotional pitch pattern.
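The sketch below illustrates the kind of processing the abstract describes: cepstrum-based pitch estimation per frame, a time-varying pitch contour built from those estimates, and a small NN forward pass producing probabilities over the four emotion classes. It is a minimal illustrative sketch in Python/NumPy; the function names, frame parameters, and network shape are assumptions, not the authors' implementation, and the paper's proprietary analog/digital IC method and the NN training procedure are not reproduced here.

```python
import numpy as np

def cepstral_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate pitch (Hz) of one speech frame via the real cepstrum:
    the dominant quefrency peak in the expected pitch range gives the
    pitch period. Frame length and pitch range are illustrative choices."""
    windowed = frame * np.hamming(len(frame))
    log_spectrum = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)
    cepstrum = np.fft.irfft(log_spectrum)
    qmin = int(sample_rate / fmax)          # shortest plausible pitch period
    qmax = int(sample_rate / fmin)          # longest plausible pitch period
    peak = qmin + np.argmax(cepstrum[qmin:qmax])
    return sample_rate / peak

def classify_emotion(pitch_pattern, w1, b1, w2, b2):
    """Forward pass of a small two-layer network over a fixed-length
    pitch pattern; weights would be learned from labeled emotional
    speech (hypothetical architecture, not the paper's)."""
    hidden = np.tanh(pitch_pattern @ w1 + b1)
    logits = hidden @ w2 + b2
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()              # [neutral, angry, sad, joyful]

# Usage sketch: frame a signal, build the time-varying pitch contour,
# and classify it with randomly initialized (untrained) weights.
sr = 16000
signal = np.random.randn(sr)                # placeholder for real speech
frame_len, hop = 512, 256
contour = np.array([
    cepstral_pitch(signal[i:i + frame_len], sr)
    for i in range(0, len(signal) - frame_len, hop)
])
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = len(contour), 16, 4
probs = classify_emotion(
    contour,
    rng.standard_normal((n_in, n_hidden)), np.zeros(n_hidden),
    rng.standard_normal((n_hidden, n_out)), np.zeros(n_out),
)
print(probs)
```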