Chi Xu


Multi-modal emotion recognition, music features, Hevner model, convolutional neural network


Music is one of the ways to express emotions. Recognising and extracting the emotional features of music, and recommending music to different audiences according to those emotions, is currently a hot research topic. In this paper, the Hevner model was used to describe different music emotions, and a multi-modal emotion recognition approach was adopted to extract music features from two modalities: audio and lyric text. A convolutional neural network (CNN) model was then used to classify the emotional features of the data set. The experimental results showed that, compared with single-modal recognition, both the precision and the recall of the proposed multi-modal emotion recognition increased by more than 30%, an absolute gain of about 0.3. At the same time, the ten-fold cross-validation accuracy of music emotion recognition under the CNN method was 95.36%, with a recognition time of 17.66 s, outperforming the support vector machine (SVM) and Bayesian models. Better emotional classification of music is an important foundation for accurately recommending music to different audiences. The experimental results demonstrate that the multi-modal emotion recognition approach can be used to extract music features and classify music with the CNN model, and that this method has high accuracy.
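The abstract describes combining audio and lyric-text features before CNN classification. A minimal sketch of this idea, assuming simple feature-level (early) fusion, is shown below; the feature dimensions, values, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fuse_features(audio_feat: np.ndarray, lyric_feat: np.ndarray) -> np.ndarray:
    """Early fusion: concatenate per-track audio and lyric feature vectors
    into one vector that a classifier (e.g. a CNN) can consume."""
    return np.concatenate([audio_feat, lyric_feat])

def conv1d_valid(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Minimal 1-D 'valid' convolution (cross-correlation), the core
    operation a CNN layer would apply over the fused feature vector."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

# Hypothetical example: 4-dim audio features (e.g. tempo, energy bands)
# and 3-dim lyric features (e.g. sentiment scores).
audio = np.array([0.8, 0.1, 0.5, 0.3])
lyrics = np.array([0.2, 0.9, 0.4])

fused = fuse_features(audio, lyrics)                            # shape (7,)
feature_map = conv1d_valid(fused, np.array([1.0, -1.0, 0.5]))   # shape (5,)
```

In a full system, the fused vectors would be fed through stacked convolution, pooling, and fully connected layers trained against Hevner emotion labels; this sketch only illustrates the fusion step and a single convolution.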
