Speech emotion recognition based on CNN-MGU-attention
8 June 2023
Yanni Wang
Proceedings Volume 12707, International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2023); 1270743 (2023) https://doi.org/10.1117/12.2680932
Event: International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2023), 2023, Changsha, China
Abstract
Speech is the most natural form of human communication and carries a wealth of emotional information. Because emotion is an important part of human cognition, speech emotion recognition (SER) has become an important research direction in pattern recognition. SER technology enables computers to understand human emotions, which is an indispensable step toward artificial intelligence. However, the field still lacks effective emotion feature sets and effective emotion recognition models. To improve the recognition performance of SER models, this paper builds a new deep neural network model, CNN-MGU-Attention, composed of a convolutional neural network (CNN), a minimal gated unit (MGU) and an attention mechanism. First, the MGU reduces the gate structure to the smallest possible form and has far fewer parameters than long short-term memory (LSTM) and gated recurrent unit (GRU) networks, which greatly reduces training complexity and increases training speed. Second, the attention mechanism learns the degree of correlation between the model's input and output sequences, so that the model pays more attention to the informative parts of the input. Finally, a Softmax layer classifies the emotion. Experiments show that the proposed CNN-MGU-Attention model achieves average recognition accuracies of 88.90% and 86.21% on the CASIA and EMODB databases, respectively, which is better recognition performance than previous research results.
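The abstract names three stages after the CNN: an MGU recurrent layer, an attention layer and a Softmax classifier. The sketch below illustrates that pipeline under stated assumptions; it is not the author's implementation. The MGU cell follows the standard minimal gated unit formulation (a single forget gate `f_t`, with `h_t = (1 - f_t) * h_{t-1} + f_t * h_tilde`). The dot-product attention against a learned query vector `u` is a hypothetical choice, since the abstract does not specify the scoring function. The CNN front end is omitted, so the input frames stand in for CNN features. All names (`MGUCell`, `attention_pool`, `classify`, `param_count`), the dimensions, and the six-class output are illustrative assumptions, and the weights are random and untrained.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    # Plain-Python matrix-vector product.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def init(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class MGUCell:
    """Minimal gated unit: one forget gate, no separate update/output gates."""
    def __init__(self, input_dim, hidden_dim, rng):
        cols = hidden_dim + input_dim  # weights act on the concatenation [h, x]
        self.W_f, self.b_f = init(hidden_dim, cols, rng), [0.0] * hidden_dim
        self.W_h, self.b_h = init(hidden_dim, cols, rng), [0.0] * hidden_dim

    def step(self, x, h_prev):
        # f_t = sigmoid(W_f [h_{t-1}, x_t] + b_f)
        f = [sigmoid(a + b) for a, b in zip(matvec(self.W_f, h_prev + x), self.b_f)]
        # Candidate state uses the gated previous state [f_t * h_{t-1}, x_t].
        gated = [fi * hi for fi, hi in zip(f, h_prev)]
        h_tilde = [math.tanh(a + b) for a, b in zip(matvec(self.W_h, gated + x), self.b_h)]
        # h_t = (1 - f_t) * h_{t-1} + f_t * h_tilde
        return [(1 - fi) * hi + fi * ht for fi, hi, ht in zip(f, h_prev, h_tilde)]

def param_count(blocks, input_dim, hidden_dim):
    # Each gate/candidate block: a weight matrix over [h, x] plus one bias vector.
    return blocks * (hidden_dim * (hidden_dim + input_dim) + hidden_dim)

def attention_pool(states, u):
    # Hypothetical scoring: dot product of each hidden state with query vector u.
    alpha = softmax([sum(ui * hi for ui, hi in zip(u, h)) for h in states])
    dim = len(states[0])
    context = [sum(a * h[d] for a, h in zip(alpha, states)) for d in range(dim)]
    return context, alpha

def classify(frames, cell, u, W_c, b_c, hidden_dim):
    h, states = [0.0] * hidden_dim, []
    for x in frames:  # run the MGU over the (assumed CNN-derived) frame sequence
        h = cell.step(x, h)
        states.append(h)
    context, alpha = attention_pool(states, u)
    logits = [z + b for z, b in zip(matvec(W_c, context), b_c)]
    return softmax(logits), alpha  # class probabilities and attention weights

rng = random.Random(0)
input_dim, hidden_dim, n_classes, T = 8, 16, 6, 20  # illustrative sizes
cell = MGUCell(input_dim, hidden_dim, rng)
u = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
W_c, b_c = init(n_classes, hidden_dim, rng), [0.0] * n_classes
frames = [[rng.gauss(0, 1) for _ in range(input_dim)] for _ in range(T)]
probs, alpha = classify(frames, cell, u, W_c, b_c, hidden_dim)

# LSTM has 4 gate/candidate blocks, GRU 3, MGU 2 (single-bias formulation).
for name, blocks in [("LSTM", 4), ("GRU", 3), ("MGU", 2)]:
    print(name, param_count(blocks, input_dim, hidden_dim))  # 1600, 1200, 800
```

The parameter count illustrates the abstract's efficiency claim. Each block costs the same, so an MGU layer has half the recurrent parameters of an LSTM layer and two thirds those of a GRU layer of the same size. Frameworks that use two bias vectors per GRU block report slightly different totals.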
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yanni Wang "Speech emotion recognition based on CNN-MGU-attention", Proc. SPIE 12707, International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2023), 1270743 (8 June 2023); https://doi.org/10.1117/12.2680932
KEYWORDS
Emotion
Speech recognition
Education and training
Databases
Image processing
Data modeling
Neural networks