The automatic speech emotion recognition has important applications in human-machine communication. Majority of current research in this area is focused on finding optimal feature parameters. In recent studies, several glottal features were examined as potential cues for emotion differentiation. In this study, a new type of feature parameter is proposed, which calculates energy entropy on values within selected Wavelet Packet frequency bands. The modeling and classification tasks are conducted using the classical GMM algorithm. The experiments use two data sets: the Speech Under Simulated Emotion (SUSE) data set annotated with three different emotions (angry, neutral and soft) and Berlin Emotional Speech (BES) database annotated with seven different emotions (angry, bored, disgust, fear, happy, sad and neutral). The average classification accuracy achieved for the SUSE data (74%-76%) is significantly higher than the accuracy achieved for the BES data (51%-54%). In both cases, the accuracy was significantly higher than the respective random guessing levels (33% for SUSE and 14.3% for BES).
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
INSTITUTIONAL Select your institution to access the SPIE Digital Library.
PERSONAL Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.