Conference paper
Zero-shot Cross-lingual Speech Emotion Recognition: A Study of Loss Functions and Feature Importance
Deep learning has led to the rapid advancement of speech emotion recognition (SER), enabling its deployment across a wide range of applications and sectors. However, conventional challenges such as generalizing over unseen corpora and languages, and newer challenges such as the lack of interpretability and transparency of deep learning models, impact the security of these methods, thereby negatively influencing their usability and acceptability in real-world applications.
Here, we address this gap by investigating how the formulation and design of the learning function influence the ability to transfer emotion representations learned in one language to other languages. Furthermore, we examine the importance of the different feature groups for the emotion classes, and the associations between the feature groups and the learning functions.
From the evaluation, we conclude that within the dimensional model of emotion, activation is more transferable over unseen languages than valence, and more transferable than emotion classes. However, this transferability does not necessarily translate to higher classification accuracy.
Language: | English |
---|---|
Year: | 2022 |
Proceedings: | 2<sup>nd</sup> Symposium on Security and Privacy in Speech Communication |
Types: | Conference paper |
DOI: | 10.21437/SPSC.2022-5 |
ORCIDs: | Das, Sneha and Clemmensen, Line Katrine Harder |