Speech emotion recognition in Spanish TV Debates

Zubiaga Amar, Irune; Justo Blanco, Raquel; De Velasco Vázquez, Mikel; Torres Barañano, María Inés

Date

2022

Proceedings of IberSPEECH: 186-190 (2022)

URI

http://hdl.handle.net/10810/59459

Abstract

Emotion recognition from speech is an active field of study that can help build more natural human-machine interaction systems. Although advances in deep learning have improved performance on this task, it remains very challenging. In real-life scenarios, for instance, a tendency toward neutrality and the ambiguous definition of emotion make labeling difficult, which leaves the dataset severely imbalanced and not very representative. In this work we considered a real-life scenario to carry out a series of emotion classification experiments. Specifically, we worked with a labeled corpus consisting of audio recordings from Spanish TV debates and their respective transcriptions. First, we analyzed the emotional information within the corpus. Then we compared different data representations in order to choose the best one for our task: spectrograms and UniSpeech-SAT were used for audio representation, and DistilBERT for text representation. As a final step, multimodal machine learning was used to combine acoustic and textual information, with the aim of improving the classification results.
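
The abstract does not say how the acoustic and textual information were combined. As an illustration only, the sketch below shows one common option, late fusion, in which the per-class probabilities from an audio classifier and a text classifier are averaged with a weight. The function names, the `audio_weight` parameter, the emotion labels, and the scores are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def late_fusion(audio_probs: Sequence[float],
                text_probs: Sequence[float],
                audio_weight: float = 0.5) -> List[float]:
    """Weighted average of two per-class probability vectors (assumed fusion rule)."""
    if len(audio_probs) != len(text_probs):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    fused = [audio_weight * a + (1.0 - audio_weight) * t
             for a, t in zip(audio_probs, text_probs)]
    total = sum(fused)
    if total == 0:
        raise ValueError("fused scores sum to zero")
    # Renormalize so the fused scores form a probability distribution.
    return [p / total for p in fused]


def predict(labels: Sequence[str],
            audio_probs: Sequence[float],
            text_probs: Sequence[float],
            audio_weight: float = 0.5) -> str:
    """Return the label with the highest fused probability."""
    fused = late_fusion(audio_probs, text_probs, audio_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Hypothetical labels and classifier outputs, for illustration only.
LABELS = ["neutral", "annoyed", "pleased"]
audio_scores = [0.5, 0.4, 0.1]  # e.g. from a model over audio features
text_scores = [0.2, 0.7, 0.1]   # e.g. from a model over the transcription

print(predict(LABELS, audio_scores, text_scores))
```

With equal weights the text model's stronger preference outweighs the audio model's mild one; setting `audio_weight=1.0` reduces the prediction to the audio model alone. In practice the weight would be tuned on held-out data, and other strategies (such as concatenating embeddings before a joint classifier) are equally plausible readings of the abstract.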