Evaluation of STT technologies performance and database design for Spanish dysarthric speech

Madina González, Margot

dc.contributor.advisor	Navas Cordón, Eva
dc.contributor.advisor	Hernáez Rioja, Inmaculada
dc.contributor.author	Madina González, Margot
dc.date.accessioned	2023-06-30T14:51:04Z
dc.date.available	2023-06-30T14:51:04Z
dc.date.issued	2023-06-30
dc.identifier.uri	http://hdl.handle.net/10810/61821
dc.description.abstract	[EN] Automatic Speech Recognition (ASR) systems have become an everyday tool worldwide. Their use has spread in recent years, and they have also been implemented in Environmental Control Systems (ECS) and Speech Generating Devices (SGD), among others. These systems might be especially beneficial for people with physical disabilities, as they would be able to control different devices with voice commands, thereby reducing the physical effort they have to make. However, people with functional diversity often present difficulties in speech articulation too. One of the most common speech articulation problems is dysarthria, a disorder of the nervous system that causes weakness in the muscles used for speech. Existing commercial ASR systems are not able to correctly understand dysarthric speech, so people with this condition cannot exploit this technology. Some research has tackled this issue, but an optimal solution has not yet been reached. Moreover, nearly all existing research on the matter is in English; no previous study has approached the problem in other languages. Apart from this, ASR systems require large speech databases, of which very few currently exist; most of them are in English and were not designed for this purpose. Some commercial ASR systems offer a customization interface where users can train a base model with their own speech data and thus improve recognition accuracy. In this thesis, we evaluated the performance of the commercial ASR system Microsoft Azure Speech to Text. First, we reviewed the current state of the art. Then, we created a pilot database in Spanish and recorded it with 3 heterogeneous people with dysarthria and 1 typical speaker to be used as a reference. Lastly, we trained the system and conducted different experiments to measure its accuracy. Results show that, overall, the customized models outperform the base models of the system.
However, the results were not homogeneous and varied depending on the speaker. Even though the recognition accuracy improved considerably, the results were far from being as good as those obtained for typical speech.	es_ES
dc.language.iso	eng	es_ES
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	automatic speech recognition	es_ES
dc.subject	dysarthria
dc.subject	intelligibility
dc.subject	Spanish
dc.title	Evaluation of STT technologies performance and database design for Spanish dysarthric speech	es_ES
dc.type	info:eu-repo/semantics/masterThesis
dc.date.updated	2021-06-14T09:00:34Z
dc.language.rfc3066	es
dc.rights.holder	© 2021, la autora
dc.contributor.degree	Máster Universitario en Análisis y Procesamiento del Lenguaje
dc.contributor.degree	Hizkuntzaren Azterketa eta Prozesamendua Unibertsitate Masterra
dc.identifier.gaurregister	114375-729038-09	es_ES
dc.identifier.gaurassign	123042-729038	es_ES

Files in this item

Name: TFM_Margot_Madina.pdf
Size: 4.194Mb
Format: PDF
Description: Master_Thesis


This item appears in the following Collection(s)

Máster Universitario en Análisis y Procesamiento del Lenguaje
