Show simple item record

dc.contributor.author: Raman, Sneha
dc.contributor.author: Sarasola Aramendia, Xabier
dc.contributor.author: Navas Cordón, Eva
dc.contributor.author: Hernáez Rioja, Inmaculada
dc.date.accessioned: 2021-07-15T07:34:00Z
dc.date.available: 2021-07-15T07:34:00Z
dc.date.issued: 2021-06-26
dc.identifier.citation: Applied Sciences 11(13) : (2021) // Article ID 5940
dc.identifier.issn: 2076-3417
dc.identifier.uri: http://hdl.handle.net/10810/52459
dc.description.abstract: Pathological speech such as Oesophageal Speech (OS) is difficult to understand due to the presence of undesired artefacts and lack of normal healthy speech characteristics. Modern speech technologies and machine learning enable us to transform pathological speech to improve intelligibility and quality. We have used a neural network based voice conversion method with the aim of improving the intelligibility and reducing the listening effort (LE) of four OS speakers of varying speaking proficiency. The novelty of this method is the use of synthetic speech matched in duration with the source OS as the target, instead of parallel aligned healthy speech. We evaluated the converted samples from this system using a collection of Automatic Speech Recognition systems (ASR), an objective intelligibility metric (STOI) and a subjective test. ASR evaluation shows that the proposed system had significantly better word recognition accuracy compared to unprocessed OS, and baseline systems which used aligned healthy speech as the target. There was an improvement of at least 15% on STOI scores indicating a higher intelligibility for the proposed system compared to unprocessed OS, and a higher target similarity in the proposed system compared to baseline systems. The subjective test reveals a significant preference for the proposed system compared to unprocessed OS for all OS speakers, except one who was the least proficient OS speaker in the data set.
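The abstract above reports ASR word recognition accuracy as one of the evaluation measures. A standard way to score ASR output against a reference transcript is word error rate (WER), computed with word-level edit distance; accuracy is then 1 − WER. The sketch below is a minimal stdlib illustration of that metric, not code from the paper (the function name and scoring convention are assumptions):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words (Levenshtein at word level).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("the cat sat", "the bat sat")` gives one substitution out of three reference words, i.e. a word recognition accuracy of about 67%.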
dc.description.sponsorship: This project was supported by funding from the European Union's H2020 research and innovation programme under the MSCA GA 675324 (the ENRICH network: www.enrich-etn.eu (accessed on 25 June 2021)), and the Basque Government (PIBA_2018_1_0035 and IT355-19).
dc.language.iso: eng
dc.publisher: MDPI
dc.relation: info:eu-repo/grantAgreement/EC/H2020/675324
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-sa/3.0/es/
dc.subject: pathological speech
dc.subject: voice conversion
dc.subject: intelligibility
dc.subject: speech recognition
dc.title: Enrichment of Oesophageal Speech: Voice Conversion with Duration-Matched Synthetic Speech as Target
dc.type: info:eu-repo/semantics/article
dc.date.updated: 2021-07-08T14:21:46Z
dc.rights.holder: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
dc.relation.publisherversion: https://www.mdpi.com/2076-3417/11/13/5940
dc.identifier.doi: 10.3390/app11135940
dc.contributor.funder: European Commission
dc.departamentoes: Ingeniería de comunicaciones
dc.departamentoeu: Komunikazioen ingeniaritza

