LSTM based voice conversion for laryngectomees

Serrano García, Luis; Tavarez Arriba, David; Sarasola, Xabier; Raman, Sneha; Saratxaga Couceiro, Ibon; Navas Cordón, Eva; Hernáez Rioja, Inmaculada

dc.contributor.author	Serrano García, Luis
dc.contributor.author	Tavarez Arriba, David
dc.contributor.author	Sarasola, Xabier
dc.contributor.author	Raman, Sneha
dc.contributor.author	Saratxaga Couceiro, Ibon
dc.contributor.author	Navas Cordón, Eva
dc.contributor.author	Hernáez Rioja, Inmaculada
dc.date.accessioned	2019-05-15T15:33:18Z
dc.date.available	2019-05-15T15:33:18Z
dc.date.issued	2018-11-23
dc.identifier.citation	IberSPEECH 2018, 21-23 November 2018, Barcelona, Spain: 122-126 (2018)	es_ES
dc.identifier.uri	http://hdl.handle.net/10810/32818
dc.description.abstract	This paper describes a voice conversion system designed with the aim of improving the intelligibility and pleasantness of oesophageal voices. Two different systems have been built, one to transform the spectral magnitude and another one for the fundamental frequency, both based on DNNs. Ahocoder has been used to extract the spectral information (mel cepstral coefficients) and a specific pitch extractor has been developed to calculate the fundamental frequency of the oesophageal voices. The cepstral coefficients are converted by means of an LSTM network. The conversion of the intonation curve is implemented through two different LSTM networks, one dedicated to the voiced/unvoiced detection and another one for the prediction of F0 from the converted cepstral coefficients. The experiments described here involve conversion from one oesophageal speaker to a specific healthy voice. The intelligibility of the signals has been measured with a Kaldi based ASR system. A preference test has been implemented to evaluate the subjective preference of the obtained converted voices comparing them with the original oesophageal voice. The results show that spectral conversion improves ASR while restoring the intonation is preferred by human listeners.	es_ES
dc.description.sponsorship	This work has been partially funded by the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R), the Basque Government (BerbaOla project, KK-2018/00014) and from the European Union's H2020 research and innovation programme under the Marie Curie European Training Network ENRICH (675324).	es_ES
dc.language.iso	eng	es_ES
dc.publisher	International Speech Communication Association	es_ES
dc.relation	info:eu-repo/grantAgreement/EC/H2020/675324	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO/TEC2015-67163-C2-1-R	es_ES
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.subject	voice conversion	es_ES
dc.subject	speech and voice disorders	es_ES
dc.subject	alaryngeal voices	es_ES
dc.subject	speech intelligibility	es_ES
dc.title	LSTM based voice conversion for laryngectomees	es_ES
dc.type	info:eu-repo/semantics/conferenceObject	es_ES
dc.rights.holder	(c) 2018 ISCA	es_ES
dc.relation.publisherversion	https://www.isca-speech.org/archive/IberSPEECH_2018/abstracts/IberS18_O3-4_Serrano.html	es_ES
dc.identifier.doi	10.21437/IberSPEECH.2018-26
dc.contributor.funder	European Commission
dc.departamentoes	Ingeniería de comunicaciones	es_ES
dc.departamentoeu	Komunikazioen ingeniaritza	es_ES

Files in this item

Name: IberS18_O3-4_Serrano.pdf
Size: 278.7Kb
Format: PDF
Description: Full text

This item appears in the following Collection(s)

Comunicaciones
OpenAire
European Commission