
dc.contributor.author: Odriozola Sustaeta, Igor
dc.contributor.author: Hernáez Rioja, Inmaculada
dc.contributor.author: Navas Cordón, Eva
dc.date.accessioned: 2018-07-17T06:45:14Z
dc.date.available: 2018-07-17T06:45:14Z
dc.date.issued: 2018-05-31
dc.identifier.citation: Expert Systems with Applications 110 : 52–61 (2018) [es_ES]
dc.identifier.issn: 0957-4174
dc.identifier.uri: http://hdl.handle.net/10810/28103
dc.description: Preprint of the article published online on 31 May 2018 [es_ES]
dc.description.abstract: Voice activity detection (VAD) is an essential task in expert systems that rely on oral interfaces. The VAD module detects the presence of human speech and separates speech segments from silences and non-speech noises. The most popular current on-line VAD systems are based on adaptive parameters which seek to cope with varying channel and noise conditions. The main disadvantages of this approach are the need for some initialisation time to properly adjust the parameters to the incoming signal and uncertain performance in the case of poor estimation of the initial parameters. In this paper we propose a novel on-line VAD based only on previous training which does not introduce any delay. The technique is based on a strategy that we have called Multi-Normalisation Scoring (MNS). It consists of obtaining a vector of multiple observation likelihood scores from normalised mel-cepstral coefficients previously computed from different databases. A classifier is then used to label the incoming observation likelihood vector. Encouraging results have been obtained with a Multi-Layer Perceptron (MLP). This technique can generalise for unseen noise levels and types. A validation experiment with two current standard ITU-T VAD algorithms demonstrates the good performance of the method. Indeed, lower classification error rates are obtained for non-speech frames, while results for speech frames are similar. [es_ES]
dc.description.sponsorship: This work was partially supported by the EU (ERDF) under grant TEC2015-67163-C2-1-R (RESTORE) (MINECO/ERDF, EU) and by the Basque Government under grant KK-2017/00043 (BerbaOla). [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: Elsevier Ltd. [es_ES]
dc.relation: info:eu-repo/grantAgreement/MINECO/TEC2015-67163-C2-1-R [es_ES]
dc.rights: info:eu-repo/semantics/openAccess [es_ES]
dc.subject: VAD [es_ES]
dc.subject: observation likelihood [es_ES]
dc.subject: MNS [es_ES]
dc.subject: on-line speech processing [es_ES]
dc.title: An on-line VAD based on Multi-Normalisation Scoring (MNS) of observation likelihoods [es_ES]
dc.type: info:eu-repo/semantics/preprint [es_ES]
dc.rights.holder: © 2018 Elsevier Ltd. All rights reserved [es_ES]
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S0957417418303373 [es_ES]
dc.identifier.doi: 10.1016/j.eswa.2018.05.038
dc.departamentoes: Ingeniería de comunicaciones [es_ES]
dc.departamentoeu: Komunikazioen ingeniaritza [es_ES]
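
Illustrative note on the abstract: the record describes MNS as building a vector of observation likelihood scores from normalised mel-cepstral coefficients and passing that vector to a classifier. The Python sketch below shows one possible reading of the scoring step only. It is not the authors' implementation. It assumes that each reference database contributes a precomputed mean and standard deviation used for normalisation, and that each class (for example speech and non-speech) is modelled by a diagonal-covariance Gaussian. The record does not state the model type, and all names and toy values here are hypothetical.

```python
import math
import random


def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian (assumed model)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def mns_score_vector(frame, norm_stats, models):
    """Score one frame of mel-cepstral coefficients under several normalisations.

    norm_stats: list of (mean, std) pairs, one per reference database (hypothetical).
    models: list of (mean, var) pairs, one per class model (hypothetical).
    Returns len(norm_stats) * len(models) log-likelihood scores.
    """
    scores = []
    for db_mean, db_std in norm_stats:
        # Normalise the frame with statistics precomputed from one database.
        normalised = [(c - m) / s for c, m, s in zip(frame, db_mean, db_std)]
        for model_mean, model_var in models:
            scores.append(diag_gauss_loglik(normalised, model_mean, model_var))
    return scores


# Toy data: 13 coefficients, 3 reference databases, 2 class models.
rng = random.Random(0)
dim = 13
norm_stats = [
    ([rng.gauss(0.0, 1.0) for _ in range(dim)],
     [rng.uniform(0.5, 2.0) for _ in range(dim)])
    for _ in range(3)
]
models = [
    ([0.0] * dim, [1.0] * dim),  # stand-in for a "speech" model
    ([0.5] * dim, [2.0] * dim),  # stand-in for a "non-speech" model
]

frame = [rng.gauss(0.0, 1.0) for _ in range(dim)]
vector = mns_score_vector(frame, norm_stats, models)
print(len(vector))  # 3 databases x 2 models = 6 scores
```

In the approach the abstract describes, a vector of this kind would then be labelled by a trained classifier such as an MLP. That stage is omitted here because the record gives no details of the network or its training.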

