An on-line VAD based on Multi-Normalisation Scoring (MNS) of observation likelihoods

Odriozola Sustaeta, Igor; Hernáez Rioja, Inmaculada; Navas Cordón, Eva

dc.contributor.author	Odriozola Sustaeta, Igor
dc.contributor.author	Hernáez Rioja, Inmaculada
dc.contributor.author	Navas Cordón, Eva
dc.date.accessioned	2018-07-17T06:45:14Z
dc.date.available	2018-07-17T06:45:14Z
dc.date.issued	2018-05-31
dc.identifier.citation	Expert Systems with Applications 110: 52–61 (2018)	es_ES
dc.identifier.issn	0957-4174
dc.identifier.uri	http://hdl.handle.net/10810/28103
dc.description	Preprint of the article published online on 31 May 2018	es_ES
dc.description.abstract	Voice activity detection (VAD) is an essential task in expert systems that rely on oral interfaces. The VAD module detects the presence of human speech and separates speech segments from silences and non-speech noises. The most popular current on-line VAD systems are based on adaptive parameters which seek to cope with varying channel and noise conditions. The main disadvantages of this approach are the need for some initialisation time to properly adjust the parameters to the incoming signal and uncertain performance in the case of poor estimation of the initial parameters. In this paper we propose a novel on-line VAD based only on previous training which does not introduce any delay. The technique is based on a strategy that we have called Multi-Normalisation Scoring (MNS). It consists of obtaining a vector of multiple observation likelihood scores from normalised mel-cepstral coefficients previously computed from different databases. A classifier is then used to label the incoming observation likelihood vector. Encouraging results have been obtained with a Multi-Layer Perceptron (MLP). This technique can generalise for unseen noise levels and types. A validation experiment with two current standard ITU-T VAD algorithms demonstrates the good performance of the method. Indeed, lower classification error rates are obtained for non-speech frames, while results for speech frames are similar.	es_ES
dc.description.sponsorship	This work was partially supported by the EU (ERDF) under grant TEC2015-67163-C2-1-R (RESTORE) (MINECO/ERDF, EU) and by the Basque Government under grant KK-2017/00043 (BerbaOla).	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Elsevier Ltd.	es_ES
dc.relation	info:eu-repo/grantAgreement/MINECO/TEC2015-67163-C2-1-R	es_ES
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.subject	VAD	es_ES
dc.subject	observation likelihood	es_ES
dc.subject	MNS	es_ES
dc.subject	on-line speech processing	es_ES
dc.title	An on-line VAD based on Multi-Normalisation Scoring (MNS) of observation likelihoods	es_ES
dc.type	info:eu-repo/semantics/preprint	es_ES
dc.rights.holder	© 2018 Elsevier Ltd. All rights reserved	es_ES
dc.relation.publisherversion	https://www.sciencedirect.com/science/article/pii/S0957417418303373	es_ES
dc.identifier.doi	10.1016/j.eswa.2018.05.038
dc.departamentoes	Ingeniería de comunicaciones	es_ES
dc.departamentoeu	Komunikazioen ingeniaritza	es_ES
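
The Multi-Normalisation Scoring idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the acoustic model, the number of databases, or the network topology, so everything below is an assumption made for illustration: a single diagonal-covariance Gaussian stands in for the observation likelihood model, each database is assumed to contribute one (mean, standard deviation) pair used to normalise the mel-cepstral frame, and a one-hidden-layer perceptron with hypothetical weights labels the resulting score vector. The names `normalise`, `diag_gauss_loglik`, `mns_vector` and `mlp_label` are hypothetical.

```python
import math


def normalise(frame, mean, std):
    """Mean/variance-normalise one mel-cepstral frame with fixed statistics."""
    return [(x - m) / s for x, m, s in zip(frame, mean, std)]


def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal Gaussian (assumed stand-in model)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v)
        for xi, mi, v in zip(x, mean, var)
    )


def mns_vector(frame, norm_stats, model):
    """Return one likelihood score per normalisation (one per database).

    norm_stats: list of (mean, std) pairs, assumed precomputed offline.
    model: dict with "mean" and "var" lists for the assumed Gaussian.
    """
    scores = []
    for mean, std in norm_stats:
        z = normalise(frame, mean, std)
        scores.append(diag_gauss_loglik(z, model["mean"], model["var"]))
    return scores


def mlp_label(scores, w_hidden, b_hidden, w_out, b_out):
    """Label a score vector with a tiny MLP (weights are hypothetical).

    Returns (is_speech, probability).
    """
    hidden = [
        math.tanh(sum(w * s for w, s in zip(row, scores)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob >= 0.5, prob


# Toy usage: a 2-dimensional frame and two normalisation sets.
frame = [1.0, 2.0]
norm_stats = [([1.0, 2.0], [1.0, 1.0]), ([0.0, 0.0], [1.0, 2.0])]
model = {"mean": [0.0, 0.0], "var": [1.0, 1.0]}
scores = mns_vector(frame, norm_stats, model)
is_speech, prob = mlp_label(scores, [[0.1, -0.1]], [0.0], [1.0], 0.0)
```

Because the normalisation statistics and the classifier weights are fixed before run time, each incoming frame can be scored on its own, which is consistent with the abstract's claim that the method needs no initialisation period.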

Files in this item

Name: preprint.pdf
Size: 1.705Mb
Format: PDF
Description: Preprint

This item appears in the following Collection(s)

Artículos
