UPV-EHU ADDI

An on-line VAD based on Multi-Normalisation Scoring (MNS) of observation likelihoods

Preprint (1.705Mb)
Date
2018-05-31
Authors
Odriozola Sustaeta, Igor
Hernaez Rioja, Inmaculada Concepción
Navas Cordón, Eva
Expert Systems with Applications 110 : 52–61 (2018)
URI
http://hdl.handle.net/10810/28103
Abstract
Voice activity detection (VAD) is an essential task in expert systems that rely on oral interfaces. The VAD module detects the presence of human speech and separates speech segments from silences and non-speech noises. The most popular current on-line VAD systems are based on adaptive parameters, which seek to cope with varying channel and noise conditions. The main disadvantages of this approach are the need for some initialisation time to adjust the parameters properly to the incoming signal, and uncertain performance when the initial parameters are poorly estimated. In this paper we propose a novel on-line VAD that is based only on previous training and does not introduce any delay. The technique is based on a strategy that we have called Multi-Normalisation Scoring (MNS). It consists of obtaining a vector of multiple observation likelihood scores from normalised mel-cepstral coefficients previously computed from different databases. A classifier is then used to label the incoming observation likelihood vector. Encouraging results have been obtained with a Multi-Layer Perceptron (MLP). This technique can generalise to unseen noise levels and types. A validation experiment against two current standard ITU-T VAD algorithms demonstrates the good performance of the method. Indeed, lower classification error rates are obtained for non-speech frames, while results for speech frames are similar.
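The scoring idea in the abstract can be illustrated with a minimal sketch. It is only one possible reading of the abstract and not the authors' implementation. It assumes that each database contributes a mean and standard deviation used to normalise the incoming mel-cepstral frame, that each normalised frame is scored under a speech model and a non-speech model (diagonal Gaussians here, purely as an assumption), and that the resulting scores are concatenated into one vector. The function names, the Gaussian models and the simple comparison rule that stands in for the MLP are all hypothetical.

```python
import numpy as np

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def mns_score_vector(frame, norm_stats, speech_model, nonspeech_model):
    """Score one frame under several normalisations (hypothetical layout).

    norm_stats: list of (mean, std) pairs, one per database (assumption).
    speech_model, nonspeech_model: (mean, var) tuples (assumption).
    Returns [speech_1, nonspeech_1, speech_2, nonspeech_2, ...].
    """
    scores = []
    for mu, sigma in norm_stats:
        z = (frame - mu) / sigma  # normalise with this database's statistics
        scores.append(diag_gauss_loglik(z, *speech_model))
        scores.append(diag_gauss_loglik(z, *nonspeech_model))
    return np.array(scores)

def label_frame(score_vector):
    """Stand-in classifier: the abstract reports an MLP; this only
    compares the summed speech scores with the summed non-speech scores."""
    speech_total = score_vector[0::2].sum()
    nonspeech_total = score_vector[1::2].sum()
    return "speech" if speech_total > nonspeech_total else "non-speech"

# Toy data, invented for illustration only.
dim = 3
speech_model = (np.ones(dim), np.ones(dim))
nonspeech_model = (-np.ones(dim), np.ones(dim))
norm_stats = [(np.zeros(dim), np.ones(dim)), (np.zeros(dim), 2.0 * np.ones(dim))]

frame = np.full(dim, 2.0)
vector = mns_score_vector(frame, norm_stats, speech_model, nonspeech_model)
print(vector.shape, label_frame(vector))
```

Because every quantity is fixed before the first frame arrives, a scheme of this kind needs no adaptation period, which is consistent with the abstract's claim that the method introduces no delay.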
Collections
  • Articles
