Synthetic speech detection using phase information

Saratxaga Couceiro, Ibon; Sánchez de la Fuente, Jon; Wu, Zhizheng; Hernáez Rioja, Inmaculada; Navas Cordón, Eva

Date

2016-04-16

Metadata

Speech Communication 81 : 30–41 (2016)

URI

http://hdl.handle.net/10810/23565

Abstract

Because most speech processing techniques neglect phase information, we seek to detect phase perturbations in order to prevent synthetic impostors from attacking Speaker Verification systems. Two Synthetic Speech Detection (SSD) systems that use spectral phase information are reviewed and evaluated in this work: one based on the Modified Group Delay (MGD) and the other based on the Relative Phase Shift (RPS). A classical magnitude-based MFCC system is used as a baseline. Different training strategies are proposed and evaluated using both real spoofing samples and signals copy-synthesized from natural speech, with the aim of reducing the need for real spoofing data to train the systems. The recently published ASVspoof 2015 database is used for training and evaluation. Performance on completely unrelated data is also checked, using synthetic speech from the Blizzard Challenge as evaluation material. The results show that phase information can be used successfully for the SSD task, even against unknown attacks.

Collections

Comunicaciones
