Availability of Voice Deepfake Technology and its Impact for Good and Evil

Amezaga Vélez, Naroa

Date

2021-11-22

URI

http://hdl.handle.net/10810/53942

Abstract

Artificial Intelligence, and especially Machine Learning and Deep Learning techniques, increasingly shapes today’s technological and social landscape. These advances have contributed greatly to the development of Speech Synthesis, also known as Text-To-Speech, in which speech is artificially produced from text by means of computer technology. Despite the wide variety of speech synthesis tools and systems, they share a fundamental drawback: unnatural, robotic and impersonal synthesized voices. This is where Voice Cloning technology comes into play, which makes it possible to generate synthetic speech that resembles a targeted human voice. This new practice offers many benefits in several fields, such as healthcare, education, or advertising. However, every coin has two sides: with the development of voice cloning, the generation of fake video and voices, known as “deepfakes”, has risen, causing a loss of trust in technology and greater fear of it. The objective of this project is therefore to analyze the current availability of deepfake voice technologies and their impact for good and evil. To that end, we chose to focus on the educational field by implementing a voice query assistant that answers questions related to a course, such as the professor’s contact information or the date of the final exam. To enhance the user experience, we added the extra feature of answering in the corresponding professor’s cloned voice. Moreover, as a starting point for future investigation, we provide the design of a qualitative research study that allows participants to test the framework we built and give their views, in order to gain a better understanding of the impact that voice cloning has on people. Like every technology-related concept, voice cloning is not exempt from discussion, so we also analyze its misuse, its regulation, and its future.
The results of the case study show that it is possible to clone someone’s voice from just a few seconds of reference audio, which creates a superior user experience but, at the same time, reveals how easily anyone can gain access to voice cloning. This underscores the importance of the new challenges opened up by this technology and the need for safeguards and regulation. There is no doubt that future research is needed to understand the dynamics and impact of voice cloning and to reach more robust conclusions.

Artificial intelligence and, especially, machine learning and deep learning techniques are increasingly present in today’s technological and social landscape. These advances have contributed overwhelmingly to the development of Speech Synthesis, also known as Text-to-Speech, in which speech is produced artificially by means of computer technology. Although there is a wide variety of speech synthesis tools and systems, they share a common drawback: unnatural, robotic and impersonal synthesized voices. This is where Voice Cloning technology comes into play, which makes it possible to generate artificial speech that resembles a specific human voice. This new practice offers many benefits in fields such as healthcare, education, or advertising. However, with the development of voice cloning, the generation of fake videos and voices, known as “deepfakes”, has increased, causing a loss of trust in technology and fear of it. The objective of this project is thus to analyze the current availability of voice cloning technologies and their impact, both positive and negative. To that end, we chose to focus on the educational field, implementing a voice assistant that answers questions related to a course, such as the professor’s contact information or the date of the final exam; to improve the user experience, we added an extra feature: answering in the cloned voice of the corresponding professor. In addition, as a starting point for future research, we provide the design of a qualitative study that allows participants to try the system we built and give their opinion, in order to analyze the impact that voice cloning has on people. Alongside this, and bearing in mind that voice cloning is not exempt from discussion, we also analyze its misuse, regulation, and future.
In sum, the results of the project show that it is possible to clone someone’s voice from only a few seconds of reference audio, which creates an improved user experience but, at the same time, reveals how easily anyone can gain access to voice cloning. This highlights the new challenges we have to face and the need for prevention and regulation. Even so, there is no doubt that future research is needed to understand the dynamics and impact of this technology and to reach more robust conclusions.

Artificial Intelligence, and in particular Machine Learning and Deep Learning, are becoming increasingly important in today’s technological and social landscape. These advances have greatly helped the development of speech synthesis, in which voice is created artificially by means of a computer. Although many speech synthesis systems exist today, most share a common drawback: an unnatural, robotic and impersonal voice. As a solution, voice cloning is used, thanks to which artificial synthetic speech resembling someone’s voice is generated. This new technology offers a number of advantages in several areas, such as healthcare, education, or advertising. However, along with the development of voice cloning, the creation of fake videos and voices, known as "deepfakes", has risen, eroding trust in technology and increasing fear. The aim of this project is thus to analyze the availability of voice cloning today and its positive and negative effects. To that end, we focused on the educational field, developing a voice assistant able to answer questions related to a course (for example, the professor’s contact information or the date of the final exam). In addition, to improve the user experience, the answer is played back in the cloned voice of the corresponding professor. Furthermore, with a view to possible future research, the design of a qualitative study is provided, in which participants will test the developed system and give their opinion, in order to better understand the effect voice cloning has on people. Moreover, voice cloning, like all technology-related concepts, is not exempt from debate, and so its misuse, regulation and future are examined. From this project it can be concluded that it is possible to clone someone’s voice using a few seconds of the original voice as reference.
This, as noted earlier, improves the user experience but, at the same time, reveals that voice cloning is within anyone’s reach. Therefore, prevention and regulation are essential to face the new challenges created by this technology. It is clear that more research is needed in the future to better understand its dynamics and reach more robust conclusions.

Collections

Máster Universitario en Ingeniería de Telecomunicación

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike (CC BY-NC-SA)