
dc.contributor.advisor: Agerri Gascón, Rodrigo
dc.contributor.advisor: Rigau Claramunt, Germán
dc.contributor.author: Chung, Yi-Ling
dc.date.accessioned: 2017-10-18T06:52:58Z
dc.date.available: 2017-10-18T06:52:58Z
dc.date.issued: 2017-09-27
dc.date.submitted: 2017-09-26
dc.identifier.uri: http://hdl.handle.net/10810/23026
dc.description.abstract: The lack of hand-curated data is a major impediment to developing statistical semantic processors for many of the world's languages. A major issue with semantic processors in Natural Language Processing (NLP) is that they require manually annotated data to perform accurately. Our work addresses this issue by leveraging existing annotations and semantic processors from multiple source languages and projecting their annotations via the statistical word alignments traditionally used in Machine Translation. Taking Named Entity Recognition (NER) as a use case of semantic processing, this work presents a method to automatically induce Named Entity taggers from parallel data, without any manual intervention. The intuition is to transfer, or project, semantic annotations from multiple source languages to a target language by applying statistical word alignment methods to parallel texts (Och and Ney, 2000; Liang et al., 2006). The projected annotations can then be used to automatically generate semantic processors for the target language. In this way, NLP processors can be provided for a target language that has no training data. The experiments focus on four languages: German, English, Spanish and Italian. Our empirical evaluation shows that the method obtains competitive results when compared with models trained on gold-standard out-of-domain data. This shows that our projection algorithm is effective at transferring NER annotations across languages via parallel data, thus providing a fully automatic method to obtain NER taggers for as many languages as are aligned via parallel corpora. [es_ES]
dc.language.iso: eng [es_ES]
dc.rights: info:eu-repo/semantics/openAccess [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/ [*]
dc.subject: named entity recognition [es_ES]
dc.subject: cross-lingual projection [es_ES]
dc.subject: natural language processing [es_ES]
dc.title: Automatic generation of named entity taggers leveraging parallel corpora [es_ES]
dc.type: info:eu-repo/semantics/masterThesis [es_ES]
dc.rights.holder: Atribución-NoComercial-SinDerivadas 3.0 España [*]
dc.departamentoes: Lenguajes y sistemas informáticos [es_ES]
dc.departamentoeu: Hizkuntza eta sistema informatikoak [es_ES]
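
The projection idea described in the abstract can be illustrated with a minimal Python sketch. It assumes that word alignments are already available as (source index, target index) pairs, uses plain entity labels rather than a BIO scheme, and combines the source languages by strict majority vote. The function names, the example sentence, and the voting rule are illustrative assumptions, not the thesis's actual implementation.

```python
from collections import Counter


def project_tags(source_tags, alignment, target_len, default="O"):
    """Copy each entity tag from a source token onto the target tokens aligned to it."""
    projected = [default] * target_len
    for src_idx, tgt_idx in alignment:
        tag = source_tags[src_idx]
        if tag != default:
            projected[tgt_idx] = tag
    return projected


def vote(projections, default="O"):
    """Keep a tag for a target token only if a strict majority of sources agree on it."""
    combined = []
    for tags in zip(*projections):
        tag, count = Counter(tags).most_common(1)[0]
        combined.append(tag if count > len(tags) / 2 else default)
    return combined


# Hypothetical target sentence (German): "Angela Merkel besucht Madrid"
# All three source sentences are assumed to align one-to-one with the target.
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]
english = project_tags(["PER", "PER", "O", "LOC"], alignment, 4)
spanish = project_tags(["PER", "PER", "O", "LOC"], alignment, 4)
italian = project_tags(["PER", "PER", "O", "O"], alignment, 4)  # source tagger missed "Madrid"

combined = vote([english, spanish, italian])
print(combined)
```

Voting across several source languages lets a correct projection from two sources outweigh a miss in the third; the combined tags could then serve as automatically generated training data for a target-language tagger.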



Except where otherwise noted, this item's license is described as Atribución-NoComercial-SinDerivadas 3.0 España