Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification
dc.contributor.author | Iñurrieta Urmeneta, Usoa | |
dc.contributor.author | Aduriz, Itziar | |
dc.contributor.author | Díaz de Ilarraza Sánchez, Arantza | |
dc.contributor.author | Labaka Intxauspe, Gorka | |
dc.contributor.author | Sarasola Gabiola, Kepa Mirena | |
dc.date.accessioned | 2021-01-21T13:27:30Z | |
dc.date.available | 2021-01-21T13:27:30Z | |
dc.date.issued | 2020-08-27 | |
dc.identifier.citation | Plos One 15(8) : (2020) // Article ID e0237767 | es_ES |
dc.identifier.issn | 1932-6203 | |
dc.identifier.uri | http://hdl.handle.net/10810/49828 | |
dc.description.abstract | Multiword Expressions (MWEs) are idiosyncratic combinations of words that pose important challenges to Natural Language Processing. Some kinds of MWEs, such as verbal ones, are particularly hard to identify in corpora because of their high degree of morphosyntactic flexibility. This paper describes a linguistically motivated method to gather detailed information about verb+noun MWEs (VNMWEs) from corpora. Although the main focus of this study is Spanish, the method is easily adaptable to other languages. Monolingual and parallel corpora are used as input, and data about the morphosyntactic variability of VNMWEs is extracted. This information is then evaluated in an identification task, obtaining an F score of 0.52, which is considerably higher than the scores reported in related work. | es_ES |
dc.description.sponsorship | This work was funded by the Basque Government, which classified the IXA research group (of which the authors of this article are members) as a Type A research group (IT1343-19). It is also part of the project entitled "MODENA: advanced neural modeling for high-quality translation" (KK-2018/00087). | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | Public Library of Science | es_ES |
dc.rights | info:eu-repo/semantics/openAccess | es_ES |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/es/ | * |
dc.title | Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification | es_ES |
dc.type | info:eu-repo/semantics/article | es_ES |
dc.rights.holder | 2020 Inurrieta et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. | es_ES |
dc.rights.holder | Attribution 3.0 Spain | * |
dc.relation.publisherversion | https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0237767 | es_ES |
dc.identifier.doi | 10.1371/journal.pone.0237767 | |
dc.departamentoes | Computer Languages and Systems | es_ES |
dc.departamentoeu | Computer Languages and Systems | es_ES |