Dealing with dialectal variation in the construction of the Basque historical corpus
Date
2020-12
Author
Estarrona Ibarloza, Ainara
Etxeberria Uztarroz, Izaskun
Etxepare Igiñiz, Ricardo
Padilla Moyano, Manuel
Soraluze Irureta, Ander
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects: 79-89 (2020)
Abstract
This paper analyses the challenge of handling dialectal variation when semi-automatically normalising and analysing historical Basque texts. The work is part of a broader ongoing project, Basque in the Making (BIM): A Historical Look at a European Language Isolate, which is building a morphosyntactically annotated historical corpus of Basque and whose main objective is the systematic, diachronic study of a number of grammatical features. The result will not only be the first tagged corpus of historical Basque but also a means of improving language processing tools, by analysing historical Basque varieties that differ to varying degrees from present-day standard Basque.