Sintaktikoki etiketatutako euskara historikoaren corpusa eraikitzen [Building a syntactically annotated corpus of historical Basque]
Date
2020
Authors
Estarrona Ibarloza, Ainara
Etxeberria Uztarroz, Izaskun
Etxepare Igiñiz, Ricardo
Padilla Moyano, Manuel
Soraluze Irureta, Ander
Fontes Linguae Vasconum 50 urte: Ekarpen berriak euskararen ikerketari : 237-252 (2020)
Abstract
In this paper we present an ongoing project to build a morphosyntactically annotated historical corpus of Basque. The corpus will contain around one million words, encompassing the most significant written production in Basque between the 15th and 18th centuries. Morphosyntactic tagging will allow systematic searches at different levels of complexity: lemma, form, part of speech, morphosyntactic feature, and a number of syntactic constructions. In addition, a set of metadata will enable searches based on socio-historical criteria. Beyond producing the first annotated historical corpus of Basque, the project will improve language-processing tools by analysing historical varieties of Basque that are more or less distant from present-day standard Basque. The project also aims to establish a model for further work on historical corpora of Basque.