Testu arteko koherentziazko erlazio-egitura: lehen urratsak euskaraz [Cross-document coherence relation structure: first steps in Basque]

Rodríguez Castrillo, Alazne

File

Bukaerako_proiektua_AlazneRodriguez.pdf (2.371Mb)

Date

2016-07-15

URI

http://hdl.handle.net/10810/18634

Abstract

[EU] (English translation) Understanding what makes a text coherent is very useful for understanding the text itself, because coherence and coherence relations help us determine whether one or more texts are coherent. In this work, three Cross-Document Structure Theory (CST) (Radev, 2000) coherence relations holding between different texts on the same topic are analysed and classified. To do so, we propose guidelines for segmenting Basque texts on the same topic and for annotating the relations between them. A corpus of 10 texts has been annotated, and 3 of its clusters were analysed by two annotators. We report the inter-annotator agreement. Developing coherence relations is very important for several Natural Language Processing systems, such as information extraction, machine translation, question answering and automatic summarization. If all CST relations were analysed in a representative corpus in the future, the work presented here would be the first step towards using cross-document coherence relations in the automatic processing of Basque texts.

[EN] Understanding what makes a text coherent is essential to understanding it fully, since coherence and coherence relations tell us whether a text or a set of texts is coherent. Coherence relations are very important for natural language processing tools such as information retrieval systems, machine translation, question-answering systems and automatic summarization. To provide these tools with the coherence relations they need, this project analyses and classifies three coherence relations from Cross-Document Structure Theory (CST) (Radev, 2000). To this end, we propose guidelines for segmenting and annotating Basque texts on the same topic. Should all CST relations be analysed in a representative corpus in the future, this project would constitute the first step towards the automatic processing of cross-document coherence relations in Basque.
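
The Basque abstract reports inter-annotator agreement between the two annotators but does not say which measure was used. As a minimal sketch, assuming Cohen's kappa (a common choice when exactly two annotators label the same items) and using hypothetical labels `rel_a`, `rel_b`, `rel_c` and `none` as stand-ins for the three CST relations studied (the abstract does not name them), agreement over a list of annotated segment pairs could be computed like this:

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty list of items")
    n = len(labels_a)
    # Observed agreement: share of items that received the same label from both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's own label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one and the same label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for eight segment pairs (illustrative data, not from the project's corpus).
annotator_1 = ["rel_a", "rel_a", "rel_b", "none", "rel_c", "none", "rel_b", "rel_a"]
annotator_2 = ["rel_a", "rel_b", "rel_b", "none", "rel_c", "rel_a", "rel_b", "rel_a"]

if __name__ == "__main__":
    print(round(cohen_kappa(annotator_1, annotator_2), 3))  # 0.652
```

Here the annotators agree on 6 of 8 pairs (observed agreement 0.75), but kappa discounts the agreement expected by chance from their label frequencies (0.28125), giving about 0.65. With more than two annotators, a measure such as Fleiss' kappa or Krippendorff's alpha would be needed instead.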

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International