Elaboration of a RST Chinese Treebank

Cao, Shuyuan

dc.contributor.advisor	Iruskieta Quintian, Mikel
dc.contributor.advisor	Da Cunha, Iria
dc.contributor.author	Cao, Shuyuan
dc.date.accessioned	2018-04-11T07:09:04Z
dc.date.available	2018-04-11T07:09:04Z
dc.date.issued	2018-03-20
dc.identifier.uri	http://hdl.handle.net/10810/26206
dc.description.abstract	[EN] As a subfield of Artificial Intelligence (AI), Natural Language Processing (NLP) aims to process human languages automatically. Studies from various research fields have produced fruitful results for NLP. Among these fields, discourse analysis is becoming increasingly popular. Discourse information is crucial for NLP studies. As the most spoken language in the world, Chinese occupies a very important position in NLP research. This work therefore presents a discourse treebank for Chinese whose theoretical framework is Rhetorical Structure Theory (RST) (Mann and Thompson, 1988). The research corpus consists of 50 Chinese texts and can be consulted with respect to three aspects: segmentation, central unit (CU) and discourse structure. Finally, we create an open online interface for the Chinese treebank.	es_ES
dc.description.abstract	[EU] As a field within Artificial Intelligence (AI), Natural Language Processing (NLP) aims to process human languages automatically. The many studies in this field have produced numerous fruitful results. Among these research areas, discourse analysis is becoming increasingly well known. Discourse information is of great interest in NLP studies. Because Chinese is the language with the most speakers in the world, analyzing it is very important for ongoing NLP research. This work therefore aims to present a treebank for Chinese annotated with discourse structure, based on Rhetorical Structure Theory (RST) (Mann and Thompson, 1988). The research corpus consists of 50 Chinese texts, and the treebank is presented at three annotation levels: segmentation, central unit (CU) and discourse structure. Finally, the corpus has been published on a website where the treebank can be consulted.
dc.language.iso	eng	es_ES
dc.rights	info:eu-repo/semantics/openAccess	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/es/	*
dc.subject	NLP	es_ES
dc.subject	discourse analysis	es_ES
dc.subject	RST	es_ES
dc.subject	Chinese	es_ES
dc.subject	corpus	es_ES
dc.subject	HP
dc.subject	diskurtso-analisia
dc.subject	EET
dc.subject	txinera
dc.subject	corpusa
dc.title	Elaboration of a RST Chinese Treebank	es_ES
dc.type	info:eu-repo/semantics/masterThesis	es_ES
dc.rights.holder	Atribución-NoComercial-CompartirIgual 3.0 España	es_ES

Files in this item

Name: license_rdf
Size: 1.012Kb
Format: application/rdf+xml

Name: TFM_Shuyuan Cao.pdf
Size: 1.783Mb
Format: PDF

This item appears in the following Collection(s)

Máster Universitario en Análisis y Procesamiento del Lenguaje

Except where otherwise noted, this item's license is described as Atribución-NoComercial-CompartirIgual 3.0 España