End to end approach for i2b2 2012 challenge based on Cross-lingual models
Santamaría, Edgar Andrés
Abstract
BACKGROUND - We propose a cross-lingual approach to the i2b2 2012 challenge on clinical
records, which focuses on temporal relations in clinical narratives. A corpus of discharge
summaries annotated with temporal information was provided for automatically
extracting: (1) clinically significant events, including both clinical concepts such as
problems, tests, treatments, and clinical departments, and events relevant to the patient’s
clinical timeline, such as admissions and transfers between departments; (2) temporal
expressions, referring to dates, times, durations, or frequencies in the clinical text, whose
extracted values had to be normalized to an ISO specification standard; and (3) temporal
relations among the clinical events and temporal expressions.
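The normalization step mentioned in item (2) can be illustrated with a minimal sketch. It assumes ISO 8601 as the target format, US-style `MM/DD/YYYY` input dates, and simple "number plus unit" durations; the function names and the unit table are hypothetical and are not taken from the challenge guidelines or from this work.

```python
import re
from datetime import datetime

# Hypothetical unit table for this sketch; not taken from the challenge guidelines.
_UNITS = {"day": "D", "week": "W", "month": "M", "year": "Y"}


def normalize_date(text, fmt="%m/%d/%Y"):
    """Parse a date in the assumed input format and return ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text, fmt).date().isoformat()


def normalize_duration(text):
    """Turn '<number> <unit>' such as '3 days' into an ISO 8601 duration such as 'P3D'."""
    match = re.fullmatch(r"\s*(\d+)\s+(day|week|month|year)s?\s*", text.lower())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    count, unit = match.groups()
    return f"P{int(count)}{_UNITS[unit]}"
```

Real clinical text needs far broader coverage (relative dates, frequencies, partial dates), so this only shows the shape of the mapping from surface text to a normalized value.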
GOALS - The objectives of this work are to outperform the previous state of the art for
the i2b2 2012 challenge and to adapt a cross-lingual model to the clinical domain, where
few data resources are available.
METHODS - The task was conceived as a pipeline of independently developed modules: a
token classifier for events and temporal expressions, and a text classifier for relation
extraction. We used the XLM-RoBERTa cross-lingual model.
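The hand-off between the two pipeline stages can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this work: it assumes the token classifier emits BIO tags, uses `EVENT` and `TIMEX3` as example labels, and pairs every two detected spans as candidates for the relation classifier. The function names `decode_spans` and `candidate_pairs` are hypothetical.

```python
from itertools import combinations


def decode_spans(tags):
    """Convert BIO tags into (label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((label, start, i))
            start, label = None, None
        # A B- tag always opens a span; a stray I- tag opens one too (lenient decoding).
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def candidate_pairs(spans):
    """Pair up detected spans as inputs for a downstream relation classifier."""
    return list(combinations(spans, 2))


# Assumed example sentence and tags, for illustration only.
tokens = ["Patient", "was", "admitted", "on", "03/04/2012"]
tags = ["O", "O", "B-EVENT", "O", "B-TIMEX3"]
spans = decode_spans(tags)
pairs = candidate_pairs(spans)
```

Keeping the decoder separate from the pairing step mirrors the modular design described above: each stage can be developed and evaluated on its own.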
RESULTS - For event detection, the proposed token classifier obtains a Span F1 of 0.91. For
temporal expressions, our token classifier also achieves a Span F1 of 0.91. For temporal
relations, we propose a sentence classifier based on sequential taggers that achieves an
F1 measure of 0.29.
DESCRIPTION - We propose a solution that takes a cross-lingual approach to the i2b2 2012
challenge in the domain of clinical narratives. The challenge aims to predict the temporal
relations between the events reflected in medical reports. To that end, this work starts
from a corpus annotated with (1) clinically significant events, for example clinical
concepts, tests, treatments, and clinical departments; (2) temporal expressions, for
example expressions that indicate a date, time, duration, or frequency in the report;
and (3) the relations between clinical events and temporal expressions.
GOALS - The goals of this work are to improve on the state of the art for i2b2 2012 and to
adapt the cross-lingual model to a specific clinical domain with scarce data resources.
METHODS - The work was conceived as a pipeline of different modules: sequence taggers for
events and temporal expressions, and a sentence classifier for temporal relations, all
developed independently. The XLM-RoBERTa cross-lingual model was used in this work.
RESULTS - For event detection, we propose a sequence tagger that achieves a Span F1 of
0.91. For temporal expressions, we propose a sequence tagger that achieves a Span F1 of
0.91. For temporal relations, we propose a sentence classifier based on sequence taggers
that achieves an F1 measure of 0.29.
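To make the Span F1 metric reported above concrete, here is a minimal sketch of span-level F1. It assumes exact matching of `(label, start, end)` spans; the abstract does not say whether exact or overlapping matches were counted, so the matching criterion and the function name `span_f1` are assumptions for illustration.

```python
def span_f1(gold, predicted):
    """Exact-match span F1 over collections of (label, start, end) tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0  # also covers empty gold or empty predictions
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Under exact matching, a predicted span whose boundary is off by a single token counts as both a false positive and a false negative, which makes the metric stricter than token-level accuracy.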