Interpretable deep learning to map diagnostic texts to ICD-10 codes

Atutxa Salazar, Aitziber; Díaz de Ilarraza Sánchez, Arantza; Gojenola Galletebeitia, Koldobika; Oronoz Anchordoqui, Maite; Pérez de Viñaspre Garralda, Olatz

Date

2019-05-22


International Journal of Medical Informatics 129 : 49-59 (2019)

URI

http://hdl.handle.net/10810/70252

Abstract

Background: Automatic extraction of the morbid diseases or conditions recorded in death certificates is a critical process, useful for billing, epidemiological studies and comparison across countries. Because these clinical documents are written in ordinary natural language, automatic coding is difficult: spontaneous terms often diverge strongly from standard reference terminologies such as the International Classification of Diseases (ICD).

Objective: Our aim is to propose a general, multilingual approach to render Diagnostic Terms (DTs) into the standard framework provided by the ICD. We evaluated our proposal on a set of clinical texts written in French, Hungarian and Italian.

Methods: ICD-10 encoding is a multi-class classification problem with a very large number of classes (in the thousands). After considering several approaches, we tackled our objective as a sequence-to-sequence task. In line with current trends, we opted to use neural networks. We tested different types of neural architectures on three datasets in which DTs are paired with their ICD-10 codes.

Results and conclusions: Our results set a new state of the art in multilingual ICD-10 coding, outperforming several alternative approaches and showing the feasibility of automatic ICD-10 prediction, with F-measures of 0.838, 0.963 and 0.952 for French, Hungarian and Italian, respectively. Additionally, the results are interpretable: the model can show the alignments between the original text and each output code, providing experts with supporting evidence when confronted with coding decisions.
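As a reading aid for the F-measure figures above, the following is a minimal sketch of how such a score could be computed over predicted code sets. It assumes micro-averaging across Diagnostic Terms, which the abstract does not specify; the function name `micro_prf` and the dictionary layout are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple


def micro_prf(
    gold: Dict[str, Set[str]],
    pred: Dict[str, Set[str]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-measure over sets of codes.

    `gold` and `pred` map a Diagnostic Term identifier to the set of
    ICD-10 codes assigned to it (assumed layout, for illustration only).
    """
    tp = fp = fn = 0
    # Visit every identifier seen in either mapping, so that spurious
    # predictions and completely missed terms are both counted.
    for term_id in gold.keys() | pred.keys():
        gold_codes = gold.get(term_id, set())
        pred_codes = pred.get(term_id, set())
        tp += len(gold_codes & pred_codes)   # codes predicted correctly
        fp += len(pred_codes - gold_codes)   # codes predicted but not in gold
        fn += len(gold_codes - pred_codes)   # gold codes that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

Under this assumption, a term whose gold codes are {J189, A419} and whose predicted codes are {J189, I10} contributes one true positive, one false positive and one false negative to the totals.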

Collections

Articles