Information extraction from verbal autopsies
Abstract
Civil registration and vital statistics systems register births and deaths and compile statistics from them. These statistics are key to shaping public health policies and to tracking the longevity and health of the population. Death certificates issued in health institutions are the main source of information on the cause of death (CoD). Such counts are not straightforward, however: an estimated 65% of deaths worldwide remain uncounted [D’Ambruoso, 2013]. For places without access to health facilities and, hence, to death certificates, the World Health Organization (WHO) designed the Verbal Autopsy as an instrument to collect evidence about the CoD.
A Verbal Autopsy (VA) consists of an interview with a relative or caregiver of the deceased. The VA comprises both an open response (OR) and closed questions (CQs). The OR is a free narrative of the events, expressed in natural language and without any predetermined structure. The CQs, by contrast, are a set of a few hundred controlled questions, each with a small number of permitted answers (e.g. yes/no).
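To make the two facets of a VA concrete, the following is a minimal sketch of how one record might be represented. The class name, the question identifiers (`fever`, `cough`, `rash`) and the numeric encoding are hypothetical illustrations, not part of the WHO instrument.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VerbalAutopsy:
    """One VA interview: a free-text narrative plus closed-question answers."""

    open_response: str  # OR: unstructured narrative in natural language
    closed_questions: Dict[str, str] = field(default_factory=dict)  # CQ id -> answer

    def cq_vector(self, question_ids: List[str]) -> List[int]:
        """Encode CQ answers as 1 (yes), 0 (no) or -1 (missing or any other answer)."""
        encoding = {"yes": 1, "no": 0}
        return [
            encoding.get(self.closed_questions.get(q, ""), -1)
            for q in question_ids
        ]


# Hypothetical record; the question ids are invented for illustration.
va = VerbalAutopsy(
    open_response="He had a high fever for three days and then stopped breathing.",
    closed_questions={"fever": "yes", "cough": "no"},
)
vector = va.cq_vector(["fever", "cough", "rash"])
```

In this sketch the CQs map naturally to a fixed-length vector, whereas the OR remains raw text that needs language processing before a classifier can use it.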
InterVA is a suite of computer models included in the WHO 2016 instrument, which gathers several algorithms chosen by the WHO for the analysis of verbal autopsies. InterVA estimates the CoD based solely on the CQs, disregarding the OR. We hypothesize that the text of the OR conveys relevant information for discerning the CoD and, accordingly, that InterVA could benefit from Natural Language Processing approaches. Empirical results corroborate that the CoD prediction capability of InterVA is outperformed when the information conveyed by the OR is taken into account. The experimental layout first compares InterVA with other approaches well suited to structured inputs such as the CQs. Next, alternative approaches based on language models are employed to analyze the OR. Finally, the best approaches for each facet (CQs and OR) are combined into a multi-modal approach.
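The abstract does not specify how the two facets are combined. As one possible illustration, the sketch below assumes a late-fusion scheme: a CQ-based model and an OR-based model each output a probability distribution over causes, and the two are merged by a weighted average. The function name, the `weight` parameter and the example probabilities are all hypothetical.

```python
from typing import Dict


def late_fusion(p_cq: Dict[str, float], p_or: Dict[str, float],
                weight: float = 0.5) -> Dict[str, float]:
    """Weighted average of two CoD distributions (assumed fusion rule).

    `weight` is the share given to the CQ-based model; the OR-based
    model receives `1 - weight`. The result is renormalised to sum to 1.
    """
    causes = set(p_cq) | set(p_or)
    fused = {
        c: weight * p_cq.get(c, 0.0) + (1.0 - weight) * p_or.get(c, 0.0)
        for c in causes
    }
    total = sum(fused.values())
    if total == 0:
        raise ValueError("both distributions are empty or all zero")
    return {c: v / total for c, v in fused.items()}


# Invented probabilities, for illustration only.
p_cq = {"pneumonia": 0.5, "malaria": 0.5}    # from the closed questions
p_or = {"pneumonia": 0.75, "malaria": 0.25}  # from the open response
fused = late_fusion(p_cq, p_or)
predicted = max(fused, key=fused.get)
```

Here the CQ-based model is undecided, and the OR-based distribution tips the fused prediction toward one cause, which is the kind of complementary signal the hypothesis above relies on.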