Explanatory argument extraction of correct answers in resident medical exams
Date
2024-11
Author
Goenaga Azcarate, Iakes
Atutxa Salazar, Aitziber
Oronoz Anchordoqui, Maite
Artificial Intelligence in Medicine 157 (2024), Article ID 102985
Abstract
Developing technology to assist medical experts in their everyday decision-making is currently a hot topic in the field of Artificial Intelligence (AI). This is especially true within the framework of Evidence-Based Medicine (EBM), where the aim is to facilitate the extraction of relevant information using natural language as a tool for mediating in human–AI interaction. In this context, AI techniques can be beneficial in finding arguments for past decisions in evolution notes or patient journeys, especially when different doctors are involved in a patient’s care. These documents report the decision-making process followed to treat the patient. Thus, applying Natural Language Processing (NLP) techniques has the potential to assist doctors in extracting arguments for a more comprehensive understanding of the decisions made. This work focuses on the explanatory argument identification step by setting up the task in a Question Answering (QA) scenario in which clinicians ask questions to the AI model to assist them in identifying those arguments. In order to explore the capabilities of current AI-based language models, we present a new dataset which, unlike previous work: (i) includes not only explanatory arguments for the correct hypothesis, but also arguments to reason on the incorrectness of other hypotheses; (ii) provides explanations originally written in Spanish by doctors reasoning over cases from the Spanish Residency Medical Exams. Furthermore, this new benchmark allows us to set up a novel extractive task by identifying the explanation written by medical doctors that supports the correct answer within an argumentative text. An additional benefit of our approach lies in its ability to evaluate the extractive performance of language models using automatic metrics, which on the Antidote CasiMedicos dataset corresponds to a 74.47 F1 score. Comprehensive experimentation shows that our novel dataset and approach are effective in helping practitioners identify relevant evidence-based explanations for medical questions.
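To make the reported evaluation concrete, the sketch below shows how token-level F1 is typically computed for extractive QA (SQuAD-style matching between a predicted span and a gold span). This is a minimal illustration of the kind of automatic metric the abstract refers to, not the paper's actual evaluation code; the `token_f1` helper and the Spanish example strings are illustrative assumptions.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted span and a gold span,
    as commonly used to score extractive QA (SQuAD-style).
    Illustrative only; not the authors' implementation."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either side is empty, F1 is 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Count tokens shared between prediction and reference.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical usage: compare a model-extracted explanation span
# against the doctor-written gold explanation (Spanish text).
gold = "la clínica es compatible con una apendicitis aguda"
pred = "compatible con una apendicitis aguda"
print(f"F1 = {token_f1(pred, gold):.4f}")
```

Under this scheme, the 74.47 F1 reported for the Antidote CasiMedicos dataset would correspond to the average token overlap between the spans extracted by the model and the doctor-written explanations supporting the correct answer.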