Abstract
Healthcare systems are under pressure from an aging population, rising costs, and increasingly complex conditions and treatments. Although data are expected to play a bigger role in how doctors diagnose and prescribe treatments, doctors struggle to use them due to a lack of time and an abundance of structured and unstructured information. To address this challenge, we introduce MediCoSpace, a visual decision-support tool for more efficient doctor-patient consultations. The tool links patient reports to past and present diagnoses, diseases, drugs, and treatments, both for the current patient and for other patients in comparable situations. MediCoSpace uses textual medical data, deep-learning-supported text analysis, and concept spaces to facilitate a visual discovery process. The tool is evaluated by five medical doctors. The results show that MediCoSpace facilitates a promising, yet complex, way to discover unlikely relations and thus suggests a path toward interactive visual tools that provide physicians with more holistic diagnoses and personalized, dynamic treatments for patients.
1 Introduction
With an aging population and rising healthcare costs, the pressure on healthcare systems is increasing [19]. This pressure is especially felt by doctors and nurses. While not a universal cure, information systems promise to significantly help doctors with documentation, information retrieval, and decision support. One example is the electronic health record (EHR) system; EHRs contain medical narratives: textual notes about the patient's condition and progress, to which doctors mainly contribute and on which diagnoses heavily rely [33]. However, analyzing the information contained in these notes is hard because of the complexity of medical issues, the different formats of EHR systems, and the usability of the hospital systems displaying these notes. These systems display textual notes as lengthy lists of narrative text that require extensive scrolling, which leads to information overload and, consequently, narrative fragmentation [58]. These lengthy lists are not surprising: on average, individual notes contain 642 words [51], and patients have hundreds of them. Chronically ill patients often have the most notes, e.g., a patient with chronic kidney disease in the U.S. has on average 338 notes [49]. Physicians in the U.S. spend on average 5 [47] to 9 [59] minutes reviewing the patient information in the EHR per patient encounter (1.5 hours per day in total [59]). This differs per specialty, however: endocrinologists spend the most time reviewing EHRs (33% more) and cardiologists the least (44% less) [59].
Diagnosing patients with non-trivial conditions is an especially tedious task, which can take up to 4.8 years [16] and generate many notes. It is not surprising that physicians misdiagnose approximately 5% [18] to 15% [45] of their patients, ranging from 5% in radiology to 12% in emergency medicine [45]. These errors can have serious consequences for the patient's health and treatment success, for medical costs (diagnostic testing accounts for approximately 10% of healthcare costs in the U.S. [45]), and for the doctor's time (0.1% of hospital visits and 0.4% of hospital admissions in the U.S. result from diagnostic-error-associated adverse events [45]).
Visual analytics tools could help provide a more holistic overview of the patient, especially for the doctor's decision-making process when diagnosing and devising treatment plans. For example, Sultanum et al. [56] redesign the structure of EHR notes, linking notes with similar medical concepts together, to surface the patient narrative. However, there are limits to how much information such tools can assess in relation, i.e., the relationships between diseases (including diagnoses and symptoms), drugs, and treatments. Likewise, we observe a lack of tool support for linking potentially discovered relationships back to textual notes and for proposing interesting parts of the patient history in comparison to similar patients.
Our contributions are threefold: (1) Insights into decision support based on EHRs and the problem characteristics of analyzing EHRs for diagnosing patients, derived from interviews with doctors. (2) A novel visual analytics decision-support tool, MediCoSpace, for augmenting doctor-patient consultations, giving hospital doctors and general practitioners a data-driven overview of possible relations between diseases, drugs, and treatments, both historical and present, and in relation to similar patients. (3) Results from expert user evaluations that show how MediCoSpace could broaden the doctors' solution space, offer new areas of interest, reduce personal biases, and stimulate communication between different medical specialties. In the following sections, we introduce related work leading to the problem characterization and requirements. We then explain the data processing pipelines and describe the visualization features of the tool, which are evaluated by medical experts. Finally, we discuss the findings and conclude with future research opportunities.
2 Related Work
This section focuses on previous work on the processing and analysis of EHRs.
2.1 Text Processing and Semantic Concept Extraction from EHRs
Since EHR notes consist of free text, text analysis methods, including approaches that incorporate knowledge graphs or apply language models, can be used to extract the essential concept information. Linking data to an ontology is a common practice. Li et al. [39] present a design framework for behavioral ontology learning from text; the framework describes linguistic and statistical approaches to tasks such as variable extraction and synonymous-relationship extraction. Knowledge graphs [5, 31, 52] provide insights into (hierarchical) relations and into the structure of medical concepts in relation to medical ontology knowledge. For instance, Li et al. [40] present a visual analytics approach that links medical event sequences to a subgraph of a medical knowledge graph using a domain-knowledge-guided recurrent neural network (DG-RNN) model. Such approaches are effective, yet limited to the information stored in the particular knowledge graph in use.
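To make the ontology-linking practice concrete, the minimal sketch below links free-text entity mentions to UMLS concepts. It assumes the open-source scispaCy library with its "en_core_sci_sm" model and UMLS linker; it illustrates the general technique, not the specific pipeline of any cited work, and pipe and attribute names may differ across library versions.

```python
# Minimal sketch: linking free-text medical mentions to UMLS concepts,
# assuming scispaCy (https://github.com/allenai/scispacy) and spaCy v3.
import spacy
from scispacy.linking import EntityLinker  # registers the "scispacy_linker" pipe

nlp = spacy.load("en_core_sci_sm")
nlp.add_pipe("scispacy_linker",
             config={"resolve_abbreviations": True, "linker_name": "umls"})

doc = nlp("Patient presents with atrial fibrillation and shortness of breath.")
linker = nlp.get_pipe("scispacy_linker")

for ent in doc.ents:
    # Each entity carries candidate UMLS concepts as (CUI, score) pairs.
    for cui, score in ent._.kb_ents[:1]:  # keep only the top-ranked candidate
        concept = linker.kb.cui_to_entity[cui]
        print(f"{ent.text!r} -> {cui} ({concept.canonical_name}), score {score:.2f}")
```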
Deep-learning-based language models (e.g., BERT [11]) have reached high performance in diverse natural language processing tasks. These models are pre-trained on large corpora, learning language structures in an unsupervised manner. Furthermore, domain- or task-specific fine-tuning, i.e., adapting the pre-trained weights to the language characteristics of a specific domain (also known as additional pre-training) or to a downstream task [12], is commonly used in the medical domain. There are several medical domain-specific adaptations of BERT, such as BioBERT [38] and PubMedBERT [23] (both additionally pre-trained on large-scale biomedical corpora), as well as ClinicalBERT [29] and clinical-kb-bert [25] (both additionally pre-trained on the MIMIC-III [35] dataset to capture patient-record-related information; clinical-kb-bert is further pre-trained on UMLS [7] ontology knowledge). Neural language models can be used for different analysis purposes and downstream tasks. First, we can fine-tune them for named entity recognition. For instance, Sun et al. [57] fine-tune BioBERT on a machine reading comprehension task that allows it to predict occurrences of named entities (chemicals, diseases, and proteins). Second, we can use them to generate contextualized embedding representations (e.g., at the word, sentence, or even document level). To assess named-entity similarity, we can thus use a medical domain-adapted language model to compute the embedding representations of two entities and apply a similarity function to them. Since this approach is very general and not restricted to specific named-entity categories, we apply it in our work. Also, Loureiro et al. [41] use a language model for a medical entity linking task on the MedMentions [44] dataset, whereby embedding similarity is one step (in addition to entity classification) in their processing pipeline to link entities to an ontology.
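The sketch below illustrates this embedding-similarity approach: it embeds two medical entity mentions with a clinical BERT model and compares them with cosine similarity. It assumes the Hugging Face transformers library and the publicly available emilyalsentzer/Bio_ClinicalBERT checkpoint; the concrete model and pooling strategy used in our pipeline may differ.

```python
# Sketch of named-entity similarity via contextualized embeddings, assuming
# the `transformers` library and the Bio_ClinicalBERT checkpoint.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model.eval()

def embed(entity: str) -> torch.Tensor:
    """Mean-pool the last hidden states into a single vector per entity mention."""
    inputs = tokenizer(entity, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape: (1, tokens, 768)
    return hidden.mean(dim=1).squeeze(0)

# Works for any entity category (disease, drug, treatment, ...).
a, b = embed("myocardial infarction"), embed("heart attack")
print(f"cosine similarity: {torch.cosine_similarity(a, b, dim=0).item():.3f}")
```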
2.2 Physician-centric Visual Analysis of EHRs
Doctors use information systems to access and extend EHRs. Currently, Epic [14] is one of the most common commercial EHR systems, which, according to doctors, still suffers from problems, see the top of Figure 1. In the research community, interactive EHR visualizations most often use bar/line/pie charts, glyphs, and timelines [60]. For example, LifeLines was one of the first tools to visualize textual notes from EHRs as events on a timeline [17]. Researchers have since used this timeline structure abundantly to visualize the EHRs of individual patients [6, 10, 24, 26, 28, 42, 56, 58], for example, to display cause and effect [48] or to predict disease progression [50] or risk [40]. Also, Sultanum et al. [56] researched the importance of visualizing text for assisting doctors. Moreover, van der Linden et al. [58] visualized EHRs in a multiscale way to find the fragmented narratives associated with the doctor's different tasks.
Fig. 1. (Figure placeholder: doctors' reported problems with current EHR systems, top, and the general physician workflow for diagnosis and treatment planning.)
Furthermore, stepping away from individual patients, many researchers have visualized patient cohorts with flow visualizations [21, 22, 30, 34, 36, 64, 65] for disease progression; these, however, have limitations in identifying relations. Therefore, Jin et al. [34] visualized causal relations between medical events for two patient groups. Furthermore, many researchers have used text [20] or basic plots (e.g., line plots) [15] to display summary statistics, and heatmaps [30] to visualize the diagnosis process in terms of medical concepts. For example, Hur et al. [30] focused on diagnosis prediction, using different heatmaps (one for the entire cohort, one for the patient, and one showing the difference between them) to display the weights of the medical concepts used in their model.
While these tools make important steps, diagnosing non-trivial conditions and creating non-trivial treatment plans remain more difficult than trivial ones. To our knowledge, no medical decision-support system addresses the relationships between diseases, drugs, and treatments combined with advanced search support within the patient's history and across similar patients, and links these back to the patient reports to discover and leverage possibly overlooked relations.
3 Problem Characterization
In this section, we describe the first steps of our user-centered design (UCD) process [63].
3.1 Physician’s Workflow
By interviewing a cardiologist (D1), a general practitioner (D2), a medical student (D3), and two medical doctors in internal medicine (D4) and cardiology (D5) about their workflows, and comparing them to the processes from Balogh et al. [4] and Adler–Milstein et al. [1], we identified the following general workflow for diagnosing and making treatment plans for non-trivial conditions, applicable to doctors of all specialties and experience levels, see Figure 1. The interviewed doctors accordingly covered different specialties and experience levels. First, the doctor looks up the patient appointment details. At this point, the patient has already gone through the "experiencing health problems" and "engaging with the healthcare system" stages from Balogh et al. [4]. Second, the doctor reviews the EHR for the medical history and the current disease(s) as preparation (related to the information integration and interpretation stage [1, 4]). Third, the doctor speaks with the patient and might conduct physical tests (related to the information gathering stage [1, 4]). Fourth, the doctor reviews the EHR in more detail to find previous and present diagnoses, issues, and physiology, and how the patient appears to progress. Based on this, the next steps (related to the formulation of next steps [1]) can be conversations with colleagues and possibly diagnostic testing (related to the information gathering stage [1, 4]). The doctor also matches the symptoms to the most probable diseases to form a working diagnosis (related to the working/leading diagnosis stage [1, 4]), after which they research the best treatment option online and communicate it to the patient. This is often an iterative process with multiple cycles, depending on intermediate outcomes. These final steps also correspond to the final stages of Balogh et al.'s [4] and Adler–Milstein et al.'s [1] processes. We noticed that the amount of information doctors require from the EHR differs per specialty. For example, internal medicine requires a deep dive into the EHR because patients often have vague symptoms, while a cardiologist often needs less information because medical imaging often indicates the main problems directly.
Doctors also indicated that it is hard to find the correct disease based on ambiguous symptoms. The occurrence frequency of a disease needs to be taken into account, as does the patient's lifestyle context, and it is sometimes hard to mentally let go of an initial diagnosis. Visual analytics can assist the doctor (in stages two and four of our workflow) in finding relations between symptoms, diseases, treatments, and drugs, providing a more holistic overview of the patient that guides information gathering, the formation of working diagnoses, and the selection of possible treatments.
3.2 Tool Requirements
In designing MediCoSpace, we focus on the fourth workflow step, the in-depth EHR review. From this, we derive the following requirements based on a thematic analysis [8] of the interviews:
R1:
Ability to see relations between diseases, drugs, and treatments, past and present, for the current patient.
R2:
Ability to see when certain concepts or co-occurrences are mentioned.
R3:
Ability to find similar diseases, drugs, or treatments based on a current disease, drug, or treatment.
R4:
Ability to compare the patient’s relations to similar patients to see similarities and differences.
R5:
Ability to link the relations back to the original textual notes of this patient.
R6:
Ability to save interesting findings.
4 Data Processing and Feature Extraction Pipeline
This section describes our data sources and processing pipelines, see Figure 2, used to show relations between disease, drug, and treatment concepts (i.e., medical entities). In the remainder of this article, we use the following mini case: the task is to generate a diagnosis for a patient with non-straightforward conditions (mainly cardiovascular diagnoses and a vague current diagnosis: weakness). We want to see whether MediCoSpace can help the doctor reach a clearer diagnosis. We picked a patient with mainly cardiovascular diagnoses because of the time restrictions of the evaluation (cardiologists require less in-depth analysis).
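As a simplified illustration of the concept-extraction step in such a pipeline, the sketch below runs a token-classification (NER) model over a free-text note to pull out candidate disease, drug, and treatment mentions. The model identifier your-org/medical-ner-model is a hypothetical placeholder for a fine-tuned medical NER checkpoint; it does not name our actual model, and the label set depends on the checkpoint used.

```python
# Hedged sketch: extracting medical entities (diseases, drugs, treatments)
# from a free-text note with a Hugging Face token-classification pipeline.
# "your-org/medical-ner-model" is a hypothetical placeholder checkpoint.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="your-org/medical-ner-model",  # hypothetical fine-tuned NER model
    aggregation_strategy="simple",       # merge word pieces into entity spans
)

note = (
    "Patient reports generalized weakness. History of atrial fibrillation, "
    "currently on apixaban; catheter ablation discussed as a treatment option."
)

for ent in ner(note):
    # Each hit carries an entity label, the matched text span, and a score.
    print(f'{ent["entity_group"]:>12}  {ent["word"]!r}  ({ent["score"]:.2f})')
```

The extracted spans can then be embedded and linked to ontology concepts as described in Section 2.1, yielding the disease, drug, and treatment entities that MediCoSpace relates and visualizes.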