On its Mission to Capture Unstructured EHR Data, Mercy Leaders Realize the Value of NLP

One of the core issues that clinicians and others have had with EHRs (electronic health records) is that so much of the medical information that’s vital to improving patient care is “trapped” within EHR physician notes. As such, patient care organizations are increasingly turning to natural language processing (NLP), a technology that allows providers to gather and analyze unstructured data, such as free-text notes.

One health system that has worked to leverage NLP to extract and transform data collected during routine clinical care is the St. Louis-based Mercy, one of the country’s largest health systems, which includes more than 40 acute care and specialty hospitals and 800 physician practices and outpatient facilities. In Mercy’s submission for the 2018 Healthcare Informatics Innovator Awards Program, organizational leaders outlined the key details behind the health system’s project, called “Using NLP on EHR notes in Heart Failure Patients.” The submission ended up receiving semifinalist status in this year’s program.

The overarching goal of this initiative, says Kerry Bommarito, manager of data science at Mercy, was to use NLP to extract key cardiology measures from physician and other clinical notes, and then incorporate the results into a dataset with discrete data fields. The dataset would then be used to obtain actionable information and contribute to the evaluation of outcomes of medical devices in heart failure patients, a population of approximately 100,000 patients in the Mercy system going back to 2011.

Bommarito explains that three core measures that are commonly stored in clinical notes and not available in discrete fields are ejection fraction measurement; patient symptoms, including dyspnea (breathing difficulty), fatigue and dizziness; and the New York Heart Association (NYHA) heart failure classification, which places patients in one of four categories based on how limited they are during physical activity.

“To be able to best classify how severe the CHF [congestive heart failure] was, we really needed to get these measures out of the physician notes,” Bommarito attests, adding that since heart failure is a chronic and progressive syndrome, changes in these three measures are important indicators of heart failure decompensation.

Joseph Drozda, M.D., cardiologist, director of outcomes research at Mercy, says that “perhaps 60 percent of the data you would really like is available to you as discrete data in the EHR. The remaining 40 percent is contained in text and clinical notes, and in order to get the meaningful data out, you have to [use] something like NLP to capture it.”

Indeed, Dr. Drozda believes the issue stems from most EHRs being originally developed as billing systems that were designed to capture data necessary for populating a claim form and submitting a bill. “They weren’t really designed for clinical care; that came afterwards,” he says. “And in a lot of ways, in the early stages, the EHR systems were putting the paper records in electronic format without any regard to trying to use the data on the back end. It was pretty much all text. I think it was a basic design flaw in most EHR systems right from the beginning, though we are starting to overcome it.”

Nonetheless, Drozda notes that this challenge is still difficult to overcome. To capture something like ejection fraction (a measurement of how well the heart is pumping out blood, which helps to diagnose and track heart failure) as discrete data, a clinician has to look for a dropdown menu or find someplace to enter a value, which takes extra time. “So there are two things working against you: the basic underlying technology challenges and the workflow challenges that clinicians face in entering discrete data. We have to be very careful in how much discrete data [we make] clinicians enter, as they are already concerned with deaths by 1,000 clicks,” he says.

As such, for Mercy’s heart failure patient population, project leaders brought in all of the notes that they had at the time, totaling about 34 million going back seven years. NLP queries were developed by a team of Mercy data scientists to search for relevant linguistic patterns, and then the queries were evaluated for both precision and recall. When the queries were determined to have a high accuracy, the results were integrated into a comprehensive data schema that contains real-world clinical data for each heart failure patient from before the diagnosis of CHF to their current state, Bommarito explains.
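To make the idea of searching notes for "relevant linguistic patterns" concrete, here is a minimal sketch in Python using plain regular expressions. It is not Mercy's or Linguamatics' actual query logic, which the article does not describe; the patterns, the function name `extract_measures`, and the sample note are all hypothetical, and real clinical text varies far more than these patterns allow.

```python
import re

# Hypothetical pattern for an ejection fraction value or range,
# e.g. "EF 35%", "LVEF is 55 %", "ejection fraction of 30-35%".
EF_PATTERN = re.compile(
    r"\b(?:ejection fraction|LVEF|EF)\b\D{0,20}?"
    r"(\d{1,2})(?:\s*(?:-|to)\s*(\d{1,2}))?\s*%",
    re.IGNORECASE,
)

# Hypothetical pattern for an NYHA class written as a Roman or Arabic numeral,
# e.g. "NYHA class III", "NYHA functional class 2".
NYHA_PATTERN = re.compile(r"\b(?i:NYHA)\b\D{0,30}?\b(IV|III|II|I|[1-4])\b")

ROMAN_TO_INT = {"I": 1, "II": 2, "III": 3, "IV": 4}


def extract_measures(note):
    """Return the first EF range (low, high) and NYHA class found in a note."""
    result = {"ef_percent": None, "nyha_class": None}

    ef_match = EF_PATTERN.search(note)
    if ef_match:
        low = int(ef_match.group(1))
        # A single value such as "35%" is reported as the range (35, 35).
        high = int(ef_match.group(2)) if ef_match.group(2) else low
        result["ef_percent"] = (low, high)

    nyha_match = NYHA_PATTERN.search(note)
    if nyha_match:
        token = nyha_match.group(1)
        result["nyha_class"] = ROMAN_TO_INT.get(token) or int(token)

    return result


# Made-up note text, for illustration only.
sample_note = (
    "Pt reports dyspnea on exertion. "
    "Echo shows ejection fraction of 30-35%. NYHA class III."
)
print(extract_measures(sample_note))
```

The extracted values land in discrete fields (a dictionary here), which is the same basic move the project describes: turning free text into structured data that can be queried. A production system would also need to handle negation, dates, and conflicting values across notes, none of which this sketch attempts.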

The final queries were validated and had a high accuracy, with an F-measure (a measure of a test’s accuracy) score above 0.90 (1 is the highest F-measure value possible). “This F-measure score shows that Mercy’s queries were highly precise (positive predictive value) and had high recall (also known as the true positive rate or sensitivity),” says Bommarito. What’s more, “These results show that natural language processing is a reliable and accurate method to extract relevant data from clinical notes in a CHF population,” she concluded.
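For readers unfamiliar with how precision, recall and the F-measure relate, the short Python sketch below computes all three from counts of true positives, false positives and false negatives. It assumes the balanced F1 form (the harmonic mean of precision and recall), since the article does not say which variant of the F-measure Mercy used. The counts are invented for illustration and are not Mercy's results.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall and balanced F1 from raw counts."""
    predicted = true_pos + false_pos   # everything the query flagged
    actual = true_pos + false_neg      # everything it should have flagged

    # Precision (positive predictive value): share of flagged items that were correct.
    precision = true_pos / predicted if predicted else 0.0
    # Recall (sensitivity): share of real items that the query found.
    recall = true_pos / actual if actual else 0.0

    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented counts: 90 correct extractions, 10 spurious, 10 missed.
p, r, f1 = precision_recall_f1(true_pos=90, false_pos=10, false_neg=10)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because the harmonic mean is pulled toward the lower of the two inputs, a score above 0.90 is only reachable when precision and recall are both high, which is the point Bommarito makes.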

Mercy leaders further note that capturing this data will help them answer a key clinical question: how does cardiac resynchronization therapy (CRT) affect patients who have heart failure? As they explain, CRT is a way of pacing the heart so that it pumps more efficiently. And it is used in patients not just with heart failure, but with a specific type: heart failure with reduced ejection fraction. “So when we are looking at how CRT affects the natural history of patients with heart failure with reduced ejection fraction, we need to have the ejection fraction and the NYHA functional classification that tells us how the patient is doing symptomatically,” Drozda explains.

At that point, by using NLP, Mercy can capture its entire population of heart failure patients, both those with reduced ejection fraction and those with preserved (not reduced) ejection fraction. “And we can look at these patients long before they receive CRT; we are going back three years in history to get started, so we can see how the patient did for the year or two before the CRT device was put in, and see how he or she did after it was put in, and then compare those patients with CRT devices with those who didn’t [have them]. So it’s a great opportunity to look at the impact of a high-end technology on a very sick population of patients with heart failure. And without NLP, we couldn’t do this,” Drozda contends.

What’s more, the NLP software that Mercy leveraged, from U.K.-headquartered Linguamatics, included a library of terms that was a great starting point for the project’s team, Bommarito says. She offers an example: when Mercy was looking for a shortness of breath symptom, the library had all of the different medical ways that a clinician might state “shortness of breath.” As she puts it, “So we didn’t have to sit there and think of ways that the doctor might say ‘shortness of breath.’ And that was really helpful.”

Adds Drozda, “NLP has come a long way. I have been a skeptic based on others’ experiences, but this is my first time involved and the results [we have gotten] have been tremendous. Those F-measure scores are amazing to me; it’s restored my faith in NLP’s ability to get us out of this data capturing conundrum.”