Value of unstructured patient narratives

July 2, 2010

Jeffrey Barry

Cautionary tales of throwing the patient out with the paper — in technical terms, failing to fully utilize unstructured clinicians' notes in the EHR — are surfacing everywhere. In her April 22 New York Times commentary, Pauline Chen, MD, discussed the importance of the patient narrative, and the challenges of replicating nuances of care in current EHRs. A month earlier, Gordon Schiff, MD, and David W. Bates, MD, wrote in The New England Journal of Medicine that “free-text narrative will often be superior to point-and-click boilerplate in accurately capturing a patient's history.”

Thought-critical, free-text physicians' notes are under threat. Current EHRs capture most information — patient demographics, medications and problem lists — as structured data, and often codify the details to support billing instead of clinical activities. The frequent use of the word “structured” in the definition for meaningful use released by the Centers for Medicare and Medicaid Services (CMS) may further encourage and compound this trend.

Doctors may be vocalizing the issue, but public health researchers also stand to gain from a richer electronic patient narrative. The ability to access and mine robust databases of patient information would enable public health researchers to more effectively perform nuanced, descriptive research. In turn, advanced technology to capture and report clinical documentation may better meet meaningful-use requirements for providing electronic syndromic surveillance data, immunization registries, and reportable lab results to public health agencies. Academic and research hospitals that unlock the unstructured data's enormous potential could also use it to attract both quality investigators and increased funding.

Providing color and context
The unstructured free text of the physician's progress notes provides color to the structured data's black and white. Notes contain the doctor's comments following a patient visit, along with helpful reminders, patient history, intake, examination and discharge information. The information is also essential for physicians who need to communicate about a shared patient. A recent study by Nuance Communications found that 94 percent of physicians felt that it was important to include doctors' notes in the patient record.

Notes also show great potential to meet CMS measures for disease surveillance, as well as for adverse event reporting and nuanced public health research.

Syndromic surveillance of epidemics, cancer clusters and even bioterrorism from unstructured data in EHRs could help target resources to slow disease spread. In 2008, Jeff Friedlin and colleagues at Regenstrief Institute and Indiana University School of Medicine used a data-mining tool with natural language processing (NLP) technology, a program that seeks to understand the context and connotation of text, to scour electronic free-text culture test reports and detect incidence of methicillin-resistant Staphylococcus aureus (MRSA). The tool achieved sensitivity, specificity and positive predictive values exceeding 99 percent. If this technology could be applied to other forms of unstructured data and other diseases, it could be harnessed to alert the local health department to possible outbreaks significantly faster.
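For readers less familiar with the three measures cited above, here is a minimal sketch of how each is computed once a tool's output has been compared against manually reviewed reports. The function name and the counts are hypothetical, chosen only for illustration; they are not the figures from the Regenstrief study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, positive predictive value).

    tp: reports the tool flagged that truly were cases
    fp: reports the tool flagged that were not cases
    tn: reports the tool left unflagged that were not cases
    fn: true cases the tool missed
    """
    sensitivity = tp / (tp + fn)  # share of true cases the tool caught
    specificity = tn / (tn + fp)  # share of non-cases the tool left alone
    ppv = tp / (tp + fp)          # share of flagged reports that were real
    return sensitivity, specificity, ppv


# Hypothetical counts for illustration only.
sens, spec, ppv = screening_metrics(tp=995, fp=4, tn=8996, fn=5)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} ppv={ppv:.3f}")
```

All three figures matter for surveillance: low sensitivity means missed cases, while low positive predictive value means a health department spends time chasing false alarms.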

Additionally, relying on caregivers to report adverse events can lead to an underestimation of their frequency by a factor of around 20, according to another study by Bates and colleagues. Manually reviewing charts is generally effective but prohibitively expensive. NLP tools can streamline this review, which could greatly improve post-market surveillance for new drugs and medical devices.

According to Dr. James G. Jollis and his colleagues from the Duke University Medical Center, electronic clinical documentation also may be a more trustworthy source for public health research. The diagnostic codes used for insurance and billing — sometimes the only data that some of today's EHRs will spit out — can be unreliable for research and highly variable. The physicians' notes provide context for the quantitative data found elsewhere in the health record.

Mining unstructured data
No doubt, incorporating physicians' notes into the EHR and then into public-health research proves challenging — but it is possible. The first requirement is a robust electronic clinical documentation system that allows clinicians to capture their experiences treating patients. Next, researchers Scott Spangler and Jeffrey Kreulen at IBM describe a three-step process for mining that unstructured data based on exploring, understanding and analyzing. The process starts by exploring the data to find relevant information, such as by using a keyword search or by selecting particular structured fields to limit the amount of data that ultimately needs to be parsed. This alone, though, lacks the ability to understand context, such as the difference between confirming and negating a potential diagnosis.
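The exploring step, and its blind spot, can be shown with a small sketch. The notes, field names and the explore function below are all invented for illustration; they do not come from Spangler and Kreulen's work or from any particular EHR.

```python
# Hypothetical notes; "dept" stands in for a structured field, "text" for free text.
notes = [
    {"id": 1, "dept": "ED", "text": "Wound culture positive for MRSA."},
    {"id": 2, "dept": "ED", "text": "No evidence of MRSA on nasal swab."},
    {"id": 3, "dept": "Cardiology", "text": "Patient reports chest pain on exertion."},
]


def explore(notes, keyword, dept=None):
    """Keep notes that mention the keyword, optionally limited to one department."""
    keyword = keyword.lower()
    return [
        n for n in notes
        if keyword in n["text"].lower() and (dept is None or n["dept"] == dept)
    ]


hits = explore(notes, "mrsa", dept="ED")
print([n["id"] for n in hits])  # prints [1, 2]
```

Both emergency department notes come back, even though note 2 rules MRSA out. The search narrows the pile but cannot tell a confirmed diagnosis from a negated one.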

The next phase addresses this by understanding the selected information to create an analyzable structure. This could include NLP tools, which rely on various methods such as pattern matching or rule-based techniques. Some early adopters such as New York-Presbyterian Hospital-Columbia University Medical Center and the National Cancer Institute already have this in place. Spangler and Kreulen also suggest taxonomic methods, partitioning and clustering. Once this process is complete, the data can be analyzed for trends and correlations relevant to the research.
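To make the understanding and analyzing steps concrete, here is a deliberately simple rule-based sketch. The negation cues, the classify_mention function and the sample notes are assumptions made up for this example; production NLP systems use much larger rule sets or statistical models, and this is not the method used at any institution named above.

```python
import re
from collections import Counter

# Hypothetical negation cues; a real rule set would be far larger.
NEGATION = re.compile(r"\b(no|not|denies|negative for|without)\b", re.IGNORECASE)


def classify_mention(text, term):
    """Label a note 'affirmed', 'negated' or 'absent' for one term."""
    lowered = text.lower()
    pos = lowered.find(term.lower())
    if pos == -1:
        return "absent"
    # Only look at the same sentence, in the words leading up to the term.
    sentence_start = max(lowered.rfind(".", 0, pos), lowered.rfind(";", 0, pos)) + 1
    window = lowered[sentence_start:pos]
    return "negated" if NEGATION.search(window) else "affirmed"


# Hypothetical free-text notes.
texts = [
    "Wound culture positive for MRSA.",
    "No evidence of MRSA on nasal swab.",
    "Fever resolved. Culture negative for MRSA.",
    "Patient reports chest pain on exertion.",
]

# Understanding: turn free text into a structured label per note.
labels = [classify_mention(t, "MRSA") for t in texts]

# Analyzing: with structure in place, counting and trending become simple.
tally = Counter(labels)
print(tally["affirmed"], tally["negated"], tally["absent"])  # prints 1 2 1
```

Once every note carries a label like this, the free text can be counted, trended and joined to the structured fields already in the record.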

As EHRs become increasingly widespread due to the billions of dollars in federal stimulus incentives, harnessing unstructured clinicians' notes gives us the power to yield valuable patient data. With each year of data, more information will be gathered that could be used to find predictors for diseases or adverse effects of treatment that would otherwise have gone unnoticed by most traditional research studies. Though challenging, capturing and delving into this data will be worth the effort, and could potentially help healthcare institutions meet requirements for CMS reporting and for meaningful use, access funding and, most importantly, improve the health of entire populations.

Jeffrey Barry is research fellow, Healthcare Innovation and Technology Lab, and 2010 MPH Candidate at the Columbia University Mailman School of Public Health.

For more information on the Healthcare Innovation and Technology Lab:
