Researchers: AI-Generated Clinical Summaries Need Fine-Tuning

Feb. 6, 2024
LLMs summarizing clinical notes could introduce errors that affect clinicians' decisions

Clinical applications of generative artificial intelligence (AI) and large language models (LLMs) are advancing; LLM-generated summaries offer clear benefits and could replace many future interactions with electronic health records (EHRs). However, according to a team of researchers, LLMs that summarize clinical notes, medications, and other patient information operate without US Food and Drug Administration (FDA) oversight, which they see as a problem.

In a JAMA Network viewpoint article published online on Jan. 29, Katherine E. Goodman, JD, PhD, Paul H. Yi, MD, and Daniel J. Morgan, MD, MS, wrote, “Simpler clinical documentation tools…create LLM-generated summaries from audio-recorded patient encounters. More sophisticated decision-support LLMs are under development that can summarize patient information from across the electronic health record (EHR). For example, LLMs could summarize a patient’s recent visit notes and laboratory results to create an up-to-date clinical “snapshot” before an appointment.”

Without standards for LLM-generated summaries, there is a potential for patient harm, the article’s authors write. “Variations in summary length, organization, and tone could all nudge clinician interpretations and subsequent decisions either intentionally or unintentionally,” Goodman, Yi, and Morgan argued. Summaries vary because LLMs are probabilistic: there is no single correct answer as to which data to include or how to order it, and even slight variations between prompts can change the output. The authors give the example of a radiography report noting chills and a cough; the generated summary added the term “fever.” That single added word completes an illness script and could affect the clinician’s diagnosis and recommended course of treatment.
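The probabilistic-variation point can be made concrete with a minimal toy sketch (not from the JAMA article): a hand-built next-token distribution stands in for a summarizer's decoder, with token names and probabilities that are entirely hypothetical, chosen to echo the radiography example above. Sampled decoding over that distribution yields different outputs from the same prompt, including a plausible-but-unstated term like “fever.”

```python
# Toy sketch only: a hand-built next-token distribution standing in for an
# LLM's decoder. Token names and probabilities are hypothetical.
import random

# Hypothetical probabilities a summarizer might assign to candidate symptom
# tokens for a note that mentions only "chills" and "cough".
NEXT_TOKEN_PROBS = {
    "cough": 0.40,
    "chills": 0.35,
    "fever": 0.20,   # plausible, but absent from the source note
    "fatigue": 0.05,
}

def sample_tokens(k: int, seed: int) -> list[str]:
    """Draw k tokens, as stochastic (temperature > 0) decoding would."""
    rng = random.Random(seed)
    return rng.choices(
        population=list(NEXT_TOKEN_PROBS),
        weights=list(NEXT_TOKEN_PROBS.values()),
        k=k,
    )

# The "prompt" (the distribution) is identical, yet runs differ, and a run
# can surface "fever" -- a symptom the underlying note never recorded.
for seed in (1, 2):
    print(f"run {seed}:", sample_tokens(3, seed))
```

Even with deterministic decoding, the authors' underlying point stands: small changes in prompt wording shift these probabilities, so there is no single “correct” summary against which an output can be checked.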

The authors of the JAMA Network viewpoint write, “[F]DA final guidance for clinical decision support software…provides an unintentional “roadmap” for how LLMs could avoid FDA regulation. Even LLMs performing sophisticated summarization tasks would not clearly qualify as devices because they provide general language-based outputs rather than specific predictions or numeric estimates of disease. With careful implementation, we expect that many LLMs summarizing clinical data could meet device-exemption criteria.”

The article’s authors recommend regulatory clarifications by the FDA, comprehensive standards, and clinical testing of LLM-generated summaries.
