Researchers: LLM-Produced Discharge Summaries Comparable to MD-Produced Ones

May 5, 2025
A study published in JAMA Internal Medicine points to potential AI support for hospitalists

Can discharge summaries produced using large language models (LLMs) actually match the accuracy and patient safety of discharge summaries produced by human physicians? A team of researchers at UCSF Health in San Francisco set out to find out, in a study published online in JAMA Internal Medicine on May 5. The study was conducted by a team of 24 researchers, led by Christopher Y.K. Williams, MB BChir.

As the article’s authors note, “High-quality discharge summaries are associated with improved patient outcomes, but contribute to clinical documentation burden. Large language models (LLMs) provide an opportunity to support physicians by drafting discharge summary narratives.” With that in mind, they conducted a “cross-sectional study” at the University of California, San Francisco that included “100 randomly selected inpatient hospital medicine encounters of 3 to 6 days’ duration between 2019 and 2022.” Twenty-two attending physician reviewers performed a blinded evaluation of physician- and LLM-generated narratives, rating them for “overall quality, reviewer preference, comprehensiveness, concision, coherence, and 3 error types (inaccuracies, omissions, and hallucinations),” and assigning “potential harmfulness scores ranging from 0 to 7 on an adapted Agency for Healthcare Research and Quality scale.”

What did the researchers find? “Across 100 encounters, LLM- and physician-generated narratives were comparable in overall quality on a Likert scale ranging from 1 to 5 (higher scores indicate higher quality),” the authors report. “LLM-generated narratives were more concise and more coherent than their physician-generated counterparts, but less comprehensive.” That said, “There was no significant difference in the potential for harm between LLM- and physician-generated narratives across individual errors.”

What does all this mean? “This blinded cross-sectional study of 100 physician-generated versus LLM-generated discharge summary narratives for quality and safety was conducted as a first step toward assessing the role of LLMs in drafting these narratives in clinical practice,” the article’s authors write. “Overall, no differences were found in either quality or reviewer preference between physician- and LLM-generated narratives. These results suggest that neither physicians nor LLMs consistently write perfect narratives. Although LLM-generated narratives were more likely to contain errors (particularly more omissions and inaccuracies), physician narratives were just as likely to contain hallucination-type errors. This is notable given the well-documented propensity for LLMs to hallucinate. One possible contributing factor to this finding may be the availability of new information to the physician on writing the discharge summary, which was not previously documented in the notes. However, human fallibility in reconstructing historical events over the course of the encounter could play a role.”

In the end, the researchers conclude that “In this cross-sectional study of 100 inpatient hospital medicine encounters, there were no differences in the overall quality rating or reviewer preferences between LLM- and physician-generated narratives. LLM-generated narratives were more concise and more coherent than their physician-generated counterparts, but less comprehensive. LLM-generated narratives were more likely to contain errors, but had low overall potential for harm. These findings,” they write, “suggest that LLMs could be used to draft discharge summary narratives of comparable quality and safety to physicians for inpatient hospital encounters and that, in clinical practice, using such summaries after human review may provide a viable option for hospitalists.”

The full team of researchers is as follows: Christopher Y. K. Williams, MB BChir; Charumathi Raghu Subramanian, M.D.; Syed Salman Ali, M.D.; Michael Apolinario, M.D.; Elisabeth Askin, M.D.; Peter Barish, M.D.; Monica Cheng, M.D.; W. James Deardorff, M.D.; Nisha Donthi, M.D.; Smitha Ganeshan, M.D., MBA; Owen Huang, M.D.; Molly A. Kantor, M.D.; Andrew R. Lai, M.D., M.P.H.; Ashley Manchanda, D.O.; Kendra A. Moore, M.D., MBE; Anoop N. Muniyappa, M.D., M.S.; Geethu Nair, M.D.; Prashant P. Patel, D.O.; Lekshmi Santhosh, M.D., MAEd; Susan Schneider, M.D., MSPH; Shawn Torres, M.D.; Michi Yukawa, M.D., M.P.H.; Colin C. Hubbard, Ph.D.; Benjamin I. Rosner, M.D., Ph.D.
