It’s becoming clearer by the day: not only is artificial intelligence a very real phenomenon, it’s here to stay. These are still early days, but the leaders of forward-thinking, forward-acting hospitals, medical groups, and health systems aren’t wasting any time; they’re determining which problems to solve and which needs to address, they’re putting strong governance and project-prioritization processes into place, and they’re moving ahead to implement various types of AI: algorithmic, generative, and agentic.
This topic emerged strongly in a webinar that I moderated today (June 24), with speakers from Allina Health, Duke Health, and SoundHound, in a discussion of agentic AI’s potential to improve the patient experience. David Ingham, M.D., senior vice president and chief digital and information officer, represented Allina Health; Neil Deep Ray, M.D., chief innovation and technology officer, represented Duke Health; and Nikhil Mislankar, healthcare delivery director, represented SoundHound. Allina and Duke are client organizations of SoundHound, which has been developing agentic AI tools for the two health systems to reduce friction and enhance the patient experience.
It was clear from listening to Drs. Ingham and Ray that they and their colleagues have been thinking both very strategically and very pragmatically about how to deploy agentic AI to improve patients’ experiences navigating their health systems. And they made it clear that they were not and are not looking for any kind of magic “silver bullet” technology; both fully understand that the journey into AI is a long-term one, with no switch to flip and nothing “plug-and-play” about it; everything under consideration is being thought through carefully, tested, and measured upon implementation. But it is also clear that both physician-informaticist leaders recognize that intensifying staffing shortages and growing demands on clinicians’ time will make AI implementation a necessity, not a luxury.
Everywhere across the U.S. healthcare system, these realities are playing out in real time. For example, our Senior Contributing Editor, David Raths, this spring interviewed David Sontag, Ph.D., a professor at MIT and co-founder and CEO of Layer Health, a Boston-based startup that has developed a large language model (LLM)-powered data abstraction platform to extract clinical data from patient medical charts for data registries, clinical research, and care optimization. Then this month, Raths reported on the announcement by leaders at the Salt Lake City-based Intermountain Health that Intermountain’s “Clinical Data Management team will first work with Layer Health to validate the AI’s ability to achieve high accuracy prior to deployment, ensuring the AI meets clinical performance standards required to support real-world clinical registry reporting. Starting with registries in stroke, bariatric surgery and cardiovascular disease, Intermountain Health will deploy Layer Health’s AI platform across Intermountain’s full network, spanning 33 hospitals and multiple states, with plans to expand to other registry areas in the future.”
And he quoted a statement by Nickolas Mark, managing partner of Intermountain Ventures, who said that "Layer Health’s technology represents a meaningful step forward. We’re excited to partner with a company whose AI expertise and commitment to validation align with our broader vision for using innovation to solve complex operational challenges in healthcare."
Progress being measured and reported in leading journals
What’s more, progress in areas such as large language model development is being thoughtfully measured now, too. As Raths noted in a report on the subject earlier this month, “Duke University School of Medicine researchers have developed two new frameworks designed to evaluate the performance, safety, and reliability of large-language models in healthcare. Published in npj Digital Medicine and the Journal of the American Medical Informatics Association (JAMIA), the studies offer a new approach to ensuring that AI systems used in clinical settings meet the highest standards of quality and accountability.”
Indeed, he wrote, “As large-language models become increasingly embedded in medical practice — generating clinical notes, summarizing conversations, and assisting with patient communications — health systems are grappling with how to assess these technologies in ways that are both rigorous and scalable. The Duke-led studies, under the direction of Chuan Hong, Ph.D., assistant professor in Duke’s Department of Biostatistics and Bioinformatics, aim to fill that gap.”
Raths reported that “The npj Digital Medicine study introduces SCRIBE, a structured evaluation framework for ambient digital scribing tools. According to Duke, SCRIBE draws on expert clinical reviews, automated scoring methods, and simulated edge-case testing to evaluate how well these tools perform across dimensions such as accuracy, fairness, coherence, and resilience.” And he quoted Duke’s Hong as stating that “Ambient AI holds real promise in reducing documentation workload for clinicians. But thoughtful evaluation is essential. Without it, we risk implementing tools that might unintentionally introduce bias, omit critical information, or diminish the quality of care. SCRIBE is designed to help prevent that.”
He also reported on a related study in JAMIA that applied “a complementary framework to assess large-language models used by the Epic platform to draft replies to patient messages. The research compares clinician feedback with automated metrics to evaluate aspects such as clarity, completeness, and safety. While the study found strong performance in tone and readability, it also revealed gaps in the completeness of responses — emphasizing the importance of continuous evaluation in practice.” And he quoted Michael Pencina, Ph.D., chief data scientist at Duke Health, who said in a statement that “This work helps close the distance between innovative algorithms and real-world clinical value. We are showing what it takes to implement AI responsibly, and how rigorous evaluation must be part of the technology’s life cycle, not an afterthought.”
Industry leaders see progress advancing
These kinds of joint ventures are typical of the kinds of activity taking place nationwide now. For my State of the Industry Report earlier this year, I interviewed Liam Bouchier, managing director, data and AI practice, at the Chicago-based Impact Advisors consulting firm. Bouchier noted that “From November 2022 when ChatGPT was introduced and became the fastest-growing technology adoption in history, there’s been a significant investment in generative AI in all industries, including healthcare. And AI isn’t new. But the generative AI is new, and that’s what’s different and is driving a lot of the unknowns. And in the past year, we’ve moved into this experimentation phase where organizations are figuring out what to do with it. And I think a lot of people are looking at the production phase, to prove the concept and the value. And that’s where you lead to innovation and expand capabilities. And once you get to that second phase of innovation, that’s where you really get to growth.”
Indeed, Bouchier emphasized to me, “We’re really just in the first phase of exploration and experimentation. There’s a huge potential and there are many use cases. But they haven’t been able to scale their work.” Not surprisingly, note-starting/message-starting is one of the technologies being most quickly and widely implemented right now. That doesn’t surprise Bouchier either. “Those are very rudimentary and basic uses. There’s more value actually sitting outside the platforms,” he stresses.
Also for that report, I interviewed Brian Patterson, M.D., physician administrative director for clinical AI at UW Health in Madison, Wis. Dr. Patterson, who continues to practice as an emergency physician, told me that “We’re still excited about using LLMs [large language models] to improve patient care and reduce clinician burden. We’re already using AI LLMs to draft responses to patient questions in the chart. We’re piloting ambient technology, which has been going well. And we’re getting into summarization, which is going to be one of the next big waves, but these documentation reduction tasks—it’s clear that LLMs are able to generate a great deal of text. As we develop summarization for clinicians, that represents a big step up in terms of how much trust you need in the tool. If a summarization isn’t good even once in a while, clinicians will learn not to trust it.”
In that report, I cited results from our State of the Industry Survey: as of the end of last year, 32 percent of survey respondents said that they had made significant progress in their adoption of AI technologies; 24 percent said they had not yet done so but were planning to do so soon; and 36 percent said they had not done so.
Responding to our survey results, Dr. Patterson told me that “The key in your survey question was the word ‘significant.’ I think we’re teetering over the edge of inflated expectations, and the pit of disillusionment, before we get to the plane of productivity. And the talk for a few years has been, this is going to change medicine, are you implementing this in clinical care? But a lot more people are adopting these things, but it’s not going to magically change things. Improving the lives of the workforce or the outcomes for patients, that’s different from just using things.”
In the end, Patterson told me, “Like a lot of things that are hyped up, it’s going to be a little trickier to get clear value propositions—where these things work right out of the box and not, and where organizations are going. When I talk to other colleagues, the shine of newness has worn off. Hopefully, these technologies will be huge value-creators, but it will take a lot of work.”