AI Tools Keep Breaking Barriers—And That’s a Good Thing

April 30, 2024
Recent advances in generative AI technology show the promise of AI tools—and should be embraced

Until recently, at least, there had been a fair amount of hand-wringing among some physicians about artificial intelligence (AI) and machine learning (ML). Could those tools actually put doctors out of work? The anxiety first surfaced among radiologists, as early predictions held that AI tools would be able to spot, and perhaps even diagnose, certain types of abnormalities better than humans could. And I can say that six years ago at the annual RSNA Conference, held every year in Chicago’s vast McCormick Place Convention Center the week immediately following Thanksgiving, a great deal of trepidation was discernible among the radiologists present.

Fast-forward to last fall, and the picture had changed almost entirely: radiologists now realize that not only will AI not replace them, but that they will increasingly come to rely on AI in order to become maximally efficient and effective, at a time of worsening nationwide radiologist shortages and exploding utilization of diagnostic imaging services.

As Elizabeth S. Burnside, M.D., M.P.H., senior associate dean in the School of Medicine and Public Health at the University of Wisconsin-Madison and deputy director of the Institute of Clinical Translational Science for Breast Imaging at the University of Wisconsin, put it in her plenary address on Nov. 27 at RSNA23, there are three absolutely key elements involved: a commitment to engage stakeholders; analysis based on both quantitative and qualitative techniques; and decision-making that is continuously aligned among stakeholders and focused on outcomes. Among the key concerns being expressed by clinicians and others in healthcare, she noted, are the fear of an eventual over-reliance on AI, with humans gradually losing their own analytical abilities, and the fear of losing the ability to engage stakeholders, listen to their concerns, discuss and assess those concerns, and participate in collaborative decision-making. “What do I believe?” Burnside asked her audience rhetorically. “I believe that we will need effective leadership that is task-relevant” in order to achieve success with AI in clinical settings, and also that “Successful leaders will adapt their leadership style to the performance readiness of the stakeholders,” in order to engage them in the work of evaluation, assessment, and collaborative decision-making around AI. Similar sentiments and perspectives were expressed by other speakers at RSNA that week.

Speakers at HIMSS24 in Orlando last month, the annual conference of the Healthcare Information and Management Systems Society, took a similarly pragmatic approach to the issue. In a session entitled “Digital Transformation from Screening, Diagnosis to Personalized Care,” one of the panelists, David McClintock, M.D., chair of the Division of Computational Pathology and Artificial Intelligence at Mayo Clinic in Rochester, Minn., spoke plainly about what it takes to leverage AI solutions to impact care delivery: “It comes down to what the clinical use case is. We’re really looking at bringing in multiple different areas, and bringing in data to develop a model, in ways that won’t add additional steps to process, while also making sure you don’t cause harm to patients. You have to bring all those different aspects together. I had a friend of mine who’s a pathologist, who said, hey, I have an algorithm I’ve built and published a couple of papers on it, can you help me deploy it? It seemed like an innocent request,” he remembered. “But I found out he used a third-party solution, there was cybersecurity risk, the workflow was all manual to execute on the algorithms; and I asked him, how do you think the clinicians will react? And he said, well, I’ll add an addendum to the end of my paper. So when you think about building out solutions, you have to begin to think about how to build a roadmap. What are the resources needed? And will clinicians want to use the algorithm clinically? And who will support it? All those things, including the regulatory aspect, need to be put together.”

And that is where we are right now inside patient care organizations, as clinician, data science, health IT, and other leaders thoughtfully consider which clinical and operational potentialities to pursue. No one appears to be rushing chaotically into AI development; indeed, a huge proportion of the talk in many patient care organizations right now is around AI governance, so that organizations do not stumble by pursuing efforts of little value, or efforts that could even cause harm. There is great thoughtfulness across the land.

At the same time, the available technology itself is advancing rapidly, nowhere more so than in generative AI. As we reported last week, an article published online on April 12 in NEJM AI, the supplemental publication of The New England Journal of Medicine, demonstrates for the first time that the advance from GPT-3.5 to GPT-4 represents a major step forward in machine learning through the leveraging of large language models (LLMs), with GPT-4 matching or outperforming many practicing physicians on official medical board examinations. In that article, entitled “GPT versus Resident Physicians — A Benchmark Based on Official Board Scores,” a large team of researchers reported that set of medical board exam results. The authors are Uriel Katz, M.D., Eran Cohen, M.D., Eliya Shachar, M.D., Jonathan Somer, B.Sc., Adam Fink, M.D., Eli Morse, M.D., Beki Shreiber, B.Sc., and Ido Wolf, M.D.

What did the researchers do? “We evaluated the performance of generative pretrained transformer 3.5 (GPT-3.5) and GPT-4 on the 2022 Israeli board residency examinations and compared the results with those of 849 practicing physicians. Official physician scores were obtained from the Israeli Medical Association. To compare GPT and physician performance, we computed model percentiles among physicians in each examination. We accounted for model stochasticity by applying the model to each examination 120 times.” And what did they find? “GPT-4 ranked higher than the majority of physicians in psychiatry, and it performed similarly to the median physician in general surgery and internal medicine,” while “GPT-4 performance was lower in pediatrics and OB/GYN” but “remained higher than a considerable fraction of practicing physicians.” In comparison, “GPT-3.5 did not pass the examination in any discipline and was inferior to the majority of physicians in the five disciplines. Overall, GPT-4 passed the board residency examination in four of five specialties, revealing a median score higher than the official passing score of 65 percent.”
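The methodological core there, ranking a stochastic model’s score against a distribution of physician scores over many repeated runs, is simple enough to sketch. Below is a minimal, purely illustrative Python sketch of that idea; the score distributions and the run_gpt4_once() stand-in are invented assumptions, and only the 849-physician count and the 120 repeated runs come from the study’s own description.

```python
# Minimal, purely illustrative sketch of the percentile-among-physicians idea
# described above. The score distributions and run_gpt4_once() stand-in are
# invented for illustration; only the 849-physician count and the 120 repeated
# runs are taken from the study's description.
import random
import statistics


def percentile_among_physicians(model_score: float, physician_scores: list[float]) -> float:
    """Share of physicians scoring at or below the model, on a 0-100 scale."""
    at_or_below = sum(1 for s in physician_scores if s <= model_score)
    return 100.0 * at_or_below / len(physician_scores)


def estimate_model_percentile(run_model_once, physician_scores, n_runs=120):
    """Apply the stochastic model n_runs times to one exam and take the median percentile."""
    percentiles = [
        percentile_among_physicians(run_model_once(), physician_scores)
        for _ in range(n_runs)
    ]
    return statistics.median(percentiles)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical physician scores on one specialty exam (percent correct).
    physicians = [random.gauss(70, 8) for _ in range(849)]

    def run_gpt4_once() -> float:
        # Stand-in for grading one model attempt at the exam; real scores
        # would come from running the model against the official questions.
        return random.gauss(74, 3)

    median_percentile = estimate_model_percentile(run_gpt4_once, physicians, n_runs=120)
    print(f"Median model percentile across 120 runs: {median_percentile:.1f}")
```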

In short, “This work showed that GPT-4 performance is comparable with that of physicians on official medical board residency examinations,” the article’s authors write. “Model performance was near or above the official passing rate in all medical specialties tested. Given the maturity of this rapidly improving technology, the adoption of LLMs in clinical medical practice is imminent. Although the integration of AI poses challenges, the potential synergy between AI and physicians holds tremendous promise. This juncture represents an opportunity to reshape physician training and capabilities in tandem with the advancements in AI.”

A few Nervous Nellies might be freaking out over the fact that GPT-4 has now, for the first time, matched or outperformed many practicing physicians on official medical board exams; certainly, the marquee-level headline is noteworthy. But the real significance is not that machines will be delivering patient care anytime soon, or indeed anytime at all. Instead, the potential here is for AI tools to become sophisticated enough to provide instantly available, at-the-shoulder clinical decision support for harried, time-pressed physicians at the point of care.

And, just as physicians understand that any clinical decision support is simply a tool for a human physician to use, they are absolutely smart enough to understand that these AI tools can never take the place of practicing physicians. There is a clear analogy here to what happened when electronic prescribing first became available. Some in medicine murmured that having electronic prescribing tools at the ready on smartphones was somehow “infantilizing” practicing physicians; after all, hadn’t veteran physicians, those in practice since, say, the 1970s, had to memorize immense amounts of information, including the generic and brand names of hundreds of prescription medications, among other things? Well, yes, but consider the number of prescription medications available on the market now, as well as those flooding onto the market every year. Certainly, most primary care physicians will readily admit, at least privately, that there is no way they can keep up with all those names, let alone with all the advances in medical understanding being published in clinical journals every month. Indeed, way back in 2004, researchers writing in the Journal of the Medical Library Association noted that evaluating the new clinical journal articles coming out every month at that time, 7,287 articles across the 341 clinical journals examined, would require an estimated 627.5 hours of a physician’s time per month. Given that there are only about 720 hours altogether in a calendar month, that would leave a physician roughly three hours a day to eat, sleep, and practice medicine outside of reading journals. And since then, the number of articles has exploded, with the flourishing of online publication.
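That arithmetic is worth spelling out, since it carries the point. Here is a trivial back-of-the-envelope check, assuming a 30-day month; the 627.5-hour and 7,287-article figures come from the cited 2004 study, while the per-day split is my own illustration.

```python
# Back-of-the-envelope check of the journal-reading arithmetic cited above.
# The 627.5-hour estimate comes from the 2004 J Med Libr Assoc article; the
# 30-day month and per-day split are simplifying assumptions for illustration.
hours_to_appraise_literature = 627.5   # estimated hours/month to evaluate 7,287 articles
hours_in_month = 30 * 24               # 720 hours in a 30-day month
days_in_month = 30

hours_left_per_month = hours_in_month - hours_to_appraise_literature   # 92.5 hours
hours_left_per_day = hours_left_per_month / days_in_month              # about 3.1 hours

print(f"Hours left per month for everything else: {hours_left_per_month:.1f}")
print(f"Roughly {hours_left_per_day:.1f} hours per day to eat, sleep, and see patients")
```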

In any case, the key point here is that, looked at from the vantage point of what will be gained, AI tools, including generative AI tools such as GPT-4, will inevitably help to support physicians’ clinical practice. Will there be challenges? Of course there will be. But those in the know, including those developing and implementing AI tools, already understand that the benefits will far outweigh the problems in the long term. The biggest challenge will be for physicians and other clinicians to partner effectively with data scientists, health IT leaders, and yes, vendors, to come up with the best, most useful IT tools, ones that will benefit physicians, other clinicians, and above all patients. Doubtless, we will all look back ten years from now and see this as a time of very early days, one that inevitably will have led to major breakthroughs for medical practice and for patient care delivery overall.
