At HIMSS22, Drawing Conclusions from AI and Machine Learning Work

On Monday, March 14, at HIMSS22, which is being held in the Orange County Convention Center in Orlando, Florida, the Machine Learning & AI for Healthcare Forum concluded with a presentation by Suchi Saria, Ph.D., the John C. Malone Associate Professor and the Director of the Machine Learning and Healthcare Lab at Johns Hopkins University in Baltimore.

Under the headline “Top 5 Considerations for a Successful AI Strategy,” Saria, who is also the CEO of the New York City-based Bayesian Health, an analytics solutions firm, noted that, 20 years ago, when she first got into machine learning and artificial intelligence, “At that time, very few people in AI and ML understood healthcare, and vice versa. And the languages and cultures of the fields were so far apart,” she recalled. “Then I came to Hopkins. We’ve done research projects with NIH, CDC, FDA [the National Institutes of Health, Centers for Disease Control and Prevention, and Food and Drug Administration, respectively], and other agencies and organizations. So, starting with pure research and then doing foundational research and translating it,” researchers and developers began to learn how to marry AI and machine learning to healthcare operations.

In fact, Saria told her audience, “Working with a startup, I got to see how messy other sectors were in terms of their data sets. It’s not just healthcare, in which the data is messy. And I unfortunately lost my nephew to sepsis, one of the biggest causes of inpatient mortality. I happened to have written one of the earliest papers, in 2015, on using AI to address sepsis. But applying research to the bedside is a whole lot harder than it looks. We spun out a company, Bayesian Health, out of Hopkins. It’s been a real experience applying these tools.”

Indeed, in that context, Saria said, “My research, translation, work outside healthcare, and work inside healthcare,” have influenced her perspectives on the forward direction of AI and ML in healthcare. “As we dive into the strategies,” she said, “the issue is not the lack of potential use cases; it’s knowing where to start. And there are a lot of learnings from the field from the past few years, that can be collected. That’s what we’ve attempted to do with Bayesian.”

What’s more, Saria said, when it comes to the position of AI and ML in healthcare, “We’re still very much thinking of all this potential as ‘cool.’ But we’re not realizing that there are massive staffing shortages that will only continue to intensify; and outcomes plummeted last year, because of staffing shortages; and more and more contracts involve performance targets. So AI is not a ‘nice-to-have’ thing; it’s something we need to do now, and turn into a repeatable muscle function in the [patient care] organization.” Further, she noted, “Not all AI is created equal: strategies for safe and effective adoption.” And she noted that a paper that she has coauthored will appear on March 23 in the New England Journal of Medicine.

Key pieces of advice

After having spent 20 years in AI and machine learning, Saria told her audience that, when it comes to figuring out what to do in that area, in healthcare, “The first thing is that you have to decide what you want to accomplish. Having a clear need also makes it easy to keep the necessary focus, because any one of these projects is very, very hard. How many times have I heard systems be very enthusiastic, but six months later, they’re out of energy? First, they never put in the requisite energy in the first place.”

Next, she said, the second consideration is achieving “deep integration within workflow. Not just integrated within your EMR, but in a design context. We throw around the word ‘design’; but there’s a huge difference between a back-in-your-garage-assembled kind of mobile phone, and a phone that works every time. The energy difference is 100X, if not 10000X. So it’s really important to put in the energy to think about design, stakeholders, and actionability. Who will use it, how will they use it, what will they do with it? All of that is product design thinking.” The focus, she said, should be on “deeply integrated” functionality.

Saria then referenced correspondence in The New England Journal of Medicine published on July 15, 2021, under the headline “The Clinician and Dataset Shift in Artificial Intelligence,” which she coauthored with seven other researchers. In that correspondence, the authors wrote that “Artificial intelligence (AI) systems are now regularly being used in medical settings, although regulatory oversight is inconsistent and undeveloped. Safe deployment of clinical AI requires informed clinician-users, who are generally responsible for identifying and reporting emerging problems. Clinicians may also serve as administrators in governing the use of clinical AI. A natural question follows: are clinicians adequately prepared to identify circumstances in which AI systems fail to perform their intended function reliably? A major driver of AI system malfunction is known as ‘dataset shift.’ Most clinical AI systems today use machine learning, algorithms that leverage statistical methods to learn key patterns from clinical data. Dataset shift occurs when a machine-learning system underperforms because of a mismatch between the data set with which it was developed and the data on which it is deployed. For example, the University of Michigan Hospital implemented the widely used sepsis-alerting model developed by Epic Systems; in April 2020, the model had to be deactivated because of spurious alerting owing to changes in patients’ demographic characteristics associated with the coronavirus disease 2019 pandemic. This was a case in which dataset shift fundamentally altered the relationship between fevers and bacterial sepsis, leading the hospital’s clinical AI governing committee (which one of the authors of this letter chairs) to decommission its use. This is an extreme example; many causes of dataset shift are more subtle.”

Further, “Successful recognition and mitigation of dataset shift require both vigilant clinicians and sound technical oversight through AI governance teams. When using an AI system, clinicians should note misalignment between the predictions of the model and their own clinical judgment, as in the sepsis example above. Clinicians who use AI systems must frequently consider whether relevant aspects of their own clinical practice are atypical or have recently changed. For their part, AI governance teams must be sure that it is easy for clinicians to report concerns about the function of AI systems and provide feedback so that the clinician who is reporting will understand that the registered concern has been noted and, if appropriate, actions to mitigate the concern have been taken. Teams must also establish AI monitoring and updating protocols that integrate technical solutions and clinical voices into an AI safety checklist,” the authors wrote last year.
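
One way a governance team might watch for the kind of dataset shift described above is to compare the distribution of an input feature at development time with what the model sees in deployment. The following is a minimal Python sketch, not the method used by the letter's authors or by any vendor: it computes a population stability index (PSI) on synthetic temperature readings. The function name, the synthetic data, and the 0.1 and 0.25 thresholds (a common rule of thumb) are all illustrative assumptions.

```python
import math
import random


def population_stability_index(reference, current, n_bins=10):
    """PSI for one numeric feature, using quantile bins taken from the reference sample."""
    ref_sorted = sorted(reference)
    # Interior bin edges placed at the reference sample's quantiles.
    edges = [ref_sorted[int(len(ref_sorted) * i / n_bins)] for i in range(1, n_bins)]

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # The number of edges below v is the index of the bin v falls into.
            counts[sum(v > e for e in edges)] += 1
        eps = 1e-6  # floor to avoid log(0) when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


# Synthetic body temperatures (degrees Celsius); these are NOT real patient data.
random.seed(0)
development = [random.gauss(37.0, 0.5) for _ in range(5000)]
deployed_same = [random.gauss(37.0, 0.5) for _ in range(5000)]
deployed_shifted = [random.gauss(37.8, 0.7) for _ in range(5000)]  # e.g. more febrile patients

psi_stable = population_stability_index(development, deployed_same)
psi_shifted = population_stability_index(development, deployed_shifted)

# Assumed rule of thumb: below 0.1 is stable, above 0.25 warrants review.
for label, psi in [("same population", psi_stable), ("shifted population", psi_shifted)]:
    status = "review" if psi > 0.25 else "ok"
    print(f"{label}: PSI={psi:.3f} -> {status}")
```

A check like this only flags that inputs have changed; it does not say whether the model's predictions are still clinically valid, which is why the authors pair technical monitoring with clinician feedback.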

Speaking of that article, Saria said that “The paper brought in a lot of attention in the AI community. For the past ten years, we’ve been writing about data drips and chips. You have to look at life cycle and maintenance; and you need a process infrastructure with which you can measure and monitor performance and improve performance over time. In healthcare, this is something that we haven’t been putting enough effort into. One of the speakers used this term ‘grad-ware’; we’re literally relying on a few graduate students to create the algorithms. But you need the energy and effort and resources to sustain this. And, as was noted in a panel earlier today, many times health systems try to attack sepsis, and find it really, really hard. So some systems have tried sepsis and given up or tried something easier. At the end of the day, healthcare delivery has to understand clinical data, and where your patients are going. And if we don’t understand the problem of clinical augmentation, we’re screwed, because of staffing shortages, etc.”

Indeed, Saria told her audience, 89 percent of providers have adopted some sort of sepsis tool; but when her team at Bayesian looked closely at the success levels of sepsis-alert algorithms, they found that the actual rates of improvement in intervention turned out to be far more modest than they appeared at first glance. In fact, she said, “I’ve seen incorrect evaluation. People measured sepsis for mortality, then deployed the tool, then used billing code data, and evaluated. But it looks as though you’ve improved mortality, but there’s a dilution effect.”
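
The dilution effect Saria mentions can be illustrated with simple arithmetic. The Python sketch below rests on one assumed reading of her remark: after deployment, the sepsis cohort is defined more broadly (for instance, through billing codes that capture milder cases), so the denominator grows with lower-risk patients. All of the numbers are hypothetical and chosen only to make the effect visible.

```python
def mortality_rate(deaths, cases):
    """Fraction of cases in a cohort that ended in death."""
    return deaths / cases


# Hypothetical pre-deployment cohort: 1,000 sepsis cases, 200 deaths.
before_cases, before_deaths = 1000, 200

# Hypothetical post-deployment cohort: the same kind of severe patients with the
# same outcomes, plus 500 milder cases newly counted as sepsis, of whom 10 died.
mild_cases, mild_deaths = 500, 10
after_cases = before_cases + mild_cases
after_deaths = before_deaths + mild_deaths

rate_before = mortality_rate(before_deaths, before_cases)
rate_after = mortality_rate(after_deaths, after_cases)

print(f"before: {rate_before:.1%}, after: {rate_after:.1%}")
# Mortality appears to fall from 20.0% to 14.0%, yet no death among the
# original severe cohort was prevented; the rate was diluted by milder cases.
```

Comparing like with like, such as holding the cohort definition fixed before and after deployment, is one way to avoid the apples-to-oranges comparison this produces.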

As a result of what’s being learned, Saria said, “Going forward, we really need to be thinking platform instead of point. As you’re doing multiple use cases, your users are starting to … But CIOs would like to find one platform to apply to all situations, but that’s not possible. You’ll have different classes of platforms. And that’s OK. Think of it as buckets of problems you’re solving over time. And should you wait until you have your full data warehouse in place? Well, technology moves fast. If you spend five years building a data warehouse before you activate everything? That’s too long. You need to build as you go, create a strategy for using it in different areas, and keep moving forward and developing.”

Ultimately, Saria said, “As a data scientist, it blows my mind how nationally, we’re deploying technology over and over without doing any kind of evaluation. And in the absence of evaluation, it’s like a hamster on a wheel, running and running. We’re all doing a ton of work but not knowing what’s working. And the whole idea is of a learning health system. And end-to-end evaluations are crucial. We need to address the typical problems in evaluations: missing, incomplete, incorrect, and/or unsustainable data, and apples-to-oranges comparisons.” So the landscape right now remains mixed, she said: “The field is moving extraordinarily fast; there’s a lot to do. But the knowledge exists in pockets.”