How UCSF Health Is Thinking About Scalable, Trustworthy AI

March 22, 2024
Prof. Julia Adler-Milstein describes the work of the AI governance committee at UCSF Health, which serves as a gatekeeper for which models get deployed to its patients

At a recent Kaiser Permanente Institute for Health Policy event, University of California San Francisco Prof. Julia Adler-Milstein, Ph.D., described how UCSF Health is establishing an approach to developing and vetting artificial intelligence models that are both trustworthy and scalable. 

Last year Adler-Milstein was named the founding chief of the Division of Clinical Informatics and Digital Transformation (DoC-IT) in the Department of Medicine (DOM) at UCSF. She is a professor of medicine and the director of UCSF’s Center for Clinical Informatics and Improvement Research (CLIIR). 

“We don't want to just implement technology for technology's sake. We really want to ensure that they are solving our real-world problems and doing so in ways that are trustworthy,” she said. “We have to operationalize that. What does it mean to be trustworthy? And how do we develop processes that ensure that the AI does meet principles of trustworthiness?”

In terms of scalability, Adler-Milstein added, “We want to figure out which are the tools that are most valuable and get them to enterprise scale so that they can impact our entire patient population. So we are thinking a lot about how do you start with a pilot, rapidly determine whether it's doing what it's supposed to do, and then have a plan for scaling that so it does get to the enterprise level.”

She described a “three horizons” framework around deploying AI effectively. This involves the current mature business, the rapidly growing near-term business and emerging future business opportunities. She compared it to another realm: gas-powered cars, electric-powered cars and the future of autonomous vehicles. “We need to be investing on all three horizons. We need to make gas-powered cars better; we need to make electric vehicles better; and we need to be designing for this future state.” 

In healthcare, that translates into trying to make the current fee-for-service system better by implementing AI for operational use cases. It also involves thinking about generative AI for augmenting clinician intelligence in the near future, and starting to envision a future-state model of AI-driven virtual care. She said that could involve getting a sense of what might happen along a given patient's trajectory, anticipating some of the lab tests they may need, and sending them for those tests before they even come to see a physician.

UCSF is trying to design strategies that will allow it to invest in AI on all three time horizons. “It takes a team to do this work well,” Adler-Milstein said. “We are all learning this on the fly as we do it.” 

Among the things they will learn as they go are: 
• Who is the team that needs to be at the table? 
• How do you think about the expertise? 
• How do they work together? 
• What are the structures and functions? 
• What is the technical infrastructure that is needed to be able to rapidly deploy different types of AI models? 

Because UCSF uses the Epic health IT platform, it started with Epic’s cognitive computing platform. “We pretty quickly realized that it was not going to be sufficient to meet the needs of AI at an academic medical center,” she said. “So we basically built our own platform that we call HIPAC [Health IT Platform for Advanced Computing] that allows a much broader set of data to feed into models in real time.”
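The article offers no technical detail on HIPAC itself, but the general pattern Adler-Milstein describes, several real-time data feeds merging into per-patient features that models can score as events arrive, might look roughly like the minimal sketch below. All names here (ingest, score_patient, the toy model and feature names) are hypothetical illustrations of that pattern, not HIPAC internals.

```python
# Hypothetical sketch of a real-time scoring pattern: multiple feeds fold
# events into one feature vector per patient, and any registered model can
# score the freshest features on demand. Not UCSF's actual implementation.
from collections import defaultdict
from typing import Callable

# Latest known feature values per patient, merged from any number of feeds.
feature_store: dict[str, dict[str, float]] = defaultdict(dict)

def ingest(patient_id: str, feed: str, values: dict[str, float]) -> None:
    """Fold an event from one feed (EHR, labs, bedside monitors...) into
    the patient's current feature vector; `feed` records provenance."""
    feature_store[patient_id].update(values)

def score_patient(patient_id: str,
                  model: Callable[[dict[str, float]], float]) -> float:
    """Score whatever model is supplied against the freshest features."""
    return model(feature_store[patient_id])

# A toy stand-in model: any callable taking a feature dict works here.
def toy_deterioration_model(features: dict[str, float]) -> float:
    ratio = features.get("heart_rate", 80.0) / features.get("sbp", 120.0)
    return min(1.0, ratio)  # crude bounded "risk" for illustration only

ingest("pt-001", "monitors", {"heart_rate": 112.0})
ingest("pt-001", "vitals", {"sbp": 95.0})
print(f"risk: {score_patient('pt-001', toy_deterioration_model):.2f}")
```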

Even though UCSF has been working on AI for a while, it has not developed hundreds of models yet. “At an enterprise-level scale, we're still in the early days and we are doing a mix of models that are coming from our EHR vendor, and some models that we have home-grown or self-developed either by people within our health system or by researchers,” Adler-Milstein explained. 

A lot of these are focused on operational use cases — things like capacity management or predicting use of blood products, things that are clinically adjacent and relevant, but not yet predicting diagnoses or directing patients to certain parts of the health system. “We're always evaluating new models, mostly driven by the needs of our health system, but with some capacity for what our researchers and front-line clinical faculty are interested in,” she added. 

Adler-Milstein sits on an AI governance committee at UCSF Health, which serves as a gatekeeper for which models get deployed to its patients. The committee starts with discovery: Is this model trying to solve a real problem? Have they thought about the entirety of the solution — not just whether the model has a high predictive value, but whether they understand how to integrate it into the workflow? If it is implemented, will the interventions that come out of it be equitable? 

There should be a focus on patient-positive interventions, Adler-Milstein stressed. “If we're going to predict patient no-shows, you can either double-book that slot, which is negative for the patient and, if you have an already-biased model, is going to further worsen disparities. Or you could say that if a patient is predicted not to show, we will provide transportation, and that's a patient-positive intervention.”
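To make that contrast concrete, here is a minimal hypothetical sketch of the two intervention policies she contrasts. The threshold value and function names are assumptions for illustration, not UCSF practice.

```python
# The same no-show prediction can drive a patient-negative action
# (double-booking) or a patient-positive one (offering transportation).
NO_SHOW_THRESHOLD = 0.7  # assumed cutoff; a real system would calibrate this

def patient_negative_policy(no_show_prob: float) -> str:
    # Double-booking shifts the cost of a wrong prediction onto the patient,
    # so a biased model concentrates harm on already-disadvantaged groups.
    return "double_book_slot" if no_show_prob >= NO_SHOW_THRESHOLD else "no_action"

def patient_positive_policy(no_show_prob: float) -> str:
    # Offering transportation helps the patient whether or not the prediction
    # was right, so a model error costs the patient nothing.
    return "offer_transportation" if no_show_prob >= NO_SHOW_THRESHOLD else "no_action"

for p in (0.4, 0.85):
    print(p, patient_negative_policy(p), patient_positive_policy(p))
```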

She said UCSF officials are trying from the start to make sure that they are thinking about implementation of these models holistically, and how humans will use them and interact with them. “We then do the development and evaluation, moving into pilot or RCT [randomized controlled trials], and then the adoption and ongoing monitoring,” she added. “What’s important about this is it means we're touching these models many, many different times. There is a huge amount of work just to even take one model through this process. What we're trying to figure out now is how do we resource this if we want to be able to put hundreds of models through this process. It's a huge amount of investment.”
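The stages she names — discovery, development and evaluation, pilot or RCT, adoption, and ongoing monitoring — suggest a gated lifecycle in which a model advances only after passing review at each step. A minimal sketch of that gating, with the stage names drawn from her description and the gating logic assumed, might look like this:

```python
# Tracking a model through the lifecycle the article lists. Stage names come
# from the article; the advance/gate mechanics are an assumed illustration.
from enum import Enum, auto

class Stage(Enum):
    DISCOVERY = auto()               # is this solving a real problem, equitably?
    DEVELOPMENT_EVALUATION = auto()
    PILOT_OR_RCT = auto()
    ADOPTION = auto()
    MONITORING = auto()              # never "done": models are touched many times

ORDER = list(Stage)

def advance(current: Stage, gate_passed: bool) -> Stage:
    """Move to the next stage only when the current stage's governance gate
    has been passed; otherwise the model stays where it is."""
    if not gate_passed or current is Stage.MONITORING:
        return current
    return ORDER[ORDER.index(current) + 1]

stage = Stage.DISCOVERY
for gate_ok in (True, True, False, True):
    stage = advance(stage, gate_ok)
    print(stage.name)
```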

In closing, Adler-Milstein said her focus is on making sure that the models are good, and making sure that the humans are good at using the models. “We really have to think about those two pieces together. Yes, we have to think about algorithmic vigilance and algorithmic drift, but we also have to think about clinician vigilance. Is it really realistic to think that if a model gives a clinician bad output that a human's going to be able to recognize that and capture it, and prevent that from getting to the patient? We're really trying to bring these two together and think about measures and methods for both algorithmic vigilance and clinician vigilance.”
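She does not describe how UCSF measures algorithmic drift, but one standard way to operationalize this kind of "algorithmic vigilance" is to compare the distribution of a model's output scores over time, for example with the Population Stability Index (PSI). The sketch below is an assumed illustration of that technique, not a described UCSF method; a common rule of thumb treats PSI above roughly 0.2 as drift worth investigating.

```python
# Monitoring score-distribution drift with the Population Stability Index.
# PSI is a standard drift metric; its use here is illustrative, not a
# documented UCSF practice.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Compare two score distributions (assumed to lie in [0, 1]) using
    equal-width bins; larger PSI means the population has shifted more."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)  # avoid log(0) for empty bins
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
baseline = rng.beta(2, 5, 10_000)    # model scores at deployment time
this_month = rng.beta(3, 4, 10_000)  # scores now: the population has shifted
print(f"PSI = {psi(baseline, this_month):.3f}")  # above ~0.2 would trigger review
```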

 
