Leaders at patient care organizations nationwide are exploring artificial intelligence, including generative artificial intelligence. One leader interviewed for a feature article in our September/October issue was Michael Hasselberg, R.N., Ph.D., the chief digital health officer at the University of Rochester Medical Center (URMC) in Rochester, New York. Hasselberg is leading an ongoing initiative to explore and implement forms of AI to support clinical operations. Below is an excerpt from the recent interview that Editor-in-Chief Mark Hagland conducted with Hasselberg.
Tell me a bit about your role and your organization.
I lead the digital transformation strategy, everything from the patient portal up through AI, within our clinical service lines. And we're a unique health system, because our health system is still fully integrated into its parent university. Most academic health systems are just affiliated, but we roll up to the university, so that gives us some unique advantages. I also co-lead the innovation arm of our health system. We have a true innovation incubator: music, engineering, data science, computer science, and business, along with the medical school, nursing school, and dental school, all participate. The innovation arm is called the UR Health Lab.
Until now, concrete progress in developing algorithms and generative AI capabilities has been relatively slow in healthcare. What's your perspective on that?
I would say that, up to this point, we've lived on the micro side of AI. And where it's been done, it's been bio-researchers and subspecialists, cardiologists and oncologists, building models to solve very specific problems. That will continue to happen. But why we haven't scaled those models is exactly what Aaron said: is the model going to stay trustworthy and safe? To be frank, our data is really dirty; it's really noisy. And AI hasn't done well with really noisy, dirty data. Where it has been scaled on the clinical side has been in areas like radiology, where the imaging data is structured and cleaner. So interestingly, had you come to me six months to a year ago, I was really pessimistic about the near-term applicability of AI in healthcare, because we have a data problem in healthcare. And I was brutal toward AI vendors who came to me and said, I've got this fabulous solution. And I said, no, you've got dirty data! Part of that is that I have data scientists on my team here who are not researchers. We've been trying to build data models for five or six years, and it's only on the imaging side that we've been successful.
Tell me about the origins of your current initiative.
The problem has been MyChart messages [generated inside the patient-facing communications platform in the Epic Systems Corporation electronic health record system] coming into our clinicians' in-baskets. We have not had a good system to triage those messages to a staff member, nurse, or provider. We've pretty much been sending all those patient-generated messages to providers [physicians], and that's caused chaos.
So three or four years ago, we decided to focus on building natural language processing models to reliably and accurately triage messages, in order to send them to the right individuals. [The emergence of ChatGPT a few months ago has turbocharged work on that project, Hasselberg reports.] We're excited because we're one of the health systems that has access to GPT-4 in Azure; we have our own instance of Azure. And because we have access to GPT-4 on that instance of Azure, it's secure and private.
So we can start to test GPT-4. We're testing Google's generative AI and large language models as well. And we've found that the technology is mind-blowing. The first thing we did when we turned GPT-4 on in Azure was to try to tune that large language model to very reliably and accurately triage those messages, and it worked within two days.
What did you actually do, mechanically speaking?
We have a secure Azure cloud instance that we can leverage for putting PHI data on, so we have EHR data there. What we did was to retrospectively look at MyChart messages going back about six months to a year, and we tuned GPT-4. We did some prompt engineering: we asked it to write questions and said, this is what a message should look like going to a physician, to a nurse, to a staff member. And once we tuned the model and prompted it, we ran it multiple times on our data. We looked at reliability: did it consistently send the same exact message to the same people? We got high-90s reliability back. Then we pulled random samples out of each of those buckets, sent them to random PCPs, and asked them: should that message have gone to a physician, a nurse, or a staff member? The accuracy rate was somewhere around 85 percent. And the boundary was, if it wasn't clear, send it to the provider. If you're not sure, that's the default.
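The triage-and-verify loop Hasselberg describes can be sketched in broad strokes. This is a hypothetical illustration, not URMC's actual code: the model call is stubbed out with keyword rules, and all function names and keywords are assumptions. It captures the two ideas from the interview: run the classifier multiple times per message to measure consistency, and default to the provider whenever the runs don't clearly agree.

```python
from collections import Counter

ROLES = {"physician", "nurse", "staff"}

def classify_once(message: str) -> str:
    """Stub for one LLM triage call (in practice, a prompt to GPT-4 on a
    private Azure instance asking: physician, nurse, or staff?).
    The keyword rules below are placeholders for illustration only."""
    text = message.lower()
    if "refill" in text or "appointment" in text:
        return "staff"
    if "dosage" in text or "side effect" in text:
        return "nurse"
    return "physician"

def triage(message: str, runs: int = 3) -> str:
    """Classify a message several times; unless every run agrees on a
    valid role, fall back to the provider -- the safe default
    described in the interview."""
    labels = [classify_once(message) for _ in range(runs)]
    label, count = Counter(labels).most_common(1)[0]
    if count == runs and label in ROLES:
        return label
    return "physician"  # "If you're not sure, that's the default."

def reliability(messages: list[str], runs: int = 5) -> float:
    """Fraction of messages whose repeated classifications all agree,
    mirroring the 'same message to the same people' check."""
    consistent = sum(
        1 for m in messages
        if len({classify_once(m) for _ in range(runs)}) == 1
    )
    return consistent / len(messages) if messages else 0.0
```

With a real, nondeterministic model behind `classify_once`, the `reliability` figure is what would correspond to the high-90s number reported above; the accuracy check (sampling each bucket for clinician review) happens outside the code.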
When did that go live?
It hasn't yet been turned on in production. We're testing where it fails and where it does really well. There are considerations we need to think about. One is cost: using these models is not inexpensive. GPT-4, because it has a trillion-plus parameters and was trained on essentially the whole Internet, is expensive. There are costs around the use of tokens and server processing. So we're trying to figure out when you need a sledgehammer, because GPT-4 is a sledgehammer, and when we need a pickaxe. Sometimes we just need a smaller large language model. So a lot of what we're doing on the innovation team is understanding which models do well, and at what. And what we're focusing on is this: I want us to start with non-patient-facing applications.
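The sledgehammer-versus-pickaxe decision can be framed as a simple routing rule. A minimal sketch, assuming hypothetical model names, per-token prices, and task labels (none of these figures come from URMC or any vendor price list): route routine back-office tasks to a cheaper small model, and reserve the large model for everything else.

```python
# Hypothetical model tiers: names, prices, and task labels are
# illustrative assumptions, not an actual configuration or price list.
MODELS = {
    "sledgehammer": {"name": "large-llm", "usd_per_1k_tokens": 0.06},
    "pickaxe":      {"name": "small-llm", "usd_per_1k_tokens": 0.002},
}

# Routine, well-bounded back-office tasks a smaller model can handle.
SIMPLE_TASKS = {"triage", "form_fill", "summarize_note"}

def pick_model(task: str) -> str:
    """Route routine tasks to the cheap small model; anything else
    gets the large model (the 'sledgehammer')."""
    return "pickaxe" if task in SIMPLE_TASKS else "sledgehammer"

def estimated_cost(task: str, tokens: int) -> float:
    """Rough per-request cost for the tier the router selects."""
    tier = MODELS[pick_model(task)]
    return tokens / 1000 * tier["usd_per_1k_tokens"]
```

At the illustrative prices above, the gap between tiers is large enough that routing even a fraction of traffic to the small model changes the economics, which is the point of the innovation team's model-by-model evaluation.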
We have a ton of waste in healthcare, and there are a lot of opportunities to transform it, so I want to solve my provider-burden problems, such as ambient documentation and filling out forms; all of that is low-hanging fruit. The same with revenue cycle and related issues; it's making the back-end stuff more efficient. We can build things that are more patient-facing; for example, can a model translate a physician's note at the right literacy level for the patient, to address health equity issues? But it will be several years before we would consider turning such models on in production, because those models do occasionally hallucinate. So that's our thought process. Right now, we're just quickly building and seeing what we can and can't do, before we turn anything on in production for the health system.
One of the things that leaders involved in this work have said to me is, paraphrased, "It's clear that when it comes to developing AI algorithms, it's not possible to simply go to Target and buy algorithms off the shelf," because of the extreme customization of clinical operations in hospital-based organizations nationwide. But having to develop every single use as a custom project could take forever. How do you see things evolving, in that regard?
Do I see a day when we can essentially go to Target? Yes, absolutely. We can't afford all these point solutions; our tech stacks are getting very complicated and expensive. And it's never been easier for a health system like mine, with data science resources, to build models. I want to open-source all of this: I want other health systems to be able to pull down that code, test a model, and then turn it on and use it. That's how you're going to transform healthcare, through open-sourcing. That's where I see the future going.
And you see the need for strong governance in all this, correct?
Yes. If you haven't already, you need to set up AI governance within your health system. Who sits at that table, and what policies are you applying? There's a ton of work involved in creating the governance and project-prioritization processes that will lead organizations to success in this area. [Hasselberg adds that, inevitably, the leaders at many patient care organizations will wait until their electronic health record and analytics vendors develop off-the-shelf systems, while others will move to work with the Microsofts, the Amazons, and the Googles of the world, looking to those big companies to provide those services to them.]