In April 2023 a group called the Coalition for Health AI (CHAI) published a blueprint for trustworthy artificial intelligence deployment in healthcare. The organization said the 24-page guide reflects an effort among subject matter experts from leading academic medical centers and the healthcare, technology, and other industry sectors, who collaborated under the observation of several federal agencies over the past year. CHAI co-founder Brian Anderson, M.D., chief digital health physician of MITRE Corp., recently spoke to Healthcare Innovation about what CHAI is hoping to accomplish.
At MITRE, Anderson also is leading an effort to develop a common data model in oncology called mCODE (minimal Common Oncology Data Elements). The initiative is identifying cancer data elements that would be essential for analyzing treatment across electronic health records (EHRs) and cancer practices to improve quality and care coordination.
HCI: Can you describe the Coalition for Health AI and summarize some of the burgeoning issues that brought it together?
Anderson: A few of us got together about a year and a half ago — John Halamka from Mayo Clinic, Michael Pencina from Duke, myself from MITRE. Many people were coming to us asking: what are the best practices around development, implementation and maintenance of AI in healthcare? A quick survey of the landscape showed that more than 130 such guidance documents have been published. There is no consensus around any shared set of principles. In an area as highly consequential as health, we thought it was really important to come together and begin to establish what some of these principles should be.
We recognized that there is this whole spectrum of use cases and applications in AI from health systems to payers to technology vendors. We said let’s start trying to tackle this by establishing a set of shared principles that we essentially distilled down over the course of four or five roundtable sessions open to the public.
HCI: What would be a worst-case scenario if there isn’t a collaborative industry process of developing these guidelines and best practices? Is there a concern about overreach by regulators or that perhaps we'd see a lack of trust by the public?
Anderson: In CHAI, and in AI work generally, it is important to bring together the public sector, the private sector and, crucially, the community: the people who are the targets of this AI need to have a voice at the table. We've intentionally included community activists and stakeholders from patient and family advocacy councils. That helps ensure that industry understands what those best-practice standards are, and that regulators understand them too, because oftentimes the best innovative environment is one with appropriate guardrails and regulations, built on an informed relationship among industry, the public sector and the community.
HCI: There is an organization called the Health AI Partnership that is publishing best-practice guides. Is there overlap or duplication of effort?
Anderson: It is not duplicative. We are actually collaborating with them. If you look at their roster, there is overlap: Duke and Mayo Clinic are on both. Ideally, if we do it right, it is going to be synergistic. What we're doing in CHAI is what I would almost call an implementation guide that would inform developers, implementers and governance committees. It would include metrics that help developers assure that the AI mitigates bias appropriately, is sufficiently transparent, and manages data drift or performance drift. We're trying to be very specific and technical in nature. The Health AI Partnership is, I think, a little bit higher level than that, and more use case-specific.
HCI: Are the issues around large language models in healthcare different than the ones around the predictive machine learning models?
Anderson: Absolutely. As clinicians we have been trained on how to use these predictive models or expert system models in a classic clinical decision support use case, but generative models are something altogether different, and we are just starting to see what the use cases are. Some of them are in high-risk categories involving patient care. We need to be very careful in how we think about deploying these models in those use cases, because the way that you assure that a large language model is performing well is not the same way that you assure that a predictive model is performing well. How do you control for hallucination? No one really knows what that right framework is. We need to think very carefully about that.
HCI: The Blueprint document makes mention of setting up assurance labs and advisory service infrastructure. Can you describe a little bit about how that would work?
Anderson: The inspiration for this really comes from the AI Bill of Rights. There's a section in the AI Bill of Rights that the White House published that says these principles should be evaluated and assured by independent, neutral third parties. What we envision is a set of assurance labs that would essentially be the evaluators of technology deployed and maintained in the health space, assuring that it adheres to those standards and performs appropriately. They would provide services for things like bias remediation, or maintain monitoring dashboards that are publicly available to everyone.
HCI: Is there an analogy in another part of healthcare or even in some other industry to what those labs would be?
Anderson: Yes, if you're familiar with the CLIA lab certification process, or even outside of healthcare, the Underwriters Laboratories certification of electrical devices.
HCI: The Blueprint mentions registries of AI tools, analogous to Clinicaltrials.gov. How would that empower patients?
Anderson: I am in Boston. If I'm a patient of the Mass General Brigham healthcare system, I might be able to go to a set of monitoring dashboards of one of these assurance labs and see the models that are being deployed at MGB. How are they performing? How are they trained? What are the inputs that helped in the tuning of them? I might be more empowered as a patient to have a conversation with my provider about what models I want or do not want to have used on me, based on who I am and the unique attributes about me.
HCI: It seems like that would require a certain amount of sophistication and effort on the individuals’ part to go find that information and understand the different aspects of it.
Anderson: You're absolutely right. We don't have all this figured out yet. I mean, if you ask any usability expert, they'll tell you Clinicaltrials.gov is not the easiest thing to navigate. What we aspire to is that this stuff needs to be written in plain language in a way that a person who doesn’t have a Ph.D. can understand. At the end of the day, one of the questions that I as a patient would want to be able to ask is: is this model appropriate to use on me? That's the North Star question that we want to enable patients to answer.
HCI: Are there some tensions to be worked through in terms of competition, financial interests, intellectual property, and the idea of transparency? Did you hear feedback about these issues?
Anderson: Yes. One of the unique things about CHAI is we're trying to create a precompetitive space where these inherently competitive organizations can work together. We have Google and Microsoft right next to each other talking about what the standards are. We have multiple health systems in the same geographic region working together. The idea of transparently displaying how one's models are performing is an uncomfortable thing to think about, but it's important not to lose sight of the fact that this is a highly consequential space; it's about patients.
Former U.S. Rep. Barney Frank once said that government is something we decide to do together. And I would say that society is clearly coming around to a consensus that we want to collectively understand and ensure that these things are performing safely and accurately. If you know that's the direction society is going in — and I think the technology vendors are appreciating that reality — then black boxes touted as performing really well are not going to cut it in healthcare.
HCI: The Blueprint also mentioned the potential for CHAI and the National Academy of Medicine to collaborate. What are they working on together?
Anderson: The National Academy is going to develop an initial code of conduct. We in CHAI are going to take that code of conduct, align that with whatever version of the Blueprint we're on, and then that will inform future iterations of these technically specific implementation guides. The feedback that we get from implementation teams is ideally going to feed back into points that need to be clarified in the code of conduct or additional elements that may have been missed. So we envision, ideally, a virtuous cycle where the code of conduct informs the technical implementation, and the actual implementation of it informs future iterations of the code of conduct. We have a set of committees that we’re standing up between our two organizations and we'll be looking to actually execute on that collaboration.
HCI: The Blueprint asked for feedback by May 5. What kinds of things did you hear?
Anderson: I would say in general it was very positive. Some of the challenging feedback we got was that, right now, the Blueprint is very focused on health system implementation of technology, with a little bit of focus on payers. What we need more of is use cases for device manufacturers, as well as more use cases in the payer space. Right now, we are taking the Blueprint and building out the specific technical practice guides over the next six to 10 months.