Why are racial biases baked into machine learning algorithms used in population health management and what can be done about it? In a recent presentation, Ziad Obermeyer, M.D., the Blue Cross of California Distinguished Associate Professor of Health Policy and Management at the University of California, Berkeley, explained the problem and suggested some solutions.
Obermeyer is a physician and researcher who works at the intersection of machine learning and health. He has experience auditing algorithms for bias. Speaking during a recent panel discussion of the Alliance for Health Policy, he gave a concrete example of a problematic algorithm that has some far-reaching lessons.
He started out by noting that population health efforts are looking for people who are going to get sick in the future, have poor outcomes, and cost health systems a lot of money.
“You’ve got a haystack, and you need to find needles. Algorithms are great at looking into the future,” he said. “They know what you're going to buy on Amazon, and they know how you're going to rate that movie you just watched on Netflix. And that same set of tools can be used to find patients who are going to get sick.” Algorithms are used ubiquitously for this.
Obermeyer told a story about auditing a piece of software that is sold to many health systems and used for 70 million people every year to make an important decision about who gets help. The majority of the U.S. population is being screened through one of these types of algorithms, he noted. “That algorithm score is determining whether you get access to extra help with your health today to prevent these chronic illnesses,” he explained. “We were interested in whether those algorithms were biased. What we found was that these algorithms as a family, but in particular the product we studied, were extremely racially biased. It was prioritizing healthier White patients ahead of sicker Black patients for extra help with their care.”
“As horrifying as that was to us, it was very important to get to the bottom of what was going on and figure out where the algorithm was going wrong,” Obermeyer said. “Algorithms are very literal and need to predict a single variable in a data set for a patient. The choice that a lot of people make, not just at the company whose product we studied, but at lots of companies, academic groups, and parts of the federal government, is to measure someone's healthcare needs with their healthcare costs.”
Obermeyer noted that this approach is not unreasonable, because in general, when people get sick, they do go to the hospital and generate costs. “The problem is that not everybody who needs healthcare gets healthcare in this country, and in many other countries. When there's a wedge between needing healthcare and getting healthcare, you get lower costs for people whose costs should be higher, because of lack of access to the healthcare system because of racism and discrimination and how they're treated by that healthcare system,” he said. “The algorithm was trained to predict costs, and it saw very accurately that Black patients cost less, and that's what it predicted. So that's where the bias came from. It was the choice to use cost as a proxy for health, when cost is a biased proxy for health.”
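The mechanism he describes can be sketched in a few lines of Python. Everything in this toy simulation is hypothetical: the group labels, the uniform need distribution, and the 40 percent access gap are invented, and the ranking rule is not the studied product. The sketch only illustrates why ranking patients by cost shuts out a group whose access to care is constrained, even when the two groups are equally sick.

```python
import random

random.seed(0)

def simulate(group):
    """Hypothetical patient: identical need distribution in both groups,
    but group B generates 40% less cost for the same need (access gap)."""
    need = random.uniform(0, 10)            # true health need
    access = 1.0 if group == "A" else 0.6   # unequal access to care
    cost = need * access * 1000             # dollars actually spent
    return group, need, cost

patients = [simulate(g) for g in ["A", "B"] * 5000]   # 10,000 patients

# The "algorithm": rank by cost (its training target) and enroll the
# top 10% in the extra-help program.
enrolled = sorted(patients, key=lambda p: p[2], reverse=True)[:1000]
share_b = sum(1 for g, _, _ in enrolled if g == "B") / len(enrolled)

mean_need_a = sum(n for g, n, _ in patients if g == "A") / 5000
mean_need_b = sum(n for g, n, _ in patients if g == "B") / 5000

print(f"mean need, A: {mean_need_a:.2f}  B: {mean_need_b:.2f}")
print(f"group B share of program slots: {share_b:.1%}")
```

Both groups draw needs from the same distribution, yet the cost ranking fills the program almost entirely with group A, because the cost threshold for the top slots sits above anything group B's constrained spending can reach.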
Obermeyer listed three lessons from that story:
• We often get angry at algorithms, he said, but the algorithms are just doing what we tell them to do. “We should be getting angry at ourselves for telling the algorithm to do something that encoded racial bias and other kinds of bias,” Obermeyer stressed. “The way we teach algorithms is an instantiation of our values, of what we think is important. When we measure health with healthcare costs, we're teaching the algorithm that the only people whose health needs matter are the people who go in to see a doctor and get care for those health needs.”
Neither we nor the algorithms ever see the needs that never get met and that don't generate dollars. This is true in other places in healthcare as well, he said, pointing to the way that CARES Act provider relief funding was distributed to hospitals. “We actually gave the money out to the hospitals proportional to hospital revenue. What value does that instantiate? Well, hospitals that make more money get more money. And that was our choice, as far as how to distribute these billions and billions of dollars of funding in the midst of a terrible pandemic that was already exacerbating inequalities,” Obermeyer said. “This is not a story about algorithms. Algorithms are just doing what we tell them to do.”
• How we regulate algorithms is an important issue. “The algorithm that we studied was able to get to huge scale, affecting tens of millions of people every year, and no one caught it — not the manufacturer of the algorithm, not the hospitals who bought it, not the doctors or patients who applied it and were affected by it,” Obermeyer said. “There is a clear case for regulation, but we don't have a clear vocabulary for how to regulate algorithms.” Borrowing an example from Sen. Elizabeth Warren, he said that when we regulate a toaster oven, we know that that toaster oven should not explode. When we regulate a drug, we know that that drug needs to do more good than harm. What do we regulate when we're regulating an algorithm?
Obermeyer offered a few suggestions about how we should be regulating algorithms.
The algorithm is supposed to affect the decision of who gets help for their health needs. What should the algorithm be providing to that decision maker? “It should be providing them information about someone's health needs. Once we've articulated that, we can say an unbiased algorithm would be providing information in an unbiased way about health needs to the decision maker,” he said. “We can look at the algorithm score, and then we can look at the health needs for Black patients, White patients, poor patients, rich patients, Asian patients, whoever we're interested in, and we can hold the algorithm accountable on one simple metric. We don't need to open the black box of the algorithm any more than we need to open the black box of the toaster oven or the drug that's being approved. We just need to know: is it doing what it's supposed to be doing? And is it doing that equitably for Black and White patients alike, or for any groups that we're interested in?”
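The audit Obermeyer outlines — hold the score fixed and compare true health needs across groups — might be sketched like this. The numbers are invented (a 40 percent access gap, uniform needs, Gaussian noise), and the score here is simply a cost proxy; the point is that the check needs only the algorithm's scores and an independent measure of need, never the model's internals.

```python
import random

random.seed(1)

def make_group(n, access):
    """Hypothetical patients: true need is uniform; the algorithm's score
    is a cost proxy, which shrinks with reduced access to care."""
    pts = []
    for _ in range(n):
        need = random.uniform(0, 10)
        score = need * access + random.gauss(0, 0.5)
        pts.append((score, need))
    return pts

def avg_need_in_band(patients, lo, hi):
    """Average true need among patients whose score falls in [lo, hi)."""
    needs = [need for score, need in patients if lo <= score < hi]
    return sum(needs) / len(needs)

group_a = make_group(5000, access=1.0)  # full access: score tracks need
group_b = make_group(5000, access=0.6)  # reduced access: score understates need

# The audit: within the same score band, compare average true need by group.
for lo in (0, 2, 4):
    na = avg_need_in_band(group_a, lo, lo + 2)
    nb = avg_need_in_band(group_b, lo, lo + 2)
    print(f"score {lo}-{lo + 2}: avg need A = {na:.1f}, avg need B = {nb:.1f}")
```

An unbiased score would show roughly equal average need in every band; here, group B patients with the same score are consistently sicker, which is exactly the one simple metric the black box can be held to.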
• Obermeyer is optimistic about fixing this problem. “Once you articulate what the algorithm should be doing, not only can you hold the algorithm accountable for doing that accurately and equitably, but you can also retrain the algorithm to provide that information,” he said. In that original case study, he and colleagues worked with the company that made the algorithm and created a revised version that predicted not someone's costs but a variety of measures of their health needs. “In doing so, we reduced the bias in that algorithm by 84 percent. We found similar results in lots of different settings, with different partners in the healthcare sector, but also outside of health,” he said. “I think that this is the way to think about algorithms: what should the algorithm be doing, and is it doing that accurately and equitably? That is really, for me, the key question that's driven my understanding of algorithms and how to think about them. We're also applying our work with multiple state attorneys general, as well as in conversations with federal regulators at the FTC and other places that are charged with consumer protection.”
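The retraining fix can be illustrated with the same kind of toy data: swap the cost label for a direct, if noisy, measure of health, and the ranking stops excluding the under-served group. The `health_label` here is an invented stand-in for something like a count of active chronic conditions, not the variable the researchers actually used, and the access-gap numbers are assumptions for illustration only.

```python
import random

random.seed(2)

def simulate(group):
    """Hypothetical patient: same need distribution in both groups; cost
    reflects a 40% access gap, while a clinical label does not."""
    need = random.uniform(0, 10)
    cost = need * (1.0 if group == "A" else 0.6) * 1000
    # Retraining label: a direct but noisy measure of health (a stand-in
    # for something like a count of active chronic conditions).
    health_label = need + random.gauss(0, 1.0)
    return group, need, cost, health_label

patients = [simulate(g) for g in ["A", "B"] * 5000]   # 10,000 patients

def share_b(score):
    """Group B's share of the top 10% when ranked by the given score."""
    top = sorted(patients, key=score, reverse=True)[:1000]
    return sum(1 for p in top if p[0] == "B") / len(top)

print(f"ranked by cost label:   group B share = {share_b(lambda p: p[2]):.1%}")
print(f"ranked by health label: group B share = {share_b(lambda p: p[3]):.1%}")
```

Ranked by cost, group B is shut out of the program; ranked by the health label, its share lands near the 50 percent that equal needs warrant — the same lesson as the real retraining, where changing the target variable, not the model, removed most of the bias.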