Study: AI Falls Short When Analyzing Data Across Multiple Health Systems

Nov. 7, 2018
Researchers studied whether deep learning models trained to detect pneumonia on chest X-rays from one hospital or one group of hospitals will work equally well at different hospitals.

Artificial intelligence (AI) tools and machine learning technologies hold the promise of transforming healthcare, and there is ongoing discussion about how much of an impact AI and machine learning will have on the practice of medicine and on the business of healthcare overall.

In a recent study, researchers from New York City-based Mount Sinai Hospital and Icahn School of Medicine at Mount Sinai found that AI may fall short when analyzing data across multiple health systems. In their conclusions, the researchers noted that the findings indicate healthcare organizations should carefully assess AI tools and their real-world performance. The study was published in a recent special issue of PLOS Medicine on machine learning and health care.

As interest grows in the use of computer system frameworks called convolutional neural networks (CNNs) to analyze medical imaging and provide a computer-aided diagnosis, recent studies have suggested that AI image classification may not generalize to new data as well as commonly portrayed, the researchers wrote in a press release about the study.

Early results in using CNNs on X-rays to diagnose disease have been promising, but it has not yet been shown that models trained on X-rays from one hospital or one group of hospitals will work equally well at different hospitals, the researchers stated. Before these tools are used for computer-aided diagnosis in real-world clinical settings, their ability to generalize across a variety of hospital systems must be verified, according to the researchers.

The study is timely given the interest in machine learning, particularly in the area of medical imaging. A survey from Reaction Data found that 84 percent of medical imaging professionals view the technology as being either important or extremely important in medical imaging. What’s more, about 20 percent of medical imaging professionals say they have already adopted machine learning, and about one-third say they will adopt it by 2020.

Breaking it down, 7 percent of respondents said they have just adopted some machine learning and 11 percent said they plan on adopting the technology in the next 12 months. Fourteen percent of respondents said their organizations have been using machine learning for a while. About a quarter of respondents said they plan to adopt machine learning by 2020, and another 25 percent said they are three or more years away from adopting it. Only 16 percent of medical imaging professionals said they have no plans to adopt machine learning.

That survey found that there has been very little adoption by imaging centers, and all of the current adopters are hospitals.

In this particular Mount Sinai study, researchers at the Icahn School of Medicine at Mount Sinai assessed how AI models identified pneumonia in 158,000 chest X-rays across three medical institutions: the National Institutes of Health; The Mount Sinai Hospital; and Indiana University Hospital. Researchers chose to study the diagnosis of pneumonia on chest X-rays for its common occurrence, clinical significance, and prevalence in the research community.

In three out of five comparisons, CNNs’ performance in diagnosing diseases on X-rays from hospitals outside of their own network was significantly lower than on X-rays from the original health system. However, CNNs were able to detect the hospital system where an X-ray was acquired with a high degree of accuracy, and cheated at their predictive task based on the prevalence of pneumonia at the training institution, according to the study.
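To see how knowing the hospital alone can inflate apparent accuracy, consider the minimal Python sketch below. It is not the study's code or data: the two sites, their pneumonia prevalences (30 percent and 2 percent), and the sample sizes are hypothetical values chosen for illustration. The "model" never looks at an image; it simply outputs the prevalence of the site an X-ray came from.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_site(n, prevalence, rng):
    """Draw n synthetic pneumonia labels (1 = pneumonia) at a given prevalence."""
    return [1 if rng.random() < prevalence else 0 for _ in range(n)]


rng = random.Random(0)

# Hypothetical sites: the prevalences are illustrative, not taken from the study.
labels_a = make_site(1000, 0.30, rng)  # high-prevalence hospital
labels_b = make_site(1000, 0.02, rng)  # low-prevalence hospital

# A "model" that ignores the image and scores every X-ray with its site's prevalence.
scores_a = [0.30] * len(labels_a)
scores_b = [0.02] * len(labels_b)

site_a_auc = auc(scores_a, labels_a)   # 0.5: no skill within a single site
site_b_auc = auc(scores_b, labels_b)   # 0.5: no skill within a single site
pooled_auc = auc(scores_a + scores_b, labels_a + labels_b)

print(f"Site A only: {site_a_auc:.2f}")
print(f"Site B only: {site_b_auc:.2f}")
print(f"Pooled:      {pooled_auc:.2f}")
```

Within each site the scores are constant, so the model performs at chance, yet on the pooled data its score rises well above chance purely because one hospital sees far more pneumonia than the other. This is one way a test set drawn from the same mix of hospitals as the training data can overstate how a model would perform elsewhere.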

Researchers concluded that AI tools trained to detect pneumonia on chest X-rays suffered significant decreases in performance when tested on data from outside health systems. What’s more, researchers noted that the difficulty of using deep learning models in medicine is that they use a massive number of parameters, making it challenging to identify specific variables driving predictions, such as the types of CT scanners used at a hospital and the resolution quality of imaging.

“The performance of CNNs in diagnosing diseases on X-rays may reflect not only their ability to identify disease-specific imaging findings on X-rays but also their ability to exploit confounding information,” the researchers wrote in the study. “Estimates of CNN performance based on test data from hospital systems used for model training may overstate their likely real-world performance.”

These findings suggest that artificial intelligence in the medical space must be carefully tested for performance across a wide range of populations; otherwise, the deep learning models may not perform as accurately as expected, the researchers stated.

“Our findings should give pause to those considering rapid deployment of artificial intelligence platforms without rigorously assessing their performance in real-world clinical settings reflective of where they are being deployed,” senior author Eric Oermann, M.D., instructor in Neurosurgery at the Icahn School of Medicine at Mount Sinai, said in a statement. “Deep learning models trained to perform medical diagnosis can generalize well, but this cannot be taken for granted since patient populations and imaging techniques differ significantly across institutions.”

First author John Zech, a medical student at the Icahn School of Medicine at Mount Sinai, said, “If CNN systems are to be used for medical diagnosis, they must be tailored to carefully consider clinical questions, tested for a variety of real-world scenarios, and carefully assessed to determine how they impact accurate diagnosis.”
