For Lung Cancer Diagnosis, Machine Learning Can Prove More Accurate than Pathologists, Research Finds

Computers can be more accurate than pathologists at assessing slides of lung cancer tissue, according to a new study by researchers at the Stanford University School of Medicine.

The researchers found that a machine learning approach to identifying critical disease-related features accurately differentiated between two types of lung cancers and predicted patient survival times better than the standard approach of pathologists classifying tumors by grade and stage.

“Pathology as it is practiced now is very subjective,” said Michael Snyder, Ph.D., professor and chair of genetics. “Two highly skilled pathologists assessing the same slide will agree only about 60 percent of the time. This approach replaces this subjectivity with sophisticated, quantitative measurements that we feel are likely to improve patient outcomes.”

The research was published Aug. 16 in Nature Communications. Snyder, who directs the Stanford Center for Genomics and Personalized Medicine, shares senior authorship of the study with Daniel Rubin, M.D., assistant professor of radiology and of medicine. Graduate student Kun-Hsing Yu, M.D., is the lead author of the study.

As the Stanford Medicine News Center explains, pathologists have for decades assessed the severity, or “grade,” of cancer by using a light microscope to examine thin cross-sections of tumor tissue mounted on glass slides. The more abnormal the tumor tissue appeared, in terms of cell size and shape among other indicators, the higher the grade. A stage is also assigned based on whether and where the cancer has spread throughout the body.

Often a cancer’s grade and stage can be used to predict how the patient will fare. They also can help clinicians decide how, and how aggressively, to treat the disease. This classification system doesn’t always work well for lung cancer, however. In particular, the lung cancer subtypes of adenocarcinoma and squamous cell carcinoma can be difficult to tell apart when examining tissue culture slides. Furthermore, the stage and grade of a patient’s cancer don’t always correlate with their prognosis, which can vary widely. Fifty percent of stage-1 adenocarcinoma patients, for example, die within five years of their diagnosis, while about 15 percent survive more than 10 years.

To address this, the researchers used 2,186 images of tissue from patients with either adenocarcinoma or squamous cell carcinoma, drawn from a national database called the Cancer Genome Atlas. The database also contained information about the grade and stage assigned to each cancer and how long each patient lived after diagnosis.

The researchers then used the images to “train” a computer software program to identify many more cancer-specific characteristics than can be detected by the human eye—nearly 10,000 individual traits, versus the several hundred usually assessed by pathologists. These characteristics included not just cell size and shape, but also the shape and texture of the cells’ nuclei and the spatial relations among neighboring tumor cells.

“We began the study without any preconceived ideas, and we let the software determine which characteristics are important,” said Snyder. “In hindsight, everything makes sense. And the computers can assess even tiny differences across thousands of samples many times more accurately and rapidly than a human.”

The researchers homed in on a subset of cellular characteristics identified by the software that could best be used to differentiate tumor cells from the surrounding noncancerous tissue, identify the cancer subtype, and predict how long each patient would survive after diagnosis. They then used a second dataset, covering 294 lung cancer patients from the Stanford Tissue Microarray Database, to validate the software’s ability to accurately distinguish short-term survivors from those who lived significantly longer.
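
The general workflow described here (measure many features, keep the most informative few, then check the result on data the software has never seen) can be illustrated with a small sketch. This is not the study's software: the data are randomly generated, the feature indices in `informative` are made up for the demonstration, and the scoring rule and nearest-centroid classifier are simple stand-ins chosen for readability. It also separates two labeled groups rather than modeling survival times.

```python
import random
import statistics


def make_synthetic_samples(n, n_features, informative, rng):
    """Return (X, y): purely synthetic feature rows and alternating 0/1 labels."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label  # only these features carry signal
        X.append(row)
        y.append(label)
    return X, y


def rank_features(X, y, top_k):
    """Score each feature by the gap between class means, scaled by its spread."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a + b) or 1.0
        gap = abs(statistics.fmean(a) - statistics.fmean(b))
        scores.append((gap / spread, j))
    best = sorted(scores, reverse=True)[:top_k]
    return sorted(j for _, j in best)


def fit_centroids(X, y, features):
    """Average the selected features within each class."""
    centroids = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [statistics.fmean([r[j] for r in rows]) for j in features]
    return centroids


def predict(row, centroids, features):
    """Assign the class whose centroid is closest in squared distance."""
    def dist(center):
        return sum((row[j] - c) ** 2 for j, c in zip(features, center))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def accuracy(X, y, centroids, features):
    hits = sum(predict(row, centroids, features) == lab for row, lab in zip(X, y))
    return hits / len(y)


rng = random.Random(0)
informative = [3, 17, 42]  # hypothetical signal-carrying features
X_train, y_train = make_synthetic_samples(200, 100, informative, rng)
X_test, y_test = make_synthetic_samples(100, 100, informative, rng)  # held out

selected = rank_features(X_train, y_train, top_k=3)
centroids = fit_centroids(X_train, y_train, selected)
held_out_accuracy = accuracy(X_test, y_test, centroids, selected)

print("selected features:", selected)
print("held-out accuracy:", round(held_out_accuracy, 3))
```

The point of scoring the model on `X_test` rather than `X_train` mirrors the validation step above: features chosen and tuned on one set of patients only count as useful if they still separate the groups in an independent set.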

Snyder anticipates that the machine-learning system described in this study will be able to complement the emerging fields of cancer genomics, transcriptomics and proteomics. Cancer researchers in these fields study the DNA mutations and the gene and protein expression patterns that lead to disease.

Although the current study focused on lung cancer, the researchers believe that a similar approach could be used for many other types of cancer. “Ultimately this technique will give us insight into the molecular mechanisms of cancer by connecting important pathological features with outcome data,” said Snyder.