Truveta Trains Large Language Model on EHR Data
There has been a lot of buzz in healthcare recently about the potential of large language models such as OpenAI's GPT-3. Truveta, a big-data company with 28 large health system members, has developed its own Truveta Language Model (TLM), a multimodal large language model for transforming electronic health record data for research on patient outcomes.
The company’s health system members, including Providence, Advocate Health, Trinity Health, Tenet Healthcare, Northwell Health and AdventHealth, provide 16 percent of patient care in the United States in more than 20,000 clinics and 700 hospitals. De-identified data from this care is provided to Truveta daily.
Truveta said that TLM is trained on what it calls the largest collection of complete medical records representing the full diversity of the United States. The company describes it as the first large language model specifically designed to help researchers accurately study patient care and outcomes.
In a video presentation, Truveta executives explained what makes the model unique. “From my experience, any AI model is only as good as the data that it's trained on,” said Jay Nanduri, Truveta chief technology officer. “Thanks to our members, Truveta has the most diverse and unprecedented data set in the healthcare domain. And when we train our models with this data in a de-identified and responsible way, it can yield unimaginable results.”
“The Truveta Language Model is a large language model for multimodal data that enables us to pull the rich content from the electronic health record, normalize that data, and make it super clean and useful for research at scale,” said Lisa Gurry, Truveta’s chief operating officer.
Truveta executives said that general-purpose language models are inaccurate in the medical domain because they are trained on limited, non-diverse public data.
“Accurate AI requires advanced technology that's trained over volumes of clinical data labeled by medical experts. Because the Truveta Language Model has been trained on billions of data points from electronic health records by clinical experts, it's already outperforming humans on making this data useful,” said Michael Lucas, principal machine learning engineer at Truveta.
Truveta combines this EHR data with social drivers of health (SDOH), claims, and mortality data for research. Using this data, Truveta’s clinical expert annotators label tens of thousands of raw clinical terms to train TLM to normalize healthcare data accurately, and then check the model's results as it runs.
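To make the idea of term normalization concrete, here is a minimal Python sketch of the general workflow described above: expert-labeled terms map raw text to a standard concept, and anything unrecognized is set aside for human review. It is a toy illustration only, not Truveta's method or TLM itself. A simple lookup table stands in for a trained model, and every name in it (LABELED_TERMS, normalize_terms, the sample terms) is hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical expert-labeled examples: raw clinical term -> normalized concept.
# A real system would learn from many such labels; a dictionary stands in here.
LABELED_TERMS = {
    "htn": "hypertension",
    "high blood pressure": "hypertension",
    "t2dm": "type 2 diabetes mellitus",
    "type ii diabetes": "type 2 diabetes mellitus",
}


@dataclass
class NormalizationResult:
    normalized: dict = field(default_factory=dict)   # raw term -> concept
    needs_review: list = field(default_factory=list)  # terms with no known label


def clean(term: str) -> str:
    """Lowercase the term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def normalize_terms(raw_terms, labeled=LABELED_TERMS):
    """Map each raw term to a concept, or queue it for expert review."""
    result = NormalizationResult()
    for raw in raw_terms:
        concept = labeled.get(clean(raw))
        if concept is None:
            result.needs_review.append(raw)
        else:
            result.normalized[raw] = concept
    return result


if __name__ == "__main__":
    outcome = normalize_terms(["HTN", "  Type II  diabetes ", "szr x3/wk"])
    print(outcome.normalized)
    print(outcome.needs_review)
```

In this sketch, the review queue plays the role of the annotators who check results: terms the lookup cannot resolve go back to a person, whose label can then be added to the table.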
In a blog post, Truveta CEO Terry Myerson said that with TLM, Truveta’s community of healthcare and life science customers is studying concepts that were previously inaccessible in messy clinician notes but are now structured for analytics, such as seizure frequency, changes in treatment regimen, and adverse reactions to medication.
“By using clinical expert-led AI to unlock the power of rich healthcare data,” Myerson wrote, “researchers can now ask and answer complex medical questions of a real-time, fully transparent view of U.S. health.”