Stanford University researchers have developed a foundation model that outperforms traditional methods across 16 cancers, enhancing personalised prognoses and treatment decisions.

From Stanford Medicine 14/01/25 (first released 08/01/25)


The melding of visual information (microscopic and X-ray images, CT and MRI scans, for example) with text (exam notes, communications between physicians of varying specialties) is a key component of cancer care.

But while artificial intelligence helps doctors review images and home in on disease-associated anomalies like abnormally shaped cells, it’s been difficult to develop computerized models that can incorporate multiple types of data.

Now researchers at Stanford Medicine have developed an AI model able to incorporate visual and language-based information.

After training on 50 million medical images of standard pathology slides and more than 1 billion pathology-related texts, the model outperformed standard methods in its ability to predict the prognoses of thousands of people with diverse types of cancer, to identify which people with lung or gastroesophageal cancers are likely to benefit from immunotherapy, and to pinpoint people with melanoma who are most likely to experience a recurrence of their cancer.

The researchers named the model MUSK, for multimodal transformer with unified mask modeling.

MUSK represents a marked deviation from the way artificial intelligence is currently used in clinical care settings, and the researchers believe it stands to transform how artificial intelligence can guide patient care.

“MUSK can accurately predict the prognoses of people with many different kinds and stages of cancer,” said Ruijiang Li, MD, an associate professor of radiation oncology.

“We designed MUSK because, in clinical practice, physicians never rely on just one type of data to make clinical decisions.”

“We wanted to leverage multiple types of data to gain more insight and get more precise predictions about patient outcomes.”

Li, who is a member of the Stanford Cancer Institute, is the senior author of the study, which was published Jan. 8 in Nature.

Postdoctoral scholars Jinxi Xiang, PhD, and Xiyue Wang, PhD, are the lead authors of the research.

Although artificial intelligence tools have been increasingly used in the clinic, they have been used primarily for diagnostics (does this microscope image or scan show signs of cancer?) rather than for prognosis (what is this person’s likely clinical outcome, and which therapy is most effective for an individual?).

Part of the challenge is the need to train the models on large amounts of labeled data (this is a microscope slide of a slice of lung tissue with a cancerous tumor, for example) and paired data (here are the clinical notes about the patient from whom the tumor was obtained).

But carefully curated and annotated datasets are hard to come by.

Off-the-shelf tool

In artificial intelligence terms, MUSK is what’s called a foundation model.

Foundation models, which are pretrained on vast amounts of data, can be customized with additional training to perform specific tasks.

Because the researchers designed MUSK to use unpaired multimodal data that doesn’t meet the traditional requirements for training artificial intelligence, the pool of data that the computer can use to “learn” during its initial training is expanded by several orders of magnitude.

With this head start, any subsequent training is accomplished with much smaller, more specialized sets of data.

In effect, MUSK is an off-the-shelf tool that doctors can fine-tune to help answer specific clinical questions.
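To make the idea of customizing a pretrained model concrete, here is a minimal sketch of one common lightweight approach: keeping the pretrained model frozen and training only a small classifier on the embeddings it produces. The article does not say which fine-tuning method the MUSK team used, so this is an illustration of the general idea, not their procedure. The two-dimensional "embeddings", the labels and the function names are all hypothetical.

```python
import math

def train_linear_head(embeddings, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression head on frozen, precomputed embeddings."""
    weights = [0.0] * len(embeddings[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias

def predict(weights, bias, x):
    """Return the predicted probability of the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical embeddings standing in for a pretrained model's output.
# Label 1 = responded to treatment, 0 = did not (made-up data).
train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_y = [1, 1, 0, 0]

weights, bias = train_linear_head(train_x, train_y)
p_likely = predict(weights, bias, [0.85, 0.15])
p_unlikely = predict(weights, bias, [0.15, 0.85])
```

Only the small head is trained here, which is why a handful of labeled examples can be enough once a pretrained model has done the heavy lifting.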

“The biggest unmet clinical need is for models that physicians can use to guide patient treatment,” Li said.

“Does this patient need this drug?”

“Or should we instead focus on another type of therapy?”

“Currently, physicians use information like disease staging and specific genes or proteins to make these decisions, but that’s not always accurate.”

The researchers collected microscopic slides of tissue sections, the associated pathology reports and follow-up data (including how the patients fared) from the national database The Cancer Genome Atlas for people with 16 major types of cancer, including breast, lung, colorectal, pancreatic, kidney, bladder, and head and neck cancers.

They used the information to train MUSK to predict disease-specific survival, or the percentage of people who have not died from a specific disease during a defined time period.

For all cancer types, MUSK accurately predicted the disease-specific survival of a patient 75% of the time.

In contrast, standard predictions based on a person’s cancer stage and other clinical risk factors were correct 64% of the time.
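The article does not name the metric behind these percentages. One widely used way to score survival predictions is the concordance index, which asks how often a model ranks two patients in the right order of risk. The sketch below assumes that metric purely for illustration; the patient data and the function name are hypothetical.

```python
from itertools import combinations

def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs that the model ranks correctly.

    times  -- follow-up time for each patient
    events -- True if the patient died of the disease, False if censored
    risks  -- model risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (a simplification) and pairs where the shorter
        # follow-up was censored, since the true order is then unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Made-up follow-up data for four patients.
times = [2, 4, 6, 8]                 # years of follow-up
events = [True, True, False, True]   # third patient is censored
risks = [0.9, 0.3, 0.5, 0.1]         # hypothetical model risk scores

c_index = concordance_index(times, events, risks)  # 4 of 5 pairs correct: 0.8
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so under this reading a model at 0.75 orders patient pairs correctly more often than one at 0.64.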

In another example, the researchers trained MUSK to use thousands of bits of information to predict which patients with cancers of the lung or of the gastric and esophageal tracts are most likely to benefit from immunotherapy.

“Currently, the major determination about whether to give a patient a particular type of immunotherapy rests on whether that person’s tumor expresses a protein called PD-L1,” Li said.

“That’s a biomarker made of just one protein.”

“In contrast, if we can use artificial intelligence to assess hundreds or thousands of bits of many types of data, including tissue imaging, as well as patient demographics, medical history, past treatments and laboratory tests gathered from clinical notes, we can much more accurately determine who might benefit.”

For non-small cell lung cancer, MUSK correctly identified patients who benefited from immunotherapy treatment about 77% of the time.

In contrast, the standard method of predicting immunotherapy response based on PD-L1 expression was correct only about 61% of the time.

Similar results were obtained when the researchers trained MUSK to identify which people with melanoma were most likely to relapse within five years after their initial treatment.

In this case the model was correct about 83% of the time, which is about 12% more accurate than the predictions generated by other foundation models.

“What’s unique about MUSK is the ability to incorporate unpaired multimodal data into pretraining, which substantially increases the scale of data compared with paired data required by other models,” Li said.

“We observed that for all clinical prediction tasks, models that integrate multiple types of data consistently outperform those based on imaging or text data alone.”

“Leveraging these types of unpaired multimodal data with artificial intelligence models like MUSK will be a major advance in the ability of artificial intelligence to aid doctors to improve patient care.”

Researchers from Harvard Medical School contributed to the work.

The study was funded by the National Institutes of Health (grants R01CA222512, R01CA233578, R01CA269599, R01CA285456, R01CA290715 and R01DE030894), and the Stanford Institute for Human-Centered Artificial Intelligence.
