
The overall approach resembles the way ChatGPT and other popular “foundation” models work.

These systems use a set of training data to identify underlying rules (the grammar of language) and then apply those inferred rules to new situations.

“Here it’s exactly the same thing: we learn the grammar in many different cellular states, and then we go into a particular condition (it can be a diseased or it can be a normal cell type) and we can try to see how well we predict patterns from this information,” says Rabadan.

Fu and Rabadan soon enlisted a team of collaborators to train and test the new model, including co-first authors Alejandro Buendia, formerly in the Rabadan lab and now a PhD student at Stanford, and Shentong Mo of Carnegie Mellon.

After training on data from more than 1.3 million human cells, the system became accurate enough to predict gene expression in cell types it had never seen, yielding results that agreed closely with experimental data.

New AI methods reveal drivers of a pediatric cancer

Next, the investigators showed the power of their AI system by asking it to uncover still-hidden biology of diseased cells, in this case an inherited form of pediatric leukemia.

“These kids inherit a gene that is mutated, and it was unclear exactly what it is these mutations are doing,” says Rabadan, who also co-directs the cancer genomics and epigenomics research program at Columbia’s Herbert Irving Comprehensive Cancer Center.

With AI, the researchers predicted that the mutations disrupt the interaction between two different transcription factors that determine the fate of leukemic cells.

Laboratory experiments confirmed the AI’s prediction. Understanding the effect of these mutations uncovers specific mechanisms that drive this disease.

AI could reveal “dark matter” in genome