RE: LeoThread 2025-01-08 11:41

AI Reveals Gene Activity in Human Cells - Neuroscience News
by Neuroscience News

Artificial Intelligence / 2025-01-08 18:37

Summary: Researchers have developed an AI model that accurately predicts gene activity in any human cell, providing insights into cellular functions and disease mechanisms.

Trained on data from over 1.3 million cells, the model can predict gene expression in unseen cell types with high accuracy.

It has already uncovered mechanisms driving a pediatric leukemia and may help explore the genome’s “dark matter,” where most cancer mutations occur.

Key Facts

AI and Gene Activity: The AI model predicts gene expression in unseen cell types using genomic and expression data, enabling insights into cellular functions.

Pediatric Cancer Discovery: The system identified how specific mutations disrupt transcription factors in inherited pediatric leukemia, confirmed by lab experiments.

Exploring Genome “Dark Matter”: The model offers tools to study non-coding genome regions, illuminating the role of unexplored mutations in cancer and disease.

Using a new artificial intelligence method, researchers at Columbia University Vagelos College of Physicians and Surgeons can accurately predict the activity of genes within any human cell, essentially revealing the cell’s inner mechanisms.

The system, described in the current issue of Nature, could transform the way scientists work to understand everything from cancer to genetic diseases.

“Predictive generalizable computational models allow to uncover biological processes in a fast and accurate way.

These methods can effectively conduct large-scale computational experiments, boosting and guiding traditional experimental approaches,” says Raul Rabadan, professor of systems biology and senior author of the new paper.

Traditional research methods in biology are good at revealing how cells perform their jobs or react to disturbances. But they cannot make predictions about how cells work or how cells will react to change, like a cancer-causing mutation.

“Having the ability to accurately predict a cell’s activities would transform our understanding of fundamental biological processes,” Rabadan says.

“It would turn biology from a science that describes seemingly random processes into one that can predict the underlying systems that govern cell behavior.”

In recent years, the accumulation of massive amounts of data from cells and more powerful AI models are starting to transform biology into a more predictive science.

The 2024 Nobel Prize in Chemistry was awarded to researchers for their groundbreaking work in using AI to predict protein structures.

But the use of AI methods to predict the activities of genes and proteins inside cells has proven more difficult.

New AI method predicts gene expression in any cell

In the new study, Rabadan and his colleagues tried to use AI to predict which genes are active within specific cells.

Such information about gene expression can tell researchers the identity of the cell and how the cell performs its functions.

“Previous models have been trained on data in particular cell types, usually cancer cell lines or something else that has little resemblance to normal cells,” Rabadan says.

Xi Fu, a graduate student in Rabadan’s lab, decided to take a different approach, training a machine learning model on gene expression data from more than a million cells obtained from normal human tissues.

The inputs consisted of genome sequences and data showing which parts of the genome are accessible and expressed.

The overall approach resembles the way ChatGPT and other popular “foundation” models work.

These systems use a set of training data to identify underlying rules, the grammar of language, and then apply those inferred rules to new situations.

“Here it’s exactly the same thing: we learn the grammar in many different cellular states, and then we go into a particular condition (it can be a diseased or it can be a normal cell type) and we can try to see how well we predict patterns from this information,” says Rabadan.

Fu and Rabadan soon enlisted a team of collaborators, including co-first authors Alejandro Buendia, now a Stanford PhD student formerly in the Rabadan lab, and Shentong Mo of Carnegie Mellon, to train and test the new model.

After training on data from more than 1.3 million human cells, the system became accurate enough to predict gene expression in cell types it had never seen, yielding results that agreed closely with experimental data.
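The held-out evaluation described above can be illustrated with a toy sketch. This is not the GET model or its data: the cell types, the single "accessibility" feature, and the linear rule linking it to expression are all synthetic assumptions made up for illustration. The sketch only shows the general idea of fitting on some cell types and then scoring predictions on one the model never saw, using Pearson correlation as the agreement measure.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def make_cell_type(rng, n_genes=200):
    """Synthetic cell type (assumption): expression follows one shared
    linear rule of accessibility plus Gaussian noise."""
    acc = [rng.random() for _ in range(n_genes)]
    expr = [2.0 * a + 0.5 + rng.gauss(0, 0.1) for a in acc]
    return acc, expr


def held_out_correlation(cell_types, held_out):
    """Fit on every cell type except `held_out`, then score on it."""
    train_x, train_y = [], []
    for name, (acc, expr) in cell_types.items():
        if name != held_out:
            train_x.extend(acc)
            train_y.extend(expr)
    slope, intercept = fit_line(train_x, train_y)
    acc, expr = cell_types[held_out]
    preds = [slope * a + intercept for a in acc]
    return pearson(preds, expr)


rng = random.Random(0)  # fixed seed so the toy data is reproducible
cell_types = {
    name: make_cell_type(rng)
    for name in ["type_a", "type_b", "type_c", "type_d"]
}
r = held_out_correlation(cell_types, "type_d")
print(f"Pearson r on held-out cell type: {r:.3f}")
```

Because every synthetic cell type here obeys the same rule, the held-out score is high by construction. The hard part in real biology is learning rules that actually transfer across cell types, which a toy like this does not address.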

New AI methods reveal drivers of a pediatric cancer

Next, the investigators demonstrated the power of their AI system by asking it to uncover still-hidden biology of diseased cells, in this case an inherited form of pediatric leukemia.

“These kids inherit a gene that is mutated, and it was unclear exactly what it is these mutations are doing,” says Rabadan, who also co-directs the cancer genomics and epigenomics research program at Columbia’s Herbert Irving Comprehensive Cancer Center.

With AI, the researchers predicted that the mutations disrupt the interaction between two different transcription factors that determine the fate of leukemic cells.

Laboratory experiments confirmed the AI’s prediction. Understanding the effect of these mutations uncovers specific mechanisms that drive this disease.

AI could reveal “dark matter” in genome

The new computational methods should also allow researchers to start exploring the role of the genome’s “dark matter” (a term borrowed from cosmology that refers to the vast majority of the genome, which does not encode known genes) in cancer and other diseases.

“The vast majority of mutations found in cancer patients are in so-called dark regions of the genome. These mutations do not affect the function of a protein and have remained mostly unexplored,” says Rabadan.

“The idea is that using these models, we can look at mutations and illuminate that part of the genome.”

Already, Rabadan is working with researchers at Columbia and other universities, exploring different cancers, from brain to blood cancers, learning the grammar of regulation in normal cells and how cells change in the process of cancer development.

The work also opens new avenues for understanding many diseases beyond cancer and potentially identifying targets for new treatments.

By presenting novel mutations to the computer model, researchers can now gain deep insights and predictions about exactly how those mutations affect a cell.

Coming on the heels of other recent advances in artificial intelligence for biology, Rabadan sees the work as part of a major trend:

“It’s really a new era in biology that is extremely exciting; transforming biology into a predictive science.”

The paper, titled “A foundational model of transcription across human cell types,” was published Jan. 8 in Nature.

Authors (all from Columbia except where noted): Xi Fu, Shentong Mo (Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE, and Carnegie Mellon University, Pittsburgh, PA), Alejandro Buendia, Anouchka P. Laurent, Anqi Shao, Maria del Mar Alvarez-Torres, Tianji Yu, Jimin Tan (New York University Grossman School of Medicine, New York, NY), Jiayu Su, Romella Sagatelian, Adolfo A. Ferrando (Columbia and Regeneron, Tarrytown, NY), Alberto Ciccia, Yanyan Lan (Tsinghua University, Beijing, China), David M. Owens, Teresa Palomero, Eric P. Xing (Mohamed bin Zayed University of Artificial Intelligence and Carnegie Mellon University), and Raul Rabadan.

Transcriptional regulation, which involves a complex interplay between regulatory sequences and proteins, directs all biological processes.

Computational models of transcription lack generalizability to accurately extrapolate to unseen cell types and conditions.

Here we introduce GET (general expression transformer), an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types.

Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types.

GET also shows remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovers universal and cell-type-specific transcription factor interaction networks.

We evaluated its performance in prediction of regulatory activity, inference of regulatory elements and regulators, and identification of physical interactions between transcription factors and found that it outperforms current models in predicting lentivirus-based massively parallel reporter assay readout.

In fetal erythroblasts, we identified distal (greater than 1 Mbp) regulatory regions that were missed by previous models, and, in B cells, we identified a lymphocyte-specific transcription factor–transcription factor interaction that explains the functional significance of a leukaemia risk predisposing germline mutation.

In sum, we provide a generalizable and accurate model for transcription together with catalogues of gene regulation and transcription factor interactions, all with cell type specificity.