NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Toward universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN

https://doi.org/10.1038/s41592-024-02191-z

Rosen, Yanay; Brbić, Maria; Roohani, Yusuf; Swanson, Kyle; Li, Ziang; Leskovec, Jure (February 2024, Nature Methods)

Abstract Analysis of single-cell datasets generated from diverse organisms offers unprecedented opportunities to unravel fundamental evolutionary processes of conservation and diversification of cell types. However, interspecies genomic differences limit the joint analysis of cross-species datasets to homologous genes. Here we present SATURN, a deep learning method for learning universal cell embeddings that encodes genes’ biological properties using protein language models. By coupling protein embeddings from language models with RNA expression, SATURN integrates datasets profiled from different species regardless of their genomic similarity. SATURN can detect functionally related genes coexpressed across species, redefining differential expression for cross-species analysis. Applying SATURN to three species whole-organism atlases and frog and zebrafish embryogenesis datasets, we show that SATURN can effectively transfer annotations across species, even when they are evolutionarily remote. We also demonstrate that SATURN can be used to find potentially divergent gene functions between glaucoma-associated genes in humans and four other species.
more » « less
Full Text Available
Predicting transcriptional outcomes of novel multigene perturbations with GEARS

https://doi.org/10.1038/s41587-023-01905-6

Roohani, Yusuf; Huang, Kexin; Leskovec, Jure (August 2023, Nature Biotechnology)

Abstract Understanding cellular responses to genetic perturbation is central to numerous biomedical applications, from identifying genetic interactions involved in cancer to developing methods for regenerative medicine. However, the combinatorial explosion in the number of possible multigene perturbations severely limits experimental interrogation. Here, we present graph-enhanced gene activation and repression simulator (GEARS), a method that integrates deep learning with a knowledge graph of gene–gene relationships to predict transcriptional responses to both single and multigene perturbations using single-cell RNA-sequencing data from perturbational screens. GEARS is able to predict outcomes of perturbing combinations consisting of genes that were never experimentally perturbed. GEARS exhibited 40% higher precision than existing approaches in predicting four distinct genetic interaction subtypes in a combinatorial perturbation screen and identified the strongest interactions twice as well as prior approaches. Overall, GEARS can predict phenotypically distinct effects of multigene perturbations and thus guide the design of perturbational experiments.
more » « less
Full Text Available
Human mobility networks reveal increased segregation in large cities

https://doi.org/10.1038/s41586-023-06757-3

Nilforoshan, Hamed; Looi, Wenli; Pierson, Emma; Villanueva, Blanca; Fishman, Nic; Chen, Yiling; Sholar, John; Redbird, Beth; Grusky, David; Leskovec, Jure (December 2023, Nature)

Abstract A long-standing expectation is that large, dense and cosmopolitan areas support socioeconomic mixing and exposure among diverse individuals^1–6. Assessing this hypothesis has been difficult because previous measures of socioeconomic mixing have relied on static residential housing data rather than real-life exposures among people at work, in places of leisure and in home neighbourhoods^7,8. Here we develop a measure of exposure segregation that captures the socioeconomic diversity of these everyday encounters. Using mobile phone mobility data to represent 1.6 billion real-world exposures among 9.6 million people in the United States, we measure exposure segregation across 382 metropolitan statistical areas (MSAs) and 2,829 counties. We find that exposure segregation is 67% higher in the ten largest MSAs than in small MSAs with fewer than 100,000 residents. This means that, contrary to expectations, residents of large cosmopolitan areas have less exposure to a socioeconomically diverse range of individuals. Second, we find that the increased socioeconomic segregation in large cities arises because they offer a greater choice of differentiated spaces targeted to specific socioeconomic groups. Third, we find that this segregation-increasing effect is countered when a city’s hubs (such as shopping centres) are positioned to bridge diverse neighbourhoods and therefore attract people of all socioeconomic statuses. Our findings challenge a long-standing conjecture in human geography and highlight how urban design can both prevent and facilitate encounters among diverse individuals.
more » « less
Full Text Available
RELBENCH: A Benchmark for Deep Learning on Relational Databases

Robinson, Joshua; Ranjan, Rishabh; Hu, Weihua; Huang, Kexin; Han, Jiaqi; Dobles, Alejandro; Fey, Matthias; Lenssen, Jan E; Yuan, Yiwen; Zhang, Zecheng; et al (December 2024, Advances in neural information processing systems)

Free, publicly-accessible full text available December 10, 2025
STARK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

Wu, Shirley; Zhao, Shiyu; Yasunaga, Michihiro; Huang, Kexin; Cao, Kaidi; Huang, Qian; Ioannidis, Vassilis N; Subbian, Karthik; Zou, James; Leskovec, Jure (December 2024, Advances in neural information processing systems)

Free, publicly-accessible full text available December 10, 2025
Avatar: Optimizing llm agents for tool usage via contrastive reasoning

Wu, Shirley; Zhao, Shiyu; Huang, Qian; Huang, Kexin; Yasunaga, Michihiro; Cao, Kaidi; Ioannidis, Vassilis N; Subbian, Karthik; Leskovec, Jure; Zou, James (December 2024, Advances in neural information processing systems)

Free, publicly-accessible full text available December 10, 2025
Aligning target-aware molecule diffusion models with exact energy optimization

Gu, Siyi; Xu, Minkai; Powers, Alexander; Nie, Weili; Geffner, Tomas; Kreis, Karsten; Leskovec, Jure; Vahdat, Arash; Ermon, Stefano (December 2024, Advances in neural information processing systems)

Free, publicly-accessible full text available December 10, 2025
How to build the virtual cell with artificial intelligence: Priorities and opportunities

https://doi.org/10.1016/j.cell.2024.11.015

Bunne, Charlotte; Roohani, Yusuf; Rosen, Yanay; Gupta, Ankit; Zhang, Xikun; Roed, Marcel; Alexandrov, Theo; AlQuraishi, Mohammed; Brennan, Patricia; Burkhardt, Daniel B; et al (December 2024, Cell)

Free, publicly-accessible full text available December 1, 2025
Context-Aware Meta-Learning

Fifty, Christopher; Duan, Dennis; Junkins, Ronald; Amid, Ehsan; Leskovec, Jure; Ré, Christopher; Thrun, Sebastian (May 2024, International Conference on Learning Representations (ICLR))

Full Text Available
Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting

Chang, Serina; Koehler, Frederic; Qu, Zhaonan; Leskovec, Jure; Ugander, Johan (May 2024, Proceedings of Machine Learning Research)

A common network inference problem, arising from real-world data constraints, is how to infer a dynamic network from its time-aggregated adjacency matrix and time-varying marginals (i.e., row and column sums). Prior approaches to this problem have repurposed the classic iterative proportional fitting (IPF) procedure, also known as Sinkhorn’s algorithm, with promising empirical results. However, the statistical foundation for using IPF has not been well understood: under what settings does IPF provide principled estimation of a dynamic network from its marginals, and how well does it estimate the network? In this work, we establish such a setting, by identifying a generative network model whose maximum likelihood estimates are recovered by IPF. Our model both reveals implicit assumptions on the use of IPF in such settings and enables new analyses, such as structure-dependent error bounds on IPF’s parameter estimates. When IPF fails to converge on sparse network data, we introduce a principled algorithm that guarantees IPF converges under minimal changes to the network structure. Finally, we conduct experiments with synthetic and real-world data, which demonstrate the practical value of our theoretical and algorithmic contributions.
more » « less
Full Text Available

« Prev Next »

Search for: All records