NSF PAR Search | NSF Public Access Repository

PMF-GRN: a variational inference approach to single-cell gene regulatory network inference using probabilistic matrix factorization

https://doi.org/10.1186/s13059-024-03226-6

Skok Gibbs, Claudia; Mahmood, Omar; Bonneau, Richard; Cho, Kyunghyun (April 2024, Genome Biology)

Abstract Inferring gene regulatory networks (GRNs) from single-cell data is challenging due to heuristic limitations. Existing methods also lack estimates of uncertainty. Here we present Probabilistic Matrix Factorization for Gene Regulatory Network Inference (PMF-GRN). Using single-cell expression data, PMF-GRN infers latent factors capturing transcription factor activity and regulatory relationships. Using variational inference allows hyperparameter search for principled model selection and direct comparison to other generative models. We extensively test and benchmark our method using real single-cell datasets and synthetic data. We show that PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods, offering well-calibrated uncertainty estimates.
A Variational Inference Approach to Single-Cell Gene Regulatory Network Inference using Probabilistic Matrix Factorization

https://doi.org/10.1101/2022.09.09.507305

Mahmood, Omar; Skok Gibbs, Claudia; Bonneau, Richard; Cho, Kyunghyun (March 2023, ICML 2023)

Inferring gene regulatory networks (GRNs) from single-cell gene expression datasets is a challenging task. Existing methods are often designed heuristically for specific datasets and lack the flexibility to incorporate additional information or compare against other algorithms. Further, current GRN inference methods do not provide uncertainty estimates with respect to the interactions that they predict, making inferred networks challenging to interpret. To overcome these challenges, we introduce Probabilistic Matrix Factorization for Gene Regulatory Network inference (PMF-GRN). PMF-GRN uses single-cell gene expression data to learn latent factors representing transcription factor activity as well as regulatory relationships between transcription factors and their target genes. This approach incorporates available experimental evidence into prior distributions over latent factors and scales well to single-cell gene expression datasets. By utilizing variational inference, we facilitate hyperparameter search for principled model selection and direct comparison to other generative models. To assess the accuracy of our method, we evaluate PMF-GRN using the model organisms Saccharomyces cerevisiae and Bacillus subtilis, benchmarking against database-derived gold standard interactions. We discover that, on average, PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods. Moreover, our PMF-GRN approach offers well-calibrated uncertainty estimates, as it performs gene regulatory network (GRN) inference in a probabilistic setting. These estimates are valuable for validation purposes, particularly when validated interactions are limited or a gold standard is incomplete.
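The factorization idea in this abstract can be illustrated with a toy example. The sketch below is not PMF-GRN: it is a plain, non-probabilistic matrix factorization fitted by stochastic gradient descent, with no priors, no variational inference, and no uncertainty estimates. It only shows the shape of the decomposition, in which an expression matrix X (cells x genes) is approximated by U (cells x factors, a stand-in for transcription factor activity) times V (factors x genes, a stand-in for regulatory weights). The function names, learning rate, and step count are assumptions made for illustration.

```python
import random


def factorize(X, n_factors, steps=2000, lr=0.01, seed=0):
    """Approximate X (cells x genes) as U @ V by SGD on squared error.

    U (cells x n_factors) is a stand-in for TF activity and
    V (n_factors x genes) is a stand-in for TF-to-gene weights.
    This is a point estimate, not a variational posterior.
    """
    rng = random.Random(seed)
    n_cells, n_genes = len(X), len(X[0])
    U = [[rng.uniform(0.1, 1.0) for _ in range(n_factors)] for _ in range(n_cells)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(n_genes)] for _ in range(n_factors)]
    for _ in range(steps):
        for i in range(n_cells):
            for j in range(n_genes):
                pred = sum(U[i][k] * V[k][j] for k in range(n_factors))
                err = X[i][j] - pred
                for k in range(n_factors):
                    u, v = U[i][k], V[k][j]  # use pre-update values for both gradients
                    U[i][k] += lr * err * v
                    V[k][j] += lr * err * u
    return U, V


def sq_error(X, U, V):
    """Sum of squared reconstruction errors between X and U @ V."""
    total = 0.0
    for i in range(len(X)):
        for j in range(len(X[0])):
            pred = sum(U[i][k] * V[k][j] for k in range(len(V)))
            total += (X[i][j] - pred) ** 2
    return total


# Toy expression matrix: 3 cells x 4 genes, exactly rank 2.
X = [[1, 0, 2, 0],
     [0, 1, 0, 2],
     [1, 1, 2, 2]]
U, V = factorize(X, n_factors=2)
print("reconstruction error:", sq_error(X, U, V))
```

In a probabilistic treatment such as the one the abstract describes, U and V would be random variables with priors informed by experimental evidence, and the fit would return distributions rather than single values.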
Full Text Available
Coordinated control of neuronal differentiation and wiring by sustained transcription factors

https://doi.org/10.1126/science.add1884

Özel, Mehmet Neset; Gibbs, Claudia Skok; Holguera, Isabel; Soliman, Mennah; Bonneau, Richard; Desplan, Claude (December 2022, Science)

INTRODUCTION Neurons are by far the most diverse of all cell types in animals, to the extent that “cell types” in mammalian brains are still mostly heterogeneous groups, and there is no consensus definition of the term. The Drosophila optic lobes, with approximately 200 well-defined cell types, provide a tractable system with which to address the genetic basis of neuronal type diversity. We previously characterized the distinct developmental gene expression program of each of these types using single-cell RNA sequencing (scRNA-seq), with one-to-one correspondence to the known morphological types.
RATIONALE The identity of fly neurons is determined by temporal and spatial patterning mechanisms in stem cell progenitors, but it remained unclear how these cell fate decisions are implemented and maintained in postmitotic neurons. It was proposed in Caenorhabditis elegans that unique combinations of terminal selector transcription factors (TFs) that are continuously expressed in each neuron control nearly all of its type-specific gene expression. This model implies that it should be possible to engineer predictable and complete switches of identity between different neurons just by modifying these sustained TFs. We aimed to test this prediction in the Drosophila visual system.
RESULTS Here, we used our developmental scRNA-seq atlases to identify the potential terminal selector genes in all optic lobe neurons. We found unique combinations of, on average, 10 differentially expressed and stably maintained (across all stages of development) TFs in each neuron. Through genetic gain- and loss-of-function experiments in postmitotic neurons, we showed that modifications of these selector codes are sufficient to induce predictable switches of identity between various cell types. Combinations of terminal selectors jointly control both developmental (e.g., morphology) and functional (e.g., neurotransmitters and their receptors) features of neurons.
The closely related Transmedullary 1 (Tm1), Tm2, Tm4, and Tm6 neurons (see the figure) share a similar code of terminal selectors, but can be distinguished from each other by three TFs that are continuously and specifically expressed in one of these cell types: Drgx in Tm1, Pdm3 in Tm2, and SoxN in Tm6. We showed that the removal of each of these selectors in these cell types reprograms them to the default Tm4 fate. We validated these conversions using both morphological features and molecular markers. In addition, we performed scRNA-seq to show that ectopic expression of pdm3 in Tm4 and Tm6 neurons converts them to neurons with transcriptomes that are nearly indistinguishable from that of wild-type Tm2 neurons. We also show that Drgx expression in Tm1 neurons is regulated by Klumpfuss, a TF expressed in stem cells that instructs this fate in progenitors, establishing a link between the regulatory programs that specify neuronal fates and those that implement them. We identified an intronic enhancer in the Drgx locus whose chromatin is specifically accessible in Tm1 neurons and in which Klu motifs are enriched. Genomic deletion of this region knocked down Drgx expression specifically in Tm1 neurons, leaving it intact in the other cell types that normally express it. We further validated this concept by demonstrating that ectopic expression of Vsx (visual system homeobox) genes in Mi15 neurons not only converts them morphologically to Dm2 neurons, but also leads to the loss of their aminergic identity. Our results suggest that selector combinations can be further sculpted by receptor tyrosine kinase signaling after neurogenesis, providing a potential mechanism for postmitotic plasticity of neuronal fates. Finally, we combined our transcriptomic datasets with previously generated chromatin accessibility datasets to understand the mechanisms that control brain wiring downstream of terminal selectors.
We built predictive computational models of gene regulatory networks using the Inferelator framework. Experimental validations of these networks revealed how selectors interact with ecdysone-responsive TFs to activate a large and specific repertoire of cell surface proteins and other effectors in each neuron at the onset of synapse formation. We showed that these network models can be used to identify downstream effectors that mediate specific cellular decisions during circuit formation. For instance, reduced levels of cut expression in Tm2 neurons, because of its negative regulation by pdm3, control the synaptic layer targeting of their axons. Knockdown of cut in Tm1 neurons is sufficient to redirect their axons to the Tm2 layer in the lobula neuropil without affecting other morphological features.
CONCLUSION Our results support a model in which neuronal type identity is primarily determined by a relatively simple code of continuously expressed terminal selector TFs in each cell type throughout development. Our results provide a unified framework of how specific fates are initiated and maintained in postmitotic neurons and open new avenues to understanding synaptic specificity through gene regulatory networks. The conservation of this regulatory logic in both C. elegans and Drosophila makes it likely that the terminal selector concept will also be useful in understanding and manipulating the neuronal diversity of mammalian brains.
Terminal selectors enable predictive cell fate reprogramming. Tm1, Tm2, Tm4, and Tm6 neurons of the Drosophila visual system share a core set of TFs continuously expressed by each cell type (simplified). The default Tm4 fate is overridden by the expression of a single additional terminal selector to generate Tm1 (Drgx), Tm2 (pdm3), or Tm6 (SoxN) fates.
Full Text Available
Tuning a coiled-coil hydrogel via computational design of supramolecular fiber assembly

https://doi.org/10.1039/D2ME00153E

Britton, Dustin; Meleties, Michael; Liu, Chengliang; Jia, Sihan; Mahmoudinobar, Farbod; Renfrew, P. Douglas; Bonneau, Richard; Montclare, Jin Kim (February 2023, Molecular Systems Design & Engineering)

The previously reported Q is a thermoresponsive coiled-coil protein capable of higher-order supramolecular assembly into fibers and hydrogels with upper critical solution temperature (UCST) behavior. Here, we introduce a new coiled-coil protein that is redesigned to disfavor lateral growth of its fibers and thus achieve a higher crosslinking density within the formed hydrogel. We also introduce a favorable hydrophobic mutation to the pore of the coiled-coil domain for increased thermostability of the protein. We note that an increase in storage modulus of the hydrogel and crosslinking density is coupled with a decrease in fiber diameter. We further fully characterize our α-helical coiled-coil (Q2) hydrogel for its structure, nano-assembly, and rheology relative to our previous single domain protein, Q, over the time of its gelation, demonstrating the nature of our hydrogel self-assembly system. In this vein, we also characterize the ability of Q2 to encapsulate the small hydrophobic molecule curcumin, and its impact on the mechanical properties of Q2. The design parameters here not only show the importance of electrostatic potential in self-assembly but also provide a step towards predictable design of electrostatic protein interactions.
Full Text Available
OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization

https://doi.org/10.1038/s41592-024-02272-z

Ahdritz, Gustaf; Bouatta, Nazim; Floristean, Christina; Kadyan, Sachin; Xia, Qinghui; Gerecke, William; O’Donnell, Timothy J; Berenberg, Daniel; Fisk, Ian; Zanichelli, Niccolò; et al (August 2024, Nature Methods)

Full Text Available
Evaluating the Conformations and Dynamics of Peptoid Macrocycles

https://doi.org/10.1021/acs.jpcb.2c01669

Eastwood, James R.; Jiang, Linhai; Bonneau, Richard; Kirshenbaum, Kent; Renfrew, P. Douglas (July 2022, The Journal of Physical Chemistry B)

Full Text Available
Fluorescent azobenzene-confined coiled-coil mesofibers

https://doi.org/10.1039/D2SM01578A

Punia, Kamia; Britton, Dustin; Hüll, Katharina; Yin, Liming; Wang, Yifei; Renfrew, P. Douglas; Gilchrist, M. Lane; Bonneau, Richard; Trauner, Dirk; Montclare, Jin K. (January 2023, Soft Matter)

Fluorescent protein biomaterials have important applications such as bioimaging in pharmacological studies. Self-assembly of proteins, especially into fibrils, is known to produce fluorescence in the blue band. We have shown that the aggregation of our engineered protein Q, which is capable of self-assembly into nanofibers, can be modulated into mesofibers by encapsulation of a small hydrophobic molecule. Conversely, azobenzenes are hydrophobic small molecules that are virtually non-fluorescent in solution due to their highly efficient photoisomerization. However, they demonstrate fluorogenic properties upon confinement in nanoscale assemblies, which reduces the non-radiative photoisomerization. Here, we report the fluorescence of a hybrid protein-small molecule system in which azobenzene is confined in our protein assembly, leading to fiber thickening and increased fluorescence. We show our engineered protein Q encapsulates AzoCholine, bearing a photoswitchable azobenzene moiety, in the hydrophobic pore to produce fluorescent mesofibers. This study further investigates the photocontrol of protein conformation as well as fluorescence of an azobenzene-containing biomaterial.
Full Text Available
Supramolecular Assembly and Small-Molecule Binding by Protein-Engineered Coiled-Coil Fibers

https://doi.org/10.1021/acs.biomac.2c01031

Britton, Dustin; Monkovic, Julia; Jia, Sihan; Liu, Chengliang; Mahmoudinobar, Farbod; Meleties, Michael; Renfrew, P. Douglas; Bonneau, Richard; Montclare, Jin Kim (October 2022, Biomacromolecules)

The ability to engineer a solvent-exposed surface of self-assembling coiled coils allows one to achieve a higher-order hierarchical assembly such as nano- or microfibers. Currently, these materials are being developed for a range of biomedical applications, including drug delivery systems; however, ways to mechanistically optimize the coiled-coil structure for drug binding are yet to be explored. Our laboratory has previously leveraged the functional properties of the naturally occurring cartilage oligomeric matrix protein coiled coil (C), not only for its favorable motif but also for the presence of a hydrophobic pore to allow for small molecule binding. This includes the development of Q, a rationally designed pentameric coiled coil derived from C. Here, we present a small library of protein microfibers derived from the parent sequences of C and Q bearing various electrostatic potentials, with the aim of investigating the influence of higher-order assembly and encapsulation of a candidate small molecule, curcumin. The supramolecular fiber size appears to be well-controlled by sequence-imbued electrostatic surface potential, and protein stability upon curcumin binding is well correlated to relative structure loss, which can be predicted by in silico docking.
Full Text Available
NetQuilt: deep multispecies network-based protein function prediction using homology-informed network similarity

https://doi.org/10.1093/bioinformatics/btab098

Barot, Meet; Gligorijević, Vladimir; Cho, Kyunghyun; Bonneau, Richard (February 2021, Bioinformatics)
Martelli, Pier Luigi (Ed.)
Abstract
Motivation: Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to protein functional annotation use sequence similarity to transfer knowledge between species. These approaches cannot produce accurate predictions for proteins without homologues of known function, as many functions require cellular context for meaningful prediction. To supply this context, network-based methods use protein-protein interaction (PPI) networks as a source of information for inferring protein function and have demonstrated promising results in function prediction. However, most of these methods are tied to a network for a single species, and many species lack biological networks.
Results: In this work, we integrate sequence and network information across multiple species by computing IsoRank similarity scores to create a meta-network profile of the proteins of multiple species. We use this integrated multispecies meta-network as input to train a maxout neural network with Gene Ontology terms as target labels. Our multispecies approach takes advantage of more training examples, and consequently leads to significant improvements in function prediction performance compared to two network-based methods, a deep learning sequence-based method and the BLAST annotation method used in the Critical Assessment of Functional Annotation. We are able to demonstrate that our approach performs well even in cases where a species has no network information available: when an organism’s PPI network is left out we can use our multi-species method to make predictions for the left-out organism with good performance.
Availability and implementation: The code is freely available at https://github.com/nowittynamesleft/NetQuilt. The data, including sequences, PPI networks and GO annotations are available at https://string-db.org/.
Supplementary information: Supplementary data are available at Bioinformatics online.
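As an illustration of the IsoRank similarity scores mentioned in this abstract, here is a minimal sketch of the standard IsoRank fixed-point iteration on two tiny toy networks. It is not taken from the NetQuilt code. The function name, the toy graphs, the uniform prior (standing in for normalized sequence-similarity scores), and the values of alpha and the iteration count are all assumptions. Each pair score is a mix of the scores of neighbouring pairs, weighted by degree, and the prior.

```python
def isorank(adj1, adj2, prior, alpha=0.6, iters=50):
    """Pairwise node similarity between two networks by IsoRank-style iteration.

    adj1, adj2: dict mapping node -> set of neighbours (undirected graphs,
                every node is assumed to have at least one neighbour).
    prior:      dict mapping (node1, node2) -> prior similarity, summing to 1
                (hypothetical stand-in for sequence-similarity scores).
    """
    pairs = [(i, j) for i in adj1 for j in adj2]
    R = {p: 1.0 / len(pairs) for p in pairs}
    for _ in range(iters):
        new_R = {}
        for i, j in pairs:
            # Topological term: support from pairs of neighbours of (i, j).
            topo = sum(R[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                       for u in adj1[i] for v in adj2[j])
            new_R[(i, j)] = alpha * topo + (1 - alpha) * prior.get((i, j), 0.0)
        R = new_R
    return R


# Two toy path graphs: a - b - c and x - y - z.
net1 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
net2 = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
uniform_prior = {(i, j): 1.0 / 9 for i in net1 for j in net2}

scores = isorank(net1, net2, uniform_prior)
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])
```

With a uniform prior the result depends only on topology, so the two middle nodes are matched most strongly. A non-uniform prior would shift the scores towards sequence-similar pairs.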
Full Text Available
Masked graph modeling for molecule generation

https://doi.org/10.1038/s41467-021-23415-2

Mahmood, Omar; Mansimov, Elman; Bonneau, Richard; Cho, Kyunghyun (May 2021, Nature Communications)

Abstract De novo, in-silico design of molecules is a challenging problem with applications in drug discovery and material design. We introduce a masked graph model, which learns a distribution over graphs by capturing conditional distributions over unobserved nodes (atoms) and edges (bonds) given observed ones. We train and then sample from our model by iteratively masking and replacing different parts of initialized graphs. We evaluate our approach on the QM9 and ChEMBL datasets using the GuacaMol distribution-learning benchmark. We find that validity, KL-divergence and Fréchet ChemNet Distance scores are anti-correlated with novelty, and that we can trade off between these metrics more effectively than existing models. On distributional metrics, our model outperforms previously proposed graph-based approaches and is competitive with SMILES-based approaches. Finally, we show our model generates molecules with desired values of specified properties while maintaining physiochemical similarity to the training distribution.
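The iterative mask-and-replace sampling described in this abstract can be sketched on a toy scale. The code below is only a schematic: `toy_conditional` is a hypothetical stand-in for a trained model's conditional distribution, the graph is reduced to a list of atom labels (bonds would be masked and resampled in the same way), and the vocabulary, masking fraction, and iteration count are assumptions.

```python
import random

ATOMS = ["C", "N", "O"]   # hypothetical toy vocabulary
MASK = "<mask>"


def toy_conditional(nodes, rng):
    """Hypothetical stand-in for a trained model.

    Samples an atom label given the currently observed (unmasked) labels,
    favouring labels that are already present (add-one smoothing).
    """
    observed = [a for a in nodes if a != MASK]
    weights = [1 + observed.count(a) for a in ATOMS]
    return rng.choices(ATOMS, weights=weights, k=1)[0]


def mask_and_replace(nodes, n_iters=10, mask_frac=0.3, seed=0):
    """Repeatedly mask a random subset of nodes and resample them."""
    rng = random.Random(seed)
    nodes = list(nodes)
    n_mask = max(1, int(mask_frac * len(nodes)))
    for _ in range(n_iters):
        masked = rng.sample(range(len(nodes)), n_mask)
        for idx in masked:
            nodes[idx] = MASK
        # Fill masked positions one at a time, conditioning on the rest.
        for idx in masked:
            nodes[idx] = toy_conditional(nodes, rng)
    return nodes


initial = ["C", "C", "C", "O", "N", "C"]
print(mask_and_replace(initial))
```

A real model would replace `toy_conditional` with learned conditional distributions over both atoms and bonds, and would typically check chemical validity of the sampled graph.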
