Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
- 
            Abstract Motivation: Multiple sequence alignments (MSAs) of homologous sequences contain information on structural and functional constraints and their evolutionary histories. Despite their importance for many downstream tasks, such as structure prediction, MSA generation is often treated as a separate pre-processing step, without any guidance from the application it will be used for. Results: Here, we implement a smooth and differentiable version of the Smith–Waterman pairwise alignment algorithm that enables jointly learning an MSA and a downstream machine learning system in an end-to-end fashion. To demonstrate its utility, we introduce SMURF (Smooth Markov Unaligned Random Field), a new method that jointly learns an alignment and the parameters of a Markov Random Field for unsupervised contact prediction. We find that SMURF learns MSAs that mildly improve contact prediction on a diverse set of protein and RNA families. As a proof of concept, we demonstrate that by connecting our differentiable alignment module to AlphaFold2 and maximizing predicted confidence, we can learn MSAs that improve structure predictions over the initial MSAs. Interestingly, the alignments that improve AlphaFold predictions are self-inconsistent and can be viewed as adversarial. This work highlights the potential of differentiable dynamic programming to improve neural network pipelines that rely on an alignment and the potential dangers of optimizing predictions of protein sequences with methods that are not fully understood. Availability and implementation: Our code and examples are available at: https://github.com/spetti/SMURF. Supplementary information: Supplementary data are available at Bioinformatics online.
- 
            Abstract Condensation by phase separation has recently emerged as a mechanism underlying many nuclear compartments essential for cellular functions. Nuclear condensates enrich nucleic acids and proteins, localize to specific genomic regions, and often promote gene expression. How diverse properties of nuclear condensates are shaped by gene organization and activity is poorly understood. Here, we develop a physics-based model to interrogate how spatially-varying transcription activity impacts condensate properties and dynamics. Our model predicts that spatial clustering of active genes can enable precise localization and de novo nucleation of condensates. Strong clustering and high activity result in aspherical condensate morphologies. Condensates can flow towards distant gene clusters, and competition between multiple clusters leads to stretched morphologies and activity-dependent repositioning. Overall, our model predicts and recapitulates morphological and dynamical features of diverse nuclear condensates and offers a unified mechanistic framework to study the interplay between non-equilibrium processes, spatially-varying transcription, and multicomponent condensates in cell biology.
- 
            Abstract Correlation among multiple phenotypes across related individuals may reflect some pattern of shared genetic architecture: individual genetic loci affect multiple phenotypes (an effect known as pleiotropy), creating observable relationships between phenotypes. A natural hypothesis is that pleiotropic effects reflect a relatively small set of common “core” cellular processes: each genetic locus affects one or a few core processes, and these core processes in turn determine the observed phenotypes. Here, we propose a method to infer such structure in genotype–phenotype data. Our approach, sparse structure discovery (SSD), is based on a penalized matrix decomposition designed to identify latent structure that is low-dimensional (many fewer core processes than phenotypes and genetic loci), locus-sparse (each locus affects few core processes), and/or phenotype-sparse (each phenotype is influenced by few core processes). Our use of sparsity as a guide in the matrix decomposition is motivated by the results of a novel empirical test indicating evidence of sparse structure in several recent genotype–phenotype datasets. First, we use synthetic data to show that our SSD approach can accurately recover core processes if each genetic locus affects few core processes or if each phenotype is affected by few core processes. Next, we apply the method to three datasets spanning adaptive mutations in yeast, a genotoxin robustness assay in human cell lines, and genetic loci identified from a yeast cross, and evaluate the biological plausibility of the core processes identified. More generally, we propose sparsity as a guiding prior for resolving latent structure in empirical genotype–phenotype maps.
- 
            Abstract The blastoderm is a broadly conserved stage of early animal development, wherein cells form a layer at the embryo’s periphery. The cellular behaviors underlying blastoderm formation are varied and poorly understood. In most insects, the pre-blastoderm embryo is a syncytium: nuclei divide and move throughout the shared cytoplasm, ultimately reaching the cortex. In Drosophila melanogaster, some early nuclear movements result from pulsed cytoplasmic flows that are coupled to synchronous divisions. Here, we show that the cricket Gryllus bimaculatus has a different solution to the problem of creating a blastoderm. We quantified nuclear dynamics during blastoderm formation in G. bimaculatus embryos, finding that: (1) cytoplasmic flows are unimportant for nuclear movement, and (2) division cycles, nuclear speeds, and the directions of nuclear movement are not synchronized, instead being heterogeneous in space and time. Moreover, nuclear divisions and movements co-vary with local nuclear density. We show that several previously proposed models for nuclear movements in D. melanogaster cannot explain the dynamics of G. bimaculatus nuclei. We introduce a geometric model based on asymmetric pulling forces on nuclei, which recapitulates the patterns of nuclear speeds and orientations of both unperturbed G. bimaculatus embryos and embryos physically manipulated to have atypical nuclear densities.
- 
            Abstract As the SARS-CoV-2 pandemic is rapidly progressing, the need for the development of an effective vaccine is critical. A promising approach for vaccine development is to generate, through codon pair deoptimization, an attenuated virus. This approach carries the advantage that it requires only limited knowledge specific to the virus in question, other than its genome sequence. Therefore, it is well suited for emerging viruses, for which we may not have extensive data. We performed comprehensive in silico analyses of several features of the SARS-CoV-2 genomic sequence (e.g., codon usage, codon pair usage, dinucleotide/junction dinucleotide usage, RNA structure around the frameshift region) in comparison with other members of the Coronaviridae family of viruses, the overall human genome, and the transcriptome of specific human tissues such as lung, which are primarily targeted by the virus. Our analysis identified the spike (S) and nucleocapsid (N) proteins as promising targets for deoptimization and suggests a roadmap for SARS-CoV-2 vaccine development, which may be generalizable to other viruses.
- 
            Abstract Droplet‐based single cell sequencing technologies, such as inDrop, Drop‐seq, and 10X Genomics, are catalyzing a revolution in the understanding of biology. Barcoding beads are key components for these technologies. What is limiting today are barcoding beads that are easy to fabricate, can efficiently deliver primers into drops, and thus achieve high detection efficiency. Here, this work reports an approach to fabricate dissolvable polyacrylamide beads, by crosslinking acrylamide with disulfide bridges that can be cleaved with dithiothreitol. The beads can be rapidly dissolved in drops and release DNA barcode primers. The dissolvable beads are easy to synthesize, and the primer cost for the beads is significantly lower than that for the previous barcoding beads. Furthermore, the dissolvable beads can be loaded into drops with >95% loading efficiency of a single bead per drop and the dissolution of beads does not influence reverse transcription or the polymerase chain reaction (PCR) in drops. Based on this approach, the dissolvable beads are used for single cell RNA and protein analysis.
- 
            Enzymes catalyze biochemical reactions through precise positioning of substrates, cofactors, and amino acids to modulate the transition-state free energy. However, the role of conformational dynamics remains poorly understood due to limited experimental access. This shortcoming is evident with Escherichia coli dihydrofolate reductase (DHFR), a model system for the role of protein dynamics in catalysis, for which it is unknown how the enzyme regulates the different active site environments required to facilitate proton and hydride transfer. Here, we describe ligand-, temperature-, and electric-field-based perturbations during X-ray diffraction experiments to map the conformational dynamics of the Michaelis complex of DHFR. We resolve coupled global and local motions and find that these motions are engaged by the protonated substrate to promote efficient catalysis. This result suggests a fundamental design principle for multistep enzymes in which pre-existing dynamics enable intermediates to drive rapid electrostatic reorganization to facilitate subsequent chemical steps.
- 
            Two robust rules have been discovered about animal hybrids: Heterogametic hybrids are more unfit (Haldane’s rule), and sex chromosomes are disproportionately involved in hybrid incompatibility (the large-X/Z effect). The exact mechanisms causing these rules in female heterogametic taxa such as butterflies are unknown but are suggested by theory to involve dominance on the sex chromosome. We investigate hybrid incompatibilities adhering to both rules in Papilio and Heliconius butterflies and show that dominance theory cannot explain our data. Instead, many defects coincide with unbalanced multilocus introgression between the Z chromosome and all autosomes. Our polygenic explanation predicts both rules because the imbalance is likely greater in heterogametic females, and the proportion of introgressed ancestry is more variable on the Z chromosome. We also show that mapping traits that are polygenic on a single chromosome in backcrosses can generate spurious large-effect QTLs. This mirage is caused by statistical linkage among polygenes that inflates estimated effect sizes. After controlling for statistical linkage, most incompatibility QTLs in our hybrid crosses are consistent with a polygenic basis. Since the two genera are very distantly related, polygenic hybrid incompatibilities are likely common in butterflies.
 An official website of the United States government
