Search for: All records

Creators/Authors contains: "Ellington, Andrew"


  1. Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle: a structure-based graph-transformer framework that achieves state-of-the-art (SOTA) performance on accurately identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time: Thermodynamic Permutations for data augmentation, structural amino acid embeddings to model a mutation with a single structure, and a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000X fewer proteins and has 548X fewer parameters. Our framework establishes a path for fine-tuning structure-based transformers to virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies.
    Free, publicly-accessible full text available December 1, 2025
  2. Free, publicly-accessible full text available February 1, 2026
  3. RNA oxidation, predominantly through the accumulation of 8-oxo-7,8-dihydroguanosine (8-oxo-rG), represents an important biomarker for cellular oxidative stress. Polynucleotide phosphorylase (PNPase) is a 3′-5′ exoribonuclease that has been shown to preferentially recognize 8-oxo-rG-containing RNA and protect Escherichia coli cells from oxidative stress. However, the impact of 8-oxo-rG on PNPase-mediated RNA degradation has not been studied. Here, we show that the presence of 8-oxo-rG in RNA leads to catalytic stalling of E. coli PNPase through in vitro RNA degradation experiments and electrophoretic analysis. We also link this stalling to the active site of the enzyme through resolution of single-particle cryo-EM structures for PNPase in complex with singly or doubly oxidized RNA oligonucleotides. Following identification of Arg399 as a key residue in recognition of both single and sequential 8-oxo-rG nucleotides, we perform follow-up in vitro analysis to confirm the importance of this residue in 8-oxo-rG-specific PNPase stalling. Finally, we investigate the effects of mutations to active site residues implicated in 8-oxo-rG binding through E. coli cell growth experiments under H2O2-induced oxidative stress. Specifically, Arg399 mutations show significant effects on cell growth under oxidative stress. Overall, we demonstrate that 8-oxo-rG-specific stalling of PNPase is relevant to bacterial survival under oxidative stress and speculate that this enzyme might associate with other cellular factors to mediate this stress. 
    Free, publicly-accessible full text available November 12, 2025
  4. DNA is an incredibly dense storage medium for digital data. However, computing on the stored information is expensive and slow, requiring rounds of sequencing, in silico computation, and DNA synthesis. Prior work on accessing and modifying data using DNA hybridization or enzymatic reactions had limited computation capabilities. Inspired by the computational power of “DNA strand displacement,” we augment DNA storage with “in-memory” molecular computation using strand displacement reactions to algorithmically modify data in a parallel manner. We show programs for binary counting and for the Turing-universal cellular automaton Rule 110, the latter of which is, in principle, capable of implementing any computer algorithm. Information is stored in the nicks of DNA, and a secondary sequence-level encoding allows high-throughput sequencing-based readout. We conducted multiple rounds of computation on 4-bit data registers, as well as random access of data (selective access and erasure). We demonstrate that large strand displacement cascades with 244 distinct strand exchanges (sequential and in parallel) can use naturally occurring DNA sequence from M13 bacteriophage without stringent sequence design, which has the potential to improve the scale of computation and decrease cost. Our work merges DNA storage and DNA computing, setting the foundation of entirely molecular algorithms for parallel manipulation of digital information preserved in DNA.
  5. The molecular basis of protein thermal stability is only partially understood and has major significance for drug and vaccine discovery. The lack of datasets and standardized benchmarks considerably limits learning-based discovery methods. We present HotProtein, a large-scale protein dataset with growth temperature annotations of thermostability, containing K amino acid sequences and K folded structures from different species with a wide temperature range. Due to functional domain differences and data scarcity within each species, existing methods fail to generalize well on our dataset. We address this problem through a novel learning framework consisting of (i) protein structure-aware pre-training (SAP), which leverages 3D information to enhance sequence-based pre-training, and (ii) factorized sparse tuning (FST), which utilizes low-rank and sparse priors as an implicit regularization, together with feature augmentations. Extensive empirical studies demonstrate that our framework improves thermostability prediction compared to other deep learning models. Finally, we introduce a novel editing algorithm to efficiently generate positive amino acid mutations that improve thermostability. Code is available at https://github.com/VITA-Group/HotProtein.
  6. We demonstrate in vitro incorporation of cyclic β-amino acids into peptides by the ribosome through genetic code reprogramming. Further, we show that incorporation efficiency can be increased through the addition of elongation factor P. 