Antibodies are important biomolecules that are often designed to recognize target antigens. However, they are expensive to produce, and their relatively large size prevents their transport across lipid membranes. Aptamers are an alternative to antibodies: short ([Formula: see text] bp) oligonucleotides (or amino acid sequences) whose specific secondary and tertiary structures govern their affinity to specific target molecules. Aptamers are typically generated via solid-phase oligonucleotide synthesis, followed by selection and amplification through Systematic Evolution of Ligands by EXponential enrichment (SELEX). SELEX is a process based on competitive binding that enriches the population of certain strands while removing unwanted sequences, yielding aptamers with high specificity and affinity to a target molecule. Mathematical analyses of SELEX have been formulated in the mass-action limit, which assumes large system sizes and/or high aptamer and target molecule concentrations. In this paper, we develop a fully discrete stochastic model of SELEX. While it converges to a mass-action model in the large system-size limit, our stochastic model allows us to study statistical quantities when the system size is small, such as the probability of losing the best-binding aptamer during each round of selection. In particular, we find that optimal SELEX protocols in the stochastic model differ from those predicted by a deterministic model.
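The contrast between a small-system stochastic description and a mass-action one can be illustrated with a deliberately simplified toy. The sketch below is not the paper's model (which accounts for competitive binding between strands); it assumes, hypothetically, that each of `copies` strands of the best binder is retained independently with a fixed probability `p_retain` in one selection round. Under that assumption, the chance of losing every copy is (1 - p_retain)^copies, whereas a deterministic model only tracks the expected number retained and never predicts outright loss.

```python
import random


def prob_lose_best(copies: int, p_retain: float) -> float:
    """Probability that all `copies` strands of the best binder are lost in one
    round, assuming each strand is retained independently with probability p_retain."""
    return (1.0 - p_retain) ** copies


def simulate_round(copies: int, p_retain: float, rng: random.Random) -> int:
    """One stochastic selection round: count how many copies are retained."""
    return sum(1 for _ in range(copies) if rng.random() < p_retain)


def mass_action_retained(copies: int, p_retain: float) -> float:
    """Deterministic (mean-field) prediction: expected number of copies retained."""
    return copies * p_retain


if __name__ == "__main__":
    rng = random.Random(0)
    copies, p = 5, 0.3  # hypothetical values chosen for illustration
    trials = 20_000
    lost = sum(simulate_round(copies, p, rng) == 0 for _ in range(trials))
    print(f"exact P(lose best)     = {prob_lose_best(copies, p):.4f}")
    print(f"Monte Carlo estimate   = {lost / trials:.4f}")
    print(f"mass-action prediction = {mass_action_retained(copies, p)} copies retained")
```

With 5 copies and a retention probability of 0.3, the loss probability is 0.7^5 ≈ 0.168, even though the mass-action picture predicts 1.5 copies surviving on average. This is the kind of small-system effect a discrete stochastic model can capture.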
Generative and interpretable machine learning for aptamer design and analysis of in vitro sequence selection
Selection protocols such as SELEX, where molecules are selected over multiple rounds for their ability to bind to a target of interest, are popular methods for obtaining binders for diagnostic and therapeutic purposes. We show that Restricted Boltzmann Machines (RBMs), an unsupervised two-layer neural network architecture, can successfully be trained on sequence ensembles from single rounds of SELEX experiments for thrombin aptamers. RBMs assign scores to sequences that can be directly related to their fitnesses, as estimated through experimental enrichment ratios. Hence, RBMs trained on sequence data from a given round can be used to predict the effects of selection in later rounds. Moreover, the parameters of the trained RBMs are interpretable and identify the functional features contributing most to sequence fitness. To exploit the generative capabilities of RBMs, we introduce two different training protocols: one that takes sequence counts into account and is capable of identifying the few best binders, and another based on unique sequences only, which generates more diverse binders. We then use the RBM models to generate novel aptamers with putative disruptive mutations or good binding properties, and validate the generated sequences with gel shift assay experiments. Finally, we compare the RBMs' performance with different supervised learning approaches, including random forests and several deep neural network architectures.
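How an RBM "assigns a score" to a sequence can be sketched with the standard free-energy formula. The abstract does not specify the hidden-unit type, so this minimal sketch assumes binary (0/1) hidden units, one-hot encoded DNA visible units, and hypothetical untrained parameters named `fields`, `weights`, and `hidden_bias`; it is not the authors' implementation. The score is the negative free energy, obtained by summing out each hidden unit analytically. The log enrichment ratio, assumed here to be the log of a sequence's frequency in round t+1 divided by its frequency in round t, is the experimental quantity such scores would be compared against.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA sequence as an (L, 4) one-hot matrix."""
    idx = [ALPHABET.index(base) for base in seq]
    x = np.zeros((len(seq), len(ALPHABET)))
    x[np.arange(len(seq)), idx] = 1.0
    return x


def rbm_score(seq, fields, weights, hidden_bias):
    """Negative free energy -F(v) of an RBM with binary (0/1) hidden units.

    fields:      (L, 4) visible-unit potentials
    weights:     (M, L, 4) couplings between M hidden units and the visible layer
    hidden_bias: (M,) hidden-unit biases
    """
    v = one_hot(seq)
    visible_term = np.sum(fields * v)
    # Input to each hidden unit: I_mu = b_mu + sum_{i,a} W_{mu,i,a} v_{i,a}
    inputs = np.tensordot(weights, v, axes=([1, 2], [0, 1])) + hidden_bias
    # Summing out h_mu in {0, 1} gives log(1 + exp(I_mu)), computed stably
    return visible_term + np.sum(np.logaddexp(0.0, inputs))


def log_enrichment(count_next, total_next, count_prev, total_prev):
    """Log ratio of a sequence's frequency in round t+1 to its frequency in round t."""
    return np.log((count_next / total_next) / (count_prev / total_prev))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, M = 8, 3  # hypothetical sequence length and number of hidden units
    fields = rng.normal(size=(L, 4))
    weights = rng.normal(scale=0.1, size=(M, L, 4))
    hidden_bias = np.zeros(M)
    for s in ["ACGTACGT", "GGTTGGTT"]:
        print(s, rbm_score(s, fields, weights, hidden_bias))
```

In a trained model, sequences with higher scores would be predicted to be fitter; the two training protocols described above would differ in whether each sequence contributes once or in proportion to its count.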
- Award ID(s):
- 2155095
- PAR ID:
- 10402199
- Editor(s):
- Li, Jinyan
- Date Published:
- Journal Name:
- PLOS Computational Biology
- Volume:
- 18
- Issue:
- 9
- ISSN:
- 1553-7358
- Page Range / eLocation ID:
- e1010561
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Discovery of target-binding molecules, such as aptamers and peptides, is usually performed with high-throughput experimental screening methods. These methods typically generate large datasets of sequences of target-binding molecules, which can be enriched with high-affinity binders. However, identifying the highest-affinity binders in these large datasets often requires additional low-throughput experiments or other approaches. Bioinformatics-based analyses could help to better understand these large datasets and to identify the parts of the sequence space enriched with high-affinity binders. BinderSpace is an open-source Python package that performs motif analysis, sequence space visualization, clustering analyses, and sequence extraction from clusters of interest. The motif analysis, which produces text-based and visual output of motifs, can also provide heat maps of previously measured, user-defined functional properties for all the motif-containing molecules. Users can also run principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) analyses on whole datasets and on motif-related subsets of the data. Functionally important sequences can also be highlighted in the resulting PCA and t-SNE maps. If points (sequences) in the two-dimensional PCA or t-SNE maps form clusters, users can perform clustering analyses on their data and extract sequences from clusters of interest. We demonstrate the use of BinderSpace on a dataset of oligonucleotides binding to single-wall carbon nanotubes in the presence and absence of a bioanalyte, and on a dataset of cyclic peptidomimetics binding to bovine carbonic anhydrase protein. BinderSpace is openly accessible to the public via GitHub: https://github.com/vukoviclab/BinderSpace.
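The core idea behind motif analysis and motif-based sequence extraction can be sketched in a few lines of plain Python. This is not BinderSpace's API; the function names `motif_counts` and `sequences_with_motif` are hypothetical, and the sketch simply counts, for each k-mer, how many sequences contain it at least once, then pulls out the subset of sequences carrying a chosen motif (the kind of subset on which PCA or t-SNE could then be run).

```python
from collections import Counter


def motif_counts(sequences, k):
    """For every k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        # A set, so a k-mer repeated within one sequence is counted once
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def sequences_with_motif(sequences, motif):
    """Return the subset of sequences that contain `motif`."""
    return [seq for seq in sequences if motif in seq]


if __name__ == "__main__":
    library = ["GTGTGT", "ATGTA", "CCCC"]  # toy, hypothetical sequences
    top_motif, n = motif_counts(library, 3).most_common(1)[0]
    print(top_motif, n, sequences_with_motif(library, top_motif))
```

For this toy library, the script prints `TGT 2 ['GTGTGT', 'ATGTA']`: the 3-mer TGT occurs in two of the three sequences.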
-
Rapid and accurate diagnosis of biomarkers associated with medical conditions, including early detection of viruses and bacteria with highly sensitive biosensors, is currently a research priority. An aptamer is a chemically derived recognition molecule capable of detecting and binding small molecules with high specificity. Its fast preparation time, cost effectiveness, ease of modification, and stability at high temperature and pH are some of the advantages it has over traditional detection methods such as High Performance Liquid Chromatography (HPLC), Enzyme-linked Immunosorbent Assay (ELISA), and Polymerase Chain Reaction (PCR). Higher sensitivity and selectivity can be achieved by coupling aptamers with nanomaterials; these conjugates, called “aptasensors”, are receiving increasing attention in early diagnosis and therapy. This review highlights the selection protocol of aptamers based on traditional Systematic Evolution of Ligands by EXponential enrichment (SELEX) and the various types of modified SELEX. We further identify both the advantages and drawbacks associated with the modified versions of SELEX. Finally, we review current advances in aptasensor development and the quality of signal types, which depends on the surface area and other specific properties of the selected nanomaterials.
-
George Bebis, Terry Gaasterland (Ed.) Major Histocompatibility Complex (MHC) Class I molecules provide a pathway for cells to present endogenous peptides to the immune system, allowing it to distinguish healthy cells from those infected by pathogens. Software tools based on neural networks, such as NetMHC and NetMHCpan, predict whether peptides will bind to variants of MHC molecules. These tools are trained with experimental data consisting of the amino acid sequences of peptides and their observed binding strengths. Such tools generally do not explicitly consider hydrophobicity, a significant biochemical factor relevant to peptide binding. It was observed that these tools predict that some highly hydrophobic peptides will be strong binders, which biochemical factors suggest is incorrect. This paper investigates the correlation between the hydrophobicity of 9-mer peptides and their predicted binding strength to the MHC variant HLA-A*0201 for these software tools. Two studies were performed, one using the data on which the neural networks were trained and the other using a sample of the human proteome. A significant bias within NetMHC-4.0 towards predicting highly hydrophobic peptides as strong binders was observed in both studies. This suggests that hydrophobicity should be included in the training data of the neural networks. Retraining the neural networks with such biochemical annotations of hydrophobicity could increase the accuracy of their predictions, increasing their impact in applications such as vaccine design and neoantigen identification.
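The analysis described above, correlating the hydrophobicity of 9-mer peptides with their predicted binding strength, can be sketched as follows. The abstract does not name the hydrophobicity scale, so this sketch assumes the Kyte-Doolittle hydropathy scale and uses the mean per-residue value as the peptide's hydrophobicity; the predicted scores in the usage example are hypothetical placeholders, not NetMHC output. Note the sign convention: if predictions are reported as affinities such as IC50 (lower means stronger), a bias toward hydrophobic strong binders shows up as a negative correlation.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed scale; the study's scale is not stated)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(peptide: str) -> float:
    """Average Kyte-Doolittle value over the residues of a peptide."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx2 = sum((x - mx) ** 2 for x in xs)
    sy2 = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sx2 * sy2)


if __name__ == "__main__":
    peptides = ["LLLLLLLLL", "SLYNTVATL", "KRKDEKRQE"]  # toy 9-mers
    predicted = [0.9, 0.6, 0.1]  # hypothetical "binding strength" scores
    hydro = [mean_hydrophobicity(p) for p in peptides]
    print("r =", pearson(hydro, predicted))
```

A strong positive correlation on a large, unbiased peptide sample would be consistent with the hydrophobicity bias reported for NetMHC-4.0; a three-peptide toy example, of course, demonstrates only the mechanics.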
-
We propose a Hierarchical Convolutional Neural Network (HCNN) for mitosis event detection in time-lapse phase contrast microscopy. Our method has two stages. First, we extract candidate spatial-temporal patch sequences from the input image sequences that potentially contain mitosis events. Then, we use a hierarchical convolutional neural network to determine whether each patch sequence contains a mitosis event. In the experiments, we validate the design of the proposed architecture and evaluate its mitosis event detection performance. Our method achieves 99.1% precision and 97.2% recall on very challenging image sequences of multipolar-shaped C3H10T1/2 mesenchymal stem cells and outperforms other state-of-the-art methods. Furthermore, the proposed method does not depend on hand-crafted feature design or cell tracking, and it can be straightforwardly adapted to event detection in other cell types.
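The reported detection metrics follow the standard definitions: precision is the fraction of detected events that are true mitoses, TP / (TP + FP), and recall is the fraction of true mitoses that are detected, TP / (TP + FN). The abstract does not report an F1 score, so the value below is derived only as an illustration from the stated 99.1% precision and 97.2% recall; the TP/FP/FN counts in any usage of `precision_recall` would be hypothetical.

```python
from __future__ import annotations


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # F1 implied by the precision and recall stated in the abstract
    print(f"F1 = {f1_score(0.991, 0.972):.4f}")
```

Plugging in the reported values gives an implied F1 of about 0.981.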