Single-molecule and single-particle tracking experiments are typically unable to resolve fine details of thermal motion at the short timescales on which trajectories are continuous. We show that, when a diffusive trajectory [Formula: see text] is sampled at finite time intervals δt, the resulting error in measuring the first passage time to a given domain can exceed the time resolution of the measurement by more than an order of magnitude. These surprisingly large errors arise because the trajectory may enter and exit the domain while unobserved, thereby lengthening the apparent first passage time by an amount that is larger than δt. Such systematic errors are particularly important in single-molecule studies of barrier-crossing dynamics. We show that the correct first passage times, as well as other properties of the trajectories such as splitting probabilities, can be recovered via a stochastic algorithm that reintroduces unobserved first passage events probabilistically.
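The sampling bias and the idea of reinserting unobserved crossings can be illustrated with a minimal Python sketch. It assumes free one-dimensional diffusion with a constant diffusion coefficient D and a target domain x ≥ a (the abstract does not specify the domain). It also uses the standard Brownian-bridge result that a path pinned at x0 and x1 (both below a) over a time δt reaches a with probability exp(-(a - x0)(a - x1)/(D δt)). This is an illustrative stand-in, not the authors' published algorithm, and all function names are hypothetical.

```python
import numpy as np


def simulate_bm(D, dt_fine, n_steps, x0=0.0, rng=None):
    """Simulate a 1D Brownian path with diffusion coefficient D on a fine time grid."""
    rng = np.random.default_rng(rng)
    steps = rng.normal(0.0, np.sqrt(2.0 * D * dt_fine), size=n_steps)
    return np.concatenate([[x0], x0 + np.cumsum(steps)])


def naive_fpt(x, dt, a):
    """Time of the first *sample* at or beyond the level a (np.inf if none)."""
    hits = np.nonzero(x >= a)[0]
    return hits[0] * dt if hits.size else np.inf


def bridge_crossing_prob(x0, x1, a, D, dt):
    """Probability that a Brownian bridge from x0 to x1 (both < a) over a
    time dt reaches a: exp(-(a - x0) * (a - x1) / (D * dt))."""
    return np.exp(-(a - x0) * (a - x1) / (D * dt))


def bridge_corrected_fpt(x, dt, a, D, rng=None):
    """First passage time with unobserved crossings reinserted at random.

    A crossing inside interval i is reported at (i + 1) * dt, the same
    convention as naive_fpt, so the reported time resolves the crossing
    interval and differs from the true crossing time by at most dt.
    """
    rng = np.random.default_rng(rng)
    if x[0] >= a:
        return 0.0
    for i in range(len(x) - 1):
        if x[i + 1] >= a:
            return (i + 1) * dt  # crossing visible in the samples
        if rng.random() < bridge_crossing_prob(x[i], x[i + 1], a, D, dt):
            return (i + 1) * dt  # unobserved excursion, reinserted
    return np.inf
```

Because the bridges in successive intervals are independent given the samples, scanning the intervals in order with one uniform draw per interval samples the first crossing interval conditioned on the observed data. The mean first passage time of free 1D diffusion to a single level diverges, so averaging the bias over many paths requires a confined domain or a time cutoff.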
Top-down machine learning approach for high-throughput single-molecule analysis
Single-molecule approaches provide enormous insight into the dynamics of biomolecules, but adequately characterizing distributions of states and events often requires extensive sampling. Although emerging experimental techniques can generate such large datasets, existing analysis tools are not suited to processing the large volume of data obtained in high-throughput paradigms. Here, we present a new analysis platform (DISC) that accelerates unsupervised analysis of single-molecule trajectories. By merging model-free statistical learning with the Viterbi algorithm, DISC idealizes single-molecule trajectories up to three orders of magnitude faster, and with improved accuracy, compared to other commonly used algorithms. Further, we demonstrate the utility of the DISC algorithm for probing cooperativity between multiple binding events in the cyclic nucleotide-binding domains of HCN pacemaker channels. Given the flexible and efficient nature of DISC, we anticipate that it will be a powerful tool for unsupervised processing of high-throughput data across a range of single-molecule experiments.
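DISC's own implementation is not reproduced here. As a hedged illustration of the Viterbi step it builds on, the sketch below idealizes a noisy trace given state levels that an upstream, model-free clustering step is assumed to have already produced. The Gaussian emission model, the shared noise level `sigma`, and the sticky transition probability `p_stay` are assumptions for illustration.

```python
import numpy as np


def viterbi_idealize(trace, levels, sigma, p_stay=0.99):
    """Most likely state path for a trace under a Gaussian-emission HMM.

    levels : mean signal of each state (assumed known, e.g. from clustering)
    sigma  : noise standard deviation shared by all states
    p_stay : per-frame probability of staying in the current state (assumed)
    Returns (state index per frame, idealized trace).
    """
    trace = np.asarray(trace, dtype=float)
    levels = np.asarray(levels, dtype=float)
    n_states, T = levels.size, trace.size
    if n_states == 1:
        return np.zeros(T, dtype=int), np.full(T, levels[0])
    # Log transition matrix: p_stay on the diagonal, the rest split evenly.
    log_A = np.full((n_states, n_states), np.log((1.0 - p_stay) / (n_states - 1)))
    np.fill_diagonal(log_A, np.log(p_stay))
    # Gaussian log emissions; the normalizing constant shared by all states is dropped.
    log_B = -0.5 * ((trace[:, None] - levels[None, :]) / sigma) ** 2
    delta = np.empty((T, n_states))           # best log score ending in each state
    psi = np.zeros((T, n_states), dtype=int)  # back-pointers
    delta[0] = log_B[0] - np.log(n_states)    # uniform initial distribution
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # scores[i, j]: move from i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    # Backtrack from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = psi[t, path[t]]
    return path, levels[path]
```

A sticky `p_stay` penalizes single-frame state flips, which is what lets the Viterbi path suppress isolated noise spikes that a per-frame threshold would misclassify.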
- Award ID(s): 1856518
- PAR ID: 10545470
- Publisher / Repository: eLife
- Journal Name: eLife
- Volume: 9
- ISSN: 2050-084X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More like this
The glomerulus is a multicellular functional tissue unit (FTU) of the nephron that is responsible for blood filtration. Each glomerulus contains multiple substructures and cell types that are crucial for its function. Understanding normal aging and disease in kidneys requires methods for high spatial resolution molecular imaging within these FTUs across whole slide images. Here we demonstrate a workflow using microscopy-driven selected sampling to enable matrix-assisted laser desorption/ionization imaging mass spectrometry (MALDI IMS) at 5 μm pixel size of all glomeruli within whole slide human kidney tissues. Such high spatial resolution imaging entails large numbers of pixels, increasing data acquisition times. Automating FTU-specific tissue sampling enables high-resolution analysis of critical tissue structures while maintaining throughput. Glomeruli were automatically segmented using coregistered autofluorescence microscopy data, and these segmentations were translated into MALDI IMS measurement regions. This allowed high-throughput acquisition of 268 glomeruli from a single whole slide human kidney tissue section. Unsupervised machine learning methods were used to discover molecular profiles of glomerular subregions and to differentiate between healthy and diseased glomeruli. Average spectra for each glomerulus were analyzed using Uniform Manifold Approximation and Projection (UMAP) and k-means clustering, yielding 7 distinct groups that differentiated healthy and diseased glomeruli. Pixel-wise k-means clustering was applied to all glomeruli, revealing unique molecular profiles localized to subregions within each glomerulus. Automated microscopy-driven, FTU-targeted acquisition for high spatial resolution molecular imaging maintains high throughput and enables rapid assessment of whole slide images at cellular resolution and identification of tissue features associated with normal aging and disease.
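The per-glomerulus clustering step can be sketched in Python with NumPy alone. The sketch assumes pixel spectra binned onto a common m/z axis and a per-pixel glomerulus label taken from the segmentation. Total-ion-count normalization and farthest-first initialization are common choices assumed here rather than details given in the abstract. The UMAP embedding is omitted, so k-means runs directly on the average spectra, and all function names are hypothetical.

```python
import numpy as np


def tic_normalize(spectra):
    """Scale each spectrum (row) to unit total ion count."""
    return spectra / spectra.sum(axis=1, keepdims=True)


def average_spectra(pixel_spectra, glom_ids):
    """Mean spectrum per glomerulus; glom_ids gives each pixel's glomerulus label."""
    ids = np.unique(glom_ids)
    means = np.stack([pixel_spectra[glom_ids == g].mean(axis=0) for g in ids])
    return ids, means


def kmeans(X, k, n_iter=100, rng=None):
    """Lloyd's k-means with farthest-first initialization; returns (labels, centroids)."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[d2.argmax()])
    centroids = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep an empty cluster's centroid where it was.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

Calling `kmeans` on the normalized pixel spectra of all glomeruli, rather than on their averages, gives the pixel-wise variant.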
In vitro aptamer isolation methods can yield hundreds of potential candidates, but selecting the optimal aptamer for a given application is challenging and laborious. Existing aptamer characterization methods either entail low-throughput analysis with sophisticated instrumentation or offer higher throughput at the cost of an increased risk of false-positive or false-negative results. Here, we describe a novel method for accurately and sensitively evaluating the binding between DNA aptamers and small-molecule ligands in a high-throughput format without any aptamer engineering or labeling requirements. This approach is based on our new finding that ligand binding inhibits aptamer digestion by T5 exonuclease, and that the extent of this inhibition correlates closely with the strength of aptamer-ligand binding. Our assay enables accurate and efficient screening of the ligand-binding profiles of individual aptamers, as well as identification of the best target binders from a batch of aptamer candidates, independent of the ligands in question or the aptamer sequence and structure. We demonstrate the general applicability of this assay with a total of 106 aptamer-ligand pairs and validate these results with a gold-standard method. We expect that our assay can be readily expanded to characterize small-molecule-binding aptamers in an automated, high-throughput fashion.
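The abstract does not state how inhibition is quantified. Purely as a hypothetical sketch, suppose each aptamer's intact fraction is measured at several time points with and without ligand, and that digestion follows first-order kinetics, intact(t) ≈ exp(-k t). Inhibition can then be scored as 1 - k_ligand / k_free and the candidates ranked by that score. The kinetic model, the score, and every name below are assumptions.

```python
import numpy as np


def first_order_rate(times, intact_fraction):
    """Least-squares k for intact_fraction ≈ exp(-k * t), fit through the origin in log space."""
    t = np.asarray(times, dtype=float)
    y = np.log(np.asarray(intact_fraction, dtype=float))
    # Minimizing sum((y + k t)^2) gives k = -sum(t * y) / sum(t^2).
    return -np.dot(t, y) / np.dot(t, t)


def inhibition(times, frac_free, frac_ligand):
    """Fractional slowdown of digestion caused by ligand: 1 - k_ligand / k_free."""
    return 1.0 - first_order_rate(times, frac_ligand) / first_order_rate(times, frac_free)


def rank_aptamers(times, data):
    """data maps a (hypothetical) aptamer name -> (frac_free, frac_ligand).

    Returns the names sorted by inhibition, strongest first, plus the scores.
    """
    scores = {name: inhibition(times, free, lig) for name, (free, lig) in data.items()}
    return sorted(scores, key=scores.get, reverse=True), scores
```

Under this toy model, a score near 1 means the ligand almost completely protects the aptamer from digestion, and a score near 0 means no measurable protection.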
Generating molecules with desired chemical and biological properties, such as high drug-likeness and high binding affinity to target proteins, is critical for drug discovery. In this paper, we propose a probabilistic generative model to capture the joint distribution of molecules and their properties. Our model assumes an energy-based model (EBM) in the latent space. Conditional on the latent vector, the molecule and its properties are modeled by a molecule generation model and a property regression model, respectively. To search for molecules with desired properties, we propose a sampling with gradual distribution shifting (SGDS) algorithm: after the model is first learned on training data of existing molecules and their properties, the algorithm gradually shifts the model distribution towards the region supported by molecules with the desired property values. Our experiments show that our method achieves very strong performance on various molecule design tasks.
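To make the sampling idea concrete, here is a toy Python sketch of conditional sampling in a latent space with unadjusted Langevin dynamics. The conditioning target is raised step by step, and the chains are warm-started from the previous stage. The quartic latent "energy", the linear-Gaussian property model y ~ N(w·z, sigma_y²), and all parameter values are assumptions. The sketch does not decode molecules and does not update model parameters, so it illustrates only the sampling side of a gradual shift and is not the paper's SGDS algorithm.

```python
import numpy as np


def grad_log_prior(z, alpha=0.1):
    """Toy latent EBM prior p(z) ∝ exp(-alpha * sum(z**4) / 4) * N(z; 0, I)."""
    return -z - alpha * z**3


def grad_log_likelihood(z, w, y_target, sigma_y):
    """Toy linear-Gaussian property model: y ~ N(w . z, sigma_y**2)."""
    return np.outer(y_target - z @ w, w) / sigma_y**2


def langevin(z, w, y_target, sigma_y=0.5, step=0.1, n_steps=300, rng=None):
    """Unadjusted Langevin dynamics targeting p(z | y = y_target); rows are chains."""
    rng = np.random.default_rng(rng)
    for _ in range(n_steps):
        grad = grad_log_prior(z) + grad_log_likelihood(z, w, y_target, sigma_y)
        z = z + 0.5 * step**2 * grad + step * rng.standard_normal(z.shape)
    return z


def gradual_shift(w, y_schedule, n_chains=256, rng=None):
    """Warm-start the chains at each successively higher property target."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal((n_chains, w.size))
    means = []
    for y_t in y_schedule:
        z = langevin(z, w, y_t, rng=rng)
        means.append(float((z @ w).mean()))  # mean predicted property of the chains
    return z, means
```

Raising the target in small increments keeps each stage's starting chains close to the next stage's distribution, which is the intuition behind shifting gradually rather than conditioning on an extreme value in one jump.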
Discovery of target‐binding molecules, such as aptamers and peptides, is usually performed with high‐throughput experimental screening methods. These methods typically generate large datasets of sequences of target‐binding molecules, which can be enriched with high affinity binders. However, identifying the highest affinity binders in these large datasets often requires additional low‐throughput experiments or other approaches. Bioinformatics‐based analyses could help to better understand these large datasets and identify the parts of the sequence space enriched with high affinity binders. BinderSpace is an open‐source Python package that performs motif analysis, sequence space visualization, clustering analyses, and sequence extraction from clusters of interest. The motif analysis, which produces text‐based and visual output of motifs, can also provide heat maps of previously measured, user‐defined functional properties for all motif‐containing molecules. Users can also run principal component analysis (PCA) and t‐distributed stochastic neighbor embedding (t‐SNE) analyses on whole datasets and on motif‐related subsets of the data. Functionally important sequences can also be highlighted in the resulting PCA and t‐SNE maps. If points (sequences) in the two‐dimensional PCA or t‐SNE maps form clusters, users can perform clustering analyses on their data and extract sequences from clusters of interest. We demonstrate the use of BinderSpace on a dataset of oligonucleotides binding to single‐wall carbon nanotubes in the presence and absence of a bioanalyte, and on a dataset of cyclic peptidomimetics binding to bovine carbonic anhydrase protein. BinderSpace is openly accessible to the public via GitHub: https://github.com/vukoviclab/BinderSpace.
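BinderSpace's own API is not shown here. The following independent NumPy sketch illustrates two of the analyses described: extracting motif-containing sequences, and projecting one-hot-encoded sequences onto their principal components. It assumes equal-length DNA sequences over the alphabet ACGT. t-SNE and clustering are omitted, and all names are hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seqs, alphabet=ALPHABET):
    """Flattened one-hot encoding of equal-length sequences -> (n_seqs, length * |alphabet|)."""
    index = {c: i for i, c in enumerate(alphabet)}
    length = len(seqs[0])
    X = np.zeros((len(seqs), length * len(alphabet)))
    for n, s in enumerate(seqs):
        if len(s) != length:
            raise ValueError("all sequences must have the same length")
        for pos, c in enumerate(s):
            X[n, pos * len(alphabet) + index[c]] = 1.0
    return X


def pca(X, n_components=2):
    """Project rows of X onto the top principal components via SVD.

    Returns (coordinates, variance along every component, in descending order).
    """
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T, S**2 / (len(X) - 1)


def motif_subset(seqs, motif):
    """Indices of sequences that contain the motif as a contiguous substring."""
    return [i for i, s in enumerate(seqs) if motif in s]
```

Running `pca` on `one_hot(...)` of only the rows returned by `motif_subset` gives the motif-restricted map.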

