

Title: Automatic classification and segmentation of single-molecule fluorescence time traces with deep learning
Abstract

Traces from single-molecule fluorescence microscopy (SMFM) experiments exhibit photophysical artifacts that typically necessitate human expert screening, which is time-consuming and introduces the potential for user-dependent expectation bias. Here, we use deep learning to develop a rapid, automatic SMFM trace selector, termed AutoSiM, that improves the sensitivity and specificity of an assay for a DNA point mutation based on single-molecule recognition through equilibrium Poisson sampling (SiMREPS). The improved performance of AutoSiM is based on accepting both more true positives and fewer false positives than the conventional approach of hidden Markov modeling (HMM) followed by hard thresholding. As a second application, the selector is used for automated screening of single-molecule Förster resonance energy transfer (smFRET) data to identify high-quality traces for further analysis, and achieves ~90% concordance with manual selection while requiring less processing time. Finally, we show that AutoSiM can be adapted readily to novel datasets, requiring only modest transfer learning.

 
NSF-PAR ID:
10202067
Publisher / Repository:
Nature Publishing Group
Journal Name:
Nature Communications
Volume:
11
Issue:
1
ISSN:
2041-1723
Sponsoring Org:
National Science Foundation
More Like this
  1. End-to-end flow correlation attacks are among the oldest known attacks on low-latency anonymity networks, and are treated as a core primitive for traffic analysis of Tor. However, despite recent work showing that individual flows can be correlated with high accuracy, the impact of even these state-of-the-art attacks is questionable due to a central drawback: their pairwise nature, requiring comparison between N² pairs of flows to deanonymize N users. This results in a combinatorial explosion in computational requirements and an asymptotically declining base rate, leading to either high numbers of false positives or vanishingly small rates of successful correlation. In this paper, we introduce a novel flow correlation attack, DeepCoFFEA, that combines two ideas to overcome these drawbacks. First, DeepCoFFEA uses deep learning to train a pair of feature embedding networks that respectively map Tor and exit flows into a single low-dimensional space where correlated flows are similar; pairs of embedded flows can be compared at lower cost than pairs of full traces. Second, DeepCoFFEA uses amplification, dividing flows into short windows and using voting across these windows to significantly reduce false positives; the same embedding networks can be used with an increasing number of windows to independently lower the false positive rate. We conduct a comprehensive experimental analysis showing that DeepCoFFEA significantly outperforms state-of-the-art flow correlation attacks on Tor, e.g., 93% true positive rate versus at most 13% when tuned for high precision, with two orders of magnitude speedup over prior work. We also consider the effects of several potential countermeasures on DeepCoFFEA, finding that existing lightweight defenses are not sufficient to secure anonymity networks from this threat.
  2. Summary

    Variable selection plays an important role in high dimensional statistical modelling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, accuracy of estimation and computational cost are two top concerns. Recently, Candes and Tao have proposed the Dantzig selector using L1-regularization and showed that it achieves the ideal risk up to a logarithmic factor log(p). Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh as the factor log(p) can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, correlation learning is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, iterative sure independence screening is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be accomplished by a well-developed method such as smoothly clipped absolute deviation, the Dantzig selector, lasso or adaptive lasso. The connections between these penalized least squares methods are also elucidated.

     
  3. Abstract

    Staphylococcus aureus is a major foodborne bacterial pathogen. Early detection of S. aureus is crucial to prevent infections and ensure food quality. The iron‐regulated surface determinant protein A (IsdA) of S. aureus is a unique surface protein necessary for sourcing vital iron from host cells for the survival and colonization of the bacteria. The function, structure, and location of the IsdA protein make it an important protein for biosensing applications relating to the pathogen. Here, we report an in‐silico approach to develop and validate high‐affinity binding aptamers for IsdA protein detection using custom‐designed in‐silico tools and single‐molecule Fluorescence Resonance Energy Transfer (smFRET) measurements. We utilized in‐silico oligonucleotide screening methods and metadynamics‐based methods to generate 10 aptamer candidates and characterized them based on the Dissociation Free Energy (DFE) of the IsdA‐aptamer complexes. Three of the aptamer candidates were shortlisted for smFRET experimental analysis of binding properties. Limits of detection in the low picomolar range were observed for the aptamers, and the results correlated well with the DFE calculations, indicating the potential of the in‐silico approach to support aptamer discovery.

    This study showcases a computational SELEX method in combination with single‐molecule binding studies deciphering effective aptamers against the S. aureus IsdA protein. The established approach demonstrates the ability to expedite aptamer discovery, with the potential to cut costs and predict binding efficacy. The application can be extended to designing aptamers for various protein targets, enhancing molecular recognition, and facilitating the development of high‐affinity aptamers for multiple uses.

     
  4. Abstract Background

    Continuous enzyme kinetic assays are often used in high-throughput applications, as they allow rapid acquisition of large amounts of kinetic data and increased confidence compared to discontinuous assays. However, data analysis is often rate-limiting in high-throughput enzyme assays, as manual inspection and selection of a linear range from individual kinetic traces is cumbersome and prone to user error and bias. Currently available software programs are specialized and designed for the analysis of complex enzymatic models. Despite the widespread use of initial rate determination for processing kinetic data sets, no simple and automated program existed for rapid analysis of initial rates from continuous enzyme kinetic traces.

    Results

    An Interactive Continuous Enzyme Kinetics Analysis Tool (ICEKAT) was developed for semi-automated calculation of initial rates from continuous enzyme kinetic traces with particular application to the evaluation of Michaelis-Menten and EC50/IC50 kinetic parameters, as well as the results of high-throughput screening assays. ICEKAT allows users to interactively fit kinetic traces using convenient browser-based selection tools, ameliorating tedious steps involved in defining ranges to fit in general purpose programs like Microsoft Excel and GraphPad Prism, while still maintaining simplicity in determining initial rates. As a test case, we quickly analyzed over 500 continuous enzyme kinetic traces resulting from experimental data on the response of the protein lysine deacetylase SIRT1 to small-molecule activators.

    Conclusions

    ICEKAT allows simultaneous visualization of individual initial rate fits and the resulting Michaelis-Menten or EC50/IC50 kinetic model fits, as well as hits from high-throughput screening assays. In addition to serving as a convenient program for practicing enzymologists, ICEKAT is also a useful teaching aid to visually demonstrate in real time how incorrect initial rate fits can affect calculated Michaelis-Menten or EC50/IC50 kinetic parameters. For the convenience of the research community, we have made ICEKAT freely available online at https://icekat.herokuapp.com/icekat.
  5. Abstract Background

    Genome-wide association studies (GWAS) seek to identify single nucleotide polymorphisms (SNPs) that cause observed phenotypes. However, with highly correlated SNPs, correlated observations, and the number of SNPs being two orders of magnitude larger than the number of observations, GWAS procedures often suffer from high false positive rates.

    Results

    We propose BGWAS, a novel Bayesian variable selection method specifically tailored for genome-wide association studies and based on a novel nonlocal prior for linear mixed models (LMMs). BGWAS has two steps: screening and model selection. The screening step scans through all the SNPs, fitting one LMM for each SNP, and then uses Bayesian false discovery control to select a set of candidate SNPs. After that, a model selection step searches through the space of LMMs that may have any number of SNPs from the candidate set. A simulation study shows that, when compared to popular GWAS procedures, BGWAS greatly reduces false positives while maintaining the same ability to detect true positive SNPs. We show the utility and flexibility of BGWAS with two case studies: a case study on salt stress in plants, and a case study on alcohol use disorder.

    Conclusions

    BGWAS maintains and in some cases increases the recall of true SNPs while drastically lowering the number of false positives compared to popular SMA procedures.

     