skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Ensemble Detection of DNA Engineering Signatures
Synthetic biology is creating genetically engineered organisms at an increasing rate for many potentially valuable applications, but this potential comes with the risk of misuse or accidental release. To begin to address this issue, we have developed a system called GUARDIAN that can automatically detect signatures of engineering in DNA sequencing data, and we have conducted a blinded test of this system using a curated Test and Evaluation (T&E) data set. GUARDIAN uses an ensemble approach based on the guiding principle that no single approach is likely to be able to detect engineering with perfect accuracy. Critically, ensembling enables GUARDIAN to detect sequence inserts in 13 target organisms with a high degree of specificity that requires no subject matter expert (SME) review.  more » « less
Award ID(s):
1935355
PAR ID:
10655418
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  more » ;  ;  ;   « less
Publisher / Repository:
ACS Publications
Date Published:
Journal Name:
ACS Synthetic Biology
Volume:
13
Issue:
4
ISSN:
2161-5063
Page Range / eLocation ID:
1105 to 1115
Subject(s) / Keyword(s):
artificial intelligence bioinformatics biosecurity engineering detection machine learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Traditional low cost scan based structural tests no longer suffice for delivering acceptable defect levels in many processor SOCs, especially those targeting low power applications. Expensive functional system level tests (SLTs) have become an additional and necessary final test screen. Efforts to eliminate or minimize the use of SLTs have focused on new fault models and improved test generation methods to improve the effectiveness of scan tests. In this paper we argue that given the limitations of scan timing tests, such an approach may not be sufficient to detect all the low voltage failures caused by circuit timing variability that appear to dominate SLT fallout. Instead, we propose an alternate approach for meaningful cost savings that adaptively avoids SLT tests for a subset of the manufactured parts. This is achieved by using parametric and scan tests results from earlier in the test flow to identify low delay variability parts that can avoid SLT with minimal impact on DPPM. Extensive SPICE simulations support the viability of our proposed approach. We also show that such an adaptive test flow is also very well suited to real time optimization during the using machine-learning techniques. 
    more » « less
  2. With the advances in deep learning, speaker verification has achieved very high accuracy and is gaining popularity as a type of biometric authentication option in many scenes of our daily life, especially the growing market of web services. Compared to traditional passwords, “vocal passwords” are much more convenient as they relieve people from memorizing different passwords. However, new machine learning attacks are putting these voice authentication systems at risk. Without a strong security guarantee, attackers could access legitimate users’ web accounts by fooling the deep neural network (DNN) based voice recognition models. In this article, we demonstrate an easy-to-implement data poisoning attack to the voice authentication system, which cannot be captured effectively by existing defense mechanisms. Thus, we also propose a more robust defense method called Guardian, a convolutional neural network-based discriminator. The Guardian discriminator integrates a series of novel techniques including bias reduction, input augmentation, and ensemble learning. Our approach is able to distinguish about 95% of attacked accounts from normal accounts, which is much more effective than existing approaches with only 60% accuracy. 
    more » « less
  3. The ability to detect sparse signals from noisy, high-dimensional data is a top priority in modern science and engineering. It is well known that a sparse solution of the linear system A ρ = b 0 can be found efficiently with an ℓ 1 -norm minimization approach if the data are noiseless. However, detection of the signal from data corrupted by noise is still a challenging problem as the solution depends, in general, on a regularization parameter with optimal value that is not easy to choose. We propose an efficient approach that does not require any parameter estimation. We introduce a no-phantom weight τ and the Noise Collector matrix C and solve an augmented system A ρ + C η = b 0 + e , where e is the noise. We show that the ℓ 1 -norm minimal solution of this system has zero false discovery rate for any level of noise, with probability that tends to one as the dimension of b 0 increases to infinity. We obtain exact support recovery if the noise is not too large and develop a fast Noise Collector algorithm, which makes the computational cost of solving the augmented system comparable with that of the original one. We demonstrate the effectiveness of the method in applications to passive array imaging. 
    more » « less
  4. Dynamical systems that evolve continuously over time are ubiquitous throughout science and engineering. Machine learning (ML) provides data-driven approaches to model and predict the dynamics of such systems. A core issue with this approach is that ML models are typically trained on discrete data, using ML methodologies that are not aware of underlying continuity properties. This results in models that often do not capture any underlying continuous dynamics—either of the system of interest, or indeed of any related system. To address this challenge, we develop a convergence test based on numerical analysis theory. Our test verifies whether a model has learned a function that accurately approximates an underlying continuous dynamics. Models that fail this test fail to capture relevant dynamics, rendering them of limited utility for many scientific prediction tasks; while models that pass this test enable both better interpolation and better extrapolation in multiple ways. Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications. 
    more » « less
  5. Abstract Hypothesis testing is one of the most common types of data analysis and forms the backbone of scientific research in many disciplines. Analysis of variance (ANOVA) in particular is used to detect dependence between a categorical and a numerical variable. Here we show how one can carry out this hypothesis test under the restrictions of differential privacy. We show that the F -statistic, the optimal test statistic in the public setting, is no longer optimal in the private setting, and we develop a new test statistic F 1 with much higher statistical power. We show how to rigorously compute a reference distribution for the F 1 statistic and give an algorithm that outputs accurate p -values. We implement our test and experimentally optimize several parameters. We then compare our test to the only previous work on private ANOVA testing, using the same effect size as that work. We see an order of magnitude improvement, with our test requiring only 7% as much data to detect the effect. 
    more » « less