Title: Automatically Detecting Numerical Instability in Machine Learning Applications via Soft Assertions
Machine learning (ML) applications have become an integral part of our lives. ML applications extensively use floating-point computation and involve very large/small numbers; thus, maintaining the numerical stability of such complex computations remains an important challenge. Numerical bugs can lead to system crashes, incorrect output, and wasted computing resources. In this paper, we introduce a novel idea, namely soft assertions (SA), to encode safety/error conditions for the places where numerical instability can occur. A soft assertion is an ML model automatically trained using the dataset obtained during unit testing of unstable functions. Given the values at the unstable function in an ML application, a soft assertion reports how to change these values in order to trigger the instability. We then use the output of soft assertions as signals to effectively mutate inputs to trigger numerical instability in ML applications. In the evaluation, we used the GRIST benchmark, a total of 79 programs, as well as 15 real-world ML applications from GitHub. We compared our tool with 5 state-of-the-art (SOTA) fuzzers. We found all the GRIST bugs and outperformed the baselines. We found 13 numerical bugs in real-world code, one of which had already been confirmed by the GitHub developers. While the baselines mostly found the bugs that report NaN and INF, our tool found numerical bugs with incorrect output. We showed one case where the Tumor Detection Model, trained on Brain MRI images, should have predicted "tumor", but instead, it incorrectly predicted "no tumor" due to the numerical bugs. Our replication package is located at https://figshare.com/s/6528d21ccd28bea94c32.
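The idea of using a directional signal to push inputs toward an unstable region can be illustrated with a small sketch. This is not the authors' tool: the paper's soft assertion is a trained ML model, whereas `toy_soft_assertion` below is a hand-written rule used as a stand-in. The function names, the mutation rule, and the choice of a naive softmax as the unstable function are all assumptions made for illustration.

```python
import math

def naive_softmax(xs):
    """Softmax without max-subtraction; numerically unstable for extreme inputs."""
    exps = [math.exp(x) for x in xs]   # math.exp raises OverflowError for large x
    total = sum(exps)                  # becomes 0.0 if every exp underflows
    return [e / total for e in exps]   # 0.0 / 0.0 raises ZeroDivisionError

def is_unstable(xs):
    """Return True if naive_softmax fails or yields NaN/Inf for these inputs."""
    try:
        out = naive_softmax(xs)
    except (OverflowError, ZeroDivisionError):
        return True
    return any(math.isnan(v) or math.isinf(v) for v in out)

def toy_soft_assertion(xs):
    """Hypothetical stand-in for a learned soft assertion.

    Returns +1 to suggest increasing the values (toward overflow) or
    -1 to suggest decreasing them (toward underflow of every term).
    """
    return 1 if max(xs) >= 0 else -1

def fuzz(xs, max_steps=50):
    """Mutate inputs in the suggested direction until instability is triggered."""
    xs = list(xs)
    for step in range(max_steps):
        if is_unstable(xs):
            return xs, step
        direction = toy_soft_assertion(xs)
        # Assumed mutation rule: roughly double the magnitude each step.
        xs = [x * 2 + direction for x in xs]
    return None, max_steps

if __name__ == "__main__":
    print(fuzz([1.0, 2.0]))    # inputs grown until exp overflows
    print(fuzz([-1.0, -2.0]))  # inputs shrunk until every exp underflows to 0
```

A random mutator would have to stumble on inputs of sufficient magnitude by chance. The point of the signal is that each mutation moves in a direction expected to bring the values closer to the failing region.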
Award ID(s):
2313054
PAR ID:
10629703
Publisher / Repository:
Proceedings of the ACM on Software Engineering
Journal Name:
Proceedings of the ACM on Software Engineering
Volume:
2
Issue:
FSE
ISSN:
2994-970X
Page Range / eLocation ID:
2806 to 2827
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Optimizing compilers, such as LLVM, generate debug information in machine code to aid debugging. This information is particularly important when debugging optimized code, as modern software is often compiled with optimization enabled. However, properly updating debug information to reflect code transformations during optimization is a complex task that often relies on manual effort. This complexity makes the process prone to errors, which can lead to incorrect or lost debug information. Finding and fixing potential debug information update errors is vital to maintaining the accuracy and reliability of the overall debugging process. To our knowledge, no existing techniques can rectify debug information update errors in LLVM. While black-box testing approaches can find such bugs, they can neither pinpoint the root causes nor suggest fixes. To fill the gap, we propose the first technique to robustify debug information updates in LLVM. In particular, our robustification approach can find and fix incorrect debug location updates. Central to our approach is the observation that the debug locations in the original and optimized programs must satisfy a conformance relation. The relation ensures that LLVM optimizations do not introduce extraneous debug location information on the control-flow paths of the optimized programs. We introduce control-flow conformance analysis, a novel approach that determines the reference updates ensuring the conformance relation by observing the execution of LLVM optimization passes and analyzing the debug locations in the control-flow graphs of programs under optimization. The determined reference updates are then used to check developer-written updates in LLVM. When discrepancies arise, the reference updates serve as the update skeletons to guide the fixing. We realized our approach as a tool named MetaLoc, which determines proper debug location updates for LLVM optimizations.
     More importantly, with MetaLoc, we have reported and patched 46 previously unknown update errors in LLVM. All the patches, along with 22 new regression tests, have been merged into the LLVM codebase, effectively improving the accuracy and reliability of debug information in all programs optimized by LLVM. Furthermore, our approach uncovered and led to corrections in two issues within LLVM’s official documentation on debug information updates.
  2. Deep Learning (DL) is a class of machine learning algorithms that are used in a wide variety of applications. Like any software system, DL programs can have bugs. To support bug localization in DL programs, several tools have been proposed in the past. As most bugs caused by improper model structure, known as structural bugs, lead to inadequate performance during training, it is challenging for developers to identify the root cause and address these bugs. To support bug detection and localization in DL programs, in this article, we propose Theia, which detects and localizes structural bugs in DL programs. Unlike the previous works, Theia considers the training dataset characteristics to automatically detect bugs in DL programs developed using two DL libraries, Keras and PyTorch. Since training the DL models is a time-consuming process, Theia detects these bugs at the beginning of the training process and alerts the developer with informative messages containing the bug’s location and actionable fixes, which help them improve the structure of the model. We evaluated Theia on a benchmark of 40 real-world buggy DL programs obtained from Stack Overflow. Our results show that Theia successfully localizes 57/75 structural bugs in 40 buggy programs, whereas NeuraLint, a state-of-the-art approach capable of localizing structural bugs before training, localizes 17/75 bugs.
  3. We present a novel symbolic reasoning engine for SQL which can efficiently generate an input I for n queries P1, ⋯, Pn, such that their outputs on I satisfy a given property (expressed in SMT). This is useful in different contexts, such as disproving equivalence of two SQL queries and disambiguating a set of queries. Our first idea is to reason about an under-approximation of each Pi, that is, a subset of Pi’s input-output behaviors. While it makes our approach both semantics-aware and lightweight, this idea alone is incomplete (as a fixed under-approximation might miss some behaviors of interest). Therefore, our second idea is to perform search over an expressive family of under-approximations (which collectively cover all program behaviors of interest), thereby making our approach complete. We have implemented these ideas in a tool, Polygon, and evaluated it on over 30,000 benchmarks across two tasks (namely, SQL equivalence refutation and query disambiguation). Our evaluation results show that Polygon significantly outperforms all prior techniques.
  4. Understanding variations in the routes by which wild animals gain and lose water is challenging, and common methods require longitudinal sampling, which can be prohibitive. However, a new approach uses Δ′17OBW (Δ′17O of animal body water), calculated from measurements of δ′17O and δ′18O in a single sample, as a natural tracer of water flux. Δ′17OBW is promising, but its relationship to organismal variables such as metabolic rate and water intake has not been validated. Here, we continuously measured oxygen influxes and effluxes of captive deer mice (Peromyscus maniculatus), and manipulated their water intake and metabolic rate. We used these oxygen flux data to predict Δ′17OBW for the mice and compared these model predictions with Δ′17OBW measured in blood plasma samples. As expected, Δ′17OBW positively correlated with drinking water intake and negatively correlated with metabolic rate. All predicted Δ′17OBW values (based on measured oxygen fluxes) differed from measured Δ′17OBW values by <30 per meg (mean absolute difference: 11 ± 9 per meg), suggesting high accuracy for this modelling approach because studies currently report a range of 300 per meg for Δ′17OBW among mammals, birds and fish.
  5. Claesen, Jan (Ed.)
    ABSTRACT The human skin microbiome is a diverse ecosystem that can help prevent infections by producing biomolecules and peptides that inhibit growth and virulence of bacterial pathogens. Staphylococcus aureus is a major human pathogen responsible for diseases that range from acute skin and soft tissue infections to life-threatening septicemia. Its ability to form biofilms is a key virulence factor contributing to its success as a pathogen as well as to its increased antimicrobial resistance. Here, we investigated the ability of bacterial skin commensals to produce molecules that inhibit S. aureus biofilm formation. Matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) identified 77 human skin microbiome bacterial isolates from Staphylococcus and Bacillus genera. Metabolites from cell-free concentrated media (CFCM) from 26 representative isolates were evaluated for their ability to inhibit biofilm formation by both methicillin-resistant (MRSA) and methicillin-sensitive (MSSA) S. aureus strains. CFCM, derived from most of the isolates, inhibited biofilm formation to varying extents but did not inhibit planktonic growth of S. aureus. Size fractionation of the CFCM of three S. epidermidis isolates indicated that they produce different bioactive molecules. Cluster analysis, based on either MALDI-TOF mass spectra or whole-genome sequencing draft genomes, did not show clear clusters associated with levels of biofilm inhibition among S. epidermidis strains. Finally, similar biosynthetic gene clusters were detected in all S. epidermidis strains analyzed. These findings indicate that several bacterial constituents of the human skin microbiome display antibiofilm in vitro activity, warranting further investigation on their potential as novel therapeutic agents.
    IMPORTANCE The skin is constantly exposed to the environment and consequently to numerous pathogens. The bacterial community that colonizes healthy skin is thought to play an important role in protecting us against infections. S. aureus is a leading cause of death worldwide and is frequently involved in several types of infections, including skin and soft tissue infections. Its ability to adhere to surfaces and produce biofilms is considered an important virulence factor. Here, we analyzed the activity of different species of bacteria isolated from healthy skin on S. aureus biofilm formation. We found that some species of Staphylococcus and Bacillus can reduce S. aureus biofilm formation, although a generally lower level of inhibitory activity was observed compared to S. epidermidis isolates. Among S. epidermidis isolates, strength of activity was dependent on the strain. Our data highlight the importance of mining the skin microbiome for isolates that could help combat skin pathogens.