Title: Do Machine Learning Models Learn Statistical Rules Inferred from Data?
Machine learning models can make critical errors that are easily hidden within vast amounts of data. Such errors often run counter to rules based on human intuition. However, rules based on human knowledge are challenging to scale or to even formalize. We thereby seek to infer statistical rules from the data and quantify the extent to which a model has learned them. We propose a framework SQRL that integrates logic-based methods with statistical inference to derive these rules from a model’s training data without supervision. We further show how to adapt models at test time to reduce rule violations and produce more coherent predictions. SQRL generates up to 300K rules over datasets from vision, tabular, and language settings. We uncover up to 158K violations of those rules by state-of-the-art models for classification, object detection, and data imputation. Test-time adaptation reduces these violations by up to 68.7% with relative performance improvement up to 32%. SQRL is available at https://github.com/DebugML/sqrl.
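To make the abstract's core idea concrete, here is a minimal, hypothetical sketch of quantifying rule violations: a rule is a predicate over an input and a model's prediction, and we measure how often each rule fails on a dataset. The `Rule` class, the `violation_rates` function, and the example rule are illustrative assumptions, not SQRL's actual API (see the linked repository for that).

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

@dataclass
class Rule:
    """A statistical rule: a predicate the model's predictions should satisfy."""
    name: str
    holds: Callable[[Any, Any], bool]  # (input, prediction) -> True if satisfied

def violation_rates(rules: Sequence[Rule], model, dataset) -> dict:
    """Fraction of examples on which the model's prediction violates each rule."""
    counts = {r.name: 0 for r in rules}
    for x in dataset:
        y = model(x)
        for r in rules:
            if not r.holds(x, y):
                counts[r.name] += 1
    n = max(len(dataset), 1)
    return {name: c / n for name, c in counts.items()}

# Example rule in the spirit of the paper's object-detection setting:
# a detected "rider" should co-occur with a bicycle or a motorcycle.
rider_rule = Rule(
    "rider-implies-two-wheeler",
    holds=lambda x, labels: "rider" not in labels
                            or bool({"bicycle", "motorcycle"} & set(labels)),
)
```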
Award ID(s):
1836936
PAR ID:
10512553
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
PMLR
Date Published:
Journal Name:
International Conference on Machine Learning
Format(s):
Medium: X
Location:
Honolulu, Hawaii, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Katsaros, Panagiotis; Nenzi, Laura (Eds.)
    We present eMOP, a tool for incremental runtime verification (RV) of test executions during software evolution. We previously used RV to find hundreds of bugs in open-source projects by monitoring passing tests against formal specifications of Java APIs. We also proposed evolution-aware techniques to reduce RV's runtime overhead and the human time needed to inspect specification violations. eMOP brings these benefits to developers in a tool that integrates seamlessly with the Maven build system. We describe eMOP's design, implementation, and usage. We evaluate eMOP on 676 versions of 21 projects, including those from our earlier prototypes' evaluation. Compared to running RV from scratch after each code change, eMOP is up to 8.4× faster and reports up to 31.3× fewer violations. eMOP also misses no new violations in our evaluation, and it is open-sourced at https://github.com/SoftEngResearch/emop.
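As a rough illustration of the evolution-aware idea described above (not eMOP's actual implementation, which is a Java-based Maven extension), the sketch below computes the classes affected by a change by transitively closing a reverse-dependency map; only those classes would need re-monitoring after an edit. The map and class names are hypothetical.

```python
def affected_classes(changed: set, deps: dict) -> set:
    """Transitively close `changed` over a reverse-dependency map.

    deps maps a class to the set of classes that depend on it.
    """
    work, seen = list(changed), set(changed)
    while work:
        c = work.pop()
        for dependent in deps.get(c, ()):
            if dependent not in seen:
                seen.add(dependent)
                work.append(dependent)
    return seen

deps = {"Parser": {"Compiler"}, "Compiler": {"Cli"}}
print(affected_classes({"Parser"}, deps))  # {'Parser', 'Compiler', 'Cli'}
```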
  2. Image classifiers have become an important component of today’s software, from consumer and business applications to safety-critical domains. The advent of Deep Neural Networks (DNNs) is the key catalyst behind such widespread success. However, wide adoption comes with serious concerns about the robustness of software systems that depend on image classification DNNs, as several severe erroneous behaviors have been reported under sensitive and critical circumstances. We argue that developers need to rigorously test their software’s image classifiers and delay deployment until robustness is acceptable. We present an approach to testing image classifier robustness based on class property violations. We have found that many of the reported erroneous cases in popular DNN image classifiers occur because the trained models confuse one class with another or show biases towards some classes over others. These bugs usually violate class properties of one or more of the affected classes. Most DNN testing techniques focus on per-image violations and thus fail to detect such class-level confusions or biases. We developed a testing approach to automatically detect class-based confusion and bias errors in DNN-driven image classification software. We evaluated our implementation, DeepInspect, on several popular image classifiers, with precision up to 100% (avg. 72.6%) for confusion errors and up to 84.3% (avg. 66.8%) for bias errors. DeepInspect found hundreds of classification mistakes in widely used models, many of which expose errors indicating confusion or bias.
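A hedged sketch of the class-level idea: rather than checking individual images, scan a confusion matrix for class pairs whose mutual misclassification rate is an outlier. The scoring and threshold below are illustrative assumptions, not DeepInspect's actual metric.

```python
import numpy as np

def confused_pairs(conf_mat: np.ndarray, num_std: float = 3.0):
    """Return (i, j) class pairs whose symmetric confusion is unusually high."""
    n = conf_mat.shape[0]
    row_sums = conf_mat.sum(axis=1, keepdims=True).clip(min=1)
    rates = conf_mat / row_sums              # P(predict j | true class i)
    sym = rates + rates.T                    # symmetric confusion score
    off_diag = sym[~np.eye(n, dtype=bool)]   # ignore correct predictions
    thresh = off_diag.mean() + num_std * off_diag.std()
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sym[i, j] > thresh]
```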
  3. Abstract

    Many statistical models currently used in ecology and evolution account for covariances among random errors. Here, I address five points: (i) correlated random errors unite many types of statistical models, including spatial, phylogenetic and time‐series models; (ii) random errors are neither unpredictable nor mistakes; (iii) diagnostics for correlated random errors are not useful, but simulations are; (iv) model predictions can be made with random errors; and (v) can random errors be causal?

    These five points are illustrated by applying statistical models to analyse simulated spatial, phylogenetic and time‐series data. These three simulation studies are paired with three types of predictions that can be made using information from covariances among random errors: predictions for goodness‐of‐fit, interpolation, and forecasting.

    In the simulation studies, models incorporating covariances among random errors improve inference about the relationship between dependent and independent variables. They also imply the existence of unmeasured variables that generate the covariances among random errors. Understanding the covariances among random errors gives information about possible processes underlying the data.

    Random errors are caused by something. Therefore, to extract full information from data, covariances among random errors should not just be included in statistical models; they should also be studied in their own right. Data are hard won, and appropriate statistical analyses can make the most of them.

     
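To illustrate the simulation-based approach the abstract advocates, the following sketch (an assumption-laden illustration, not the paper's code) generates regression data with AR(1)-correlated errors and compares ordinary least squares, which ignores the error covariances, with generalized least squares, which models them. The parameter values are arbitrary.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, rho, beta = 200, 0.8, 1.5

# AR(1) covariance among random errors: Cov(e_i, e_j) = rho**|i - j|
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
e = rng.multivariate_normal(np.zeros(n), Sigma)

x = rng.normal(size=n)
X = sm.add_constant(x)
y = beta * x + e

ols = sm.OLS(y, X).fit()
gls = sm.GLS(y, X, sigma=Sigma).fit()  # accounts for error covariances
print(ols.bse[1], gls.bse[1])          # GLS yields better-calibrated slope SEs
```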
  4. Part-of-speech (POS) tagging is the foundation of many natural language processing applications. Rule-based POS tagging is a well-known solution that assigns tags to words using a set of predefined rules. Many researchers favor statistical approaches over rule-based methods for their better empirical accuracy. Until now, however, the computational cost of rule-based POS tagging has made it difficult to study whether more complex rules or larger rule sets could yield accuracy competitive with statistical approaches. In this paper, we leverage two hardware accelerators, the Automata Processor (AP) and Field Programmable Gate Arrays (FPGA), to accelerate rule-based POS tagging by converting rules to regular expressions and exploiting the highly parallel regular-expression-matching ability of these accelerators. We study the relationship between rule set size and accuracy, and observe that adding more rules poses only minimal overhead on the AP and FPGA. This allows a substantial increase in the number and complexity of rules, leading to improved accuracy. Our experiments on the Treebank and Brown corpora achieve speedups of up to 2,600× on the AP and 1,914× on the FPGA over rule-based methods on the CPU in the rule-matching stage, up to 58× over the Perceptron POS tagger on the CPU in total testing time, and up to 253× over the LSTM tagger on the GPU in total testing time, while achieving accuracy competitive with neural-network and statistical solutions.
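The rule-to-regex transformation the paper exploits can be illustrated in a few lines. The three suffix rules below are toy examples, not the paper's rule set, and on the AP or FPGA all patterns would be matched in parallel rather than sequentially as here.

```python
import re

# Each rule is a (pattern, tag) pair; the first matching pattern wins.
RULES = [
    (re.compile(r".*ing$"), "VBG"),  # gerund / present participle
    (re.compile(r".*ed$"),  "VBD"),  # past tense
    (re.compile(r"^\d+$"),  "CD"),   # cardinal number
]

def tag(word: str, default: str = "NN") -> str:
    """Return the tag of the first matching rule, else a default noun tag."""
    for pattern, pos in RULES:
        if pattern.match(word):
            return pos
    return default

print([tag(w) for w in "running jumped 42 cat".split()])
# ['VBG', 'VBD', 'CD', 'NN']
```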
  5.
    Context is of fundamental importance to both human and machine vision; e.g., an object in the air is more likely to be an airplane than a pig. The rich notion of context incorporates several aspects, including physics rules, statistical co-occurrences, and relative object sizes, among others. While previous work has used crowd-sourced out-of-context photographs from the web to study scene context, controlling the nature and extent of contextual violations has been a daunting task. Here we introduce a diverse, synthetic Out-of-Context Dataset (OCD) with fine-grained control over scene context. By leveraging a 3D simulation engine, we systematically control gravity, object co-occurrences, and relative sizes across 36 object categories in a virtual household environment. Using OCD, we conducted a series of experiments to gain insight into the impact of contextual cues on both human and machine vision. We ran psychophysics experiments to establish a human benchmark for out-of-context recognition, then compared it with state-of-the-art computer vision models to quantify the gap between the two. We propose a context-aware recognition transformer model that fuses object and contextual information via multi-head attention. Our model captures useful information for contextual reasoning, enabling human-level performance and better robustness in out-of-context conditions than baseline models across OCD and other out-of-context datasets. All source code and data are publicly available at https://github.com/kreimanlab/WhenPigsFlyContext.
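The fusion mechanism this abstract describes can be sketched with standard multi-head attention: the target object's embedding attends over embeddings of surrounding scene elements. The dimensions, module layout, and names below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ContextFusion(nn.Module):
    """Fuse a target-object embedding with scene-context embeddings."""

    def __init__(self, dim: int = 256, heads: int = 8, num_classes: int = 36):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classify = nn.Linear(dim, num_classes)

    def forward(self, obj_feat: torch.Tensor, ctx_feats: torch.Tensor):
        # obj_feat:  (B, 1, dim) target-object embedding (the query)
        # ctx_feats: (B, N, dim) embeddings of scene elements (keys/values)
        fused, _ = self.attn(obj_feat, ctx_feats, ctx_feats)
        return self.classify(fused.squeeze(1))  # (B, num_classes) logits

logits = ContextFusion()(torch.randn(2, 1, 256), torch.randn(2, 12, 256))
```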