skip to main content

This content will become publicly available on July 1, 2024

Title: Significance Tests of Feature Relevance for a Black-Box Learner
An exciting recent development is the uptake of deep neural networks in many scientific fields, where the main objective is outcome prediction with a black-box nature. Significance testing is promising to address the black-box issue and explore novel scientific insights and interpretations of the decision-making process based on a deep learning model. However, testing for a neural network poses a challenge because of its black-box nature and unknown limiting distributions of parameter estimates while existing methods require strong assumptions or excessive computation. In this article, we derive one-split and two-split tests relaxing the assumptions and computational complexity of existing black-box tests and extending to examine the significance of a collection of features of interest in a dataset of possibly a complex type, such as an image. The one-split test estimates and evaluates a black-box model based on estimation and inference subsets through sample splitting and data perturbation. The two-split test further splits the inference subset into two but requires no perturbation. Also, we develop their combined versions by aggregating the p-values based on repeated sample splitting. By deflating the bias-sd-ratio, we establish asymptotic null distributions of the test statistics and the consistency in terms of Type II error. Numerically, we demonstrate the utility of the proposed tests on seven simulated examples and six real datasets. Accompanying this article is our python library dnn-inference ( that implements the proposed tests.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE Transactions on Neural Networks and Learning Systems
Page Range / eLocation ID:
1 to 14
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Allen, Genevra (Ed.)
    Throughout the last decade, random forests have established themselves as among the most accurate and popular supervised learning methods. While their black-box nature has made their mathematical analysis difficult, recent work has established important statistical properties like consistency and asymptotic normality by considering subsampling in lieu of bootstrapping. Though such results open the door to traditional inference procedures, all formal methods suggested thus far place severe restrictions on the testing framework and their computational overhead often precludes their practical scientific use. Here we propose a hypothesis test to formally assess feature significance, which uses permutation tests to circumvent computationally infeasible estimates of nuisance parameters. This test is intended to be analogous to the F-test for linear regression. We establish asymptotic validity of the test via exchangeability arguments and show that the test maintains high power with orders of magnitude fewer computations. Importantly, the procedure scales easily to big data settings where large training and testing sets may be employed, conducting statistically valid inference without the need to construct additional models. Simulations and applications to ecological data, where random forests have recently shown promise, are provided. 
    more » « less
  2. If you want to estimate whether height is related to weight in humans, what would you do? You could measure the height and weight of a large number of people, and then run a statistical test. Such ‘independence tests’ can be thought of as a screening procedure: if the two properties (height and weight) are not related, then there is no point in proceeding with further analyses. In the last 100 years different independence tests have been developed. However, classical approaches often fail to accurately discern relationships in the large, complex datasets typical of modern biomedical research. For example, connectomics datasets include tens or hundreds of thousands of connections between neurons that collectively underlie how the brain performs certain tasks. Discovering and deciphering relationships from these data is currently the largest barrier to progress in these fields. Another drawback to currently used methods of independence testing is that they act as a ‘black box’, giving an answer without making it clear how it was calculated. This can make it difficult for researchers to reproduce their findings – a key part of confirming a scientific discovery. Vogelstein et al. therefore sought to develop a method of performing independence tests on large datasets that can easily be both applied and interpreted by practicing scientists. The method developed by Vogelstein et al., called Multiscale Graph Correlation (MGC, pronounced ‘magic’), combines recent developments in hypothesis testing, machine learning, and data science. The result is that MGC typically requires between one half to one third as big a sample size as previously proposed methods for analyzing large, complex datasets. Moreover, MGC also indicates the nature of the relationship between different properties; for example, whether it is a linear relationship or not. Testing MGC on real biological data, including a cancer dataset and a human brain imaging dataset, revealed that it is more effective at finding possible relationships than other commonly used independence methods. MGC was also the only method that explained how it found those relationships. MGC will enable relationships to be found in data across many fields of inquiry – and not only in biology. Scientists, policy analysts, data journalists, and corporate data scientists could all use MGC to learn about the relationships present in their data. To that extent, Vogelstein et al. have made the code open source in MATLAB, R, and Python. 
    more » « less
  3. Summary Identifying dependency in multivariate data is a common inference task that arises in numerous applications. However, existing nonparametric independence tests typically require computation that scales at least quadratically with the sample size, making it difficult to apply them in the presence of massive sample sizes. Moreover, resampling is usually necessary to evaluate the statistical significance of the resulting test statistics at finite sample sizes, further worsening the computational burden. We introduce a scalable, resampling-free approach to testing the independence between two random vectors by breaking down the task into simple univariate tests of independence on a collection of $2\times 2$ contingency tables constructed through sequential coarse-to-fine discretization of the sample , transforming the inference task into a multiple testing problem that can be completed with almost linear complexity with respect to the sample size. To address increasing dimensionality, we introduce a coarse-to-fine sequential adaptive procedure that exploits the spatial features of dependency structures. We derive a finite-sample theory that guarantees the inferential validity of our adaptive procedure at any given sample size. We show that our approach can achieve strong control of the level of the testing procedure at any sample size without resampling or asymptotic approximation and establish its large-sample consistency. We demonstrate through an extensive simulation study its substantial computational advantage in comparison to existing approaches while achieving robust statistical power under various dependency scenarios, and illustrate how its divide-and-conquer nature can be exploited to not just test independence, but to learn the nature of the underlying dependency. Finally, we demonstrate the use of our method through analysing a dataset from a flow cytometry experiment. 
    more » « less
  4. Abstract Particle filters avoid parametric estimates for Bayesian posterior densities, which alleviates Gaussian assumptions in nonlinear regimes. These methods, however, are more sensitive to sampling errors than Gaussian-based techniques such as ensemble Kalman filters. A recent study by the authors introduced an iterative strategy for particle filters that match posterior moments—where iterations improve the filter’s ability to draw samples from non-Gaussian posterior densities. The iterations follow from a factorization of particle weights, providing a natural framework for combining particle filters with alternative filters to mitigate the impact of sampling errors. The current study introduces a novel approach to forming an adaptive hybrid data assimilation methodology, exploiting the theoretical strengths of nonparametric and parametric filters. At each data assimilation cycle, the iterative particle filter performs a sequence of updates while the prior sample distribution is non-Gaussian, then an ensemble Kalman filter provides the final adjustment when Gaussian distributions for marginal quantities are detected. The method employs the Shapiro–Wilk test to determine when to make the transition between filter algorithms, which has outstanding power for detecting departures from normality. Experiments using low-dimensional models demonstrate that the approach has a significant value, especially for nonhomogeneous observation networks and unknown model process errors. Moreover, hybrid factors are extended to consider marginals of more than one collocated variables using a test for multivariate normality. Findings from this study motivate the use of the proposed method for geophysical problems characterized by diverse observation networks and various dynamic instabilities, such as numerical weather prediction models. Significance Statement Data assimilation statistically processes observation errors and model forecast errors to provide optimal initial conditions for the forecast, playing a critical role in numerical weather forecasting. The ensemble Kalman filter, which has been widely adopted and developed in many operational centers, assumes Gaussianity of the prior distribution and solves a linear system of equations, leading to bias in strong nonlinear regimes. On the other hand, particle filters avoid many of those assumptions but are sensitive to sampling errors and are computationally expensive. We propose an adaptive hybrid strategy that combines their advantages and minimizes the disadvantages of the two methods. The hybrid particle filter–ensemble Kalman filter is achieved with the Shapiro–Wilk test to detect the Gaussianity of the ensemble members and determine the timing of the transition between these filter updates. Demonstrations in this study show that the proposed method is advantageous when observations are heterogeneous and when the model has an unknown bias. Furthermore, by extending the statistical hypothesis test to the test for multivariate normality, we consider marginals of more than one collocated variable. These results encourage further testing for real geophysical problems characterized by various dynamic instabilities, such as real numerical weather prediction models. 
    more » « less
  5. Many recent developments in causal inference, and functional estimation problems more generally, have been motivated by the fact that classical one-step (first-order) debiasing methods, or their more recent sample-split double machine-learning avatars, can outperform plugin estimators under surprisingly weak conditions. These first-order corrections improve on plugin estimators in a black-box fashion, and consequently are often used in conjunction with powerful off-the-shelf estimation methods. These first-order methods are however provably suboptimal in a minimax sense for functional estimation when the nuisance functions live in Holder-type function spaces. This suboptimality of first-order debiasing has motivated the development of "higher-order" debiasing methods. The resulting estimators are, in some cases, provably optimal over Holder-type spaces, but both the estimators which are minimax-optimal and their analyses are crucially tied to properties of the underlying function space. In this paper we investigate the fundamental limits of structure-agnostic functional estimation, where relatively weak conditions are placed on the underlying nuisance functions. We show that there is a strong sense in which existing first-order methods are optimal. We achieve this goal by providing a formalization of the problem of functional estimation with black-box nuisance function estimates, and deriving minimax lower bounds for this problem. Our results highlight some clear tradeoffs in functional estimation -- if we wish to remain agnostic to the underlying nuisance function spaces, impose only high-level rate conditions, and maintain compatibility with black-box nuisance estimators then first-order methods are optimal. When we have an understanding of the structure of the underlying nuisance functions then carefully constructed higher-order estimators can outperform first-order estimators. 
    more » « less