Title: Interpretable Machine Learning for Discovery: Statistical Challenges and Opportunities
New technologies have led to vast troves of large and complex data sets across many scientific domains and industries. People routinely use machine learning techniques not only to process, visualize, and make predictions from these big data, but also to make data-driven discoveries. These discoveries are often made using interpretable machine learning, or machine learning models and techniques that yield human-understandable insights. In this article, we discuss and review the field of interpretable machine learning, focusing especially on the techniques, as they are often employed to generate new knowledge or make discoveries from large data sets. We outline the types of discoveries that can be made using interpretable machine learning in both supervised and unsupervised settings. Additionally, we focus on the grand challenge of how to validate these discoveries in a data-driven manner, which promotes trust in machine learning systems and reproducibility in science. We discuss validation both from a practical perspective, reviewing approaches based on data-splitting and stability, as well as from a theoretical perspective, reviewing statistical results on model selection consistency and uncertainty quantification via statistical inference. Finally, we conclude by highlighting open challenges in using interpretable machine learning techniques to make discoveries, including gaps between theory and practice for validating data-driven discoveries.
Award ID(s):
2210837
PAR ID:
10529596
Author(s) / Creator(s):
; ;
Publisher / Repository:
Annual Review of Statistics and Its Application
Date Published:
Journal Name:
Annual Review of Statistics and Its Application
Volume:
11
Issue:
1
ISSN:
2326-8298
Page Range / eLocation ID:
97 to 121
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We review several concepts and modeling techniques from statistical and machine learning that have been developed to forecast recidivism. We show how these methods might be repurposed for forecasting police officer use of force. Using open Chicago police department use-of-force complaint data for illustration, we discuss feature engineering, construction of black-box models, interpretable forecasts, and fairness.
  2. Material characterization techniques are widely used to characterize the physical and chemical properties of materials at the nanoscale and, thus, play central roles in material scientific discoveries. However, the large and complex datasets generated by these techniques often require significant human effort to interpret and extract meaningful physicochemical insights. Artificial intelligence (AI) techniques such as machine learning (ML) have the potential to improve the efficiency and accuracy of surface analysis by automating data analysis and interpretation. In this perspective paper, we review the current role of AI in surface analysis and discuss its future potential to accelerate discoveries in surface science, materials science, and interface science. We highlight several applications where AI has already been used to analyze surface analysis data, including the identification of crystal structures from XRD data, analysis of XPS spectra for surface composition, and the interpretation of TEM and SEM images for particle morphology and size. We also discuss the challenges and opportunities associated with the integration of AI into surface analysis workflows. These include the need for large and diverse datasets for training ML models, the importance of feature selection and representation, and the potential for ML to enable new insights and discoveries by identifying patterns and relationships in complex datasets. Most importantly, AI-analyzed data must not just yield the best mathematical description of the data; it must also yield the most physically and chemically meaningful results. In addition, the need for reproducibility in scientific research has become increasingly important in recent years. The advancement of AI, including both conventional methods and increasingly popular deep learning, is showing promise in addressing those challenges by enabling the execution and verification of scientific progress. By training models on large experimental datasets and providing automated analysis and data interpretation, AI can help to ensure that scientific results are reproducible and reliable. Although integration of knowledge and AI models must be considered for the transparency and interpretability of models, the incorporation of AI into the data collection and processing workflow will significantly enhance the efficiency and accuracy of various surface analysis techniques and deepen our understanding at an accelerated pace.
  3. Based on historical developments and the current state of the art in gas-phase transmission electron microscopy (GP-TEM), we provide a perspective covering exciting new technologies and methodologies of relevance for chemical and surface sciences. Considering thermal and photochemical reaction environments, we emphasize the benefit of implementing gas cells, quantitative TEM approaches using sensitive detection for structured electron illumination (in space and time) and data denoising, optical excitation, and data mining using autonomous machine learning techniques. These emerging advances open new ways to accelerate discoveries in chemical and surface sciences.
  4. Predicting and understanding how people make decisions has been a long-standing goal in many fields, with quantitative models of human decision-making informing research in both the social sciences and engineering. We show how progress toward this goal can be accelerated by using large datasets to power machine-learning algorithms that are constrained to produce interpretable psychological theories. Conducting the largest experiment on risky choice to date and analyzing the results using gradient-based optimization of differentiable decision theories implemented through artificial neural networks, we were able to recapitulate historical discoveries, establish that there is room to improve on existing theories, and discover a new, more accurate model of human decision-making in a form that preserves the insights from centuries of research. 
  5. Despite their successes, machine learning techniques are often stochastic, error-prone and blackbox. How could they then be used in fields such as theoretical physics and pure mathematics for which error-free results and deep understanding are a must? In this Perspective, we discuss techniques for obtaining zero-error results with machine learning, with a focus on theoretical physics and pure mathematics. Non-rigorous methods can enable rigorous results via conjecture generation or verification by reinforcement learning. We survey applications of these techniques-for-rigor ranging from string theory to the smooth 4D Poincaré conjecture in low-dimensional topology. We also discuss connections between machine learning theory and mathematics or theoretical physics such as a new approach to field theory motivated by neural network theory, and a theory of Riemannian metric flows induced by neural network gradient descent, which encompasses Perelman’s formulation of the Ricci flow that was used to solve the 3D Poincaré conjecture. 