NSF PAR Search | NSF Public Access Repository


JANET: Joint Adaptive predictioN-region Estimation for Time-series

https://doi.org/10.1007/s10994-025-06812-2

English, Eshant; Wong-Toi, Eliot; Fontana, Matteo; Mandt, Stephan; Smyth, Padhraic; Lippert, Christoph (August 2025, Machine Learning)

Conformal prediction provides machine learning models with prediction sets that offer theoretical guarantees, but the underlying assumption of exchangeability limits its applicability to time series data. Furthermore, existing approaches struggle to handle multi-step ahead prediction tasks, where uncertainty estimates across multiple future time points are crucial. We propose JANET (Joint Adaptive predictioN-region Estimation for Time-series), a novel framework for constructing conformal prediction regions that are valid for both univariate and multivariate time series. JANET generalises the inductive conformal framework and efficiently produces joint prediction regions with controlled K-familywise error rates, enabling flexible adaptation to specific application needs. Our empirical evaluation demonstrates JANET’s superior performance in multi-step prediction tasks across diverse time series datasets, highlighting its potential for reliable and interpretable uncertainty quantification in sequential data.
Free, publicly-accessible full text available August 1, 2026
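
For orientation, the inductive (split) conformal framework that the JANET abstract says it generalises can be sketched as follows. This is a minimal single-step illustration of the standard textbook procedure under exchangeability, not the JANET method itself; the function names `conformal_quantile` and `prediction_interval` and the absolute-residual score are assumptions made for this sketch.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # Rank of the order statistic used by split conformal prediction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to support this alpha.
        return math.inf
    return sorted(scores)[k - 1]


def prediction_interval(point_forecast, calib_truth, calib_pred, alpha=0.1):
    """Symmetric interval around a point forecast (hypothetical helper).

    Nonconformity score: absolute residual on a held-out calibration set.
    """
    scores = [abs(y - p) for y, p in zip(calib_truth, calib_pred)]
    q = conformal_quantile(scores, alpha)
    return point_forecast - q, point_forecast + q


# Seven calibration residuals 1..7 and alpha = 0.25 select the 6th smallest.
print(prediction_interval(10.0, [1, 2, 3, 4, 5, 6, 7], [0] * 7, alpha=0.25))
# → (4.0, 16.0)
```

The sketch covers one forecast step with one scalar score. Joint regions over several future time points, and validity without exchangeability, are the parts the abstract attributes to JANET and are not reproduced here.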
Benchmark data repositories for better benchmarking

Longjohn, Rachel; Kelly, Markelle; Singh, Sameer; Smyth, Padhraic (June 2025, Neural Information Processing Systems (NeurIPS))

In machine learning research, it is common to evaluate algorithms via their performance on standard benchmark datasets. While a growing body of work establishes guidelines for—and levies criticisms at—data and benchmarking practices in machine learning, comparatively less attention has been paid to the data repositories where these datasets are stored, documented, and shared. In this paper, we analyze the landscape of these benchmark data repositories and the role they can play in improving benchmarking. This role includes addressing issues with both datasets themselves (e.g., representational harms, construct validity) and the manner in which evaluation is carried out using such datasets (e.g., overemphasis on a few datasets and metrics, lack of reproducibility). To this end, we identify and discuss a set of considerations surrounding the design and use of benchmark data repositories, with a focus on improving benchmarking practices in machine learning.
Free, publicly-accessible full text available June 5, 2026
A Generative Diffusion Model for Probabilistic Ensembles of Precipitation Maps Conditioned on Multisensor Satellite Observations

https://doi.org/10.1109/TGRS.2025.3548518

Guilloteau, Clément; Kerrigan, Gavin; Nelson, Kai; Migliorini, Giosue; Smyth, Padhraic; Li, Runze; Foufoula-Georgiou, Efi (January 2025, IEEE Transactions on Geoscience and Remote Sensing)

Full Text Available
Perceptions of Linguistic Uncertainty by Language Models and Humans

Belém, Catarina G; Kelly, Markelle; Steyvers, Mark; Singh, Sameer; Smyth, Padhraic (November 2024, ACL Anthology)

Full Text Available
Differentiating mental models of self and others: A hierarchical framework for knowledge assessment.

https://doi.org/10.1037/rev0000443

Kumar, Aakriti; Smyth, Padhraic; Steyvers, Mark (November 2023, Psychological Review)

Full Text Available
Perceptions of Linguistic Uncertainty by Language Models and Humans

https://doi.org/10.18653/v1/2024.emnlp-main.483

Belém, Catarina G; Kelly, Markelle; Steyvers, Mark; Singh, Sameer; Smyth, Padhraic (January 2024, Association for Computational Linguistics)

*Uncertainty expressions* such as ‘probably’ or ‘highly unlikely’ are pervasive in human language. While prior work has established that there is population-level agreement in terms of how humans quantitatively interpret these expressions, there has been little inquiry into the abilities of language models in the same context. In this paper, we investigate how language models map linguistic expressions of uncertainty to numerical responses. Our approach assesses whether language models can employ theory of mind in this setting: understanding the uncertainty of another agent about a particular statement, independently of the model’s own certainty about that statement. We find that 7 out of 10 models are able to map uncertainty expressions to probabilistic responses in a human-like manner. However, we observe systematically different behavior depending on whether a statement is actually true or false. This sensitivity indicates that language models are substantially more susceptible to bias based on their prior knowledge (as compared to humans). These findings raise important questions and have broad implications for human-AI and AI-AI communication.
Full Text Available
Variable-Based Calibration for Machine Learning Classifiers

https://doi.org/10.1609/aaai.v37i7.25991

Kelly, Markelle; Smyth, Padhraic (June 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

The deployment of machine learning classifiers in high-stakes domains requires well-calibrated confidence scores for model predictions. In this paper we introduce the notion of variable-based calibration to characterize calibration properties of a model with respect to a variable of interest, generalizing traditional score-based metrics such as expected calibration error (ECE). In particular, we find that models with near-perfect ECE can exhibit significant miscalibration as a function of features of the data. We demonstrate this phenomenon both theoretically and in practice on multiple well-known datasets, and show that it can persist after the application of existing calibration methods. To mitigate this issue, we propose strategies for detection, visualization, and quantification of variable-based calibration error. We then examine the limitations of current score-based calibration methods and explore potential modifications. Finally, we discuss the implications of these findings, emphasizing that an understanding of calibration beyond simple aggregate measures is crucial for endeavors such as fairness and model interpretability.
Full Text Available
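
The contrast the abstract above draws, between an aggregate score-based metric and calibration measured against a variable of interest, can be illustrated with a small sketch. The equal-width binned ECE below is the standard definition; the per-group gap function is a simplified stand-in that treats the variable as a discrete group label, which is an assumption of this sketch and not the paper's estimator. Both function names are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE: weighted mean of |accuracy - confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


def calibration_gap_by_group(groups, confidences, correct):
    """Mean confidence minus accuracy within each value of a variable."""
    totals = {}
    for g, conf, hit in zip(groups, confidences, correct):
        s = totals.setdefault(g, [0.0, 0, 0])
        s[0] += conf  # summed confidence
        s[1] += hit   # number correct
        s[2] += 1     # group size
    return {g: s[0] / s[2] - s[1] / s[2] for g, s in totals.items()}


# Toy data (made up): every prediction has confidence 0.75.
groups = ["young"] * 4 + ["old"] * 4
confidences = [0.75] * 8
correct = [1, 1, 0, 0, 1, 1, 1, 1]

print(expected_calibration_error(confidences, correct))
# → 0.0
print(calibration_gap_by_group(groups, confidences, correct))
# → {'young': 0.25, 'old': -0.25}
```

In this toy example the aggregate ECE is zero while one group is overconfident and the other underconfident by the same amount, which is the kind of hidden miscalibration the abstract describes.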
Capturing Humans’ Mental Models of AI: An Item Response Theory Approach

https://doi.org/10.1145/3593013.3594111

Kelly, Markelle; Kumar, Aakriti; Smyth, Padhraic; Steyvers, Mark (June 2023, FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency)

Improving our understanding of how humans perceive AI teammates is an important foundation for our general understanding of human-AI teams. Extending relevant work from cognitive science, we propose a framework based on item response theory for modeling these perceptions. We apply this framework to real-world experiments, in which each participant works alongside another person or an AI agent in a question-answering setting, repeatedly assessing their teammate’s performance. Using this experimental data, we demonstrate the use of our framework for testing research questions about people’s perceptions of both AI agents and other people. We contrast mental models of AI teammates with those of human teammates as we characterize the dimensionality of these mental models, their development over time, and the influence of the participants’ own self-perception. Our results indicate that people expect AI agents’ performance to be significantly better on average than the performance of other humans, with less variation across different types of problems. We conclude with a discussion of the implications of these findings for human-AI interaction.
Full Text Available
Systematically tracking the hourly progression of large wildfires using GOES satellite observations

https://doi.org/10.5194/essd-16-1395-2024

Liu, Tianjia; Randerson, James T; Chen, Yang; Morton, Douglas C; Wiggins, Elizabeth B; Smyth, Padhraic; Foufoula-Georgiou, Efi; Nadler, Roy; Nevo, Omer (January 2024, Earth System Science Data)

In the western United States, prolonged drought, a warming climate, and historical fuel buildup have contributed to larger and more intense wildfires as well as to longer fire seasons. As these costly wildfires become more common, new tools and methods are essential for improving our understanding of the evolution of fires and how extreme weather conditions, including heat waves, windstorms, droughts, and varying levels of active-fire suppression, influence fire spread. Here, we develop the Geostationary Operational Environmental Satellites (GOES)-Observed Fire Event Representation (GOFER) algorithm to derive the hourly fire progression of large wildfires and create a product of hourly fire perimeters, active-fire lines, and fire spread rates. Using GOES-East and GOES-West geostationary satellite detections of active fires, we test the GOFER algorithm on 28 large wildfires in California from 2019 to 2021. The GOFER algorithm includes parameter optimizations for defining the burned-to-unburned boundary and correcting for the parallax effect from elevated terrain. We evaluate GOFER perimeters using 12 h data from the Visible Infrared Imaging Radiometer Suite (VIIRS)-derived Fire Event Data Suite (FEDS) and final fire perimeters from California's Fire and Resource Assessment Program (FRAP). Although the GOES imagery used to derive GOFER has a coarser resolution (2 km at the Equator), the final fire perimeters from GOFER correspond reasonably well to those obtained from FRAP, with a mean Intersection-over-Union (IoU) of 0.77, in comparison to 0.83 between FEDS and FRAP; the IoU indicates the area of overlap over the area of the union relative to the reference perimeters, in which 0 is no agreement and 1 is perfect agreement. GOFER fills a key temporal gap present in other fire tracking products that rely on low-Earth-orbit imagery, where perimeters are available at intervals of 12 h or longer or at ad hoc intervals from aircraft overflights. This is particularly relevant when a fire spreads rapidly, such as at maximum hourly spread rates of over 5 km h⁻¹. Our GOFER algorithm for deriving the hourly fire progression using GOES can be applied to large wildfires across North and South America and reveals considerable variability in the rates of fire spread on diurnal timescales. The resulting GOFER product has a broad set of potential applications, including the development of predictive models for fire spread and the improvement of atmospheric transport models for surface smoke estimates (https://doi.org/10.5281/zenodo.8327264, Liu et al., 2023).
Full Text Available
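
The Intersection-over-Union score defined in the abstract above (area of overlap divided by area of the union, 0 for no agreement and 1 for perfect agreement) can be sketched for perimeters rasterized onto a common grid. Representing each burned area as a set of `(row, col)` grid cells is an assumption of this sketch; how the paper itself computes the overlap is not stated in the abstract, and the function name is hypothetical.

```python
def intersection_over_union(cells_a, cells_b):
    """IoU of two burned areas given as collections of (row, col) grid cells."""
    a, b = set(cells_a), set(cells_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty areas score 0.
        return 0.0
    return len(a & b) / len(union)


# Two 2x2 squares that share one column of two cells (made-up example).
estimated = {(0, 0), (0, 1), (1, 0), (1, 1)}
reference = {(0, 1), (1, 1), (0, 2), (1, 2)}

# Overlap is 2 cells and the union is 6 cells, so the score is 2/6.
print(intersection_over_union(estimated, reference))
```

A mean IoU such as the 0.77 reported in the abstract would then be the average of this score over all evaluated fires.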
A Brief Tour of Deep Learning from a Statistical Perspective

https://doi.org/10.1146/annurev-statistics-032921-013738

Nalisnick, Eric; Smyth, Padhraic; Tran, Dustin (March 2023, Annual Review of Statistics and Its Application)

We expose the statistical foundations of deep learning with the goal of facilitating conversation between the deep learning and statistics communities. We highlight core themes at the intersection; summarize key neural models, such as feedforward neural networks, sequential neural networks, and neural latent variable models; and link these ideas to their roots in probability and statistics. We also highlight research directions in deep learning where there are opportunities for statistical contributions.
Full Text Available
