Graph Neural Networks (GNNs) are powerful prediction models for graph-structured data. However, GNNs lack rigorous uncertainty estimates, which limits their reliable deployment in settings where errors are costly. We propose conformalized GNN (CF-GNN), which extends conformal prediction (CP) to graph-based models to provide guaranteed uncertainty estimates. Given an entity in the graph, CF-GNN produces a prediction set/interval that provably contains the true label with a pre-defined coverage probability (e.g., 90%). We establish a permutation invariance condition that enables the validity of CP on graph data and provide an exact characterization of the test-time coverage. Besides valid coverage, practical use also requires small prediction sets and short prediction intervals. We observe a key connection between non-conformity scores and network structure, which motivates a topology-aware output correction model that learns to update the prediction and produces more efficient prediction sets/intervals. Extensive experiments show that CF-GNN achieves any pre-defined target marginal coverage while reducing the prediction set size/interval length by up to 74% relative to the baselines. It also empirically achieves satisfactory conditional coverage over various raw and network features.
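CF-GNN extends split conformal prediction to graphs. The sketch below shows only that generic split-conformal step for node classification, under stated assumptions: softmax outputs from some already-trained GNN are given, calibration and test nodes are exchangeable, and the non-conformity score is one minus the probability of the label (a common choice, assumed here rather than taken from the paper). The function names are hypothetical, and CF-GNN's topology-aware correction model is not reproduced.

```python
import math
from typing import List, Sequence, Set


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """(1 - alpha) quantile with the split-conformal finite-sample correction."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    if k > n:
        return math.inf  # too few calibration nodes: every label is kept
    return sorted(scores)[k - 1]


def label_score(probs: Sequence[float], label: int) -> float:
    """Assumed non-conformity score: 1 minus the softmax probability of `label`."""
    return 1.0 - probs[label]


def node_prediction_sets(
    cal_probs: Sequence[Sequence[float]],
    cal_labels: Sequence[int],
    test_probs: Sequence[Sequence[float]],
    alpha: float,
) -> List[Set[int]]:
    """Split conformal prediction sets for test nodes (hypothetical helper)."""
    cal_scores = [label_score(p, y) for p, y in zip(cal_probs, cal_labels)]
    qhat = conformal_quantile(cal_scores, alpha)
    # A label enters the set when its score does not exceed the calibrated threshold.
    return [
        {c for c in range(len(p)) if label_score(p, c) <= qhat}
        for p in test_probs
    ]


# Toy softmax outputs for nine labeled calibration nodes (illustrative numbers).
cal_probs = [
    [0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8], [0.4, 0.4, 0.2], [0.3, 0.6, 0.1],
    [0.5, 0.25, 0.25], [0.2, 0.2, 0.6], [0.1, 0.7, 0.2],
]
cal_labels = [0, 0, 1, 2, 1, 1, 0, 2, 2]
sets = node_prediction_sets(
    cal_probs, cal_labels, [[0.5, 0.3, 0.2], [0.45, 0.45, 0.1]], alpha=0.25
)
print(sets)  # [{0}, {0, 1}]
```

With nine calibration nodes and α = 0.25, the threshold is the 8th smallest calibration score (0.6 here), so each test node keeps every label whose probability is at least 0.4.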
Conformal Prediction: A Data Perspective
Conformal prediction (CP), a distribution-free uncertainty quantification (UQ) framework, reliably provides valid predictive inference for black-box models. CP constructs prediction sets or intervals that contain the true output with a specified probability. However, modern data science’s diverse modalities, along with increasing data and model complexity, challenge traditional CP methods. These developments have spurred novel approaches to address evolving scenarios. This survey reviews the foundational concepts of CP and recent advancements from a data-centric perspective, including applications to structured, unstructured, and dynamic data. We also discuss the challenges and opportunities CP faces in large-scale data and models.
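The prediction intervals that CP constructs can be illustrated with split conformal regression, one of its simplest instances. This is a minimal sketch assuming a point predictor already fitted on separate training data, absolute residuals as the non-conformity score, and calibration points exchangeable with the test point; `split_conformal_intervals` is a hypothetical name, not an API from the survey.

```python
import math
from typing import List, Sequence, Tuple


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """(1 - alpha) quantile with the split-conformal finite-sample correction."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the order statistic to use
    return math.inf if k > n else sorted(scores)[k - 1]


def split_conformal_intervals(
    cal_pred: Sequence[float],
    cal_true: Sequence[float],
    test_pred: Sequence[float],
    alpha: float,
) -> List[Tuple[float, float]]:
    """Symmetric intervals pred +/- qhat built from absolute calibration residuals."""
    residuals = [abs(y - f) for f, y in zip(cal_pred, cal_true)]
    qhat = conformal_quantile(residuals, alpha)
    return [(f - qhat, f + qhat) for f in test_pred]


# Illustrative calibration data: predictions from some fitted model and true outputs.
cal_pred = [10, 12, 9, 15, 11, 14, 8, 13, 10]
cal_true = [12, 9, 13, 10, 17, 6, 17, 3, 25]
print(split_conformal_intervals(cal_pred, cal_true, [20], alpha=0.25))  # [(10, 30)]
```

The rank ⌈(n+1)(1−α)⌉ is what yields marginal coverage of at least 1 − α; taking the plain empirical (1 − α) quantile instead can slightly under-cover when n is small.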
- Award ID(s):
- 2440542
- PAR ID:
- 10677259
- Publisher / Repository:
- ACM Computing Survey
- Date Published:
- Journal Name:
- ACM Computing Surveys
- Volume:
- 58
- Issue:
- 2
- ISSN:
- 0360-0300
- Page Range / eLocation ID:
- 1 to 37
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Arunachalam, Viswanathan (Ed.)
During the COVID-19 pandemic, the prevalence of asymptomatic cases challenged the reliability of epidemiological statistics in policymaking. To address this, we introduced contagion potential (CP) as a continuous metric derived from sociodemographic and epidemiological data to quantify the infection risk posed by asymptomatic individuals within a region. However, CP estimation is hindered by incomplete or biased incidence data: underreporting and testing constraints make direct estimation infeasible. To overcome this limitation, we employ a hypothesis-testing approach to infer CP from sampled data, allowing robust estimation despite missing information. Even within a sample collected from spatial contact data, individuals have only partial knowledge of their neighborhoods, because their awareness is restricted to interactions captured by the available tracking data. We introduce an adjustment factor that calibrates the sample CPs so that the sample is a reasonable estimate of the population CP. Further complicating estimation, biases in epidemiological and mobility data arise from heterogeneous reporting rates and sampling inconsistencies, which we address through inverse probability weighting to enhance reliability. Using a spatial model of infection spread through social mixing and an optimization framework based on the SIRS epidemic model, we analyze real infection datasets from Italy, Germany, and Austria. Our findings show that statistical methods can achieve high-confidence CP estimates while accounting for variations in sample size, confidence level, mobility models, and viral strains. By assessing the effects of bias, social mixing, and sampling frequency, we propose statistical corrections to improve CP prediction accuracy. Finally, we discuss how reliable CP estimates can inform outbreak mitigation strategies despite the inherent uncertainties in epidemiological data.
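The abstract corrects heterogeneous reporting and sampling rates with inverse probability weighting. As a generic illustration only (not the authors' CP estimator or adjustment factor, which the abstract does not specify), here is a minimal sketch of a self-normalized (Hájek) IPW mean, assuming each sampled individual's probability of being observed is known. The function name, data, and probabilities are hypothetical.

```python
from typing import Sequence


def hajek_ipw_mean(values: Sequence[float], inclusion_probs: Sequence[float]) -> float:
    """Self-normalized inverse-probability-weighted (Hajek) estimate of a population mean.

    Assumes each sampled unit's probability of being observed is known and > 0.
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    if any(not 0 < p <= 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in inclusion_probs]  # rarely observed units count more
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical sample: 1 = infected contact, 0 = not. Infected individuals were
# over-sampled (e.g., more likely to be tested), so the naive mean of 0.5 is biased.
values = [1, 1, 0, 0]
probs = [0.8, 0.8, 0.2, 0.2]
print(hajek_ipw_mean(values, probs))  # 0.2
```

Dividing by the sum of weights rather than the population size keeps the estimate inside the range of the observed values, which is often preferred when the population size itself is uncertain.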
Accurate prediction of the minimum operating voltage (Vmin) is a critical element of manufacturing tests. Conventional methods lack coverage guarantees for their interval predictions. Conformal prediction (CP), a distribution-free machine learning approach, provides rigorous coverage guarantees for interval predictions. However, standard CP predictors may fail because they have no knowledge of process variations. We address this challenge with principled conformalized interval prediction in the presence of process variations and with high data efficiency: data from only a few additional chips is used for calibration. We demonstrate the superiority of the proposed method on industrial 16 nm chip data.
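The abstract stresses calibrating with data from only a few additional chips. For plain split CP (not the paper's process-variation-aware method, whose details the abstract does not give), the calibration-set size n limits what is achievable: the quantile rank ⌈(n+1)(1−α)⌉ must not exceed n, which requires n ≥ 1/α − 1. The hypothetical helpers below make that arithmetic concrete, treating residuals as absolute Vmin prediction errors.

```python
import math
from typing import Sequence


def min_calibration_size(alpha: float) -> int:
    """Smallest n with ceil((n + 1)(1 - alpha)) <= n, i.e. a finite conformal quantile."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    n = 1
    while math.ceil((n + 1) * (1 - alpha)) > n:
        n += 1
    return n


def conformal_radius(residuals: Sequence[float], alpha: float) -> float:
    """Half-width of a split-conformal interval from absolute calibration residuals."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(residuals)[k - 1]


# Hypothetical absolute Vmin errors (volts) from a handful of calibration chips.
few_chips = [0.012, 0.008, 0.020, 0.015, 0.010]
print(min_calibration_size(0.1))         # 9
print(conformal_radius(few_chips, 0.1))  # inf: 5 chips cannot certify 90% coverage
```

For α = 0.1 at least 9 calibration chips are needed; with fewer, the plain split-conformal interval is unbounded and therefore uninformative, which is why data efficiency matters in this setting.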
Prediction of Indian summer monsoon rainfall (ISMR) is at the heart of tropical climate prediction. Despite the enormous progress made in predicting ISMR since 1886, operational forecasts during recent decades (1989–2012) have had little skill. Here we show, with both dynamical and physical–empirical models, that this recent failure is largely due to the models’ inability to capture new predictability sources that emerged during recent global warming: the development of the central-Pacific El Niño–Southern Oscillation (CP–ENSO), the rapid deepening of the Asian Low, and the strengthening of the North and South Pacific Highs during boreal spring. A physical–empirical model that captures these new predictors produces an independent forecast skill of 0.51 for 1989–2012 and a 92-year retrospective forecast skill of 0.64 for 1921–2012. The recent low skill of the dynamical models is attributed to deficiencies in capturing the developing CP–ENSO and the anomalous Asian Low. The results reveal a considerable gap between ISMR prediction skill and predictability.
Hand signals are the most widely used, feasible, and device-free communication method in manufacturing plants, airport ramps, and other noisy or voice-prohibiting environments. Enabling IoT agents, such as robots, to recognize and communicate by hand signals will facilitate human-machine collaboration for the emerging “Industry 5.0.” While many prior works succeed in hand signal recognition, few can rigorously guarantee the accuracy of their predictions. This project proposes a method that builds on the theory of conformal prediction (CP) to provide statistical guarantees on hand signal recognition accuracy and, based on these guarantees, to measure the uncertainty in this communication process. It uses a calibration set with a few representative samples to ensure that trained models produce conformal prediction sets that meet or exceed a user-specified level of trustworthiness. The uncertainty in the recognition process can then be detected by measuring the size of the conformal prediction set. Furthermore, the proposed CP-based method can be applied to IoT models without fine-tuning, as an out-of-the-box, lightweight approach to modeling uncertainty. Our experiments show that the proposed conformal recognition method achieves accurate hand signal prediction in novel scenarios. With an error level of α = 0.10, it achieved 100% accuracy on out-of-distribution test sets.
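The abstract uses the size of the conformal prediction set as an uncertainty signal. A minimal sketch of that idea follows, using a non-randomized score in the style of adaptive prediction sets (APS), chosen here only for illustration because the abstract does not name the paper's score function. The class indices stand in for hypothetical hand signals, and all names and numbers are assumptions.

```python
import math
from typing import List, Sequence


def aps_scores(probs: Sequence[float]) -> List[float]:
    """APS-style score per class: probability mass of all classes ranked at or above it."""
    order = sorted(range(len(probs)), key=lambda c: -probs[c])  # stable on ties
    scores = [0.0] * len(probs)
    running = 0.0
    for c in order:
        running += probs[c]
        scores[c] = running
    return scores


def calibrate(cal_probs: Sequence[Sequence[float]], cal_labels: Sequence[int], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of the true-label scores."""
    cal = sorted(aps_scores(p)[y] for p, y in zip(cal_probs, cal_labels))
    k = math.ceil((len(cal) + 1) * (1 - alpha))
    return math.inf if k > len(cal) else cal[k - 1]


def prediction_set(probs: Sequence[float], qhat: float) -> List[int]:
    s = aps_scores(probs)
    chosen = [c for c in range(len(probs)) if s[c] <= qhat]
    # Fall back to the top class rather than return an empty set; enlarging a set
    # can only raise coverage, so the guarantee is kept.
    return chosen or [max(range(len(probs)), key=lambda c: probs[c])]


# Hypothetical softmax outputs over three hand signals for nine calibration images.
cal_probs = [
    [0.75, 0.125, 0.125], [0.5, 0.25, 0.25], [0.25, 0.625, 0.125],
    [0.125, 0.125, 0.75], [0.5, 0.375, 0.125], [0.25, 0.5, 0.25],
    [0.625, 0.25, 0.125], [0.25, 0.25, 0.5], [0.5, 0.125, 0.375],
]
cal_labels = [0, 0, 1, 2, 1, 1, 0, 2, 1]
qhat = calibrate(cal_probs, cal_labels, alpha=0.25)
for probs in [[0.875, 0.0625, 0.0625], [0.375, 0.375, 0.25]]:
    s = prediction_set(probs, qhat)
    print(s, "size", len(s))  # a larger set flags a less certain recognition
```

The first input yields a single-signal set and the second yields two candidate signals, so set size separates a confident recognition from an ambiguous one without retraining the underlying model.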

