Title: A Weighted Survival Regression Framework for Incorporating External Prediction Information
Abstract: In this article, we develop a weighted approach to estimation for right-censored time-to-event data in the presence of external predictions available from a prediction model. The proposed approach has several advantages. First, the method allows arbitrary forms for the external prediction model. Second, the methodology can be fit easily using standard software packages that allow subject-specific weights. Third, all that is needed from the external model is access to its predictions, not the actual prediction equation. A complication is that inference becomes challenging, so we develop new theoretical results along with a perturbation-based method for inference. The methodology is applied to three publicly available datasets.
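The abstract's main practical claim is that the weighted estimator can be fit with any software that accepts subject-specific weights, with inference done by perturbation. The sketch below is not the authors' method. It assumes a weighted Kaplan-Meier estimator as a simple stand-in for the weighted survival regression, treats the weights (which in the paper would be derived from the external predictions) as already given, and uses the common perturbation scheme of multiplying each subject's weight by an independent Exp(1) draw and refitting. The function names `weighted_km` and `perturbation_se` are hypothetical.

```python
import random
import statistics
from typing import Sequence


def weighted_km(times: Sequence[float], events: Sequence[int],
                weights: Sequence[float], t: float) -> float:
    """Weighted Kaplan-Meier estimate of the survival probability S(t).

    times:   observed follow-up times
    events:  1 = event observed, 0 = right-censored
    weights: nonnegative subject-specific weights (assumed given; hypothetical)
    """
    surv = 1.0
    for u in sorted(set(times)):
        if u > t:
            break
        # Weighted number at risk just before u (everyone with time >= u).
        at_risk = sum(w for x, w in zip(times, weights) if x >= u)
        # Weighted number of observed events exactly at u.
        deaths = sum(w for x, d, w in zip(times, events, weights)
                     if x == u and d == 1)
        if at_risk > 0 and deaths > 0:
            surv *= 1.0 - deaths / at_risk
    return surv


def perturbation_se(times: Sequence[float], events: Sequence[int],
                    weights: Sequence[float], t: float,
                    n_draws: int = 500, seed: int = 0) -> float:
    """Standard error of S(t) by perturbation resampling.

    Each draw multiplies every subject's weight by an independent Exp(1)
    variable (mean 1, variance 1) and refits; the spread of the refitted
    estimates approximates the estimator's sampling variability.
    The weights themselves are treated as fixed, not re-estimated.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        perturbed = [w * rng.expovariate(1.0) for w in weights]
        draws.append(weighted_km(times, events, perturbed, t))
    return statistics.stdev(draws)


if __name__ == "__main__":
    # Toy data: six subjects, two of them censored (made up for illustration).
    times = [2.0, 3.0, 3.0, 5.0, 7.0, 8.0]
    events = [1, 1, 0, 1, 0, 1]
    weights = [1.0, 0.5, 1.5, 1.0, 2.0, 1.0]
    s5 = weighted_km(times, events, weights, 5.0)
    se = perturbation_se(times, events, weights, 5.0)
    print(f"S(5) = {s5:.3f}, perturbation SE = {se:.3f}")
```

With all weights equal to 1 the function reduces to the ordinary Kaplan-Meier estimator. Because the weights are held fixed here, this simple perturbation step ignores any variability from estimating them; the abstract notes that inference is complicated in this setting, so the paper's perturbation method may differ from this version.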
Award ID(s):
2149492, 1914937
PAR ID:
10634904
Author(s) / Creator(s):
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Journal of Statistical Theory and Practice
Volume:
19
Issue:
4
ISSN:
1559-8608
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The problem of determining which nucleotides of an RNA sequence are paired or unpaired in the secondary structure of an RNA, which we call RNA state inference, can be studied by different machine learning techniques. Successful state inference of RNA sequences can be used to generate auxiliary information for data-directed RNA secondary structure prediction. Typical tools for state inference, such as hidden Markov models, exhibit poor performance in RNA state inference, owing in part to their inability to recognize nonlocal dependencies. Bidirectional long short-term memory (LSTM) neural networks have emerged as a powerful tool that can model global nonlinear sequence dependencies and have achieved state-of-the-art performance on many different classification problems. This paper presents a practical approach to RNA secondary structure inference centered around a deep learning method for state inference. State predictions from a deep bidirectional LSTM are used to generate synthetic SHAPE data that can be incorporated into RNA secondary structure prediction via the Nearest Neighbor Thermodynamic Model (NNTM). This method produces predicted secondary structures for a diverse test set of 16S ribosomal RNA that are, on average, 25 percentage points more accurate than undirected MFE structures. Accuracy is highly dependent on the success of our state inference method, and investigating the global features of our state predictions reveals that the accuracy of both our state inference and structure inference methods is highly dependent on how similar the sequence's pairing patterns are to those in the training dataset. Availability of a large training dataset is critical to the success of this approach. Code available at https://github.com/dwillmott/rna-state-inf.
  2. For joint inference over multiple variables, a variety of structured prediction techniques have been developed to model correlations among variables and thereby improve predictions. However, many classical approaches suffer from one of two primary drawbacks: they either cannot model high-order correlations among variables while keeping inference computationally tractable, or they do not allow known correlations to be modeled explicitly. To address this shortcoming, we introduce ‘Graph Structured Prediction Energy Networks,’ for which we develop inference techniques that can model both explicit local and implicit higher-order correlations while keeping inference tractable. We apply the proposed method to tasks from the natural language processing and computer vision domains and demonstrate its general utility.
  3. Conformal predictions transform a measurable, heuristic notion of uncertainty into statistically valid confidence intervals such that, for a future sample, the true class prediction will be included in the conformal prediction set at a predetermined confidence. From a Bayesian perspective, common estimates of uncertainty in multivariate classification, namely p‐values, only provide the probability that the data fit the presumed class model, P(D|M). Conformal predictions, on the other hand, address the more meaningful probability that a model fits the data, P(M|D). Herein, two methods to perform inductive conformal predictions are investigated: the traditional Split Conformal Prediction, which uses an external calibration set, and a novel Bagged Conformal Prediction, closely related to Cross Conformal Predictions, which uses bagging to calibrate the heuristic notions of uncertainty. Methods for preprocessing the conformal prediction scores to improve performance are discussed and investigated. These conformal prediction strategies are applied to identifying four non‐steroidal anti‐inflammatory drugs (NSAIDs) from hyperspectral Raman imaging data. In addition to assigning meaningful confidence intervals to the model results, we demonstrate how conformal predictions can provide further diagnostics for model quality and method stability.
  4. While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protein structures are used to supplement experimentally derived structures, predictions of socioeconomic indicators from satellite imagery are used to supplement accurate survey data, and so on. Since predictions are imperfect and potentially biased, this practice brings into question the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. With a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to remedy the prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that only leverage the labeled data. Closely related is the recent proposal of prediction-powered inference [A. N. Angelopoulos, S. Bates, C. Fannjiang, M. I. Jordan, T. Zrnic, Science 382, 669–674 (2023)], which assumes that a good pretrained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors; its CIs typically have significantly lower variability.
  5. Over the last three decades, many growth and yield systems developed for the southeast USA have incorporated methods to create a compatible basal area (BA) prediction and projection equation. This technique allows practitioners to calibrate BA models using both measurements at a given arbitrary age and the increment in BA when time series panel data are available. As a result, model parameters for either the prediction or the projection alternative are compatible. One caveat of this methodology is that pairs of observations used to project forward have the same weight as observations from a single measurement age, regardless of the projection time interval. To address this problem, we introduce a variance–covariance structure that gives different weights to predictions with variable intervals. To test this approach, prediction and projection equations were fitted simultaneously using an ad hoc matrix structure. We tested three different error structures in fitting models with (i) homoscedastic errors described by a single parameter (Method 1); (ii) heteroscedastic errors described with a weighting factor $w_t$ (Method 2); and (iii) errors including both prediction ($\overset{\smile}{\varepsilon}$) and projection errors ($\tilde{\varepsilon}$) in the weighting factor $w_t$ (Method 3). A rotation-age dataset covering nine sites, each including four blocks with four silvicultural treatments per block, was used for model calibration and validation, including explicit terms for each treatment. Fitting with an error structure that incorporated the combined error term ($\overset{\smile}{\varepsilon}$ and $\tilde{\varepsilon}$) into the weighting factor $w_t$ (Method 3) produced a lower root mean square error than the other two methods evaluated. Also, the system of equations that incorporated silvicultural treatments as dummy variables generated lower root mean square error (RMSE) and Akaike information criterion (AIC) values in all methods. Our results show a substantial improvement over the current prediction-projection approach, resulting in consistent estimators for BA.
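Item 4 above describes imputing labels with a machine-learning model and then debiasing the result. As a rough illustration of that debiasing idea, here is a minimal sketch of the prediction-powered estimate of a mean from the cited Angelopoulos et al. work. It is not cross-prediction itself, which additionally trains the model on folds of the labeled data. It assumes a fixed, pretrained predictor `f` and a normal-approximation confidence interval; the name `ppi_mean` is hypothetical.

```python
import statistics
from statistics import NormalDist
from typing import Callable, Sequence, Tuple


def ppi_mean(x_lab: Sequence[float], y_lab: Sequence[float],
             x_unlab: Sequence[float], f: Callable[[float], float],
             alpha: float = 0.05) -> Tuple[float, float, float]:
    """Prediction-powered point estimate and (1 - alpha) CI for E[Y].

    x_lab, y_lab: small labeled sample
    x_unlab:      large unlabeled sample
    f:            fixed predictor (assumed pretrained; hypothetical)
    """
    # Average prediction over the large unlabeled sample.
    preds_unlab = [f(x) for x in x_unlab]
    # Rectifier: prediction error measured on the labeled sample.
    rectifier = [f(x) - y for x, y in zip(x_lab, y_lab)]
    # Debiased estimate: subtract the average prediction error.
    est = statistics.fmean(preds_unlab) - statistics.fmean(rectifier)
    # The two samples are independent, so their variances add.
    se = (statistics.variance(preds_unlab) / len(preds_unlab)
          + statistics.variance(rectifier) / len(rectifier)) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return est, est - z * se, est + z * se
```

The interval width shrinks as the predictor's errors on the labeled data become less variable, which is why a good predictor yields tighter intervals than using the labeled data alone.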