We propose a new classified mixed model prediction (CMMP) procedure, called pseudo-Bayesian CMMP, that uses network information in matching the group index between the training data and new data, whose characteristics of interest one wishes to predict. The current CMMP procedures do not incorporate such information; as a result, the methods are not consistent in terms of matching the group index. Although, as the number of training data groups increases, the current CMMP method can predict the mixed effects of interest consistently, its accuracy is not guaranteed when the number of groups is moderate, as is the case in many potential applications. The proposed pseudo-Bayesian CMMP procedure assumes a flexible working probability model for the group index of the new observation to match the index of a training data group, which may be viewed as a pseudo prior. We show that, given any working model satisfying mild conditions, the pseudo-Bayesian CMMP procedure is consistent and asymptotically optimal both in terms of matching the group index and in terms of predicting the mixed effect of interest associated with the new observations. The theoretical results are fully supported by results of empirical studies, including Monte-Carlo simulations and real-data validation.
Classified generalized linear mixed model prediction incorporating pseudo‐prior information
Abstract We develop a method of classified mixed model prediction based on generalized linear mixed models that incorporate pseudo‐prior information to improve prediction accuracy. We establish consistency of the proposed method both in terms of prediction of the true mixed effect of interest and in terms of correctly identifying the potential class corresponding to the new observations if such a class matching one of the training data classes exists. Empirical results, including simulation studies and real‐data validation, fully support the theoretical findings.
- Award ID(s): 1914465
- PAR ID: 10478958
- Publisher / Repository: Wiley
- Date Published:
- Journal Name: Canadian Journal of Statistics
- Volume: 51
- Issue: 2
- ISSN: 0319-5724
- Page Range / eLocation ID: 580 to 595
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- In modern biomedical research, there is a growing demand for accurate prediction at the subject level. In many of these situations, data are collected as longitudinal curves and display distinct individual characteristics. Thus, prediction mechanisms based on functional mixed effects models (FMEM) are useful. In this paper, we develop a classified functional mixed model prediction (CFMMP) method, which adapts classified mixed model prediction (CMMP) to the framework of FMEM. We explore the performance of CFMMP against functional regression prediction in simulation studies, as well as the consistency property of CFMMP estimators. Applications of CFMMP are illustrated using real‐world examples, including data from hormone research on menstrual cycles and from diffusion tensor imaging.
- ABSTRACT Conformal predictions transform a measurable, heuristic notion of uncertainty into statistically valid confidence intervals such that, for a future sample, the true class prediction will be included in the conformal prediction set at a predetermined confidence. In a Bayesian perspective, common estimates of uncertainty in multivariate classification, namely p‐values, only provide the probability that the data fits the presumed class model, P(D|M). Conformal predictions, on the other hand, address the more meaningful probability that a model fits the data, P(M|D). Herein, two methods to perform inductive conformal predictions are investigated: the traditional Split Conformal Prediction that uses an external calibration set, and a novel Bagged Conformal Prediction, closely related to Cross Conformal Predictions, that utilizes bagging to calibrate the heuristic notions of uncertainty. Methods for preprocessing the conformal prediction scores to improve performance are discussed and investigated. These conformal prediction strategies are applied to identifying four non‐steroidal anti‐inflammatory drugs (NSAIDs) from hyperspectral Raman imaging data. In addition to assigning meaningful confidence intervals to the model results, we demonstrate how conformal predictions can provide additional diagnostics for model quality and method stability.
- Abstract. Motivation: Polygenic risk score (PRS) has been widely exploited for genetic risk prediction due to its accuracy and conceptual simplicity. We introduce a unified Bayesian regression framework, NeuPred, for PRS construction, which accommodates varying genetic architectures and improves overall prediction accuracy for complex diseases by allowing for a wide class of prior choices. To take full advantage of the framework, we propose a summary-statistics-based cross-validation strategy to automatically select suitable chromosome-level priors, which demonstrates a striking variability of the prior preference of each chromosome, for the same complex disease, and further significantly improves the prediction accuracy. Results: Simulation studies and real data applications with seven disease datasets from the Wellcome Trust Case Control Consortium cohort and eight groups of large-scale genome-wide association studies demonstrate that NeuPred achieves substantial and consistent improvements in terms of predictive r2 over existing methods. In addition, NeuPred has similar or advantageous computational efficiency compared with the state-of-the-art Bayesian methods. Availability and implementation: The R package implementing NeuPred is available at https://github.com/shuangsong0110/NeuPred. Supplementary information: Supplementary data are available at Bioinformatics online.
- Abstract Archetypal analysis (AA) is an unsupervised learning method for exploratory data analysis. One major challenge that limits the applicability of AA in practice is the inherent computational complexity of the existing algorithms. In this paper, we provide a novel approximation approach to partially address this issue. Utilizing probabilistic ideas from high-dimensional geometry, we introduce two preprocessing techniques to reduce the dimension and representation cardinality of the data, respectively. We prove that, provided the data are approximately embedded in a low-dimensional linear subspace and the convex hull of the corresponding representations is well approximated by a polytope with a few vertices, our method can effectively reduce the scaling of AA. Moreover, the solution of the reduced problem is near-optimal in terms of prediction errors. Our approach can be combined with other acceleration techniques to further mitigate the intrinsic complexity of AA. We demonstrate the usefulness of our results by applying our method to summarize several moderately large-scale datasets.

