Title: Classified functional mixed effects model prediction
In contemporary biomedical research, there is a growing demand for accurate prediction at the subject level. In many of these situations, data are collected as longitudinal curves and display distinct individual characteristics, so prediction mechanisms built on functional mixed effects models (FMEM) are useful. In this paper, we develop a classified functional mixed model prediction (CFMMP) method, which adapts classified mixed model prediction (CMMP) to the framework of FMEM. We compare the performance of CFMMP with that of functional regression prediction in simulation studies and explore the consistency property of CFMMP estimators. Applications of CFMMP are illustrated with real-world examples, including data on menstrual cycles from hormone research and data from diffusion tensor imaging.
Award ID(s):
2210569
PAR ID:
10629252
Author(s) / Creator(s):
;
Publisher / Repository:
Wiley
Date Published:
Journal Name:
Statistics in Medicine
Volume:
43
Issue:
7
ISSN:
0277-6715
Page Range / eLocation ID:
1329 to 1340
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We develop a method of classified mixed model prediction based on generalized linear mixed models that incorporate pseudo‐prior information to improve prediction accuracy. We establish consistency of the proposed method both in terms of prediction of the true mixed effect of interest and in terms of correctly identifying the potential class corresponding to the new observations if such a class matching one of the training data classes exists. Empirical results, including simulation studies and real‐data validation, fully support the theoretical findings.
  2. We propose a new classified mixed model prediction (CMMP) procedure, called pseudo-Bayesian CMMP, that uses network information in matching the group index between the training data and new data, whose characteristics of interest one wishes to predict. The current CMMP procedures do not incorporate such information; as a result, the methods are not consistent in terms of matching the group index. Although, as the number of training data groups increases, the current CMMP method can predict the mixed effects of interest consistently, its accuracy is not guaranteed when the number of groups is moderate, as is the case in many potential applications. The proposed pseudo-Bayesian CMMP procedure assumes a flexible working probability model for the group index of the new observation to match the index of a training data group, which may be viewed as a pseudo prior. We show that, given any working model satisfying mild conditions, the pseudo-Bayesian CMMP procedure is consistent and asymptotically optimal both in terms of matching the group index and in terms of predicting the mixed effect of interest associated with the new observations. The theoretical results are fully supported by results of empirical studies, including Monte-Carlo simulations and real-data validation.
  3. Machine learning models are susceptible to being misled by biases in training data that emphasize incidental correlations over the intended learning task. In this study, we demonstrate the impact of data bias on the performance of a machine learning model designed to predict the likelihood of synthesizability of crystal compounds. The model performs a binary classification on labeled crystal samples. Despite using the same architecture for the machine learning model, we showcase how the model’s learning and prediction behavior differs once trained on distinct data. We use two data sets for illustration: a mixed-source data set that integrates experimental and computational crystal samples and a single-source data set consisting of data exclusively from one computational database. We present simple procedures to detect data bias and to evaluate its effect on the model’s performance and generalization. This study reveals how inconsistent, unbalanced data can propagate bias, undermining real-world applicability even for advanced machine learning techniques.
  4. While machine learning models perform well on offline data, assessing their performance in real-world, resource-constrained environments (considering accuracy, prediction time, power consumption, and memory usage) is crucial for practical applications. This research implements a mobile-based Human Activity Recognition solution to classify three postures (sitting, standing, and walking) using smartphone sensors, specifically the accelerometer, gyroscope, and magnetometer. Time-domain features extracted from these sensors were used, with Random Forest employed for feature selection. One traditional machine learning model, Logistic Regression, and one deep learning model, Convolutional Neural Network, were trained and deployed via an Android application for real-time evaluation. While the Convolutional Neural Network achieved higher accuracy and better memory efficiency, Logistic Regression demonstrated faster prediction times during real-time use. Both models showed reduced accuracy for standing and walking postures in real-world conditions, emphasizing the challenges of deploying machine learning models in dynamic environments. This study highlights the importance of evaluating machine learning models in real-world settings to ensure reliability and efficiency, particularly in resource-constrained environments.
  5. Public genomic repositories are notoriously lacking in racially and ethnically diverse samples. This limits the reach of exploration and has in fact been one of the driving factors for the initiation of the All of Us project. Our particular focus here is to provide a model-based framework for accurately predicting DNA methylation from genetic data using racially sparse public repository data. Epigenetic alterations are of great interest in cancer research, but public repository data are limited in the information they provide, whereas genetic data are more plentiful. Our phenotype of interest is cervical cancer in The Cancer Genome Atlas (TCGA) repository. Being able to generate such predictions would nicely complement other work that has generated gene-level predictions of gene expression for normal samples. We develop a new prediction approach which uses shared random effects from a nested error mixed effects regression model. The sharing of random effects allows borrowing of strength across racial groups, greatly improving predictive accuracy. Additionally, we show how to further borrow strength by combining data from different cancers in TCGA even though the focus of our predictions is DNA methylation in cervical cancer. We compare our methodology against other popular approaches, including the elastic net shrinkage estimator and random forest prediction. Results are very encouraging, with the shared classified random effects approach uniformly producing more accurate predictions, both overall and for each racial group.