Abstract. Background: Emerging evidence indicates an elevated risk of post-concussion musculoskeletal injuries in collegiate athletes; however, how to identify the athletes at highest risk remains unclear. Objective: The purpose of this study was to model post-concussion musculoskeletal injury risk in collegiate athletes by applying machine learning to a comprehensive set of variables. Methods: A risk model was developed and tested on a dataset of 194 athletes (155 in the training set and 39 in the test set) with 135 variables entered into the analysis, which included participants' health and athletic history, concussion injury and recovery-specific criteria, and outcomes from a diverse array of concussion assessments. The machine learning approach involved transforming variables by the weight of evidence method, variable selection using L1-penalized logistic regression, model selection via the Akaike Information Criterion, and a final L2-regularized logistic regression fit. Results: A model with 48 predictive variables yielded significant predictive performance for subsequent musculoskeletal injury, with an area under the curve of 0.82. Top predictors included cognitive, balance, and reaction measures at baseline and acute timepoints. At a specified false-positive rate of 6.67%, the model achieves a true-positive rate (sensitivity) of 79% and a precision (positive predictive value) of 95% for identifying at-risk athletes via a well-calibrated composite risk score. Conclusions: These results support the development of a sensitive and specific injury risk model using standard data combined with a novel methodological approach that may allow clinicians to target student athletes at high risk of injury.
Developing and refining predictive models that combine machine learning with comprehensive datasets could improve identification of the student athletes most at risk for post-concussion musculoskeletal injury and allow targeted injury risk reduction strategies to be implemented.
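The weight of evidence transform named in the Methods can be illustrated with a minimal sketch. It assumes the common definition, in which each category of a variable is replaced by the log ratio of its share among injured athletes to its share among uninjured athletes. The function name, the additive smoothing term, and the toy data are assumptions for illustration and are not taken from the study.

```python
import math
from collections import defaultdict


def weight_of_evidence(bins, outcomes, smoothing=0.5):
    """Return {bin label: ln(P(bin | event) / P(bin | non-event))}.

    `smoothing` is a hypothetical additive constant that keeps the
    logarithm finite when a bin contains no events or no non-events.
    """
    events = defaultdict(float)
    non_events = defaultdict(float)
    for label, outcome in zip(bins, outcomes):
        if outcome:
            events[label] += 1
        else:
            non_events[label] += 1

    labels = set(events) | set(non_events)
    total_events = sum(events.values()) + smoothing * len(labels)
    total_non_events = sum(non_events.values()) + smoothing * len(labels)

    woe = {}
    for label in labels:
        p_event = (events[label] + smoothing) / total_events
        p_non_event = (non_events[label] + smoothing) / total_non_events
        woe[label] = math.log(p_event / p_non_event)
    return woe


# Toy data (illustrative only): a binned score and an injury indicator.
toy_bins = ["low", "low", "high", "high", "high", "low"]
toy_outcomes = [0, 0, 1, 1, 0, 1]
toy_woe = weight_of_evidence(toy_bins, toy_outcomes)
```

A positive value marks a category that is over-represented among events, a negative value one that is under-represented. Because every variable ends up on the same log-odds scale, the transformed variables can be passed directly to a penalized logistic regression.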
Predicting implementation of response to intervention in math using elastic net logistic regression
Introduction: The primary objective of this study was to identify variables that significantly influence the implementation of math Response to Intervention (RTI) at the school level, using the ECLS-K: 2011 dataset. Methods: Because the original dataset contained missing values, a Random Forest algorithm was used for data imputation, generating a total of 10 imputed datasets. Elastic net logistic regression, combined with nested cross-validation, was applied to each imputed dataset, potentially resulting in 10 models with different variables. Variables for the models derived from the imputed datasets were selected using four methods, leading to four candidate models for final selection. These models were assessed on their prediction accuracy, and the model that outperformed the others was selected as the final model. Results and discussion: Method50 and Methodcoef emerged as the most effective, achieving a balanced accuracy of 0.852. The final model selected relevant variables that effectively predicted RTI. The predictive accuracy of the final model was also demonstrated by the receiver operating characteristic (ROC) plot and the corresponding area under the curve (AUC) value, indicating its ability to accurately forecast math RTI implementation in schools for the following year.
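The balanced accuracy reported above can be illustrated with a short sketch. It assumes the standard binary definition, the mean of sensitivity and specificity; the abstract does not spell out the formula used, and the function name and toy labels below are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true positives recovered
    specificity = tn / (tn + fp)  # share of true negatives recovered
    return (sensitivity + specificity) / 2


# Toy labels (illustrative only): 4 positives and 2 negatives.
toy_true = [1, 1, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 1]
toy_score = balanced_accuracy(toy_true, toy_pred)
```

Here sensitivity is 0.75 and specificity is 0.5, so the balanced accuracy is 0.625, while plain accuracy is 4/6. The gap shows why balanced accuracy is often preferred when the two classes differ in size.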
- Award ID(s): 1749275
- PAR ID: 10625431
- Publisher / Repository: Frontiers Media
- Date Published:
- Journal Name: Frontiers in Psychology
- Volume: 15
- ISSN: 1664-1078
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract. Background: Emerging evidence indicates an elevated risk of post-concussion musculoskeletal (MSK) injuries in collegiate athletes; however, how to identify the athletes at highest risk remains unclear. Objective: The purpose of this study was to model post-concussion MSK injury risk in collegiate athletes by applying machine learning to a comprehensive set of variables. Methods: A risk model was developed and tested on a dataset of 194 athletes (155 in the training set and 39 in the test set) with 135 variables entered into the analysis, which included participants' health and athletic history, concussion injury and recovery-specific criteria, and outcomes from a diverse array of concussion assessments. The machine learning approach involved transforming variables by the Weight of Evidence method, variable selection using L1-penalized logistic regression, model selection via the Akaike Information Criterion, and a final L2-regularized logistic regression fit. Results: A model with 48 predictive variables yielded significant predictive performance for subsequent MSK injury, with an area under the curve of 0.82. Top predictors included cognitive, balance, and reaction measures at Baseline and Acute timepoints. At a specified false positive rate of 6.67%, the model achieves a true positive rate (sensitivity) of 79% and a precision (positive predictive value) of 95% for identifying at-risk athletes via a well-calibrated composite risk score. Conclusion: These results support the development of a sensitive and specific injury risk model using standard data combined with a novel methodological approach that may allow clinicians to target student-athletes at high risk of injury.
Developing and refining predictive models that combine machine learning with comprehensive datasets could improve identification of the student-athletes most at risk for post-concussion MSK injury and allow targeted injury risk reduction strategies to be implemented. Key Points: There is a well-established elevated risk of subsequent musculoskeletal injury after concussion; however, prior efforts have failed to identify risk factors. This study developed a composite risk score model with an AUC of 0.82 from common concussion clinical measures and participant demographics. By identifying athletes at elevated risk, clinicians may be able to reduce injury risk through targeted injury risk reduction programs.
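As a sketch of how the three reported rates relate to one another, the snippet below computes sensitivity, false positive rate, and precision from confusion-matrix counts. The counts are hypothetical: the abstract does not report the confusion matrix, and these values were chosen only because they sum to the 39-athlete test set and reproduce the reported rates to rounding.

```python
def rates_from_counts(tp, fp, tn, fn):
    """Return (sensitivity, false positive rate, precision) from counts."""
    sensitivity = tp / (tp + fn)           # true positive rate
    false_positive_rate = fp / (fp + tn)   # 1 - specificity
    precision = tp / (tp + fp)             # positive predictive value
    return sensitivity, false_positive_rate, precision


# Hypothetical counts (NOT from the source): 24 injured and 15 uninjured athletes.
sens, fpr, prec = rates_from_counts(tp=19, fp=1, tn=14, fn=5)
print(f"sensitivity={sens:.2%}  FPR={fpr:.2%}  precision={prec:.2%}")
```

Under these assumed counts, a single false positive among 15 uninjured athletes corresponds to the 6.67% false positive rate. This illustrates how coarse such rates are on a test set of this size.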
-
Abstract. Motivation: Integrating multiple omics datasets can significantly advance our understanding of disease mechanisms, physiology, and treatment responses. However, a major challenge in multi-omics studies is the disparity in sample sizes across different datasets, which can introduce bias and reduce statistical power. To address this issue, we propose a novel framework, OmicsNMF, designed to impute missing omics data and enhance disease phenotype prediction. OmicsNMF integrates Generative Adversarial Networks (GANs) with Non-Negative Matrix Factorization (NMF). NMF is a well-established method for uncovering underlying patterns in omics data, while GANs enhance the imputation process by generating realistic data samples. This synergy aims to more effectively address sample size disparity, thereby improving data integration and prediction accuracy. Results: For evaluation, we focused on predicting breast cancer subtypes using the imputed data generated by our proposed framework, OmicsNMF. Our results indicate that OmicsNMF consistently outperforms baseline methods. We further assessed the quality of the imputed data through survival analysis, revealing that the imputed omics profiles provide significant prognostic power for both overall survival and disease-free status. Overall, OmicsNMF effectively leverages GANs and NMF to impute missing samples while preserving key biological features. This approach shows potential for advancing precision oncology by improving data integration and analysis. Availability and implementation: Source code is available at https://github.com/compbiolabucf/OmicsNMF.
-
Abstract. Motivation: Quality assessment (QA) of predicted protein tertiary structure models plays an important role in ranking and using them. With the recent development of deep learning end-to-end protein structure prediction techniques for generating highly confident tertiary structures for most proteins, it is important to explore corresponding QA strategies to evaluate and select the structural models predicted by them, since these models have better quality and different properties than the models predicted by traditional tertiary structure prediction methods. Results: We develop EnQA, a novel graph-based 3D-equivariant neural network method that is equivariant to rotation and translation of 3D objects, to estimate the accuracy of protein structural models by leveraging the structural features acquired from the state-of-the-art tertiary structure prediction method, AlphaFold2. We train and test the method on both traditional model datasets (e.g. the datasets of the Critical Assessment of Techniques for Protein Structure Prediction) and a new dataset of high-quality structural models predicted only by AlphaFold2 for the proteins whose experimental structures were released recently. Our approach achieves state-of-the-art performance on protein structural models predicted by both traditional protein structure prediction methods and the latest end-to-end deep learning method, AlphaFold2. It performs even better than the model QA scores provided by AlphaFold2 itself. The results illustrate that the 3D-equivariant graph neural network is a promising approach to the evaluation of protein structural models. Integrating AlphaFold2 features with other complementary sequence and structural features is important for improving protein model QA. Availability and implementation: The source code is available at https://github.com/BioinfoMachineLearning/EnQA. Supplementary information: Supplementary data are available at Bioinformatics online.
-
Abstract. Motivation: Multiple sequence alignment (MSA) is a basic step in many bioinformatics pipelines. However, achieving highly accurate alignments on large datasets, especially those with sequence length heterogeneity, is a challenging task. Ultra-large multiple sequence alignment using Phylogeny-aware Profiles (UPP) is a method for MSA estimation that builds an ensemble of Hidden Markov Models (eHMM) to represent an estimated alignment on the full-length sequences in the input, and then adds the remaining sequences into the alignment using selected HMMs in the ensemble. Although UPP provides good accuracy, it is computationally intensive on large datasets. Results: We present UPP2, a direct improvement on UPP. The main advance is a fast technique for selecting HMMs in the ensemble that allows us to achieve the same accuracy as UPP but with greatly reduced runtime. We show that UPP2 produces more accurate alignments compared to leading MSA methods on datasets exhibiting substantial sequence length heterogeneity and is among the most accurate otherwise. Availability and implementation: https://github.com/gillichu/sepp. Supplementary information: Supplementary data are available at Bioinformatics online.

