This content will become publicly available on January 31, 2026

Title: A Machine Learning Model for Post-Concussion Musculoskeletal Injury Risk in Collegiate Athletes
Abstract
Background: Emerging evidence indicates an elevated risk of post-concussion musculoskeletal (MSK) injuries in collegiate athletes; however, how to identify the athletes at highest risk remains unclear.
Objective: The purpose of this study was to model post-concussion MSK injury risk in collegiate athletes by applying machine learning to a comprehensive set of variables.
Methods: A risk model was developed and tested on a dataset of 194 athletes (155 in the training set and 39 in the test set) with 135 variables entered into the analysis, which included participants' health and athletic history, concussion injury and recovery-specific criteria, and outcomes from a diverse array of concussion assessments. The machine learning approach involved transforming variables by the Weight of Evidence method, variable selection using L1-penalized logistic regression, model selection via the Akaike Information Criterion, and a final L2-regularized logistic regression fit.
Results: A model with 48 predictive variables yielded significant predictive performance for subsequent MSK injury, with an area under the curve (AUC) of 0.82. Top predictors included cognitive, balance, and reaction measures at the Baseline and Acute timepoints. At a specified false positive rate of 6.67%, the model achieves a true positive rate (sensitivity) of 79% and a precision (positive predictive value) of 95% for identifying at-risk athletes via a well-calibrated composite risk score.
Conclusion: These results support the development of a sensitive and specific injury risk model using standard data combined with a novel methodological approach that may allow clinicians to target student-athletes at high injury risk. The development and refinement of predictive models that incorporate machine learning and comprehensive datasets could improve identification of the student-athletes most at risk for post-concussion MSK injury and allow targeted injury risk reduction strategies to be implemented.
Key Points: There is a well-established elevated risk of subsequent musculoskeletal injury after concussion; however, prior efforts have failed to identify risk factors. This study developed a composite risk score model with an AUC of 0.82 from common concussion clinical measures and participant demographics. By identifying athletes at elevated risk, clinicians may be able to reduce injury risk through targeted injury risk reduction programs.
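To make the first step of the Methods concrete, the following is a minimal sketch of a Weight of Evidence (WoE) transform for one binned variable. It is not the authors' implementation: the bin labels, the toy outcomes, the `smoothing` pseudo-count, and the function name `weight_of_evidence` are all assumptions made for illustration, and the sign convention used here (log of the share of injuries over the share of non-injuries) is only one of the two conventions in common use.

```python
import math
from collections import Counter


def weight_of_evidence(bins, outcomes, smoothing=0.5):
    """Map each bin label to its Weight of Evidence.

    bins: bin label per athlete (hypothetical, e.g. tertiles of a balance score)
    outcomes: 1 if the athlete had a subsequent MSK injury, else 0
    smoothing: assumed pseudo-count added to every cell to avoid log(0)
    """
    events = Counter(b for b, y in zip(bins, outcomes) if y == 1)
    non_events = Counter(b for b, y in zip(bins, outcomes) if y == 0)
    labels = sorted(set(bins))

    # Smoothed totals, so the per-bin shares still sum to 1.
    total_events = sum(events.values()) + smoothing * len(labels)
    total_non_events = sum(non_events.values()) + smoothing * len(labels)

    woe = {}
    for label in labels:
        share_events = (events[label] + smoothing) / total_events
        share_non_events = (non_events[label] + smoothing) / total_non_events
        # Positive WoE: the bin holds a larger share of injuries than of non-injuries.
        woe[label] = math.log(share_events / share_non_events)
    return woe


# Toy data, invented for illustration only (not from the study).
bins = ["low"] * 4 + ["mid"] * 4 + ["high"] * 4
outcomes = [1, 1, 1, 0,  1, 1, 0, 0,  0, 0, 0, 1]

woe = weight_of_evidence(bins, outcomes)

# Replace each raw bin label with its WoE value; this numeric column is what
# a downstream logistic regression would receive as input.
encoded = [woe[b] for b in bins]
```

Because every variable is mapped onto the same log-odds-like scale, the penalized logistic regression steps named in the Methods can compare coefficients across variables that were originally measured in very different units.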
Award ID(s):
2413833
PAR ID:
10599738
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
medRxiv
Date Published:
Format(s):
Medium: X
Institution:
medRxiv
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Background: Emerging evidence indicates an elevated risk of post-concussion musculoskeletal injuries in collegiate athletes; however, how to identify the athletes at highest risk remains unclear. Objective: The purpose of this study was to model post-concussion musculoskeletal injury risk in collegiate athletes by applying machine learning to a comprehensive set of variables. Methods: A risk model was developed and tested on a dataset of 194 athletes (155 in the training set and 39 in the test set) with 135 variables entered into the analysis, which included participants' health and athletic history, concussion injury and recovery-specific criteria, and outcomes from a diverse array of concussion assessments. The machine learning approach involved transforming variables by the weight of evidence method, variable selection using L1-penalized logistic regression, model selection via the Akaike Information Criterion, and a final L2-regularized logistic regression fit. Results: A model with 48 predictive variables yielded significant predictive performance for subsequent musculoskeletal injury, with an area under the curve of 0.82. Top predictors included cognitive, balance, and reaction measures at baseline and acute timepoints. At a specified false-positive rate of 6.67%, the model achieves a true-positive rate (sensitivity) of 79% and a precision (positive predictive value) of 95% for identifying at-risk athletes via a well-calibrated composite risk score. Conclusions: These results support the development of a sensitive and specific injury risk model using standard data combined with a novel methodological approach that may allow clinicians to target student athletes at high injury risk. The development and refinement of predictive models that incorporate machine learning and comprehensive datasets could improve identification of the student athletes most at risk for post-concussion musculoskeletal injury and allow targeted injury risk reduction strategies to be implemented.
  2. Context: Temporal prediction of lower extremity (LE) injury risk will benefit clinicians by allowing them to better leverage limited resources and target athletes most at risk. Objective: To characterize instantaneous risk of LE injury by demographic factors sex, sport, body mass index (BMI), and previous injury history. Instantaneous injury risk was defined as injury risk at any given point in time following baseline measurement. Design: Descriptive epidemiology study. Setting: NCAA Division I athletic program. Patients or Other Participants: 278 NCAA Division I varsity student-athletes (119 males, 159 females). Main Outcome Measure(s): LE injuries were tracked for 237±235 days. Sex-stratified univariate Cox regression models investigated the association between time to first LE injury and BMI, sport, and previous LE injury history. Relative risk ratios and Kaplan-Meier curves were generated. Variables identified in the univariate analysis were included in a multivariate Cox regression model. Results: Females displayed similar instantaneous LE injury risk compared to males (HR=1.29, 95%CI=[0.91,1.83], p=0.16). Overweight athletes (BMI>25 kg/m²) had similar instantaneous LE injury risk compared with athletes with BMI<25 kg/m² (HR=1.23, 95%CI=[0.84,1.82], p=0.29). Athletes with previous LE injuries were not more likely to sustain subsequent LE injury than athletes with no previous injury (HR=1.09, 95%CI=[0.76,1.54], p=0.64). Basketball (HR=3.12, 95%CI=[1.51,6.44], p=0.002) and soccer (HR=2.78, 95%CI=[1.46,5.31], p=0.002) athletes had higher risk of LE injury than cross-country athletes. In the multivariate model, females were at greater LE injury risk than males (HR=1.55, 95%CI=[1.00,2.39], p=0.05), and males with BMI>25 kg/m² were at greater risk than all other athletes (HR=0.44, 95%CI=[0.19,1.00], p=0.05). Conclusions: In a collegiate athletic population, previous LE injury history was not a significant contributor to risk of future LE injury, while being female or being male with BMI>25 kg/m² resulted in increased risk of LE injury. Clinicians can use these data to extrapolate LE injury risk occurrence to specific populations.
  3. Abstract. Introduction: Identifying mild cognitive impairment (MCI) patients at risk for dementia could facilitate early interventions. Using electronic health records (EHRs), we developed a model to predict MCI to all‐cause dementia (ACD) conversion at 5 years. Methods: A Cox proportional hazards model was used to identify predictors of ACD conversion from EHR data in veterans with MCI. Model performance (area under the receiver operating characteristic curve [AUC] and Brier score) was evaluated on a held‐out data subset. Results: Of 59,782 MCI patients, 15,420 (25.8%) converted to ACD. The model had good discriminative performance (AUC 0.73 [95% confidence interval (CI) 0.72–0.74]) and calibration (Brier score 0.18 [95% CI 0.17–0.18]). Age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors, while body mass index, alcohol abuse, and sleep apnea were protective factors. Discussion: The EHR‐based prediction model had good performance in identifying 5‐year MCI to ACD conversion and has potential to assist triaging of at‐risk patients. Highlights: Of 59,782 veterans with mild cognitive impairment (MCI), 15,420 (25.8%) converted to all‐cause dementia within 5 years. Electronic health record prediction models demonstrated good performance (area under the receiver operating characteristic curve 0.73; Brier 0.18). Age and vascular‐related morbidities were predictors of dementia conversion. Synthetic data was comparable to real data in modeling MCI to dementia conversion. Key Points: An electronic health record–based model using demographic and co‐morbidity data had good performance in identifying veterans who convert from mild cognitive impairment (MCI) to all‐cause dementia (ACD) within 5 years. Increased age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors for 5‐year conversion from MCI to ACD. High body mass index, alcohol abuse, and sleep apnea were protective factors for 5‐year conversion from MCI to ACD. Models using synthetic data (analogs of real patient data that retain the distribution, density, and covariance between variables of real patient data but are not attributable to any specific patient) performed just as well as models using real patient data. This could have significant implications for facilitating widely distributed computing of health‐care data with minimized patient privacy concerns, which could accelerate scientific discoveries.
  4. Abstract. Background: Limited research has explored the effect of cardiovascular risk and amyloid interplay on cognitive decline in East Asians. Methods: Vascular burden was quantified using Framingham's General Cardiovascular Risk Score (FRS) in 526 Korean Brain Aging Study (KBASE) participants. Cognitive differences in groups stratified by FRS and amyloid positivity were assessed at baseline and longitudinally. Results: Baseline analyses revealed that amyloid‐negative (Aβ–) cognitively normal (CN) individuals with high FRS had lower cognition compared to Aβ– CN individuals with low FRS (p < 0.0001). Longitudinally, amyloid pathology predominantly drove cognitive decline, while FRS alone had negligible effects on cognition in CN and mild cognitive impairment (MCI) groups. Conclusion: Our findings indicate that managing vascular risk may be crucial in preserving cognition in Aβ– individuals early on and before the clinical manifestation of dementia. Within the CN and MCI groups, irrespective of FRS status, amyloid‐positive individuals had worse cognitive performance than Aβ– individuals. Highlights: Vascular risk significantly affects cognition in amyloid‐negative older Koreans. Amyloid‐negative CN older adults with high vascular risk had lower baseline cognition. Amyloid pathology drives cognitive decline in CN and MCI, regardless of vascular risk. The study underscores the impact of vascular health on the AD disease spectrum.
  5. Abstract. Introduction: Alzheimer's disease (AD) initiates years prior to symptoms, underscoring the importance of early detection. While amyloid accumulation starts early, individuals with substantial amyloid burden may remain cognitively normal, implying that amyloid alone is not sufficient for early risk assessment. Methods: Given the genetic susceptibility of AD, a multi‐factorial pseudotime approach was proposed to integrate amyloid imaging and genotype data for estimating a risk score. Validation involved association with cognitive decline and survival analysis across risk‐stratified groups, focusing on patients with mild cognitive impairment (MCI). Results: Our risk score outperformed amyloid composite standardized uptake value ratio in correlation with cognitive scores. MCI subjects with lower pseudotime risk score showed substantial delayed onset of AD and slower cognitive decline. Moreover, pseudotime risk score demonstrated strong capability in risk stratification within traditionally defined subgroups such as early MCI, apolipoprotein E (APOE) ε4+ MCI, APOE ε4– MCI, and amyloid+ MCI. Discussion: Our risk score holds great potential to improve the precision of early risk assessment. Highlights: Accurate early risk assessment is critical for the success of clinical trials. A new risk score was built from integrating amyloid imaging and genetic data. Our risk score demonstrated improved capability in early risk stratification.