This content will become publicly available on March 27, 2026

Title: A Machine Learning Model for Post-Concussion Musculoskeletal Injury Risk in Collegiate Athletes
Abstract. Background: Emerging evidence indicates an elevated risk of post-concussion musculoskeletal injuries in collegiate athletes; however, how to identify the athletes at highest risk remains unclear. Objective: The purpose of this study was to model post-concussion musculoskeletal injury risk in collegiate athletes by integrating a comprehensive set of variables using machine learning. Methods: A risk model was developed and tested on a dataset of 194 athletes (155 in the training set and 39 in the test set) with 135 variables entered into the analysis, which included participants' health and athletic history, concussion injury and recovery-specific criteria, and outcomes from a diverse array of concussion assessments. The machine learning approach involved transforming variables by the weight of evidence method, variable selection using L1-penalized logistic regression, model selection via the Akaike Information Criterion, and a final L2-regularized logistic regression fit. Results: A model with 48 predictive variables yielded significant predictive performance of subsequent musculoskeletal injury with an area under the curve of 0.82. Top predictors included cognitive, balance, and reaction measures at baseline and acute timepoints. At a specified false-positive rate of 6.67%, the model achieves a true-positive rate (sensitivity) of 79% and a precision (positive predictive value) of 95% for identifying at-risk athletes via a well-calibrated composite risk score. Conclusions: These results support the development of a sensitive and specific injury risk model using standard data combined with a novel methodological approach that may allow clinicians to target student athletes at high risk of injury. The development and refinement of predictive models that incorporate machine learning and comprehensive datasets could improve identification of the student athletes most at risk for post-concussion musculoskeletal injury and allow targeted injury risk reduction strategies to be implemented.
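The quantities named in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the smoothing constant, the sign convention for weight of evidence, and the toy data are all assumptions made for illustration, and the L1 and L2 logistic regression fits are not reproduced here. The sketch shows a weight of evidence mapping for one binned variable, the Akaike Information Criterion formula, and the rate definitions behind the reported false-positive rate, sensitivity, and precision.

```python
import math
from collections import Counter


def woe_table(values, labels, smoothing=0.5):
    """Weight of evidence per category: ln(P(cat | event) / P(cat | non-event)).

    The sign convention and additive smoothing are assumptions; the abstract
    does not specify either.
    """
    events = Counter(v for v, y in zip(values, labels) if y == 1)
    non_events = Counter(v for v, y in zip(values, labels) if y == 0)
    categories = sorted(set(values))
    n_event = sum(events.values()) + smoothing * len(categories)
    n_non_event = sum(non_events.values()) + smoothing * len(categories)
    table = {}
    for c in categories:
        p_event = (events[c] + smoothing) / n_event
        p_non_event = (non_events[c] + smoothing) / n_non_event
        table[c] = math.log(p_event / p_non_event)
    return table


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


def rates(tp, fp, fn, tn):
    """False-positive rate, true-positive rate, and precision from counts."""
    return {
        "fpr": fp / (fp + tn),
        "tpr": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


if __name__ == "__main__":
    # Toy data (hypothetical): a binned reaction measure and injury labels.
    bins = ["slow", "slow", "slow", "fast"]
    injured = [1, 1, 0, 0]
    print(woe_table(bins, injured))

    # Hypothetical counts, not reported in the abstract.
    print(rates(tp=19, fp=1, fn=5, tn=14))
```

The counts in the last call are hypothetical. They were chosen because they sum to 39, the stated test-set size, and reproduce the reported figures (1/15 = 6.67%, 19/24 = 79%, 19/20 = 95%); the abstract itself does not state the confusion matrix or confirm that the rates were computed on the test set.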
Award ID(s):
2413833
PAR ID:
10599820
Publisher / Repository:
Springer
Journal Name:
Sports Medicine
ISSN:
0112-1642
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Background: Emerging evidence indicates an elevated risk of post-concussion musculoskeletal (MSK) injuries in collegiate athletes; however, how to identify the athletes at highest risk remains unclear. Objective: The purpose of this study was to model post-concussion MSK injury risk in collegiate athletes by integrating a comprehensive set of variables using machine learning. Methods: A risk model was developed and tested on a dataset of 194 athletes (155 in the training set and 39 in the test set) with 135 variables entered into the analysis, which included participants' health and athletic history, concussion injury and recovery-specific criteria, and outcomes from a diverse array of concussion assessments. The machine learning approach involved transforming variables by the weight of evidence method, variable selection using L1-penalized logistic regression, model selection via the Akaike Information Criterion, and a final L2-regularized logistic regression fit. Results: A model with 48 predictive variables yielded significant predictive performance of subsequent MSK injury with an area under the curve of 0.82. Top predictors included cognitive, balance, and reaction measures at baseline and acute timepoints. At a specified false-positive rate of 6.67%, the model achieves a true-positive rate (sensitivity) of 79% and a precision (positive predictive value) of 95% for identifying at-risk athletes via a well-calibrated composite risk score. Conclusion: These results support the development of a sensitive and specific injury risk model using standard data combined with a novel methodological approach that may allow clinicians to target student-athletes at high risk of injury. The development and refinement of predictive models that incorporate machine learning and comprehensive datasets could improve identification of the student-athletes most at risk for post-concussion MSK injury and allow targeted injury risk reduction strategies to be implemented. Key Points: There is a well-established elevated risk of subsequent musculoskeletal injury after concussion; however, prior efforts have failed to identify risk factors. This study developed a composite risk score model with an AUC of 0.82 from common concussion clinical measures and participant demographics. By identifying athletes at elevated risk, clinicians may be able to reduce injury risk through targeted injury risk reduction programs.
  2. Context: Temporal prediction of lower extremity (LE) injury risk will benefit clinicians by allowing them to better leverage limited resources and target the athletes most at risk. Objective: To characterize instantaneous risk of LE injury by the demographic factors of sex, sport, body mass index (BMI), and previous injury history. Instantaneous injury risk was defined as injury risk at any given point in time following baseline measurement. Design: Descriptive epidemiology study. Setting: NCAA Division I athletic program. Patients or Other Participants: 278 NCAA Division I varsity student-athletes (119 males, 159 females). Main Outcome Measure(s): LE injuries were tracked for 237±235 days. Sex-stratified univariate Cox regression models investigated the association between time to first LE injury and BMI, sport, and previous LE injury history. Relative risk ratios and Kaplan-Meier curves were generated. Variables identified in the univariate analysis were included in a multivariate Cox regression model. Results: Females displayed similar instantaneous LE injury risk compared with males (HR=1.29, 95%CI=[0.91,1.83], p=0.16). Overweight athletes (BMI>25 kg/m²) had similar instantaneous LE injury risk compared with athletes with BMI<25 kg/m² (HR=1.23, 95%CI=[0.84,1.82], p=0.29). Athletes with previous LE injuries were not more likely to sustain subsequent LE injury than athletes with no previous injury (HR=1.09, 95%CI=[0.76,1.54], p=0.64). Basketball (HR=3.12, 95%CI=[1.51,6.44], p=0.002) and soccer (HR=2.78, 95%CI=[1.46,5.31], p=0.002) athletes had higher risk of LE injury than cross-country athletes. In the multivariate model, females were at greater LE injury risk than males (HR=1.55, 95%CI=[1.00,2.39], p=0.05), and males with BMI>25 kg/m² were at greater risk than all other athletes (HR=0.44, 95%CI=[0.19,1.00], p=0.05). Conclusions: In a collegiate athletic population, previous LE injury history was not a significant contributor to risk of future LE injury, while being female or being male with BMI>25 kg/m² resulted in increased risk of LE injury. Clinicians can use these data to extrapolate LE injury risk occurrence to specific populations.
  3. Introduction: Predictive models have been used to aid early diagnosis of PCOS, though existing models are based on small sample sizes and limited to fertility clinic populations. We built a predictive model using machine learning algorithms based on an outpatient population at risk for PCOS to predict risk and facilitate earlier diagnosis, particularly among those who meet diagnostic criteria but have not received a diagnosis. Methods: This is a retrospective cohort study from a safety-net hospital's electronic health records (EHR) from 2003-2016. The study population included 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics and gynecology, endocrinology, family medicine, or general internal medicine. Four prediction outcomes were assessed for PCOS. The first outcome was PCOS ICD-9 diagnosis with additional model outcomes of algorithm-defined PCOS. The latter was based on Rotterdam criteria and merging laboratory values, radiographic imaging, and ICD data from the EHR to define irregular menstruation, hyperandrogenism, and polycystic ovarian morphology on ultrasound. Results: We developed predictive models using four machine learning methods: logistic regression, support vector machine, gradient boosted trees, and random forests. Hormone values (follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone binding globulin) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved an average AUC of 85%, 81%, 80%, and 82%, respectively, in Models I, II, III, and IV. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and positive bHCG. Conclusion: Machine learning algorithms were used to predict PCOS based on a large at-risk population. This approach may guide early detection of PCOS within EHR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. Our model illustrates the potential benefits of an artificial intelligence-enabled provider assistance tool that can be integrated into the EHR to reduce delays in diagnosis. However, model validation in other hospital-based populations is necessary.
  4. Background: Previous studies have examined the effect of whole body (WB) parameters on anterior cruciate ligament (ACL) strain and loads, as well as knee joint kinetics and kinematics. However, articular cartilage damage occurs in relation to ACL failure, and the effect of WB parameters on ACL strain and articular cartilage biomechanics during dynamic tasks is unclear. Purposes: (1) To investigate the effect of WB parameters on ACL strain, as well as articular cartilage stress and contact force, during a single-leg cross drop (SLCD) and single-leg drop (SLD). (2) To identify WB parameters predictive of high ACL strain during these tasks. Study Design: Descriptive laboratory study. Methods: Three-dimensional motion analysis data from 14 physically active men and women were recorded during an SLCD and SLD. OpenSim was used to obtain their kinematics, kinetics, and muscle forces for the WB model. Using these data in kinetically driven finite element simulations of the knee joint produced outputs of ACL strains and articular cartilage stresses and contact forces. Spearman correlation coefficients were used to assess relationships between WB parameters and ACL strain and cartilage biomechanics. Moreover, receiver operating characteristic curve analyses and multivariate binary logistic regressions were used to find the WB parameters that could discriminate high from low ACL strain trials. Results: Correlations showed that more lumbar rotation away from the stance limb at peak ACL strain had the strongest overall association (ρ = 0.877) with peak ACL strain. Higher knee anterior shear force (ρ = 0.895) and lower gluteus maximus muscle force (ρ = 0.89) at peak ACL strain demonstrated the strongest associations with peak articular cartilage stress or contact force in ≥1 of the analyzed tasks. The regression model that used muscle forces to predict high ACL strain trials during the dominant limb SLD yielded the highest accuracy (93.5%), sensitivity (0.881), and specificity (0.952) among all regression models. Conclusion: WB parameters that were most consistently associated with and predictive of high ACL strain and poor articular cartilage biomechanics during the SLCD and SLD tasks included greater knee abduction angle at initial contact and higher anterior shear force at peak ACL strain, as well as lower gracilis, gluteus maximus, and medial gastrocnemius muscle forces. Clinical relevance: Knowledge of which landing postures create a high risk for ACL or cartilage injury may help reduce injuries in athletes by avoiding those postures and practicing the tasks with reduced high-risk motions, as well as by strengthening the muscles that protect the knee during single-leg landings.
  5. Abstract. Aims: To develop machine‐learning algorithms for predicting the risk of a hospitalization or emergency department (ED) visit for opioid use disorder (OUD) (i.e. OUD acute events) in Pennsylvania Medicaid enrollees in the Opioid Use Disorder Centers of Excellence (COE) program and to evaluate the fairness of model performance across racial groups. Methods: We studied 20 983 United States Medicaid enrollees aged 18 years or older who had COE visits between April 2019 and March 2021. We applied multivariate logistic regression, least absolute shrinkage and selection operator models, random forests, and eXtreme Gradient Boosting (XGB) to predict OUD acute events following the initial COE visit. Our models included predictors at the system, patient, and regional levels. We assessed model performance using multiple metrics by racial groups. Individuals were divided into low‐, medium‐, and high‐risk groups based on predicted risk scores. Results: The training (n = 13 990) and testing (n = 6993) samples displayed similar characteristics (mean age 38.1 ± 9.3 years, 58% male, 80% White enrollees) with 4% experiencing OUD acute events at baseline. XGB demonstrated the best prediction performance (C‐statistic = 76.6% [95% confidence interval = 75.6%–77.7%] vs. 72.8%–74.7% for other methods). At the balanced cutoff, XGB achieved a sensitivity of 68.2%, specificity of 70.0%, and positive predictive value of 8.3%. The XGB model classified the testing sample into high‐risk (6%), medium‐risk (30%), and low‐risk (63%) groups. In the high‐risk group, 40.7% had OUD acute events vs. 16.5% and 5.0% in the medium‐ and low‐risk groups. The high‐ and medium‐risk groups captured 44% and 26% of individuals with OUD events. The XGB model exhibited lower false negative rates and higher false positive rates in racial/ethnic minority groups than White enrollees. Conclusions: New machine‐learning algorithms perform well to predict risks of opioid use disorder (OUD) acute care use among United States Medicaid enrollees and improve fairness of prediction across racial and ethnic groups compared with previous OUD‐related models.