Abstract
Background: Predictive models utilizing social determinants of health (SDH), demographic data, and local weather data were trained to predict missed imaging appointments (MIA) among breast imaging patients at the Boston Medical Center (BMC). Patients were characterized by many different variables, including social needs, demographics, imaging utilization, appointment features, and weather conditions on the date of the appointment.
Methods: This HIPAA-compliant retrospective cohort study was IRB approved. Informed consent was waived. After data preprocessing steps, the dataset contained 9,970 patients and 36,606 appointments from 1/1/2015 to 12/31/2019. We identified 57 potentially impactful variables used in the initial prediction model and assessed each patient for MIA. We then developed a parsimonious model via recursive feature elimination, which identified the 25 most predictive variables. We utilized linear and non-linear models including support vector machines (SVM), logistic regression (LR), and random forest (RF) to predict MIA and compared their performance.
Results: The highest-performing full model was the non-linear RF, achieving the highest Area Under the ROC Curve (AUC) of 76% and average F1 score of 85%. Models limited to the most predictive variables were able to attain AUC and F1 scores comparable to models with all variables included. The variables most predictive of missed appointments included timing, prior appointment history, referral department of origin, and socioeconomic factors such as household income and access to caregiving services.
Conclusions: Prediction of MIA with the data available is inherently limited by the complex, multifactorial nature of MIA. However, the algorithms presented achieved acceptable performance and demonstrated that socioeconomic factors were useful predictors of MIA. In contrast with non-modifiable demographic factors, we can address SDH to decrease the incidence of MIA.
Depression predictions from GPS-based mobility do not generalize well to large demographically heterogeneous samples
Abstract Depression is one of the most common mental health issues in the United States, affecting the lives of millions of people suffering from it as well as those close to them. Recent advances in research on mobile sensing technologies and machine learning have suggested that a person’s depression can be passively measured by observing patterns in people’s mobility behaviors. However, the majority of work in this area has relied on highly homogeneous samples, most frequently college students. In this study, we analyse over 57 million GPS data points to show that the same procedure that leads to high prediction accuracy in a homogeneous student sample (N = 57; AUC = 0.82), leads to accuracies only slightly higher than chance in a U.S.-wide sample that is heterogeneous in its socio-demographic composition as well as mobility patterns (N = 5,262; AUC = 0.57). This pattern holds across three different modelling approaches which consider both linear and non-linear relationships. Further analyses suggest that the prediction accuracy is low across different socio-demographic groups, and that training the models on more homogeneous subsamples does not substantially improve prediction accuracy. Overall, the findings highlight the challenge of applying mobility-based predictions of depression at scale.
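To make the reported figures easier to read, the following is a minimal sketch of how an AUC value can be computed from labels and model scores, and why 0.5 corresponds to chance. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are assumptions made purely for illustration.

```python
def roc_auc(labels, scores):
    """Rank-based AUC (illustrative helper, not from the study).

    Returns the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = depressed, 0 = not depressed.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
auc_toy = roc_auc(toy_labels, toy_scores)

# A model that gives everyone the same score cannot rank cases at all.
auc_uninformative = roc_auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

Under this reading, an AUC of 0.57 means the model ranks a randomly chosen depressed participant above a randomly chosen non-depressed participant only 57% of the time, which is close to the 50% expected from an uninformative score.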
- Award ID(s): 1761810
- PAR ID: 10374089
- Date Published:
- Journal Name: Scientific Reports
- Volume: 11
- Issue: 1
- ISSN: 2045-2322
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Crisostomi, Emanuele (Ed.) In light of the outbreak of COVID-19, analyzing and measuring human mobility has become increasingly important. A wide range of studies have explored spatiotemporal trends over time, examined associations with other variables, evaluated non-pharmacologic interventions (NPIs), and predicted or simulated COVID-19 spread using mobility data. Despite the benefits of publicly available mobility data, a key question remains unanswered: are models using mobility data performing equitably across demographic groups? We hypothesize that bias in the mobility data used to train the predictive models might lead to unfairly less accurate predictions for certain demographic groups. To test our hypothesis, we applied two mobility-based COVID infection prediction models at the county level in the United States using SafeGraph data, and correlated model performance with sociodemographic traits. Findings revealed that there is a systematic bias in models’ performance toward certain demographic characteristics. Specifically, the models tend to favor large, highly educated, wealthy, young, and urban counties. We hypothesize that the mobility data currently used by many predictive models tends to capture less information about older, poorer, less educated, and rural populations, which in turn negatively impacts the accuracy of the COVID-19 prediction in these areas. Ultimately, this study points to the need for improved data collection and sampling approaches that allow for an accurate representation of the mobility patterns across demographic groups.
-
Emerging transportation modes, including car-sharing, bike-sharing, and ride-hailing, are transforming urban mobility yet have been shown to reinforce socioeconomic inequity. These services rely on accurate demand prediction, but the demand data on which these models are trained reflect biases around demographics, socioeconomic conditions, and entrenched geographic patterns. To address these biases and improve fairness, we present FairST, a fairness-aware demand prediction model for spatiotemporal urban applications, with emphasis on new mobility. We use 1D (time-varying, space-constant), 2D (space-varying, time-constant), and 3D (both time- and space-varying) convolutional branches to integrate heterogeneous features, while including fairness metrics as a form of regularization to improve equity across demographic groups. We propose two spatiotemporal fairness metrics: region-based fairness gap (RFG), applicable when demographic information is provided as a constant for a region, and individual-based fairness gap (IFG), applicable when a continuous distribution of demographic information is available. Experimental results on bike share and ride share datasets show that FairST can reduce inequity in demand prediction for multiple sensitive attributes (i.e., race, age, and education level), while achieving better accuracy than even state-of-the-art fairness-oblivious methods.
-
Grundy, Quinn (Ed.) Early research on the impact of COVID-19 on academic scientists suggests that disruptions to research, teaching, and daily work life are not experienced equally. However, this work has overwhelmingly focused on experiences of women and parents, with limited attention to the disproportionate impact on academic work by race, disability status, sexual identity, first-generation status, and academic career stage. Using a stratified random survey sample of early-career academics in four science disciplines (N = 3,277), we investigated socio-demographic and career stage differences in the effect of the COVID-19 pandemic along seven work outcomes: changes in four work areas (research progress, workload, concern about career advancement, support from mentors) and work disruptions due to three COVID-19 related life challenges (physical health, mental health, and caretaking). Our analyses examined patterns across career stages as well as separately for doctoral students and for postdocs/assistant professors. Overall, our results indicate that scientists from marginalized (i.e., devalued) and minoritized (i.e., underrepresented) groups across early career stages reported more negative work outcomes as a result of COVID-19. However, there were notable patterns of differences depending on the socio-demographic identities examined. Those with a physical or mental disability were negatively impacted on all seven work outcomes. Women, primary caregivers, underrepresented racial minorities, sexual minorities, and first-generation scholars reported more negative experiences across several outcomes such as increased disruptions due to physical health symptoms and additional caretaking compared to more privileged counterparts. Doctoral students reported more work disruptions from life challenges than other early-career scholars, especially those related to health problems, while assistant professors reported more negative changes in areas such as decreased research progress and increased workload. These findings suggest that the COVID-19 pandemic has disproportionately harmed work outcomes for minoritized and marginalized early-career scholars. Institutional interventions are required to address these inequalities in an effort to retain diverse cohorts in academic science.
-
Rentería, Miguel E (Ed.) The differential progression of ten chronic overlapping pain conditions (COPC) and four comorbid mental disorders across demographic groups has rarely been reported in the literature. To fill this gap, we conducted retrospective cohort analyses using All of Us Research Program data from 1970 to 2023. Separate cohorts were created to assess the differential patterns across sex, race, and ethnicity. Logistic regression models, controlling for demographic variables and household income level, were employed to identify significant sociodemographic factors associated with the differential progression from one COPC or mental condition to another. Among the 139 frequent disease pairs, we identified group-specific patterns in 15 progression pathways. Black or African Americans with a COPC had a significantly increased association in progression to other COPCs (CLBP → IBS, CLBP → MHA, or IBS → MHA, OR ≥ 1.25, adj. p ≤ 4.0 × 10⁻³) or mental disorders (CLBP → anxiety, CLBP → depression, MHA → anxiety, MHA → depression, OR ≥ 1.25, adj. p ≤ 1.9 × 10⁻²) after developing a COPC. Females had an increased likelihood of chronic low back pain after anxiety and depression (OR ≥ 1.12, adj. p ≤ 1.5 × 10⁻²). Additionally, the lowest income bracket was associated with an increased risk of developing another COPC from a COPC (CLBP → MHA, IBS → MHA, MHA → CLBP, or MHA → IBS, OR ≥ 1.44, adj. p ≤ 2.6 × 10⁻²) or from a mental disorder (depression → MHA, depression → CLBP, anxiety → CLBP, or anxiety → IBS, OR ≥ 1.50, adj. p ≤ 2.0 × 10⁻²), as well as developing a mental disorder after a COPC (CLBP → depression, CLBP → anxiety, MHA → anxiety, OR ≥ 1.37, adj. p ≤ 1.6 × 10⁻²). To our knowledge, this is the first study that unveils the sociodemographic influence on COPC progression. These findings suggest the importance of considering sociodemographic factors to achieve optimal prognostication and preemptive management of COPCs.

