

Search for: All records

Award ID contains: 1633130


  1. Nocturnal hypoglycemia is a common phenomenon among patients with diabetes and can lead to a broad range of adverse events and complications. Identifying factors associated with hypoglycemia can improve glucose control and patient care. We propose a repeated measures random forest (RMRF) algorithm that can handle nonlinear relationships and interactions and the correlated responses from patients evaluated over several nights. Simulation results show that our proposed algorithm captures the informative variable more often than an approach that naïvely assumes independence. RMRF also outperforms standard random forest and extremely randomized trees algorithms. We demonstrate scenarios where RMRF attains greater prediction accuracy than generalized linear models. We apply the RMRF algorithm to analyze a diabetes study with 2524 nights from 127 patients with type 1 diabetes. We find that nocturnal hypoglycemia is associated with HbA1c, bedtime blood glucose (BG), insulin on board, time system activated, exercise intensity, and daytime hypoglycemia. The RMRF can accurately classify nights at high risk of nocturnal hypoglycemia.

     
  2. With the collection and availability of data on student academic performance and academic background, higher education institutions have recently stepped up initiatives in and infrastructure for learning analytics, leveraging this deluge of data to inform student success. With definitions of student success varying from analyses of what predicts levels of specific career readiness competencies to degree completion, the environment is a fertile ground for statistical practice and collaboration among a statistically savvy yet diverse clientele of instructors, programme advisors and administrators. In this paper, we discuss our experiences to this end through a consulting project evaluating the impact of writing course class size on students achieving a graduation writing requirement. In detailing the workflow for and challenges in this project, we share aspects of statistical communication and reporting, applications of innovative statistical methodology developed by our research group for handling confounding factors and correlated inputs and training through an interdisciplinary applied institutional research professional development programme. This paper illustrates how instilling an appreciation for statistical inference through each of these components is invaluable for capturing institutional buy‐in for data‐informed decision‐making in general statistical practice.

     
  3. Individuals may respond to treatments with significant heterogeneity. To optimize the treatment effect, it is necessary to recommend treatments based on individual characteristics. Existing methods in the literature for learning individualized treatment regimes are usually designed for randomized studies with binary treatments. In this study, we propose an algorithm to extend random forest of interaction trees (Su et al., 2009) to accommodate multiple treatments. By integrating the generalized propensity score into the interaction tree growing process, the proposed method can handle both randomized and observational study data with multiple treatments. The performance of the proposed method, relative to existing approaches in the literature, is evaluated through simulation studies. The proposed method is applied to an assessment of multiple voluntary educational programmes at a large public university.

     
  4. Propensity score methods account for selection bias in observational studies. However, the consistency of the propensity score estimators strongly depends on a correct specification of the propensity score model. Logistic regression and, with increasing popularity, machine learning tools are used to estimate propensity scores. We introduce a stacked generalization ensemble learning approach to improve propensity score estimation by fitting a meta learner on the predictions of a suitable set of diverse base learners. We perform a comprehensive Monte Carlo simulation study, implementing a broad range of scenarios that mimic characteristics of typical data sets in educational studies. The population average treatment effect is estimated using the propensity score in Inverse Probability of Treatment Weighting. Our proposed stacked ensembles, especially using gradient boosting machines as a meta learner trained on a set of 12 base learner predictions, led to superior reduction of bias compared to the current state of the art in propensity score estimation. Further, our simulations imply that commonly used balance measures (averaged standardized absolute mean differences) might be misleading as propensity score model selection criteria. We apply our proposed model, which we call GBM-Stack, to assess the population average treatment effect of a Supplemental Instruction (SI) program in an introductory psychology (PSY 101) course at San Diego State University. Our analysis provides evidence that moving the whole population to SI attendance would on average lead to 1.69 times higher odds of passing the PSY 101 class compared to not offering SI, with a 95% bootstrap confidence interval of (1.31, 2.20).
  5. Observational studies require matching across groups over multiple confounding variables. Across the literature, matching algorithms fail to handle the issue of missing data. Consequently, missing values are regularly imputed prior to being considered in the matching process. However, imputing is not always practical, forcing us to drop an observation due to the deficiency of the chosen algorithm, decreasing the power of the study and possibly failing to capture crucial latent information. We propose a missing data mechanism to incorporate within an iterative multivariate matching method. The underlying framework utilizes random forest as a natural tool in constructing a distance matrix, implemented with surrogate splits where there might be missing values. The output is then easily fed into an optimal matching algorithm. We apply this method to evaluate the effectiveness of supplemental instruction (SI) sessions, a voluntary program where students seek additional help, in a large enrollment, bottleneck introductory business statistics course. This is an observational study with two groups, those who attend multiple SI sessions and those who do not, and, as typical in educational data mining, challenged by missing data. Additionally, we perform a data simulation on missingness to further demonstrate the efficacy of our proposed approach.

     
  6. We expand methods for estimating an optimal treatment regime (OTR) from the personalized medicine literature to educational data mining applications. As part of this development, we detail and modify the current state-of-the-art, assess the efficacy of the approaches for student success studies, and provide practitioners the machinery to apply the methods in their specific problems. Our particular interest is to estimate an optimal treatment regime for students enrolled in an introductory statistics course at San Diego State University (SDSU). The available treatments are combinations of three programs SDSU implemented to foster student success in this large enrollment, bottleneck STEM course. We leverage tree-based reinforcement learning approaches based on either an inverse probability-weighted purity measure or an augmented probability-weighted purity measure. The thereby deduced OTR promises to significantly increase the average grade in the introductory course and also reveals the need for program recommendations to students as only very few, on their own, selected their optimal treatment.