

Title: Exposure enriched outcome dependent designs for longitudinal studies of gene–environment interaction

Joint effects of genetic and environmental factors have been increasingly recognized in the development of many complex human diseases. Despite the popularity of case‐control and case‐only designs, longitudinal cohort studies that can capture time‐varying outcome and exposure information have long been recommended for gene–environment (G × E) interactions. To date, literature on sampling designs for longitudinal studies of G × E interaction is quite limited. We therefore consider designs that can prioritize a subsample of the existing cohort for retrospective genotyping on the basis of currently available outcome, exposure, and covariate data. In this work, we propose stratified sampling based on summaries of individual exposures and outcome trajectories and develop a full conditional likelihood approach for estimation that adjusts for the biased sample. We compare the performance of our proposed design and analysis with combinations of different sampling designs and estimation approaches via simulation. We observe that the full conditional likelihood provides improved estimates for the G × E interaction and joint exposure effects over uncorrected complete‐case analysis, and the exposure enriched outcome trajectory dependent design outperforms other designs in terms of estimation efficiency and power for detection of the G × E interaction. We also illustrate our design and analysis using data from the Normative Aging Study, an ongoing longitudinal cohort study initiated by the Veterans Administration in 1963. Copyright © 2017 John Wiley & Sons, Ltd.
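The abstract does not specify which trajectory and exposure summaries define the sampling strata. The following is a minimal sketch that assumes a binary exposure indicator per subject and an ordinary least-squares slope as the outcome-trajectory summary. It shows how an exposure-enriched, outcome-trajectory-dependent stratified sample could be drawn, and how to record the per-stratum sampling probabilities that a bias-corrected (for example, conditional) likelihood would need. The `Subject` class, the tertile cut-offs, and the quotas in `n_per_band` are hypothetical choices, not the authors' design.

```python
import random
from dataclasses import dataclass


@dataclass
class Subject:
    sid: int
    times: list[float]      # visit times (at least two distinct values)
    outcomes: list[float]   # outcome measured at each visit
    exposed: bool           # hypothetical binary summary of exposure history


def ols_slope(x, y):
    """Least-squares slope of y on x: one possible outcome-trajectory summary."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def tertile_cutoffs(values):
    s = sorted(values)
    n = len(s)
    return s[n // 3], s[(2 * n) // 3]


def stratify(subjects):
    """Cross exposure status with the tertile band of each subject's outcome slope."""
    slopes = {s.sid: ols_slope(s.times, s.outcomes) for s in subjects}
    lo, hi = tertile_cutoffs(list(slopes.values()))
    strata = {}
    for s in subjects:
        b = slopes[s.sid]
        band = "low" if b < lo else ("high" if b >= hi else "mid")
        strata.setdefault((s.exposed, band), []).append(s)
    return strata


def sample_for_genotyping(subjects, n_per_band, rng):
    """Draw up to n_per_band[band] subjects from every (exposed, band) stratum.

    Using the same quota for exposed and unexposed strata oversamples the rarer
    exposure level; extreme slope bands get larger quotas than "mid".
    Returns the selected subjects and each stratum's sampling probability.
    """
    chosen, probs = [], {}
    for key, members in stratify(subjects).items():
        k = min(n_per_band[key[1]], len(members))
        chosen.extend(rng.sample(members, k))
        probs[key] = k / len(members)
    return chosen, probs


if __name__ == "__main__":
    rng = random.Random(2017)
    cohort = []
    for i in range(300):
        exposed = rng.random() < 0.2          # exposure is relatively rare here
        slope = rng.gauss(0.5 if exposed else 0.0, 1.0)
        times = [0.0, 1.0, 2.0, 3.0, 4.0]
        ys = [slope * t + rng.gauss(0, 0.5) for t in times]
        cohort.append(Subject(i, times, ys, exposed))
    chosen, probs = sample_for_genotyping(cohort, {"low": 20, "mid": 5, "high": 20}, rng)
    print(len(chosen), {k: round(p, 2) for k, p in sorted(probs.items())})
```

With equal quotas per exposure level, exposed subjects are sampled at a higher rate than their prevalence. This is one plausible reading of "exposure enriched." The returned probabilities are what a sampling-adjusted analysis would condition on.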

 
NSF-PAR ID:
10238092
Author(s) / Creator(s):
 ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Statistics in Medicine
Volume:
36
Issue:
18
ISSN:
0277-6715
Page Range / eLocation ID:
p. 2947-2960
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Case-crossover designs are widely used to study short-term exposure effects on the risk of acute adverse health events. While the frequentist literature on this topic is vast, there is no Bayesian work in this general area. The contribution of this paper is twofold. First, the paper establishes Bayesian equivalence results that require characterization of the set of priors under which the posterior distributions of the risk ratio parameters based on a case-crossover and time-series analysis are identical. Second, the paper studies inferential issues under case-crossover designs in a Bayesian framework. Traditionally, a conditional logistic regression is used for inference on risk-ratio parameters in case-crossover studies. We consider instead a more general full likelihood-based approach which makes less restrictive assumptions on the risk functions. Formulation of a full likelihood leads to growth in the number of parameters proportional to the sample size. We propose a semi-parametric Bayesian approach using a Dirichlet process prior to handle the random nuisance parameters that appear in a full likelihood formulation. We carry out a simulation study to compare the Bayesian methods based on full and conditional likelihood with the standard frequentist approaches for case-crossover and time-series analysis. The proposed methods are illustrated through the Detroit Asthma Morbidity, Air Quality and Traffic study, which examines the association between acute asthma risk and ambient air pollutant concentrations.

     
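The first related abstract mentions conditional logistic regression as the traditional analysis for case-crossover studies. This is a minimal sketch for a single scalar exposure. Each stratum contributes exp(βx_case) / Σ_j exp(βx_j), where the sum runs over the case window and its referent windows, and exp(β) is the risk ratio per unit of exposure. The Newton-Raphson fit and the data are illustrative assumptions. This is the conventional frequentist method, not the Bayesian Dirichlet-process approach that the abstract proposes.

```python
import math


def log_sum_exp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def cond_loglik(beta, strata):
    """Conditional log-likelihood; strata = [(x_case, [x_ref, ...]), ...]."""
    total = 0.0
    for x_case, x_refs in strata:
        total += beta * x_case - log_sum_exp([beta * x for x in [x_case, *x_refs]])
    return total


def score_info(beta, strata):
    """Score (first derivative) and information (negative second derivative)."""
    score, info = 0.0, 0.0
    for x_case, x_refs in strata:
        xs = [x_case, *x_refs]
        lse = log_sum_exp([beta * x for x in xs])
        w = [math.exp(beta * x - lse) for x in xs]          # within-stratum weights
        mean_x = sum(wi * xi for wi, xi in zip(w, xs))
        score += x_case - mean_x
        info += sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, xs))
    return score, info


def fit_clogit(strata, beta=0.0, tol=1e-10, max_iter=100):
    """Newton-Raphson with step halving; returns (beta_hat, standard error).

    Assumes exposure varies within at least one stratum (positive information).
    """
    ll = cond_loglik(beta, strata)
    for _ in range(max_iter):
        score, info = score_info(beta, strata)
        if info <= 0:
            break
        step = score / info
        new_ll = cond_loglik(beta + step, strata)
        while new_ll < ll and abs(step) > tol:   # halve the step if it overshoots
            step /= 2
            new_ll = cond_loglik(beta + step, strata)
        beta, ll = beta + step, new_ll
        if abs(step) < tol:
            break
    return beta, 1 / math.sqrt(score_info(beta, strata)[1])


if __name__ == "__main__":
    # Hypothetical data: (exposure in case window, [exposures in referent windows])
    strata = [(2.0, [1.0, 0.5]), (1.5, [1.0, 2.0]), (0.8, [0.3, 1.2]), (3.0, [2.0, 2.5])]
    beta, se = fit_clogit(strata)
    print(f"log risk ratio {beta:.3f} (SE {se:.3f}); risk ratio {math.exp(beta):.3f}")
```

The stratum-specific baseline risk cancels from each ratio. This is why the conditional approach needs no nuisance parameters, whereas a full-likelihood formulation introduces one per stratum.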
  2. Abstract

    We consider a regression analysis of longitudinal data in the presence of outcome‐dependent observation times and informative censoring. Existing approaches commonly require a correct specification of the joint distribution of longitudinal measurements, the observation time process, and informative censoring time under the joint modeling framework and can be computationally cumbersome due to the complex form of the likelihood function. In view of these issues, we propose a semiparametric joint regression model and construct a composite likelihood function based on a conditional order statistics argument. As a major feature of our proposed methods, the aforementioned joint distribution is not required to be specified, and the random effect in the proposed joint model is treated as a nuisance parameter. Consequently, the derived composite likelihood bypasses the need to integrate over the random effect and offers the advantage of easy computation. We show that the resulting estimators are consistent and asymptotically normal. We use simulation studies to evaluate the finite‐sample performance of the proposed method and apply it to a study of weight loss data that motivated our investigation.

     
  3. Abstract

    In this paper, we propose a regularized, model-based clustering method for high-dimensional longitudinal data. This study was motivated by the Trial of Activity in Adolescent Girls (TAAG), which aimed to examine multilevel factors related to the change of physical activity by following up a cohort of 783 girls over 10 years from adolescence to early adulthood. Our goal is to identify the intrinsic grouping of subjects with similar patterns of physical activity trajectories and the most relevant predictors within each group. Previous analyses performed clustering and variable selection in two separate steps, whereas our method performs both simultaneously. Within each cluster, a linear mixed-effects model (LMM) is fitted with a doubly penalized likelihood to induce sparsity for parameter estimation and effect selection. The large-sample joint properties are established for a general class of penalty functions, allowing the dimensions of both the fixed and random effects to grow exponentially with the sample size. Assuming subjects are drawn from a Gaussian mixture distribution, model effects and cluster labels are estimated via a coordinate descent algorithm nested inside the Expectation-Maximization (EM) algorithm. The Bayesian information criterion (BIC) is used to determine the optimal number of clusters and the values of the tuning parameters. Our numerical studies show that the new method performs satisfactorily and can accommodate complex data with multilevel and/or longitudinal effects.

     
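The third related abstract nests penalized LMM fits inside EM and selects the number of clusters by BIC. The sketch below shows only that outer logic, and it rests on a strong simplification. Each subject is reduced to one hypothetical scalar summary, such as an estimated physical-activity slope, and these summaries are modeled as a one-dimensional Gaussian mixture. The sketch therefore contains no mixed models, penalties, or coordinate-descent steps.

```python
import math
import random
import statistics


def normal_logpdf(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def em_gmm_1d(x, k, n_iter=300, tol=1e-8, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM; returns (loglik, weights, means, vars)."""
    n = len(x)
    xs = sorted(x)
    mus = [xs[int((j + 0.5) * n / k)] for j in range(k)]   # deterministic quantile start
    vars_ = [max(statistics.pvariance(x), min_var)] * k
    pis = [1.0 / k] * k
    loglik = -math.inf
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities, computed on the log scale
        resp, new_loglik = [], 0.0
        for xi in x:
            logs = [math.log(p) + normal_logpdf(xi, m, v) for p, m, v in zip(pis, mus, vars_)]
            top = max(logs)
            lse = top + math.log(sum(math.exp(lv - top) for lv in logs))
            new_loglik += lse
            resp.append([math.exp(lv - lse) for lv in logs])
        # M-step: weighted updates of mixing weights, means, and variances
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            pis[j] = nj / n
            mus[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            vars_[j] = max(min_var, sum(r[j] * (xi - mus[j]) ** 2 for r, xi in zip(resp, x)) / nj)
        converged = abs(new_loglik - loglik) < tol
        loglik = new_loglik
        if converged:
            break
    return loglik, pis, mus, vars_


def select_k_by_bic(x, k_max=4):
    """BIC = -2 log L + p log n, with p = 3k - 1 free parameters for a 1-D mixture."""
    n = len(x)
    bics = {}
    for k in range(1, k_max + 1):
        loglik = em_gmm_1d(x, k)[0]
        bics[k] = -2 * loglik + (3 * k - 1) * math.log(n)
    return min(bics, key=bics.get), bics


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical per-subject summaries from two latent groups
    data = [rng.gauss(-5, 1) for _ in range(100)] + [rng.gauss(5, 1) for _ in range(100)]
    best_k, bics = select_k_by_bic(data, k_max=3)
    print(best_k, {k: round(b, 1) for k, b in bics.items()})
```

In the paper's setting, the likelihood inside BIC would come from the penalized mixture of LMMs. The same criterion is also used there to choose the tuning parameters.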
  4. Abstract

    The availability of vast amounts of longitudinal data from electronic health records (EHRs) and personal wearable devices opens the door to numerous new research questions. In many studies, individual variability of a longitudinal outcome is as important as the mean. Blood pressure fluctuations, glycemic variations, and mood swings are prime examples where it is critical to identify factors that affect the within‐individual variability. We propose a scalable method, within‐subject variance estimator by robust regression (WiSER), for the estimation and inference of the effects of both time‐varying and time‐invariant predictors on within‐subject variance. It is robust against misspecification of the conditional distribution of responses or the distribution of random effects. It performs similarly to correctly specified likelihood methods but is 10³ to 10⁵ times faster. The estimation algorithm scales linearly in the total number of observations, making it applicable to massive longitudinal data sets. The effectiveness of WiSER is evaluated in extensive simulation studies. Its broad applicability is illustrated using the accelerometry data from the Women's Health Study and a clinical trial for longitudinal diabetes care.

     
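The WiSER abstract does not describe its robust estimator, so the sketch below substitutes a much simpler technique to illustrate the estimand. It is a naive two-stage estimator: compute each subject's log sample variance, then regress it on a time-invariant predictor by least squares. Like WiSER, it runs in time linear in the total number of observations. Unlike WiSER, it has none of the robustness properties and does not handle time-varying predictors. The simulated data and parameter names are hypothetical.

```python
import math
import random
import statistics


def log_within_subject_variance(measurements):
    """measurements: subject id -> repeated outcomes (at least two, not all equal)."""
    return {sid: math.log(statistics.variance(ys)) for sid, ys in measurements.items()}


def simple_ols(x, y):
    """Intercept and slope of the least-squares line y ~ a + b * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def variance_effect(measurements, covariate):
    """Stage 2: regress log within-subject variance on a subject-level covariate."""
    logv = log_within_subject_variance(measurements)
    ids = sorted(logv)
    return simple_ols([covariate[i] for i in ids], [logv[i] for i in ids])


if __name__ == "__main__":
    rng = random.Random(7)
    gamma0, gamma1 = 0.0, 1.0             # hypothetical true log-variance model
    data, z = {}, {}
    for i in range(500):
        z[i] = i % 2                        # hypothetical binary subject-level trait
        mu_i = rng.gauss(120, 10)           # subject-specific mean (random intercept)
        sd_i = math.exp(0.5 * (gamma0 + gamma1 * z[i]))
        data[i] = [rng.gauss(mu_i, sd_i) for _ in range(20)]
    intercept, slope = variance_effect(data, z)
    print(f"estimated effect on log variance: {slope:.3f} (true {gamma1})")
```

When every subject has the same number of visits, the downward bias of the log sample variance is the same for all subjects under normality. It therefore shifts only the intercept, and the slope still targets the predictor's effect on log within-subject variance.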
  5. Broadband infrastructure in urban parks may serve crucial functions, including as an amenity that boosts overall park use and as a bridge that propagates WiFi access into contiguous neighborhoods. This project, SCC:PG Park WiFi as a BRIDGE to Community Resilience, has developed a new model, Build Resilience through the Internet and Digital Greenspace Exposure, that leverages off-the-shelf WiFi technology, novel algorithms, community assets, and local partnerships to lower greenspace WiFi costs. This interdisciplinary work draws on computer science, information studies, landscape architecture, and public health. Collaboration methodologies and relational definitions across disciplines are still nascent, especially when paired with civic-engaged, applied research. Student researchers (undergraduate and graduate) are excellent partners in bridging disciplinary barriers and constraints. Their capacity to assimilate multiple frameworks has refined the project’s theoretical lenses and suggested novel socio-technical methodology improvements. Further, they are excellent ambassadors to community partners and stakeholders. In BRIDGE, we tested two mechanisms to augment student research participation. Both leveraged a classic, curriculum-based model, the Partnership for Action Learning in Sustainability program (PALS). This campus-wide, community-engaged initiative pairs faculty and students with community partners. PALS curates economic, environmental, and social sustainability challenges and scopes projects so that appropriate coursework addresses the identified challenges. Outcomes include literature searches, wireframes, and design plans that target solutions to civic problems. Constraints include the short semester time frame and the required curriculum learning outcomes. (1) On BRIDGE, Dr. Kweon ran a semester-based, 400-level Landscape Architecture PALS studio. 
Eighteen undergraduates conducted in-class and in-field work to assess community needs and propose design solutions for future park-wide WiFi. Research topics included community-park history, neighborhood demographics, case-study analysis, and land-cover characteristics. The students held an in-park community engagement session, using interactive poster-board surveys, to gather input on which park amenities might be redesigned or added to promote WiFi use. They then produced seven redesign plans; one included a café/garden with an eco-corridor that integrated technology with nature. (2) From the classic, curriculum-based PALS model, we created a summer intensive for our five research assistants to stimulate interdisciplinary collaboration in their research tasks and co-analysis of project data products: the experimental technical WiFi setup, community survey results, and stakeholder needs assessments. Students met weekly with each other and with team leadership, exchanged journal articles, and attended joint research events. This model shows promise for integrating students more formally into an interdisciplinary research project. An end-of-intensive focus group highlighted the pros and cons of this model from the students’ perspective. Results: Contrasting the two mechanisms, we found that Model 1 is tried-and-true and produces standardized, reliable products. However, because the work is group based, students have limited independence to explore topics and themes of interest. Civic groups are typically thrilled with the diversity of action plans produced. Model 2 provides greater independence in student learning outcomes, fosters interdisciplinary “dictionary-building” that the full team can use, deepens methodological approaches, and allows for student stipend payments. Lessons learned: the intensive time frame needed more research-team support and, when possible, should ideally be extended over the full project span. UMD-IRB#1785365-4; NSF-award: 2125526. 