

Creators/Authors contains: "Fritsche, Lars G."


  1. Large‐scale association analyses based on observational health care databases such as electronic health records have been a topic of increasing interest in the scientific community. However, challenges due to nonprobability sampling and phenotype misclassification associated with the use of these data sources are often ignored in standard analyses. The extent of the bias introduced by ignoring these factors is not well‐characterized. In this paper, we develop an analytic framework for characterizing the bias expected in disease‐gene association studies based on electronic health records when disease status misclassification and the sampling mechanism are ignored. Through a sensitivity analysis approach, this framework can be used to obtain plausible values for parameters of interest given summary results from standard analysis. We develop an online tool for performing this sensitivity analysis. Simulations demonstrate promising properties of the proposed method. We apply our approach to study bias in disease‐gene association studies using electronic health record data from the Michigan Genomics Initiative, a longitudinal biorepository effort within the University of Michigan health system.

  2. Biobanks linked to electronic health records provide rich resources for health‐related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large‐scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis‐generating studies of disease‐treatment, disease‐exposure, and disease‐gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank‐based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank‐based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.
