Abstract Background Large medical centers in urban areas, like Los Angeles, care for a diverse patient population and offer the potential to study the interplay between genetic ancestry and social determinants of health. Here, we explore the implications of genetic ancestry within the University of California, Los Angeles (UCLA) ATLAS Community Health Initiative—an ancestrally diverse biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients (N = 36,736). Methods We quantify the extensive continental and subcontinental genetic diversity within the ATLAS data through principal component analysis, identity-by-descent, and genetic admixture. We assess the relationship between genetically inferred ancestry (GIA) and >1500 EHR-derived phenotypes (phecodes). Finally, we demonstrate the utility of genetic data linked with EHR to perform ancestry-specific and multi-ancestry genome and phenome-wide scans across a broad set of disease phenotypes. Results We identify 5 continental-scale GIA clusters including European American (EA), African American (AA), Hispanic Latino American (HL), South Asian American (SAA) and East Asian American (EAA) individuals and 7 subcontinental GIA clusters within the EAA GIA corresponding to Chinese American, Vietnamese American, and Japanese American individuals. Although we broadly find that self-identified race/ethnicity (SIRE) is highly correlated with GIA, we still observe marked differences between the two, emphasizing that the populations defined by these two criteria are not analogous. We find a total of 259 significant associations between continental GIA and phecodes even after accounting for individuals’ SIRE, demonstrating that for some phenotypes, GIA provides information not already captured by SIRE. GWAS identifies significant associations for liver disease in the 22q13.31 locus across the HL and EAA GIA groups (HL p-value = 2.32×10⁻¹⁶, EAA p-value = 6.73×10⁻¹¹). A subsequent PheWAS at the top SNP reveals significant associations with neurologic and neoplastic phenotypes specifically within the HL GIA group. Conclusions Overall, our results explore the interplay between SIRE and GIA within a disease context and underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping linked with EHR-based phenotyping.
The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities
Biobanks linked to electronic health records provide rich resources for health‐related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large‐scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis‐generating studies of disease‐treatment, disease‐exposure, and disease‐gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank‐based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank‐based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.
- Award ID(s): 1712933
- PAR ID: 10453676
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Statistics in Medicine
- Volume: 39
- Issue: 6
- ISSN: 0277-6715
- Page Range / eLocation ID: p. 773-800
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Research and experimentation using big data sets, specifically large sets of electronic health records (EHR) and social media data, are demonstrating the potential to understand the spread of diseases and a variety of other issues. Applications of advanced algorithms, machine learning, and artificial intelligence indicate a potential for rapidly advancing improvements in public health. For example, several reports indicate that social media data can be used to predict disease outbreak and spread (Brown, 2015). Since real-world EHR data has complicated security and privacy issues preventing it from being widely used by researchers, there is a real need to synthetically generate EHR data that is realistic and representative. Current EHR generators, such as Synthea™ (Walonoski et al., 2018), only simulate and generate pure medical-related data. However, adding patients’ social media data to their simulated EHR data would make the combined data more comprehensive and realistic for healthcare research. This paper presents a patients’ social media data generator that extends an EHR data generator. By adding coherent social media data to EHR data, a variety of issues can be examined for emerging interests, such as where a contagious patient may have been and others with whom they may have been in contact. Social media data, specifically Twitter data, is generated with phrases indicating the onset of symptoms corresponding to the synthetically generated EHR reports of simulated patients. This enables creation of an open data set that is scalable up to a big-data size, and is not subject to the security, privacy concerns, and restrictions of real healthcare data sets. This capability is important to the modeling and simulation community, such as scientists and epidemiologists who are developing algorithms to analyze the spread of diseases. It enables testing a variety of analytics without revealing real-world private patient information.
- Abstract Taking an action research approach, we engaged in fieldwork with school-based behavioral health care teams to: observe record keeping practices, design and deploy a prototype system addressing key challenges, and reflect on its use. We describe the challenges of capturing behavioral data using both paper and electronic records. Creating records of behaviors requires direct observation, and as a result the record keeping responsibility is challenging to distribute across a care team. Behavioral data on paper must be transferred and prepared for reporting, both inside the organization and to stakeholders outside of the organization. In prototyping a computerized working record, we targeted user needs for capturing details of a behavioral incident in the moment. Challenges persisted through the transition from paper to our prototype, and based on these empirical findings over two years of fieldwork, we present five tensions in representing behavioral data in an electronic health record. These tensions reflect the differences between entering behavioral data into the record for intraorganizational use versus interorganizational use.
- The use of network analysis as a tool has increased exponentially as more clinical researchers see the benefits of network data for modeling of infectious disease transmission or translational activities in a variety of areas, including patient-caregiving teams, provider networks, patient-support networks, and adoption of health behaviors or treatments, to name a few. Yet, relational data such as network data carry a higher risk of deductive disclosure. Cases of reidentification have occurred and this is expected to become more common as computational ability increases. Recent data sharing policies aim to promote reproducibility, support replicability, and protect federal investment in the effort to collect these research data by making them available for secondary analyses. However, typical practices to protect individual-level clinical research data may not be sufficiently protective of participant privacy in the case of network data, nor in some cases do they permit secondary data analysis. When sharing data, researchers must balance security, accessibility, reproducibility, and adaptability (suitability for secondary analyses). Here, we provide background about applying network analysis to health and clinical research, describe the pros and cons of applying typical practices for sharing clinical data to network data, and provide recommendations for sharing network data.
- Background Wearable technology, such as smartwatches, can capture valuable patient-generated data and help inform patient care. Electronic health records provide logical and practical platforms for including such data, but it is necessary to evaluate the way the data are presented and visualized. Objective The aim of this study is to evaluate a graphical interface that displays patients’ health data from smartwatches, mimicking the integration within the environment of electronic health records. Methods A total of 12 health care professionals evaluated a simulated interface using a usability scale questionnaire, testing the clarity of the interface, colors, usefulness of information, navigation, and readability of text. Results The interface was positively received, with 14 out of the 16 questions generating a score of 5 or greater among at least 75% of participants (9/12). On an 8-point Likert scale, the highest rated features of the interface were quick turnaround times (mean score 7.1), readability of the text (mean score 6.8), and use of terminology/abbreviations (mean score 6.75). Conclusions Collaborating with health care professionals to develop and refine a graphical interface for visualizing patients’ health data from smartwatches revealed that the key elements of the interface were acceptable. The implementation of such data from smartwatches and other mobile devices within electronic health records should consider the opinions of key stakeholders as the development of this platform progresses.
