

Title: The impact of cross-kingdom molecular forensics on genetic privacy
Abstract

Recent advances in metagenomic technology and computational prediction may inadvertently weaken an individual’s reasonable expectation of privacy. Through cross-kingdom genetic and metagenomic forensics, we can already predict at least a dozen human phenotypes with varying degrees of accuracy. There is also growing potential to detect a “molecular echo” of an individual’s microbiome from cells deposited on public surfaces. At present, host genetic data from somatic or germ cells provide more reliable information than microbiome samples. However, the emerging ability to infer personal details from different microscopic biological materials left behind on surfaces requires in-depth ethical and legal scrutiny. There is potential to identify and track individuals, along with new, surreptitious means of genetic discrimination. This commentary underscores the need to update legal and policy frameworks for genetic privacy with additional considerations for the information that could be acquired from microbiome-derived data. The article also aims to stimulate ubiquitous discourse to ensure the protection of genetic rights and liberties in the post-genomic era.

 
NSF-PAR ID:
10230004
Publisher / Repository:
Springer Science + Business Media
Journal Name:
Microbiome
Volume:
9
Issue:
1
ISSN:
2049-2618
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    As urbanization continues to increase, it is expected that two‐thirds of the human population will reside in cities by 2050. Urbanization fragments and degrades natural landscapes, threatening wildlife including economically important species such as bees. In this study, we employ whole genome sequencing to characterize the population genetics, metagenome and microbiome, and environmental stressors of a common wild bee, Ceratina calcarata. Population genomic analyses revealed the presence of low genetic diversity and elevated levels of inbreeding. Through analyses of isolation by distance, resistance, and environment across urban landscapes, we found that green spaces including shrubs and scrub were the optimal pathways for bee dispersal, and conservation efforts should focus on preserving these land traits to maintain high connectivity across sites for wild bees. Metagenomic analyses revealed landscape sites exhibiting urban heat island effects, such as high temperatures and development but low precipitation and green space, had the highest taxa alpha diversity across all domains even when isolating for potential pathogens. Notably, the integration of population and metagenomic data showed that reduced connectivity in urban areas is not only correlated with lower relatedness among individuals but is also associated with increased pathogen diversity, exposing vulnerable urban bees to more pathogens. Overall, our combined population and metagenomic approach found significant environmental variation in bee microbiomes and nutritional resources even in the absence of genetic differentiation, as well as enabled the potential early detection of stressors to bee health.

     
  2. Abstract Motivation

    Metagenomics is the study of genetic materials directly sampled from natural habitats. It has the potential to reveal previously hidden diversity of microscopic life, largely due to the existence of highly parallel and low-cost next-generation sequencing technology. Conventional approaches align metagenomic reads onto known reference genomes to identify microbes in the sample. Since such a collection of reference genomes is very large, the approach often needs high-end computing machines with large memory, which are often unavailable to researchers. Alternative approaches follow an alignment-free methodology where the presence of a microbe is predicted using the information about the unique k-mers present in the microbial genomes. However, such approaches suffer from high false-positive rates due to trading off the value of k with the computational resources. In this article, we propose a highly efficient metagenomic sequence classification (MSC) algorithm that is a hybrid of both approaches. Instead of aligning reads to the full genomes, MSC aligns reads onto a set of carefully chosen, shorter and highly discriminating model sequences built from the unique k-mers of each of the reference sequences.

    Results

    Microbiome researchers are generally interested in two objectives of a taxonomic classifier: (i) to detect prevalence, i.e. the taxa present in a sample, and (ii) to estimate their relative abundances. MSC is primarily designed to detect prevalence and experimental results show that MSC is indeed a more effective and efficient algorithm compared to the other state-of-the-art algorithms in terms of accuracy, memory and runtime. Moreover, MSC outputs an approximate estimate of the abundances.

    Availability and implementation

    The implementations are freely available for non-commercial purposes. They can be downloaded from https://drive.google.com/open?id=1XirkAamkQ3ltWvI1W1igYQFusp9DHtVl.

     
  3. Abstract Motivation

    Metal-binding proteins have a central role in maintaining life processes. Nearly one-third of known protein structures contain metal ions that are used for a variety of needs, such as catalysis, DNA/RNA binding, protein structure stability, etc. Identifying metal-binding proteins is thus crucial for understanding the mechanisms of cellular activity. However, experimental annotation of protein metal-binding potential is severely lacking, while computational techniques are often imprecise and of limited applicability.

    Results

    We developed a novel machine learning-based method, mebipred, for identifying metal-binding proteins from sequence-derived features. This method is over 80% accurate in recognizing proteins that bind metal ion-containing ligands; the specific identity of 11 ubiquitously present metal ions can also be annotated. mebipred is reference-free, i.e. no sequence alignments are involved, and is thus faster than alignment-based methods; it is also more accurate than other sequence-based prediction methods. Additionally, mebipred can identify protein metal-binding capabilities from short sequence stretches, e.g. translated sequencing reads, and, thus, may be useful for the annotation of metal requirements of metagenomic samples. We performed an analysis of available microbiome data and found that ocean, hot spring sediments and soil microbiomes use a more diverse set of metals than human host-related ones. For human microbiomes, physiological conditions explain the observed metal preferences. Similarly, subtle changes in ocean sample ion concentration affect the abundance of relevant metal-binding proteins. These results highlight mebipred’s utility in analyzing microbiome metal requirements.

    Availability and implementation

    mebipred is available as a web server at services.bromberglab.org/mebipred and as a standalone package at https://pypi.org/project/mymetal/.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  4. Untreated tooth decay affects nearly one third of the world's population and is the most prevalent disease burden among children. The disease progression of tooth decay is multifactorial and involves a prolonged decrease in pH, resulting in the demineralization of tooth surfaces. Bacterial species that are capable of fermenting carbohydrates contribute to the demineralization process by producing organic acids. The combined use of machine learning and 16S rRNA sequencing offers the potential to predict tooth decay by identifying the bacterial community that is present in an individual’s oral cavity. A few recent studies have demonstrated machine learning predictive modeling using 16S rRNA sequencing of oral samples, but their models do not consider the multifactorial nature of tooth decay or the role of fungal species. Here, the oral microbiome of mother–child dyads (both healthy and caries-active) was used in combination with demographic–environmental factors and relevant fungal information to create a multifactorial machine learning model based on LASSO-penalized logistic regression. For the children, several bacterial species were found to be caries-associated (Prevotella histicola, Streptococcus mutans, and Rothia mucilaginosa), as were Candida detection and lower toothbrushing frequency. Mothers enrolled in this study had a higher detection of S. mutans and Candida and a higher plaque index. This proof-of-concept study demonstrates the significant impact machine learning could have in prevention and diagnostic advancements for tooth decay, as well as the importance of considering fungal and demographic–environmental factors.
  5. Analysis of municipal wastewater, or sewage, for public health applications is a rapidly expanding field aimed at understanding emerging epidemiological trends, including human and disease migration. The newly gained ability to extract and analyze genetic material from wastewater poses important societal and ethical questions, including: How to safeguard data? Who owns genetic data recovered from wastewater? What are the ethical and legal issues surrounding its use? In the U.S., both corporate and legal policies regarding privacy have been historically reactive instead of proactive. In wastewater-based epidemiology (WBE), the pace of innovation has outpaced the ability of social and legal mechanisms to keep up. To address this discrepancy, early and robust discussions of the research, policies, and ethics surrounding WBE analysis and genetics are needed. This paper contributes to this discussion by examining ownership issues for human genetic data recovered from wastewater and the uses to which it may be put. We focus particularly on the risks associated with personally identifiable data, highlighting potential risks, relevant privacy-enhancing technologies, and appropriate ethics. The paper proposes an approach for people conducting WBE studies to help them systematically consider the ethical and privacy implications of their work.
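
As an illustration of the alignment-free idea summarized in item 2 (predicting a microbe's presence from k-mers that are unique to its genome), the following is a minimal Python sketch. It is not the MSC algorithm or its implementation; the function names, the toy sequences, and the choice of k = 4 are all hypothetical, picked only to keep the example small.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmer_index(references, k):
    """Map each k-mer found in exactly one reference to that reference's name."""
    owner = {}      # k-mer -> name of the only reference seen to contain it
    shared = set()  # k-mers seen in two or more references
    for name, seq in references.items():
        for km in kmers(seq, k):
            if km in shared:
                continue
            if km in owner and owner[km] != name:
                # Seen in another reference too: no longer discriminating.
                del owner[km]
                shared.add(km)
            else:
                owner[km] = name
    return owner


def classify_read(read, index, k):
    """Return the reference with the most unique k-mer hits, or None if no hits."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    if not hits:
        return None
    return hits.most_common(1)[0][0]


# Toy reference sequences (hypothetical, for illustration only).
references = {"taxon_a": "ACGTACGTGGA", "taxon_b": "TTGACCATGGA"}
index = unique_kmer_index(references, k=4)
```

In this toy setup, a read such as "CGTACG" would be assigned to "taxon_a" because all of its 4-mers occur only in that reference, while a read consisting solely of a shared k-mer gets no assignment. The trade-off noted in item 2 shows up here as well: a small k makes k-mers more likely to be shared or to match by chance, while a large k increases the size of the index.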