This content will become publicly available on June 1, 2025

Title: Information-incorporated clustering analysis of disease prevalence trends
In biomedical research, the analysis of disease prevalence is of critical importance. While most existing prevalence studies focus on individual diseases, there has been increasing effort to jointly examine the prevalence values and trends of multiple diseases. Such joint analysis can provide valuable insights not shared by individual-disease analysis. A critical limitation of existing analyses is a lack of attention to existing information, which has been accumulated through a large number of studies and can be especially valuable when there are a large number of diseases but the number of prevalence values for a specific disease is limited. In this study, we conduct a functional clustering analysis of prevalence trends for a large number of diseases. A novel approach based on the penalized fusion technique is developed to incorporate information mined from published articles. It is designed to take into account that such information may not be fully relevant or correct. Another significant development is that statistical properties are rigorously established. Simulations demonstrate the competitive performance of the proposed approach. In the analysis of data from the Taiwan NHIRD (National Health Insurance Research Database), new and interesting findings that differ from the existing ones are made.
Award ID(s):
1916251 2209685
NSF-PAR ID:
10512845
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Institute of Mathematical Statistics
Date Published:
Journal Name:
The Annals of Applied Statistics
Volume:
18
Issue:
2
ISSN:
1932-6157
Page Range / eLocation ID:
1035–1050
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    An increasing number of disorders have been identified for which two or more distinct alleles in two or more genes are required to either cause the disease or to significantly modify its onset, severity or phenotype. It is difficult to discover such interactions using existing approaches. The purpose of our work is to develop and evaluate a system that can identify combinations of alleles underlying digenic and oligogenic diseases in individual whole exome or whole genome sequences. Information that links patient phenotypes to databases of gene–phenotype associations observed in clinical or non-human model organism research can improve variant prioritization for genetic diseases. Additional background knowledge about interactions between genes can be utilized to identify sets of variants in different genes in the same individual which may then contribute to the overall disease phenotype. We have developed OligoPVP, an algorithm that can be used to prioritize causative combinations of variants in digenic and oligogenic diseases, using whole exome or whole genome sequences together with patient phenotypes as input. We demonstrate that OligoPVP has significantly improved performance when compared to state-of-the-art pathogenicity detection methods in the case of digenic diseases. Our results show that OligoPVP can efficiently prioritize sets of variants in digenic diseases using a phenotype-driven approach and identify etiologically important variants in whole genomes. OligoPVP naturally extends to oligogenic disease involving interactions between variants in two or more genes. It can be applied to the identification of multiple interacting candidate variants contributing to phenotype, where the action of modifier genes is suspected from pedigree analysis or failure of traditional causative variant identification.

     
  2. Abstract Background

    Climate change presents an imminent threat to almost all biological systems across the globe. In recent years there has been a series of studies showing how changes in climate can impact infectious disease transmission. Many of these publications focus on simulations based on in silico data, overshadowing empirical research based on field and laboratory data. A synthesis of empirical climate change and infectious disease research is still lacking.

    Methods

    We conducted a systematic review of research from 2015 to 2020 on climate change and infectious diseases to identify major trends and current gaps in research. Literature was sourced from the Web of Science and PubMed literature repositories using a keyword search and was reviewed by a team of reviewers using delineated inclusion criteria.

    Results

    Our review revealed that both taxonomic and geographic biases are present in climate and infectious disease research, specifically with regard to types of disease transmission and localities studied. Empirical investigations of vector-borne diseases associated with mosquitoes comprised the majority of research in the climate change and infectious disease literature. Furthermore, demographic trends in the publishing institutions and individuals revealed a bias towards research conducted across temperate, high-income countries. We also identified key trends in funding sources for the most recent literature and a discrepancy in the gender identities of publishing authors, which may reflect current systemic inequities in the scientific field.

    Conclusions

    Future research on climate change and infectious diseases should consider diseases of direct transmission (non-vector-borne) and devote more research effort to the tropics. Local research in low- and middle-income countries was generally neglected. Research on climate change and infectious disease has failed to be socially inclusive, geographically balanced, and broad in terms of the disease systems studied, limiting our capacity to better understand the actual effects of climate change on health.

  3. Abstract Coral disease is becoming increasingly problematic on reefs worldwide. However, most coral disease research has focused on the abiotic drivers of disease, potentially overlooking the role of species interactions in disease dynamics. Coral predators in particular can influence disease by breaking through protective tissues and exposing corals to infections, vectoring diseases among corals, or serving as reservoirs for pathogens. Numerous studies have demonstrated the relationship between corallivores and disease in certain contexts, but to date there has been no comprehensive synthesis of the relationships between corallivores and disease, which hinders our understanding of coral disease dynamics. To address this void, we identified 65 studies from 26 different ecoregions that examine this predator–prey–disease relationship. Observational studies found over 20 positive correlations between disease prevalence and corallivore abundance, with just four instances documenting a negative correlation between corallivores and disease. Studies found putative pathogens in corallivore guts, and experiments demonstrated the ability of corallivores to vector pathogens. Corallivores were also frequently found infesting disease margins or targeting diseased tissues, but the ecological ramifications of this behavior remain unknown. We found that the impact of corallivores was taxon-dependent, with most invertebrates increasing disease incidence, prevalence, or progression; fish showing highly context-dependent effects; and xanthid crabs decreasing disease progression. Simulated wounding caused disease in many cases, but experimental wound debridement slowed disease progression in others, which could explain contrasting findings from different taxa. The negative effects of corallivores are likely to worsen as storms intensify, macroalgal cover increases, more nutrients are added to marine systems, and water temperatures increase. As diseases continue to impact coral reefs globally, a more complete understanding of the ecological dynamics of disease, including those involving coral predators, is of paramount importance to coral reef conservation and management.
  4. Summary

    Genetic risk prediction is an important component of individualized medicine, but prediction accuracies remain low for many complex diseases. A fundamental limitation is the sample sizes of the studies on which the prediction algorithms are trained. One way to increase the effective sample size is to integrate information from previously existing studies. However, it can be difficult to find existing data that examine the target disease of interest, especially if that disease is rare or poorly studied. Furthermore, individual-level genotype data from these auxiliary studies are typically difficult to obtain. This article proposes a new approach to integrative genetic risk prediction of complex diseases with binary phenotypes. It accommodates possible heterogeneity in the genetic etiologies of the target and auxiliary diseases using a tuning parameter-free non-parametric empirical Bayes procedure, and can be trained using only auxiliary summary statistics. Simulation studies show that the proposed method can provide superior predictive accuracy relative to non-integrative as well as integrative classifiers. The method is applied to a recent study of pediatric autoimmune diseases, where it substantially reduces prediction error for certain target/auxiliary disease combinations. The proposed method is implemented in the R package ssa.

     
  5. Land use change analysis provides valuable information for landscape monitoring, managing, and prioritizing large area conservation practices. There has been significant interest in the southeastern United States (SEUS) due to substantial land change from various economic activities since the 1940s. This study uses quantitative data from the Economic Research Service (ERS) for landscape change analysis, addressing land change among five major land types for twelve states in the SEUS from 1945 to 2012. The study also conducted a literature review using the PSALSAR framework to identify significant drivers related to land type changes from research articles within the region. The analysis showed how each land type changed in each state over the period and the percentage change for the primary drivers related to land use change. The literature review identified significant drivers of land use and land cover change (LULCC) within the SEUS. The associated drivers were categorized into natural and artificial drivers, then further subdivided into eight categories related to land type changes in the region. A schematic diagram was developed to show land type changes that impacted environmental changes from various studies in the SEUS. The results showed that forest land accounted for a 12% change and agricultural land for 20%; population growth in the region averaged 2.59% annually. The study also concluded that research to understand past land use trends and the direction and magnitude of land cover changes is essential. Significant drivers such as urban expansion and agriculture are critical to the future use of land in the region; their impacts are attributed to environmental changes in the region and must be monitored.