Title: Spatiotemporal Tracking of SARS-CoV-2 Variants using informative subtype markers and association graphs
Viral subtyping can facilitate visualization and modeling of the geographic distribution and temporal dynamics of disease spread. Understanding how the virus evolves across space and time can inform forensic strategies. We identify variation among SARS-CoV-2 sequences using an entropy measure followed by frequency analysis. The resulting signatures, Informative Subtype Markers (ISMs), define a compact set of nucleotide sites that characterize the most variable (and thus most informative) positions in the viral genomes sequenced from different individuals. We show that ISMs support a variety of downstream analyses, such as comparing the subtype compositions of different countries. We present association graphs as a visualization tool that connects ISMs based on their co-occurrence across individuals. In particular, we investigate the dominant ISMs in different locations and across factors such as gender and age.
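The entropy-then-frequency idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the entropy threshold of 0.5 bits, and the toy sequences are all assumptions made for illustration, and the input is assumed to be already aligned, with no gaps or ambiguous bases.

```python
from collections import Counter
from math import log2


def site_entropy(column):
    """Shannon entropy (in bits) of the nucleotides observed at one aligned site."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def informative_sites(aligned, threshold=0.5):
    """Indices of sites whose entropy exceeds the (hypothetical) threshold."""
    length = len(aligned[0])
    return [
        i for i in range(length)
        if site_entropy([seq[i] for seq in aligned]) > threshold
    ]


def ism_counts(aligned, sites):
    """Frequency of each marker string formed by reading only the chosen sites."""
    return Counter("".join(seq[i] for i in sites) for seq in aligned)


# Toy aligned sequences (made up for illustration, not real viral data).
toy = ["ACGTA", "ACGTA", "ATGTC", "ATGTC", "ACGTC"]
sites = informative_sites(toy)
counts = ism_counts(toy, sites)
```

In this toy input only the second and fifth positions vary, so the marker for each sequence is the two-letter string read at those positions. Counting the markers per location or time window would then give the kind of subtype composition the abstract describes.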
Award ID(s):
1936791
NSF-PAR ID:
10291896
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Spatiotemporal Tracking of SARS-CoV-2 Variants using informative subtype markers and association graphs
Page Range / eLocation ID:
516 to 519
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract This project is funded by the US National Science Foundation (NSF) through their NSF RAPID program under the title “Modeling Corona Spread Using Big Data Analytics.” The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19 is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, and it has reached at least 124 countries and territories. Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services. Modeling the Covid-19 pandemic spread is challenging. But there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spreads the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020, Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5 % of the population could become infected within 3 months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239–42, 2020). A recent large-scale analysis from China suggests that 80 % of those infected either are asymptomatic or have mild symptoms; a finding that implies that demand for advanced medical services might apply to only 20 % of the total infected. 
Of patients infected with Covid-19, about 15 % have severe illness and 5 % have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall, mortality ranges from 0.25 % to as high as 3.0 % (Emanuel et al. in N Engl J Med. 2020, Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80 years (> 14 %) and those with coexisting conditions (10 % for those with cardiovascular disease and 7 % for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). Overall, Covid-19 is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1 %. Public health efforts depend heavily on predicting how diseases such as those caused by Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease’s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual’s exact travel pattern. By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. In this paper we describe the development of a model of Corona spread using innovative big data analytics techniques and tools. We leveraged our experience from research in modeling Ebola spread (Shaw et al. Modeling Ebola Spread and Using HPCC/KEL System. 
In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham) to model Corona spread, with the aim of obtaining new results and helping to reduce the number of Corona patients. We closely collaborated with LexisNexis, which is a leading US data analytics company and a member of our NSF I/UCRC for Advanced Knowledge Enablement. The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. Our work proposes the HPCC Systems Covid-19 tracker, which provides a timely, multi-level view of the pandemic with informative virus spreading indicators. The system embeds a classical epidemiological model known as SIR and spreading indicators based on a causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from ingesting and tracking various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis, from the global level down to the county level. It also provides statistical analysis for each level, such as new cases per 100,000 population. The primary analysis, such as Contagion Risk and Infection State, is based on a causal model with a seven-day sliding window. Our work has been released as a publicly available website and has attracted a great volume of traffic. The project is open-sourced and available on GitHub. The system was developed on the LexisNexis HPCC Systems, which is briefly described in the paper. 
  2. Abstract

    Identifying patterns of pathogen infection in natural systems is crucial to understanding mechanisms of host–pathogen interactions. In this study, we explored how Junonia coenia densovirus (JcDV) infection varies over space and time in populations of the Melissa blue butterfly (Lycaeides melissa: Lycaenidae) using two different host plants. Collections of L. melissa adults from multiple populations and years, along with host plant tissue and community samples of arthropods found on host plants, were screened to determine JcDV prevalence and load. Additionally, we sampled at multiple time points within a single L. melissa flight season to investigate intra‐annual variation in infection patterns.

    We found population‐specific variation in viral prevalence of L. melissa across collection years, with historical samples potentially having higher viral prevalence than contemporary samples, although host plant diet was not informative for these patterns. Patterns of infection across multiple generations within a flight season showed that late‐season samples had a higher proportion of JcDV‐positive individuals, suggesting an accumulation of virus over the season. Sequence data from a segment of the JcDV capsid gene showed a lack of viral genetic diversity between L. melissa collected from different localities, and little to no viral particles were found in the surrounding environment.

    Our discovery of temporal variation in infection suggests that multiple sampling efforts must be made when describing pathogen prevalence in multivoltine hosts. Our findings represent an important first step towards further exploration of the ecological factors mediating disease prevalence and host‐specific variability of infection in wild insect populations.

     
  3. Abstract

    Influenza A viruses in wild birds pose threats to the poultry industry, wild birds, and human health under certain conditions. Of particular importance are wild waterfowl, which are the primary reservoir of low‐pathogenicity influenza viruses that ultimately cause high‐pathogenicity outbreaks in poultry farms. Despite much work on the drivers of influenza A virus prevalence, the underlying viral subtype dynamics are still mostly unexplored. Nevertheless, understanding these dynamics, particularly for the agriculturally significant H5 and H7 subtypes, is important for mitigating the risk of outbreaks in domestic poultry farms. Here, using an expansive surveillance database, we take a large‐scale look at the spatial, temporal, and taxonomic drivers in the prevalence of these two subtypes among influenza A‐positive wild waterfowl. We document spatiotemporal trends that are consistent with past work, particularly an uptick in H5 viruses in late autumn and H7 viruses in spring. Interestingly, despite large species differences in temporal trends in overall influenza A virus prevalence, we document only modest differences in the relative abundance of these two subtypes and little, if any, temporal differences among species. As such, it appears that differences in species' phenology, physiology, and behaviors that influence overall susceptibility to influenza A viruses play a much lesser role in relative susceptibility to different subtypes. Instead, species are likely to freely pass viruses among each other regardless of subtype. Importantly, despite the similarities among species documented here, individual species still may play important roles in moving viruses across large geographic areas or sustaining local outbreaks through their different migratory behaviors.

     
  4. Abstract Background

    The term virus ‘spillover’ embodies a highly complex phenomenon and is often used to refer to viral transmission from a primary reservoir host to a new, naïve yet susceptible and permissive host species. Spillover transmission can result in a virus becoming pathogenic, causing disease and death in the new host if successful infection and transmission take place.

    Main text

    The scientific literature across diverse disciplines has used the terms virus spillover, spillover transmission, cross-species transmission, and host shift almost interchangeably to describe the complex process by which a virus becomes established after moving from an original host (source/donor) to a naïve host (recipient), which may have close or distant taxonomic or evolutionary ties. Spillover transmission may result in unsuccessful onward transmission if the virus dies off before propagation. Alternatively, successful viral establishment in the new host can occur if subsequent secondary transmission occurs among individuals of the same novel species and among other sympatric susceptible species. As such, virus spillover transmission is a common yet highly complex phenomenon that encompasses multiple subtle stages, which can be deconstructed and studied separately to better understand the drivers of disease emergence. Rabies virus (RABV) is a well-documented viral pathogen that still has a heavy impact on humans, companion animals, wildlife, and livestock throughout Latin America due to substantial spatial, temporal, and ecological (natural and expansional) overlap with several virus reservoir hosts. The rabies disease system therefore represents a robust avenue through which the drivers and uncertainties surrounding spillover transmission can be unraveled at its different subtle stages to better understand how they may be affected by coarse, medium, and fine scale variables.

    Conclusions

    The continued study of viral spillover transmission necessitates the elucidation of its complexities to better assess the cross-scale impacts of ecological forces linked to the propensity of spillover success. Improving capacities to reconstruct and predict spillover transmission would prevent public health impacts on the most at-risk populations across the globe.

  5. Data from many real-world applications can be naturally represented by multi-view networks where the different views encode different types of relationships (e.g., friendship, shared interests in music, etc.) between real-world individuals or entities. There is an urgent need for methods to obtain low-dimensional, information preserving and typically nonlinear embeddings of such multi-view networks. However, most of the work on multi-view learning focuses on data that lack a network structure, and most of the work on network embeddings has focused primarily on single-view networks. Against this background, we consider the multi-view network representation learning problem, i.e., the problem of constructing low-dimensional information preserving embeddings of multi-view networks. Specifically, we investigate a novel Generative Adversarial Network (GAN) framework for Multi-View Network Embedding, namely MEGAN, aimed at preserving the information from the individual network views, while accounting for connectivity across (and hence complementarity of and correlations between) different views. The results of our experiments on two real-world multi-view data sets show that the embeddings obtained using MEGAN outperform the state-of-the-art methods on node classification, link prediction and visualization tasks.

     