skip to main content

Title: Standardizing data reporting in the research community to enhance the utility of open data for SARS-CoV-2 wastewater surveillance
SARS-CoV-2 RNA detection in wastewater is being rapidly developed and adopted as a public health monitoring tool worldwide. With wastewater surveillance programs being implemented across many different scales and by many different stakeholders, it is critical that data collected and shared are accompanied by an appropriate minimal amount of meta-information to enable meaningful interpretation and use of this new information source and intercomparison across datasets. While some databases are being developed for specific surveillance programs locally, regionally, nationally, and internationally, common globally-adopted data standards have not yet been established within the research community. Establishing such standards will require national and international consensus on what meta-information should accompany SARS-CoV-2 wastewater measurements. To establish a recommendation on minimum information to accompany reporting of SARS-CoV-2 occurrence in wastewater for the research community, the United States National Science Foundation (NSF) Research Coordination Network on Wastewater Surveillance for SARS-CoV-2 hosted a workshop in February 2021 with participants from academia, government agencies, private companies, wastewater utilities, public health laboratories, and research institutes. This report presents the primary two outcomes of the workshop: (i) a recommendation on the set of minimum meta-information that is needed to confidently interpret wastewater SARS-CoV-2 data, and (ii) insights from workshop discussions more » on how to improve standardization of data reporting. « less
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; more » ; ; ; ; ; ; ; ; ; ; ; ; ; « less
Award ID(s):
2038087 2028564 2029025 2027679
Publication Date:
Journal Name:
Environmental Science: Water Research & Technology
Sponsoring Org:
National Science Foundation
More Like this
  1. Wastewater surveillance for the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is an emerging approach to help identify the risk of a coronavirus disease (COVID-19) outbreak. This tool can contribute to public health surveillance at both community (wastewater treatment system) and institutional (e.g., colleges, prisons, and nursing homes) scales. This paper explores the successes, challenges, and lessons learned from initial wastewater surveillance efforts at colleges and university systems to inform future research, development and implementation. We present the experiences of 25 college and university systems in the United States that monitored campus wastewater for SARS-CoV-2 during the fall 2020 academicmore »period. We describe the broad range of approaches, findings, resources, and impacts from these initial efforts. These institutions range in size, social and political geographies, and include both public and private institutions. Our analysis suggests that wastewater monitoring at colleges requires consideration of local information needs, sewage infrastructure, resources for sampling and analysis, college and community dynamics, approaches to interpretation and communication of results, and follow-up actions. Most colleges reported that a learning process of experimentation, evaluation, and adaptation was key to progress. This process requires ongoing collaboration among diverse stakeholders including decision-makers, researchers, faculty, facilities staff, students, and community members.« less
  2. Abstract This project is funded by the US National Science Foundation (NSF) through their NSF RAPID program under the title “Modeling Corona Spread Using Big Data Analytics.” The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19 is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, andmore »it has reached at least 124 countries and territories. Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services. Modeling the Covid-19 pandemic spread is challenging. But there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spreads the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020, Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5 % of the population could become infected within 3 months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239–42, 2020). A recent large-scale analysis from China suggests that 80 % of those infected either are asymptomatic or have mild symptoms; a finding that implies that demand for advanced medical services might apply to only 20 % of the total infected. Of patients infected with Covid-19, about 15 % have severe illness and 5 % have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall, mortality ranges from 0.25 % to as high as 3.0 % (Emanuel et al. in N Engl J Med. 2020, Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80 years (> 14 %) and those with coexisting conditions (10 % for those with cardiovascular disease and 7 % for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). Overall, Covid-19 is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1 %. Public health efforts depend heavily on predicting how diseases such as those caused by Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease’s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual’s exact travel pattern. By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. In this paper we describe the development a model of Corona spread by using innovative big data analytics techniques and tools. We leveraged our experience from research in modeling Ebola spread (Shaw et al. Modeling Ebola Spread and Using HPCC/KEL System. In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham) to successfully model Corona spread, we will obtain new results, and help in reducing the number of Corona patients. We closely collaborated with LexisNexis, which is a leading US data analytics company and a member of our NSF I/UCRC for Advanced Knowledge Enablement. The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. Our work proposes the HPCC Systems Covid-19 tracker, which provides a multi-level view of the pandemic with the informative virus spreading indicators in a timely manner. The system embeds a classical epidemiological model known as SIR and spreading indicators based on causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from ingesting and tracking of various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis up to global-level and down to the county-level. It also provides statistical analysis for each level such as new cases per 100,000 population. The primary analysis such as Contagion Risk and Infection State is based on causal model with a seven-day sliding window. Our work has been released as a publicly available website to the world and attracted a great volume of traffic. The project is open-sourced and available on GitHub. The system was developed on the LexisNexis HPCC Systems, which is briefly described in the paper.« less
  3. Abstract Background Wastewater-based epidemiology (WBE) for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) can be an important source of information for coronavirus disease 2019 (COVID-19) management during and after the pandemic. Currently, governments and transportation industries around the world are developing strategies to minimize SARS-CoV-2 transmission associated with resuming activity. This study investigated the possible use of SARS-CoV-2 RNA wastewater surveillance from airline and cruise ship sanitation systems and its potential use as a COVID-19 public health management tool. Methods Aircraft and cruise ship wastewater samples (n = 21) were tested for SARS-CoV-2 using two virus concentration methods, adsorption–extraction by electronegative membranemore »(n = 13) and ultrafiltration by Amicon (n = 8), and five assays using reverse-transcription quantitative polymerase chain reaction (RT-qPCR) and RT-droplet digital PCR (RT-ddPCR). Representative qPCR amplicons from positive samples were sequenced to confirm assay specificity. Results SARS-CoV-2 RNA was detected in samples from both aircraft and cruise ship wastewater; however concentrations were near the assay limit of detection. The analysis of multiple replicate samples and use of multiple RT-qPCR and/or RT-ddPCR assays increased detection sensitivity and minimized false-negative results. Representative qPCR amplicons were confirmed for the correct PCR product by sequencing. However, differences in sensitivity were observed among molecular assays and concentration methods. Conclusions The study indicates that surveillance of wastewater from large transport vessels with their own sanitation systems has potential as a complementary data source to prioritize clinical testing and contact tracing among disembarking passengers. Importantly, sampling methods and molecular assays must be further optimized to maximize detection sensitivity. The potential for false negatives by both wastewater testing and clinical swab testing suggests that the two strategies could be employed together to maximize the probability of detecting SARS-CoV-2 infections amongst passengers.« less
  4. In response to COVID-19, the international water community rapidly developed methods to quantify the SARS-CoV-2 genetic signal in untreated wastewater. Wastewater surveillance using such methods has the potential to complement clinical testing in assessing community health. This interlaboratory assessment evaluated the reproducibility and sensitivity of 36 standard operating procedures (SOPs), divided into eight method groups based on sample concentration approach and whether solids were removed. Two raw wastewater samples were collected in August 2020, amended with a matrix spike (betacoronavirus OC43), and distributed to 32 laboratories across the U.S. Replicate samples analyzed in accordance with the project's quality assurance planmore »showed high reproducibility across the 36 SOPs: 80% of the recovery-corrected results fell within a band of ±1.15 log 10 genome copies per L with higher reproducibility observed within a single SOP (standard deviation of 0.13 log 10 ). The inclusion of a solids removal step and the selection of a concentration method did not show a clear, systematic impact on the recovery-corrected results. Other methodological variations ( e.g. , pasteurization, primer set selection, and use of RT-qPCR or RT-dPCR platforms) generally resulted in small differences compared to other sources of variability. These findings suggest that a variety of methods are capable of producing reproducible results, though the same SOP or laboratory should be selected to track SARS-CoV-2 trends at a given facility. The methods showed a 7 log 10 range of recovery efficiency and limit of detection highlighting the importance of recovery correction and the need to consider method sensitivity when selecting methods for wastewater surveillance.« less
  5. Making the most of biodiversity data requires linking observations of biological species from multiple sources both efficiently and accurately (Bisby 2000, Franz et al. 2016). Aggregating occurrence records using taxonomic names and synonyms is computationally efficient but known to experience significant limitations on accuracy when the assumption of one-to-one relationships between names and biological entities breaks down (Remsen 2016, Franz and Sterner 2018). Taxonomic treatments and checklists provide authoritative information about the correct usage of names for species, including operational representations of the meanings of those names in the form of range maps, reference genetic sequences, or diagnostic traits. Theymore »increasingly provide taxonomic intelligence in the form of precise description of the semantic relationships between different published names in the literature. Making this authoritative information Findable, Accessible, Interoperable, and Reusable (FAIR; Wilkinson et al. 2016) would be a transformative advance for biodiversity data sharing and help drive adoption and novel extensions of existing standards such as the Taxonomic Concept Schema and the OpenBiodiv Ontology (Kennedy et al. 2006, Senderov et al. 2018). We call for the greater, global Biodiversity Information Standards (TDWG) and taxonomy community to commit to extending and expanding on how FAIR applies to biodiversity data and include practical targets and criteria for the publication and digitization of taxonomic concept representations and alignments in taxonomic treatments, checklists, and backbones. As a motivating case, consider the abundantly sampled North American deer mouse— Peromyscus maniculatus (Wagner 1845)—which was recently split from one continental species into five more narrowly defined forms, so that the name P. maniculatus is now only applied east of the Mississippi River (Bradley et al. 2019, Greenbaum et al. 2019). That single change instantly rendered ambiguous ~7% of North American mammal records in the Global Biodiversity Information Facility (n=242,663, downloaded 2021-06-04; 2021) and ⅓ of all National Ecological Observatory Network (NEON) small mammal samples (n=10,256, downloaded 2021-06-27). While this type of ambiguity is common in name-based databases when species are split, the example of P. maniculatus is particularly striking for its impact upon biological questions ranging from hantavirus surveillance in North America to studies of climate change impacts upon rodent life-history traits. Of special relevance to NEON sampling is recent evidence suggesting deer mice potentially transmit SARS-CoV-2 (Griffin et al. 2021). Automating the updating of occurrence records in such cases and others will require operational representations of taxonomic concepts—e.g., range maps, reference sequences, and diagnostic traits—that are FAIR in addition to taxonomic concept alignment information (Franz and Peet 2009). Despite steady progress, it remains difficult to find, access, and reuse authoritative information about how to apply taxonomic names even when it is already digitized. It can also be difficult to tell without manual inspection whether similar types of concept representations derived from multiple sources, such as range maps or reference sequences selected from different research articles or checklists, are in fact interoperable for a particular application. The issue is therefore different from important ongoing efforts to digitize trait information in species circumscriptions, for example, and focuses on how already digitized knowledge can best be packaged to inform human experts and artifical intelligence applications (Sterner and Franz 2017). We therefore propose developing community guidelines and criteria for FAIR taxonomic concept representations as "semantic artefacts" of general relevance to linked open data and life sciences research (Le Franc et al. 2020).« less