
Title: AgeStrucNb: Software for Simulating and Detecting Changes in the Effective Number of Breeders (Nb)
Abstract: Estimation of the effective number of breeders per reproductive event (Nb) using single-sample, DNA-marker-based methods has grown rapidly in recent years. However, estimating Nb is difficult in age-structured populations because the performance of estimators is influenced by the Nb/Ne ratio, which varies among species with different life histories. We provide a computer program, AgeStrucNb, to simulate age-structured populations (including life history) and also estimate Nb. The AgeStrucNb program is composed of four major components that simulate, subsample, estimate, and then visualize Nb time-series data. AgeStrucNb also allows users to quantify the precision and accuracy of any set of loci or sample size for estimating Nb in many species and populations. Users can conduct power analyses to evaluate the sensitivity to detect changes in Nb, or the power to detect a correlation between trends in Nb and environmental variables (e.g., temperature, habitat quality, predator or pathogen abundance) that could be driving those changes. The software provides Nb estimates for empirical data sets using the LDNe (linkage disequilibrium) method, includes publication-quality output graphs, and outputs genotype files in Genepop format for use in other programs. AgeStrucNb will help advance the application of genetic markers for monitoring Nb, helping biologists detect population declines and growth, which is crucial for research and conservation of natural and managed populations.
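The LDNe approach mentioned above estimates Nb from linkage disequilibrium among unlinked loci. As a rough illustration only (not the program's actual implementation, which includes bias corrections), a minimal sketch of the uncorrected LD point estimate, assuming the simple expectation E[r²] ≈ 1/(3·Nb) + 1/S for a sample of S diploids:

```python
def ld_nb_estimate(mean_r2, s):
    """Point estimate of Nb from the mean squared LD (r^2) across locus pairs.

    Uses the simplified expectation E[r^2] ~ 1/(3*Nb) + 1/S for unlinked loci
    in a sample of S diploids. No bias correction is applied, so this is an
    illustration of the idea rather than the estimator used by AgeStrucNb.
    """
    adjusted = mean_r2 - 1.0 / s  # remove the sampling contribution to r^2
    if adjusted <= 0:
        # LD signal indistinguishable from sampling noise: estimate is unbounded
        return float("inf")
    return 1.0 / (3.0 * adjusted)

# Example: with S = 50 and mean r^2 consistent with Nb = 100,
# the estimator recovers roughly 100 breeders.
nb_hat = ld_nb_estimate(1.0 / 300 + 1.0 / 50, 50)
```

Subsampling loci or individuals and re-running such an estimator is how precision and accuracy can be quantified for a given marker panel.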
Sherwin, William
Journal Name:
Journal of Heredity
Page Range or eLocation-ID:
491 to 497
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: This project is funded by the US National Science Foundation (NSF) through its NSF RAPID program under the title “Modeling Corona Spread Using Big Data Analytics.” The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus disease Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19 is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, and it has reached at least 124 countries and territories. Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services. Modeling the spread of the Covid-19 pandemic is challenging, but there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spread the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020; Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5% of the population could become infected within 3 months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239–42, 2020). A recent large-scale analysis from China suggests that 80% of those infected are either asymptomatic or have mild symptoms, a finding that implies that demand for advanced medical services might apply to only 20% of the total infected.
Of patients infected with Covid-19, about 15% have severe illness and 5% have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall mortality ranges from 0.25% to as high as 3.0% (Emanuel et al. in N Engl J Med. 2020; Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80 years (>14%) and those with coexisting conditions (10% for those with cardiovascular disease and 7% for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). Overall, Covid-19 is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1%. Public health efforts depend heavily on predicting how diseases such as Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations to calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease’s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual’s exact travel pattern. By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. In this paper we describe the development of a model of coronavirus spread using innovative big data analytics techniques and tools. We leveraged our experience from research in modeling Ebola spread (Shaw et al., Modeling Ebola Spread and Using HPCC/KEL System.
In: Big Data Technologies and Applications 2016 (pp. 347-385). Springer, Cham) to model coronavirus spread, obtain new results, and help reduce the number of Covid-19 patients. We closely collaborated with LexisNexis, a leading US data analytics company and a member of our NSF I/UCRC for Advanced Knowledge Enablement. The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. Our work proposes the HPCC Systems Covid-19 tracker, which provides a multi-level view of the pandemic with informative virus-spreading indicators in a timely manner. The system embeds a classical epidemiological model known as SIR together with spreading indicators based on a causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from ingesting and tracking various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis, from the global level down to the county level. It also provides statistical analysis for each level, such as new cases per 100,000 population. The primary analyses, such as Contagion Risk and Infection State, are based on a causal model with a seven-day sliding window. Our work has been released as a publicly available website and has attracted a great volume of traffic. The project is open-sourced and available on GitHub. The system was developed on the LexisNexis HPCC Systems platform, which is briefly described in the paper.
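The classical SIR model that the tracker embeds can be illustrated with a minimal discrete-time sketch (hypothetical parameter values; this is not the tracker's actual HPCC Systems implementation). With β = 0.4 and γ = 0.2 per day, the basic reproductive number β/γ = 2 matches the early estimate that each infected person spreads the virus to about two others:

```python
def sir_step(s, i, r, beta, gamma, n):
    """One day of the discrete SIR model.

    S -> I at rate beta * S * I / N (new infections),
    I -> R at rate gamma * I (new recoveries).
    """
    new_infections = beta * s * i / n
    new_recoveries = gamma * i
    return s - new_infections, i + new_infections - new_recoveries, r + new_recoveries

def simulate_sir(n=100_000, i0=10, beta=0.4, gamma=0.2, days=180):
    """Run the epidemic forward; return the final state and the peak infected count."""
    s, i, r = float(n - i0), float(i0), 0.0
    peak = i
    for _ in range(days):
        s, i, r = sir_step(s, i, r, beta, gamma, n)
        peak = max(peak, i)
    return s, i, r, peak
```

Fitting β and γ to reported case counts within a sliding window, as the tracker does at each geographic level, turns this toy model into a spreading indicator.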
  2. Abstract: Objective: Randomized controlled trials (RCTs) are the gold-standard method for evaluating whether a treatment works in health care, but they can be difficult to find and make use of. We describe the development and evaluation of a system to automatically find and categorize all new RCT reports. Materials and Methods: Trialstreamer continuously monitors PubMed and the World Health Organization International Clinical Trials Registry Platform, looking for new RCTs in humans using a validated classifier. We combine machine learning and rule-based methods to extract information from the RCT abstracts, including free-text descriptions of trial PICO (populations, interventions/comparators, and outcomes) elements, and map these snippets to normalized MeSH (Medical Subject Headings) vocabulary terms. We additionally identify sample sizes, predict the risk of bias, and extract text conveying key findings. We store all extracted data in a database, which we make freely available for download and via a search portal that allows users to enter structured clinical queries. Results are ranked automatically to prioritize larger and higher-quality studies. Results: As of early June 2020, we have indexed 673,191 publications of RCTs, of which 22,363 were published in the first 5 months of 2020 (142 per day). We additionally include 304,111 trial registrations from the International Clinical Trials Registry Platform. The median trial sample size was 66. Conclusions: We present an automated system for finding and categorizing RCTs. This yields a novel resource: a database of structured information automatically extracted for all published RCTs in humans. We make daily updates of this database available on our website.
  3. Silva, Daniel de (Ed.)
    Biodiversity loss is a global ecological crisis that is both a driver of and a response to environmental change. Understanding the connections between species declines and other components of human-natural systems extends across the physical, life, and social sciences. From an analysis perspective, this requires integration of data from different scientific domains, which often have heterogeneous scales and resolutions. Community science projects such as eBird may help to fill spatiotemporal gaps and enhance the resolution of standardized biological surveys. Comparisons between eBird and the more comprehensive North American Breeding Bird Survey (BBS) have found that these datasets can produce consistent multi-year abundance trends for bird populations at national and regional scales. Here we investigate the reliability of these datasets for estimating patterns at finer resolutions: inter-annual changes in abundance within town boundaries. Using a case study of 14 focal species within Massachusetts, we calculated four indices of annual relative abundance using eBird and BBS datasets, including two different modeling approaches within each dataset. We compared the correspondence between these indices in terms of multi-year trends, annual estimates, and inter-annual changes in estimates at the state and town levels. We found correspondence between eBird and BBS multi-year trends, but this was not consistent across all species and diminished at finer, inter-annual temporal resolutions. We further show that standardizing modeling approaches can increase index reliability, even between datasets, at coarser temporal resolutions. Our results indicate that multiple datasets and modeling methods should be considered when estimating species population dynamics at finer temporal resolutions, but standardizing modeling approaches may improve estimate correspondence between abundance datasets.
In addition, the reliability of these indices at finer spatial scales may depend on habitat composition, which can impact survey accuracy.
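Correspondence between annual abundance indices like those above is commonly summarized as a correlation between the two series, both on the raw annual estimates (trends) and on the inter-annual changes (first differences). A minimal sketch with hypothetical index values (not the study's data or code):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def first_differences(series):
    """Inter-annual changes: index in year t+1 minus index in year t."""
    return [b - a for a, b in zip(series, series[1:])]

# Hypothetical annual relative-abundance indices for one species in one town.
ebird_index = [0.8, 1.1, 1.0, 1.4, 1.3]
bbs_index = [0.9, 1.0, 1.1, 1.3, 1.2]

trend_corr = pearson(ebird_index, bbs_index)
change_corr = pearson(first_differences(ebird_index), first_differences(bbs_index))
```

As the abstract notes, agreement on multi-year trends (trend_corr) can be high even when agreement on year-to-year changes (change_corr) is weaker, because differencing amplifies noise at the finer temporal resolution.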
  4. Substandard and falsified (SF) pharmaceuticals account for an estimated 10% of the pharmaceutical supply chain in low- and middle-income countries (LMICs), where a lack of regulatory and laboratory resources limits the ability to conduct effective post-market surveillance and allows SF products to penetrate the supply chain. The Distributed Pharmaceutical Analysis Laboratory (DPAL) was established in 2014 to expand testing of pharmaceutical dosage forms sourced from LMICs; DPAL is an alliance of academic institutions throughout the United States and abroad that provides high-quality, validated chemical analysis of pharmaceutical dosage forms sourced from partners in LMICs. Results from analysis are reported to relevant regulatory agencies and are used to inform purchasing decisions made by in-country stakeholders. As the DPAL program has expanded to testing more than 1000 pharmaceutical dosage forms annually, challenges have surfaced regarding data management and sample tracking. Here, we describe a pilot project between DPAL and ARTiFACTs that applies blockchain to organize and manage key data generated during the DPAL workflow, including a sample’s progress through the workflow, its physical location, provenance of metadata, and lab reputability. Recording time and date stamps with these data will create a permanent and verifiable chain of custody for samples. This secure, distributed ledger will be linked to an easy-to-use dashboard, allowing stakeholders to view results and experimental details for each sample in real time and verify the integrity of DPAL analysis data. Introducing this blockchain-based system as a pilot will allow us to test the technology with real users analyzing real samples. Feedback from users will be recorded, and necessary adjustments will be made to the system before blockchain is implemented across all DPAL sites.
Anticipated benefits of implementing blockchain for managing DPAL data include efficient routing of work, increased throughput, a chain of custody for samples and their data that aligns with the distributed nature of DPAL, and the ability to use analysis results to detect patterns of quality within and across brands of products and to develop enhanced sampling techniques and best practices.
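The tamper-evident chain of custody described above can be sketched as a simple hash chain, in which each record commits to the hash of the previous record, so altering any past entry breaks every later link. This is an illustrative simplification, not the actual DPAL/ARTiFACTs system; sample IDs and step names below are hypothetical:

```python
import hashlib
import json

def add_record(chain, payload, timestamp):
    """Append a record that links to the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"payload": payload, "timestamp": timestamp, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})
    return chain

def verify_chain(chain):
    """Recompute every hash and check each link; True only if nothing was altered."""
    prev = "0" * 64
    for rec in chain:
        body = {"payload": rec["payload"], "timestamp": rec["timestamp"], "prev": rec["prev"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev"] != prev or digest != rec["hash"]:
            return False
        prev = rec["hash"]
    return True

# Hypothetical custody events for one sample moving through a workflow.
chain = []
add_record(chain, {"sample": "S-001", "step": "received"}, "2021-01-04T09:00Z")
add_record(chain, {"sample": "S-001", "step": "chemical analysis"}, "2021-01-05T14:30Z")
```

A production ledger would additionally distribute copies of the chain across sites so no single party can rewrite history, which is the property the pilot is testing.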
  5. The approximately 100 billion neurons in our brain are responsible for everything we do and experience. Experiments aimed at discovering how these cells encode and process information generate vast amounts of data. These data span multiple scales, from interactions between individual molecules to coordinated waves of electrical activity that spread across the entire brain surface. To understand how the brain works, we must combine and make sense of these diverse types of information. Computational modeling provides one way of doing this. Using equations, we can calculate the chemical and electrical changes that take place in neurons. We can then build models of neurons and neural circuits that reproduce the patterns of activity seen in experiments. Exploring these models can provide insights into how the brain itself works. Several software tools are available to simulate neural circuits, but none provide an easy way of incorporating data that span different scales, from molecules to cells to networks. Moreover, most of the models require familiarity with computer programming. Dura-Bernal et al. have now developed a new software tool called NetPyNE, which allows users without programming expertise to build sophisticated models of brain circuits. It features a user-friendly interface for defining the properties of the model at molecular, cellular and circuit scales. It also provides an easy and automated method to identify the properties of the model that enable it to reproduce experimental data. Finally, NetPyNE makes it possible to run the model on supercomputers and offers a variety of ways to visualize and analyze the resulting output. Users can save the model and output in standardized formats, making them accessible to as many people as possible. Researchers in labs across the world have used NetPyNE to study different brain regions, phenomena and diseases.
The software also features in courses that introduce students to neurobiology and computational modeling. NetPyNE can help to interpret isolated experimental findings, and also makes it easier to explore interactions between brain activity at different scales. This will enable researchers to decipher how the brain encodes and processes information, and ultimately could make it easier to understand and treat brain disorders.