skip to main content

Search for: All records

Award ID contains: 1910118

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. This paper focuses on a core task in computational sustainability and statistical ecology: species distribution modeling (SDM). In SDM, the occurrence pattern of a species on a landscape is predicted by environmental features based on observations at a set of locations. At first, SDM may appear to be a binary classification problem, and one might be inclined to employ classic tools (e.g., logistic regression, support vector machines, neural networks) to tackle it. However, wildlife surveys introduce structured noise (especially under-counting) in the species observations. If unaccounted for, these observation errors systematically bias SDMs. To address the unique challenges of SDM,more »this paper proposes a framework called StatEcoNet. Specifically, this work employs a graphical generative model in statistical ecology to serve as the skeleton of the proposed computational framework and carefully integrates neural networks under the framework. The advantages of StatEcoNet over related approaches are demonstrated on simulated datasets as well as bird species data. Since SDMs are critical tools for ecological science and natural resource management, StatEcoNet may offer boosted computational and analytical powers to a wide range of applications that have significant social impacts, e.g., the study and conservation of threatened species.« less
  2. Species distributions, abundance, and interactions have always been influenced by human activity and are currently experiencing rapid change. Biodiversity benchmark surveys traditionally require intense human labor inputs to find, identify, and record organisms limiting the rate and impact of scientific enquiry and discovery. Recent emergence and advancement of monitoring technologies have improved biodiversity data collection to a scale and scope previously unimaginable. Community science web platforms, smartphone applications, and technology assisted identification have expedited the speed and enhanced the volume of observational data all while providing open access to these data worldwide. How to integrate and leverage the data intomore »valuable information on how species are changing in space and time requires new best practices in computational and analytical approaches. Here we integrate data from three community science repositories to explore how a specialist herbivore distribution changes in relation to host plant distributions and other environmental factors. We generate a series of temporally explicit species distribution models to generate range predictions for a specialist insect herbivore ( Papilio cresphontes ) and three predominant host-plant species. We find that this insect species has experienced rapid northern range expansion, likely due to a combination of the range of its larval host plants and climate changes in winter. This case study shows rapid data collection through large scale community science endeavors can be leveraged through thoughtful data integration and transparent analytic pipelines to inform how environmental change impacts where species are and their interactions for a more cost effective method of biodiversity benchmarking.« less
  3. The growth of biodiversity data sets generated by citizen scientists continues to accelerate. The availability of such data has greatly expanded the scale of questions researchers can address. Yet, error, bias, and noise continue to be serious concerns for analysts, particularly when data being contributed to these giant online data sets are difficult to verify. Counts of birds contributed to eBird, the world’s largest biodiversity online database, present a potentially useful resource for tracking trends over time and space in species’ abundances. We quantified counting accuracy in a sample of 1,406 eBird checklists by comparing numbers contributed by birders (Nmore »= 246) who visited a popular birding location in Oregon, USA, with numbers generated by a professional ornithologist engaged in a long-term study creating benchmark (reference) measurements of daily bird counts. We focused on waterbirds, which are easily visible at this site. We evaluated potential predictors of count differences, including characteristics of contributed checklists, of each species, and of time of day and year. Count differences were biased toward undercounts, with more than 75% of counts being below the daily benchmark value. Median count discrepancies were −29.1% (range: 0 to −42.8%; N = 20 species). Model sets revealed an important influence of each species’ reference count, which varied seasonally as waterbird numbers fluctuated, and of percent of species known to be present each day that were included on each checklist. That is, checklists indicating a more thorough survey of the species richness at the site also had, on average, smaller count differences. However, even on checklists with the most thorough species lists, counts were biased low and exceptionally variable in their accuracy. To improve utility of such bird count data, we suggest three strategies to pursue in the future. (1) Assess additional options for analytically determining how to select checklists that include less biased count data, as well as exploring options for correcting bias during the analysis stage. (2) Add options for users to provide additional information that helps analysts choose checklists, such as an option for users to tag checklists where they focused on obtaining accurate counts. (3) Explore opportunities to effectively calibrate citizen-science bird count data by establishing a formalized network of marquis sites where dedicated observers regularly contribute carefully collected benchmark data.« less
  4. Spectrum cartography aims at estimating power propagation patterns over a geographical region across multiple frequency bands (i.e., a radio map)—from limited samples taken sparsely over the region. Classic cartography methods are mostly concerned with recovering the aggregate radio frequency (RF) information while ignoring the constituents of the radio map—but fine-grained emitter-level RF information is of great interest. In addition, many existing cartography methods explicitly or implicitly assume random spatial sampling schemes that may be difficult to implement, due to legal/privacy/security issues. The theoretical aspects (e.g., identifiability of the radio map) of many existing methods are also unclear. In this work,more »we propose a joint radio map recovery and disaggregation method that is based on coupled block-term tensor decomposition. Our method guarantees identifiability of the individual radio map of each emitter (thereby that of the aggregate radio map as well), under realistic conditions. The identifiability result holds under a large variety of geographical sampling patterns, including a number of pragmatic systematic sampling strategies. We also propose effective optimization algorithms to carry out the formulated radio map disaggregation problems. Extensive simulations are employed to showcase the effectiveness of the proposed approach.« less
  5. In this paper we propose a quasi-Newton algorithm for the celebrated nonnegative matrix factorization (NMF) problem. The proposed algorithm falls into the general framework of Gauss-Newton and Levenberg-Marquardt methods. However, these methods were not able to handle constraints, which is present in NMF. One of the key contributions in this paper is to apply alternating direction method of multipliers (ADMM) to obtain the iterative update from this Gauss-Newton-like algorithm. Furthermore, we carefully study the structure of the Jacobian Gramian matrix given by the Gauss-Newton updates, and designed a way of exactly inverting the matrix with complexity $\cO(mnk)$, which is amore »significant reduction compared to the naive implementation of complexity $\cO((m+n)^3k^3)$. The resulting algorithm, which we call NLS-ADMM, enjoys fast convergence rate brought by the quasi-Newton algorithmic framework, while maintaining low per-iteration complexity similar to that of alternating algorithms. Numerical experiments on synthetic data confirms the efficiency of our proposed algorithm. \end{abstract}« less