skip to main content


Search for: All records

Creators/Authors contains: "Upham, Nathan"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. The current crisis in global natural resource management makes it imperative that we better leverage the vast data sources associated with taxonomic entities (such as recognized species of plants and animals), which are known collectively as biodiversity data. However, these data pose considerable challenges for artificial intelligence: while growing rapidly in volume, they remain highly incomplete for many taxonomic groups, often show conflicting signals from different sources, and are multi-modal and therefore constantly changing in structure. In this paper, we motivate, describe, and present a novel workflow combining machine learning and automated reasoning, to discover patterns of taxonomic identity and change - i.e. “taxonomic intelligence” - leading to scalable and broadly impactful AI solutions within the bio-data realm. 
    more » « less
  2. Making the most of biodiversity data requires linking observations of biological species from multiple sources both efficiently and accurately (Bisby 2000, Franz et al. 2016). Aggregating occurrence records using taxonomic names and synonyms is computationally efficient but known to experience significant limitations on accuracy when the assumption of one-to-one relationships between names and biological entities breaks down (Remsen 2016, Franz and Sterner 2018). Taxonomic treatments and checklists provide authoritative information about the correct usage of names for species, including operational representations of the meanings of those names in the form of range maps, reference genetic sequences, or diagnostic traits. They increasingly provide taxonomic intelligence in the form of precise description of the semantic relationships between different published names in the literature. Making this authoritative information Findable, Accessible, Interoperable, and Reusable (FAIR; Wilkinson et al. 2016) would be a transformative advance for biodiversity data sharing and help drive adoption and novel extensions of existing standards such as the Taxonomic Concept Schema and the OpenBiodiv Ontology (Kennedy et al. 2006, Senderov et al. 2018). We call for the greater, global Biodiversity Information Standards (TDWG) and taxonomy community to commit to extending and expanding on how FAIR applies to biodiversity data and include practical targets and criteria for the publication and digitization of taxonomic concept representations and alignments in taxonomic treatments, checklists, and backbones. As a motivating case, consider the abundantly sampled North American deer mouse— Peromyscus maniculatus (Wagner 1845)—which was recently split from one continental species into five more narrowly defined forms, so that the name P. maniculatus is now only applied east of the Mississippi River (Bradley et al. 2019, Greenbaum et al. 2019). That single change instantly rendered ambiguous ~7% of North American mammal records in the Global Biodiversity Information Facility (n=242,663, downloaded 2021-06-04; GBIF.org 2021) and ⅓ of all National Ecological Observatory Network (NEON) small mammal samples (n=10,256, downloaded 2021-06-27). While this type of ambiguity is common in name-based databases when species are split, the example of P. maniculatus is particularly striking for its impact upon biological questions ranging from hantavirus surveillance in North America to studies of climate change impacts upon rodent life-history traits. Of special relevance to NEON sampling is recent evidence suggesting deer mice potentially transmit SARS-CoV-2 (Griffin et al. 2021). Automating the updating of occurrence records in such cases and others will require operational representations of taxonomic concepts—e.g., range maps, reference sequences, and diagnostic traits—that are FAIR in addition to taxonomic concept alignment information (Franz and Peet 2009). Despite steady progress, it remains difficult to find, access, and reuse authoritative information about how to apply taxonomic names even when it is already digitized. It can also be difficult to tell without manual inspection whether similar types of concept representations derived from multiple sources, such as range maps or reference sequences selected from different research articles or checklists, are in fact interoperable for a particular application. The issue is therefore different from important ongoing efforts to digitize trait information in species circumscriptions, for example, and focuses on how already digitized knowledge can best be packaged to inform human experts and artifical intelligence applications (Sterner and Franz 2017). We therefore propose developing community guidelines and criteria for FAIR taxonomic concept representations as "semantic artefacts" of general relevance to linked open data and life sciences research (Le Franc et al. 2020). 
    more » « less
  3. null (Ed.)
    Preventing extinctions requires understanding macroecological patterns of vulnerability or persistence. However, correlates of risk can be nonlinear, within-species risk varies geographically, and current-day threats cannot reveal drivers of past losses. We investigated factors that regulated survival or extinction in Caribbean mammals, which have experienced the globally highest level of human-caused postglacial mammalian extinctions, and included all extinct and extant Holocene island populations of non-volant species (219 survivals or extinctions across 118 islands). Extinction selectivity shows a statistically detectable and complex body mass effect, with survival probability decreasing for both mass extremes, indicating that intermediate-sized species have been more resilient. A strong interaction between mass and age of first human arrival provides quantitative evidence of larger mammals going extinct on the earliest islands colonized, revealing an extinction filter caused by past human activities. Survival probability increases on islands with lower mean elevation (mostly small cays acting as offshore refugia) and decreases with more frequent hurricanes, highlighting the risk of extreme weather events and rising sea levels to surviving species on low-lying cays. These findings demonstrate the interplay between intrinsic biology, regional ecology and specific local threats, providing insights for understanding drivers of biodiversity loss across island systems and fragmented habitats worldwide. 
    more » « less
  4. null (Ed.)
    “What is crucial for your ability to communicate with me… pivots on the recipient’s capacity to interpret—to make good inferential sense of the meanings that the declarer is able to send” (Rescher 2000, p148). Conventional approaches to reconciling taxonomic information in biodiversity databases have been based on string matching for unique taxonomic name combinations (Kindt 2020, Norman et al. 2020). However, in their original context, these names pertain to specific usages or taxonomic concepts, which can subsequently vary for the same name as applied by different authors. Name-based synonym matching is a helpful first step (Guala 2016, Correia et al. 2018), but may still leave considerable ambiguity regarding proper usage (Fig. 1). Therefore, developing "taxonomic intelligence" is the bioinformatic challenge to adequately represent, and subsequently propagate, this complex name/usage interaction across trusted biodiversity data networks. How do we ensure that senders and recipients of biodiversity data not only can share messages but do so with “good inferential sense” of their respective meanings? Key obstacles have involved dealing with the complexity of taxonomic name/usage modifications through time, both in terms of accounting for and digitally representing the long histories of taxonomic change in most lineages. An important critique of proposals to use name-to-usage relationships for data aggregation has been the difficulty of scaling them up to reach comprehensive coverage, in contrast to name-based global taxonomic hierarchies (Bisby 2011). The Linnaean system of nomenclature has some unfortunate design limitations in this regard, in that taxonomic names are not unique identifiers, their meanings may change over time, and the names as a string of characters do not encode their proper usage, i.e., the name “Genus species” does not specify a source defining how to use the name correctly (Remsen 2016, Sterner and Franz 2017). In practice, many people provide taxonomic names in their datasets or publications but not a source specifying a usage. The information needed to map the relationships between names and usages in taxonomic monographs or revisions is typically not presented it in a machine-readable format. New approaches are making progress on these obstacles. Theoretical advances in the representation of taxonomic intelligence have made it increasingly possible to implement efficient querying and reasoning methods on name-usage relationships (Chen et al. 2014, Chawuthai et al. 2016, Franz et al. 2015). Perhaps most importantly, growing efforts to produce name-usage mappings on a medium scale by data providers and taxonomic authorities suggest an all-or-nothing approach is not required. Multiple high-profile biodiversity databases have implemented internal tools for explicitly tracking conflicting or dynamic taxonomic classifications, including eBird using concept relationships from AviBase (Lepage et al. 2014); NatureServe in its Biotics database; iNaturalist using its taxon framework (Loarie 2020); and the UNITE database for fungi (Nilsson et al. 2019). Other ongoing projects incorporating taxonomic intelligence include the Flora of Alaska (Flora of Alaska 2020), the Mammal Diversity Database (Mammal Diversity Database 2020) and PollardBase for butterfly population monitoring (Campbell et al. 2020). 
    more » « less
  5. Translating information between the domains of systematics and conservation requires novel information management designs. Such designs should improve interactions across the trading zone between the domains, herein understood as the model according to which knowledge and uncertainty are productively translated in both directions (cf. Collins et al. 2019). Two commonly held attitudes stand in the way of designing a well-functioning systematics-to-conservation trading zone. On one side, there are calls to unify the knowledge signal produced by systematics, underpinned by the argument that such unification is a necessary precondition for conservation policy to be reliably expressed and enacted (e.g., Garnett et al. 2020). As a matter of legal scholarship, the argument for systematic unity by legislative necessity is principally false (Weiss 2003, MacNeil 2009, Chromá 2011), but perhaps effective enough as a strategy to win over audiences unsure about robust law-making practices in light of variable and uncertain knowledge. On the other side, there is an attitude that conservation cannot ever restrict the academic freedom of systematics as a scientific discipline (e.g., Raposo et al. 2017). This otherwise sound argument misses the mark in the context of designing a productive trading zone with conservation. The central interactional challenge is not whether the systematic knowledge can vary at a given time and/or evolve over time, but whether these signal dynamics are tractable in ways that actors can translate into robust maxims for conservation. Redesigning the trading zone should rest on the (historically validated) projection that systematics will continue to attract generations of inspired, productive researchers and broad-based societal support, frequently leading to protracted conflicts and dramatic shifts in how practioners in the field organize and identify organismal lineages subject to conservation. This confident outlook for systematics' future, in turn, should refocus the challenge of designing the trading zone as one of building better information services to model the concurrent conflicts and longer-term evolution of systematic knowledge. It would seem unreasonable to expect the International Union for Conservation of Nature (IUCN) Red List Index to develop better data science models for the dynamics of systematic knowledge (cf. Hoffmann et al. 2011) than are operational in the most reputable information systems designed and used by domain experts (Burgin et al. 2018). The reasonable challenge from conservation to systematics is not to stop being a science but to be a better data science. In this paper, we will review advances in biodiversity data science in relation to representing and reasoning over changes in systematic knowledge with computational logic, i.e., modeling systematic intelligence (Franz et al. 2016). We stress-test this approach with a use case where rapid systematic signal change and high stakes for conservation action intersect, i.e., the Malagasy mouse lemurs ( Microcebus É. Geoffroy, 1834 sec. Schüßler et al. 2020), where the number of recognized species-level concepts has risen from 2 to 25 in the span of 38 years (1982–2020). As much as scientifically defensible, we extend our modeling approach to the level of individual published occurrence records, where the inability to do so sometimes reflects substandard practice but more importantly reveals systemic inadequacies in biodiversity data science or informational modeling. In the absence of shared, sound theoretical foundations to assess taxonomic congruence or incongruence across treatments, and in the absence of biodiversity data platforms capable of propagating logic-enabled, scalable occurrence-to-concept identification events to produce alternative and succeeding distribution maps, there is no robust way to provide a knowledge signal from systematics to conservation that is both consistent in its syntax and acccurate in its semantics, in the sense of accurately reflecting the variation and uncertainty that exists across multiple systematic perspectives. Translating this diagnosis into new designs for the trading zone is only one "half" of the solution, i.e., a technical advancement that then would need to be socially endorsed and incentivized by systematic and conservation communities motivated to elevate their collaborative interactions and trade robustly in inherently variable and uncertain information. 
    more » « less