We assess the identification accuracy of ‘research grade’ observations of lichens posted on the online platform iNaturalist. Our results show that these observations are frequently misidentified or lack the necessary chemical and (or) microscopic information for accurate identification. Lichens are a taxonomically difficult group, but they are ubiquitous and eye-catching and are regularly the subject of observations posted on iNaturalist. Therefore, we provide best practice recommendations for posting lichen observations and commenting on observations. Data from iNaturalist are a valuable tool for understanding and managing biodiversity, particularly at this crucial time when large scale biodiversity decline is occurring globally. However, the data must be accurate for them to effectively support biodiversity conservation efforts. Our recommendations are also applicable to other taxonomically difficult taxa. 
                    
                            
                            Assessing Identification Accuracy of Research Grade iNaturalist Observations in Lichens and other Taxonomically Difficult Organisms
                        
                    
    
            Community science-generated biodiversity data can provide essential information for understanding species distributions, behaviors and conservation statuses. However, their utility can be limited due to high uncertainty and variability in quality, especially for small taxonomically difficult organisms like fungi and insects. One important set of community-generated data increasingly used by scientists is Research Grade (RG) iNaturalist observations. These observations are aggregated into the Global Biodiversity Information Facility database. Here we assessed the accuracy of RG lichen observations in iNaturalist. Lichens are mutualistic symbioses formed between fungi and a photosynthetic partner, either algae or cyanobacteria, and occur in every terrestrial ecosystem on the planet (Brodo et al. 2001). They are sensitive indicators of environmental health, especially air quality, and provide essential food and nesting material for animals, along with performing many other ecosystem services (Allen and Lendemer 2021, Brodo et al. 2001, Nimis et al. 2002). We examined hundreds of observations and determined whether the identification was correct, whether the observation could not be identified given the data provided, or whether the identification was incorrect. Identification accuracy of selected species varied widely, from zero observations with enough information for correct identification (e.g., Rhizocarpon geographicum and Cladonia chlorophaea) to 100% correct identifications (e.g., Cetradonia linearis and Physconia subpallida; McMullin and Allen 2022). Most frequently, observations of species that require microscopic examination or chemical tests for accurate identification could not be verified, unlike those of species that require only macromorphology. We provide a series of suggestions for best practices to improve the quality of RG observations and thus the utility of community-generated observation data for taxonomically difficult organisms.
        
    
- Award ID(s): 2115191
- PAR ID: 10436322
- Date Published:
- Journal Name: Biodiversity Information Science and Standards
- Volume: 6
- ISSN: 2535-0897
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            All life on earth is linked by a shared evolutionary history. Even before Darwin developed the theory of evolution, Linnaeus categorized types of organisms based on their shared traits. We now know these traits derive from these species’ shared ancestry. This evolutionary history provides a natural framework to harness the enormous quantities of biological data being generated today. The Open Tree of Life project is a collaboration developing tools to curate and share evolutionary estimates (phylogenies) covering the entire tree of life (Hinchliff et al. 2015, McTavish et al. 2017). The tree is viewable at https://tree.opentreeoflife.org, and the data are all freely available online. The taxon identifiers used in the Open Tree unified taxonomy (Rees and Cranston 2017) are mapped to identifiers across biological informatics databases, including the Global Biodiversity Information Facility (GBIF), NCBI, and others. Linking these identifiers allows researchers to easily unify data from across these different resources (Fig. 1). Leveraging a unified evolutionary framework across the diversity of life provides new avenues for integrative wide scale research. Downstream tools, such as R packages developed by the R OpenSci foundation (rotl, rgbif) (Michonneau et al. 2016, Chamberlain 2017) and other tools (Revell 2012), make accessing and combining this information straightforward for students as well as researchers (e.g. https://mctavishlab.github.io/BIO144/labs/rotl-rgbif.html). Figure 1. Example linking phylogenetic relationships accessed from the Open Tree of Life with specimen location data from Global Biodiversity Information Facility. For example, a recent publication by Santorelli et al. (2018) linked evolutionary information from Open Tree with species locality data gathered from a local field study as well as GBIF species location records to test a river-barrier hypothesis in the Amazon.
By combining these data, the authors were able to test a widely held biogeographic hypothesis across 1952 species in 14 taxonomic groups, and found that a river that had been postulated to drive endemism was in fact not a barrier to gene flow. However, data provenance and taxonomic name reconciliation remain key hurdles to applying data from these large digital biodiversity and evolution community resources to answering biological questions. In the Amazonian river analysis, while the authors used GBIF records as a secondary check on their species records, they relied on an intensive local field study for their major conclusions, and preferred taxon-specific phylogenetic resources over Open Tree where they were available (Santorelli et al. 2018). When Li et al. (2018) assessed large scale phylogenetic approaches, including Open Tree, for measuring community diversity, they found that synthesis phylogenies were less resolved than purpose-built phylogenies, but also found that these synthetic phylogenies were sufficient for community level phylogenetic diversity analyses. Nonetheless, data quality concerns have limited adoption of analyses using data from centralized resources (McTavish et al. 2017). Taxonomic name recognition and reconciliation across databases also remain a hurdle for large scale analyses, despite several ongoing efforts to improve taxonomic interoperability and unify taxonomies, such as Catalogue of Life + (Bánki et al. 2018). In order to support innovative science, large scale digital data resources need to facilitate data linkage between resources, and address researchers' data quality and provenance concerns. I will present the model that the Open Tree of Life is using to provide evolutionary data at the scale of the entire tree of life, while maintaining traceable provenance to the publications and taxonomies these evolutionary relationships are inferred from.
I will discuss the hurdles to adoption of these large scale resources by researchers, as well as the opportunities for new research avenues provided by the connections between evolutionary inferences and biodiversity digital databases.
- 
            Abstract The availability of citizen science data has resulted in growing applications in biodiversity science. One widely used platform, iNaturalist, provides millions of digitally vouchered observations submitted by a global user base. These observation records include a date and a location but otherwise do not contain any information about the sampling process. As a result, sampling biases must be inferred from the data themselves. In the present article, we examine spatial and temporal biases in iNaturalist observations from the platform's launch in 2008 through the end of 2019. We also characterize user behavior on the platform in terms of individual activity level and taxonomic specialization. We found that, at the level of taxonomic class, the users typically specialized on a particular group, especially plants or insects, and rarely made observations of the same species twice. Biodiversity scientists should consider whether user behavior results in systematic biases in their analyses before using iNaturalist data.
- 
            Abstract Understanding the ranges of rare and endangered species is central to conserving biodiversity in the Anthropocene. Species distribution models (SDMs) have become a common and powerful tool for analyzing species–environment relationships across geographic space. Although evaluating the distribution of rare species is integral to their conservation, this can be difficult when limited distribution data are available. Community science platforms, such as iNaturalist, have emerged as alternative sources for species occurrence data. Although these observations are often thought to be of lower quality than those of natural history collections, they may have potential for improving SDMs for species with few occurrence records from collections. Here, we investigate the utility of iNaturalist data for developing SDMs for a rare high‐elevation plant, Telesonix jamesii. Because methods for modeling rare species are limited in the literature, five different modeling techniques were considered, including profile methods, statistical models, and machine learning algorithms. The inclusion of iNaturalist data doubled the number of usable records for T. jamesii. We found that a random forest (RF) model using ensemble training data performed best of any model (area under curve = 0.98). We then compared the performance of RF models that use only natural history training data and those that use a combination of natural history (herbarium specimens) and iNaturalist training data. All models relied heavily on climate data (mean temperature of driest quarter, and precipitation of the warmest quarter), indicating that this species is under threat as climate continues to change. Validation datasets affected model fits as well. Models using only herbarium data performed slightly worse when evaluated with cross‐validation than when validated externally with iNaturalist data.
This study can serve as a model for future SDM studies of species with similar data limitations.
- 
            Making the most of biodiversity data requires linking observations of biological species from multiple sources both efficiently and accurately (Bisby 2000, Franz et al. 2016). Aggregating occurrence records using taxonomic names and synonyms is computationally efficient but known to experience significant limitations on accuracy when the assumption of one-to-one relationships between names and biological entities breaks down (Remsen 2016, Franz and Sterner 2018). Taxonomic treatments and checklists provide authoritative information about the correct usage of names for species, including operational representations of the meanings of those names in the form of range maps, reference genetic sequences, or diagnostic traits. They increasingly provide taxonomic intelligence in the form of precise descriptions of the semantic relationships between different published names in the literature. Making this authoritative information Findable, Accessible, Interoperable, and Reusable (FAIR; Wilkinson et al. 2016) would be a transformative advance for biodiversity data sharing and help drive adoption and novel extensions of existing standards such as the Taxonomic Concept Schema and the OpenBiodiv Ontology (Kennedy et al. 2006, Senderov et al. 2018). We call for the greater, global Biodiversity Information Standards (TDWG) and taxonomy community to commit to extending and expanding on how FAIR applies to biodiversity data and include practical targets and criteria for the publication and digitization of taxonomic concept representations and alignments in taxonomic treatments, checklists, and backbones. As a motivating case, consider the abundantly sampled North American deer mouse, Peromyscus maniculatus (Wagner 1845), which was recently split from one continental species into five more narrowly defined forms, so that the name P. maniculatus is now only applied east of the Mississippi River (Bradley et al. 2019, Greenbaum et al. 2019).
That single change instantly rendered ambiguous ~7% of North American mammal records in the Global Biodiversity Information Facility (n=242,663, downloaded 2021-06-04; GBIF.org 2021) and ⅓ of all National Ecological Observatory Network (NEON) small mammal samples (n=10,256, downloaded 2021-06-27). While this type of ambiguity is common in name-based databases when species are split, the example of P. maniculatus is particularly striking for its impact upon biological questions ranging from hantavirus surveillance in North America to studies of climate change impacts upon rodent life-history traits. Of special relevance to NEON sampling is recent evidence suggesting deer mice potentially transmit SARS-CoV-2 (Griffin et al. 2021). Automating the updating of occurrence records in such cases and others will require operational representations of taxonomic concepts (e.g., range maps, reference sequences, and diagnostic traits) that are FAIR, in addition to taxonomic concept alignment information (Franz and Peet 2009). Despite steady progress, it remains difficult to find, access, and reuse authoritative information about how to apply taxonomic names even when it is already digitized. It can also be difficult to tell without manual inspection whether similar types of concept representations derived from multiple sources, such as range maps or reference sequences selected from different research articles or checklists, are in fact interoperable for a particular application. The issue is therefore different from important ongoing efforts to digitize trait information in species circumscriptions, for example, and focuses on how already digitized knowledge can best be packaged to inform human experts and artificial intelligence applications (Sterner and Franz 2017).
We therefore propose developing community guidelines and criteria for FAIR taxonomic concept representations as "semantic artefacts" of general relevance to linked open data and life sciences research (Le Franc et al. 2020).
 An official website of the United States government