ABSTRACT Camera trap studies have become a popular medium to assess many ecological phenomena including population dynamics, patterns of biodiversity, and monitoring of endangered species. In conjunction with the benefit to scientists, camera traps present an unprecedented opportunity to involve the public in scientific research via image classifications. However, this engagement strategy comes with a myriad of complications. Volunteers vary in their familiarity with wildlife; thus, the accuracy of user-derived classifications may be biased by the commonness or popularity of species and user experience. From an extensive multi-site camera trap study across Michigan, U.S.A., we compiled and classified images through a public science platform called Michigan ZoomIN. We aggregated responses from 15 independent users per image using multiple consensus methods to assess accuracy by comparing to species identification completed by wildlife experts. We also evaluated how different factors including consensus algorithms, study area, wildlife species, user support, and camera type influenced the accuracy of user-derived classifications. Overall accuracy of user-derived classification was 97%, although several canid (e.g., Canis lupus, Vulpes vulpes) and mustelid (e.g., Neovison vison) species were repeatedly difficult for users to identify and had lower accuracy. When validating user-derived classification, we found that study area, consensus method, and user support best explained accuracy. To overcome hesitancy associated with data collected by untrained participants, we demonstrated their value by showing that the accuracy of volunteers was comparable to experts when classifying North American mammals. Our hierarchical workflow that integrated multiple consensus methods led to more image classifications without extensive training, even when the expertise of the volunteer was unknown. Ultimately, adopting such an approach can harness broader participation, expedite future camera trap data synthesis, and improve allocation of resources by scholars to enhance performance of public participants and increase accuracy of user-derived data. © 2021 The Wildlife Society.
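The abstract does not spell out the consensus algorithms, but the core idea of aggregating 15 independent labels per image can be illustrated with a plurality vote. The following is a minimal sketch; the function name, agreement threshold, and species labels are illustrative assumptions, not taken from the Michigan ZoomIN pipeline.

```python
from collections import Counter

def plurality_consensus(labels, min_agreement=0.5):
    """Aggregate independent volunteer labels for one image.

    labels: species labels from independent users
            (e.g., 15 classifications per image).
    Returns (winning_label, support) where support is the fraction of
    votes for the winner, or (None, support) if no label reaches the
    agreement threshold.
    """
    if not labels:
        return None, 0.0
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    support = votes / len(labels)
    return (label, support) if support >= min_agreement else (None, support)

# Example: 15 hypothetical volunteer classifications of one image
votes = ["coyote"] * 11 + ["gray wolf"] * 3 + ["red fox"]
print(plurality_consensus(votes))  # ('coyote', 0.733...)
```

In a hierarchical workflow of the kind described, images whose support falls below the threshold could be escalated to a stricter consensus rule or to expert review rather than discarded.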
Robust and simplified machine learning identification of pitfall trap-collected ground beetles at the continental scale
Abstract Insect populations are changing rapidly, and monitoring these changes is essential for understanding the causes and consequences of such shifts. However, large-scale insect identification projects are time-consuming and expensive when done solely by human identifiers. Machine learning offers a possible solution to help collect insect data quickly and efficiently. Here, we outline a methodology for training classification models to identify pitfall trap-collected insects from image data and then apply the method to identify ground beetles (Carabidae). All beetles were collected by the National Ecological Observatory Network (NEON), a continental-scale ecological monitoring project with sites across the United States. We describe the procedures for image collection, image data extraction, data preparation, and model training, and compare the performance of five machine learning algorithms and two classification methods (hierarchical vs. single-level) identifying ground beetles from the species to subfamily level. All models were trained using pre-extracted feature vectors, not raw image data. Our methodology allows for data to be extracted from multiple individuals within the same image, thus enhancing time efficiency; utilizes relatively simple models that allow for direct assessment of model performance; and can be performed on relatively small datasets. The best performing algorithm, linear discriminant analysis (LDA), reached an accuracy of 84.6% at the species level when naively identifying species, which was further increased to >95% when classifications were limited by known local species pools. Model performance was negatively correlated with taxonomic specificity, with the LDA model reaching an accuracy of ~99% at the subfamily level. When classifying carabid species not included in the training dataset at higher taxonomic levels, the models performed significantly better than if classifications were made randomly. We also observed greater performance when classifications were made using the hierarchical classification method compared to the single-level classification method at higher taxonomic levels. The general methodology outlined here serves as a proof-of-concept for classifying pitfall trap-collected organisms using machine learning algorithms, and the image data extraction methodology may be used for non-machine-learning purposes. We propose that integration of machine learning in large-scale identification pipelines will increase efficiency and lead to a greater flow of insect macroecological data, with the potential to be expanded for use with other non-insect taxa.
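As a rough illustration of the two prediction modes described above (naive identification vs. restriction to a known local species pool), here is a minimal sketch using scikit-learn's LDA on pre-extracted feature vectors. The data are randomly generated stand-ins, and the masking step is an assumption about how pool restriction could be implemented, not the authors' exact code.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical pre-extracted feature vectors (n_samples x n_features)
# and integer-encoded species labels; stand-ins for the NEON beetle data.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(300, 20)), rng.integers(0, 6, 300)
X_test = rng.normal(size=(10, 20))

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

# Naive prediction: every species the model knows is a candidate.
naive_pred = lda.predict(X_test)

# Pool-restricted prediction: zero out the posterior probability of
# species absent from the local pool, then take the argmax.
local_pool = {0, 2, 3}  # species known to occur at this site (illustrative)
proba = lda.predict_proba(X_test)
mask = np.array([c in local_pool for c in lda.classes_])
restricted_pred = lda.classes_[np.where(mask, proba, 0.0).argmax(axis=1)]
```

Restricting candidates this way can only reuse probabilities the fitted model already produces, which matches the paper's observation that pool restriction raises accuracy without retraining.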
- Award ID(s):
- 1702426
- PAR ID:
- 10455027
- Publisher / Repository:
- Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name:
- Ecology and Evolution
- Volume:
- 10
- Issue:
- 23
- ISSN:
- 2045-7758
- Page Range / eLocation ID:
- p. 13143-13153
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Abstract The biodiversity crisis necessitates spatially extensive methods to monitor multiple taxonomic groups for evidence of change in response to evolving environmental conditions. Programs that combine passive acoustic monitoring and machine learning are increasingly used to meet this need. These methods require large, annotated datasets, which are time-consuming and expensive to produce, creating potential barriers to adoption in data- and funding-poor regions. Recently released pre-trained avian acoustic classification models provide opportunities to reduce the need for manual labelling and accelerate the development of new acoustic classification algorithms through transfer learning. Transfer learning is a strategy for developing algorithms under data scarcity that uses pre-trained models from related tasks to adapt to new tasks. Our primary objective was to develop a transfer learning strategy using the feature embeddings of a pre-trained avian classification model to train custom acoustic classification models in data-scarce contexts. We used three annotated avian acoustic datasets to test whether transfer learning and soundscape simulation-based data augmentation could substantially reduce the annotated training data necessary to develop performant custom acoustic classifiers. We also conducted a sensitivity analysis for hyperparameter choice and model architecture. We then assessed the generalizability of our strategy to increasingly novel non-avian classification tasks. With as few as two training examples per class, our soundscape simulation data augmentation approach consistently yielded new classifiers with improved performance relative to the pre-trained classification model and transfer learning classifiers trained with other augmentation approaches. Performance increases were evident for three avian test datasets, including single-class and multi-label contexts. We observed that the relative performance among our data augmentation approaches varied for the avian datasets and nearly converged for one dataset when we included more training examples. We demonstrate an efficient approach to developing new acoustic classifiers leveraging open-source sound repositories and pre-trained networks to reduce manual labelling. With very few examples, our soundscape simulation approach to data augmentation yielded classifiers with performance equivalent to those trained with many more examples, showing it is possible to reduce manual labelling while still achieving high-performance classifiers and, in turn, expanding the potential for passive acoustic monitoring to address rising biodiversity monitoring needs.
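To make the transfer-learning strategy concrete, the sketch below trains a small classification head on embeddings from a frozen pre-trained model. The `embed` function is a stand-in for a real embedding model (for instance BirdNET or Perch); the filenames, labels, embedding size, and choice of logistic regression as the head are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(clips):
    """Stand-in for a frozen pre-trained model's embedding layer.

    In practice this would run each audio clip through a pre-trained
    avian classifier (e.g., BirdNET or Perch) and return one
    fixed-length feature vector per clip.
    """
    rng = np.random.default_rng(42)
    return rng.normal(size=(len(clips), 1280))

# Hypothetical few-shot training set: two labelled clips per class.
# Soundscape-simulation augmentation (mixing focal calls into varied
# background noise) would add rows here; it is omitted for brevity.
train_clips = ["sp_a_1.wav", "sp_a_2.wav", "sp_b_1.wav", "sp_b_2.wav"]
train_labels = ["species_a", "species_a", "species_b", "species_b"]

head = LogisticRegression(max_iter=1000)
head.fit(embed(train_clips), train_labels)       # train only the head
pred = head.predict(embed(["new_clip.wav"]))     # backbone stays frozen
```

Because only the lightweight head is trained, this approach needs far fewer labelled examples than training an acoustic model from scratch, which is the crux of the data-scarce use case described above.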
Kajtoch, Łukasz (Ed.) This study presents an initial model for bark beetle identification, serving as a foundational step toward developing a fully functional and practical identification tool. Bark beetles are known for extensive damage to forests globally, as well as for uniform and homoplastic morphology, which poses identification challenges. We use a MaxViT-based deep learning backbone, which applies local and global attention, to classify bark beetles down to the genus level from images containing multiple beetles. The methodology involves a process of image collection, preparation, and model training, leveraging pre-classified beetle species to ensure accuracy and reliability. The model's F1 score estimates of 0.99 and 1.0 indicate a strong ability to accurately classify genera in the collected data, including those previously unknown to the model. This makes it a valuable first step towards building a tool for applications in forest management and ecological research. While the current model distinguishes among 12 genera, further refinement and additional data will be necessary to achieve reliable species-level identification, which is particularly important for detecting new invasive species. Despite the controlled conditions of image collection and potential challenges in real-world application, this study provides the first model capable of identifying bark beetle genera, and by far the largest training set of images for any comparable insect group. We also designed a function that reports if a species appears to be unknown. Further research is suggested to enhance the model's generalization capabilities and scalability, emphasizing the integration of advanced machine learning techniques for improved species classification and the detection of invasive or undescribed species.
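The "appears to be unknown" report suggests some form of open-set recognition. One simple, common heuristic is to threshold the maximum softmax probability of the classifier head; the sketch below illustrates that idea. The genus list, threshold value, and fabricated logits are illustrative, and this is not necessarily the authors' actual method.

```python
import torch
import torch.nn.functional as F

GENERA = ["Ips", "Dendroctonus", "Scolytus"]  # illustrative subset of the 12

def classify_with_unknown(logits: torch.Tensor, threshold: float = 0.8):
    """Report a genus only when the model is confident enough.

    logits: raw class scores from the backbone's classifier head
            (e.g., a MaxViT), shape (batch, n_genera).
    Returns genus names, with 'unknown' wherever the maximum softmax
    probability falls below the threshold -- a simple open-set
    heuristic, not necessarily the paper's exact implementation.
    """
    probs = F.softmax(logits, dim=1)
    conf, idx = probs.max(dim=1)
    return [GENERA[i] if c >= threshold else "unknown"
            for c, i in zip(conf.tolist(), idx.tolist())]

# Example with fabricated logits for two detections:
logits = torch.tensor([[4.0, 0.5, 0.2],   # confident -> 'Ips'
                       [1.0, 0.9, 0.8]])  # ambiguous -> 'unknown'
print(classify_with_unknown(logits))
```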
This data set contains the individual classifications that the Gravity Spy citizen science volunteers made for glitches through 20 July 2024. Classifications made by science team members or in testing workflows have been removed, as have classifications of glitches lacking a Gravity Spy identifier. See Zevin et al. (2017) for an explanation of the citizen science task and classification interface. Data about glitches with machine-learning labels are provided in an earlier data release (Glanzer et al., 2021). Final classifications combining ML and volunteer classifications are provided in Zevin et al. (2022).

22 of the classification labels match the labels used in the earlier data release, namely 1080Lines, 1400Ripples, Air_Compressor, Blip, Chirp, Extremely_Loud, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line and Whistle. One glitch class that was added to the machine-learning classification has not been added to the Zooniverse project and so does not appear in this file, namely Blip_Low_Frequency. Four classes were added to the citizen science platform but not to the machine learning model and so have only volunteer labels, namely 70HZLINE, HIGHFREQUENCYBURST, LOWFREQUENCYBLIP and PIZZICATO. The glitch class Fast_Scattering added to the machine-learning classification has an equivalent volunteer label CROWN, which is used here (Soni et al. 2021).

Glitches are presented to volunteers in a succession of workflows. Workflows include glitches classified by a machine learning classifier as being likely to be in a subset of classes and offer the option to classify only those classes plus None_of_the_Above. Each level includes the classes available in lower levels. The top level does not add new classification options but includes all glitches, including those for which the machine learning model is uncertain of the class. As the classes available to the volunteers change depending on the workflow, a glitch might be classified as None_of_the_Above in a lower workflow and subsequently as a different class in a higher workflow. Workflows and available classes are shown in the table below.

| Workflow ID | Name | Number of glitch classes | Glitches added |
|---|---|---|---|
| 1610 | Level 1 | 3 | Blip, Whistle, None_of_the_Above |
| 1934 | Level 2 | 6 | Koi_Fish, Power_Line, Violin_Mode |
| 1935 | Level 3 | 10 | Chirp, Low_Frequency_Burst, No_Glitch, Scattered_Light |
| 2360 | Original level 4 | 22 | 1080Lines, 1400Ripples, Air_Compressor, Extremely_Loud, Helix, Light_Modulation, Low_Frequency_Lines, Paired_Doves, Repeating_Blips, Scratchy, Tomte, Wandering_Line |
| 7765 | New level 4 | 15 | 1080Lines, Extremely_Loud, Low_Frequency_Lines, Repeating_Blips, Scratchy |
| 2117 | Original level 5 | 22 | No new glitch classes |
| 7766 | New level 5 | 27 | 1400Ripples, Air_Compressor, Paired_Doves, Tomte, Wandering_Line, 70HZLINE, CROWN, HIGHFREQUENCYBURST, LOWFREQUENCYBLIP, PIZZICATO |
| 7767 | Level 6 | 27 | No new glitch classes |

Description of data fields

- Classification_id: a unique identifier for the classification. A volunteer may choose multiple classes for a glitch when classifying, in which case there will be multiple rows with the same classification_id.
- Subject_id: a unique identifier for the glitch being classified. This field can be used to join the classification to data about the glitch from the prior data release.
- User_hash: an anonymized identifier for the user making the classification or, for anonymous users, an identifier that can be used to track the user within a session but which may not persist across sessions.
- Anonymous_user: True if the classification was made by a non-logged-in user.
- Workflow: the Gravity Spy workflow in which the classification was made.
- Workflow_version: the version of the workflow.
- Timestamp: timestamp for the classification.
- Classification: glitch class selected by the volunteer.

Related datasets

- For machine learning classifications on all glitches in O1, O2, O3a, and O3b, please see Gravity Spy Machine Learning Classifications on Zenodo.
- For classifications of glitches combining machine learning and volunteer classifications, please see Gravity Spy Volunteer Classifications of LIGO Glitches from Observing Runs O1, O2, O3a, and O3b.
- For the training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo.
- For detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo.
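Assuming the release is distributed as a flat table (the filename below is hypothetical, and the column names follow the field list above), a typical first step is to count volunteer votes per glitch and pick the most-voted label, e.g. with pandas:

```python
import pandas as pd

# Hypothetical filename; columns follow the data-field list above.
df = pd.read_csv("gravity_spy_volunteer_classifications.csv")

# A volunteer may select several classes within one classification,
# producing multiple rows with the same classification_id, so
# deduplicate before counting each class at most once per classification.
votes = (
    df.drop_duplicates(["classification_id", "classification"])
      .groupby(["subject_id", "classification"])
      .size()
      .rename("n_votes")
      .reset_index()
)

# Most-voted volunteer label per glitch; subject_id can then be used
# to join against the earlier machine-learning data release.
top_label = votes.loc[votes.groupby("subject_id")["n_votes"].idxmax()]
```

Note that, per the workflow description above, a glitch may legitimately carry different labels from different workflow levels, so analyses may also want to group by the Workflow field.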
Abstract The Gravity Spy project aims to uncover the origins of glitches, transient bursts of noise that hamper analysis of gravitational-wave data. By using both the work of citizen-science volunteers and machine learning algorithms, the Gravity Spy project enables reliable classification of glitches. Citizen science and machine learning are intrinsically coupled within the Gravity Spy framework, with machine learning classifications providing a rapid first-pass classification of the dataset and enabling tiered volunteer training, and volunteer-based classifications verifying the machine classifications, bolstering the machine learning training set and identifying new morphological classes of glitches. These classifications are now routinely used in studies characterizing the performance of the LIGO gravitational-wave detectors. Providing the volunteers with a training framework that teaches them to classify a wide range of glitches, as well as additional tools to aid their investigations of interesting glitches, empowers them to make discoveries of new classes of glitches. This demonstrates that, when given suitable support, volunteers can go beyond simple classification tasks to identify new features in data at a level comparable to domain experts. The Gravity Spy project is now providing volunteers with more complicated data that includes auxiliary monitors of the detector to identify the root cause of glitches.