

Title: Camtrap DP: an open standard for the FAIR exchange and archiving of camera trap data
Abstract

Camera trapping has revolutionized wildlife ecology and conservation by providing automated data acquisition, leading to the accumulation of massive amounts of camera trap data worldwide. Although management and processing of camera trap‐derived Big Data are becoming increasingly solvable with the help of scalable cyber‐infrastructures, harmonization and exchange of the data remain limited, hindering its full potential. There is currently no widely accepted standard for exchanging camera trap data. The only existing proposal, “Camera Trap Metadata Standard” (CTMS), has several technical shortcomings and limited adoption. We present a new data exchange format, the Camera Trap Data Package (Camtrap DP), designed to allow users to easily exchange, harmonize and archive camera trap data at local to global scales. Camtrap DP structures camera trap data in a simple yet flexible data model consisting of three tables (Deployments, Media and Observations) that supports a wide range of camera deployment designs, classification techniques (e.g., human and AI, media‐based and event‐based) and analytical use cases, from compiling species occurrence data through distribution, occupancy and activity modeling to density estimation. The format further achieves interoperability by building upon existing standards, Frictionless Data Package in particular, which is supported by a suite of open software tools to read and validate data. Camtrap DP is the consensus of a long, in‐depth consultation and outreach process with standard and software developers, the main existing camera trap data management platforms, major players in the field of camera trapping and the Global Biodiversity Information Facility (GBIF). Under the umbrella of the Biodiversity Information Standards (TDWG), Camtrap DP has been developed openly, collaboratively and with version control from the start. We encourage camera trapping users and developers to join the discussion and contribute to the further development and adoption of this standard.
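As a rough illustration of what the Frictionless foundation buys in practice, the sketch below reads and validates a hypothetical Camtrap DP dataset with the frictionless Python library. The file path is a placeholder, and the resource and column names follow the three-table model described above (deployments, media, observations); consult the published Camtrap DP specification for the authoritative schema.

# Minimal sketch, assuming a local Camtrap DP package at a placeholder path.
from frictionless import Package, validate

# Validate the package against the Table Schemas declared in datapackage.json.
report = validate("camtrap-dp-example/datapackage.json")  # placeholder path
print("valid:", report.valid)

# Load the package and iterate over observation records.
package = Package("camtrap-dp-example/datapackage.json")
observations = package.get_resource("observations")
for row in observations.read_rows():
    # Each row is keyed by the column names defined in the schema,
    # e.g. scientificName and count in the observations table.
    print(row["scientificName"], row["count"])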

 
NSF-PAR ID:
10478681
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Remote Sensing in Ecology and Conservation
ISSN:
2056-3485
Sponsoring Org:
National Science Foundation
More Like this
  1. The Global Biodiversity Information Facility (GBIF 2022a) has indexed more than 2 billion occurrence records from 70,147 datasets. These datasets often include "hidden" biotic interaction data because biodiversity communities use the Darwin Core standard (DwC, Wieczorek et al. 2012) in different ways to document biotic interactions. In this study, we extracted biotic interactions from GBIF data using an approach similar to that employed in Global Biotic Interactions (GloBI; Poelen et al. 2014) and summarized the results. Here we aim to present an estimation of the interaction data available in GBIF, showing that biotic interaction claims can be automatically found and extracted from GBIF. Our results suggest that much can be gained by an increased focus on the development of tools that help to index and curate biotic interaction data in existing datasets. Combined with data standardization and best practices for sharing biotic interactions, such as the initiative on plant-pollinator interactions (Salim 2022), this approach can rapidly contribute to and meet open data principles (Wilkinson 2016). We used Preston (Elliott et al. 2020), open-source software that versions biodiversity datasets, to copy all GBIF-indexed datasets. The biodiversity data graph version (Poelen 2020) of the GBIF-indexed datasets used during this study contains 58,504 datasets in Darwin Core Archive (DwC-A) format, totaling 574,715,196 records. After retrieval and verification, the datasets were processed using Elton. Elton extracts biotic interaction data and supports 20+ existing file formats, including various types of data elements in DwC records. Elton also helps align interaction claims (e.g., host of, parasite of, associated with) to the Relations Ontology (RO, Mungall 2022), making it easier to discover datasets across a heterogeneous collection of datasets. Using a specific mapping from interaction claims found in the DwC records to terms in RO, Elton found 30,167,984 potential records (with non-empty values for the scanned DwC terms) and 15,248,478 records with recognized interaction types. Taxonomic name validation was performed using Nomer, which maps input names to names found in a variety of taxonomic catalogs. We only considered an interaction record valid where the interaction type could be mapped to a term in RO and where Nomer found a valid name for the source and target taxa. Based on the workflow described in Fig. 1, we found 7,947,822 interaction records (52% of the potential interactions). Most of them were generic interactions (interacts_with, 87.5%), but the remaining 12.5% (993,477 records) included host-parasite and plant-animal interactions. The majority of the interaction records found involved plants (78%), animals (14%) and fungi (6%). In conclusion, there are many biotic interactions embedded in existing datasets registered in large biodiversity data indexers and aggregators like iDigBio, GBIF, and BioCASE. We exposed these biotic interaction claims using the combined functionality of the biodiversity data tools Elton (for interaction data extraction), Preston (for reliable dataset tracking) and Nomer (for taxonomic name alignment). Nonetheless, the development of new vocabularies, standards and best practice guides would facilitate the aggregation of interaction data, including the diversification of the GBIF data model (GBIF 2022b) for sharing biodiversity data beyond occurrence data. That is the aim of the TDWG Interest Group on Biological Interactions Data (TDWG 2022).
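    The actual extraction was performed with Elton; purely as an illustration of the idea, the Python sketch below scans Darwin Core occurrence rows for interaction-bearing terms and maps the verbs to Relations Ontology style identifiers. The DwC columns scanned, the verb-to-RO mapping and the file name are simplified placeholders, not Elton's real mapping tables.

# Illustrative sketch only; Elton (Java) does the real extraction and mapping.
import csv

# Hypothetical verb -> RO term mapping (identifiers are placeholders).
RO_MAPPING = {
    "host of": "RO:<host-of>",
    "parasite of": "RO:<parasite-of>",
    "associated with": "RO:<interacts-with>",
}

# Darwin Core terms that commonly carry interaction claims (simplified list).
DWC_INTERACTION_TERMS = ("associatedTaxa", "associatedOccurrences")

def extract_interactions(occurrence_file):
    """Yield (source name, RO term, target name) claims from a DwC occurrence table."""
    with open(occurrence_file, newline="", encoding="utf-8") as f:
        for record in csv.DictReader(f, delimiter="\t"):
            source = (record.get("scientificName") or "").strip()
            for term in DWC_INTERACTION_TERMS:
                value = record.get(term) or ""
                # Values often look like "host of: Quercus robur".
                if ":" in value:
                    verb, target = value.split(":", 1)
                    ro_term = RO_MAPPING.get(verb.strip().lower())
                    if ro_term and source and target.strip():
                        yield source, ro_term, target.strip()

for claim in extract_interactions("occurrence.txt"):  # placeholder file name
    print(claim)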
  2. Obeid, I.; Selesnick, I. (Eds.)
    The Neural Engineering Data Consortium at Temple University has been providing key data resources to support the development of deep learning technology for electroencephalography (EEG) applications [1-4] since 2012. We currently have over 1,700 subscribers to our resources and have been providing data, software and documentation from our web site [5] since 2012. In this poster, we introduce additions to our resources that have been developed within the past year to facilitate software development and big data machine learning research. Major resources released in 2019 include:
    ● Data: The most current release of our open source EEG data is v1.2.0 of TUH EEG and includes the addition of 3,874 sessions and 1,960 patients from mid-2015 through 2016.
    ● Software: We have recently released a package, PyStream, that demonstrates how to correctly read an EDF file and access samples of the signal. This software demonstrates how to properly decode channels based on their labels and how to implement montages. Most existing open source packages to read EDF files do not directly address the problem of channel labels [6].
    ● Documentation: We have released two documents that describe our file formats and data representations: (1) electrodes and channels [6]: describes how to map channel labels to physical locations of the electrodes, and includes a description of every channel label appearing in the corpus; (2) annotation standards [7]: describes our annotation file format and how to decode the data structures used to represent the annotations.
    Additional significant updates to our resources include:
    ● NEDC TUH EEG Seizure (v1.6.0): This release includes the expansion of the training dataset from 4,597 files to 4,702. Calibration sequences have been manually annotated and added to our existing documentation. Numerous corrections were made to existing annotations based on user feedback.
    ● IBM TUSZ Pre-Processed Data (v1.0.0): A preprocessed version of the TUH Seizure Detection Corpus using two methods [8], both of which use an FFT sliding window approach (STFT). In the first method, FFT log magnitudes are used. In the second method, the FFT values are normalized across frequency buckets and correlation coefficients are calculated. The eigenvalues are calculated from this correlation matrix. The eigenvalues and the upper triangle of the correlation matrix are used to generate features.
    ● NEDC TUH EEG Artifact Corpus (v1.0.0): This corpus was developed to support modeling of non-seizure signals for problems such as seizure detection. We have been using the data to build better background models. Five artifact events have been labeled: (1) eye movements (EYEM), (2) chewing (CHEW), (3) shivering (SHIV), (4) electrode pop, electrostatic artifacts, and lead artifacts (ELPP), and (5) muscle artifacts (MUSC). The data is cross-referenced to TUH EEG v1.1.0 so you can match patient numbers, sessions, etc.
    ● NEDC Eval EEG (v1.3.0): In this release of our standardized scoring software, the False Positive Rate (FPR) definition of the Time-Aligned Event Scoring (TAES) metric has been updated [9]. The standard definition is the number of false positives divided by the number of false positives plus the number of true negatives: #FP / (#FP + #TN).
    We also recently introduced the ability to download our data from an anonymous rsync server. The rsync command [10] effectively synchronizes both a remote directory and a local directory and copies the selected folder from the server to the desktop.
    It is available as part of most, if not all, Linux and Mac distributions (unfortunately, there is not an acceptable port of this command for Windows). To use the rsync command to download the content from our website, both a username and password are needed. An automated registration process on our website grants both. An example of a typical rsync command to access our data on our website is:
    rsync -auxv nedc_tuh_eeg@www.isip.piconepress.com:~/data/tuh_eeg/
    Rsync is a more robust option for downloading data. We have also experimented with Google Drive and Dropbox, but these types of technology are not suitable for such large amounts of data. All of the resources described in this poster are open source and freely available at https://www.isip.piconepress.com/projects/tuh_eeg/downloads/. We will demonstrate how to access and utilize these resources during the poster presentation and collect community feedback on the most needed additions to enable significant advances in machine learning performance.
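    A rough numpy sketch of the second preprocessing method described above (sliding-window FFT, normalization across frequency buckets, then eigenvalues and the upper triangle of the channel correlation matrix as features) is given below. The window and step sizes, and the exact normalization, are assumptions for illustration rather than the parameters used in the released corpus.

# Illustrative sketch of the correlation/eigenvalue feature method; parameters are assumed.
import numpy as np

def stft_correlation_features(eeg, win=256, step=128):
    """eeg: (n_channels, n_samples) array -> list of per-window feature vectors."""
    n_channels, n_samples = eeg.shape
    features = []
    for start in range(0, n_samples - win + 1, step):
        frame = eeg[:, start:start + win]
        mags = np.abs(np.fft.rfft(frame, axis=1))        # FFT magnitudes per channel
        mags /= mags.sum(axis=1, keepdims=True) + 1e-12   # normalize across frequency buckets
        corr = np.corrcoef(mags)                          # channel-by-channel correlation matrix
        eigvals = np.linalg.eigvalsh(corr)                # eigenvalues of the correlation matrix
        upper = corr[np.triu_indices(n_channels, k=1)]    # upper triangle (off-diagonal entries)
        features.append(np.concatenate([eigvals, upper]))
    return features

# Example with random data standing in for an EDF recording (22 channels, 10 s at 256 Hz).
demo = stft_correlation_features(np.random.randn(22, 2560))
print(len(demo), demo[0].shape)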
  3. CitSci.org is a global citizen science software platform and support organization housed at Colorado State University. The mission of CitSci is to help people do high quality citizen science by amplifying impacts and outcomes. This platform hosts over one thousand projects and a diverse volunteer base that has amassed over one million observations of the natural world, focused on biodiversity and ecosystem sustainability. It is a custom platform built using open source components including PostgreSQL, Symfony and Vue.js, with React Native for the mobile apps. CitSci sets itself apart from other citizen science platforms through the flexibility in the types of projects it supports rather than having a singular focus. This flexibility allows projects to define their own datasheets and methodologies. The diversity of programs we host motivated us to take a founding role in the design of the PPSR Core, a set of global, transdisciplinary data and metadata standards for use in Public Participation in Scientific Research (Citizen Science) projects. Through an international partnership between the Citizen Science Association, European Citizen Science Association, and Australian Citizen Science Association, the PPSR team and associated standards enable interoperability of citizen science projects, datasets, and observations. Here we share our experience over the past 10+ years of supporting biodiversity research both as developers of the CitSci.org platform and as stewards of, and contributors to, the PPSR Core standard. Specifically, we share details about: the origin, development, and informatics infrastructure for CitSci; our support for biodiversity projects such as population and community surveys; our experiences in platform interoperability through PPSR Core; working with the Zooniverse, SciStarter, and CyberTracker; data quality; and data sharing goals and use cases. We conclude by sharing overall successes, limitations, and recommendations as they pertain to trust and rigor in citizen science data sharing and interoperability. As the scientific community moves forward, we show that citizen science is a key tool for enabling a systems-based approach to ecosystem problems.
  4. Abstract

    As camera trapping has become a standard practice in wildlife ecology, developing techniques to extract additional information from images will increase the utility of generated data. Despite rapid advancements in camera trapping practices, methods for estimating animal size or distance from the camera using captured images have not been standardized. Deriving animal sizes directly from images creates opportunities to collect wildlife metrics such as growth rates or changes in body condition. Distances to animals may be used to quantify important aspects of sampling design such as the effective area sampled or distribution of animals in the camera's field‐of‐view.

    We present a method of using pixel measurements in an image to estimate animal size or distance from the camera using a conceptual model in photogrammetry known as the ‘pinhole camera model’. We evaluated the performance of this approach both using stationary three‐dimensional animal targets and in a field setting using live captive reindeer (Rangifer tarandus) ranging in size and distance from the camera.

    We found that the total mean relative error of estimated animal sizes and distances from the camera was −3.0% and 3.3%, respectively, in our simulation, and −8.6% and 10.5% in our field setting. In our simulation, the mean relative error of size or distance estimates did not differ statistically between image settings within camera models, between camera models, or between the measured dimensions used in the calculations.

    We provide recommendations for applying the pinhole camera model in a wildlife camera trapping context. Our approach of using the pinhole camera model to estimate animal size or distance from the camera produced robust estimates using a single image while remaining easy to implement and generalizable to different camera trap models and installations, thus enhancing its utility for a variety of camera trap applications and expanding opportunities to use camera trap images in novel ways.
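    To make the underlying relationship concrete, the sketch below applies the pinhole camera proportionality (apparent size in pixels scales with true size and inversely with distance) in Python. The focal length in pixels and the example numbers are illustrative assumptions, not values from the study.

# Minimal sketch of the pinhole camera relationship; all numbers are illustrative.

def estimate_size(pixels, distance_m, focal_length_px):
    """Estimate true size (m) from apparent size in pixels and a known distance."""
    return pixels * distance_m / focal_length_px

def estimate_distance(pixels, true_size_m, focal_length_px):
    """Estimate distance (m) from apparent size in pixels and a known true size."""
    return true_size_m * focal_length_px / pixels

# Example: an animal with a known shoulder height of 1.2 m spanning 300 px,
# under an assumed focal length of 2,500 px, is roughly 10 m from the camera.
print(estimate_distance(pixels=300, true_size_m=1.2, focal_length_px=2500))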

     
  5. Abstract

    Accurate estimations of animal populations are necessary for management, conservation, and policy decisions. However, methods for surveying animal communities disproportionately represent specific groups or guilds. For example, transect surveys can provide robust data for large arboreal species but underestimate cryptic or small‐bodied terrestrial species, whereas camera traps have the inverse tendency. The integration of information from multiple methodologies would provide the most complete inference on population size or responses to putative covariates, yet a simple, robust framework that allows integration and comparison of multiple data sources has been lacking. We use 27,813 counts of 35 species or species groups derived from concurrent visual transects, dung transects, and camera trap surveys in tropical forests and analyze them within a generalized joint attribute modeling framework (GJAM) that both compares and integrates the field‐collected dung, visual, and camera trap data to quantify the species‐ and trait‐specific differences in detection for each method. The effectiveness of each survey method was strongly dependent on species, as well as on animal traits. These differences in effectiveness contributed to meaningful differences in the reported strength of a known important covariate for animal communities (distance to nearest village). Data fusion through GJAM allows clear and unambiguous comparisons of the counts provided by each methodology, the incorporation of trait information, and fusion of all three data streams to generate a more complete estimate of the effects of an anthropogenic disturbance covariate. Research and conservation resources are extremely limited, which often means that field campaigns attempt to maximize the amount of information gathered, especially in remote, hard‐to‐access areas. Advances in these understudied areas will be accelerated by analytical methods that can fully leverage the total body of diverse biodiversity field data, even when the data are collected using different methods. We demonstrate that survey methods vary in their effectiveness for counting species based on biological traits, but, more importantly, that generative models like GJAM can integrate data from multiple sources in one cohesive statistical framework to make improved inference in understudied environments.
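    GJAM itself is implemented as an R package; purely to illustrate the data fusion setup described above, the sketch below arranges counts from the three concurrent survey methods, a shared covariate and species traits into one long-format table of the kind a joint model would consume. All column names and values are invented for illustration.

# Illustrative data-arrangement sketch only; this is not GJAM and fits no model.
import pandas as pd

counts = pd.DataFrame({
    "site":            ["A", "A", "A", "B", "B", "B"],
    "species":         ["duiker", "duiker", "duiker", "colobus", "colobus", "colobus"],
    "method":          ["visual", "dung", "camera", "visual", "dung", "camera"],
    "count":           [2, 14, 31, 9, 0, 3],
    "dist_to_village": [1.8, 1.8, 1.8, 6.4, 6.4, 6.4],  # km, shared covariate
})

traits = pd.DataFrame({
    "species":   ["duiker", "colobus"],
    "body_mass": [18.0, 9.5],   # kg, made-up trait values
    "arboreal":  [False, True],
})

# Joint-model input: one row per site x species x method, with traits attached.
model_input = counts.merge(traits, on="species")
print(model_input)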

     