

Title: CitSci.org & PPSR Core: Sharing biodiversity observations across platforms
CitSci.org is a global citizen science software platform and support organization housed at Colorado State University. The mission of CitSci is to help people do high-quality citizen science by amplifying impacts and outcomes. The platform hosts over one thousand projects and a diverse volunteer base that has amassed over one million observations of the natural world, focused on biodiversity and ecosystem sustainability. It is a custom platform built from open-source components, including PostgreSQL, Symfony, and Vue.js, with React Native for the mobile apps. CitSci sets itself apart from other citizen science platforms through the flexibility of the project types it supports, rather than having a singular focus. This flexibility allows projects to define their own datasheets and methodologies. The diversity of programs we host motivated us to take a founding role in the design of PPSR Core, a set of global, transdisciplinary data and metadata standards for use in Public Participation in Scientific Research (citizen science) projects. Through an international partnership between the Citizen Science Association, European Citizen Science Association, and Australian Citizen Science Association, the PPSR team and associated standards enable interoperability of citizen science projects, datasets, and observations. Here we share our experience over the past 10+ years of supporting biodiversity research, both as developers of the CitSci.org platform and as stewards of, and contributors to, the PPSR Core standard. Specifically, we share details about: the origin, development, and informatics infrastructure for CitSci; our support for biodiversity projects such as population and community surveys; our experiences in platform interoperability through PPSR Core, working with the Zooniverse, SciStarter, and CyberTracker; data quality; and data sharing goals and use cases. We conclude by sharing overall successes, limitations, and recommendations as they pertain to trust and rigor in citizen science data sharing and interoperability. As the scientific community moves forward, we show that citizen science is a key tool for enabling a systems-based approach to ecosystem problems.
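To make the interoperability idea concrete, here is a minimal Python sketch of the general pattern a shared metadata standard enables: each platform maps its own project record onto a common schema once and exchanges it as JSON. The `SharedProjectRecord` fields, the native keys (`title`, `goal`, `url`), and the example values are hypothetical placeholders chosen for illustration; they are not the actual PPSR Core terms or any platform's real API.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical shared record: field names are illustrative placeholders,
# NOT actual PPSR Core term names.
@dataclass
class SharedProjectRecord:
    project_name: str
    project_aim: str
    project_url: str
    source_platform: str


def from_native(native: dict, platform: str) -> SharedProjectRecord:
    """Map one platform's project dict (hypothetical keys) onto the shared record."""
    return SharedProjectRecord(
        project_name=native["title"],
        project_aim=native.get("goal", ""),  # optional on the source side
        project_url=native["url"],
        source_platform=platform,
    )


def to_json(record: SharedProjectRecord) -> str:
    # Sorted keys give a stable serialization for exchange between platforms.
    return json.dumps(asdict(record), sort_keys=True)


def from_json(payload: str) -> SharedProjectRecord:
    # A receiving platform can rebuild the record without knowing the source schema.
    return SharedProjectRecord(**json.loads(payload))
```

The design point is that each platform writes one mapping (native to shared) rather than a separate mapping for every partner platform it exchanges data with.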
Award ID(s): 1835272
PAR ID: 10310933
Journal Name: Biodiversity Information Science and Standards
Volume: 5
ISSN: 2535-0897
Sponsoring Org: National Science Foundation
More Like this
  1. Involving the public in scientific discovery offers opportunities for engagement, learning, participation, and action. Since its launch in 2007, the CitSci.org platform has supported hundreds of community-driven citizen science projects involving thousands of participants who have generated close to a million scientific measurements around the world. Members using CitSci.org follow their curiosities and concerns to develop, lead, or simply participate in research projects. While professional scientists are trained to make ethical determinations related to the collection of, access to, and use of information, citizen scientists and practitioners may be less aware of such issues and more likely to become involved in ethical dilemmas. In this era of big and open data, where data sharing is encouraged and open science is promoted, privacy and openness considerations can often be overlooked. Platforms that support the collection, use, and sharing of data and personal information need to consider their responsibility to protect rights to and ownership of data and to provide protection options for data and members, while at the same time providing options for openness. This requires critically considering both the intended and unintended consequences of the use of platforms, data, and volunteer information. Here, we use our journey developing CitSci.org to argue that incorporating customization into platforms through flexible design options for project managers shifts decision-making from top-down to bottom-up and allows project design to be more responsive to goals. To protect both people and data, we developed, and continue to improve, options that support various levels of “open” and “closed” access permissions for data and membership participation. 
These options support diverse governance styles that are responsive to data uses, traditional and indigenous knowledge sensitivities, intellectual property rights, personally identifiable information concerns, volunteer preferences, and sensitive data protections. We present a typology for citizen science openness choices, their ethical considerations, and strategies that we are actively putting into practice to expand privacy options and governance models based on the unique needs of individual projects using our platform. 
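The open/closed permission options can be pictured as a small access-control table. The sketch below assumes a hypothetical four-level scale (open, obscured, members-only, closed) and a hypothetical rule that obscured records show the public coordinates rounded to one decimal place (roughly 11 km in latitude) with the observer's name withheld. Neither the level names nor the rounding rule is taken from CitSci.org; they only illustrate how per-project settings can drive what each viewer sees.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Access(Enum):
    # Hypothetical access levels, not CitSci.org's actual settings.
    OPEN = "open"              # anyone sees the full observation
    OBSCURED = "obscured"      # non-members see a generalized location only
    MEMBERS_ONLY = "members"   # only project members see the observation
    CLOSED = "closed"          # only project managers see it


@dataclass(frozen=True)
class Observation:
    species: str
    lat: float
    lon: float
    observer: str


def view_for(obs: Observation, level: Access, *, is_member: bool,
             is_manager: bool) -> Optional[Observation]:
    """Return the version of `obs` a viewer may see, or None if hidden."""
    if is_manager or level is Access.OPEN:
        return obs
    if level is Access.MEMBERS_ONLY:
        return obs if is_member else None
    if level is Access.OBSCURED:
        if is_member:
            return obs
        # Generalize the location and withhold personally identifying info.
        return replace(obs, lat=round(obs.lat, 1), lon=round(obs.lon, 1),
                       observer="(withheld)")
    return None  # CLOSED and the viewer is not a manager
```

Keeping the rule in one function makes it easy for each project to pick its own level without changing how observations are stored.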
  2. Many scientific domains gather sufficient labels to train machine learning algorithms through human-in-the-loop techniques provided by the Zooniverse citizen science platform. As the range of projects and task types grows and data rates increase, accelerating model training is of paramount concern in order to focus volunteer effort where it is most needed. The application of transfer learning (TL) between Zooniverse projects holds promise as a solution. However, understanding the effectiveness of TL approaches that pretrain on large-scale generic image sets versus images with similar characteristics, possibly from similar tasks, is an open challenge. We apply a generative segmentation model to two Zooniverse project data sets: (1) identifying fat droplets in liver cells (FatChecker; FC) and (2) identifying kelp beds in satellite images (Floating Forests; FF), through transfer learning from the first project. We compare its performance with a TL model based on the COCO image set, and subsequently with baseline counterparts. We find that both the FC and COCO TL models perform better than the baseline cases when using >75% of the original training sample size. The COCO-based TL model generally performs better than the FC-based one, likely due to its generalized features. Our investigations provide important insights into the use of TL approaches on multi-domain data hosted across different Zooniverse projects, enabling future projects to accelerate task completion. 
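Transfer learning, in its simplest parameter-transfer form, means starting training on the target task from weights learned on a source task instead of from scratch. The sketch below illustrates only that warm-start idea, with a toy logistic-regression model trained by plain SGD swapped in for the generative segmentation model the abstract describes. The function names, learning rate, and the `subsample` helper (which mimics experiments over fractions of the training set) are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

# (features, label) with label in {0, 1}
Example = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], x: Sequence[float]) -> float:
    # w[0] is the bias; w[1:] are feature weights.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def train(data: List[Example], init: List[float], epochs: int,
          lr: float = 0.1) -> List[float]:
    """SGD on logistic loss, starting from `init`.

    Passing zeros gives a from-scratch baseline; passing weights learned
    on a source task gives a warm-started (transferred) model.
    """
    w = list(init)  # copy so the pretrained weights are not modified
    for _ in range(epochs):
        for x, y in data:
            g = predict(w, x) - y  # gradient of log loss w.r.t. the logit
            w[0] -= lr * g
            for i, xi in enumerate(x):
                w[i + 1] -= lr * g * xi
    return w


def subsample(data: List[Example], fraction: float, seed: int = 0) -> List[Example]:
    # Keep a fraction of the training set (at least one example).
    k = max(1, int(len(data) * fraction))
    return random.Random(seed).sample(data, k)


def accuracy(w: List[float], data: List[Example]) -> float:
    correct = sum(1 for x, y in data if (predict(w, x) >= 0.5) == (y == 1))
    return correct / len(data)
```

Comparing `accuracy` for `train(target, pretrained, ...)` against `train(target, [0.0] * n, ...)` across several `subsample` fractions mirrors, in miniature, the TL-versus-baseline comparison described above.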
  3. This paper introduces the citizen science platform LanguageARC, developed within the NIEUW (Novel Incentives and Workflows) project supported by the National Science Foundation under Grant No. 1730377. LanguageARC is a community-oriented online platform bringing together researchers and “citizen linguists” with the shared goal of contributing to linguistic research and language technology development. Like other citizen science platforms and projects, LanguageARC harnesses the power and efforts of volunteers who are motivated by the incentives of contributing to science, learning and discovery, and belonging to a community dedicated to social improvement. Citizen linguists contribute language data and judgments by participating in research tasks such as classifying regional accents from audio clips, recording audio of picture descriptions, and answering personality questionnaires to create baseline data for NLP research into autism and neurodegenerative conditions. Researchers can create projects on LanguageARC using our Project Builder Toolkit, without any coding or HTML required. 
  4. Haldorai, Anandakumar (Ed.)
    Darwin Core, the data standard used for sharing modern biodiversity and paleodiversity occurrence records, has previously lacked proper mechanisms for reporting what is known about the estimated age range of specimens from deep time. This has led data providers to put these data in fields where they cannot easily be found by users, which impedes the reuse and improvement of these data by other researchers. Here we describe the development of the Chronometric Age Extension to Darwin Core, a ratified, community-developed extension that enables the reporting of ages of specimens from deeper time and the evidence supporting these estimates. The extension standardizes reporting about the methods or assays used to determine an age and other critical information such as uncertainty. It gives data providers flexibility about the level of detail reported, focusing on the minimum information needed for reuse while still allowing for significant detail if providers have it. Providing a standardized format for reporting these data will make them easier to find and search, and will enable researchers to pinpoint specimens of interest for data improvement or to accumulate more data for broad temporal studies. The Chronometric Age Extension was also the first community-managed vocabulary to undergo the new Biodiversity Information Standards (TDWG) review and ratification process, thus providing a blueprint for future Darwin Core extension development. 
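To illustrate the "minimum information plus optional detail" design, the sketch below models an age estimate with a required range and reference system and optional method, uncertainty, and dated-material fields, then filters specimens whose age range overlaps a query window. The field names are hypothetical and only loosely echo the extension's vocabulary. The sketch also assumes all ages share one unit (years before present), whereas real records may use different reference systems that would need conversion first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgeEstimate:
    # Minimum information (hypothetical field names): an age range in
    # years before present, older bound first, plus its reference system.
    earliest_age: float
    latest_age: float
    reference_system: str
    # Optional detail a provider may add if they have it.
    method: Optional[str] = None
    uncertainty_years: Optional[float] = None
    material_dated: Optional[str] = None

    def __post_init__(self) -> None:
        if self.earliest_age < self.latest_age:
            raise ValueError("earliest_age must be >= latest_age (older bound first)")


@dataclass
class Specimen:
    catalog_number: str
    age: AgeEstimate


def overlaps(age: AgeEstimate, older: float, younger: float) -> bool:
    # Ranges [latest, earliest] and [younger, older] overlap when each
    # one starts before the other ends.
    return age.earliest_age >= younger and age.latest_age <= older


def find_in_window(specimens: List[Specimen], older: float, younger: float) -> List[str]:
    """Catalog numbers of specimens whose age range overlaps the query window."""
    return [s.catalog_number for s in specimens if overlaps(s.age, older, younger)]
```

Because only the range and reference system are required, a provider with a bare estimate and one with full assay details can both produce records that the same search can use.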
  5. Thanks to substantial support for biodiversity data mobilization in recent decades, billions of occurrence records are openly available, documenting life on Earth and enabling timely research, awareness raising, and policy-making. Initiatives across local to global scales have been separately funded to serve different, yet often overlapping, audiences of data users, and have developed a variety of platforms and infrastructures to meet the needs of these audiences. The independent progress of biodiversity data providers has led to innovations as well as challenges for the community at large as we move towards connecting and linking a diversity of information from disparate sources as Digital Extended Specimens (DES).

    Recognizing a need for deeper and more frequent opportunities for communication and collaboration across the globe, an ad-hoc group of representatives of various international, national, and regional organizations has been meeting virtually since 2020 to provide a forum for updates, announcements, and shared progress. This group is provisionally named International Partners for the Digital Extended Specimen (IPDES), and is guided by four concepts: Biodiversity, Connection, Knowledge, and Agency. Participants in IPDES include representatives of the Global Biodiversity Information Facility (GBIF), Integrated Digitized Biocollections (iDigBio), American Institute of Biological Sciences (AIBS), Biodiversity Collections Network (BCoN), Natural Science Collections Alliance (NSCA), Distributed System of Scientific Collections (DiSSCo), Atlas of Living Australia (ALA), Biodiversity Information Standards (TDWG), Society for the Preservation of Natural History Collections (SPNHC), National Specimen Information Infrastructure of China (NSII), and South African National Biodiversity Institute (SANBI), as well as individuals involved with biodiversity informatics initiatives, natural science collections, museums, herbaria, and universities. Our global partners group strives to increase representation from around the globe as we aim to enable research that contributes to novel discoveries and addresses the societal challenges leading to the biodiversity crisis. Our overarching mission is to expand on community-driven successes in connecting biodiversity data and knowledge by coordinating a globally integrated network of stakeholders, enabling an extensible technical and social infrastructure of data, tools, and working practices in support of our vision.

    The main work of our group thus far includes publishing a paper on the Digital Extended Specimen (Hardisty et al. 2022), organizing and hosting an array of activities at conferences, and asynchronous online work and forum-based exchanges. We aim to advance discussion on topics of broad interest to our community such as social and technical capacity building, broadening participation, expanding social and data networks, improving data models and building a backbone for the DES, and identifying international funding solutions.

    This presentation will highlight some of these activities and detail progress towards a roadmap for the development of the human network and technical infrastructure necessary to support the DES. It provides an opportunity for feedback from and engagement by stakeholder communities such as TDWG and other initiatives with a focus on data standards and biodiversity informatics, as we solidify our plans for the future in support of integrated and interconnected biodiversity data and credit for those doing the work.

     