Title: How Do Young Community and Citizen Science Volunteers Support Scientific Research on Biodiversity? The Case of iNaturalist
Online community and citizen science (CCS) projects have broadened access to scientific research and enabled different forms of participation in biodiversity research; however, little is known about whether and how such opportunities are taken up by young people (aged 5–19). Furthermore, when they do participate, there is little research on whether their online activity makes a tangible contribution to scientific research. We addressed these knowledge gaps using quantitative analytical approaches and visualisations to investigate 249 youths’ contributions to CCS on the iNaturalist platform, and the potential for the scientific use of their contributions. We found that nearly all the young volunteers’ observations were ‘verifiable’ (included a photo, location, and date/time) and therefore potentially useful to biodiversity research. Furthermore, more than half were designated as ‘Research Grade’, with a community agreed-upon identification, making them more valuable and accessible to biodiversity science researchers. Our findings show that young volunteers with lasting participation on the platform and those aged 16–19 years are more likely to have a higher proportion of Research Grade observations than younger, or more ephemeral participants. This study enhances our understanding of young volunteers’ contributions to biodiversity research, as well as the important role professional scientists and data users can play in helping verify youths’ contributions to make them more accessible for biodiversity research.
Award ID(s):
1647276
NSF-PAR ID:
10344381
Journal Name:
Diversity
Volume:
13
Issue:
7
Page Range or eLocation-ID:
318
ISSN:
1424-2818
Sponsoring Org:
National Science Foundation
More Like this
  1. Lepczyk, Christopher A. (Ed.)
    Online citizen science projects have broadened options for accessing science and enabled different forms of participation in scientific research for adult and young volunteers. Yet, little is known regarding participation patterns among youth participants. Quantitative approaches were used to investigate the contribution of 183 young volunteers to citizen science on the iNaturalist platform and the participation behaviour that relates to their contribution. The participants accessed and used iNaturalist as part of one-day field-based events (bioblitzes) facilitated by museums. Compared to the observation behaviour of all iNaturalist users, as documented on the platform, the young volunteers observe fewer plants and birds, and more molluscs, arachnids and insects. The average daily contributions of young volunteers were found to be positively associated with a large proportion of active days on iNaturalist and a systematic contribution behaviour, yet negatively related to a long duration on the platform. This study enhances our understanding of young volunteers’ contributions to citizen science and provides insights for research on participation in online citizen science. Our findings have implications for how museums design field-based events to encourage systematic follow-up participation and maintain active contribution.
  2. International collaboration between collections, aggregators, and researchers within the biodiversity community and beyond is becoming increasingly important in our efforts to support biodiversity, conservation and the life of the planet. The social, technical, logistical and financial aspects of an equitable biodiversity data landscape – from workforce training and mobilization of linked specimen data, to data integration, use and publication – must be considered globally and within the context of a growing biodiversity crisis. In recent years, several initiatives have outlined paths forward that describe how digital versions of natural history specimens can be extended and linked with associated data. In the United States, Webster (2017) presented the “extended specimen”, which was expanded upon by Lendemer et al. (2019) through the work of the Biodiversity Collections Network (BCoN). At the same time, a “digital specimen” concept was developed by DiSSCo in Europe (Hardisty 2020). Both the extended and digital specimen concepts depict a digital proxy of an analog natural history specimen, whose digital nature provides greater capabilities such as being machine-processable, linkages with associated data, globally accessible information-rich biodiversity data, improved tracking, attribution and annotation, additional opportunities for data use and cross-disciplinary collaborations forming the basis for FAIR (Findable, Accessible, Interoperable, Reusable) and equitable sharing of benefits worldwide, and innumerable other advantages, with slight variation in how an extended or digital specimen model would be executed.
Recognizing the need to align the two closely-related concepts, and to provide a place for open discussion around various topics of the Digital Extended Specimen (DES; the current working name for the joined concepts), we initiated a virtual consultation on the discourse platform hosted by the Alliance for Biodiversity Knowledge through GBIF. This platform provided a forum for threaded discussions around topics related and relevant to the DES. The goals of the consultation align with the goals of the Alliance for Biodiversity Knowledge: expand participation in the process, build support for further collaboration, identify use cases, identify significant challenges and obstacles, and develop a comprehensive roadmap towards achieving the vision for a global specification for data integration. In early 2021, Phase 1 launched with five topics: Making FAIR data for specimens accessible; Extending, enriching and integrating data; Annotating specimens and other data; Data attribution; and Analyzing/mining specimen data for novel applications. This round of full discussion was productive and engaged dozens of contributors, with hundreds of posts and thousands of views. During Phase 1, several deeper, more technical, or additional topics of relevance were identified and formed the foundation for Phase 2 which began in May 2021 with the following topics: Robust access points and data infrastructure alignment; Persistent identifier (PID) scheme(s); Meeting legal/regulatory, ethical and sensitive data obligations; Workforce capacity development and inclusivity; Transactional mechanisms and provenance; and Partnerships to collaborate more effectively. In Phase 2 fruitful progress was made towards solutions to some of these complex functional and technical long-term goals. Simultaneously, our commitment to open participation was reinforced, through increased efforts to involve new voices from allied and complementary fields. 
Among a wealth of ideas expressed, the community highlighted the need for unambiguous persistent identifiers and a dedicated agent to assign them, support for a fully linked system that includes robust publishing mechanisms, strong support for social structures that build trustworthiness of the system, appropriate attribution of legacy and new work, a system that is inclusive, removed from colonial practices, and supportive of creative use of biodiversity data, building a truly global data infrastructure, balancing open access with legal obligations and ethical responsibilities, and the partnerships necessary for success. These two consultation periods, and the myriad activities surrounding the online discussion, produced a wide variety of perspectives, strategies, and approaches to converging the digital and extended specimen concepts, and progressing plans for the DES -- steps necessary to improve access to research-ready data to advance our understanding of the diversity and distribution of life. Discussions continue and we hope to include your contributions to the DES in future implementation plans.
  3. Light microscopy provides a window into another world that is not visible to the unaided eye. Because of this and its importance in biological discoveries, the light microscope is an essential tool for scientific studies. It can also be used with a variety of easily obtained specimens to provide dramatic demonstrations of previously unknown features of common plants and animals. Thus, one way to interest young people in science is to start with an introduction to light microscopy. This is an especially effective strategy for individuals who attend less advantaged or under-resourced schools, as they may not have been previously exposed to scientific concepts in their classes. However, introducing light microscopy lessons in the classroom can be challenging because of the high cost of light microscopes, even those that are relatively basic, in addition to their usual large size. Efforts are underway by our laboratory in collaboration with the Biophysical Society (BPS) to introduce young people to light microscopy using small, easy-to-assemble wooden microscopes developed by Echo Laboratories. The microscopes are available online as low-cost kits ($10 each with shipping), each consisting of 19 parts printed onto an 8½ x 11 inch sheet of light-weight wood (Fig. 1). After punching out the pieces, they can be assembled into a microscope with a moveable stage and a low-power lens, also provided in the kit (Fig. 2). Photos taken with a cell phone through the microscope lens can give magnifications of ~16-18x, or higher. At these magnifications, features of specimens that are not visible to the unaided eye can be easily observed, e.g., small hairs on the margins of leaves or lichens [1]. As a member of the BPS Education Committee, one of us (SAE) wrote a Lesson Plan on Light Microscopy specifically for use with the wooden microscopes.
SAE was also able to obtain a gift of 500 wooden microscope kits for the BPS from Echo Laboratories and Chroma Technology Corp in 2016. The wooden microscope kits, together with the lesson plan, have provided the materials for our present outreach efforts. Rather than giving out the wooden microscope kits to individuals, the BPS asked the Education Committee to maximize the impact of the gift by distributing the microscopes with the Lesson Plan on Light Microscopy to teachers, e.g., through teachers’ workshops or outreach sessions. This strategy was devised to enable the Society to reach a larger number of young people than by giving the microscopes to individuals. The Education Committee first evaluated the microscopes as a tool to introduce students to scientific concepts by providing microscopes to a BPS member at the National University of Colombia who conducted a workshop on Sept 19-24, 2016 in Tumaco, Colombia. During the workshop, which involved 120 high school girls and 80 minority students, including Afro-Colombian and older students, the students built the wooden microscopes and examined specimens, and compared the microscopes to a conventional light microscope. Assembling the wooden microscopes was found to be a useful procedure that was similar to a scientific protocol, and encouraged young girls and older students to participate in science. This was especially promising in Colombia, where there are few women in science and little effort to increase women in STEM fields. Another area of outreach emerged recently when one of us, USP, an undergraduate student at Duke University, who was taught by SAE how to assemble the wooden microscopes and how to use the lesson plan, took three wooden microscopes on a visit to her family in Bangalore, India in summer 2018 [2]. There she organized and led three sessions in state-run, under-resourced government schools, involving classes of ~25-40 students each.
This was very successful – the students enjoyed learning about the microscopes and building them, and the science teachers were interested in expanding the sessions to other government schools. USP taught the teachers how to assemble and use the microscopes and gave the teachers the microscopes and lesson plan, which is also available to the public at the BPS web site. She also met with a founder of the organization, Whitefield Rising, which is working to improve teaching in government schools, and taught her and several volunteers how to assemble the microscopes and conduct the sessions. The Whitefield Rising members have been able to conduct nine further sessions in Bangalore over the past ~18 months (Fig. 3), using microscope kits provided to them by the BPS. USP has continued to work with members of the Whitefield Rising group during her summer and winter breaks on visits to Bangalore. Recently she has been working with another volunteer group that has expanded the outreach efforts to New Delhi. The light microscopy outreach that our laboratory is conducting in India in collaboration with the BPS is having a positive impact because we have been able to develop a partnership with volunteers in Bangalore and New Delhi. The overall goal is to enhance science education globally, especially in less advantaged schools, by providing a low-cost microscope that can be used to introduce students to scientific concepts.
  4. The Deep Learning Epilepsy Detection Challenge: design, implementation, and test of a new crowd-sourced AI challenge ecosystem
Isabell Kiral*, Subhrajit Roy*, Todd Mummert*, Alan Braz*, Jason Tsay, Jianbin Tang, Umar Asif, Thomas Schaffter, Eren Mehmet, The IBM Epilepsy Consortium◊, Joseph Picone, Iyad Obeid, Bruno De Assis Marques, Stefan Maetschke, Rania Khalaf†, Michal Rosen-Zvi†, Gustavo Stolovitzky†, Mahtab Mirmomeni†, Stefan Harrer†
* These authors contributed equally to this work. † Corresponding authors: rkhalaf@us.ibm.com, rosen@il.ibm.com, gustavo@us.ibm.com, mahtabm@au1.ibm.com, sharrer@au.ibm.com. ◊ Members of the IBM Epilepsy Consortium are listed in the Acknowledgements section. J. Picone and I. Obeid are with Temple University, USA. T. Schaffter is with Sage Bionetworks, USA. E. Mehmet is with the University of Illinois at Urbana-Champaign, USA. All other authors are with IBM Research in USA, Israel and Australia.
Introduction
This decade has seen an ever-growing number of scientific fields benefitting from the advances in machine learning technology and tooling. More recently, this trend reached the medical domain, with applications ranging from cancer diagnosis [1] to the development of brain-machine-interfaces [2]. While Kaggle has pioneered the crowd-sourcing of machine learning challenges to incentivise data scientists from around the world to advance algorithm and model design, the increasing complexity of problem statements demands that participants be expert data scientists, deeply knowledgeable in at least one other scientific domain, and competent software engineers with access to large compute resources. People who match this description are few and far between, unfortunately leading to a shrinking pool of possible participants and a loss of experts dedicating their time to solving important problems. Participation is even further restricted in the context of any challenge run on confidential use cases or with sensitive data.
Recently, we designed and ran a deep learning challenge to crowd-source the development of an automated labelling system for brain recordings, aiming to advance epilepsy research. A focus of this challenge, run internally in IBM, was the development of a platform that lowers the barrier of entry and therefore mitigates the risk of excluding interested parties from participating.
The challenge: enabling wide participation
With the goal of running a challenge that mobilises the largest possible pool of participants from IBM (global), we designed a use case around previous work in epileptic seizure prediction [3]. In this “Deep Learning Epilepsy Detection Challenge”, participants were asked to develop an automatic labelling system to reduce the time a clinician would need to diagnose patients with epilepsy. Labelled training and blind validation data for the challenge were generously provided by Temple University Hospital (TUH) [4]. TUH also devised a novel scoring metric for the detection of seizures that was used as the basis for algorithm evaluation [5]. In order to provide an experience with a low barrier of entry, we designed a generalisable challenge platform under the following principles: (1) no participant should need to have in-depth knowledge of the specific domain (i.e., no participant should need to be a neuroscientist or epileptologist); (2) no participant should need to be an expert data scientist; (3) no participant should need more than basic programming knowledge (i.e., no participant should need to learn how to process fringe data formats and stream data efficiently); and (4) no participant should need to provide their own computing resources. In addition to the above, our platform should guide participants through the entire process from sign-up to model submission, facilitate collaboration, and provide instant feedback to the participants through data visualisation and intermediate online leaderboards.
The platform
The architecture of the platform that was designed and developed is shown in Figure 1. The entire system consists of a number of interacting components. (1) A web portal served as the entry point to challenge participation, providing challenge information, such as timelines and challenge rules, and scientific background. The portal also facilitated the formation of teams and provided participants with an intermediate leaderboard of submitted results and a final leaderboard at the end of the challenge. (2) IBM Watson Studio [6] is the umbrella term for a number of services offered by IBM. Upon creation of a user account through the web portal, an IBM Watson Studio account was automatically created for each participant that allowed users access to IBM's Data Science Experience (DSX), the analytics engine Watson Machine Learning (WML), and IBM's Cloud Object Storage (COS) [7], all of which will be described in more detail in further sections. (3) The user interface and starter kit were hosted on IBM's Data Science Experience platform (DSX) and formed the main component for designing and testing models during the challenge. DSX allows for real-time collaboration on shared notebooks between team members. A starter kit in the form of a Python notebook, supporting the popular deep learning libraries TensorFlow [8] and PyTorch [9], was provided to all teams to guide them through the challenge process. Upon instantiation, the starter kit loaded necessary Python libraries and custom functions for the invisible integration with COS and WML. In dedicated spots in the notebook, participants could write custom pre-processing code, machine learning models, and post-processing algorithms. The starter kit provided instant feedback about participants' custom routines through data visualisations. Using the notebook only, teams were able to run the code on WML, making use of a compute cluster of IBM's resources.
The starter kit also enabled submission of the final code to a data store to which only the challenge team had access. (4) Watson Machine Learning provided access to shared compute resources (GPUs). Code was bundled up automatically in the starter kit and deployed to and run on WML. WML in turn had access to shared storage from which it requested recorded data and to which it stored the participant's code and trained models. (5) IBM's Cloud Object Storage held the data for this challenge. Using the starter kit, participants could investigate their results as well as data samples in order to better design custom algorithms. (6) Utility functions were loaded into the starter kit at instantiation. This set of functions included code to pre-process data into a more common format, to optimise streaming through the use of the NutsFlow and NutsML libraries [10], and to provide seamless access to all the IBM services used. Not captured in the diagram is the final code evaluation, which was conducted in an automated way as soon as code was submitted through the starter kit, minimising the burden on the challenge organising team.
Figure 1: High-level architecture of the challenge platform
Measuring success
The competitive phase of the "Deep Learning Epilepsy Detection Challenge" ran for 6 months. Twenty-five teams, with a total of 87 scientists and software engineers from 14 global locations, participated. All participants made use of the starter kit we provided and ran algorithms on IBM's WML infrastructure. Seven teams persisted until the end of the challenge and submitted final solutions. The best performing solutions reached seizure detection performances that allow a hundred-fold reduction in the time epileptologists need to annotate continuous EEG recordings. Thus, we expect the developed algorithms to aid in the diagnosis of epilepsy by significantly shortening manual labelling time. Detailed results are currently in preparation for publication.
Equally important to solving the scientific challenge, however, was to understand whether we managed to encourage participation from non-expert data scientists.
Figure 2: Primary occupation as reported by challenge participants
Out of the 40 participants for whom we have occupational information, 23 reported Data Science or AI as their main job description, 11 reported being a Software Engineer, and 2 people had expertise in Neuroscience. Figure 2 shows that participants had a variety of specialisations, including some that are in no way related to data science, software engineering, or neuroscience. No participant had deep knowledge and experience in data science, software engineering and neuroscience.
Conclusion
Given the growing complexity of data science problems and increasing dataset sizes, solving these problems requires collaboration between people with different expertise, with a focus on inclusiveness and a low barrier of entry. We designed, implemented, and tested a challenge platform to address exactly this. Using our platform, we ran a deep-learning challenge for epileptic seizure detection. In total, 87 IBM employees from several business units, including but not limited to IBM Research, and with a variety of skills, including sales and design, participated in this highly technical challenge.
  5. ABSTRACT During the COVID-19 pandemic, biology educators were forced to think of ways to communicate with their students, engaging them in science and with the scientific community. For educators using course-based undergraduate research experiences (CUREs), the challenge to have students perform real science, analyze their work, and present their results to a larger scientific audience was difficult as the world moved online. Many instructors were able to adapt CUREs utilizing online data analysis and virtual meeting software for class discussions and synchronous learning. However, interaction with the larger scientific community, an integral component of making science relevant for students and allowing them to network with other young scientists and experts in their fields, was still missing. Even before COVID-19, a subset of students would travel to regional or national meetings to present their work, but most did not have these opportunities. With over 300 million active users, Twitter provided a unique platform for students to present their work to a large and varied audience. The Cell Biology Education Consortium hosted an innovative scientific poster session entirely on Twitter to engage undergraduate researchers with one another and with the much broader community. The format for posting on this popular social media platform challenged students to simplify their science and make their points using only a few words and slides. Nineteen institutions and over one hundred students participated in this event. Even though these practices emerged as a necessity during the COVID-19 pandemic, the Twitter presentation strategy shared in this paper can be used widely.