Title: Supersense and Sensibility: Proxy Tasks for Semantic Annotation of Prepositions
Prepositional supersense annotation is time-consuming and requires expert training. Here, we present two sensible methods for obtaining prepositional supersense annotations indirectly by eliciting surface substitution and similarity judgments. Four pilot studies suggest that both methods have potential for producing prepositional supersense annotations that are comparable in quality to expert annotations.
Award ID(s):
1812778
PAR ID:
10318049
Journal Name:
Proceedings of the 14th Linguistic Annotation Workshop
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper explores the application of sensemaking theory to support non-expert crowds in intricate data annotation tasks. We investigate the influence of procedural context and data context on the annotation quality of novice crowds, defining procedural context as completing multiple related annotation tasks on the same data point, and data context as annotating multiple data points with semantic relevance. We conducted a controlled experiment involving 140 non-expert crowd workers, who generated 1400 event annotations across various procedural and data context levels. Assessments of annotations demonstrate that high procedural context positively impacts annotation quality, although this effect diminishes with lower data context. Notably, assigning multiple related tasks to novice annotators yields comparable quality to expert annotations, without costing additional time or effort. We discuss the trade-offs associated with procedural and data contexts and draw design implications for engaging non-experts in crowdsourcing complex annotation tasks. 
  2. Producing dense 3D reconstructions from biological imaging data is a challenging instance segmentation task that requires significant ground-truth training data for effective and accurate deep learning-based models. Generating training data requires intense human effort to annotate each instance of an object across serial section images. Our focus is on the especially complicated brain neuropil, comprising an extensive interdigitation of dendritic, axonal, and glial processes visualized through serial section electron microscopy. We developed a novel deep learning-based method to generate dense 3D segmentations rapidly from sparse 2D annotations of a few objects on single sections. Models trained on the rapidly generated segmentations achieved accuracy similar to that of models trained on expert dense ground-truth annotations. Human time to generate annotations was reduced by three orders of magnitude, and the annotations could be produced by non-expert annotators. This capability will democratize generation of training data for large image volumes needed to achieve brain circuits and measures of circuit strengths. 
  3. The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/. 
  4. This paper presents a domain-guided approach for learning representations of scalp-electroencephalograms (EEGs) without relying on expert annotations. Expert labeling of EEGs has proven to be an unscalable process with low inter-reviewer agreement because of the complex and lengthy nature of EEG recordings. Hence, there is a need for machine learning (ML) approaches that can leverage expert domain knowledge without incurring the cost of labor-intensive annotations. Self-supervised learning (SSL) has shown promise in such settings, although existing SSL efforts on EEG data do not fully exploit EEG domain knowledge. Furthermore, it is unclear to what extent SSL models generalize to unseen tasks and datasets. Here we explore whether SSL tasks derived in a domain-guided fashion can learn generalizable EEG representations. Our contributions are three-fold: 1) we propose novel SSL tasks for EEG based on the spatial similarity of brain activity, underlying behavioral states, and age-related differences; 2) we present evidence that an encoder pretrained using the proposed SSL tasks shows strong predictive performance on multiple downstream classifications; and 3) using two large EEG datasets, we show that our encoder generalizes well to multiple EEG datasets during downstream evaluations. 
  5. Current methods used to quantify brain size and compartmental scaling relationships in studies of social insect brain evolution involve manual annotations of images from histological samples, confocal microscopy or other sources. This process is susceptible to human bias and error and requires time-consuming effort by expert annotators. Standardized brain atlases, constructed through 3D registration and automatic segmentation, surmount these issues while increasing throughput to robustly sample diverse morphological and behavioral phenotypes. Here we design and evaluate three strategies to construct statistical brain atlases, or templates, using ants as a model taxon. The first technique creates a template by registering multiple brains of the same species. Brain regions are manually annotated on the template, and the labels are transformed back to each individual brain to obtain an automatic annotation, or to any other brain aligned with the template. The second strategy also creates a template from multiple brain images but obtains labels as a consensus from multiple manual annotations of individual brains comprising the template. The third technique is based on a template comprising brains from multiple species and the consensus of their labels. We used volume similarity as a metric to evaluate the automatic segmentation produced by each method against the inter- and intra-individual variability of human expert annotators. We found that automatic and manual methods are equivalent in volume accuracy, making the template technique an extraordinary tool to accelerate data collection and reduce human bias in the study of the evolutionary neurobiology of ants and other insects. 