This paper explores the application of sensemaking theory to support non-expert crowds in complex data annotation tasks. We investigate the influence of procedural context and data context on the annotation quality of novice crowds, defining procedural context as completing multiple related annotation tasks on the same data point, and data context as annotating multiple semantically related data points. We conducted a controlled experiment with 140 non-expert crowd workers, who generated 1,400 event annotations across various levels of procedural and data context. Assessments of the annotations show that high procedural context positively impacts annotation quality, although this effect diminishes with lower data context. Notably, assigning multiple related tasks to novice annotators yields quality comparable to that of expert annotations, without requiring additional time or effort. We discuss the trade-offs associated with procedural and data contexts and draw design implications for engaging non-experts in crowdsourcing complex annotation tasks.
Are You Serious? Handling Disagreement When Annotating Conspiracy Theory Texts
We often assume that annotation tasks, such as labeling texts for the presence of conspiracy theories, can be performed with hard labels and without definitions or guidelines. Our annotation experiments, comparing students and experts, show that there is little agreement on basic annotations even among experts. For this reason, we conclude that we need to accept disagreement as an integral part of such annotations.
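As a rough illustration of how the (dis)agreement reported above can be quantified, the sketch below computes Cohen's kappa for two annotators over hypothetical binary conspiracy-theory labels. The labels and the choice of Cohen's kappa are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's label distribution.
    dist_a, dist_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum((dist_a[lab] / n) * (dist_b[lab] / n)
                     for lab in set(labels_a) | set(labels_b))
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical hard labels (1 = conspiracy theory present, 0 = absent) from two annotators.
student = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
expert  = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
print(f"Cohen's kappa: {cohens_kappa(student, expert):.2f}")  # 0.40 for this toy data
```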
- Award ID(s):
- 2123635
- PAR ID:
- 10515010
- Editor(s):
- Henning, Sophie; Stede, Manfred
- Publisher / Repository:
- Association for Computational Linguistics
- Date Published:
- Journal Name:
- Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII)
- Format(s):
- Medium: X
- Location:
- St. Julians, Malta
- Sponsoring Org:
- National Science Foundation
More Like this
-
Modern recognition systems require large amounts of supervision to achieve accuracy. Adapting to new domains requires significant data from experts, which is onerous and can become too expensive. Zero-shot learning requires an annotated set of attributes for a novel category, and annotating the full set of attributes for a novel category proves to be a tedious and expensive task in deployment, especially when the recognition domain is an expert domain. We introduce a new field-guide-inspired approach to zero-shot annotation in which the learner model interactively asks for the most useful attributes that define a class. We evaluate our method on classification benchmarks with attribute annotations such as CUB, SUN, and AWA2 and show that our model achieves the performance of a model with full annotations at the cost of significantly fewer annotations. Since the time of experts is precious, decreasing annotation cost can be very valuable for real-world deployment. (See the interactive attribute-querying sketch after this list.)
-
With the increased popularity of electronic textbooks, there is a growing interest in developing a new generation of “intelligent textbooks,” which can guide readers according to their learning goals and current knowledge. Intelligent textbooks extend regular textbooks by integrating machine-manipulable knowledge, and the most popular type of integrated knowledge is a list of relevant concepts mentioned in the textbook. With these concepts, multiple intelligent operations, such as content linking, content recommendation, or student modeling, can be performed. However, existing automatic keyphrase extraction methods, even supervised ones, cannot deliver sufficient accuracy to be practically useful for this task. Manual annotation by experts has been demonstrated to be a preferred approach for producing high-quality labeled data for training supervised models. However, most researchers in the education domain still treat the concept annotation process as an ad-hoc activity rather than a carefully executed task, which can result in low-quality annotated data. Using the annotation of concepts for the Introduction to Information Retrieval textbook as a case study, this paper presents a knowledge engineering method for obtaining reliable concept annotations. As demonstrated by the data we collected, inter-annotator agreement gradually increased over the course of our procedure, and the concept annotations we produced led to better results in document linking and student modeling tasks. The contributions of our work include a validated knowledge engineering procedure, a codebook for technical concept annotation, and a set of concept annotations for the target textbook, which could be used as a gold standard in further intelligent textbook research.
-
Timely access to accurate online biomedical content remains a major ongoing challenge due to the exponential growth of unstructured biomedical data. Semantic annotations of biomedical content are therefore essential for improving search engines’ context-aware indexing, search efficiency, and the precision of retrieved results. In this study, we propose a personalized semantic annotation recommendation approach to biomedical content through an expanded socio-technical approach. Our layered architecture generates annotations on the users’ entered text in the first layer. To optimize the yielded annotations, users can seek help from professional experts by posing specific questions to them. The socio-technical system also connects help seekers (users) to help providers (experts) using pre-trained BERT embeddings, matching the profile similarity scores of users and experts at various levels and suggesting a run-time compatible match of help seeker and help provider. Our approach overcomes the limitations of previous systems, which are predominantly non-collaborative and laborious. In our experiments, we analyzed the performance enhancements offered by our socio-technical approach in improving semantic annotations in three scenarios across various contexts. Our results show an overall 89.98% precision, 89.61% recall, and 89.45% F1-score at the system level. The socio-technical approach reached 90% accuracy, whereas the traditional approach reached only 87%. Our novel socio-technical approach produces apt annotation recommendations that are helpful for secondary uses ranging from context-aware indexing to improving retrieval accuracy. (See the profile-matching sketch after this list.)
-
Vlachos, Andreas; Augenstein, Isabelle (Eds.) Large-scale, high-quality corpora are critical for advancing research in coreference resolution. However, existing datasets vary in their definition of coreference and have been collected via complex and lengthy guidelines that are curated for linguistic experts. These concerns have sparked a growing interest among researchers in curating a unified set of guidelines suitable for annotators with various backgrounds. In this work, we develop a crowdsourcing-friendly coreference annotation methodology, ezCoref, consisting of an annotation tool and an interactive tutorial. We use ezCoref to re-annotate 240 passages from seven existing English coreference datasets (spanning fiction, news, and multiple other domains) while teaching annotators only cases that are treated similarly across these datasets. Surprisingly, we find that reasonable-quality annotations were already achievable (90% agreement between the crowd and expert annotations) even without extensive training. On carefully analyzing the remaining disagreements, we identify linguistic cases that our annotators unanimously agree upon but that lack unified treatment (e.g., generic pronouns, appositives) in existing datasets. We propose that the research community revisit these phenomena when curating future unified annotation guidelines.
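For the field-guide-inspired zero-shot annotation abstract above, one simple way to pick "the most useful attribute" at each step is to ask about the attribute whose answer best splits the remaining candidate classes. The sketch below does this greedily over a tiny hypothetical binary class-attribute matrix; the class names, attributes, and the greedy split criterion are illustrative assumptions, not the paper's actual method.

```python
# Greedy sketch of interactively querying informative attributes for a novel class.
# Known classes have binary attribute vectors; the "expert" answers yes/no for the novel class.
ATTRIBUTES = ["has_stripes", "has_wings", "is_aquatic", "has_fur"]   # hypothetical
CLASS_ATTRIBUTES = {                                                 # hypothetical field guide
    "zebra":   [1, 0, 0, 1],
    "owl":     [0, 1, 0, 0],
    "dolphin": [0, 0, 1, 0],
    "bat":     [0, 1, 0, 1],
}

def most_informative_attribute(candidates, asked):
    """Pick the unasked attribute that splits the remaining candidate classes most evenly."""
    best, best_balance = None, -1.0
    for j in range(len(ATTRIBUTES)):
        if j in asked:
            continue
        positives = sum(CLASS_ATTRIBUTES[c][j] for c in candidates)
        balance = min(positives, len(candidates) - positives)  # 0 = useless, higher = better split
        if balance > best_balance:
            best, best_balance = j, balance
    return best

# Simulated session: the novel class happens to match "owl"; answers come from its vector.
truth = CLASS_ATTRIBUTES["owl"]
candidates, asked = set(CLASS_ATTRIBUTES), set()
while len(candidates) > 1 and len(asked) < len(ATTRIBUTES):
    j = most_informative_attribute(candidates, asked)
    answer = truth[j]
    asked.add(j)
    candidates = {c for c in candidates if CLASS_ATTRIBUTES[c][j] == answer}
    print(f"Asked {ATTRIBUTES[j]!r} -> {answer}; remaining candidates: {sorted(candidates)}")
```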
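For the socio-technical biomedical annotation abstract above, the help-seeker/expert matching step can be pictured as ranking expert profiles by embedding similarity. The sketch below uses the sentence-transformers library as a stand-in encoder and cosine similarity as the score; the model name, profile texts, and similarity measure are assumptions for illustration, not the system's actual configuration.

```python
# Minimal profile-matching sketch: embed free-text profiles and rank experts for a help seeker.
import numpy as np
from sentence_transformers import SentenceTransformer  # stand-in for the pre-trained BERT encoder

EXPERT_PROFILES = {  # hypothetical expert profiles
    "expert_a": "Clinical oncologist annotating tumor phenotypes and treatment outcomes.",
    "expert_b": "Bioinformatician working on gene ontology and protein interaction curation.",
    "expert_c": "Pharmacologist focused on drug-drug interactions and adverse events.",
}

def rank_experts(seeker_text, expert_profiles, model_name="all-MiniLM-L6-v2"):
    """Return experts sorted by cosine similarity between seeker and expert profile embeddings."""
    model = SentenceTransformer(model_name)
    names = list(expert_profiles)
    vectors = model.encode([seeker_text] + [expert_profiles[n] for n in names])
    seeker_vec, expert_vecs = vectors[0], vectors[1:]
    sims = expert_vecs @ seeker_vec / (
        np.linalg.norm(expert_vecs, axis=1) * np.linalg.norm(seeker_vec)
    )
    return sorted(zip(names, sims), key=lambda pair: -pair[1])

if __name__ == "__main__":
    question = "Which ontology terms should I use to annotate a gene regulatory pathway?"
    for name, score in rank_experts(question, EXPERT_PROFILES):
        print(f"{name}: {score:.3f}")
```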