Obtaining annotations for large training sets is expensive, especially in settings where domain knowledge is required, such as behavior analysis. Weak supervision has been studied to reduce annotation costs by using weak labels from task-specific labeling functions (LFs) to augment ground truth labels. However, domain experts still need to hand-craft different LFs for different tasks, limiting scalability. To reduce expert effort, we present AutoSWAP: a framework for automatically synthesizing data-efficient task-level LFs. The key to our approach is to efficiently represent expert knowledge in a reusable domain-specific language and more general domain-level LFs, with which we use state-of-the-art program synthesis techniques and a small labeled dataset to generate task-level LFs. Additionally, we propose a novel structural diversity cost that allows for efficient synthesis of diverse sets of LFs, further improving AutoSWAP's performance. We evaluate AutoSWAP in three behavior analysis domains and demonstrate that AutoSWAP outperforms existing approaches using only a fraction of the data. Our results suggest that AutoSWAP is an effective way to automatically generate LFs that can significantly reduce expert effort for behavior analysis.
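To make the weak-supervision setup in the abstract concrete, here is a minimal sketch of what a task-level labeling function over trajectory data might look like. Every name, feature, and threshold in it is a hypothetical illustration, not AutoSWAP's actual DSL or synthesized output.

```python
# Minimal sketch of a task-level labeling function (LF) for trajectory
# data, in the weak-supervision style the abstract describes. All names
# and thresholds are hypothetical, not AutoSWAP's DSL or output.
import numpy as np

ABSTAIN = -1  # conventional "no vote" label in weak supervision


def lf_chase(agent1_xy: np.ndarray, agent2_xy: np.ndarray) -> int:
    """Vote 1 ('chase') when agent 1 moves fast while staying close to
    agent 2, vote 0 when it is nearly stationary, otherwise abstain."""
    speed = np.linalg.norm(np.diff(agent1_xy, axis=0), axis=1).mean()
    dist = np.linalg.norm(agent1_xy - agent2_xy, axis=1).mean()
    if speed > 5.0 and dist < 20.0:
        return 1
    if speed < 1.0:
        return 0
    return ABSTAIN
```

Weak labels from many such LFs are typically aggregated by a label model and combined with a small ground-truth set; AutoSWAP's contribution is synthesizing such functions automatically rather than having experts hand-write one per task.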
Task Programming: Learning Data Efficient Behavior Representations
Specialized domain knowledge is often necessary to accurately annotate training sets for in-depth analysis, but can be burdensome and time-consuming to acquire from domain experts. This issue arises prominently in automated behavior analysis, in which agent movements or actions of interest are detected from video tracking data. To reduce annotation effort, we present TREBA: a method to learn annotation-sample efficient trajectory embedding for behavior analysis, based on multi-task self-supervised learning. The tasks in our method can be efficiently engineered by domain experts through a process we call "task programming", which uses programs to explicitly encode structured knowledge from domain experts. Total domain expert effort can be reduced by exchanging data annotation time for the construction of a small number of programmed tasks. We evaluate this trade-off using data from behavioral neuroscience, in which specialized domain knowledge is used to identify behaviors. We present experimental results in three datasets across two domains: mice and fruit flies. Using embeddings from TREBA, we reduce annotation burden by up to a factor of 10 without compromising accuracy compared to state-of-the-art features. Our results thus suggest that task programming and self-supervision can be an effective way to reduce annotation effort for domain experts.
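As a hedged illustration of what a "programmed task" could look like, the sketch below encodes one piece of domain knowledge, the distance between two tracked animals, as a quantity the trajectory embedding should be able to decode. The array shape and the function itself are assumptions for illustration, not TREBA's actual interface.

```python
# A hypothetical "task program": domain knowledge written as code that
# maps a trajectory to a supervision signal for self-supervised learning.
import numpy as np


def social_distance_task(traj: np.ndarray) -> np.ndarray:
    """traj has shape (frames, 2 agents, 2 coords); return the per-frame
    distance between the two agents' positions."""
    return np.linalg.norm(traj[:, 0, :] - traj[:, 1, :], axis=-1)
```

In the multi-task setting, the embedding network is trained to predict the outputs of several such short programs, so a handful of programmed tasks substitutes for many annotated frames.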
- Award ID(s): 1918865
- PAR ID: 10325777
- Date Published:
- Journal Name: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- Page Range / eLocation ID: 2879 - 2884
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
With the increased popularity of electronic textbooks, there is a growing interest in developing a new generation of "intelligent textbooks," which have the ability to guide readers according to their learning goals and current knowledge. Intelligent textbooks extend regular textbooks by integrating machine-manipulable knowledge, and the most popular type of integrated knowledge is a list of relevant concepts mentioned in the textbooks. With these concepts, multiple intelligent operations, such as content linking, content recommendation, or student modeling, can be performed. However, existing automatic keyphrase extraction methods, even supervised ones, cannot deliver sufficient accuracy to be practically useful in this task. Manual annotation by experts has been demonstrated to be a preferred approach for producing high-quality labeled data for training supervised models. However, most researchers in the education domain still consider the concept annotation process as an ad-hoc activity rather than a carefully executed task, which can result in low-quality annotated data. Using the annotation of concepts for the Introduction to Information Retrieval textbook as a case study, this paper presents a knowledge engineering method to obtain reliable concept annotations. As demonstrated by the data we collected, the inter-annotator agreement gradually increased along with our procedure, and the concept annotations we produced led to better results in document linking and student modeling tasks. The contributions of our work include a validated knowledge engineering procedure, a codebook for technical concept annotation, and a set of concept annotations for the target textbook, which could be used as a gold standard in further intelligent textbook research.
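Since the procedure's quality signal is inter-annotator agreement, a minimal sketch of how such agreement is commonly measured (Cohen's kappa via scikit-learn; the abstract does not name its exact metric, and the labels below are invented) might look like this:

```python
# Tracking inter-annotator agreement on concept labels with Cohen's
# kappa. The binary judgments here are hypothetical examples.
from sklearn.metrics import cohen_kappa_score

# 1 = "is a domain concept", 0 = "is not", for the same candidate terms
annotator_a = [1, 0, 1, 1, 0, 1, 0, 1]
annotator_b = [1, 0, 0, 1, 0, 1, 0, 1]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # expected to rise across rounds
```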
Modern recognition systems require large amounts of supervision to achieve accuracy. Adapting to new domains requires significant data from experts, which is onerous and can become too expensive. Zero-shot learning requires an annotated set of attributes for a novel category, and annotating the full set of attributes for a novel category proves to be a tedious and expensive task in deployment, especially when the recognition domain is an expert domain. We introduce a new field-guide-inspired approach to zero-shot annotation in which the learner model interactively asks for the most useful attributes that define a class. We evaluate our method on classification benchmarks with attribute annotations like CUB, SUN, and AWA2 and show that our model achieves the performance of a model with full annotations while requiring significantly fewer annotations. Since the time of experts is precious, decreasing annotation cost can be very valuable for real-world deployment.
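A minimal sketch of the interactive idea, under the assumption that the learner scores unannotated attributes by prediction uncertainty; this scoring rule is an illustrative stand-in, not the paper's actual criterion.

```python
# Greedy attribute querying: ask the expert about the attribute the
# model is least sure about. The scoring rule is a hypothetical example.
import numpy as np


def next_attribute_to_ask(pred_probs: np.ndarray, asked: set) -> int:
    """pred_probs: (num_attributes,) predicted probability that the
    novel class has each attribute; asked: indices already annotated."""
    uncertainty = pred_probs * (1.0 - pred_probs)  # peaks at p = 0.5
    uncertainty[list(asked)] = -np.inf             # skip answered ones
    return int(np.argmax(uncertainty))


probs = np.array([0.9, 0.45, 0.1, 0.55])
print(next_attribute_to_ask(probs, asked={0}))  # -> 1 (closest to 0.5)
```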
Recent progress in data-driven vision and language-based tasks demands developing training datasets enriched with multiple modalities representing human intelligence. The link between text and image data is one of the crucial modalities for developing AI models. The development process of such datasets in the video domain requires much effort from researchers and annotators (experts and non-experts). Researchers re-design annotation tools to extract knowledge from annotators to answer new research questions, and the whole process repeats for each new question, which is time-consuming. However, over the last decade there has been little change in how researchers and annotators interact with the annotation process. We revisit the annotation workflow and propose the concept of an adaptable and scalable annotation tool. The concept emphasizes user interactivity to make the design of the annotation process seamless and efficient. With the tool, researchers can conveniently add new modalities to, or augment, existing datasets, and annotators can efficiently link free-form text to image objects. For conducting human-subject experiments at any scale, the tool supports data collection for attaining group ground truth. We conducted a case study using a prototype tool across two groups, with the participation of 74 non-expert people. We find that the interactive linking of free-form text to image objects feels intuitive and evokes a thought process resulting in high-quality annotations. The new design shows a ≈35% improvement in data annotation quality. In a UX evaluation with 25 people, we received above-average positive feedback regarding convenience, UI assistance, usability, and satisfaction.
Frasson, C.; Mylonas, P.; Troussas, C. (Eds.) Domain modeling is an important task in designing, developing, and deploying intelligent tutoring systems and other adaptive instructional systems. We focus here on the more specific task of automatically extracting a domain model from textbooks. In particular, this paper explores using multiple textbook indexes to extract a domain model for computer programming. Our approach is based on the observation that different experts, i.e., authors of intro-to-programming textbooks in our case, break down a domain in slightly different ways, and identifying the commonalities and differences can be very revealing. To this end, we present automated approaches to extracting domain models from multiple textbooks and compare the resulting common domain model with a domain model created by experts. Specifically, we use approximate string-matching approaches to increase coverage of the resulting domain model and majority voting across different textbooks to discover common domain terms related to computer programming. Our results indicate that approximate string matching gives more accurate domain models for computer programming, with increased precision and recall. By automating our approach, we can significantly reduce the time and effort required to construct high-quality domain models, making it easier to develop and deploy tutoring systems. Furthermore, we obtain a common domain model that can serve as a benchmark or skeleton that can be used broadly and adapted to specific needs by others.
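A minimal sketch of the two ingredients named in the abstract, approximate string matching and majority voting across textbook indexes; the toy index contents and the 0.8 similarity threshold are assumptions for illustration.

```python
# Credit near-duplicate index terms across textbooks via approximate
# string matching, then keep terms that a majority of indexes contain.
from difflib import SequenceMatcher

# Hypothetical index terms from three intro-to-programming textbooks.
indexes = [
    {"for loop", "recursion", "arrays"},
    {"for-loops", "recursion", "array"},
    {"recursion", "arrays", "pointers"},
]


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    return SequenceMatcher(None, a, b).ratio() >= threshold


def votes(term: str) -> int:
    # One vote per index that contains an approximate match of the term.
    return sum(any(similar(term, t) for t in idx) for idx in indexes)


candidates = set().union(*indexes)
domain_model = {t for t in candidates if votes(t) >= 2}  # majority of 3
print(sorted(domain_model))  # e.g., keeps "recursion", drops "pointers"
```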