Specialized domain knowledge is often necessary to accurately annotate training sets for in-depth analysis, but it can be burdensome and time-consuming to acquire from domain experts. This issue arises prominently in automated behavior analysis, in which agent movements or actions of interest are detected from video tracking data. To reduce annotation effort, we present TREBA: a method to learn annotation-sample-efficient trajectory embeddings for behavior analysis, based on multi-task self-supervised learning. The tasks in our method can be efficiently engineered by domain experts through a process we call “task programming”, which uses programs to explicitly encode structured knowledge from domain experts. Total domain expert effort can be reduced by exchanging data annotation time for the construction of a small number of programmed tasks. We evaluate this trade-off using data from behavioral neuroscience, in which specialized domain knowledge is used to identify behaviors. We present experimental results on three datasets across two domains: mice and fruit flies. Using embeddings from TREBA, we reduce annotation burden by up to a factor of 10 without compromising accuracy compared to state-of-the-art features. Our results thus suggest that task programming and self-supervision can be an effective way to reduce annotation effort for domain experts.
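The abstract does not include code for task programming. As a hedged illustration only, a "programmed task" can be pictured as a small function that computes a behavior attribute from tracked positions, with its outputs serving as targets that a trajectory embedding is trained to predict. The frame format, the two example programs (`inter_agent_distance`, `agent_a_speed`), and the helper `programmed_targets` are hypothetical, and the encoder training itself is omitted.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical frame format: ((x, y) of agent A, (x, y) of agent B), e.g., two mice.
Frame = Tuple[Tuple[float, float], Tuple[float, float]]
Trajectory = List[Frame]
TaskProgram = Callable[[Trajectory], List[float]]


def inter_agent_distance(traj: Trajectory) -> List[float]:
    """Hypothetical programmed task: per-frame distance between agents A and B."""
    return [math.hypot(bx - ax, by - ay) for (ax, ay), (bx, by) in traj]


def agent_a_speed(traj: Trajectory) -> List[float]:
    """Hypothetical programmed task: per-frame displacement of agent A (0.0 at frame 0)."""
    speeds = [0.0]
    for (prev_a, _), (curr_a, _) in zip(traj, traj[1:]):
        speeds.append(math.dist(prev_a, curr_a))
    return speeds


def programmed_targets(traj: Trajectory,
                       programs: Dict[str, TaskProgram]) -> Dict[str, List[float]]:
    """Run every task program; in a self-supervised setup these outputs would be
    decoding targets for a trajectory encoder (training not shown)."""
    return {name: prog(traj) for name, prog in programs.items()}


# Usage: two frames in which agent A moves one unit along x.
traj = [((0.0, 0.0), (3.0, 4.0)), ((1.0, 0.0), (3.0, 4.0))]
targets = programmed_targets(traj, {"distance": inter_agent_distance,
                                    "speed_a": agent_a_speed})
```

Each program encodes one piece of expert knowledge cheaply; adding a new task means writing one more small function rather than annotating more frames.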
ADQuaTe2: A Data Quality Test Approach for Automated Constraint Discovery and Fault Detection
The quality of data is extremely important for data analytics. Data quality tests typically involve checking constraints specified by domain experts. Existing approaches detect trivial constraint violations and identify outliers without explaining the constraints that were violated. Moreover, domain experts may specify constraints in an ad hoc manner and miss important ones. We describe an automated data quality test approach, ADQuaTe2, which uses an autoencoder to (1) discover constraints that may have been missed by experts, (2) label as suspicious those records that violate the constraints, and (3) provide explanations about the violations. An interactive learning technique incorporates expert feedback, which improves the accuracy. We evaluate the effectiveness of ADQuaTe2 on real-world datasets from health and plant domains. We also use datasets from the UCI repository to evaluate the improvement in the accuracy after incorporating ground truth knowledge.
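The abstract describes using an autoencoder to flag records that violate learned constraints. The sketch below is not ADQuaTe2's implementation: it substitutes PCA computed with `numpy.linalg.svd` for a trained network, since a linear autoencoder trained with squared error spans the same principal subspace. It then flags records whose reconstruction error exceeds a threshold. The threshold value, the synthetic constraint `col2 == col0 + col1`, and the use of per-attribute squared error as a rough "explanation" are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_autoencoder(X: np.ndarray, k: int):
    """Closed-form rank-k linear autoencoder (PCA via SVD) on standardized data.
    Stand-in for a trained autoencoder; not ADQuaTe2's model."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    Z = (X - mean) / std
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    W = vt[:k].T  # encoder: Z @ W, decoder: codes @ W.T
    return mean, std, W


def flag_suspicious(X: np.ndarray, mean, std, W, threshold: float):
    """Return indices of records whose total reconstruction error exceeds
    threshold, plus per-attribute squared errors as a rough explanation."""
    Z = (X - mean) / std
    recon = (Z @ W) @ W.T
    per_attr = (Z - recon) ** 2
    scores = per_attr.sum(axis=1)
    return np.flatnonzero(scores > threshold), per_attr


# Usage: the clean data obey the hypothetical constraint col2 == col0 + col1.
rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
clean = np.column_stack([a, b, a + b])
mean, std, W = fit_linear_autoencoder(clean, k=2)

batch = np.vstack([clean[:5], [[1.0, 1.0, 10.0]]])  # last row violates it
suspicious, per_attr = flag_suspicious(batch, mean, std, W, threshold=1.0)
print(suspicious)  # flags only the last record (index 5)
```

Records that satisfy the discovered relationship reconstruct almost exactly, so only the violating record stands out. A nonlinear autoencoder would be needed to capture nonlinear constraints.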
- Award ID(s): 1931324
- PAR ID: 10185896
- Date Published:
- Journal Name: ACM TAPIA
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Ground beetles are a highly sensitive and speciose biological indicator, making them vital for monitoring biodiversity. However, they are currently an underutilized resource due to the manual effort required by taxonomic experts to perform challenging species differentiations based on subtle morphological differences, precluding widespread applications. In this paper, we evaluate 12 vision models on taxonomic classification across four diverse, long-tailed datasets spanning over 230 genera and 1769 species, with images ranging from controlled laboratory settings to challenging field-collected (in-situ) photographs. We further explore taxonomic classification in two important real-world contexts: sample efficiency and domain adaptation. Our results show that the Vision and Language Transformer combined with an MLP head is the best-performing model, with 97% accuracy at the genus level and 94% at the species level. Sample efficiency analysis shows that we can reduce training data requirements by up to 50% with minimal compromise in performance. The domain adaptation experiments reveal significant challenges when transferring models from lab to in-situ images, highlighting a critical domain gap. Overall, our study lays a foundation for large-scale automated taxonomic classification of beetles and, beyond that, advances sample-efficient learning and cross-domain adaptation for diverse long-tailed ecological datasets.
-
A key challenge with supervised learning (e.g., image classification) is the shift of data distribution and domain from training to testing datasets, so-called “domain shift” (or “distribution shift”), which usually leads to a reduction in model accuracy. Various meta-learning approaches have been proposed to prevent this accuracy loss by learning an adaptable model from training data and adapting it to test-time data from a new data domain. However, when the domain shift occurs in multiple domain dimensions (e.g., images may be transformed by rotations, translations, and expansions), the average predictive power of the adapted model deteriorates. To tackle this problem, we propose a domain disentangled meta-learning (DDML) framework. DDML disentangles the data domain by dimensions, learns the representations of domain dimensions independently, and adapts to the domain of test-time data. We evaluate DDML on image classification problems using three datasets with distribution shifts over multiple domain dimensions. Compared with various baselines in meta-learning and empirical risk minimization, DDML achieves consistently higher classification accuracy on the test-time data. These results demonstrate that domain disentanglement reduces the complexity of model adaptation, thus increasing the model's generalizability and preventing it from overfitting. https://doi.org/10.1137/1.9781611977653.ch61
-
Thematic Analysis (TA) is a fundamental method in healthcare research for analyzing transcript data, but it is resource-intensive and difficult to scale for large, complex datasets. This study investigates the potential of large language models (LLMs) to augment the inductive TA process in high-stakes healthcare settings. Focusing on interview transcripts from parents of children with Anomalous Aortic Origin of a Coronary Artery (AAOCA), a rare congenital heart disease, we propose an LLM-Enhanced Thematic Analysis (LLM-TA) pipeline. Our pipeline integrates an affordable state-of-the-art LLM (GPT-4o mini), LangChain, and prompt engineering with chunking techniques to analyze nine detailed transcripts following the inductive TA framework. We evaluate the LLM-generated themes against human-generated results using thematic similarity metrics, LLM-assisted assessments, and expert reviews. Results demonstrate that our pipeline significantly outperforms existing LLM-assisted TA methods. While the pipeline alone has not yet reached human-level quality in inductive TA, it shows great potential to improve scalability, efficiency, and accuracy while reducing analyst workload when working collaboratively with domain experts. We provide practical recommendations for incorporating LLMs into high-stakes TA workflows and emphasize the importance of close collaboration with domain experts to address challenges related to real-world applicability and dataset complexity.
-
Healthcare applications on voice personal assistant systems (e.g., Amazon Alexa) have shown great promise for delivering personalized health services via a conversational interface. However, concerns have also been raised about privacy, safety, and service quality. In this paper, we propose VerHealth to systematically assess how well health-related applications on Alexa comply with existing privacy and safety policies. VerHealth contains a static module and a dynamic module based on machine learning that can trigger and detect violation behaviors hidden deep in the interaction threads. We use VerHealth to analyze 813 health-related applications on Alexa by sending over 855,000 probing questions and analyzing 863,000 responses. We also consult with three medical school students (domain experts) to confirm and assess the potential violations. We show that violations are quite common, e.g., 86.36% of them miss disclaimers when providing medical information, and 30.23% of them store users' physical or mental health data without approval. Domain experts believe that the applications' medical suggestions are often factually correct but of poor relevance, and that in over half of the cases the applications should have asked more questions before providing suggestions. Finally, we use our results to discuss possible directions for improvement.
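The LLM-TA abstract in this list mentions chunking long interview transcripts before prompting an LLM, but it does not specify the chunking scheme. The following is a minimal sketch of one common approach, fixed-size word windows with overlap so that context spanning a boundary appears in two chunks. The function name `chunk_transcript` and its `max_words` and `overlap` defaults are hypothetical choices, not values from the study.

```python
from typing import List


def chunk_transcript(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into word windows of at most max_words, with consecutive
    windows sharing `overlap` words (hypothetical parameters)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks: List[str] = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


# Usage: each chunk would be sent to the LLM in its own prompt.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_transcript(sample, max_words=4, overlap=1)
```

Larger overlaps preserve more cross-boundary context at the cost of more prompts and some duplicated content the downstream theme-merging step must reconcile.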