Multitask learning models reduce model complexity and improve accuracy by concurrently learning multiple tasks with shared representations. Leveraging inductive knowledge transfer, these models mitigate the risk of overfitting on any specific task, leading to enhanced overall performance. However, supervised multitask learning models, like many neural networks, require substantial amounts of labeled data. Given the cost associated with data labeling, there is a need for an efficient label acquisition mechanism, known as multitask active learning (MTAL). In wearable sensor systems, the success of MTAL largely hinges on its query strategies, because active learning in such settings involves interaction with end-users (e.g., patients) for annotation. However, these strategies have not been studied in mobile health settings and wearable systems to date. While strategies like one-sided sampling, alternating sampling, and rank-combination-based sampling have been proposed in the past, their applicability in mobile sensor settings, a domain constrained by label deficit, remains largely unexplored. This study investigates MTAL querying approaches and addresses crucial questions about the choice of sampling methods and the effectiveness of multitask learning in mobile health applications. Using two datasets on activity recognition and emotion classification, our findings reveal that rank-based sampling outperforms other techniques, particularly in tasks with high correlation. However, sole reliance on informativeness for sample selection may introduce biases into models. To address this issue, we also propose a Clustered Stratified Sampling (CSS) method that operates in tandem with the multitask active learning query process. CSS identifies clustered mini-batches of samples, optimizing budget utilization and maximizing performance. When employed alongside rank-based query selection, our proposed CSS algorithm demonstrates up to a 9% improvement in accuracy over traditional querying approaches for a 2000-query budget.
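As a rough illustration of the kind of query step this abstract describes, the sketch below clusters the unlabeled pool and then draws an informativeness-ranked, size-proportional stratum from each cluster. The k-means backend, the proportional quota, and all function names are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a clustered, stratified mini-batch query step.
# The k-means backend and proportional allocation are assumptions; the
# paper's actual CSS procedure may differ.
import numpy as np
from sklearn.cluster import KMeans

def css_query(X_pool, informativeness, batch_size, n_clusters=10, seed=0):
    """Select a mini-batch that is both informative and spread across clusters.

    X_pool:          (n, d) unlabeled feature matrix
    informativeness: (n,) per-sample scores (e.g., predictive entropy)
    """
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(X_pool)
    chosen = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        if idx.size == 0:
            continue
        # Stratified allocation: each cluster contributes in proportion to its size.
        quota = max(1, int(round(batch_size * idx.size / len(X_pool))))
        top = idx[np.argsort(informativeness[idx])[::-1][:quota]]
        chosen.extend(top.tolist())
    # Trim to the budget, keeping the highest-scoring picks overall.
    return np.array(sorted(chosen, key=lambda i: -informativeness[i])[:batch_size])
```

Stratifying across clusters is what counters the bias that purely informativeness-driven selection can introduce: every region of the feature space keeps a seat at the table.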
Efficient Argument Structure Extraction with Transfer Learning and Active Learning
The automation of extracting argument structures faces two challenges: (1) encoding long-term contexts to facilitate comprehensive understanding, and (2) improving data efficiency, since constructing high-quality argument structures is time-consuming. In this work, we propose a novel context-aware Transformer-based argument structure prediction model which, on five different domains, significantly outperforms models that rely on features or only encode limited contexts. To tackle the difficulty of data annotation, we examine two complementary methods: (i) transfer learning to leverage existing annotated data to boost model performance in a new target domain, and (ii) active learning to strategically identify a small number of samples for annotation. We further propose model-independent sample acquisition strategies, which can be generalized to diverse domains. With extensive experiments, we show that our simple-yet-effective acquisition strategies yield competitive results against three strong baselines. Combined with transfer learning, a substantial F1 score boost (5-25 points) can be further achieved during the early iterations of active learning across domains.
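For intuition, one family of model-independent acquisition rules is core-set-style selection over fixed text embeddings, which depends only on geometry rather than on model confidence. The sketch below is a generic stand-in under that assumption, not the strategy proposed in the paper.

```python
# Generic, model-independent acquisition sketch: greedily pick the unlabeled
# sample farthest from everything already labeled (max-min / core-set style).
# This is a stand-in illustration, not the paper's strategy.
import numpy as np

def coreset_acquire(X_labeled, X_pool, k):
    """Return indices of k pool samples maximizing min distance to labeled data."""
    picked = []
    # Distance from each pool point to its nearest labeled point.
    d = np.min(np.linalg.norm(X_pool[:, None, :] - X_labeled[None, :, :], axis=-1), axis=1)
    for _ in range(k):
        i = int(np.argmax(d))
        picked.append(i)
        # A newly picked point now counts as labeled; update nearest distances.
        d = np.minimum(d, np.linalg.norm(X_pool - X_pool[i], axis=1))
        d[i] = -np.inf  # never pick the same point twice
    return picked
```

Because nothing here consults the model, the same rule transfers unchanged across domains, which is the appeal of model-independent acquisition.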
- Award ID(s): 2100885
- PAR ID: 10354156
- Date Published:
- Journal Name: Findings of the Association for Computational Linguistics: ACL 2022
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Labeled data can be expensive to acquire in several application domains, including medical imaging, robotics, computer vision, and wireless networks, to list a few. To efficiently train machine learning models under such high labeling costs, active learning (AL) judiciously selects the most informative data instances to label on the fly. This active sampling process can benefit from a statistical function model, typically captured by a Gaussian process (GP), with well-documented merits especially in the regression task. While most GP-based AL approaches rely on a single kernel function, the present contribution advocates an ensemble of GP (EGP) models with weights adapted to the labeled data collected incrementally. Building on this novel EGP model, a suite of acquisition functions emerges based on the uncertainty and disagreement rules. An adaptively weighted ensemble of EGP-based acquisition functions is advocated to further robustify performance. Extensive tests on synthetic and real datasets in the regression task showcase the merits of the proposed EGP-based approaches with respect to single-GP-based AL alternatives.
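A minimal sketch of the uncertainty rule on top of an ensemble of GPs, assuming weights proportional to each kernel's marginal likelihood on the data seen so far; the kernel choices and weighting scheme are illustrative, and the paper's adaptive update may differ.

```python
# Sketch of an ensemble-of-GPs (EGP) acquisition step for regression, with
# weights derived from each kernel's marginal likelihood (an assumption).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

def egp_acquire(X_train, y_train, X_pool):
    kernels = [RBF(), Matern(nu=1.5), RationalQuadratic()]
    gps = [GaussianProcessRegressor(kernel=k, normalize_y=True).fit(X_train, y_train)
           for k in kernels]
    # Weight each GP by its (exponentiated, normalized) log marginal likelihood.
    logml = np.array([gp.log_marginal_likelihood_value_ for gp in gps])
    w = np.exp(logml - logml.max())
    w /= w.sum()
    # Uncertainty rule: query where the weighted predictive variance is largest.
    var = sum(wi * gp.predict(X_pool, return_std=True)[1] ** 2
              for wi, gp in zip(w, gps))
    return int(np.argmax(var))
```

Refitting after each query lets the weights track which kernel explains the incrementally collected labels best, which is the point of adapting the ensemble rather than committing to one kernel up front.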
-
Inspired by the recent achievements of machine learning in diverse domains, data-driven metamaterials design has emerged as a compelling paradigm that can unlock the potential of multiscale architectures. The model-centric research trend, however, lacks principled frameworks dedicated to data acquisition, whose quality propagates into the downstream tasks. Built by naive space-filling design in shape descriptor space, metamaterial datasets suffer from property distributions that are either highly imbalanced or at odds with design tasks of interest. To this end, we present t-METASET: an active-learning-based data acquisition framework aiming to guide both balanced and task-aware data generation. Uniquely, we seek a solution to a commonplace yet frequently overlooked scenario at early stages of data-driven design: when a massive shape-only library has been prepared with no properties evaluated. The key idea is to harness a data-driven shape descriptor learned from generative models, fit a sparse regressor as a start-up agent, and leverage metrics related to diversity to drive data acquisition to areas that help designers fulfill design goals. We validate the proposed framework in three deployment cases, which encompass general use, task-specific use, and tailorable use. Two large-scale mechanical metamaterial datasets (∼O(10^4) samples) are used to demonstrate the efficacy. Applicable to general design representations, t-METASET can boost future advancements in data-driven design.
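The diversity-driven step can be pictured with a DPP-style greedy heuristic that grows a batch whose similarity kernel stays well conditioned; the RBF kernel and the greedy log-determinant rule below are assumptions for illustration only, not t-METASET's actual metric.

```python
# Toy sketch of diversity-driven acquisition: greedily grow a batch that keeps
# the log-determinant of its RBF similarity kernel large (a DPP-style heuristic).
import numpy as np

def rbf_kernel(X, gamma=1.0):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def greedy_diverse_batch(X_pool, k, gamma=1.0):
    K = rbf_kernel(X_pool, gamma)
    picked = [0]  # start anywhere; the RBF diagonal is constant
    for _ in range(k - 1):
        best, best_det = None, -np.inf
        for i in range(len(X_pool)):
            if i in picked:
                continue
            # Log-det of the kernel restricted to the candidate batch.
            sub = K[np.ix_(picked + [i], picked + [i])]
            _, logdet = np.linalg.slogdet(sub)
            if logdet > best_det:
                best, best_det = i, logdet
        picked.append(best)
    return picked
```

Near-duplicate shapes make the kernel nearly singular and crash the log-determinant, so the rule naturally spreads the batch across the shape library, which is the behavior a balanced, shape-only starting point needs.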
-
Modern machine learning models require a large amount of labeled data for training to perform well. A recently emerging paradigm for reducing the reliance of large model training on massive labeled data is to take advantage of abundantly available labeled data from a related source task to boost the performance of the model in a desired target task where there may not be much data available. This approach, called transfer learning, has been applied successfully in many application domains. However, despite the fact that many transfer learning algorithms have been developed, the fundamental understanding of "when" and "to what extent" transfer learning can reduce sample complexity is still limited. In this work, we take a step toward a foundational understanding of transfer learning by focusing on binary classification with linear models and Gaussian features, and develop statistical minimax lower bounds in terms of the number of source and target samples and an appropriate notion of similarity between source and target tasks. To derive this bound, we reduce the transfer learning problem to hypothesis testing by constructing a packing set of source and target parameters via the Gilbert–Varshamov bound, which in turn leads to a lower bound on sample complexity. We also evaluate our theoretical results with experiments on real datasets.
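Schematically, the reduction-to-testing argument follows the standard Fano template; the notation below is illustrative and not taken from the paper:

```latex
% Fano-type minimax lower bound via a 2\delta-packing (illustrative notation):
\inf_{\hat{\theta}} \; \sup_{\theta \in \Theta} \;
  \mathbb{E}\big[\lVert \hat{\theta} - \theta \rVert^2\big]
\;\ge\; \delta^2
  \left( 1 - \frac{\max_{j \neq k} D_{\mathrm{KL}}\!\left(P_{\theta_j} \,\Vert\, P_{\theta_k}\right) + \log 2}{\log M} \right)
```

Here {θ₁, …, θ_M} is a 2δ-packing whose size M the Gilbert–Varshamov bound controls, and P_{θⱼ} is the joint law of all source and target samples under θⱼ, which is how the two sample sizes and the task-similarity notion enter the bound.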
-
Zero-shot transfer learning for multi-domain dialogue state tracking can allow us to handle new domains without incurring the high cost of data acquisition. This paper proposes a new zero-shot transfer learning technique for dialogue state tracking where the in-domain training data are all synthesized from an abstract dialogue model and the ontology of the domain. We show that data augmentation through synthesized data can improve the accuracy of zero-shot learning for both the TRADE model and the BERT-based SUMBT model on the MultiWOZ 2.1 dataset. We show that training with only synthesized in-domain data on the SUMBT model can reach about 2/3 of the accuracy obtained with the full training dataset. We improve the zero-shot learning state of the art on average across domains by 21%.
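In spirit, synthesizing in-domain training pairs from an ontology can be as simple as filling slot templates and recording the implied state; the domain, slot names, and templates below are invented for illustration and are far simpler than an abstract dialogue model.

```python
# Toy sketch of synthesizing (utterance, dialogue-state) pairs from an ontology.
# Slot names, values, and templates are hypothetical placeholders.
import random

ONTOLOGY = {
    "hotel_area": ["north", "south", "centre"],
    "hotel_stars": ["3", "4", "5"],
}

TEMPLATES = [
    "i need a hotel in the {hotel_area}",
    "find me a {hotel_stars} star hotel in the {hotel_area}",
]

def synthesize(n, seed=0):
    """Generate n (utterance, state) training pairs from templates + ontology."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        tmpl = rng.choice(TEMPLATES)
        # The implied state = every slot the template mentions, filled at random.
        state = {s: rng.choice(vals) for s, vals in ONTOLOGY.items()
                 if "{" + s + "}" in tmpl}
        data.append((tmpl.format(**state), state))
    return data
```

Because the generator is driven entirely by the ontology, swapping in a new domain's slots and values yields in-domain training data without any human annotation, which is what makes the zero-shot setting workable.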