Multitask learning models provide benefits by reducing model complexity and improving accuracy by concurrently learning multiple tasks with shared representations. Leveraging inductive knowledge transfer, these models mitigate the risk of overfitting on any specific task, leading to enhanced overall performance. However, supervised multitask learning models, like many neural networks, require substantial amounts of labeled data. Given the cost associated with data labeling, there is a need for an efficient label acquisition mechanism, known as multitask active learning (MTAL). In wearable sensor systems, success of MTAL largely hinges on its query strategies because active learning in such settings involves interaction with end-users (e.g., patients) for annotation. However, these strategies have not been studied in mobile health settings and wearable systems to date. While strategies like one-sided sampling, alternating sampling, and rank-combination-based sampling have been proposed in the past, their applicability in mobile sensor settings—a domain constrained by label deficit—remains largely unexplored. This study investigates the MTAL querying approaches and addresses crucial questions related to the choice of sampling methods and the effectiveness of multitask learning in mobile health applications. Utilizing two datasets on activity recognition and emotion classification, our findings reveal that rank-based sampling outperforms other techniques, particularly in tasks with high correlation. However, sole reliance on informativeness for sample selection may introduce biases into models. To address this issue, we also propose a Clustered Stratified Sampling (CSS) method in tandem with the multitask active learning query process. CSS identifies clustered mini-batches of samples, optimizing budget utilization and maximizing performance. When employed alongside rank-based query selection, our proposed CSS algorithm demonstrates up to 9% improvement in accuracy over traditional querying approaches for a 2000-query budget. 
                        more » 
                        « less   
                    This content will become publicly available on December 14, 2025
                            
                            Direct Acquisition Optimization for Low-Budget Active Learning
                        
                    
    
            Active Learning (AL) has gained prominence in integrating data-intensive machine learning (ML) models into domains with limited labeled data. However, its effectiveness diminishes significantly when the labeling budget is low. In this paper, we first empirically observe the performance degradation of existing AL algorithms in the low-budget settings, and then introduce Direct Acquisition Optimization (DAO), a novel AL algorithm that optimizes sample selections based on expected true loss reduction. Specifically, DAO utilizes influence functions to update model parameters and incorporates an additional acquisition strategy to mitigate bias in loss estimation. This approach facilitates a more accurate estimation of the overall error reduction, without extensive computations or reliance on labeled data. Experiments demonstrate DAO's effectiveness in low budget settings, outperforming state-of-the-arts approaches across seven benchmarks. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 2313131
- PAR ID:
- 10618154
- Publisher / Repository:
- Workshop on Bayesian Decision-making and Uncertainty, 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Abstract Timely and accurate bearing fault detection plays an important role in various industries. Data-driven deep learning methods have recently become a prevailing approach for bearing fault detection. Despite the success of deep learning, fault diagnosis performance is hinged upon the size of labeled data, the acquisition of which oftentimes is expensive in actual practice. Unlabeled data, on the other hand, are inexpensive. To fully utilize a large amount of unlabeled data together with limited labeled data to enhance fault detection performance, in this research, we develop a semi-supervised learning method built upon the autoencoder. In this method, a joint loss is established to account for the effects of both the labeled and unlabeled data, which is subsequently used to direct the backpropagation training. Systematic case studies using the Case Western Reserve University (CWRU) rolling bearing dataset are carried out, in which the effectiveness of this new method is verified by comparing it with other benchmark models.more » « less
- 
            Labeled data can be expensive to acquire in several application domains, including medical imaging, robotics, computer vision and wireless networks to list a few. To efficiently train machine learning models under such high labeling costs, active learning (AL) judiciously selects the most informative data instances to label on-the-fly. This active sampling process can benefit from a statistical function model, that is typically captured by a Gaussian process (GP) with well-documented merits especially in the regression task. While most GP-based AL approaches rely on a single kernel function, the present contribution advocates an ensemble of GP (EGP) models with weights adapted to the labeled data collected incrementally. Building on this novel EGP model, a suite of acquisition functions emerges based on the uncertainty and disagreement rules. An adaptively weighted ensemble of EGP-based acquisition functions is advocated to further robustify performance. Extensive tests on synthetic and real datasets in the regression task showcase the merits of the proposed EGP-based approaches with respect to the single GP-based AL alternatives.more » « less
- 
            Rolling bearing is a critical component of machinery that has been widely applied in manufacturing, transportation, aerospace, and power and energy industries. The timely and accurate bearing fault detection thus is of vital importance. Computational data-driven deep learning has recently become a prevailing approach for bearing fault detection. Despite the progress of the deep learning approach, the deep learning performance is hinged upon the size of labeled data, the acquisition of which is expensive in actual implementation. Unlabeled data, on the other hand, are inexpensive. In this research, we develop a new semi-supervised learning method built upon the autoencoder to fully utilize a large amount of unlabeled data together with limited labeled data to enhance fault detection performance. Compared with the state-of-the-art semi-supervised learning methods, this proposed method can be more conveniently implemented with fewer hyperparameters to be tuned. In this method, a joint loss is established to account for the effects of labeled and unlabeled data, which is subsequently used to direct the backpropagation training. Systematic case studies using the Case Western Reserve University (CWRU) rolling bearing dataset are carried out, in which the effectiveness of this new method is verified by comparing it with other well-established baseline methods. Specifically, nearly all emulation runs using the proposed methodology can lead to around 2%–5% accuracy increase, indicating its robustness in performance enhancement.more » « less
- 
            Bonneau, Joseph; Weinberg, S Matthew (Ed.)In a typical decentralized autonomous organization (DAO), people organize themselves into a group that is programmatically managed. DAOs can act as bidders in auctions (with ConstitutionDAO being one notable example), with a DAO’s bid typically treated by the auctioneer as if it had been submitted by an individual, without regard to any details of the internal DAO dynamics. The goal of this paper is to study auctions in which the bidders are DAOs. More precisely, we consider the design of two-level auctions in which the "participants" are groups of bidders rather than individuals. Bidders form DAOs to pool resources, but must then also negotiate the terms by which the DAO’s winnings are shared. We model the outcome of a DAO’s negotiations through an aggregation function (which aggregates DAO members' bids into a single group bid) and a budget-balanced cost-sharing mechanism (that determines DAO members' access to the DAO’s allocation and distributes the aggregate payment demanded from the DAO to its members). DAOs' bids are processed by a direct-revelation mechanism that has no knowledge of the DAO structure (and thus treats each DAO as an individual). Within this framework, we pursue two-level mechanisms that are incentive-compatible (with truthful bidding a dominant strategy for each member of each DAO) and approximately welfare-optimal. We prove that, even in the case of a single-item auction, the DAO dynamics hidden from the outer mechanism preclude incentive-compatible welfare maximization: No matter what the outer mechanism and the cost-sharing mechanisms used by DAOs, the welfare of the resulting two-level mechanism can be a ≈ ln n factor less than the optimal welfare (in the worst case over DAOs and valuation profiles). We complement this lower bound with a natural two-level mechanism that achieves a matching approximate welfare guarantee. This upper bound also extends to multi-item auctions in which individuals have additive valuations. Finally, we show that our positive results cannot be extended much further: Even in multi-item settings in which bidders have unit-demand valuations, truthful two-level mechanisms form a highly restricted class and as a consequence cannot guarantee any non-trivial approximation of the maximum social welfare.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
