Crowdsourcing is an effective and efficient paradigm for obtaining labels for unlabeled corpus employing crowd workers. This work considers the budget allocation problem for a generalized setting on a graph of instances to be labeled where edges encode instance dependencies. Specifically, given a graph and a labeling budget, we propose an optimal policy to allocate the budget among the instances to maximize the overall labeling accuracy. We formulate the problem as a Bayesian Markov Decision Process (MDP), where we define our task as an optimization problem that maximizes the overall label accuracy under budget constraints. Then, we propose a novel stage-wise reward function that considers the effect of worker labels on the whole graph at each timestamp. This reward function is utilized to find an optimal policy for the optimization problem. Theoretically, we show that our proposed policies are consistent when the budget is infinite. We conduct extensive experiments on five real-world graph datasets and demonstrate the effectiveness of the proposed policies to achieve a higher label accuracy under budget constraints. 
                        more » 
                        « less   
                    
                            
                            Optimal Budget Allocation for Crowdsourcing Labels for Graphs
                        
                    
    
            Crowdsourcing is an effective and efficient paradigm for obtaining labels for unlabeled corpus employing crowd workers. This work considers the budget allocation problem for a generalized setting on a graph of instances to be labeled where edges encode instance dependencies. Specifically, given a graph and a labeling budget, we propose an optimal policy to allocate the budget among the instances to maximize the overall labeling accuracy. We formulate the problem as a Bayesian Markov Decision Process (MDP), where we define our task as an optimization problem that maximizes the overall label accuracy under budget constraints. Then, we propose a novel stage-wise reward function that considers the effect of worker labels on the whole graph at each timestamp. This reward function is utilized to find an optimal policy for the optimization problem. Theoretically, we show that our proposed policies are consistent when the budget is infinite. We conduct extensive experiments on five real-world graph datasets and demonstrate the effectiveness of the proposed policies to achieve a higher label accuracy under budget constraints. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 1909879
- PAR ID:
- 10436387
- Date Published:
- Journal Name:
- Thirty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI)
- Volume:
- 216
- Page Range / eLocation ID:
- 1154--1163
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Evans, Robin J.; Shpitser, Ilya (Ed.)Crowdsourcing is an effective and efficient paradigm for obtaining labels for unlabeled corpus employing crowd workers. This work considers the budget allocation problem for a generalized setting on a graph of instances to be labeled where edges encode instance dependencies. Specifically, given a graph and a labeling budget, we propose an optimal policy to allocate the budget among the instances to maximize the overall labeling accuracy. We formulate the problem as a Bayesian Markov Decision Process (MDP), where we define our task as an optimization problem that maximizes the overall label accuracy under budget constraints. Then, we propose a novel stage-wise reward function that considers the effect of worker labels on the whole graph at each timestamp. This reward function is utilized to find an optimal policy for the optimization problem. Theoretically, we show that our proposed policies are consistent when the budget is infinite. We conduct extensive experiments on five real-world graph datasets and demonstrate the effectiveness of the proposed policies to achieve a higher label accuracy under budget constraints.more » « less
- 
            Poupart, Pascal (Ed.)Incentive design, also known as model design or environment design for Markov decision processes(MDPs), refers to a class of problems in which a leader can incentivize his follower by modifying the follower's reward function, in anticipation that the follower's optimal policy in the resulting MDP can be desirable for the leader's objective. In this work, we propose gradient-ascent algorithms to compute the leader's optimal incentive design, despite the lack of knowledge about the follower's reward function. First, we formulate the incentive design problem as a bi-level optimization problem and demonstrate that, by the softmax temporal consistency between the follower's policy and value function, the bi-level optimization problem can be reduced to single-level optimization, for which a gradient-based algorithm can be developed to optimize the leader's objective. We establish several key properties of incentive design in MDPs and prove the convergence of the proposed gradient-based method. Next, we show that the gradient terms can be estimated from observations of the follower's best response policy, enabling the use of a stochastic gradient-ascent algorithm to compute a locally optimal incentive design without knowing or learning the follower's reward function. Finally, we analyze the conditions under which an incentive design remains optimal for two different rewards which are policy invariant. The effectiveness of the proposed algorithm is demonstrated using a small probabilistic transition system and a stochastic gridworld.more » « less
- 
            Varied approaches for aligning language models have been proposed, including supervised fine-tuning, RLHF, and direct optimization methods such as DPO. Although DPO has rapidly gained popularity due to its straightforward training process and competitive results, there is an open question of whether there remain practical advantages of using a discriminator, like a reward model, to evaluate responses. We propose D2PO, discriminator-guided DPO, an approach for the online setting where preferences are being collected throughout learning. As we collect gold preferences, we use these not only to train our policy, but to train a discriminative response evaluation model to silver-label even more synthetic data for policy training. We explore this approach across a set of diverse tasks, including a realistic chat setting, we find that our approach leads to higher-quality outputs compared to DPO with the same data budget, and greater efficiency in terms of preference data requirements. Furthermore, we show conditions under which silver labeling is most helpful: it is most effective when training the policy with DPO, outperforming traditional PPO, and benefits from maintaining a separate discriminator from the policy model.more » « less
- 
            Wearable internet of things (IoT) devices can enable a variety of biomedical applications, such as gesture recognition, health monitoring, and human activity tracking. Size and weight constraints limit the battery capacity, which leads to frequent charging requirements and user dissatisfaction. Minimizing the energy consumption not only alleviates this problem, but also paves the way for self-powered devices that operate on harvested energy. This paper considers an energy-optimal gesture recognition application that runs on energy-harvesting devices. We first formulate an optimization problem for maximizing the number of recognized gestures when energy budget and accuracy constraints are given. Next, we derive an analytical energy model from the power consumption measurements using a wearable IoT device prototype. Then, we prove that maximizing the number of recognized gestures is equivalent to minimizing the duration of gesture recognition. Finally, we utilize this result to construct an optimization technique that maximizes the number of gestures recognized under the energy budget constraints while satisfying the recognition accuracy requirements. Our extensive evaluations demonstrate that the proposed analytical model is valid for wearable IoT applications, and the optimization approach increases the number of recognized gestures by up to 2.4× compared to a manual optimization.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    