skip to main content

Title: Hierarchical Learning Algorithms for Multi-scale Expert Problems
In this paper, we study the multi-scale expert problem, where the rewards of different experts vary in different reward ranges. The performance of existing algorithms for the multi-scale expert problem degrades linearly proportional to the maximum reward range of any expert or the best expert and does not capture the non-uniform heterogeneity in the reward ranges among experts. In this work, we propose learning algorithms that construct a hierarchical tree structure based on the heterogeneity of the reward range of experts and then determine differentiated learning rates based on the reward upper bounds and cumulative empirical feedback over time. We then characterize the regret of the proposed algorithms as a function of non-uniform reward ranges and show that their regrets outperform prior algorithms when the rewards of experts exhibit non-uniform heterogeneity in different ranges. Last, our numerical experiments verify our algorithms' efficiency compared to previous algorithms.  more » « less
Award ID(s):
2106299 2136199 2045641 2102963 1908298
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the ACM on Measurement and Analysis of Computing Systems
Page Range / eLocation ID:
1 to 29
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Ribonucleic acid (RNA) is a fundamental biological molecule that is essential to all living organisms, performing a versatile array of cellular tasks. The function of many RNA molecules is strongly related to the structure it adopts. As a result, great effort is being dedicated to the design of efficient algorithms that solve the “folding problem”—given a sequence of nucleotides, return a probable list of base pairs, referred to as the secondary structure prediction. Early algorithms largely rely on finding the structure with minimum free energy. However, the predictions rely on effective simplified free energy models that may not correctly identify the correct structure as the one with the lowest free energy. In light of this, new, data-driven approaches that not only consider free energy, but also use machine learning techniques to learn motifs are also investigated and recently been shown to outperform free energy–based algorithms on several experimental data sets. In this work, we introduce the new ExpertRNA algorithm that provides a modular framework that can easily incorporate an arbitrary number of rewards (free energy or nonparametric/data driven) and secondary structure prediction algorithms. We argue that this capability of ExpertRNA has the potential to balance out different strengths and weaknesses of state-of-the-art folding tools. We test ExpertRNA on several RNA sequence-structure data sets, and we compare the performance of ExpertRNA against a state-of-the-art folding algorithm. We find that ExpertRNA produces, on average, more accurate predictions of nonpseudoknotted secondary structures than the structure prediction algorithm used, thus validating the promise of the approach. Summary of Contribution: ExpertRNA is a new algorithm inspired by a biological problem. It is applied to solve the problem of secondary structure prediction for RNA molecules given an input sequence. The computational contribution is given by the design of a multibranch, multiexpert rollout algorithm that enables the use of several state-of-the-art approaches as base heuristics and allowing several experts to evaluate partial candidate solutions generated, thus avoiding assuming the reward being optimized by an RNA molecule when folding. Our implementation allows for the effective use of parallel computational resources as well as to control the size of the rollout tree as the algorithm progresses. The problem of RNA secondary structure prediction is of primary importance within the biology field because the molecule structure is strongly related to its functionality. Whereas the contribution of the paper is in the algorithm, the importance of the application makes ExpertRNA a showcase of the relevance of computationally efficient algorithms in supporting scientific discovery. 
    more » « less
  2. A long-standing question in robot hand design is how accurate tactile sensing must be. This paper uses simulated tactile signals and the reinforcement learning (RL) framework to study the sensing needs in grasping systems. Our first experiment investigates the need for rich tactile sensing in the rewards of RL-based grasp refinement algorithms for multi-fingered robotic hands. We systematically integrate different levels of tactile data into the rewards using analytic grasp stability metrics. We find that combining information on contact positions, normals, and forces in the reward yields the highest average success rates of 95.4% for cuboids, 93.1% for cylinders, and 62.3% for spheres across wrist position errors between 0 and 7 centimeters and rotational errors between 0 and 14 degrees. This contact-based reward outperforms a non-tactile binary-reward baseline by 42.9%. Our follow-up experiment shows that when training with tactile-enabled rewards, the use of tactile information in the control policy’s state vector is drastically reducible at only a slight performance decrease of at most 6.6% for no tactile sensing in the state. Since policies do not require access to the reward signal at test time, our work implies that models trained on tactile-enabled hands are deployable to robotic hands with a smaller sensor suite, potentially reducing cost dramatically. 
    more » « less
  3. We consider the bandit problem of selecting K out of N arms at each time step. The joint reward can be a non-linear function of the rewards of the selected individual arms. The direct use of a multi-armed bandit algorithm requires choosing among all possible combinations, making the action space large. To simplify the problem, existing works on combinatorial bandits typically assume feedback as a linear function of individual rewards. In this paper, we prove the lower bound for top-K subset selection with bandit feedback with possibly correlated rewards. We present a novel algorithm for the combinatorial setting without using individual arm feedback or requiring linearity of the reward function. Additionally, our algorithm works on correlated rewards of individual arms. Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence bounds. DART is computationally efficient and uses storage linear in N. Further, DART achieves a regret bound of Õ(K√KNT) for a time horizon T, which matches the lower bound in bandit feedback up to a factor of √log 2NT. When applied to the problem of cross-selling optimization and maximizing the mean of individual rewards, the performance of the proposed algorithm surpasses that of state-of-the-art algorithms. We also show that DART significantly outperforms existing methods for both linear and non-linear joint reward environments. 
    more » « less
  4. Cancer screening is a large, population-based intervention that would benefit from tools enabling individually-tailored decision making to decrease unintended consequences such as overdiagnosis. The heterogeneity of cancer screening participants advocates the need for more personalized approaches. Partially observable Markov decision processes (POMDPs) can be used to suggest optimal, individualized screening policies. However, determining an appropriate reward function can be challenging. Here, we propose the use of inverse reinforcement learning (IRL) to form rewards functions for lung and breast cancer screening POMDP models. Using data from the National Lung Screening Trial and our institution's breast screening registry, we developed two POMDP models with corresponding reward functions. Specifically, the maximum entropy (MaxEnt) IRL algorithm with an adaptive step size was used to learn rewards more efficiently; and combined with a multiplicative model to learn state-action pair rewards in the POMDP. The lung and breast cancer screening models were evaluated based on their ability to recommend appropriate screening decisions before the diagnosis of cancer. Results are comparable with experts' decisions. The lung POMDP demonstrated an improved performance in terms of recall and false positive rate in the second screening and post-screening stages. Precision (0.02-0.05) was comparable to experts' (0.02-0.06). The breast POMDP has excellent recall (0.97-1.00), matching the physicians and a satisfactory false positive rate (<0.03). The reward functions learned with the MaxEnt IRL algorithm, when combined with POMDP models in lung and breast cancer screening, demonstrate performance comparable to experts. 
    more » « less
  5. Recent advances in Augmented Reality (AR) devices and their maturity as a technology offers new modalities for interaction between learners and their learning environments. Such capabilities are particularly important for learning that involves hands-on activities where there is a compelling need to: (a) make connections between knowledge-elements that have been taught at different times, (b) apply principles and theoretical knowledge in a concrete experimental setting, (c) understand the limitations of what can be studied via models and via experiments, (d) cope with increasing shortages in teaching-support staff and instructional material at the intersection of disciplines, and (e) improve student engagement in their learning. AR devices that are integrated into training and education systems can be effectively used to deliver just-in-time informatics to augment physical workspaces and learning environments with virtual artifacts. We present a system that demonstrates a solution to a critical registration problem and enables a multi-disciplinary team to develop the pedagogical content without the need for extensive coding. The most popular approach for developing AR applications is to develop a game using a standard game engine such as UNITY or UNREAL. These engines offer a powerful environment for developing a large variety of games and an exhaustive library of digital assets. In contrast, the framework we offer supports a limited range of human environment interactions that are suitable and effective for training and education. Our system offers four important capabilities – annotation, navigation, guidance, and operator safety. These capabilities are presented and described in detail. The above framework motivates a change of focus – from game development to AR content development. While game development is an intensive activity that involves extensive programming, AR content development is a multi-disciplinary activity that requires contributions from a large team of graphics designers, content creators, domain experts, pedagogy experts, and learning evaluators. We have demonstrated that such a multi-disciplinary team of experts working with our framework can use popular content creation tools to design and develop the virtual artifacts required for the AR system. These artifacts can be archived in a standard relational database and hosted on robust cloud-based backend systems for scale up. The AR content creators can own their content and Non-fungible Tokens to sequence the presentations either to improve pedagogical novelty or to personalize the learning. 
    more » « less