skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: More for less: predicting and maximizing genomic variant discovery via Bayesian nonparametrics
Abstract While the cost of sequencing genomes has decreased dramatically in recent years, this expense often remains non-trivial. Under a fixed budget, scientists face a natural trade-off between quantity and quality: spending resources to sequence a greater number of genomes or spending resources to sequence genomes with increased accuracy. Our goal is to find the optimal allocation of resources between quantity and quality. Optimizing resource allocation promises to reveal as many new variations in the genome as possible. In this paper, we introduce a Bayesian nonparametric methodology to predict the number of new variants in a follow-up study based on a pilot study. When experimental conditions are kept constant between the pilot and follow-up, we find that our prediction is competitive with the best existing methods. Unlike current methods, though, our new method allows practitioners to change experimental conditions between the pilot and the follow-up. We demonstrate how this distinction allows our method to be used for more realistic predictions and for optimal allocation of a fixed budget between quality and quantity.  more » « less
Award ID(s):
1750286
PAR ID:
10282763
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Biometrika
ISSN:
0006-3444
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider a practically motivated variant of the canonical online fair allocation problem: a decision-maker has a budget of resources to allocate over a fixed number of rounds. Each round sees a random number of arrivals, and the decision-maker must commit to an allocation for these individuals before moving on to the next round. In contrast to prior work, we consider a setting in which resources are perishable and individuals' utilities are potentially non-linear (e.g., goods exhibit complementarities). The goal is to construct a sequence of allocations that is envy-free and efficient. We design an algorithm that takes as input (i) a prediction of the perishing order, and (ii) a desired bound on envy. Given the remaining budget in each period, the algorithm uses forecasts of future demand and perishing to adaptively choose one of two carefully constructed guardrail quantities. We characterize conditions under which our algorithm achieves the optimal envy-efficiency Pareto frontier. We moreover demonstrate its strong numerical performance using data from a partnering food bank. 
    more » « less
  2. The model selection problem in the pure exploration linear bandit setting is introduced and studied in both the fixed confidence and fixed budget settings. The model selection problem considers a nested sequence of hypothesis classes of increasing complexities. Our goal is to automatically adapt to the instance-dependent complexity measure of the smallest hypothesis class containing the true model, rather than suffering from the complexity measure related to the largest hypothesis class. We provide evidence showing that a standard doubling trick over dimension fails to achieve the optimal instance-dependent sample complexity. Our algorithms define a new optimization problem based on experimental design that leverages the geometry of the action set to efficiently identify a near-optimal hypothesis class. Our fixed budget algorithm uses a novel application of a selection-validation trick in bandits. This provides a new method for the understudied fixed budget setting in linear bandits (even without the added challenge of model selection). We further generalize the model selection problem to the misspecified regime, adapting our algorithms in both fixed confidence and fixed budget settings. 
    more » « less
  3. This paper studies online resource allocation with replenishable budgets, where budgets can be replenished on top of the initial budget and an agent sequentially chooses online allocation decisions without violating the available budget constraint at each round. We propose a novel online algorithm, called OACP (Opportunistic Allocation with Conservative Pricing), that conservatively adjusts dual variables while opportunistically utilizing available resources. OACP achieves a bounded asymptotic competitive ratio in adversarial settings as the number of decision rounds T gets large. Importantly, the asymptotic competitive ratio of OACP is optimal in the absence of additional assumptions on budget replenishment. To further improve the competitive ratio, we make a mild assumption that there is budget replenishment every T* ≥ 1 decision rounds and propose OACP+ to dynamically adjust the total budget assignment for online allocation. Next, we move beyond the worst-case and propose LA-OACP (Learning-Augmented OACP/OACP+), a novel learning-augmented algorithm for online allocation with replenishable budgets. We prove that LA-OACP can improve the average utility compared to OACP/OACP+ when the ML predictor is properly trained, while still offering worst-case utility guarantees when the ML predictions are arbitrarily wrong. Finally, we run simulation studies of sustainable AI inference powered by renewables, validating our analysis and demonstrating the empirical benefits of LA-OACP. 
    more » « less
  4. We study exploration in stochastic multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls. We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull, but might have reduced throughput due to nonlinear scaling. For example, in simulation-based scientific studies, an expensive simulation can be sped up by running it on multiple cores. This speed-up however, is partly offset by the communication among cores, which results in lower throughput than if fewer cores were allocated to run more trials in parallel. In this paper, we explore these trade-offs in two settings. First, in a fixed confidence setting, we need to find the best arm with a given target success probability as quickly as possible. We propose an algorithm which trades off between information accumulation and throughput and show that the time taken can be upper bounded by the solution of a dynamic program whose inputs are the gaps between the sub-optimal and optimal arms. We also prove a matching hardness result. Second, we present an algorithm for a fixed deadline setting, where we are given a time deadline and need to maximize the probability of finding the best arm. We corroborate our theoretical insights with simulation experiments that show that the algorithms consistently match or outperform baseline algorithms on a variety of problem instances. 
    more » « less
  5. Dynamic network topology can pose important challenges to communication and control protocols in networks of autonomous vehicles. For instance, maintaining connectivity is a key challenge in unmanned aerial vehicle (UAV) networks. However, tracking and computational resources of the observer module might not be sufficient for constant monitoring of all surrounding nodes in large-scale networks. In this paper, we propose an optimal measurement policy for network topology monitoring under constrained resources. To this end, We formulate the localization of multiple objects in terms of linear networked systems and solve it using Kalman filtering with intermittent observation. The proposed policy includes two sequential steps. We first find optimal measurement attempt probabilities for each target using numerical optimization methods to assign the limited number of resources among targets. The optimal resource allocation follows a waterfall-like solution to assign more resources to targets with lower measurement success probability. This provides a 10% to 60% gain in prediction accuracy. The second step is finding optimal on-off patterns for measurement attempts for each target over time. We show that a regular measurement pattern that evenly distributed resources over time outperforms the two extreme cases of using all measurement resources either in the beginning or at the end of the measurement cycle. Our proof is based on characterizing the fixed-point solution of the error covariance matrix for regular patterns. Extensive simulation results confirm the optimality of the most alternating pattern with up to 10-fold prediction improvement for different scenarios. These two guidelines define a general policy for target tracking under constrained resources with applications to network topology prediction of autonomous systems 
    more » « less