Result diversification is extensively studied in the context of search, recommendation, and data exploration. There are numerous algorithms that return top-k results that are both diverse and relevant. These algorithms typically have computational loops that compare the pairwise diversity of records to decide which ones to retain. We propose an access primitive DivGetBatch() that replaces repeated pairwise comparisons of diversity scores of records by pairwise comparisons of “aggregate” diversity scores of a group of records, thereby improving the running time of these algorithms while preserving the same results. We integrate the access primitive inside three representative diversity algorithms and prove that the augmented algorithms leveraging the access primitive preserve original results. We analyze the worst and expected case running times of these algorithms. We propose a computational framework to design this access primitive that has a pre-computed index structure I-tree that is agnostic to the specific details of diversity algorithms. We develop principled solutions to construct and maintain I-tree. Our experiments on multiple large real-world datasets corroborate our theoretical findings, while ensuring up to a 24× speedup.
more »
« less
Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design
Experimental design techniques such as active search and Bayesian optimization are widely used in the natural sciences for data collection and discovery. However, existing techniques tend to favor exploitation over exploration of the search space, which causes them to get stuck in local optima. This collapse problem prevents experimental design algorithms from yielding diverse high-quality data. In this paper, we extend the Vendi scores—a family of interpretable similarity-based diversity metrics—to account for quality. We then leverage these quality-weighted Vendi scores to tackle experimental design problems across various applications, including drug discovery, materials discovery, and reinforcement learning. We found that quality-weighted Vendi scores allow us to construct policies for experimental design that flexibly balance quality and diversity, and ultimately assemble rich and diverse sets of high-performing data points. Our algorithms led to a 70%–170% increase in the number of effective discoveries compared to baselines.
more »
« less
- Award ID(s):
- 2118201
- PAR ID:
- 10535334
- Publisher / Repository:
- MLR
- Date Published:
- ISSN:
- 2640-3498
- Format(s):
- Medium: X
- Location:
- International Conference on Machine Learning
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Active search is a setting in adaptive experimental design where we aim to uncover members of rare, valuable class(es) subject to a budget constraint. An important consideration in this problem is diversity among the discovered targets – in many applications, diverse discoveries offer more insight and may be preferable in downstream tasks. However, most existing active search policies either assume that all targets belong to a common positive class or encourage diversity via simple heuristics. We present a novel formulation of active search with multiple target classes, characterized by a utility function chosen from a flexible family whose members encourage diversity among discoveries via a diminishing returns mechanism. We then study this problem under the Bayesian lens and prove a hardness result for approximating the optimal policy for arbitrary positive, increasing, and concave utility functions. Finally, we design an efficient, nonmyopic approximation to the optimal policy for this class of utilities and demonstrate its superior empirical performance in a variety of experimental settings, including drug discovery.more » « less
-
Sheet Metal (SM) fabrication is perhaps one of the most common metalworking technique. Despite its prevalence, SM design is manual and costly, with rigorous practices that restrict the search space, yielding suboptimal results. In contrast, we present a framework for the first automatic design of SM parts. Focusing on load bearing applications, our novel system generates a high-performing manufacturable SM that adheres to the numerous constraints that SM design entails: The resulting part minimizes manufacturing costs while adhering to structural, spatial, and manufacturing constraints. In other words, the part should be strong enough, not disturb the environment, and adhere to the manufacturing process. These desiderata sum up to an elaborate, sparse, and expensive search space. Our generative approach is a carefully designed exploration process, comprising two steps. In Segment Discovery connections from the input load to attachable regions are accumulated, and during Segment Composition the most performing valid combination is searched for. For Discovery, we define a slim grammar, and sample it for parts using a Markov-Chain Monte Carlo (MCMC) approach, ran in intercommunicating instances (i.e, chains) for diversity. This, followed by a short continuous optimization, enables building a diverse and high-quality library of substructures. During Composition, a valid and minimal cost combination of the curated substructures is selected. To improve compliance significantly without additional manufacturing costs, we reinforce candidate parts onto themselves --- a unique SM capability called self-riveting. we provide our code and data in https://github.com/amir90/AutoSheetMetal. We show our generative approach produces viable parts for numerous scenarios. We compare our system against a human expert and observe improvements in both part quality and design time. We further analyze our pipeline's steps with respect to resulting quality, and have fabricated some results for validation. We hope our system will stretch the field of SM design, replacing costly expert hours with minutes of standard CPU, making this cheap and reliable manufacturing method accessible to anyone.more » « less
-
Single-objective optimization algorithms search for the single highest quality solution with respect to an objective. Quality diversity (QD) optimization algorithms, such as Covariance Matrix Adaptation MAP-Elites (CMA-ME), search for a collection of solutions that are both high quality with respect to an objective and diverse with respect to specified measure functions. However, CMA-ME suffers from three major limitations highlighted by the QD community: prematurely abandoning the objective in favor of exploration, struggling to explore flat objectives, and having poor performance for low-resolution archives. We propose a new QD algorithm, CMA MAP-Annealing (CMA-MAE), and its differentiable QD variant, CMA-MAE via a Gradient Arborescence (CMA-MAEGA), that address all three limitations. We provide theoretical justifications for the new algorithm with respect to each limitation. Our theory informs our experiments, which support the theory and show that CMA-MAE achieves state-of-the-art performance and robustness on standard QD benchmark and reinforcement learning domains.more » « less
-
Millimeter wave (mmWave) technology is gaining momentum because of its ability to provide high data rates. However, in addition to other challenges in the operation of mmWave systems, developing cell search algorithms is a challenge due to high path loss, directional transmission, and excessive sensitivity to blockage at mmWave frequencies. Thus, the cell search schemes of long term evolution (LTE) cannot be used with mmWave networks. Exhaustive and iterative search algorithms have been proposed in literature for carrying out cell search in mmWave systems. The exhaustive search offers high probability of detection with high discovery delay while the iterative approach offers low probability of detection with low discovery delay. In this paper, we propose a hybrid algorithm that combines the strengths of exhaustive and iterative methods. We compare the three algorithms in terms of misdetection probability and discovery delay and show that hybrid search is a smarter algorithm that achieves a desired balance between probability of detection performance and discovery delay.more » « less
An official website of the United States government

