Adaptive data collection for robust learning across multiple distributions

Zang, Chengbo; Turkcan, Mehme; Zussman, Gil; Kostic, Zoran; Ghaderi, Javad

This content will become publicly available on May 1, 2026

We propose a framework for adaptive data collection aimed at robust learning in multi-distribution scenarios under a fixed data collection budget. In each round, the algorithm selects a data source (distribution) to sample from and updates the model parameters accordingly. The objective is to find the model parameters that minimize the expected loss across all the data sources. Our approach integrates upper-confidence-bound (UCB) sampling with online gradient descent (OGD) to dynamically collect and annotate data from multiple sources. By bridging online optimization and multi-armed bandits, we provide theoretical guarantees for our UCB-OGD approach, demonstrating that it achieves a minimax regret of $O(T^{1/2} (K \ln T)^{1/2})$ over $K$ data sources after $T$ rounds. We further provide a lower bound showing that the result is optimal up to a $\ln T$ factor. Extensive evaluations on standard datasets and a real-world testbed for object detection at smart-city intersections show consistent performance improvements of our method over baselines such as random sampling and various active learning methods.
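The loop described in the abstract (pick a source with a UCB rule, draw one labeled sample, take an OGD step) can be illustrated with a small toy sketch. This is not the authors' implementation. It makes several assumptions that the abstract does not state: the bandit statistic is the clipped squared loss observed on each source, the exploration bonus is the classic `sqrt(2 ln t / n)` term, the model is a linear regressor, and the step size decays as `lr / sqrt(t)`. The names `ucb_ogd` and `make_source` are hypothetical.

```python
import math
import random


def ucb_ogd(sources, dim, rounds, lr=0.1, seed=0):
    """Toy UCB + online gradient descent loop (illustrative only).

    sources: list of callables; sources[k](rng) returns one labeled sample
             (x, y), with x a list of `dim` floats and y a float.
    Returns the final weight vector and the number of samples per source.
    """
    rng = random.Random(seed)
    k = len(sources)
    w = [0.0] * dim
    pulls = [0] * k
    mean_loss = [0.0] * k

    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # sample every source once before using the UCB rule
        else:
            # Assumption: the bandit statistic is the observed loss, so the
            # source where the model currently looks worst is sampled more.
            arm = max(
                range(k),
                key=lambda a: mean_loss[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )

        x, y = sources[arm](rng)
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        loss = min(err * err, 1.0)  # clipped to [0, 1] for the UCB statistic

        # Running mean of the observed loss for the chosen source.
        pulls[arm] += 1
        mean_loss[arm] += (loss - mean_loss[arm]) / pulls[arm]

        # OGD step on the (unclipped) squared loss with a decaying step size.
        step = lr / math.sqrt(t)
        w = [wi - step * 2.0 * err * xi for wi, xi in zip(w, x)]

    return w, pulls


def make_source(true_w, noise):
    """Hypothetical data source: noisy linear labels on uniform inputs."""
    def draw(rng):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = sum(a * b for a, b in zip(true_w, x)) + rng.gauss(0.0, noise)
        return x, y
    return draw


if __name__ == "__main__":
    toy_sources = [
        make_source([1.0, -0.5], 0.05),
        make_source([0.8, -0.4], 0.30),
        make_source([1.2, -0.6], 0.10),
    ]
    weights, counts = ucb_ogd(toy_sources, dim=2, rounds=2000)
    print("weights:", weights)
    print("samples per source:", counts)
```

In this sketch the budget is simply the number of rounds, since each round draws and labels exactly one sample. Swapping in a different loss, model, or confidence bonus changes only the lines marked as assumptions.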

Award ID(s): 2038984

PAR ID: 10639774

Author(s) / Creator(s): Zang, Chengbo; Turkcan, Mehme; Zussman, Gil; Kostic, Zoran; Ghaderi, Javad

Publisher / Repository: in Proc. ICML’25, 2025

Date Published: 2025-05-01

Format(s): Medium: X

Location: Vancouver, Canada

Sponsoring Org: National Science Foundation

Conference Proceeding:
The DOI is not currently available.
