NSF PAR Search | NSF Public Access Repository

Vantage Point Selection Algorithms for Bottleneck Capacity Estimation

https://doi.org/10.4230/lipics.wads.2025.6

Ashvinkumar, Vikrant; Chowdhury, Rezaul; Gao, Jie; Goswami, Mayank; Mitchell, Joseph S. B.; Polishchuk, Valentin (August 2025, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Morin, Pat; Oh, Eunjin (Eds.)
Motivated by the problem of estimating bottleneck capacities on the Internet, we formulate and study the problem of vantage point selection. We are given a graph G = (V, E) whose edges E have unknown capacity values that are to be discovered. Probes from a vantage point, i.e., a vertex v ∈ V, along shortest paths from v to all other vertices reveal the bottleneck edge capacity along each path. Our goal is to select k vantage points from V that reveal the maximum number of bottleneck edge capacities. We consider both a non-adaptive setting, where all k vantage points are selected before any bottleneck capacity is revealed, and an adaptive setting, where each vantage point selection instantly reveals bottleneck capacities along all shortest paths starting from that point. In the non-adaptive setting, by considering a relaxed model where edge capacities are drawn from a random permutation (in which maximizing the expected number of revealed edges is still NP-hard), we give a (1 − 1/e)-approximation algorithm. In the adaptive setting, we work with the least permissive model, where edge capacities are arbitrarily fixed but unknown. We compare with the best solution for the particular input instance (i.e., by enumerating all choices of k-tuples) and provide both lower bounds on instance-optimal approximation algorithms and upper bounds for trees and planar graphs.
Free, publicly-accessible full text available August 11, 2026
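The non-adaptive objective described in the abstract above can be illustrated with a generic greedy sketch in Python. It is not the authors' algorithm. It assumes an unweighted graph with one shortest path per target (fixed by BFS tie-breaking), models the random-permutation setting by Monte Carlo sampling of capacity ranks, and greedily adds the vertex with the largest total number of newly revealed bottleneck edges across samples. The function names `revealed_edges` and `greedy_vantage_points` and the `samples` and `seed` parameters are hypothetical.

```python
import random
from collections import deque


def revealed_edges(adj, cap, root):
    """Bottleneck (minimum-capacity) edges on the BFS-tree paths from root.

    Assumption: one shortest path per target, fixed by BFS tie-breaking on an
    unweighted graph given as {vertex: [neighbors]}; cap maps frozenset edges
    to capacities.
    """
    parent = {root: None}
    order = [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                order.append(w)
                queue.append(w)
    bottleneck = {}
    for t in order[1:]:  # BFS order visits each parent before its children
        p = parent[t]
        e = frozenset((p, t))
        if p == root or cap[e] < cap[bottleneck[p]]:
            bottleneck[t] = e
        else:
            bottleneck[t] = bottleneck[p]
    return set(bottleneck.values())


def greedy_vantage_points(adj, k, samples=200, seed=0):
    """Greedy k-selection maximizing the sample-average number of revealed edges."""
    rng = random.Random(seed)
    edges = sorted({frozenset((u, w)) for u in adj for w in adj[u]}, key=sorted)
    reveal = []
    for _ in range(samples):
        ranks = list(range(len(edges)))
        rng.shuffle(ranks)  # capacities = a uniformly random permutation of ranks
        cap = dict(zip(edges, ranks))
        reveal.append({v: revealed_edges(adj, cap, v) for v in adj})
    chosen, covered = [], [set() for _ in range(samples)]
    for _ in range(k):
        # Marginal gain: newly revealed edges, summed over all samples.
        best = max(
            (v for v in adj if v not in chosen),
            key=lambda v: sum(len(r[v] - c) for r, c in zip(reveal, covered)),
        )
        chosen.append(best)
        for r, c in zip(reveal, covered):
            c |= r[best]
    return chosen
```

Because the sample-average objective is an average of coverage functions, it is monotone submodular, so this greedy loop attains at least a (1 − 1/e) fraction of the best value of that sampled objective; how closely the samples track the true expectation depends on `samples`.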
Maximizing Truth Learning in a Social Network is NP-hard

Uradnik, Filip; Wang, Amanda; Gao, Jie (May 2025, The 24th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS’25))

Free, publicly-accessible full text available May 23, 2026
Differentially Private Range Queries with Correlated Input Perturbation

Dharangutte, Prathamesh; Gao, Jie; Gong, Ruobin; Wang, Guanyang (May 2025, The 28th International Conference on Artificial Intelligence and Statistics (AISTATS 2025))

Free, publicly-accessible full text available May 5, 2026
On the Price of Differential Privacy for Hierarchical Clustering

Deng, Chengyuan; Gao, Jie; Upadhyay, Jalaj; Wang, Chen; Zhou, Samson (April 2025, International Conference on Learning Representations 2025 (ICLR 2025))

Free, publicly-accessible full text available April 28, 2026
Low Sensitivity Hopsets

https://doi.org/10.4230/LIPIcs.ITCS.2025.13

Ashvinkumar, Vikrant; Bernstein, Aaron; Deng, Chengyuan; Gao, Jie; Wein, Nicole (January 2025, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Meka, Raghu (Ed.)
Given a weighted graph G = (V,E,w), a (β, ε)-hopset H is an edge set such that for any s,t ∈ V, where s can reach t in G, there is a path from s to t in G ∪ H that uses at most β hops and whose length is in the range [dist_G(s,t), (1+ε)dist_G(s,t)]. We break away from the traditional question that asks for a hopset H that achieves small |H| and small diameter β and instead study the sensitivity of H, a new quality measure. The sensitivity of a vertex (or edge) given a hopset H is, informally, the number of times a single hop in G ∪ H bypasses it; a bit more formally, assuming shortest paths in G are unique, it is the number of hopset edges (s,t) ∈ H such that the vertex (or edge) is contained in the unique st-path in G having length exactly dist_G(s,t). The sensitivity associated with H is then the maximum sensitivity over all vertices (or edges). The highlights of our results are:
- A construction for (Õ(√n), 0)-hopsets on undirected graphs with O(log n) sensitivity, complemented with a lower bound showing that Õ(√n) is tight up to polylogarithmic factors for any construction with polylogarithmic sensitivity.
- A construction for (n^o(1), ε)-hopsets on undirected graphs with n^o(1) sensitivity for any ε > 0 that is at least inverse polylogarithmic, complemented with a lower bound on the tradeoff between β, ε, and the sensitivity.
- We define a notion of sensitivity for β-shortcut sets (which are the reachability analogues of hopsets) and give a construction for Õ(√n)-shortcut sets on directed graphs with O(log n) sensitivity, complemented with a lower bound showing that β = Ω̃(n^{1/3}) for any construction with polylogarithmic sensitivity.
We believe hopset sensitivity is a natural measure in and of itself, and could potentially find use in a diverse range of contexts. More concretely, the notion of hopset sensitivity is also directly motivated by the Differentially Private All Sets Range Queries problem [Deng et al. WADS 23].
Our result for (Õ(√n), 0)-hopsets with O(log n) sensitivity on undirected graphs immediately improves the current best-known upper bound on utility from Õ(n^{1/3}) to Õ(n^{1/4}) in the pure-DP setting, which is tight up to polylogarithmic factors.
Full Text Available
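The formal vertex-sensitivity definition in the abstract above can be computed directly on small examples. The sketch below is a hypothetical helper, not part of the paper: it assumes a weighted undirected graph stored as an adjacency list `{u: [(v, length), ...]}` with unique shortest paths, recovers each hopset edge's underlying G-path with Dijkstra's algorithm, and counts every vertex on that path, endpoints included, since the definition says "contained in". The names `shortest_path` and `vertex_sensitivity` are illustrative.

```python
import heapq
from collections import Counter


def shortest_path(adj, s, t):
    """Dijkstra from s; returns the s-t shortest path as a vertex list.

    Assumes t is reachable and the shortest s-t path is unique, as the
    formal sensitivity definition does.
    """
    dist, prev = {s: 0}, {}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            break  # first pop of t carries its final distance
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in adj[u]:
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1]


def vertex_sensitivity(adj, hopset):
    """Max over vertices of the number of hopset edges (s, t) whose G-path contains it."""
    counts = Counter()
    for s, t in hopset:
        counts.update(shortest_path(adj, s, t))  # endpoints count too
    return max(counts.values(), default=0), counts
```

For the unit-weight path on vertices 0 to 4 with H = {(0,2), (1,3), (2,4), (0,4)}, vertex 2 lies on all four underlying paths, so this helper reports a vertex sensitivity of 4.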
Hardness and Approximation Algorithms for Balanced Districting Problems

https://doi.org/10.4230/LIPIcs.FORC.2025.4

Dharangutte, Prathamesh; Gao, Jie; Huang, Shang-En; Yu, Fang-Yi (January 2025, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Bun, Mark (Ed.)
We introduce and study the problem of balanced districting: given an undirected graph whose vertices carry two types of weights (e.g., different populations or resource types), the goal is to maximize the total weight covered by vertex-disjoint districts such that each district is a star or, more generally, a connected induced subgraph, and the two weights within each district are balanced. This problem is strongly motivated by political redistricting, where contiguity, population balance, and compactness are essential. We provide hardness results and approximation algorithms for this problem. In particular, we show that it is NP-hard to approximate within a factor better than n^{1/2-δ} for any constant δ > 0 in general graphs, even when the districts are star graphs, and we show NP-hardness on complete graphs, trees, planar graphs, and other restricted settings. On the other hand, we develop an algorithm for balanced star districting that gives an O(√n)-approximation on any graph (essentially tight given the matching hardness-of-approximation result) and an O(log n)-approximation on planar graphs, with extensions to minor-free graphs. Our algorithm uses a modified Whack-a-Mole algorithm [Bhattacharya, Kiss, and Saranurak, SODA 2023] to find a sparse solution of a fractional packing linear program (despite its exponentially many variables), which requires a new separation oracle designed specifically for the balanced districting problem. To turn the fractional solution into a feasible integer solution, we adopt the randomized rounding algorithm of [Chan and Har-Peled, SoCG 2009]. A crucial element in bounding the approximation ratio of the rounding procedure is the balanced scattering separator for planar and minor-free graphs, a separator that can be partitioned into a small number of k-hop independent sets for some constant k; such separators may be of independent interest for other packing-style problems.
Furthermore, our algorithm is versatile: the same algorithm can be analyzed in different ways on various graph classes, which leads to class-dependent approximation ratios. We also provide an FPTAS for complete graphs and trees, as well as greedy algorithms with approximation ratios for the cases where the district cardinality is bounded, the graph has bounded degree, or the weights are binary. We refer readers to the full version of the paper for the complete set of results and proofs.
Full Text Available
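To make the balanced star districting problem statement above concrete, the following hypothetical validator checks a candidate districting and returns its covered weight. The abstract does not pin down the exact star test, balance rule, or objective, so each is an assumption labeled in the code: a district counts as a star if one member is adjacent to all other members, it is balanced if its two weight totals differ by at most `tol`, and the objective sums both weights over covered vertices. The names `is_star` and `evaluate_districting` are illustrative only.

```python
def is_star(adj, district):
    """Assumed star test: some member is adjacent to every other member."""
    members = set(district)
    return any(members - {c} <= set(adj[c]) for c in members)


def evaluate_districting(adj, districts, w1, w2, tol=0):
    """Validate districts and return the covered weight (assumed: w1 + w2 summed).

    Assumed balance rule: |w1(D) - w2(D)| <= tol for every district D.
    Raises ValueError on overlapping, non-star, or unbalanced districts.
    """
    used, total = set(), 0
    for district in districts:
        members = set(district)
        if members & used:
            raise ValueError("districts must be vertex-disjoint")
        if not is_star(adj, members):
            raise ValueError(f"{sorted(members)} is not a star")
        a = sum(w1[v] for v in members)
        b = sum(w2[v] for v in members)
        if abs(a - b) > tol:
            raise ValueError(f"{sorted(members)} is unbalanced: {a} vs {b}")
        used |= members
        total += a + b
    return total
```

An approximation algorithm such as the one described in the abstract would search over such districtings; this validator only scores a given candidate under the stated assumptions.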
Neuc-MDS: Non-Euclidean Multidimensional Scaling Through Bilinear Forms

Deng, Chengyuan; Gao, Jie; Lu, Kevin; Luo, Feng; Sun, Hongbin; Xin, Cheng (December 2024, NIPS '24: Proceedings of the 38th International Conference on Neural Information Processing Systems)

Full Text Available
Optimally Improving Cooperative Learning in a Social Setting

Haddadan, Shahrzad; Xin, Cheng; Gao, Jie (July 2024, Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024)

Full Text Available
Composite Active Learning: Towards Multi-Domain Active Learning with Theoretical Guarantees

https://doi.org/10.1609/AAAI.V38I11.29119

Hao, Guang-Yuan; Huang, Hengguan; Wang, Haotian; Gao, Jie; Wang, Hao (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

Active learning (AL) aims to improve model performance within a fixed labeling budget by choosing the most informative data points to label. Existing AL methods focus on the single-domain setting, where all data come from the same domain (e.g., the same dataset). However, many real-world tasks involve multiple domains. For example, in visual recognition, it is often desirable to train an image classifier that works across different environments (e.g., different backgrounds), where images from each environment constitute one domain. Such a multi-domain AL setting is challenging for prior methods because they (1) ignore the similarity among different domains when assigning labeling budget and (2) fail to handle distribution shift of data across different domains. In this paper, we propose the first general method, dubbed composite active learning (CAL), for multi-domain AL. Our approach explicitly considers the domain-level and instance-level information in the problem; CAL first assigns domain-level budgets according to domain-level importance, which is estimated by optimizing an upper error bound that we develop; with the domain-level budgets, CAL then leverages a given instance-level query strategy to select samples to label from each domain. Our theoretical analysis shows that our method achieves a better error bound than current AL methods. Our empirical results demonstrate that our approach significantly outperforms the state-of-the-art AL methods on both synthetic and real-world multi-domain datasets. Code is available at https://github.com/Wang-ML-Lab/multi-domain-active-learning.
Full Text Available
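The two-stage structure of CAL described above (domain-level budgets first, then an instance-level strategy inside each domain) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code from the linked repository: domain importance weights are taken as inputs rather than estimated by optimizing the paper's error upper bound, budgets are split proportionally with largest-remainder rounding, and max-entropy uncertainty sampling stands in for the instance-level query strategy. The names `allocate_budget`, `entropy`, and `composite_query` are hypothetical.

```python
import math


def allocate_budget(importance, budget):
    """Split an integer labeling budget across domains in proportion to importance.

    Assumption: importance weights are given; largest-remainder rounding keeps
    the per-domain budgets summing to the total budget.
    """
    total = sum(importance.values())
    raw = {d: budget * w / total for d, w in importance.items()}
    alloc = {d: math.floor(x) for d, x in raw.items()}
    leftover = budget - sum(alloc.values())
    # Hand the remaining units to the domains with the largest fractional parts.
    for d in sorted(raw, key=lambda d: raw[d] - alloc[d], reverse=True)[:leftover]:
        alloc[d] += 1
    return alloc


def entropy(probs):
    """Shannon entropy of a predicted class distribution (stand-in uncertainty score)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def composite_query(pools, importance, budget):
    """pools[d]: list of (sample_id, predicted class probabilities) in domain d."""
    alloc = allocate_budget(importance, budget)
    chosen = {}
    for d, pool in pools.items():
        # Instance-level stage: most uncertain samples first, within the domain budget.
        ranked = sorted(pool, key=lambda item: entropy(item[1]), reverse=True)
        chosen[d] = [sid for sid, _ in ranked[: alloc[d]]]
    return chosen
```

Swapping `entropy` for another scoring function changes only the instance-level stage, which mirrors the abstract's description of CAL as working with a given instance-level query strategy.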
