skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Editors contains: "Bun, Mark"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Bun, Mark (Ed.)
    Machine learning algorithms in high-dimensional settings are highly susceptible to the influence of even a small fraction of structured outliers, making robust optimization techniques essential. In particular, within the ε-contamination model, where an adversary can inspect and replace up to an ε-fraction of the samples, a fundamental open problem is determining the optimal rates for robust stochastic convex optimization (SCO) under such contamination. We develop novel algorithms that achieve minimax-optimal excess risk (up to logarithmic factors) under the ε-contamination model. Our approach improves over existing algorithms, which are not only suboptimal but also require stringent assumptions, including Lipschitz continuity and smoothness of individual sample functions. By contrast, our optimal algorithms do not require these stringent assumptions, assuming only population-level smoothness of the loss. Moreover, our algorithms can be adapted to handle the case in which the covariance parameter is unknown, and can be extended to nonsmooth population risks via convolutional smoothing. We complement our algorithmic developments with a tight information-theoretic lower bound for robust SCO. 
    more » « less
    Free, publicly-accessible full text available January 1, 2026
  2. Bun, Mark (Ed.)
    We introduce and study the problem of balanced districting, where given an undirected graph with vertices carrying two types of weights (different population, resource types, etc) the goal is to maximize the total weights covered in vertex disjoint districts such that each district is a star or (in general) a connected induced subgraph with the two weights to be balanced. This problem is strongly motivated by political redistricting, where contiguity, population balance, and compactness are essential. We provide hardness and approximation algorithms for this problem. In particular, we show NP-hardness for an approximation better than n^{1/2-δ} for any constant δ > 0 in general graphs even when the districts are star graphs, as well as NP-hardness on complete graphs, tree graphs, planar graphs and other restricted settings. On the other hand, we develop an algorithm for balanced star districting that gives an O(√n)-approximation on any graph (which is basically tight considering matching hardness of approximation results), an O(log n) approximation on planar graphs with extensions to minor-free graphs. Our algorithm uses a modified Whack-a-Mole algorithm [Bhattacharya, Kiss, and Saranurak, SODA 2023] to find a sparse solution of a fractional packing linear program (despite exponentially many variables) which requires a new design of a separation oracle specific for our balanced districting problem. To turn the fractional solution to a feasible integer solution, we adopt the randomized rounding algorithm by [Chan and Har-Peled, SoCG 2009]. To get a good approximation ratio of the rounding procedure, a crucial element in the analysis is the balanced scattering separators for planar graphs and minor-free graphs - separators that can be partitioned into a small number of k-hop independent sets for some constant k - which may find independent interest in solving other packing style problems. Further, our algorithm is versatile - the very same algorithm can be analyzed in different ways on various graph classes, which leads to class-dependent approximation ratios. We also provide a FPTAS algorithm for complete graphs and tree graphs, as well as greedy algorithms and approximation ratios when the district cardinality is bounded, the graph has bounded degree or the weights are binary. We refer the readers to the full version of the paper for complete set of results and proofs. 
    more » « less
    Free, publicly-accessible full text available January 1, 2026
  3. Bun, Mark (Ed.)
    We consider the setting where a user with sensitive features wishes to obtain a recommendation from a server in a differentially private fashion. We propose a "multi-selection" architecture where the server can send back multiple recommendations and the user chooses one from these that matches best with their private features. When the user feature is one-dimensional - on an infinite line - and the accuracy measure is defined w.r.t some increasing function 𝔥(.) of the distance on the line, we precisely characterize the optimal mechanism that satisfies differential privacy. The specification of the optimal mechanism includes both the distribution of the noise that the user adds to its private value, and the algorithm used by the server to determine the set of results to send back as a response. We show that Laplace is an optimal noise distribution in this setting. Furthermore, we show that this optimal mechanism results in an error that is inversely proportional to the number of results returned when the function 𝔥(.) is the identity function. 
    more » « less
    Free, publicly-accessible full text available January 1, 2026
  4. Bun, Mark (Ed.)
    We consider the problem of assigning students to schools when students have different utilities for schools and schools have limited capacities. The students belong to demographic groups, and fairness over these groups is captured either by concave objectives, or additional constraints on the utility of the groups. We present approximation algorithms for this assignment problem with group fairness via convex program rounding. These algorithms achieve various trade-offs between capacity violation and running time. We also show that our techniques easily extend to the setting where there are arbitrary constraints on the feasible assignment, capturing multi-criteria optimization. We present simulation results that demonstrate that the rounding methods are practical even on large problem instances, with the empirical capacity violation being much better than the theoretical bounds. 
    more » « less
    Free, publicly-accessible full text available January 1, 2026
  5. Bun, Mark (Ed.)
    Given a differentially private unbiased estimate q̃ = q(D) +ν of a statistic q(D), we wish to obtain unbiased estimates of functions of q(D), such as 1/q(D), solely through post-processing of q̃, with no further access to the confidential dataset D. To this end, we adapt the deconvolution method used for unbiased estimation in the statistical literature, deriving unbiased estimators for a broad family of twice-differentiable functions - those that are tempered distributions - when the privacy-preserving noise ν is drawn from the Laplace distribution (Dwork et al., 2006). We further extend this technique to functions other than tempered distributions, deriving approximately optimal estimators that are unbiased for values in a user-specified interval (possibly extending to ± ∞). We use these results to derive an unbiased estimator for private means when the size n of the dataset is not publicly known. In a numerical application, we find that a mechanism that uses our estimator to return an unbiased sample size and mean outperforms a mechanism that instead uses the previously known unbiased privacy mechanism for such means (Kamath et al., 2023). We also apply our estimators to develop unbiased transformation mechanisms for per-record differential privacy, a privacy concept in which the privacy guarantee is a public function of a record’s value (Seeman et al., 2024). Our mechanisms provide stronger privacy guarantees than those in prior work (Finley et al., 2024) by using Laplace, rather than Gaussian, noise. Finally, using a different approach, we go beyond Laplace noise by deriving unbiased estimators for polynomials under the weak condition that the noise distribution has sufficiently many moments. 
    more » « less
    Free, publicly-accessible full text available January 1, 2026
  6. Bun, Mark (Ed.)
    Predefined demographic groups often overlook the subpopulations most impacted by model errors, leading to a growing emphasis on data-driven methods that pinpoint where models underperform. The emerging field of multi-group fairness addresses this by ensuring models perform well across a wide range of group-defining functions, rather than relying on fixed demographic categories. We demonstrate that recently introduced notions of multi-group fairness can be equivalently formulated as integral probability metrics (IPM). IPMs are the common information-theoretic tool that underlie definitions such as multiaccuracy, multicalibration, and outcome indistinguishably. For multiaccuracy, this connection leads to a simple, yet powerful procedure for achieving multiaccuracy with respect to an infinite-dimensional class of functions defined by a reproducing kernel Hilbert space (RKHS): first perform a kernel regression of a model’s errors, then subtract the resulting function from a model’s predictions. We combine these results to develop a post-processing method that improves multiaccuracy with respect to bounded-norm functions in an RKHS, enjoys provable performance guarantees, and, in binary classification benchmarks, achieves favorable multiaccuracy relative to competing methods. 
    more » « less
    Free, publicly-accessible full text available January 1, 2026