skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on November 25, 2025

Title: Improved performance guarantees for Tukey's median
Is there a natural way to order data in dimension greater than one? The approach based on the notion of data depth, often associated with the name of John Tukey, is among the most popular. Tukey's depth has found applications in robust statistics, graph theory, and the study of elections and social choice. We present improved performance guarantees for empirical Tukey's median, a deepest point associated with the given sample, when the data-generating distribution is elliptically symmetric and possibly anisotropic. Some of our results remain valid in the wider class of affine equivariant estimators. As a corollary of our bounds, we show that the typical diameter of the set of all empirical Tukey's medians scales like $$o(n^{-1/2})$$ where $$n$$ is the sample size. Moreover, when the data are 2-dimensional, we prove that with high probability, the diameter is of order $$O(n^{-3/4}\log^{3/2}(n))$$.  more » « less
Award ID(s):
2045068
PAR ID:
10583954
Author(s) / Creator(s):
;
Publisher / Repository:
arXivorg
Date Published:
Journal Name:
arXivorg
ISSN:
2331-8422
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J; Herman, Grzegorz (Ed.)
    The min-diameter of a directed graph G is a measure of the largest distance between nodes. It is equal to the maximum min-distance d_{min}(u,v) across all pairs u,v ∈ V(G), where d_{min}(u,v) = min(d(u,v), d(v,u)). Min-diameter approximation in directed graphs has attracted attention recently as an offshoot of the classical and well-studied diameter approximation problem. Our work provides a 3/2-approximation algorithm for min-diameter in DAGs running in time O(m^{1.426} n^{0.288}), and a faster almost-3/2-approximation variant which runs in time O(m^{0.713} n). (An almost-α-approximation algorithm determines the min-diameter to within a multiplicative factor of α plus constant additive error.) This is the first known algorithm to solve 3/2-approximation for min-diameter in sparse DAGs in truly subquadratic time O(m^{2-ε}) for ε > 0; previously only a 2-approximation was known. By a conditional lower bound result of [Abboud et al, SODA 2016], a better than 3/2-approximation can't be achieved in truly subquadratic time under the Strong Exponential Time Hypothesis (SETH), so our result is conditionally tight. We additionally obtain a new conditional lower bound for min-diameter approximation in general directed graphs, showing that under SETH, one cannot achieve an approximation factor below 2 in truly subquadratic time. Our work also presents the first study of approximating bichromatic min-diameter, which is the maximum min-distance between oppositely colored vertices in a 2-colored graph. We show that SETH implies that in DAGs, a better than 2 approximation cannot be achieved in truly subquadratic time, and that in general graphs, an approximation within a factor below 5/2 is similarly out of reach. We then obtain an O(m)-time algorithm which determines if bichromatic min-diameter is finite, and an almost-2-approximation algorithm for bichromatic min-diameter with runtime Õ(min(m^{4/3} n^{1/3}, m^{1/2} n^{3/2})). 
    more » « less
  2. Many machine learning problems optimize an objective that must be measured with noise. The primary method is a first order stochastic gradient descent using one or more Monte Carlo (MC) samples at each step. There are settings where ill-conditioning makes second order methods such as limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) more effective. We study the use of randomized quasi-Monte Carlo (RQMC) sampling for such problems. When MC sampling has a root mean squared error (RMSE) of O(n−1/2) then RQMC has an RMSE of o(n−1/2) that can be close to O(n−3/2) in favorable settings. We prove that improved sampling accuracy translates directly to improved optimization. In our empirical investigations for variational Bayes, using RQMC with stochastic quasi-Newton method greatly speeds up the optimization, and sometimes finds a better parameter value than MC does. 
    more » « less
  3. Guruswami, Venkatesan (Ed.)
    A fundamental problem in circuit complexity is to find explicit functions that require large depth to compute. When considering the natural DeMorgan basis of {OR,AND}, where negations incur no cost, the best known depth lower bounds for an explicit function in NP have the form (3-o(1))log₂ n, established by Håstad (building on others) in the early 1990s. We make progress on the problem of improving this factor of 3, in two different ways: - We consider an "algorithmic method" approach to proving stronger depth lower bounds for non-uniform circuits in the DeMorgan basis. We show that slightly faster algorithms (than what is known) for counting the number of satisfying assignments on subcubic-size DeMorgan formulas would imply supercubic-size DeMorgan formula lower bounds, implying that the depth must be at least (3+ε)log₂ n for some ε > 0. For example, if #SAT on formulas of size n^{2+2ε} can be solved in 2^{n - n^{1-ε}log^k n} time for some ε > 0 and a sufficiently large constant k, then there is a function computable in 2^{O(n)} time with a SAT oracle which does not have n^{3+ε}-size formulas. In fact, the #SAT algorithm only has to work on formulas that are a conjunction of n^{1-ε} subformulas, each of which is n^{1+3ε} size, in order to obtain the supercubic lower bound. As a proof of concept, we show that our new algorithms-to-lower-bounds connection can be applied to prove new lower bounds for "hybrid" DeMorgan formula models which compute interesting functions at their leaves. - Turning to the {NAND} basis, we establish a greater-than-(3 log₂ n) depth lower bound against uniform circuits solving the SAT problem, using an extension of the "indirect diagonalization" method for NAND formulas. Note that circuits over the NAND basis are a special case of circuits over the DeMorgan basis; however, hard functions such as Andreev’s function (known to require depth (3-o(1))log₂ n in the DeMorgan basis) can still be computed with NAND circuits of depth (3+o(1))log₂ n. Our results imply that SAT requires polylogtime-uniform NAND circuits of depth at least 3.603 log₂ n. 
    more » « less
  4. We present the first near-linear-time algorithm that computes a (1+ε)-approximation of the diameter of a weighted unit-disk graph of n vertices. Our algorithm requires O(n log^2 n) time for any constant ε>0, so we considerably improve upon the near-O(n^{3/2})-time algorithm of Gao and Zhang (2005). Using similar ideas we develop (1+ε)-approximate \emph{distance oracles} of O(1) query time with a likewise improvement in the preprocessing time, specifically from near O(n^{3/2}) to O(n log^3 n). We also obtain similar new results for a number of related problems in the weighted unit-disk graph metric such as the radius and the bichromatic closest pair. As a further application we employ our distance oracle, along with additional ideas, to solve the (1+ε)-approximate \emph{all-pairs bounded-leg shortest paths\/} (apBLSP) problem for a set of n planar points. Our data structure requires O(n^2 log n) space, O(loglog n) query time, and nearly O(n^{2.579}) preprocessing time for any constant ε>0, and is the first that breaks the near-cubic preprocessing time bound given by Roditty and Segal (2011). 
    more » « less
  5. It is known that the classical O(N) O ( N ) model in dimension d > 3 d gt; 3 at its bulk critical point admits three boundary universality classes:the ordinary, the extra-ordinary and the special. For the ordinarytransition the bulk and the boundary order simultaneously; theextra-ordinary fixed point corresponds to the bulk transition occurringin the presence of an ordered boundary, while the special fixed pointcorresponds to a boundary phase transition between the ordinary and theextra-ordinary classes. While the ordinary fixed point survives in d = 3 d = 3 ,it is less clear what happens to the extra-ordinary and special fixedpoints when d = 3 d = 3 and N \ge 2 N ≥ 2 .Here we show that formally treating N N as a continuous parameter, there exists a critical value N_c > 2 N c gt; 2 separating two distinct regimes. For 2 \leq N < N_c 2 ≤ N < N c the extra-ordinary fixed point survives in d = 3 d = 3 ,albeit in a modified form: the long-range boundary order is lost,instead, the order parameter correlation function decays as a power of \log r log r .For N > N_c N gt; N c there is no fixed point with order parameter correlations decayingslower than power law. We discuss several scenarios for the evolution ofthe phase diagram past N = N_c N = N c .Our findings appear to be consistent with recent Monte Carlo studies ofclassical models with N = 2 N = 2 and N = 3 N = 3 .We also compare our results to numerical studies of boundary criticalityin 2+1D quantum spin models. 
    more » « less