Creators/Authors contains: "Hopkins, Max"


  1. The notion of replicable algorithms was introduced by Impagliazzo, Lei, Pitassi, and Sorrell (STOC’22) to describe randomized algorithms that are stable under the resampling of their inputs. More precisely, a replicable algorithm gives the same output with high probability when its randomness is fixed and it is run on a new i.i.d. sample drawn from the same distribution. Using replicable algorithms for data analysis can facilitate the verification of published results by ensuring that the results of an analysis will be the same with high probability, even when that analysis is performed on a new data set. In this work, we establish new connections and separations between replicability and standard notions of algorithmic stability. In particular, we give sample-efficient algorithmic reductions between perfect generalization, approximate differential privacy, and replicability for a broad class of statistical problems. Conversely, we show any such equivalence must break down computationally: there exist statistical problems that are easy under differential privacy, but that cannot be solved replicably without breaking public-key cryptography. Furthermore, these results are tight: our reductions are statistically optimal, and we show that any computational separation between DP and replicability must imply the existence of one-way functions. Our statistical reductions give a new algorithmic framework for translating between notions of stability, which we instantiate to answer several open questions in replicability and privacy. This includes giving sample-efficient replicable algorithms for various PAC learning, distribution estimation, and distribution testing problems, algorithmic amplification of δ in approximate DP, conversions from item-level to user-level privacy, and the existence of private agnostic-to-realizable learning reductions under structured distributions. 
  2. Developing simple, sample-efficient learning algorithms for robust classification is a pressing issue in today's tech-dominated world, and current theoretical techniques requiring exponential sample complexity and complicated improper learning rules fall far from answering the need. In this work we study the fundamental paradigm of (robust) empirical risk minimization (RERM), a simple process in which the learner outputs any hypothesis minimizing its training error. RERM famously fails to robustly learn VC classes (Montasser et al., 2019a), a bound we show extends even to 'nice' settings such as (bounded) halfspaces. As such, we study a recent relaxation of the robust model called tolerant robust learning (Ashtiani et al., 2022) where the output classifier is compared to the best achievable error over slightly larger perturbation sets. We show that under geometric niceness conditions, a natural tolerant variant of RERM is indeed sufficient for γ-tolerant robust learning VC classes over $\mathbb{R}^d$, and requires only $\tilde{O}\left(\frac{\mathrm{VC}(H)\, d \log\frac{D}{\gamma\delta}}{\epsilon^2}\right)$ samples for robustness regions of (maximum) diameter D.
  3. Higher order random walks (HD-walks) on high dimensional expanders (HDX) have seen an incredible amount of study and application since their introduction by Kaufman and Mass (ITCS 2016), yet their broader combinatorial and spectral properties remain poorly understood. We develop a combinatorial characterization of the spectral structure of HD-walks on two-sided local-spectral expanders (Dinur and Kaufman FOCS 2017), which offer a broad generalization of the well-studied Johnson and Grassmann graphs. Our characterization, which shows that the spectra of HD-walks lie tightly concentrated in a few combinatorially structured strips, leads to novel structural theorems such as a tight $\ell_2$-characterization of edge-expansion, as well as to a new understanding of local-to-global graph algorithms on HDX. Towards the latter, we introduce a novel spectral complexity measure called Stripped Threshold Rank, and show how it can replace the (much larger) threshold rank as a parameter controlling the performance of algorithms on structured objects. Combined with a sum-of-squares proof for the former $\ell_2$-characterization, we give a concrete application of this framework to algorithms for unique games on HD-walks, where in many cases we improve the state of the art (Barak, Raghavendra, and Steurer FOCS 2011, and Arora, Barak, and Steurer JACM 2015) from nearly-exponential to polynomial time (e.g. for sparsifications of Johnson graphs or of slices of the q-ary hypercube). Our characterization of expansion also holds an interesting connection to hardness of approximation, where an $\ell_\infty$-variant for the Grassmann graphs was recently used to resolve the 2-2 Games Conjecture (Khot, Minzer, and Safra FOCS 2018). We give a reduction from a related $\ell_\infty$-variant to our $\ell_2$-characterization, but it loses factors in the regime of interest for hardness where the gap between $\ell_2$ and $\ell_\infty$ structure is large. Nevertheless, our results open the door for further work on the use of HDX in hardness of approximation and their general relation to unique games.
  4. We present a novel technique for cosmic microwave background (CMB) foreground subtraction based on the framework of blind source separation. Inspired by previous work incorporating local variation to generalized morphological component analysis (GMCA), we introduce hierarchical GMCA (HGMCA), a Bayesian hierarchical graphical model for source separation. We test our method on $N_{\rm side} = 256$ simulated sky maps that include dust, synchrotron, free–free, and anomalous microwave emission, and show that HGMCA reduces foreground contamination by 25 per cent over GMCA in both the regions included and excluded by the Planck UT78 mask, decreases the error in the measurement of the CMB temperature power spectrum to the 0.02–0.03 per cent level at ℓ > 200 (and < 0.26 per cent for all ℓ), and reduces correlation to all the foregrounds. We find equivalent or improved performance when compared to state-of-the-art internal linear combination type algorithms on these simulations, suggesting that HGMCA may be a competitive alternative to foreground separation techniques previously applied to observed CMB data. Additionally, we show that our performance does not suffer when we perturb model parameters or alter the CMB realization, which suggests that our algorithm generalizes well beyond our simplified simulations. Our results open a new avenue for constructing CMB maps through Bayesian hierarchical analysis.
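
The definition of replicability quoted in item 1 (same output with high probability when the internal randomness is fixed and the input is a fresh i.i.d. sample) can be illustrated with a small sketch. The code below is not taken from any of the listed papers: it uses a randomly shifted rounding grid, a standard textbook-style device, and the names `replicable_mean`, `draw_sample`, and `agreement_rate`, the grid width, and the Bernoulli(0.3) data source are all assumptions chosen for illustration.

```python
import random


def replicable_mean(sample, seed, width=0.1):
    """Estimate a mean, then snap it to a grid whose offset depends only on `seed`.

    Illustrative assumption: two runs sharing `seed` use the same grid, so they
    return the same value unless their empirical means fall in different cells.
    """
    rng = random.Random(seed)           # the algorithm's fixed internal randomness
    offset = rng.uniform(0.0, width)    # random shift of the rounding grid
    mean = sum(sample) / len(sample)    # ordinary (non-replicable) estimate
    k = round((mean - offset) / width)  # index of the nearest grid point
    return offset + k * width


def draw_sample(n, rng, p=0.3):
    """Draw n i.i.d. Bernoulli(p) values (a hypothetical data distribution)."""
    return [1.0 if rng.random() < p else 0.0 for _ in range(n)]


def agreement_rate(trials=100, n=5000, width=0.1):
    """Fraction of trials in which two fresh samples give the identical output."""
    data_rng = random.Random(12345)
    agree = 0
    for t in range(trials):
        a = replicable_mean(draw_sample(n, data_rng), seed=t, width=width)
        b = replicable_mean(draw_sample(n, data_rng), seed=t, width=width)
        agree += (a == b)
    return agree / trials


if __name__ == "__main__":
    print("agreement rate across fresh samples:", agreement_rate())
```

The output is always within `width / 2` of the empirical mean, so a coarser grid trades accuracy for a higher chance that two independent samples land in the same cell. This is only a toy instance of the definition; the sample-efficient reductions described in item 1 are a different and more general construction.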