Distributed Estimation with Multiple Samples per User: Sharp Rates and Phase Transition
We obtain tight minimax rates for the problem of distributed estimation of discrete distributions under communication constraints, where n users observing m samples each can broadcast only ℓ bits. Our main result is a tight characterization (up to logarithmic factors) of the error rate as a function of m, ℓ, the domain size, and the number of users under most regimes of interest.
NSF PAR ID: 10310529
Date Published:
Journal Name: Advances in Neural Information Processing Systems
Volume: 34
ISSN: 1049-5258
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this

Abstract: We study the sparsity of the solutions to systems of linear Diophantine equations with and without nonnegativity constraints. The sparsity of a solution vector is the number of its nonzero entries, which is referred to as the $\ell_0$ norm of the vector. Our main results are new improved bounds on the minimal $\ell_0$ norm of solutions to systems $A\mathbf{x}=\mathbf{b}$, where $A\in \mathbb{Z}^{m\times n}$ and $\mathbf{b}\in \mathbb{Z}^m$ is either a general integer vector (lattice case) or a nonnegative integer vector (semigroup case). In certain cases, we give polynomial time algorithms for computing solutions $\mathbf{x}$ with $\ell_0$ norm satisfying the obtained bounds. We show that our bounds are tight. Our bounds can be seen as functions naturally generalizing the rank of a matrix over $\mathbb{R}$ to other subdomains such as $\mathbb{Z}$. We show that these new rank-like functions are all NP-hard to compute in general, but polynomial-time computable for fixed number of variables.
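To make the objects concrete, here is a minimal sketch of the $\ell_0$ norm and a brute-force search for a sparsest integer solution of $A\mathbf{x}=\mathbf{b}$. The function names and the entry bound are illustrative assumptions; the exponential enumeration is nothing like the paper's polynomial-time algorithms and only serves to show what "minimal $\ell_0$ norm solution" means on a tiny instance.

```python
import itertools

import numpy as np

def l0_norm(x):
    """Sparsity of x: the number of its nonzero entries (the l0 'norm')."""
    return int(np.count_nonzero(x))

def sparsest_solution(A, b, bound=3):
    """Brute-force search for an integer solution of A x = b with minimal
    l0 norm, with entries restricted to [-bound, bound].  Exponential time;
    for tiny illustrative instances only."""
    A, b = np.asarray(A), np.asarray(b)
    best = None
    for cand in itertools.product(range(-bound, bound + 1), repeat=A.shape[1]):
        x = np.array(cand)
        if np.array_equal(A @ x, b) and (best is None or l0_norm(x) < l0_norm(best)):
            best = x
    return best

# A 1x3 lattice system: the sparsest solution needs only one nonzero entry.
A, b = [[1, 2, 3]], [3]
x = sparsest_solution(A, b)
```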
A Noetherian local ring $(R,\mathfrak{m})$ is called Buchsbaum if the difference $\ell(R/\mathfrak{q}) - e(\mathfrak{q}, R)$, where $\mathfrak{q}$ is an ideal generated by a system of parameters, is a constant independent of $\mathfrak{q}$. In this article, we study the tight closure analog of this condition. We prove that in an unmixed excellent local ring $(R,\mathfrak{m})$ of prime characteristic $p>0$ and dimension at least one, the difference $e(\mathfrak{q}, R) - \ell(R/\mathfrak{q}^*)$ is independent of $\mathfrak{q}$ if and only if the parameter test ideal $\tau_{\mathrm{par}}(R)$ contains $\mathfrak{m}$. We also provide a characterization of this condition via the derived category, which is analogous to Schenzel's criterion for Buchsbaum rings.

We prove a new generalization bound showing that for any class of linear predictors in Gaussian space, the Rademacher complexity of the class and the training error under any continuous loss $\ell$ can control the test error under all Moreau envelopes of the loss $\ell$. We use our finite-sample bound to directly recover the "optimistic rate" of Zhou et al. (2021) for linear regression with the square loss, which is known to be tight for minimal $\ell_2$-norm interpolation, but we also handle more general settings where the label is generated by a potentially misspecified multi-index model. The same argument can analyze noisy interpolation of max-margin classifiers through the squared hinge loss, and establishes consistency results in spiked-covariance settings. More generally, when the loss is only assumed to be Lipschitz, our bound effectively improves Talagrand's well-known contraction lemma by a factor of two, and we prove uniform convergence of interpolators (Koehler et al. 2021) for all smooth, nonnegative losses. Finally, we show that application of our generalization bound using localized Gaussian width will generally be sharp for empirical risk minimizers, establishing a non-asymptotic Moreau envelope theory.
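For reference, the Moreau envelope used above has a standard definition; the paper may use a different parameterization, but for a loss $\ell$ and smoothing scale $\lambda > 0$ it is typically written as:

```latex
% Moreau envelope of a loss \ell at smoothing scale \lambda > 0
% (standard definition; the paper's exact parameterization may differ)
\ell_\lambda(u) \;=\; \inf_{v \in \mathbb{R}} \left\{ \ell(v) + \frac{1}{2\lambda}\,(u - v)^2 \right\}
```

The infimum trades off fidelity to $\ell$ against proximity to $u$, yielding a smoothed lower approximation of the loss.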

Ruiz, Francisco; Dy, Jennifer; van de Meent, Jan-Willem (Eds.) We study discrete distribution estimation under user-level local differential privacy (LDP). In user-level $\varepsilon$-LDP, each user has $m \ge 1$ samples and the privacy of all $m$ samples must be preserved simultaneously. We resolve the following dilemma: While on the one hand having more samples per user should provide more information about the underlying distribution, on the other hand, guaranteeing the privacy of all $m$ samples should make the estimation task more difficult. We obtain tight bounds for this problem under almost all parameter regimes. Perhaps surprisingly, we show that in suitable parameter regimes, having $m$ samples per user is equivalent to having $m$ times more users, each with only one sample. Our results demonstrate interesting phase transitions for $m$ and the privacy parameter $\varepsilon$ in the estimation risk. Finally, connecting with recent results on shuffled DP, we show that combined with random shuffling, our algorithm leads to optimal error guarantees (up to logarithmic factors) under the central model of user-level DP in certain parameter regimes. We provide several simulations to verify our theoretical findings.
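As background for the single-sample baseline that the multi-sample results are compared against, the following sketch simulates discrete distribution estimation under item-level $\varepsilon$-LDP via classical $k$-ary randomized response. This is not the paper's mechanism; all function names and the chosen parameters are illustrative.

```python
import numpy as np

def k_rr(sample, k, eps, rng):
    """k-ary randomized response: report the true symbol with probability
    e^eps / (e^eps + k - 1), otherwise a uniformly random other symbol."""
    p_true = np.exp(eps) / (np.exp(eps) + k - 1)
    if rng.random() < p_true:
        return sample
    other = rng.integers(k - 1)          # pick one of the k-1 other symbols
    return other if other < sample else other + 1

def estimate(reports, k, eps):
    """Unbiased frequency estimate obtained by inverting the channel:
    E[count_j] = q + (p - q) * pi_j, so pi_hat = (count_j - q) / (p - q)."""
    p = np.exp(eps) / (np.exp(eps) + k - 1)
    q = 1.0 / (np.exp(eps) + k - 1)
    counts = np.bincount(reports, minlength=k) / len(reports)
    return (counts - q) / (p - q)

rng = np.random.default_rng(0)
k, eps, n = 4, 1.0, 100_000
true_dist = np.array([0.4, 0.3, 0.2, 0.1])
samples = rng.choice(k, size=n, p=true_dist)
reports = np.array([k_rr(s, k, eps, rng) for s in samples])
est = estimate(reports, k, eps)
```

Since $p + (k-1)q = 1$, the estimate sums to one exactly, and its per-coordinate error shrinks at the usual $1/\sqrt{n}$ rate inflated by the privacy factor $1/(p-q)$.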

We study subtrajectory clustering under the Fréchet distance. Given one or more trajectories, the task is to split the trajectories into several parts, such that the parts have a good clustering structure. We approach this problem via a new set cover formulation, which we think provides a natural formalization of the problem as it is studied in many applications. Given a polygonal curve P with n vertices in fixed dimension, integers k, ℓ ≥ 1, and a real value Δ > 0, the goal is to find k center curves of complexity at most ℓ such that every point on P is covered by a subtrajectory that has small Fréchet distance (≤ Δ) to one of the k center curves. In many application scenarios, one is interested in finding clusters of small complexity, which is controlled by the parameter ℓ. Our main result is a bicriterial approximation algorithm: if there exists a solution for given parameters k, ℓ, and Δ, then our algorithm finds a set of k' center curves of complexity at most ℓ with covering radius Δ', with k' in O(kℓ² log(kℓ)) and Δ' ≤ 19Δ. Moreover, within these approximation bounds, we can minimize k while keeping the other parameters fixed. If ℓ is a constant independent of n, then the approximation factor for the number of clusters k is O(log k) and the approximation factor for the radius Δ is constant. In this case, the algorithm has expected running time in Õ(km² + mn) and uses space in O(n + m), where m = ⌈L/Δ⌉ and L is the total arclength of the curve P.
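For intuition about the distance measure underlying the clustering above, here is the classical dynamic program for the *discrete* Fréchet distance between two polygonal curves. The paper works with the continuous Fréchet distance; this discrete variant is only a simple, commonly used proxy, and the function name is my own.

```python
import numpy as np

def discrete_frechet(P, Q):
    """Discrete Frechet distance between polygonal curves P and Q
    (sequences of points), via the classical O(|P||Q|) dynamic program:
    d[i][j] = max(dist(P[i], Q[j]), min of the three predecessor cells)."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    n, m = len(P), len(Q)
    d = np.full((n, m), np.inf)
    for i in range(n):
        for j in range(m):
            cost = np.linalg.norm(P[i] - Q[j])
            if i == 0 and j == 0:
                d[i, j] = cost
            else:
                prev = min(d[i - 1, j] if i > 0 else np.inf,
                           d[i, j - 1] if j > 0 else np.inf,
                           d[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
                d[i, j] = max(cost, prev)
    return d[-1, -1]
```

For two parallel horizontal segments at vertical offset 1, the distance is 1: the optimal traversal keeps the two frontier points vertically aligned.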