We study the problem of fair k-median, where each cluster is required to have a fair representation of individuals from different groups. In the fair representation k-median problem, we are given a set of points X in a metric space. Each point x ∈ X belongs to one of ℓ groups. Further, we are given fair representation parameters α_j and β_j for each group j ∈ [ℓ]. We say that a k-clustering C_1, …, C_k fairly represents all groups if the number of points from group j in cluster C_i is between α_j |C_i| and β_j |C_i| for every j ∈ [ℓ] and i ∈ [k]. The goal is to find a set C of k centers and an assignment ϕ: X → C such that the clustering defined by ϕ fairly represents all groups and minimizes the ℓ_1-objective ∑_{x ∈ X} d(x, ϕ(x)). We present an O(log k)-approximation algorithm that runs in time n^{O(ℓ)}. Note that the known algorithms for the problem either (i) violate the fairness constraints by an additive term or (ii) run in time that is exponential in both k and ℓ. We also consider an important special case of the problem where […] and […] for all j ∈ [ℓ]. For this special case, […]
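To make the constraint concrete, here is a minimal sketch, not the paper's algorithm, that checks whether a given assignment fairly represents all groups and evaluates its ℓ_1 cost; the function names and data layout are illustrative.

```python
import numpy as np

def l1_cost(X, centers, phi):
    """The l_1 objective: sum over points x of d(x, phi(x))."""
    return sum(np.linalg.norm(X[i] - centers[phi[i]]) for i in range(len(X)))

def fairly_represents(group, phi, k, alpha, beta):
    """Check that each cluster C_i contains between alpha_j*|C_i| and
    beta_j*|C_i| points of group j, for every group j and cluster i."""
    for i in range(k):
        members = [g for g, c in zip(group, phi) if c == i]
        if not members:
            continue
        for j in range(len(alpha)):
            count = members.count(j)
            if not (alpha[j] * len(members) <= count <= beta[j] * len(members)):
                return False
    return True
```

A feasible solution is then a center set and assignment passing `fairly_represents`; the problem asks for the feasible solution minimizing `l1_cost`.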

We investigate the capacity control provided by dropout in various machine learning problems. First, we study dropout for matrix completion, where it induces a distribution-dependent regularizer that equals the weighted trace-norm of the product of the factors. In deep learning, we show that the distribution-dependent regularizer due to dropout directly controls the Rademacher complexity of the underlying class of deep neural networks. These developments enable us to give concrete generalization error bounds for the dropout algorithm, both for matrix completion and for training deep neural networks.
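The flavor of the matrix-completion claim can be sanity-checked numerically in a simplified, fully observed setting (this is an illustrative identity, not the paper's weighted trace-norm statement, which accounts for the observation distribution): dropping each rank-one component u_j v_jᵀ independently with keep probability θ and rescaling by 1/θ makes the expected squared loss equal the undropped loss plus ((1−θ)/θ) Σ_j ‖u_j‖²‖v_j‖², a product-of-factors regularizer closely related to the trace-norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r, theta = 8, 6, 4, 0.7
M = rng.standard_normal((n, m))
U = rng.standard_normal((n, r))
V = rng.standard_normal((m, r))

def dropout_loss(U, V, M, theta, rng):
    # Keep each rank-one component u_j v_j^T with prob. theta, rescale by 1/theta.
    keep = rng.random(r) < theta
    pred = (U[:, keep] / theta) @ V[:, keep].T
    return np.linalg.norm(M - pred) ** 2

# Monte Carlo estimate of the expected dropout loss ...
mc = np.mean([dropout_loss(U, V, M, theta, rng) for _ in range(100_000)])
# ... versus the closed form: plain loss + product-of-factors regularizer.
reg = ((1 - theta) / theta) * sum(
    np.linalg.norm(U[:, j]) ** 2 * np.linalg.norm(V[:, j]) ** 2 for j in range(r)
)
closed_form = np.linalg.norm(M - U @ V.T) ** 2 + reg
print(mc, closed_form)  # should agree up to Monte Carlo error
```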

We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of O(1/t²). This contrasts with a rate of O(1/log t) for standard gradient descent and O(1/t) for normalized gradient descent. The momentum-based method is derived via the convex dual of the maximum-margin problem; specifically, applying Nesterov acceleration to this dual yields a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive non-uniform sampling via the dual variables.
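The margin behavior is easy to observe empirically. The sketch below is not the paper's dual-derived update; it simply runs plain gradient descent and off-the-shelf Nesterov momentum on the exponential loss over separable toy data and tracks the normalized margin min_i y_i⟨w, x_i⟩ / ‖w‖.

```python
import numpy as np

rng = np.random.default_rng(1)
# Linearly separable toy data: labels given by a ground-truth direction.
X = rng.standard_normal((40, 2))
y = np.sign(X @ np.array([1.0, 2.0]))

def grad(w):
    # Gradient of the exponential loss sum_i exp(-y_i <w, x_i>).
    margins = y * (X @ w)
    return -(np.exp(-margins) * y) @ X

def normalized_margin(w):
    return np.min(y * (X @ w)) / (np.linalg.norm(w) + 1e-12)

# Plain gradient descent.
w = np.zeros(2)
for t in range(2000):
    w -= 0.01 * grad(w)
print("GD margin:      ", normalized_margin(w))

# Generic Nesterov momentum on the same loss (illustrative, not the paper's method).
w, w_prev = np.zeros(2), np.zeros(2)
for t in range(2000):
    v = w + (t / (t + 3)) * (w - w_prev)
    w_prev, w = w, v - 0.01 * grad(v)
print("Momentum margin:", normalized_margin(w))
```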

Buchin, Kevin; Colin de Verdiere, Eric (Ed.) In this paper, we prove a two-sided variant of the Kirszbraun theorem. Consider an arbitrary subset X of Euclidean space and its superset Y. Let f be a 1-Lipschitz map from X to ℝ^m. The Kirszbraun theorem states that the map f can be extended to a 1-Lipschitz map f̃ from Y to ℝ^m. While the extension f̃ does not increase distances between points, there is no guarantee that it does not decrease distances significantly. In fact, f̃ may even map distinct points to the same point (that is, it can infinitely decrease some distances). However, we prove that there exists a (1+ε)-Lipschitz outer extension f̃: Y → ℝ^{m'} that does not decrease distances more than "necessary". Namely, ‖f̃(x) − f̃(y)‖ ≥ c√ε · min(‖x−y‖, inf_{a,b ∈ X} (‖x−a‖ + ‖f(a)−f(b)‖ + ‖b−y‖)) for some absolute constant c > 0. This bound is asymptotically optimal, since no L-Lipschitz extension g can have ‖g(x) − g(y)‖ > L · min(‖x−y‖, inf_{a,b ∈ X} (‖x−a‖ + ‖f(a)−f(b)‖ + ‖b−y‖)) even for a single pair of points x and y. In some applications, one is interested in the distances ‖f̃(x) − f̃(y)‖ […]
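For intuition, the optimality direction is a one-line triangle-inequality argument, spelled out here for completeness: for any L-Lipschitz extension g of f and any a, b ∈ X,

```latex
\|g(x) - g(y)\|
  \le \|g(x) - g(a)\| + \|g(a) - g(b)\| + \|g(b) - g(y)\|
  \le L\,\|x - a\| + \|f(a) - f(b)\| + L\,\|b - y\|,
```

using g(a) = f(a) and g(b) = f(b) on X; combined with the direct bound ‖g(x) − g(y)‖ ≤ L‖x − y‖ and L ≥ 1, this yields the stated upper bound.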

Meila, Marina; Zhang, Tong (Ed.) In the Correlation Clustering problem, we are given a complete weighted graph $G$ with its edges labeled as “similar" and “dissimilar" by a noisy binary classifier. For a clustering $\mathcal{C}$ of graph $G$, a similar edge is in disagreement with $\mathcal{C}$ if its endpoints belong to distinct clusters; and a dissimilar edge is in disagreement with $\mathcal{C}$ if its endpoints belong to the same cluster. The disagreements vector, $\mathbf{disagree}$, is a vector indexed by the vertices of $G$ such that the $v$-th coordinate $\mathbf{disagree}_v$ equals the weight of all disagreeing edges incident on $v$. The goal is to produce a clustering that minimizes the $\ell_p$ norm of the disagreements vector for $p\geq 1$. We study the $\ell_p$ objective in Correlation Clustering under the following assumption: every similar edge has weight in $[\alpha\mathbf{w},\mathbf{w}]$ and every dissimilar edge has weight at least $\alpha\mathbf{w}$ (where $\alpha \leq 1$ and $\mathbf{w}>0$ is a scaling parameter). We give an $O\left((\frac{1}{\alpha})^{\frac{1}{2}-\frac{1}{2p}}\cdot \log\frac{1}{\alpha}\right)$-approximation algorithm for this problem. Furthermore, we show an almost matching convex programming integrality gap.
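The objective is a direct transcription of these definitions; the sketch below uses illustrative names, with each edge carrying a weight and a "similar"/"dissimilar" label.

```python
import numpy as np

def disagreements_lp_norm(n, edges, cluster, p):
    """edges: iterable of (u, v, weight, label), label in {'similar', 'dissimilar'}.
    cluster: cluster[v] is the cluster id of vertex v.
    Returns the l_p norm of the disagreements vector."""
    disagree = np.zeros(n)
    for u, v, w, label in edges:
        same_cluster = cluster[u] == cluster[v]
        # A similar edge disagrees when cut; a dissimilar edge when uncut.
        if (label == "similar" and not same_cluster) or \
           (label == "dissimilar" and same_cluster):
            disagree[u] += w
            disagree[v] += w
    return np.linalg.norm(disagree, ord=p)
```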

Belkin, Mikhail; Kpotufe, Samory (Ed.) We present an $e^{O(p)} (\log \ell) / (\log \log \ell)$-approximation algorithm for socially fair clustering with the $\ell_p$-objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $\ell$ groups. The goal is to find a $k$-medians, $k$-means, or, more generally, $\ell_p$-clustering that is simultaneously good for all of the groups. More precisely, we need to find a set of $k$ centers $C$ so as to minimize the maximum over all groups $j$ of $\sum_{u \text{ in group } j} d(u, C)^p$. The socially fair clustering problem was independently proposed by Abbasi, Bhaskara, and Venkatasubramanian (2021) and Ghadiri, Samadi, and Vempala (2021). Our algorithm improves and generalizes their $O(\ell)$-approximation algorithms for the problem. The natural LP relaxation for the problem has an integrality gap of $\Omega(\ell)$. In order to obtain our result, we introduce a strengthened LP relaxation and show that it has an integrality gap of $\Theta((\log \ell) / (\log \log \ell))$ for a fixed $p$. Additionally, we present a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. (2021).
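For concreteness, evaluating the socially fair objective for a candidate center set is straightforward; the sketch below uses illustrative names and brute-force Euclidean distances.

```python
import numpy as np

def socially_fair_cost(X, group, centers, p):
    """max over groups j of sum_{u in group j} d(u, C)^p, where d(u, C)
    is the distance from point u to its nearest center in C."""
    # Pairwise distances (n, k), then distance to the nearest center per point.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).min(axis=1)
    per_group = {}
    for g, d in zip(group, dists):
        per_group[g] = per_group.get(g, 0.0) + d ** p
    return max(per_group.values())
```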