Given a set of facilities and clients, and costs to open facilities, the classic facility location problem seeks to open a set of facilities and assign each client to one open facility so as to minimize the cost of opening the chosen facilities plus the total distance of the clients to their assigned open facilities. Such an objective may impose an unequal cost on certain socioeconomic groups of clients (i.e., the total distance traveled by clients in such a group). This matters when planning the location of socially relevant facilities such as emergency rooms. In this work, we consider a fair version of the problem where we are given r client groups that partition the set of clients, and the distance of a given group is defined as the average distance of clients in the group to their respective open facilities. The objective is to minimize the Minkowski p-norm of the vector of group distances, which penalizes high access costs to open facilities across the r groups of clients. This generalizes classic facility location (p = 1) and the minimization of the maximum group distance (p = ∞). However, in practice, fairness criteria may not be explicit or even known to a decision maker, and it is often unclear how to select a specific p to model the cost of unfairness. To get around this, we study the notion of solution portfolios: for a fixed problem instance, we seek a small portfolio of solutions such that for any Minkowski norm p, one of these solutions is an O(1)-approximation. Using the geometric relationship between various p-norms, we show the existence of a portfolio of cardinality O(log r), and a lower bound of Ω(√(log r)).
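To fix notation, here is one way to write the objective described above as a formula. This is a sketch in our own notation, consistent with the abstract; the full paper's exact handling of the facility-opening cost term may differ.

```latex
% F: facilities with opening costs f_i;  C: clients in a metric d;
% C_1, ..., C_r: the client groups;  S: open facilities;  sigma: assignment.
\[
  D_k(S, \sigma) \;=\; \frac{1}{|C_k|} \sum_{j \in C_k} d\bigl(j, \sigma(j)\bigr),
  \qquad k = 1, \dots, r,
\]
\[
  \min_{S \subseteq F,\; \sigma : C \to S}\;\;
  \sum_{i \in S} f_i \;+\; \bigl\| (D_1, \dots, D_r) \bigr\|_p .
\]
```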
There may not be common structure across the different solutions in this portfolio, which can make planning difficult if the notion of fairness changes over time or if the budget to open facilities is disbursed over time. For example, small changes in p could lead to a completely different set of open facilities in the portfolio. Inspired by this, we introduce the notion of refinement, which is a family of solutions, one for each p-norm, satisfying a combinatorial property. This property requires that (1) the set of facilities open for a higher p-norm must be a subset of the facilities open for a lower p-norm, and (2) all clients assigned to the same open facility for a lower p-norm must be assigned to the same open facility for any higher p-norm. A refinement is α-approximate if the solution for each p-norm problem is an α-approximation for it. We show that it suffices to consider only O(log r) norms, instead of all p-norms p ∈ [1, ∞], to construct refinements. A natural greedy algorithm for the problem gives a poly(r)-approximate refinement, which we improve to poly(r^(1/√(log r)))-approximate using a recursive algorithm. We further improve this ratio to O(log r) for the special case of tree metrics with uniform facility opening costs. Our recursive algorithm extends to other settings, including a hierarchical facility location problem that models facility location at several levels, such as public works departments and schools.
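As an illustration, the two refinement conditions are easy to state as a program. The sketch below checks them for two given solutions; the names (`Solution`, `is_refinement`) are ours for illustration and do not come from the paper.

```python
from dataclasses import dataclass

@dataclass
class Solution:
    """A facility-location solution: open facilities and a client assignment."""
    open_facilities: set   # facility ids opened by this solution
    assignment: dict       # client id -> facility id (must be open)

def is_refinement(lower: Solution, higher: Solution) -> bool:
    """Check the refinement property between the solution for a lower p-norm
    and the solution for a higher p-norm, as described above:
    (1) facilities open at the higher norm are a subset of those open at the
        lower norm, and
    (2) clients served by the same facility at the lower norm are still served
        by a single common facility at the higher norm."""
    # Condition (1): subset of open facilities.
    if not higher.open_facilities <= lower.open_facilities:
        return False
    # Condition (2): each lower-norm cluster maps into one higher-norm facility.
    cluster_target = {}  # lower-norm facility -> its unique higher-norm facility
    for client, fac in lower.assignment.items():
        target = higher.assignment[client]
        if cluster_target.setdefault(fac, target) != target:
            return False
    return True
```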
A full version of this paper can be found at https://arxiv.org/abs/2211.14873.
Sample Complexity of Adversarially Robust Linear Classification on Separated Data
We consider the sample complexity of learning with adversarial robustness. Most prior theoretical results for this problem have considered a setting where different classes in the data are close together or overlapping. We consider, in contrast, the well-separated case where there exists a classifier with perfect accuracy and robustness, and show that the sample complexity narrates an entirely different story. Specifically, for linear classifiers, we exhibit a large class of well-separated distributions where the expected robust loss of any algorithm is at least Ω(d/n), whereas the max-margin algorithm has expected standard loss O(1/n). This shows a gap between the standard and robust losses that cannot be obtained via prior techniques. Additionally, we present an algorithm that, given an instance where the robustness radius is much smaller than the gap between the classes, outputs a solution with expected robust loss O(1/n). This shows that for very well-separated data, convergence rates of O(1/n) are achievable, which is not the case otherwise. Our results apply to robustness measured in any ℓ_p norm with p > 1 (including p = ∞).
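For reference, the standard and robust losses being compared admit the usual definitions; this is a sketch in our notation, with r denoting the robustness radius:

```latex
\[
  \ell_{\mathrm{std}}(f) \;=\; \Pr_{(x,y)}\bigl[ f(x) \neq y \bigr],
  \qquad
  \ell_{\mathrm{rob}}(f) \;=\; \Pr_{(x,y)}\Bigl[ \exists\, x' \text{ with } \|x' - x\|_p \le r :\; f(x') \neq y \Bigr].
\]
```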
- Award ID(s):
- 1804829
- NSF-PAR ID:
- 10282040
- Date Published:
- Journal Name:
- Proceedings of Machine Learning Research
- Issue:
- 139
- ISSN:
- 2640-3498
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We consider the problem of spherical Gaussian Mixture models with k ≥ 3 components when the components are well separated. A fundamental previous result established that separation of Ω(√(log k)) is necessary and sufficient for identifiability of the parameters with polynomial sample complexity (Regev and Vijayaraghavan, 2017). In the same context, we show that Õ(kd/ε²) samples suffice for any ε ≲ 1/k, closing the gap from polynomial to linear, and thus giving the first optimal sample upper bound for the parameter estimation of well-separated Gaussian mixtures. We accomplish this by proving a new result for the Expectation-Maximization (EM) algorithm: we show that EM converges locally under separation Ω(√(log k)). The previous best-known guarantee required Ω(√k) separation (Yan et al., 2017). Unlike prior work, our results do not assume or use prior knowledge of the (potentially different) mixing weights or variances of the Gaussian components. Furthermore, our results show that the finite-sample error of EM does not depend on non-universal quantities such as pairwise distances between the means of the Gaussian components.
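For intuition, a minimal EM loop for spherical Gaussian mixtures looks as follows. This toy sketch assumes known k, unit variances, and uniform weights (restrictions the paper's analysis does not need), and all names are ours:

```python
import numpy as np

def em_spherical_gmm(X, k, n_iters=100, seed=0):
    """Minimal EM for a spherical Gaussian mixture with unit variances and
    uniform weights: alternate soft assignments and mean updates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]  # init at random points
    for _ in range(n_iters):
        # E-step: responsibilities proportional to exp(-||x - mu_j||^2 / 2).
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # (n, k)
        log_r = -0.5 * sq
        log_r -= log_r.max(axis=1, keepdims=True)    # numerical stability
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: means become responsibility-weighted averages.
        means = (resp.T @ X) / resp.sum(axis=0)[:, None]
    return means

# Toy usage: three well-separated spherical components in d = 5.
rng = np.random.default_rng(1)
true_means = np.array([[8.0, 0, 0, 0, 0], [-8.0, 0, 0, 0, 0], [0, 0, 0, 0, 8.0]])
X = np.vstack([m + rng.standard_normal((200, 5)) for m in true_means])
print(np.round(em_spherical_gmm(X, k=3), 1))
```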
-
This paper revisits the k-mismatch shortest unique substring finding problem and demonstrates that a technique recently presented in the context of solving the k-mismatch average common substring problem can be adapted and combined with parts of the existing solution, resulting in a new algorithm with expected time complexity O(n log^k n), while maintaining a practical space complexity of O(kn), where n is the string length. When k > 0, which is the hard case, our new proposal significantly improves on the any-case O(n^2) time complexity of the prior best method for k-mismatch shortest unique substring finding. An experimental study shows that our new algorithm is practical to implement and achieves significant improvements in processing time compared to the prior best solution's implementation when k is small relative to n. For example, our method processes a 200 KB sample DNA sequence with k = 1 in just 0.18 seconds, compared to 174.37 seconds with the prior best solution. Further, significant portions of the adapted technique can be executed in parallel, using two different simple concurrency models, yielding further substantial practical performance improvements. As an example, when using 8 cores, both parallel implementations achieved processing times less than 1/4 of the serial implementation's time when processing a 10 MB sample DNA sequence with k = 2. In an age where instances with thousands of gigabytes of RAM are readily available through Cloud infrastructure providers, the trade-off of additional memory usage for significantly improved processing times is likely to be desirable and needed by many users. For example, the best prior solution may take years to finish a 200 MB DNA sample for any k, while this new proposal, using 24 cores, can finish processing a sample of this size with k = 1 in 206.376 seconds with a peak memory usage of 46 GB, which is both easily available and affordable on Cloud. We expect this new efficient and practical algorithm for k-mismatch shortest unique substring finding to prove useful to those using the measure on long sequences in fields such as computational biology. We also give a theoretical bound showing that the problem can be solved in O(n log^k n) time and O(n) space, asymptotically much better than the variant we implemented, serving as a new discovery of interest.
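To make the problem concrete, here is a brute-force reference under the natural definition (a length-L substring is k-mismatch unique if every other length-L substring is at Hamming distance greater than k). The definition details and names are our assumptions, and this is emphatically not the O(n log^k n) algorithm described above:

```python
def k_mismatch_sus(s, k):
    """Brute-force k-mismatch shortest unique substring: return the shortest
    substring whose Hamming distance to every other equal-length substring
    exceeds k. Cubic time -- for intuition and testing only."""
    n = len(s)
    def hamming(i, j, L):
        return sum(s[i + t] != s[j + t] for t in range(L))
    for L in range(1, n + 1):                  # try lengths shortest-first
        for i in range(n - L + 1):
            if all(hamming(i, j, L) > k
                   for j in range(n - L + 1) if j != i):
                return s[i:i + L]              # first unique substring found
    return None

print(k_mismatch_sus("abaababa", k=1))
```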
-
Data augmentation by incorporating cheap unlabeled data from multiple domains is a powerful way to improve prediction, especially when there is limited labeled data. In this work, we investigate how adversarial robustness can be enhanced by leveraging out-of-domain unlabeled data. We demonstrate that for broad classes of distributions and classifiers, there exists a sample complexity gap between standard and robust classification. We quantify the extent to which this gap can be bridged by leveraging unlabeled samples from a shifted domain, providing both upper and lower bounds. Moreover, we show settings where we achieve better adversarial robustness when the unlabeled data come from a shifted domain rather than the same domain as the labeled data. We also investigate how to leverage out-of-domain data when some structural information, such as sparsity, is shared between the labeled and unlabeled domains. Experimentally, we augment object recognition datasets (CIFAR-10, CINIC-10, and SVHN) with easy-to-obtain unlabeled out-of-domain data and demonstrate substantial improvement in the model's robustness against ℓ_∞ adversarial attacks on the original domain.
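As a deliberately simple illustration of the general recipe, one can pseudo-label unlabeled out-of-domain points with a standard classifier and then train robustly on the union. For a linear classifier under an ℓ_∞ perturbation of radius ε, the worst-case margin has the closed form y⟨w, x⟩ − ε‖w‖₁, which the sketch below optimizes directly. This is a generic robust self-training sketch under our own assumptions, not the paper's algorithm:

```python
import numpy as np

def robust_logistic_train(X, y, eps, lr=0.1, n_iters=500):
    """Gradient descent on the l_inf-robust logistic loss for a linear model:
    the adversary shifts each margin y<w,x> down by eps*||w||_1 (closed form)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        margins = y * (X @ w) - eps * np.abs(w).sum()
        p = 1.0 / (1.0 + np.exp(margins))        # = sigmoid(-margin)
        grad = -(X * (p * y)[:, None]).mean(axis=0) + eps * p.mean() * np.sign(w)
        w -= lr * grad
    return w

def robust_self_train(X_lab, y_lab, X_unlab, eps):
    """Pseudo-label unlabeled (possibly out-of-domain) data with a standard
    model, then robustly train on the combined set."""
    w_std = robust_logistic_train(X_lab, y_lab, eps=0.0)   # standard training
    y_pseudo = np.sign(X_unlab @ w_std)
    y_pseudo[y_pseudo == 0] = 1.0                # break ties arbitrarily
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pseudo])
    return robust_logistic_train(X_all, y_all, eps)
```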
-
We present novel information-theoretic limits on detecting sparse changes in Ising models, a problem that arises in many applications where network changes can occur due to some external stimuli. We show that the sample complexity for detecting sparse changes, in a minimax sense, is no better than that of learning the entire model, even in settings with local sparsity. This is a surprising fact in light of prior work rooted in sparse recovery methods, which suggests that sample complexity in this context scales only with the number of network changes. To shed light on when change detection is easier than structure learning, we consider testing of edge deletion in forest-structured graphs and high-temperature ferromagnets as case studies. We show that for these, testing of small changes is similarly hard, but testing of large changes is well separated from structure learning. These results imply that testing of graphical models may not be amenable to concepts such as restricted strong convexity, leveraged for sparsity pattern recovery, and that algorithm development should instead be directed towards detection of large changes.
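For readers unfamiliar with the model class, an Ising model over binary variables x ∈ {−1, +1}^p with pairwise edge parameters θ is the distribution below (standard textbook definition, our notation); the change-detection problem above asks whether a small number of entries of θ differ between two sample sets.

```latex
\[
  P_\theta(x) \;\propto\; \exp\Bigl( \sum_{(i,j)} \theta_{ij}\, x_i x_j \Bigr),
  \qquad x \in \{-1, +1\}^p .
\]
```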