Given a set of facilities and clients, and costs to open facilities, the classic facility location problem seeks to open a set of facilities and assign each client to one open facility to minimize the cost of opening the chosen facilities and the total distance of the clients to their assigned open facilities. Such an objective may induce an unequal cost over certain socioeconomic groups of clients (i.e., the total distance traveled by clients in such a group). This is important when planning the location of socially relevant facilities such as emergency rooms. In this work, we consider a fair version of the problem where we are given r client groups that partition the set of clients, and the distance of a given group is defined as the average distance of clients in the group to their respective open facilities. The objective is to minimize the Minkowski p-norm of the vector of group distances, to penalize high access costs to open facilities across the r groups of clients. This generalizes classic facility location (p = 1) and the minimization of the maximum group distance (p = ∞). However, in practice, fairness criteria may not be explicit or even known to a decision maker, and it is often unclear how to select a specific "p" to model the cost of unfairness. To get around this, we study the notion of solution portfolios where, for a fixed problem instance, we seek a small portfolio of solutions such that for any Minkowski norm p, one of these solutions is an O(1)-approximation. Using the geometric relationship between various p-norms, we show the existence of a portfolio of cardinality O(log r), and a lower bound of Ω(√(log r)). There may not be common structure across different solutions in this portfolio, which can make planning difficult if the notion of fairness changes over time or if the budget to open facilities is disbursed over time. For example, small changes in p could lead to a completely different set of open facilities in the portfolio. Inspired by this, we introduce the notion of refinement, which is a family of solutions, one for each p-norm, satisfying a combinatorial property. This property requires that (1) the set of facilities open for a higher p-norm must be a subset of the facilities open for a lower p-norm, and (2) all clients assigned to an open facility for a lower p-norm must be assigned to the same open facility for any higher p-norm. A refinement is α-approximate if the solution for each p-norm problem is an α-approximation for it. We show that it is sufficient to consider only O(log r) norms instead of all p-norms, p ∈ [1, ∞], to construct refinements. A natural greedy algorithm for the problem gives a poly(r)-approximate refinement, which we improve to poly(r^{1/√(log r)})-approximate using a recursive algorithm. We improve this ratio to O(log r) for the special case of tree metrics with uniform facility opening costs. Our recursive algorithm extends to other settings, including to a hierarchical facility location problem that models facility location problems at several levels, such as public works departments and schools. A full version of this paper can be found at https://arxiv.org/abs/2211.14873.
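To make the objective concrete, here is a minimal numpy sketch (not from the paper) that evaluates the group-distance part of this objective for a fixed set of open facilities: each client is assigned to its nearest open facility, group distances are group averages, and the p-norm of that vector is returned. Facility opening costs are omitted for brevity, and all names and the toy data are illustrative.

```python
import numpy as np

def fair_objective(dist, open_facilities, groups, p):
    """Minkowski p-norm of the group average distances.

    dist            : (n_clients, n_facilities) client-facility distance matrix
    open_facilities : indices of the facilities chosen to be opened
    groups          : list of client-index arrays that partition the clients
    p               : norm parameter in [1, inf]
    """
    # Each client travels to its nearest open facility.
    assigned = dist[:, open_facilities].min(axis=1)
    # Group distance = average distance of the clients in that group.
    group_dist = np.array([assigned[g].mean() for g in groups])
    # p = inf recovers the maximum group distance; p = 1 the sum of group distances.
    return group_dist.max() if np.isinf(p) else (group_dist ** p).sum() ** (1.0 / p)

# Toy instance: 4 clients, 2 facilities, 2 client groups.
dist = np.array([[1.0, 4.0], [2.0, 3.0], [5.0, 1.0], [6.0, 2.0]])
groups = [np.array([0, 1]), np.array([2, 3])]
for p in (1, 2, np.inf):
    print(p, fair_objective(dist, open_facilities=[0, 1], groups=groups, p=p))
```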
                            Sample Complexity of Adversarially Robust Linear Classification on Separated Data
                        
                    
    
We consider the sample complexity of learning with adversarial robustness. Most prior theoretical results for this problem have considered a setting where different classes in the data are close together or overlapping. We consider, in contrast, the well-separated case where there exists a classifier with perfect accuracy and robustness, and show that the sample complexity narrates an entirely different story. Specifically, for linear classifiers, we show a large class of well-separated distributions where the expected robust loss of any algorithm is at least Ω(d/n), whereas the max margin algorithm has expected standard loss O(1/n). This shows a gap between the standard and robust losses that cannot be obtained via prior techniques. Additionally, we present an algorithm that, given an instance where the robustness radius is much smaller than the gap between the classes, gives a solution with expected robust loss O(1/n). This shows that for very well-separated data, convergence rates of O(1/n) are achievable, which is not the case otherwise. Our results apply to robustness measured in any ℓp norm with p > 1 (including p = ∞).
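As an illustration of the robust loss discussed above, the sketch below uses the standard fact that a linear classifier sign(w·x + b) withstands every ℓp perturbation of radius r at a point (x, y) exactly when y(w·x + b) ≥ r‖w‖q, where q is the dual norm of p. This is a generic fact about linear classifiers, not code from the paper; the function names and toy data are illustrative.

```python
import numpy as np

def robust_loss(w, b, X, y, radius, p=np.inf):
    """Fraction of points misclassified by some l_p perturbation of size <= radius.

    The worst-case perturbation of a linear classifier shrinks the margin
    y(w.x + b) by exactly radius * ||w||_q, where 1/p + 1/q = 1.
    """
    q = 1.0 if np.isinf(p) else (np.inf if p == 1 else p / (p - 1.0))
    dual = np.abs(w).max() if np.isinf(q) else (np.abs(w) ** q).sum() ** (1.0 / q)
    margins = y * (X @ w + b)
    return float(np.mean(margins < radius * dual))

# Toy check: an axis-aligned classifier in 2D, perturbation radius 0.3.
w, b = np.array([1.0, 0.0]), 0.0
X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.2, 1.0]])
y = np.array([1, -1, 1])
print(robust_loss(w, b, X, y, radius=0.3))        # third point is not robust -> 1/3
print(robust_loss(w, b, X, y, radius=0.3, p=2))   # same here since w is axis-aligned
```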
- Award ID(s): 1804829
- PAR ID: 10282040
- Date Published:
- Journal Name: Proceedings of Machine Learning Research
- Issue: 139
- ISSN: 2640-3498
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- We consider the problem of spherical Gaussian Mixture models with k ≥ 3 components when the components are well separated. A fundamental previous result established that separation of Ω(√(log k)) is necessary and sufficient for identifiability of the parameters with polynomial sample complexity (Regev and Vijayaraghavan, 2017). In the same context, we show that Õ(kd/ε²) samples suffice for any ε ≲ 1/k, closing the gap from polynomial to linear, and thus giving the first optimal sample upper bound for the parameter estimation of well-separated Gaussian mixtures. We accomplish this by proving a new result for the Expectation-Maximization (EM) algorithm: we show that EM converges locally under separation Ω(√(log k)). The previous best-known guarantee required Ω(√k) separation (Yan et al., 2017). Unlike prior work, our results do not assume or use prior knowledge of the (potentially different) mixing weights or variances of the Gaussian components. Furthermore, our results show that the finite-sample error of EM does not depend on non-universal quantities such as pairwise distances between the means of the Gaussian components. (An illustrative EM-step sketch appears after this list.)
- We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning. Theoretically, we revisit the simple Gaussian model of Schmidt et al. that shows a sample complexity gap between standard and robust classification. We prove that unlabeled data bridges this gap: a simple semisupervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for achieving high standard accuracy. Empirically, we augment CIFAR-10 with 500K unlabeled images sourced from 80 Million Tiny Images and use robust self-training to outperform state-of-the-art robust accuracies by over 5 points in (i) ℓ∞ robustness against several strong attacks via adversarial training and (ii) certified ℓ2 and ℓ∞ robustness via randomized smoothing. On SVHN, adding the dataset's own extra training set with the labels removed provides gains of 4 to 10 points, within 1 point of the gain from using the extra labels. (A short self-training sketch appears after this list.)
- We consider the problem of hypothesis testing for discrete distributions. In the standard model, where we have sample access to an underlying distribution p, extensive research has established optimal bounds for uniformity testing, identity testing (goodness of fit), and closeness testing (equivalence or two-sample testing). We explore these problems in a setting where a predicted data distribution, possibly derived from historical data or predictive machine learning models, is available. We demonstrate that such a predictor can indeed reduce the number of samples required for all three property testing tasks. The reduction in sample complexity depends directly on the predictor's quality, measured by its total variation distance from p. A key advantage of our algorithms is their adaptability to the precision of the prediction. Specifically, our algorithms can self-adjust their sample complexity based on the accuracy of the available prediction, operating without any prior knowledge of the estimation's accuracy (i.e., they are consistent). Additionally, we never use more samples than the standard approaches require, even if the predictions provide no meaningful information (i.e., they are also robust). We provide lower bounds to indicate that the improvements in sample complexity achieved by our algorithms are information-theoretically optimal. Furthermore, experimental results show that the performance of our algorithms on real data significantly exceeds our worst-case guarantees for sample complexity, demonstrating the practicality of our approach. (A small worked example of the total variation distance appears after this list.)
- We consider sketched approximate matrix multiplication and ridge regression in the novel setting of localized sketching, where at any given point, only part of the data matrix is available. This corresponds to a block diagonal structure on the sketching matrix. We show that, under mild conditions, block diagonal sketching matrices require only O(sr/ε²) and O(sd_λ/ε) total sample complexity for matrix multiplication and ridge regression, respectively, where sr denotes the stable rank and sd_λ the statistical dimension. This matches the state-of-the-art bounds that are obtained using global sketching matrices. The localized nature of the sketching considered allows different parts of the data matrix to be sketched independently, and hence is more amenable to computation in distributed and streaming settings and results in a smaller memory and computational footprint. (A block diagonal sketching example appears after this list.)
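For the Gaussian mixture item above, here is a minimal numpy sketch of one EM iteration for a spherical mixture with unknown weights, means, and variances. It is only illustrative: it omits the initialization and separation conditions that the cited analysis requires, and all names are made up.

```python
import numpy as np

def em_step(X, weights, means, variances):
    """One EM update for a spherical Gaussian mixture.

    X         : (n, d) data matrix
    weights   : (k,) mixing weights
    means     : (k, d) component means
    variances : (k,) spherical variances sigma_j^2
    """
    n, d = X.shape
    # E-step: responsibilities via numerically stabilized log densities.
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)                    # (n, k)
    log_r = np.log(weights) - 0.5 * d * np.log(2 * np.pi * variances) - sq / (2 * variances)
    log_r -= log_r.max(axis=1, keepdims=True)
    resp = np.exp(log_r)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and spherical variances.
    nk = resp.sum(axis=0)
    weights = nk / n
    means = (resp.T @ X) / nk[:, None]
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    variances = (resp * sq).sum(axis=0) / (d * nk)
    return weights, means, variances
```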
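For the semisupervised robustness item, a minimal sketch of the self-training recipe it refers to: pseudo-label the unlabeled data with a model trained on the labels, then retrain on the union. A plain scikit-learn logistic regression stands in for the adversarial-training and randomized-smoothing steps, which are beyond a short sketch; all names and data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_training(X_lab, y_lab, X_unlab):
    """Pseudo-label unlabeled data with a standard model, then retrain on the union."""
    teacher = LogisticRegression().fit(X_lab, y_lab)
    pseudo = teacher.predict(X_unlab)                   # pseudo-labels for unlabeled data
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, pseudo])
    student = LogisticRegression().fit(X_all, y_all)    # robust variants adversarially train here
    return student

# Tiny synthetic usage.
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(20, 5)); y_lab = (X_lab[:, 0] > 0).astype(int)
X_unlab = rng.normal(size=(200, 5))
model = self_training(X_lab, y_lab, X_unlab)
```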
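For the hypothesis-testing item, the prediction quality is measured by total variation distance; a two-line example of that quantity, with made-up distributions:

```python
import numpy as np

# Total variation distance between the true distribution p and a prediction q,
# the quantity that controls the sample savings described above.
def tv_distance(p, q):
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

print(tv_distance([0.5, 0.3, 0.2], [0.4, 0.4, 0.2]))   # 0.1
```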
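For the localized sketching item, a small numpy/scipy sketch (illustrative, not the paper's code) of approximate matrix multiplication with a block diagonal Gaussian sketch: each block of rows is sketched independently, matching the block diagonal structure described above. Block sizes and sketch sizes are arbitrary choices for the example.

```python
import numpy as np
from scipy.linalg import block_diag

def block_diag_sketch(block_rows, sketch_rows, rng):
    """Block diagonal Gaussian sketch: each data block is sketched independently."""
    blocks = [rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
              for n, m in zip(block_rows, sketch_rows)]
    return block_diag(*blocks)

rng = np.random.default_rng(0)
# Two "local" data blocks of 1000 and 1500 rows, sketched to 200 and 300 rows.
A = rng.normal(size=(2500, 20))
B = rng.normal(size=(2500, 30))
S = block_diag_sketch(block_rows=[1000, 1500], sketch_rows=[200, 300], rng=rng)

approx = (S @ A).T @ (S @ B)          # computable from independently sketched blocks
exact = A.T @ B
rel_err = np.linalg.norm(approx - exact) / (np.linalg.norm(A) * np.linalg.norm(B))
print(rel_err)                         # small relative Frobenius error
```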