Title: H-nobs: Achieving Certified Fairness and Robustness in Distributed Learning on Heterogeneous Datasets
Fairness and robustness are two important goals in the design of modern distributed learning systems. Despite a few prior works attempting to achieve both fairness and robustness, some key aspects of this direction remain underexplored. In this paper, we try to answer three largely unnoticed and unaddressed questions that are of paramount significance to this topic: (i) What makes jointly satisfying fairness and robustness difficult? (ii) Is it possible to establish a theoretical guarantee for the dual property of fairness and robustness? (iii) How much fairness must be sacrificed when robustness is incorporated into the system? To address these questions, we first identify data heterogeneity as the key difficulty of combining fairness and robustness. Accordingly, we propose a fair and robust framework called H-nobs, which offers certified fairness and robustness through two key components: a fairness-promoting objective function and a simple robust aggregation scheme called norm-based screening (NBS). We explain in detail why NBS, rather than other robust aggregation measures, is the suitable scheme for our algorithm. In addition, we derive three convergence theorems for H-nobs for the cases where the learning model is nonconvex, convex, and strongly convex, respectively, which provide theoretical guarantees for both fairness and robustness. Further, we empirically investigate the influence of the robust mechanism (NBS) on the fairness performance of H-nobs, the first attempt at such an exploration.
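The abstract names norm-based screening (NBS) without spelling out the rule. A common form of norm-based screening, assumed here, has the server rank the received gradients by Euclidean norm, discard a fixed number of the largest ones (on the view that corrupted updates tend to be large), and average the rest. The sketch below implements only that assumed rule in plain Python; the function name `norm_based_screening` and the parameter `num_discard` are hypothetical, and H-nobs's actual screening threshold, any weighting, and its interaction with the fairness-promoting objective are not reproduced.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def norm_based_screening(gradients: Sequence[Sequence[float]], num_discard: int) -> Vector:
    """Hypothetical NBS sketch: drop the `num_discard` gradients with the
    largest L2 norms, then return the coordinate-wise mean of the rest."""
    n = len(gradients)
    if not 0 <= num_discard < n:
        raise ValueError("num_discard must be in [0, len(gradients))")
    # Rank workers by gradient norm, smallest first (stable for ties).
    order = sorted(range(n), key=lambda i: l2_norm(gradients[i]))
    kept = [gradients[i] for i in order[: n - num_discard]]
    dim = len(kept[0])
    # Plain average of the surviving gradients.
    return [sum(g[j] for g in kept) / len(kept) for j in range(dim)]


# Three well-behaved workers plus one oversized (possibly malicious) update.
honest = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outlier = [[100.0, -100.0]]
print(norm_based_screening(honest + outlier, num_discard=1))
```

Screening by norm needs only one sort over the n gradient norms, which keeps the rule cheap; the abstract states that the paper argues why NBS suits H-nobs better than other robust aggregators, and this sketch does not attempt to reproduce that argument.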
Award ID(s):
2231209
NSF-PAR ID:
10481825
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
NeurIPS 2023
Date Published:
Journal Name:
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
Format(s):
Medium: X
Location:
New Orleans
Sponsoring Org:
National Science Foundation
More Like This
  1. In this paper we propose and analyze a finite difference numerical scheme for the Poisson-Nernst-Planck (PNP) equation system. To understand the energy structure of the PNP model, we make use of the Energetic Variational Approach (EnVarA), so that the PNP system can be reformulated as a non-constant mobility $H^{-1}$ gradient flow involving singular logarithmic energy potentials. To ensure unique solvability and energy stability, the mobility function is treated explicitly, while both the logarithmic and the electric potential diffusion terms are treated implicitly, due to the convex nature of these two energy functional parts. The positivity-preserving property for both concentrations, $n$ and $p$, is established at a theoretical level. This is based on the subtle fact that the singular nature of the logarithmic term around the value of 0 prevents the numerical solution from reaching the singular value, so that the numerical scheme is always well-defined. In addition, an optimal rate convergence analysis is provided, which requires many highly non-standard estimates due to the nonlinear parabolic coefficients. The higher order asymptotic expansion (up to third order temporal accuracy and fourth order spatial accuracy), the rough error estimate (to establish the $\ell^\infty$ bound for $n$ and $p$), and the refined error estimate have to be carried out to accomplish such a convergence result. To our knowledge, this work is the first to combine the following three theoretical properties for a numerical scheme for the PNP system: (i) unique solvability and positivity, (ii) energy stability, and (iii) optimal rate convergence. A few numerical results are also presented, which demonstrate the robustness of the proposed numerical scheme.
  2. Cutting-edge machine learning techniques often require millions of labeled data objects to train a robust model. Because relying on humans to supply such a huge number of labels is rarely practical, automated methods for label generation are needed. Unfortunately, critical challenges in auto-labeling remain unsolved, including the following research questions: (1) which objects to ask humans to label, (2) how to automatically propagate labels to other objects, and (3) when to stop labeling. These three questions are not only each challenging in their own right, but they also correspond to tightly interdependent problems. Yet existing techniques provide at best isolated solutions to a subset of these challenges. In this work, we propose the first approach, called LANCET, that successfully addresses all three challenges in an integrated framework. LANCET is based on a theoretical foundation characterizing the properties that the labeled dataset must satisfy to train an effective prediction model, namely the Covariate-shift and the Continuity conditions. First, guided by the Covariate-shift condition, LANCET maps raw input data into a semantic feature space, where an unlabeled object is expected to share the same label with its nearby labeled neighbor. Next, guided by the Continuity condition, LANCET selects objects for labeling, aiming to ensure that unlabeled objects always have some sufficiently close labeled neighbors. These two strategies jointly maximize the accuracy of the automatically produced labels and the prediction accuracy of the machine learning models trained on these labels. Lastly, LANCET uses a distribution matching network to verify whether both the Covariate-shift and Continuity conditions hold, in which case it would be safe to terminate the labeling process.
Our experiments on diverse public data sets demonstrate that LANCET consistently outperforms state-of-the-art methods such as Snuba and GOGGLES, as well as other baselines, by a large margin, with up to a 30-percentage-point increase in accuracy.
  3. Data aggregation is a key primitive in wireless sensor networks and refers to the process in which the sensed data are processed and aggregated en route by intermediate sensor nodes. Since sensor nodes are commonly resource constrained, they may be compromised by attackers and instructed to launch various attacks. Despite the rich literature on secure data aggregation, most prior work focuses on detecting intermediate nodes that modify partial aggregation results, leaving two security challenges unaddressed. First, a compromised sensor node can report an arbitrary reading of its own, which is fundamentally difficult to detect but widely considered to have limited impact on the final aggregation result. Second, a compromised sensor node can repeatedly attack the aggregation process to prevent the base station from receiving correct aggregation results, leading to a special form of Denial-of-Service attack. VMAT [1] (published in ICDCS 2011) is a representative secure data aggregation scheme with the capability of pinpointing and revoking compromised sensor nodes; it relies on a secure MIN aggregation scheme and converts other additive aggregation functions such as SUM and COUNT to MIN aggregations. In this paper, we introduce a novel enumeration attack against VMAT to highlight the security vulnerability of a sensor node reporting an arbitrary reading of its own. The enumeration attack allows a single compromised sensor node to significantly inflate the final aggregation result without being detected. As a countermeasure, we also introduce an effective defense against the enumeration attack. Theoretical analysis and simulation studies confirm the severe impact of the enumeration attack and the effectiveness of the countermeasure.
  4. While implicit feedback (e.g., clicks and dwell times) is an abundant and attractive source of data for learning to rank, it can produce unfair ranking policies for both exogenous and endogenous reasons. Exogenous reasons typically manifest themselves as biases in the training data, which then get reflected in the learned ranking policy and often lead to rich-get-richer dynamics. Moreover, even after the correction of such biases, reasons endogenous to the design of the learning algorithm can still lead to ranking policies that do not allocate exposure among items in a fair way. To address both exogenous and endogenous sources of unfairness, we present the first learning-to-rank approach that addresses both presentation bias and merit-based fairness of exposure simultaneously. Specifically, we define a class of amortized fairness-of-exposure constraints that can be chosen based on the needs of an application, and we show how these fairness criteria can be enforced despite the selection biases in implicit feedback data. The key result is an efficient and flexible policy-gradient algorithm, called FULTR, which is the first to enable the use of counterfactual estimators for both utility estimation and fairness constraints. Beyond the theoretical justification of the framework, we show empirically that the proposed algorithm can learn accurate and fair ranking policies from biased and noisy feedback.
  5. A robust multi-functional framework for widespread planning of nature-based solutions (NBS) must incorporate components of social equity and hydro-environmental performance in a cost-effective manner. NBS systems address stormwater mitigation by increasing on-site infiltration and evaporation through enhanced greenspace while also improving various components of societal well-being, such as physical health (e.g., heart disease, diabetes), mental health (e.g., post-traumatic stress disorder, depression), and social cohesion. However, current optimization tools for NBS systems rely on stormwater quantity abatement and, to a lesser extent, economic costs and environmental pollutant mitigation. Therefore, the objective of this study is to explore how NBS planning may be improved to maximize hydrological, environmental, and social co-benefits in an unequivocal and equitable manner. Here, a novel equity-based indexing framework is proposed to better understand how we might optimize social and physical functionalities of NBS systems as a function of transdisciplinary characteristics. Specifically, this study explores the spatial tradeoffs associated with NBS allocation by first optimizing a local watershed-scale model according to traditional metrics of stormwater efficacy (e.g., cost efficiency, hydrological runoff reduction, and pollutant load reduction) using SWMM modeling. The statistical dispersion of social health is then identified using the Area Deprivation Index (ADI), which is a high-resolution spatial account of socioeconomic disadvantages that have been linked to adverse health outcomes, according to United States census properties. As NBSs have been shown to mitigate various adverse health conditions through increased urban greening, this improved understanding of geospatial health characteristics may be leveraged to inform an explicit representation of social wellness within NBS planning frameworks. 
This study presents and demonstrates a novel framework for integrating hydro-environmental modeling, economic efficiency, and social health deprivation using a dimensionless Gini coefficient, which is intended to spur the positive connection of social and physical influences within robust NBS planning. Hydro-environmental risk (according to hydro-dynamic modeling) and social disparity (according to ADI distribution) are combined within a common measurement unit to capture variation across spatial domains and to optimize fair distribution across the study area. A comparison between traditional SWMM-based optimization and the proposed Gini-based framework reveals how the spatial allocation of NBSs within the watershed may be structured to address significantly more areas of social health deprivation while achieving similar hydro-environmental performance and cost-efficiency. The results of a case study for NBS planning in the White Oak Bayou watershed in Houston, Texas, USA revealed runoff volume reductions of 3.45% and 3.38%, pollutant load reductions of 11.15% and 11.28%, and ADI mitigation metrics of 16.84% and 35.32% for the SWMM-based and the Gini-based approaches, respectively, at similar cost expenditures. As such, the proposed framework enables an analytical approach for balancing the spatial tradeoffs of overlapping human-water goals in NBS planning while maintaining hydro-environmental robustness and economic efficiency.
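The fifth related record (where NBS means nature-based solutions, not norm-based screening) scores spatial fairness with a dimensionless Gini coefficient. Its abstract does not give the exact formulation, so the sketch below is only the textbook Gini coefficient, G = Σ_i Σ_j |x_i − x_j| / (2 n² x̄), computed over a hypothetical list of non-negative per-area values. How the study combines hydro-environmental risk and ADI into its common unit is not reproduced, and the function name `gini_coefficient` is illustrative.

```python
from typing import Sequence


def gini_coefficient(values: Sequence[float]) -> float:
    """Textbook Gini coefficient of non-negative values:
    0 means perfectly even; (n - 1) / n means everything sits in one unit."""
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(values)
    if total == 0:
        return 0.0  # all zeros: treat as perfectly even
    # Sum of absolute differences over all ordered pairs (i, j).
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    # Since n * mean == total, 2 * n^2 * mean simplifies to 2 * n * total.
    return abs_diff_sum / (2 * n * total)


# Hypothetical per-subcatchment benefit scores (not data from the study):
# all benefit concentrated in one of four areas.
scores = [0.0, 0.0, 0.0, 1.0]
print(gini_coefficient(scores))  # 0.75
```

The all-pairs sum is O(n²), which is fine for a few hundred spatial units; a sort-based formula gives the same value in O(n log n) if the unit count grows.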