Title: Strategyproof Linear Regression in High Dimensions
Abstract: This paper is part of an emerging line of work at the intersection of machine learning and mechanism design, which aims to avoid noise in training data by correctly aligning the incentives of data sources. Specifically, we focus on the ubiquitous problem of linear regression, where strategyproof mechanisms have previously been identified in two dimensions. In our setting, agents have single-peaked preferences and can manipulate only their response variables. Our main contribution is the discovery of a family of group strategyproof linear regression mechanisms in any number of dimensions, which we call generalized resistant hyperplane mechanisms. The game-theoretic properties of these mechanisms, and, in fact, their very existence, are established through a connection to a discrete version of the Ham Sandwich Theorem.
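To make the construction concrete, here is a minimal Python sketch of the classical two-dimensional resistant line that this family generalizes; the function name and data are illustrative assumptions, and the paper's generalized resistant hyperplane mechanisms themselves are not reproduced here. Because the fitted line depends on the reported responses only through medians of the left and right halves of the data, no agent can pull the line toward its own peak by misreporting its response.

import numpy as np

def resistant_line(x, y):
    # Fit y ~ a + b*x through coordinate-wise medians of the two halves.
    order = np.argsort(x)
    x, y = np.asarray(x)[order], np.asarray(y)[order]
    half = len(x) // 2
    xl, yl = np.median(x[:half]), np.median(y[:half])  # left-half medians
    xr, yr = np.median(x[half:]), np.median(y[half:])  # right-half medians
    b = (yr - yl) / (xr - xl)  # slope through the two median points
    return yl - b * xl, b      # (intercept, slope)

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=21)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=21)
print(resistant_line(x, y))    # roughly (1.0, 2.0)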
Award ID(s): 1718549
PAR ID: 10075968
Date Published: 2018
Journal Name: Proceedings of the 2018 ACM Conference on Economics and Computation
Page Range / eLocation ID: 9-26
Sponsoring Org: National Science Foundation
More Like this
  1. We study group-fair obnoxious facility location problems from the mechanism design perspective, where agents belong to different groups and have private preferences over the undesirable facility's location. Our main goal is to design strategyproof mechanisms that elicit the agents' true location preferences and determine a facility location that approximately optimizes several group-fair objectives. We first consider the maximum total and average group cost (group-fair) objectives. For these objectives, we propose deterministic mechanisms that achieve 3-approximation ratios and provide matching lower bounds. We then characterize 2-candidate strategyproof randomized mechanisms. Leveraging this characterization, we design randomized mechanisms with improved approximation ratios of 2 for both objectives, and we prove randomized lower bounds of 5/4 for both. Moreover, we investigate intergroup and intragroup fairness (IIF) objectives, addressing fairness both between groups and within each group. We present a mechanism that achieves a 4-approximation for the IIF objectives and prove tight lower bounds.
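As a hedged illustration of the 2-candidate idea, the Python sketch below handles the classic single-group obnoxious setting on the interval [0, 1] (not the group-fair objectives above, and the names are illustrative): place the facility at whichever endpoint the majority of agents is farther from. A misreport can only flip the outcome against the deviating agent, so truthful reporting is a dominant strategy.

def majority_endpoint(locations):
    # Agents dislike the facility and want it far away; candidates are 0 and 1.
    left = sum(1 for x in locations if x <= 0.5)  # agents closer to endpoint 0
    right = len(locations) - left                 # agents closer to endpoint 1
    # The larger group is served by the endpoint farther from it.
    return 1.0 if left >= right else 0.0

print(majority_endpoint([0.1, 0.2, 0.4, 0.9]))    # -> 1.0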

     
  2. We propose a new architecture to approximately learn incentive compatible, revenue-maximizing auctions from sampled valuations. Our architecture uses the Sinkhorn algorithm to perform a differentiable bipartite matching which allows the network to learn strategyproof revenue-maximizing mechanisms in settings not learnable by the previous RegretNet architecture. In particular, our architecture is able to learn mechanisms in settings without free disposal where each bidder must be allocated exactly some number of items. In experiments, we show our approach successfully recovers multiple known optimal mechanisms and high-revenue, low-regret mechanisms in larger settings where the optimal mechanism is unknown. 
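For intuition, here is a minimal sketch of the Sinkhorn normalization underlying such differentiable matching; the temperature tau, iteration count, and names are illustrative assumptions rather than the paper's architecture. Alternating row and column normalizations of exp(S / tau) drive the matrix toward a doubly stochastic soft matching, and every step is differentiable.

import numpy as np

def sinkhorn(scores, tau=0.1, n_iters=50):
    P = np.exp(scores / tau)                 # positive kernel from raw scores
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)    # normalize rows
        P /= P.sum(axis=0, keepdims=True)    # normalize columns
    return P                                 # near-doubly-stochastic matching

S = np.array([[3.0, 0.1], [0.2, 2.5]])
print(np.round(sinkhorn(S), 3))              # mass concentrates on the diagonal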
  3. In this work, we introduce an alternative model for the design and analysis of strategyproof mechanisms that is motivated by the recent surge of work in "learning-augmented algorithms." Aiming to complement the traditional worst-case analysis approach in computer science, this line of work has focused on the design and analysis of algorithms that are enhanced with machine-learned predictions. The algorithms can use the predictions as a guide to inform their decisions, aiming to achieve much stronger performance guarantees when these predictions are accurate (consistency), while also maintaining near-optimal worst-case guarantees, even if these predictions are inaccurate (robustness). We initiate the design and analysis of strategyproof mechanisms that are augmented with predictions regarding the private information of the participating agents. To exhibit the important benefits of this approach, we revisit the canonical problem of facility location with strategic agents in the two-dimensional Euclidean space. We study both the egalitarian and utilitarian social cost functions, and we propose new strategyproof mechanisms that leverage predictions to guarantee an optimal trade-off between consistency and robustness. Furthermore, we also prove parameterized approximation results as a function of the prediction error, showing that our mechanisms perform well even when the predictions are not fully accurate; a toy one-dimensional sketch of this prediction-following idea appears after this item.

    Funding: The work of E. Balkanski was supported in part by the National Science Foundation [Grants CCF-2210501 and IIS-2147361]. The work of V. Gkatzelis and X. Tan was supported in part by the National Science Foundation [Grant CCF-2210502] and [CAREER Award CCF-2047907].
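The toy one-dimensional sketch below is an assumption-laden analogue, not the paper's two-dimensional mechanisms: clipping the predicted optimum to the range of reported locations behaves like a generalized-median rule with the prediction as a constant phantom, so it remains strategyproof, follows an accurate prediction exactly (consistency), and never leaves the agents' range when the prediction is far off (robustness).

def clipped_prediction(reports, prediction):
    # Place the facility at the prediction, clipped to [min, max] of reports.
    return min(max(prediction, min(reports)), max(reports))

print(clipped_prediction([0.0, 0.3, 0.9], prediction=0.4))  # -> 0.4
print(clipped_prediction([0.0, 0.3, 0.9], prediction=2.0))  # -> 0.9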

     
  4. Avoiding overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss. This puzzling contradiction necessitates new approaches to the study of overfitting. Here we quantify overfitting via residual information, defined as the bits in fitted models that encode noise in training data. Information efficient learning algorithms minimize residual information while maximizing the relevant bits, which are predictive of the unknown generative models. We solve this optimization to obtain the information content of optimal algorithms for a linear regression problem and compare it to that of randomized ridge regression. Our results demonstrate the fundamental trade-off between residual and relevant information and characterize the relative information efficiency of randomized regression with respect to optimal algorithms. Finally, using results from random matrix theory, we reveal the information complexity of learning a linear map in high dimensions and unveil information-theoretic analogs of double and multiple descent phenomena. 
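For context, here is a minimal sketch of the ridge regression estimator on synthetic data from a linear generative model in the high-dimensional regime; all constants are illustrative, and the paper's residual- and relevant-information quantities are not computed here.

import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 100                      # more parameters than samples
w_true = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

lam = 1.0                           # ridge penalty strength
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(f"parameter error: {np.sum((w_hat - w_true) ** 2):.3f}")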
  5. We address the challenge of estimating regression coefficients and selecting relevant predictors in the context of mixed linear regression in high dimensions, where the number of predictors greatly exceeds the sample size. Recent advancements in this field have centered on incorporating sparsity-inducing penalties into the expectation-maximization (EM) algorithm, which seeks to maximize the conditional likelihood of the response given the predictors. However, existing procedures often treat predictors as fixed or overlook their inherent variability. In this paper, we leverage the independence between the predictor and the latent indicator variable of mixtures to facilitate efficient computation and also achieve synergistic variable selection across all mixture components. We establish the non-asymptotic convergence rate of the proposed fast group-penalized EM estimator to the true regression parameters. The effectiveness of our method is demonstrated through extensive simulations and an application to the Cancer Cell Line Encyclopedia dataset for the prediction of anticancer drug sensitivity.
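A schematic sketch of one penalized EM recipe for a two-component mixture of linear regressions follows; the names are hypothetical, a plain lasso-style soft-threshold stands in for the paper's group penalty, and no convergence guarantees are claimed. The E-step computes component responsibilities, and the M-step solves a responsibility-weighted least-squares problem followed by soft-thresholding.

import numpy as np

def soft_threshold(w, t):
    # Proximal operator of the l1 penalty.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def penalized_em(X, y, lam=0.05, sigma=1.0, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    _, d = X.shape
    W = 0.1 * rng.normal(size=(2, d))         # two regression vectors
    for _ in range(n_iters):
        # E-step: responsibilities under equal-weight Gaussian noise models.
        resid = y[None, :] - W @ X.T           # shape (2, n)
        logp = -0.5 * (resid / sigma) ** 2
        R = np.exp(logp - logp.max(axis=0))    # numerically stabilized
        R /= R.sum(axis=0)
        # M-step: weighted least squares per component, then soft-threshold.
        for k in range(2):
            XtRX = X.T @ (R[k][:, None] * X) + 1e-6 * np.eye(d)
            XtRy = X.T @ (R[k] * y)
            W[k] = soft_threshold(np.linalg.solve(XtRX, XtRy), lam)
    return W

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
labels = rng.integers(0, 2, size=200)
W_true = np.zeros((2, 10)); W_true[0, 0] = 2.0; W_true[1, 1] = -2.0
y = np.einsum('ij,ij->i', X, W_true[labels]) + 0.1 * rng.normal(size=200)
print(np.round(penalized_em(X, y), 2))         # rows approximate W_true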

     