There is a growing need for models that are interpretable and have reduced energy/computational cost (e.g., in health care analytics and federated learning). Examples of algorithms to train such models include logistic regression and boosting. However, one challenge facing these algorithms is that they provably suffer from label noise; this has been attributed to the joint interaction be- tween oft-used convex loss functions and simpler hypothesis classes, resulting in too much emphasis being placed on outliers. In this work, we use the margin-based α-loss, which continuously tunes between canonical convex and quasi- convex losses, to robustly train simple models. We show that the α hyperparameter smoothly introduces non-convexity and offers the benefit of “giving up” on noisy training examples. We also provide results on the Long-Servedio dataset for boosting and a COVID-19 survey dataset for logistic regression, highlighting the efficacy of our approach across multiple relevant domains.
more »
« less
Smoothly Giving up: Robustness for Simple Models
There is a growing need for models that are interpretable and have reduced energy/computational cost (e.g., in health care analytics and federated learning). Examples of algorithms to train such models include logistic regression and boosting. However, one challenge facing these algorithms is that they provably suffer from label noise; this has been attributed to the joint interaction between oft-used convex loss functions and simpler hypothesis classes, resulting in too much emphasis being placed on outliers. In this work, we use the margin-based 𝛼-loss, which continuously tunes between canonical convex and quasi-convex losses, to robustly train simple models. We show that the 𝛼 hyperparameter smoothly introduces non-convexity and offers the benefit of “giving up” on noisy training examples. We also provide results on the Long-Servedio dataset for boosting and a COVID-19 survey dataset for logistic regression, highlighting the efficacy of our approach across multiple relevant domains.
more »
« less
- Award ID(s):
- 1901243
- PAR ID:
- 10417021
- Date Published:
- Journal Name:
- Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
- Volume:
- 206
- Page Range / eLocation ID:
- 5376 - 5410
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)We give the first statistical-query lower bounds for agnostically learning any non-polynomial activation with respect to Gaussian marginals (e.g., ReLU, sigmoid, sign). For the specific problem of ReLU regression (equivalently, agnostically learning a ReLU), we show that any statistical-query algorithm with tolerance n−(1/ϵ)b must use at least 2ncϵ queries for some constant b,c>0, where n is the dimension and ϵ is the accuracy parameter. Our results rule out general (as opposed to correlational) SQ learning algorithms, which is unusual for real-valued learning problems. Our techniques involve a gradient boosting procedure for "amplifying" recent lower bounds due to Diakonikolas et al. (COLT 2020) and Goel et al. (ICML 2020) on the SQ dimension of functions computed by two-layer neural networks. The crucial new ingredient is the use of a nonstandard convex functional during the boosting procedure. This also yields a best-possible reduction between two commonly studied models of learning: agnostic learning and probabilistic concepts.more » « less
-
Privacy-preserving machine learning (PPML) enables multiple distrusting parties to jointly train ML models on their private data without revealing any information beyond the final trained models. In this work, we study the client-aided two-server setting where two non-colluding servers jointly train an ML model on the data held by a large number of clients. By involving the clients in the training process, we develop efficient protocols for training algorithms including linear regression, logistic regression, and neural networks. In particular, we introduce novel approaches to securely computing inner product, sign check, activation functions (e.g., ReLU, logistic function), and division on secret shared values, leveraging lightweight computation on the client side. We present constructions that are secure against semi-honest clients and further enhance them to achieve security against malicious clients. We believe these new client-aided techniques may be of independent interest. We implement our protocols and compare them with the two-server PPML protocols presented in SecureML (Mohassel and Zhang, S&P’17) across various settings and ABY2.0 (Patra et al., Usenix Security’21) theoretically. We demonstrate that with the assistance of untrusted clients in the training process, we can significantly improve both the communication and computational efficiency by orders of magnitude. Our protocols compare favorably in all the training algorithms on both LAN and WAN networks.more » « less
-
Pierre Alquier (Ed.)A systematic approach to finding variational approximation in an otherwise intractable non-conjugate model is to exploit the general principle of convex duality by minorizing the marginal likelihood that renders the problem tractable. While such approaches are popular in the context of variational inference in non-conjugate Bayesian models, theoretical guarantees on statistical optimality and algorithmic convergence are lacking. Focusing on logistic regression models, we provide mild conditions on the data generating process to derive non-asymptotic upper bounds to the risk incurred by the variational optima. We demonstrate that these assumptions can be completely relaxed if one considers a slight variation of the algorithm by raising the likelihood to a fractional power. Next, we utilize the theory of dynamical systems to provide convergence guarantees for such algorithms in logistic and multinomial logit regression. In particular, we establish local asymptotic stability of the algorithm without any assumptions on the data-generating process. We explore a special case involving a semi-orthogonal design under which a global convergence is obtained. The theory is further illustrated using several numerical studies.more » « less
-
null (Ed.)Three features are crucial for sequential forecasting and generation models: tractability, expressiveness, and theoretical backing. While neural autoregressive models are relatively tractable and offer powerful predictive and generative capabilities, they often have complex optimization landscapes, and their theoretical properties are not well understood. To address these issues, we present convex formulations of autoregressive models with one hidden layer. Specifically, we prove an exact equivalence between these models and constrained, regularized logistic regression by using semi-infinite duality to embed the data matrix onto a higher dimensional space and introducing inequality constraints. To make this formulation tractable, we approximate the constraints using a hinge loss or drop them altogether. Furthermore, we demonstrate faster training and competitive performance of these implementations compared to their neural network counterparts on a variety of data sets. Consequently, we introduce techniques to derive tractable, expressive, and theoretically-interpretable models that are nearly equivalent to neural autoregressive models.more » « less
An official website of the United States government

