Title: Being Properly Improper
Properness for supervised losses stipulates that the loss function shapes the learning algorithm towards the true posterior of the data generating distribution. Unfortunately, data in modern machine learning can be corrupted or twisted in many ways. Hence, optimizing a proper loss function on twisted data could perilously lead the learning algorithm towards the twisted posterior, rather than to the desired clean posterior. Many papers cope with specific twists (e.g., label/feature/adversarial noise), but there is a growing need for a unified and actionable understanding atop properness. Our chief theoretical contribution is a generalization of the properness framework with a notion called twist-properness, which delineates loss functions with the ability to "untwist" the twisted posterior into the clean posterior. Notably, we show that a nontrivial extension of a loss function called alpha-loss, which was first introduced in information theory, is twist-proper. We study the twist-proper alpha-loss under a novel boosting algorithm, called PILBoost, and provide formal and experimental results for this algorithm. Our overarching practical conclusion is that the twist-proper alpha-loss outperforms the proper log-loss on several variants of twisted data.
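For intuition, below is a minimal, illustrative sketch (not code from the paper) of the standard alpha-loss from the information-theory literature, which the abstract's twist-proper loss extends: the probability p placed on the true label incurs loss (alpha/(alpha-1))(1 - p^((alpha-1)/alpha)), recovering the proper log-loss as alpha -> 1. The clipping constant and example probability below are arbitrary choices for the demo.

```python
import numpy as np

def alpha_loss(p_y, alpha):
    """Standard alpha-loss of the probability p_y assigned to the true label.

    alpha != 1 :  (alpha / (alpha - 1)) * (1 - p_y ** ((alpha - 1) / alpha))
    alpha -> 1 :  -log(p_y)   (the proper log-loss)
    alpha -> inf: 1 - p_y     (a soft 0/1 loss)
    """
    p_y = np.clip(p_y, 1e-12, 1.0)  # guard against log(0) and negative powers of 0
    if np.isinf(alpha):
        return 1.0 - p_y
    if np.isclose(alpha, 1.0):
        return -np.log(p_y)
    return (alpha / (alpha - 1.0)) * (1.0 - p_y ** ((alpha - 1.0) / alpha))

# Tuning alpha changes how harshly low-confidence (possibly twisted) examples are penalized.
for a in (0.5, 1.0, 2.0, np.inf):
    print(f"alpha={a}: loss at p_y=0.7 is {alpha_loss(0.7, a):.4f}")
```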
Award ID(s):
2007688 2134256 2031799 1815361 1901243
PAR ID:
10359161
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:20891-20932, 2022.
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We solve generalizations of Hubbard's twisted rabbit problem for analogues of the rabbit polynomial of degree $d \geq 2$. The twisted rabbit problem asks: when a certain quadratic polynomial, called the Douady rabbit polynomial, is twisted by a cyclic subgroup of a mapping class group, to which polynomial is the resulting map equivalent (as a function of the power of the generator)? The solution to the original quadratic twisted rabbit problem, given by Bartholdi–Nekrashevych, depended on the 4-adic expansion of the power of the mapping class by which we twist. In this paper, we provide a solution to a degree-$d$ generalization that depends on the $d^2$-adic expansion of the power of the mapping class element by which we twist.
  2. Many modern machine learning tasks require models with high tail performance, i.e., high performance over the worst-off samples in the dataset. This problem has been widely studied in fields such as algorithmic fairness, class imbalance, and risk-sensitive decision making. A popular approach to maximize the model’s tail performance is to minimize the CVaR (Conditional Value at Risk) loss, which computes the average risk over the tails of the loss (see the CVaR sketch after this list). However, for classification tasks where models are evaluated by the 0/1 loss, we show that if the classifiers are deterministic, then the minimizer of the average 0/1 loss also minimizes the CVaR 0/1 loss, suggesting that CVaR loss minimization is not helpful without additional assumptions. We circumvent this negative result by minimizing the CVaR loss over randomized classifiers, for which the minimizers of the average 0/1 loss and the CVaR 0/1 loss are no longer the same, so minimizing the latter can lead to better tail performance. To learn such randomized classifiers, we propose the Boosted CVaR Classification framework, which is motivated by a direct relationship between CVaR and a classical boosting algorithm called LPBoost. Based on this framework, we design an algorithm called alpha-AdaLPBoost. We empirically evaluate our proposed algorithm on four benchmark datasets and show that it achieves higher tail performance than deterministic model training methods.
  3. Motivated by recent experimental observations of opposite Chern numbers in R-type twisted MoTe2 and WSe2 homobilayers, we perform large-scale density-functional-theory calculations with machine learning force fields to investigate moiré band topology across a range of twist angles in both materials. We find that the Chern numbers of the moiré frontier bands change sign as a function of twist angle, and this change is driven by the competition between moiré ferroelectricity and piezoelectricity. Our large-scale calculations, enabled by machine learning methods, reveal crucial insights into interactions across different scales in twisted bilayer systems. The interplay between atomic-level relaxation effects and moiré-scale electrostatic potential variation opens new avenues for the design of intertwined topological and correlated states, including the possibility of mimicking higher Landau level physics in the absence of a magnetic field.
  4. Active learning aims to reduce the cost of labeling through selective sampling. Despite reported empirical success over passive learning, many popular active learning heuristics such as uncertainty sampling still lack satisfying theoretical guarantees (a minimal uncertainty-sampling sketch follows this list). Towards closing the gap between practical use and theoretical understanding in active learning, we propose to characterize the exact behavior of uncertainty sampling for high-dimensional Gaussian mixture data, in a modern regime of big data where the numbers of samples and features are commensurately large. Through a sharp characterization of the learning results, our analysis sheds light on the important question of when uncertainty sampling works better than passive learning. Our results show that the effectiveness of uncertainty sampling is not always ensured. In fact, it depends crucially on the choice of i) an adequate initial classifier used to start the active sampling process and ii) a proper loss function that allows an adaptive treatment of samples queried at various steps.
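To make the tail-performance objective in item 2 concrete, here is a minimal, self-contained sketch of the empirical CVaR of per-sample losses (the tail average that abstract refers to). It is not the paper's Boosted CVaR / alpha-AdaLPBoost implementation, and the toy 0/1 losses below are arbitrary.

```python
import numpy as np

def cvar(losses, beta):
    """Empirical CVaR at level beta in (0, 1]: the mean of the worst
    beta-fraction of per-sample losses (the tail average)."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # worst losses first
    k = max(1, int(np.ceil(beta * losses.size)))
    return losses[:k].mean()

# With 0/1 losses, the average hides the tail while CVaR exposes it.
zero_one = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0], dtype=float)
print(zero_one.mean())       # average 0/1 loss: 0.2
print(cvar(zero_one, 0.2))   # CVaR at beta = 0.2: 1.0 (the worst 20% are all errors)
```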
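Similarly, for item 4, a toy sketch of the vanilla uncertainty-sampling heuristic that the abstract analyzes: query the pooled points whose current classifier scores lie closest to the decision boundary. The zero threshold and the example scores are illustrative assumptions, not details from the paper.

```python
import numpy as np

def uncertainty_sample(scores, n_query):
    """Indices of the n_query unlabeled points whose signed classifier scores
    are closest to the decision boundary (score = 0), i.e. the points the
    current classifier is least certain about."""
    return np.argsort(np.abs(np.asarray(scores)))[:n_query]

# Example: query the 3 most uncertain of 8 pooled points.
scores = [2.1, -0.1, 0.8, -1.7, 0.05, 0.4, -0.3, 1.2]
print(uncertainty_sample(scores, 3))  # -> [4 1 6], the scores nearest 0
```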