
With the widespread use of machine learning systems in our daily lives, it is important to consider fairness as a basic requirement when designing these systems, especially when they make life-changing decisions, e.g., the COMPAS algorithm helps judges decide whether to release an offender. At the same time, because of cheap but imperfect data collection methods such as crowdsourcing and web crawling, label noise is ubiquitous, which unfortunately can make fairness-aware algorithms even more prejudiced than fairness-unaware ones, and thereby harmful. To tackle these problems, we provide general frameworks for learning fair classifiers with instance-dependent label noise. For statistical fairness notions, we rewrite the classification risk and the fairness metric in terms of noisy data and thereby build robust classifiers. For the causality-based fairness notion, we exploit the internal causal structure of the data to model the label noise and counterfactual fairness simultaneously. Experimental results demonstrate the effectiveness of the proposed methods on real-world datasets with controllable synthetic label noise.
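As a hedged illustration of the risk-rewriting idea for the statistical fairness notions: the classical backward correction (sketched here for the simpler class-dependent case with an assumed known transition matrix T; the paper itself handles instance-dependent noise) charges a corrected loss against the observed noisy label so that its expectation over the noise recovers the clean loss. The function name and shapes are illustrative, not from the paper.

```python
import numpy as np

def backward_corrected_loss(loss_vector, T):
    """Backward loss correction (class-dependent sketch).

    loss_vector[j] : loss of the current prediction evaluated against label j
    T[i, j]        : P(noisy label = j | true label = i), assumed known
    Returns corrected[j], the loss charged when the OBSERVED noisy label is j.
    Since E_{noisy | true=y}[corrected] = (T @ corrected)[y] = loss_vector[y],
    minimizing the corrected loss on noisy data targets the clean risk.
    """
    return np.linalg.solve(T, loss_vector)  # corrected = T^{-1} @ loss_vector

# 20% symmetric label noise over two classes
T = np.array([[0.8, 0.2],
              [0.2, 0.8]])
corrected = backward_corrected_loss(np.array([0.0, 1.0]), T)
# unbiasedness check: pushing corrected back through T recovers [0.0, 1.0]
```

The same expectation-matching trick can, in principle, be applied to a fairness metric evaluated on noisy labels, which is the spirit of the framework above.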

Label smoothing (LS) is an arising learning paradigm that uses a positively weighted average of the hard training labels and uniformly distributed soft labels. It was shown that LS serves as a regularizer for training data with hard labels and therefore improves the generalization of the model. Later it was reported that LS even helps improve robustness when learning with noisy labels. However, we observed that the advantage of LS vanishes in the high-label-noise regime. Intuitively, this is due to the increased entropy of ℙ(noisy label | X) when the noise rate is high, in which case further applying LS tends to "over-smooth" the estimated posterior. We proceeded to discover that several learning-with-noisy-labels solutions in the literature instead relate more closely to negative/not label smoothing (NLS), which acts counter to LS and is defined as using a negative weight to combine the hard and soft labels! We characterize the properties of LS and NLS when learning with noisy labels. Among other established properties, we theoretically show that NLS is more beneficial when label noise rates are high. We also provide extensive experimental results on multiple benchmarks to support our findings.
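A minimal sketch of the soft targets under discussion (the function name and the 4-class example are ours): with smoothing weight w, the target is (1 − w)·one-hot + w·uniform. LS uses w > 0, pulling the target toward the uniform distribution; NLS uses w < 0, pushing probability mass away from it.

```python
import numpy as np

def smooth_labels(hard_labels, num_classes, weight):
    """Combine one-hot hard labels with the uniform distribution.

    weight > 0 gives standard label smoothing (LS);
    weight < 0 gives negative label smoothing (NLS), which sharpens
    the target past one-hot instead of softening it.
    """
    one_hot = np.eye(num_classes)[hard_labels]          # (n, K) hard targets
    uniform = np.full((len(hard_labels), num_classes), 1.0 / num_classes)
    return (1.0 - weight) * one_hot + weight * uniform  # affine combination

# LS with weight 0.2 over 4 classes: correct class gets 0.85, others 0.05
ls = smooth_labels(np.array([0]), 4, 0.2)
# NLS with weight -0.2: correct class gets 1.15, others -0.05
nls = smooth_labels(np.array([0]), 4, -0.2)
```

Note that both targets still sum to 1 per example; NLS simply allows negative entries, which is what makes it act counter to LS.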

Existing research on learning with noisy labels mainly focuses on synthetic label noise. Synthetic label noise, though its clean structure greatly enables statistical analysis, often fails to model real-world noise patterns. The recent literature has seen several efforts to offer real-world noisy datasets, e.g., Food-101N, WebVision, and Clothing1M. Yet the existing efforts suffer from two caveats: first, the lack of ground-truth verification makes it hard to theoretically study the properties and treatment of real-world label noise. Second, these efforts are often of large scale, which can prevent fair comparisons of robust methods within reasonable and accessible computation budgets. To better understand real-world label noise, it is important to establish controllable and moderate-sized real-world noisy datasets with both ground-truth and noisy labels. This work presents two new benchmark datasets, which we name CIFAR-10N and CIFAR-100N, equipping the training sets of CIFAR-10 and CIFAR-100 with human-annotated real-world noisy labels that we collect from Amazon Mechanical Turk. We quantitatively and qualitatively show that real-world noisy labels follow an instance-dependent pattern rather than the classically assumed and adopted ones (e.g., class-dependent label noise). We then initiate an effort to benchmark a subset of the existing solutions using CIFAR-10N and CIFAR-100N. We further proceed to study the memorization of correct and wrong predictions, which further illustrates the difference between human noise and class-dependent synthetic noise. We show that real-world noise patterns indeed impose new and outstanding challenges compared to synthetic label noise. These observations require us to rethink the treatment of noisy labels, and we hope the availability of these two datasets will facilitate the development and evaluation of future learning-with-noisy-labels solutions.
The corresponding datasets and the leaderboard are publicly available at http://noisylabels.com.
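For contrast with the human noise in CIFAR-10N/100N, the classically assumed class-dependent noise can be simulated as follows (a hedged sketch; the function and variable names are ours, not the benchmark's): each label is flipped according to a row of a transition matrix that depends only on the class, never on the instance.

```python
import numpy as np

def inject_class_dependent_noise(labels, transition_matrix, rng):
    """Flip each label according to a class-dependent transition matrix T,
    where T[i, j] = P(noisy label = j | clean label = i). This is the
    classical synthetic setting that, per the benchmark above, real
    human-annotated noise does NOT follow."""
    num_classes = len(transition_matrix)
    return np.array([rng.choice(num_classes, p=transition_matrix[y])
                     for y in labels])

rng = np.random.default_rng(0)
# symmetric noise with a 20% flip rate over 3 classes
T = np.full((3, 3), 0.1)
np.fill_diagonal(T, 0.8)
clean = np.array([0, 1, 2] * 100)
noisy = inject_class_dependent_noise(clean, T, rng)
# the empirical flip rate concentrates around 0.2
```

Instance-dependent noise, by contrast, would make the flip probability a function of the example's features, which is what the human labels in these datasets exhibit.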

In label-noise learning, estimating the transition matrix is a hot topic, as the matrix plays an important role in building statistically consistent classifiers. Traditionally, the transition from clean labels to noisy labels (i.e., the clean-label transition matrix (CLTM)) has been widely exploited to learn a clean-label classifier from noisy data. Motivated by the fact that classifiers mostly output Bayes optimal labels for prediction, in this paper we directly model the transition from Bayes optimal labels to noisy labels (i.e., the Bayes-label transition matrix (BLTM)) and learn a classifier to predict Bayes optimal labels. Note that given only noisy data, estimating either the CLTM or the BLTM is ill-posed. Favorably, however, Bayes optimal labels have less uncertainty than clean labels: the class posteriors of Bayes optimal labels are one-hot vectors while those of clean labels are not. This enables two advantages for estimating the BLTM: (a) a set of examples with theoretically guaranteed Bayes optimal labels can be collected from the noisy data; (b) the feasible solution space is much smaller. Exploiting these advantages, we estimate the BLTM parametrically with a deep neural network, leading to better generalization and superior classification performance.
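As a hedged sketch of how a transition matrix is used once estimated (this is the classical forward correction, not the paper's estimation procedure; all names are ours): the model's predicted clean/Bayes posterior is pushed through T to obtain a noisy-label posterior, and the negative log-likelihood of the observed noisy labels is minimized.

```python
import numpy as np

def forward_corrected_nll(logits, noisy_labels, T):
    """Forward loss correction.

    logits       : (n, K) model scores for the clean/Bayes posterior
    noisy_labels : (n,) observed noisy labels
    T[i, j]      : P(noisy = j | true = i), e.g. an estimated BLTM
    Pushes the clean posterior through T, then scores the noisy labels.
    Minimizing this is statistically consistent for the clean classifier.
    """
    p_clean = np.exp(logits - logits.max(axis=1, keepdims=True))
    p_clean /= p_clean.sum(axis=1, keepdims=True)   # softmax over classes
    p_noisy = p_clean @ T                           # predicted noisy posterior
    n = len(noisy_labels)
    return -np.mean(np.log(p_noisy[np.arange(n), noisy_labels] + 1e-12))

# with T = identity (no noise), this reduces to ordinary cross-entropy
logits = np.array([[2.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 1])
loss_id = forward_corrected_nll(logits, labels, np.eye(2))
```

In practice T here would be the parametric BLTM produced by the deep network described above; the identity case is only a sanity check.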

The presence of label noise often misleads the training of deep neural networks. Departing from the recent literature, which largely assumes the label noise rate is determined only by the true label class, the errors in human-annotated labels are more likely to depend on the difficulty of the task, resulting in settings with instance-dependent label noise. We first provide evidence that heterogeneous instance-dependent label noise effectively down-weights the examples with higher noise rates in a non-uniform way and thus causes imbalances, rendering the strategy of directly applying methods for class-dependent label noise questionable. Building on the recent peer loss [24], we then propose and study the potential of a second-order approach that leverages the estimation of several covariance terms defined between the instance-dependent noise rates and the Bayes optimal label. We show that this set of second-order statistics successfully captures the induced imbalances. We further show that, with the help of the estimated second-order statistics, we can identify a new loss function under which the expected risk of a classifier with instance-dependent label noise is equivalent to that of a problem with only class-dependent label noise. This allows us to apply existing solutions for this better-studied setting. We provide an efficient procedure to estimate these second-order statistics without access to either ground-truth labels or prior knowledge of the noise rates. Experiments on CIFAR-10 and CIFAR-100 with synthetic instance-dependent label noise and on Clothing1M with real-world human label noise verify our approach. Our implementation is available at https://github.com/UCSC-REAL/CAL.
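The first-order peer loss [24] that this second-order approach builds on can be sketched as follows (a simplified NumPy version with our own names; the actual implementation lives in the linked repository): each example's loss is debiased by subtracting the loss evaluated on a randomly paired "peer" label.

```python
import numpy as np

def peer_loss(logits, noisy_labels, rng, alpha=1.0):
    """Peer loss [24], first-order form.

    For each example, subtract the cross-entropy against a randomly
    paired peer label; in expectation this cancels the bias introduced
    by class-dependent label noise without knowing the noise rates.
    alpha reweights the peer term (alpha=0 recovers plain cross-entropy).
    """
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                    # softmax
    idx = np.arange(len(noisy_labels))
    ce = -np.log(p[idx, noisy_labels] + 1e-12)           # loss on own label
    peer = noisy_labels[rng.permutation(len(idx))]       # random peer labels
    ce_peer = -np.log(p[idx, peer] + 1e-12)              # loss on peer label
    return np.mean(ce - alpha * ce_peer)

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 3))
labels = rng.integers(0, 3, size=8)
loss = peer_loss(logits, labels, np.random.default_rng(1))
```

The second-order approach above augments this with estimated covariance terms; that extension is not reproduced here.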

Ranzato, M.; Beygelzimer, A.; Dauphin, Y.; Liang, P. S.; Vaughan, J. Wortman (Eds.)

The majority of state-of-the-art deep learning methods are discriminative approaches, which model the conditional distribution of labels given input features. The success of such approaches heavily depends on high-quality labeled instances, which are not easy to obtain, especially as the number of candidate classes increases. In this paper, we study the complementary learning problem. Unlike ordinary labels, complementary labels are easy to obtain because an annotator only needs to provide a yes/no answer to a randomly chosen candidate class for each instance. We propose a generative-discriminative complementary learning method that estimates the ordinary labels by modeling both the conditional (discriminative) and instance (generative) distributions. Our method, which we call Complementary Conditional GAN (CCGAN), improves the accuracy of predicting ordinary labels and is able to generate high-quality instances despite the weak supervision. In addition to extensive empirical studies, we theoretically show that our model can retrieve the true conditional distribution from the complementarily-labeled data.
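The annotation protocol described above can be sketched as follows (a toy simulation with hypothetical names, not the CCGAN method itself): the annotator is shown one random candidate class per instance and answers yes/no; a "no" yields a complementary label, i.e., a class the instance does not belong to.

```python
import numpy as np

def collect_yes_no_annotations(true_labels, num_classes, rng):
    """Simulate the cheap yes/no protocol: query one random candidate
    class per instance. A "no" answer produces a complementary label
    (a class the instance is NOT); a "yes" is an ordinary label."""
    queries = rng.integers(0, num_classes, size=len(true_labels))
    answers = queries == true_labels   # True = "yes", False = "no"
    return queries, answers

rng = np.random.default_rng(0)
true_labels = rng.integers(0, 10, size=1000)
queries, answers = collect_yes_no_annotations(true_labels, 10, rng)
# with 10 classes and uniform queries, roughly 1 in 10 answers is "yes",
# so most instances end up with only a complementary label
```

This is why the supervision is weak: each "no" rules out only one of the candidate classes, and the generative-discriminative model above must recover the ordinary labels from such answers.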