This content will become publicly available on June 30, 2023
- Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence
- Page Range or eLocation-ID: 7345 to 7353
- Sponsoring Org: National Science Foundation
More Like this
In differentially private stochastic gradient descent (DPSGD), gradient clipping and random noise addition disproportionately affect underrepresented and complex classes and subgroups. As a consequence, DPSGD has disparate impact: relative to the original, non-private model, the accuracy of a model trained using DPSGD tends to decrease more on these classes and subgroups. If the original model is unfair in the sense that its accuracy is not the same across all subgroups, DPSGD exacerbates this unfairness. In this work, we study the inequality in utility loss due to differential privacy, which compares the changes in prediction accuracy w.r.t. each group between the private model and the non-private model. We analyze the cost of privacy w.r.t. each group and explain how the group sample size, along with other factors, is related to the privacy impact on group accuracy. Furthermore, we propose a modified DPSGD algorithm, called DPSGD-F, to achieve differential privacy, equal costs of differential privacy, and good utility. DPSGD-F adaptively adjusts the contribution of samples in a group depending on the group clipping bias such that differential privacy has no disparate impact on group accuracy. Our experimental evaluation shows the effectiveness of our removal algorithm in achieving equal costs of differential privacy with …
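To make the two operations named in the abstract above concrete, here is a minimal sketch of a single standard DPSGD gradient computation: per-sample clipping followed by Gaussian noise. It is not the DPSGD-F algorithm, whose adaptive group-wise adjustment is not specified in this summary. The function names, the toy gradients, and the parameters `max_norm` and `noise_multiplier` are illustrative assumptions.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dpsgd_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        clipped = clip(grad, max_norm)
        for i in range(dim):
            total[i] += clipped[i]
    # Noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [n / len(per_sample_grads) for n in noisy]

# Hypothetical toy batch: three small gradients and one large one.
grads = [[0.1, 0.2], [0.2, 0.1], [0.1, 0.1], [3.0, 4.0]]
clipped_large = clip(grads[3], max_norm=1.0)   # norm 5 is scaled down to norm 1
private_grad = dpsgd_gradient(grads, max_norm=1.0, noise_multiplier=1.0,
                              rng=random.Random(0))
```

In this toy batch only the large gradient is altered by clipping, which illustrates how samples with large gradients lose more of their contribution than the rest of the batch.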
Normalizing flows provide an elegant approach to generative modeling that allows for efficient sampling and exact density evaluation of unknown data distributions. However, current techniques have significant limitations in their expressivity when the data distribution is supported on a low-dimensional manifold or has a non-trivial topology. We introduce a novel statistical framework for learning a mixture of local normalizing flows as “chart maps” over the data manifold. Our framework augments the expressivity of recent approaches while preserving the signature property of normalizing flows, that they admit exact density evaluation. We learn a suitable atlas of charts for the data manifold via a vector quantized autoencoder (VQ-AE) and the distributions over them using a conditional flow. We validate experimentally that our probabilistic framework enables existing approaches to better model data distributions over complex manifolds.
Normalizing flows, a popular class of deep generative models, often fail to represent extreme phenomena observed in real-world processes. In particular, existing normalizing flow architectures struggle to model multivariate extremes, characterized by heavy-tailed marginal distributions and asymmetric tail dependence among variables. In light of this shortcoming, we propose COMET (COpula Multivariate ExTreme) Flows, which decompose the process of modeling a joint distribution into two parts: (i) modeling its marginal distributions, and (ii) modeling its copula distribution. COMET Flows capture heavy-tailed marginal distributions by combining a parametric tail belief at extreme quantiles of the marginals with an empirical kernel density function at mid-quantiles. In addition, COMET Flows capture asymmetric tail dependence among multivariate extremes by viewing such dependence as inducing a low-dimensional manifold structure in feature space. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of COMET Flows in capturing both heavy-tailed marginals and asymmetric tail dependence compared to other state-of-the-art baseline architectures. All code is available at https://github.com/andrewmcdonald27/COMETFlows.
Batch Normalization (BN) is essential to effectively train state-of-the-art deep Convolutional Neural Networks (CNNs). It normalizes the layer outputs during training using the statistics of each mini-batch. BN accelerates training by allowing large learning rates to be used safely, and it alleviates the need for careful initialization of the parameters. In this work, we study BN from the viewpoint of Fisher kernels that arise from generative probability models. We show that, assuming the samples within a mini-batch are drawn from the same probability density function, BN is identical to the Fisher vector of a Gaussian distribution. This means the batch normalizing transform can be explained in terms of kernels that naturally emerge from the probability density function that models the generative process of the underlying data distribution. Consequently, it promises higher discrimination power for the batch-normalized mini-batch. However, given the rectifying non-linearities employed in CNN architectures, the distribution of the layer outputs is asymmetric. Therefore, in order for BN to fully benefit from the aforementioned properties, we propose approximating the underlying data distribution not with one Gaussian density but with a mixture of them. Deriving the Fisher vector for a Gaussian Mixture Model (GMM) reveals that batch normalization can be improved by independently normalizing with respect …
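The normalization step described in the abstract above can be sketched as follows. This is a minimal illustration of plain batch normalization only: it omits the learnable scale and shift used in practice, and it does not implement the mixture-based variant the abstract proposes. The function name, the `eps` constant, and the toy batch are assumptions made for the example.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each feature to zero mean and (near) unit variance over the mini-batch."""
    n = len(batch)
    dim = len(batch[0])
    # Per-feature mean and biased variance computed from the mini-batch itself.
    means = [sum(row[j] for row in batch) / n for j in range(dim)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(dim)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps)
             for j in range(dim)]
            for row in batch]

# Hypothetical mini-batch of three samples with two features on different scales.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = batch_norm(batch)
```

After normalization both features are on the same scale even though the second was ten times larger on input.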
A normalizing flow is an invertible mapping between an arbitrary probability distribution and a standard normal distribution; it can be used for density estimation and statistical inference. Computing the flow follows the change of variables formula and thus requires invertibility of the mapping and an efficient way to compute the determinant of its Jacobian. To satisfy these requirements, normalizing flows typically consist of carefully chosen components. Continuous normalizing flows (CNFs) are mappings obtained by solving a neural ordinary differential equation (ODE). The neural ODE's dynamics can be chosen almost arbitrarily while ensuring invertibility. Moreover, the log-determinant of the flow's Jacobian can be obtained by integrating the trace of the dynamics' Jacobian along the flow. Our proposed OT-Flow approach tackles two critical computational challenges that limit a more widespread use of CNFs. First, OT-Flow leverages optimal transport (OT) theory to regularize the CNF and enforce straight trajectories that are easier to integrate. Second, OT-Flow features exact trace computation with time complexity equal to that of the trace estimators used in existing CNFs. On five high-dimensional density estimation and generative modeling tasks, OT-Flow performs competitively with state-of-the-art CNFs while on average requiring one-fourth of the number of weights with an 8x speedup in training time …
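The change of variables formula mentioned in the abstract above can be checked on the simplest possible flow. The sketch below uses a one-dimensional affine map, not a CNF and not OT-Flow, and shows that the standard normal log-density of the mapped point plus the log-determinant of the Jacobian recovers the log-density of the original distribution. All function names and the parameters `mu` and `sigma` (assumed positive) are illustrative.

```python
import math

def standard_normal_logpdf(z):
    """Log-density of N(0, 1) at z."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def affine_flow(x, mu, sigma):
    """Map x to z = (x - mu) / sigma and return z together with log|dz/dx|."""
    z = (x - mu) / sigma
    log_det = -math.log(sigma)  # dz/dx = 1 / sigma, assuming sigma > 0
    return z, log_det

def flow_logpdf(x, mu, sigma):
    """Change of variables: log p(x) = log N(z; 0, 1) + log|dz/dx|."""
    z, log_det = affine_flow(x, mu, sigma)
    return standard_normal_logpdf(z) + log_det

def normal_logpdf(x, mu, sigma):
    """Closed-form log-density of N(mu, sigma^2), used only as a reference."""
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))

# The flow-based density and the closed form agree up to rounding.
difference = abs(flow_logpdf(1.3, 0.5, 2.0) - normal_logpdf(1.3, 0.5, 2.0))
```

In higher dimensions the `log_det` term becomes the log-determinant of a full Jacobian matrix, which is the quantity the abstract says must be cheap to compute.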