Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

Ranzato, M.: ; Dauphin, Y. ; Liang, P.S. ; Wortman Vaughan, J. (Ed.)We consider a linesearch method for continuous optimization under a stochastic setting where the function values and gradients are available only through inexact probabilistic zeroth and firstorder oracles. These oracles capture multiple stan dard settings including expected loss minimization and zerothorder optimization. Moreover, our framework is very general and allows the function and gradient estimates to be biased. The proposed algorithm is simple to describe, easy to im plement, and uses these oracles in a similar way as the standard deterministic line search uses exact function and gradient values. Under fairly general conditions on the oracles, we derive a high probability tail bound on the iteration complexity of the algorithm when applied to nonconvex smooth functions. These results are stronger than those for other existing stochastic line search methods and apply in more general settings.

Ranzato, M. ; Beygelzimer, A. ; Dauphin, Y. ; Liang, P.S. ; Wortman Vaughan, J. (Ed.)Tractably modelling distributions over manifolds has long been an important goal in the natural sciences. Recent work has focused on developing general machine learning models to learn such distributions. However, for many applications these distributions must respect manifold symmetries—a trait which most previous models disregard. In this paper, we lay the theoretical foundations for learning symmetryinvariant distributions on arbitrary manifolds via equivariant manifold flows. We demonstrate the utility of our approach by learning quantum field theorymotivated invariant SU(n) densities and by correcting meteor impact dataset bias.

Ranzato, M. ; Beygelzimer, A. ; Dauphin Y. ; Liang, P.S. ; Wortman Vaughan, J. (Ed.)Hyperbolic space is particularly useful for embedding data with hierarchical structure; however, representing hyperbolic space with ordinary floatingpoint numbers greatly affects the performance due to its \emph{ineluctable} numerical errors. Simply increasing the precision of floats fails to solve the problem and incurs a high computation cost for simulating greaterthandoubleprecision floats on hardware such as GPUs, which does not support them. In this paper, we propose a simple, feasibleonGPUs, and easytounderstand solution for numerically accurate learning on hyperbolic space. We do this with a new approach to represent hyperbolic space using multicomponent floatingpoint (MCF) in the Poincar{\'e} upperhalf space model. Theoretically and experimentally we show our model has small numerical error, and on embedding tasks across various datasets, models represented by multicomponent floatingpoints gain more capacity and run significantly faster on GPUs than prior work.

Ranzato, M. ; Beygelzimer, A. ; Dauphin, Y ; Liang, P. S. ; Wortman Vaughan, J. (Ed.)Adversarial examples are a widely studied phenomenon in machine learning models. While most of the attention has been focused on neural networks, other practical models also suffer from this issue. In this work, we propose an algorithm for evaluating the adversarial robustness of knearest neighbor classification, i.e., finding a minimumnorm adversarial example. Diverging from previous proposals, we propose the first geometric approach by performing a search that expands outwards from a given input point. On a high level, the search radius expands to the nearby higherorder Voronoi cells until we find a cell that classifies differently from the input point. To scale the algorithm to a large k, we introduce approximation steps that find perturbation with smaller norm, compared to the baselines, in a variety of datasets. Furthermore, we analyze the structural properties of a dataset where our approach outperforms the competition.

Ranzato, M. ; Beygelzimer, A. ; Liang, P.S. ; Vaughan, J.W. ; Dauphin, Y. (Ed.)Fairness and robustness are critical elements of Trustworthy AI that need to be addressed together. Fairness is about learning an unbiased model while robustness is about learning from corrupted data, and it is known that addressing only one of them may have an adverse affect on the other. In this work, we propose a sample selectionbased algorithm for fair and robust training. To this end, we formulate a combinatorial optimization problem for the unbiased selection of samples in the presence of data corruption. Observing that solving this optimization problem is strongly NPhard, we propose a greedy algorithm that is efficient and effective in practice. Experiments show that our method obtains fairness and robustness that are better than or comparable to the stateoftheart technique, both on synthetic and benchmark real datasets. Moreover, unlike other fair and robust training baselines, our algorithm can be used by only modifying the sampling step in batch selection without changing the training algorithm or leveraging additional clean data.

Ranzato, M. ; Beygelzimer, A. ; Liang, P.S. ; Vaughan, J.W. ; Dauphin, Y. (Ed.)Federated Learning (FL) is a distributed learning framework, in which the local data never leaves clients’ devices to preserve privacy, and the server trains models on the data via accessing only the gradients of those local data. Without further privacy mechanisms such as differential privacy, this leaves the system vulnerable against an attacker who inverts those gradients to reveal clients’ sensitive data. However, a gradient is often insufficient to reconstruct the user data without any prior knowledge. By exploiting a generative model pretrained on the data distribution, we demonstrate that data privacy can be easily breached. Further, when such prior knowledge is unavailable, we investigate the possibility of learning the prior from a sequence of gradients seen in the process of FL training. We experimentally show that the prior in a form of generative model is learnable from iterative interactions in FL. Our findings demonstrate that additional mechanisms are necessary to prevent privacy leakage in FL.

Ranzato, M. ; Beygelzimer, A. ; Dauphin, Y. ; Liang, P.S. ; Wortman Vaughan, J. (Ed.)

Ranzato, M. ; Beygelzimer, A. ; Dauphin, Y. ; Liang, P.S. ; Vaughan, J. Wortman (Ed.)The prevalence of graphbased data has spurred the rapid development of graph neural networks (GNNs) and related machine learning algorithms. Yet, despite the many datasets naturally modeled as directed graphs, including citation, website, and traffic networks, the vast majority of this research focuses on undirected graphs. In this paper, we propose MagNet, a GNN for directed graphs based on a complex Hermitian matrix known as the magnetic Laplacian. This matrix encodes undirected geometric structure in the magnitude of its entries and directional information in their phase. A charge parameter attunes spectral information to variation among directed cycles. We apply our network to a variety of directed graph node classification and link prediction tasks showing that MagNet performs well on all tasks and that its performance exceeds all other methods on a majority of such tasks. The underlying principles of MagNet are such that it can be adapted to other GNN architectures.

Ranzato, M. ; Beygelzimer, A. ; Dauphin, Y. ; Liang, P.S. ; Vaughan, J. W. (Ed.)We provide theoretical guarantees for label consistency in generalized kmeans problems, with an emphasis on the overfitted case where the number of clusters used by the algorithm is more than the ground truth. We provide conditions under which the estimated labels are close to a refinement of the true cluster labels. We consider both exact and approximate recovery of the labels. Our results hold for any constantfactor approximation to the kmeans problem. The results are also modelfree and only based on bounds on the maximum or average distance of the data points to the true cluster centers. These centers themselves are loosely defined and can be taken to be any set of points for which the aforementioned distances can be controlled. We show the usefulness of the results with applications to some manifold clustering problems.

Ranzato, M. ; Beygelzimer, A. ; Dauphin, Y. ; Liang, P.S. ; Wortman Vaughan, J. (Ed.)We develop nested variational inference (NVI), a family of methods that learn proposals for nested importance samplers by minimizing an forward or reverse KL divergence at each level of nesting. NVI is applicable to many commonlyused importance sampling strategies and provides a mechanism for learning intermediate densities, which can serve as heuristics to guide the sampler. Our experiments apply NVI to (a) sample from a multimodal distribution using a learned annealing path (b) learn heuristics that approximate the likelihood of future observations in a hidden Markov model and (c) to perform amortized inference in hierarchical deep generative models. We observe that optimizing nested objectives leads to improved sample quality in terms of log average weight and effective sample size.