

Search for: All records

Creators/Authors contains: "Greenewald, Kristjan"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Mixup is a popular regularization technique for training deep neural networks that improves generalization and increases robustness to certain distribution shifts. It perturbs input training data in the direction of other randomly chosen instances in the training set. To better leverage the structure of the data, we extend mixup in a simple, broadly applicable way to k-mixup, which perturbs k-batches of training points in the direction of other k-batches. The perturbation is done with displacement interpolation, i.e., interpolation under the Wasserstein metric. We demonstrate theoretically and in simulations that k-mixup preserves cluster and manifold structures, and we extend the theory studying the efficacy of standard mixup to the k-mixup case. Our empirical results show that training with k-mixup further improves generalization and robustness across several network architectures and benchmark datasets of differing modalities. For the wide variety of real datasets considered, the performance gains of k-mixup over standard mixup are similar to or larger than the gains of mixup itself over standard ERM after hyperparameter optimization. In several instances, in fact, k-mixup achieves gains in settings where standard mixup has negligible to zero improvement over ERM. (An illustrative sketch of the k-mixup perturbation appears after this list.)
    Free, publicly-accessible full text available November 14, 2024
  2. Finding multiple solutions of non-convex optimization problems is a ubiquitous yet challenging task. Most past algorithms either apply single-solution optimization methods from multiple random initial guesses or search in the vicinity of found solutions using ad hoc heuristics. We present an end-to-end method to learn the proximal operator of a family of training problems so that multiple local minima can be quickly obtained from initial guesses by iterating the learned operator, emulating the fast-converging proximal-point algorithm. The learned proximal operator can be further generalized to recover multiple optima for unseen problems at test time, enabling applications such as object detection. The key ingredient in our formulation is a proximal regularization term, which elevates the convexity of our training loss: by applying recent theoretical results, we show that for weakly convex objectives with Lipschitz gradients, training of the proximal operator converges globally with a practical degree of over-parameterization. We further present an exhaustive benchmark for multi-solution optimization to demonstrate the effectiveness of our method. (A toy sketch of training and iterating such an operator appears after this list.)
    Free, publicly-accessible full text available May 1, 2024
  3. (Duplicate record; abstract identical to item 2 above.)
  4. Optimal transport (OT) is a popular tool in machine learning to compare probability measures geometrically, but it comes with substantial computational burden. Linear programming algorithms for computing OT distances scale cubically in the size of the input, making OT impractical in the large-sample regime. We introduce a practical algorithm, which relies on a quantization step, to estimate OT distances between measures given cheap sample access. We also provide a variant of our algorithm to improve the performance of approximate solvers, focusing on those for entropy-regularized transport. We give theoretical guarantees on the benefits of this quantization step and present experiments showing that it performs well in practice, providing a practical approximation algorithm that can be used as a drop-in replacement for existing OT estimators. (A rough sketch of the quantization idea appears after this list.)
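The following is a minimal, illustrative sketch of the k-mixup perturbation summarized in record 1: two k-batches are matched under a squared-Euclidean optimal coupling and then linearly interpolated along that matching (displacement interpolation). The function name k_mixup, the Beta parameter alpha, and the NumPy/SciPy implementation are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch of the k-mixup perturbation (record 1).
# Names (k_mixup, alpha) are assumptions; the paper's code may differ.
import numpy as np
from scipy.optimize import linear_sum_assignment

def k_mixup(x1, y1, x2, y2, alpha=1.0):
    """Mix two k-batches via displacement interpolation.

    x1, x2: arrays of shape (k, d) -- two randomly drawn k-batches of inputs.
    y1, y2: arrays of shape (k, c) -- corresponding one-hot (or soft) labels.
    alpha:  Beta-distribution parameter, as in standard mixup.
    """
    # Squared-Euclidean cost between the two batches; with uniform weights
    # the optimal coupling is a permutation, so an assignment solver suffices.
    cost = np.linalg.norm(x1[:, None, :] - x2[None, :, :], axis=-1) ** 2
    _, perm = linear_sum_assignment(cost)

    # Interpolate each point toward its optimally matched partner
    # (displacement interpolation under the Wasserstein-2 metric).
    lam = np.random.beta(alpha, alpha)
    x_mix = lam * x1 + (1.0 - lam) * x2[perm]
    y_mix = lam * y1 + (1.0 - lam) * y2[perm]
    return x_mix, y_mix

if __name__ == "__main__":
    # Example usage on random data: k=8 points in 32 dimensions, 10 classes.
    rng = np.random.default_rng(0)
    x1, x2 = rng.normal(size=(8, 32)), rng.normal(size=(8, 32))
    y1 = np.eye(10)[rng.integers(10, size=8)]
    y2 = np.eye(10)[rng.integers(10, size=8)]
    xm, ym = k_mixup(x1, y1, x2, y2)
    print(xm.shape, ym.shape)  # (8, 32) (8, 10)
```

With k=1 this reduces to standard mixup; larger k lets the optimal matching respect cluster and manifold structure before interpolating.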
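Records 2 and 3 describe learning a proximal operator and iterating it to collect multiple local minima. The sketch below, a toy illustration rather than the authors' method, trains a small network with a proximal-regularization loss f(P(x)) + ||P(x) - x||^2 / (2*mu) on a hand-picked weakly convex objective and then iterates the learned operator from many random starts. The objective, architecture, and hyperparameters (mu, learning rate, iteration counts) are all assumptions for illustration.

```python
# Toy sketch of learning and iterating a proximal operator (records 2/3).
# The objective f, the network, and all hyperparameters are assumptions.
import torch
import torch.nn as nn

def f(x):
    # Toy weakly convex objective with several local minima per coordinate.
    return torch.sum(torch.sin(x) + 0.1 * x ** 2, dim=-1)

mu = 0.5  # proximal step size; 1/(2*mu) weights the proximal regularizer
net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Training: minimize the proximal-regularization loss
# f(P(x)) + ||P(x) - x||^2 / (2*mu) averaged over random query points x.
for step in range(2000):
    x = 16.0 * torch.rand(256, 2) - 8.0
    px = net(x)
    loss = (f(px) + ((px - x) ** 2).sum(-1) / (2.0 * mu)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Inference: iterate the learned operator from many random initial guesses,
# emulating the proximal-point algorithm, then keep distinct limit points
# as candidate local minima.
with torch.no_grad():
    x = 16.0 * torch.rand(512, 2) - 8.0
    for _ in range(20):
        x = net(x)
    candidates = torch.unique((x * 10).round() / 10, dim=0)
    print(candidates[:10])
```

The proximal regularizer is what keeps each training subproblem well conditioned for weakly convex f, which is the mechanism the abstract's global-convergence claim builds on.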
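Record 4 proposes estimating OT distances after a quantization step. A rough sketch of that idea is given below, assuming k-means as the quantizer and the POT library (ot) plus scikit-learn as dependencies; the paper's actual estimator, quantizer, and guarantees may differ in detail.

```python
# Rough sketch of quantized OT estimation (record 4).
# Requires: pip install pot scikit-learn
import numpy as np
import ot  # Python Optimal Transport
from sklearn.cluster import KMeans

def quantized_ot(x, y, m=50, seed=0):
    """Approximate the squared W2 distance between samples x and y
    by solving exact OT on m-point k-means quantizations."""
    km_x = KMeans(n_clusters=m, n_init=5, random_state=seed).fit(x)
    km_y = KMeans(n_clusters=m, n_init=5, random_state=seed).fit(y)

    # Weights = fraction of points assigned to each centroid.
    a = np.bincount(km_x.labels_, minlength=m).astype(float) / len(x)
    b = np.bincount(km_y.labels_, minlength=m).astype(float) / len(y)

    # The exact linear-programming OT problem is now m x m instead of n x n.
    M = ot.dist(km_x.cluster_centers_, km_y.cluster_centers_)  # squared Euclidean
    return ot.emd2(a, b, M)

# Example: two Gaussian samples of 10,000 points each in 5 dimensions.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(10_000, 5))
y = rng.normal(1.0, 1.0, size=(10_000, 5))
print(quantized_ot(x, y))  # roughly the mean-shift cost of 5 for this example
```

The cubic cost of the LP solver is paid only on the m quantized support points, which is what makes the estimator usable in the large-sample regime the abstract targets.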