NSF PAR Search | NSF Public Access Repository


Contextures: Representations from Contexts

Zhai, Runtian; Yang, Kai; Varici, Burak; Tsai, Che-Ping; Kolter, Zico; Ravikumar, Pradeep (July 2025, International Conference on Machine Learning (ICML))

Despite the empirical success of foundation models, we do not have a systematic characterization of the representations that these models learn. In this paper, we establish the contexture theory. It shows that a large class of representation learning methods can be characterized as learning from the association between the input and a context variable. Specifically, we show that many popular methods aim to approximate the top-d singular functions of the expectation operator induced by the context, in which case we say that the representation learns the contexture. We demonstrate the generality of the contexture theory by proving that representation learning within various learning paradigms—supervised, self-supervised, and manifold learning—can all be studied from such a perspective. We also prove that the representations that learn the contexture are optimal on those tasks that are compatible with the context. One important implication of the contexture theory is that once the model is large enough to approximate the top singular functions, further scaling up the model size yields diminishing returns. Therefore, scaling is not all we need, and further improvement requires better contexts. To this end, we study how to evaluate the usefulness of a context without knowing the downstream tasks. We propose a metric and show by experiments that it correlates well with the actual performance of the encoder on many real datasets.
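As a rough illustration of the "top-d singular functions of the expectation operator" mentioned in the abstract, the sketch below works on a finite toy problem in which the input X and the context A each take finitely many values. Everything here is an assumption made for illustration, not the paper's code: the function name `top_singular_functions`, the joint-probability table, and the choice to return the trivial constant function at index 0 are all hypothetical. In this discrete setting the operator g ↦ E[g(A) | X] is represented by the joint table rescaled by the square roots of the two marginals, and its singular functions come from an ordinary SVD.

```python
import numpy as np

def top_singular_functions(joint, d):
    """Toy discrete sketch (hypothetical helper, not from the paper).

    joint[i, j] is proportional to P(X = x_i, A = a_j); all marginals
    must be strictly positive. Returns the top d+1 singular values and
    the matching singular functions on X, with index 0 being the
    trivial pair (singular value 1, constant function).
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()          # normalize to a probability table
    p_x = joint.sum(axis=1)              # marginal of the input X
    p_a = joint.sum(axis=0)              # marginal of the context A
    # Matrix of the expectation operator in orthonormal bases of
    # L2(P_A) and L2(P_X): P(x, a) / sqrt(P(x) P(a)).
    normalized = joint / np.sqrt(np.outer(p_x, p_a))
    u, s, _ = np.linalg.svd(normalized, full_matrices=False)
    # Rescale left singular vectors so each column is a function of x
    # with unit norm under P_X.
    funcs = u / np.sqrt(p_x)[:, None]
    return s[: d + 1], funcs[:, : d + 1]

rng = np.random.default_rng(0)
joint = rng.random((6, 4)) + 0.1         # made-up strictly positive table
values, functions = top_singular_functions(joint, d=2)
embedding = functions[:, 1:]             # d-dimensional features for each x_i
```

Dropping column 0 in the last line reflects one possible convention (the constant function carries no information about x); whether the constant is counted among the top-d functions is an assumption of this sketch.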
Free, publicly-accessible full text available July 19, 2026
Neural Network Verification with Branch-and-Bound for General Nonlinearities

Shi, Zhouxing; Jin, Qirui; Kolter, Zico; Jana, Suman; Hsieh, Cho-Jui; Zhang, Huan (May 2025, 31st International Conference on Tools and Algorithms for the Construction and Analysis of Systems)

Free, publicly-accessible full text available May 3, 2026
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression

Zhai, Runtian; Liu, Bingbin; Risteski, Andrej; Kolter, Zico; Ravikumar, Pradeep (May 2024, International Conference on Learning Representations (ICLR), 2024)

Data augmentation is critical to the empirical success of modern self-supervised representation learning, such as contrastive learning and masked language modeling. However, a theoretical understanding of the exact role of augmentation remains limited. Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator, suggesting that learning a linear probe atop such a representation can be connected to RKHS regression. Building on this insight, this work delves into a statistical analysis of augmentation-based pretraining. Starting from the isometry property, a geometric characterization of the target function given by the augmentation, we disentangle the effects of the model and the augmentation, and prove two generalization bounds that are free of model complexity. Our first bound works for an arbitrary encoder, where the prediction error is decomposed as the sum of an estimation error incurred by fitting a linear probe with RKHS regression, and an approximation error entailed by RKHS approximation. Our second bound specifically addresses the case where the encoder is near-optimal, that is, it approximates the top-d eigenspace of the RKHS induced by the augmentation. A key ingredient in our analysis is the augmentation complexity, which we use to quantitatively compare different augmentations and analyze their impact on downstream performance.
Full Text Available
Understanding Why Generalized Reweighting Does Not Improve Over ERM

Zhai, Runtian; Dan, Chen; Kolter, Zico; Ravikumar, Pradeep (May 2023, International Conference on Learning Representations)

Empirical risk minimization (ERM) is known to be non-robust in practice to distributional shift, where the training and the test distributions are different. A suite of approaches, such as importance weighting and variants of distributionally robust optimization (DRO), have been proposed to solve this problem. But a line of recent work has empirically shown that these approaches do not significantly improve over ERM in real applications with distribution shift. The goal of this work is to obtain a comprehensive theoretical understanding of this intriguing phenomenon. We first posit the class of Generalized Reweighting (GRW) algorithms, as a broad category of approaches that iteratively update model parameters based on iterative reweighting of the training samples. We show that when overparameterized models are trained under GRW, the resulting models are close to those obtained by ERM. We also show that adding small regularization which does not greatly affect the empirical training accuracy does not help. Together, our results show that a broad category of what we term GRW approaches are not able to achieve distributionally robust generalization. Our work thus has the following sobering takeaway: to make progress towards distributionally robust generalization, we either have to develop non-GRW approaches, or perhaps devise novel classification/regression loss functions that are adapted to GRW approaches.
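The claim that reweighted training of an overparameterized model ends up close to ERM can be seen in a small toy case. The sketch below is an assumption-laden illustration, not the paper's setup: it uses linear regression with more parameters than samples, squared loss, gradient descent from zero initialization, and fixed (static) sample weights, which is only a simple special case of iterative reweighting. The helper `weighted_gd` and all data are hypothetical. In this setting every gradient step stays in the row space of the data matrix, so any positive weighting converges to the same minimum-norm interpolating solution.

```python
import numpy as np

def weighted_gd(X, y, weights, steps=5000):
    """Gradient descent on 0.5 * sum_i w_i * (x_i . theta - y_i)^2,
    started from theta = 0 (hypothetical helper for illustration)."""
    theta = np.zeros(X.shape[1])
    # Step size 1 / (largest eigenvalue of the weighted Gram matrix)
    # keeps the iteration stable.
    lr = 1.0 / np.linalg.norm(X.T @ (weights[:, None] * X), 2)
    for _ in range(steps):
        residual = X @ theta - y
        theta -= lr * (X.T @ (weights * residual))
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 50))             # 5 samples, 50 parameters
y = rng.normal(size=5)

uniform = np.full(5, 0.2)                                # ERM
skewed = np.array([0.3, 0.25, 0.2, 0.15, 0.1])           # reweighted

theta_erm = weighted_gd(X, y, uniform)
theta_rw = weighted_gd(X, y, skewed)
min_norm = np.linalg.pinv(X) @ y         # minimum-norm interpolator

print("max |theta_erm - theta_rw| =", np.max(np.abs(theta_erm - theta_rw)))
```

Both runs interpolate the training data and land on the same parameter vector, so the weights change the optimization path but not the end point in this toy case.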
Full Text Available
Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes

Baharlouei, Sina; Sheikholeslami, Fatemeh; Razaviyayn, Meisam; Kolter, Zico (April 2023, PMLR)

This work concerns the development of deep networks that are certifiably robust to adversarial attacks. Joint robust classification-detection was recently introduced as a certified defense mechanism, where adversarial examples are either correctly classified or assigned to the “abstain” class. In this work, we show that such a provable framework can benefit by extension to networks with multiple explicit abstain classes, where the adversarial examples are adaptively assigned to those. We show that naïvely adding multiple abstain classes can lead to “model degeneracy”, then we propose a regularization approach and a training method to counter this degeneracy by promoting full use of the multiple abstain classes. Our experiments demonstrate that the proposed approach consistently achieves favorable standard vs. robust verified accuracy tradeoffs, outperforming state-of-the-art algorithms for various choices of number of abstain classes.
Full Text Available
An Efficient Framework for Computing Tight Lipschitz Constants of Neural Networks

Shi, Zhouxing; Wang, Yihan; Zhang, Huan; Kolter, Zico; Hsieh, Cho-Jui (January 2022, Advances in neural information processing systems)

Full Text Available
A Branch and Bound Framework for Stronger Adversarial Attacks of ReLU Networks

Zhang, Huan; Wang, Shiqi; Xu, Kaidi; Wang, Yihan; Jana, Suman; Hsieh, Cho-Jui; Kolter, Zico (January 2022, International Conference on Machine Learning (ICML))

Full Text Available
General Cutting Planes for Bound-Propagation-Based Neural Network Verification

Zhang, Huan; Wang, Shiqi; Xu, Kaidi; Li, Linyi; Li, Bo; Jana, Suman; Hsieh, Cho-Jui; Kolter, Zico (January 2022, Advances in neural information processing systems)

Full Text Available
RATT: Leveraging Unlabeled Data to Guarantee Generalization

https://doi.org/10.48550/arXiv.2105.00303

Garg, Saurabh; Balakrishnan, Sivaraman; Kolter, Zico; Lipton, Zachary (January 2021, ICML)

Full Text Available
