NSF PAR Search | NSF Public Access Repository


The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof

Lim, Derek; Putterman, Moe; Walters, Robin; Maron, Haggai; Jegelka, Stefanie (December 2024, Neural Information Processing Systems (NeurIPS))

Full Text Available
A Poincaré Inequality and Consistency Results for Signal Sampling on Large Graphs

Le, T; Ruiz, Luana; Jegelka, Stefanie (May 2024, International Conference on Learning Representations (ICLR))

Large-scale graph machine learning is challenging as the complexity of learning models scales with the graph size. Subsampling the graph is a viable alternative, but sampling on graphs is nontrivial as graphs are non-Euclidean. Existing graph sampling techniques require not only computing the spectra of large matrices but also repeating these computations when the graph changes, e.g., grows. In this paper, we introduce a signal sampling theory for a type of graph limit, the graphon. We prove a Poincaré inequality for graphon signals and show that complements of node subsets satisfying this inequality are unique sampling sets for Paley-Wiener spaces of graphon signals. Exploiting connections with spectral clustering and Gaussian elimination, we prove that such sampling sets are consistent in the sense that unique sampling sets on a convergent graph sequence converge to unique sampling sets on the graphon. We then propose a related graphon signal sampling algorithm for large graphs, and demonstrate its good empirical performance on graph machine learning tasks.
Full Text Available
On the hardness of learning under symmetries

Kiani, Bobak; Le, Thien; Lawrence, Hannah; Jegelka, Stefanie; Weber, Melanie (May 2024, International Conference on Learning Representations (ICLR), 2024)

We study the problem of learning equivariant neural networks via gradient descent. The incorporation of known symmetries ("equivariance") into neural nets has empirically improved the performance of learning pipelines, in domains ranging from biology to computer vision. However, a rich yet separate line of learning theoretic research has demonstrated that actually learning shallow, fully-connected (i.e. non-symmetric) networks has exponential complexity in the correlational statistical query (CSQ) model, a framework encompassing gradient descent. In this work, we ask: are known problem symmetries sufficient to alleviate the fundamental hardness of learning neural nets with gradient descent? We answer this question in the negative. In particular, we give lower bounds for shallow graph neural networks, convolutional networks, invariant polynomials, and frame-averaged networks for permutation subgroups, which all scale either superpolynomially or exponentially in the relevant input dimension. Therefore, in spite of the significant inductive bias imparted via symmetry, actually learning the complete classes of functions represented by equivariant neural networks via gradient descent remains hard.
Full Text Available
Limits, approximation and size transferability for GNNs on sparse graphs via graphops

Le, Thien; Jegelka, Stefanie (December 2023, Neural Information Processing Systems (NeurIPS))

Can graph neural networks generalize to graphs that are different from the graphs they were trained on, e.g., in size? In this work, we study this question from a theoretical perspective. While recent work established such transferability and approximation results via graph limits, e.g., via graphons, these only apply nontrivially to dense graphs. To include frequently encountered sparse graphs such as bounded-degree or power law graphs, we take a perspective of taking limits of operators derived from graphs, such as the aggregation operation that makes up GNNs. This leads to the recently introduced limit notion of graphops (Backhausz and Szegedy, 2022). We demonstrate how the operator perspective allows us to develop quantitative bounds on the distance between a finite GNN and its limit on an infinite graph, as well as the distance between the GNN on graphs of different sizes that share structural properties, under a regularity assumption verified for various graph sequences. Our results hold for dense and sparse graphs, and various notions of graph limits.
Full Text Available
The Exact Sample Complexity Gain from Invariances for Kernel Regression

Tahmasebi, Behrooz; Jegelka, Stefanie (December 2023, Conference on Neural Information Processing Systems)

Full Text Available
On the Stability of Expressive Positional Encodings for Graphs

Huang, Yinan; Lu, William; Robinson, Joshua; Yang, Yu; Zhang, Muhan; Jegelka, Stefanie; Li, Pan (April 2024, Openreview)

Designing effective positional encodings for graphs is key to building powerful graph transformers and enhancing message-passing graph neural networks. Although widespread, using Laplacian eigenvectors as positional encodings faces two fundamental challenges: (1) Non-uniqueness: there are many different eigendecompositions of the same Laplacian, and (2) Instability: small perturbations to the Laplacian could result in completely different eigenspaces, leading to unpredictable changes in positional encoding. Despite many attempts to address non-uniqueness, most methods overlook stability, leading to poor generalization on unseen graph structures. We identify the cause of instability to be a "hard partition" of eigenspaces. Hence, we introduce Stable and Expressive Positional Encodings (SPE), an architecture for processing eigenvectors that uses eigenvalues to "softly partition" eigenspaces. SPE is the first architecture that is (1) provably stable, and (2) universally expressive for basis invariant functions whilst respecting all symmetries of eigenvectors. Besides guaranteed stability, we prove that SPE is at least as expressive as existing methods, and highly capable of counting graph structures. Finally, we evaluate the effectiveness of our method on molecular property prediction, and out-of-distribution generalization tasks, finding improved generalization compared to existing positional encoding methods.
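The non-uniqueness issue named in this abstract can be illustrated with a small sketch. It is not the SPE architecture; it only shows, for an assumed toy graph (the triangle K3) and hypothetical helper names (`is_eigvec`, `projector`), that two different orthonormal bases span the same Laplacian eigenspace while the projector onto that eigenspace is identical for both, which is one example of a basis-invariant quantity.

```python
import math

# Graph Laplacian of the triangle graph K3 (toy example, not from the paper).
# Its eigenvalue 3 has multiplicity 2, so the eigenbasis is not unique.
LAPLACIAN = [
    [2.0, -1.0, -1.0],
    [-1.0, 2.0, -1.0],
    [-1.0, -1.0, 2.0],
]


def matvec(matrix, vec):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def is_eigvec(matrix, vec, eigenvalue, tol=1e-9):
    """Check that matrix @ vec equals eigenvalue * vec up to tol."""
    return all(abs(a - eigenvalue * b) < tol
               for a, b in zip(matvec(matrix, vec), vec))


def projector(basis):
    """Sum of outer products v v^T over an orthonormal basis."""
    n = len(basis[0])
    return [[sum(v[i] * v[j] for v in basis) for j in range(n)]
            for i in range(n)]


# One orthonormal basis of the eigenspace for eigenvalue 3.
s2, s6 = math.sqrt(2.0), math.sqrt(6.0)
u1 = [1 / s2, -1 / s2, 0.0]
u2 = [1 / s6, 1 / s6, -2 / s6]
basis_a = [u1, u2]

# A second basis: rotate within the eigenspace and flip one sign.
theta = 0.7  # arbitrary angle, chosen only for illustration
c, s = math.cos(theta), math.sin(theta)
w1 = [-(c * a + s * b) for a, b in zip(u1, u2)]
w2 = [-s * a + c * b for a, b in zip(u1, u2)]
basis_b = [w1, w2]

proj_a = projector(basis_a)
proj_b = projector(basis_b)
max_diff = max(abs(proj_a[i][j] - proj_b[i][j])
               for i in range(3) for j in range(3))

print("both bases are eigenvectors:",
      all(is_eigvec(LAPLACIAN, v, 3.0) for v in basis_a + basis_b))
print("largest projector difference:", max_diff)
```

A model that consumes `basis_a` directly would see different inputs for `basis_b` even though the graph is the same, whereas anything computed from the projector would not change.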
Full Text Available
Adaptive Generalization and Optimization of Three-Layer Neural Networks

Gatmiry, Khashayar; Jegelka, Stefanie; Kelner, Jonathan (January 2022, The Tenth International Conference on Learning Representations (ICLR))

Full Text Available
Scaling up Continuous-Time Markov Chains Helps Resolve Underspecification

Gotovos, Alkis; Burkholz, Rebekka; Quackenbush, John; Jegelka, Stefanie (December 2021, Advances in neural information processing systems)

Modeling the time evolution of discrete sets of items (e.g., genetic mutations) is a fundamental problem in many biomedical applications. We approach this problem through the lens of continuous-time Markov chains, and show that the resulting learning task is generally underspecified in the usual setting of cross-sectional data. We explore a perhaps surprising remedy: including a number of additional independent items can help determine time order, and hence resolve underspecification. This is in sharp contrast to the common practice of limiting the analysis to a small subset of relevant items, which is followed largely due to poor scaling of existing methods. To put our theoretical insight into practice, we develop an approximate likelihood maximization method for learning continuous-time Markov chains, which can scale to hundreds of items and is orders of magnitude faster than previous methods. We demonstrate the effectiveness of our approach on synthetic and real cancer data.
Full Text Available
Can contrastive learning avoid shortcut solutions?

Robinson, Joshua; Sun, Li; Yu, Ke; Batmanghelich, Kayhan; Jegelka, Stefanie; Sra, Suvrit (December 2021, Advances in neural information processing systems)

The generalization of representations learned via contrastive learning depends crucially on what features of the data are extracted. However, we observe that the contrastive loss does not always sufficiently guide which features are extracted, a behavior that can negatively impact the performance on downstream tasks via “shortcuts”, i.e., by inadvertently suppressing important predictive features. We find that feature extraction is influenced by the difficulty of the so-called instance discrimination task (i.e., the task of discriminating pairs of similar points from pairs of dissimilar ones). Although harder pairs improve the representation of some features, the improvement comes at the cost of suppressing previously well represented features. In response, we propose implicit feature modification (IFM), a method for altering positive and negative samples in order to guide contrastive models towards capturing a wider variety of predictive features. Empirically, we observe that IFM reduces feature suppression, and as a result improves performance on vision and medical imaging tasks. The code is available at: https://github.com/joshr17/IFM.
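As a rough illustration of the instance discrimination task described in this abstract, the sketch below computes an InfoNCE-style contrastive loss for one anchor. The abstract does not name the exact objective, so the loss form, the `temperature` value, the toy 2-D embeddings, and the function names are all assumptions, and this is not the IFM method itself. It only shows that negatives closer to the anchor ("harder" pairs) make the task more difficult, as reflected in a larger loss.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for one anchor (assumed form, for illustration)."""
    pos = math.exp(dot(anchor, positive) / temperature)
    neg = sum(math.exp(dot(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy unit-norm embeddings (hypothetical values).
anchor = normalize([1.0, 0.0])
positive = normalize([0.9, 0.1])

# Easy negatives point away from the anchor; hard ones lie close to it.
easy_negatives = [normalize([-1.0, 0.0]), normalize([0.0, -1.0])]
hard_negatives = [normalize([0.8, 0.6]), normalize([0.8, -0.6])]

loss_easy = info_nce(anchor, positive, easy_negatives)
loss_hard = info_nce(anchor, positive, hard_negatives)

print("loss with easy negatives:", loss_easy)
print("loss with hard negatives:", loss_hard)
```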
Full Text Available
What training reveals about neural network complexity

Loukas, Andreas; Poiitis, Marinos; Jegelka, Stefanie (January 2021, 35th Conference on Neural Information Processing Systems (NeurIPS 2021))

Full Text Available
