

Search for: All records

Award ID contains: 2039863


  1. Graph Neural Networks (GNNs) that are based on the message passing (MP) paradigm generally exchange information between 1-hop neighbors to build node representations at each layer. In principle, such networks are not able to capture long-range interactions (LRI) that may be desired or necessary for learning a given task on graphs. Recently, there has been increasing interest in the development of Transformer-based methods for graphs that can consider full node connectivity beyond the original sparse structure, thus enabling the modeling of LRI. However, MP-GNNs that simply rely on 1-hop message passing often fare better in several existing graph benchmarks when combined with positional feature representations, among other innovations, hence limiting the perceived utility and ranking of Transformer-like architectures. Here, we present the Long Range Graph Benchmark (LRGB) with 5 graph learning datasets: PascalVOC-SP, COCO-SP, PCQM-Contact, Peptides-func, and Peptides-struct, which arguably require LRI reasoning to achieve strong performance in a given task. We benchmark both baseline GNNs and Graph Transformer networks to verify that the models which capture long-range dependencies perform significantly better on these tasks. Therefore, these datasets are suitable for benchmarking and exploration of MP-GNNs and Graph Transformer architectures that are intended to capture LRI.
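The locality of 1-hop message passing described in the abstract above can be illustrated with a small sketch. This is not code from the benchmark: the function name `receptive_field` and the 6-node path graph are assumptions chosen for illustration. It only tracks which nodes can influence a given node after a number of layers, without any learned weights.

```python
import numpy as np

def receptive_field(adj, source, num_layers):
    """Nodes whose input features can influence `source` after
    `num_layers` rounds of 1-hop message passing (hypothetical helper)."""
    n = adj.shape[0]
    reach = np.zeros(n, dtype=int)
    reach[source] = 1
    for _ in range(num_layers):
        # One layer: every reached node also pulls in its direct neighbors.
        reach = ((reach + adj @ reach) > 0).astype(int)
    return np.flatnonzero(reach).tolist()

# Assumed toy graph: a path 0 - 1 - 2 - 3 - 4 - 5.
n = 6
path = np.zeros((n, n), dtype=int)
for i in range(n - 1):
    path[i, i + 1] = path[i + 1, i] = 1

two_layers = receptive_field(path, source=0, num_layers=2)
five_layers = receptive_field(path, source=0, num_layers=5)
print(two_layers)   # nodes within 2 hops of node 0
print(five_layers)  # all nodes of the path
```

On this path graph, node 0 needs five layers before information from node 5 can reach it, which is the kind of depth requirement that full-connectivity Transformer-style models avoid.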
  2. Graph neural networks (GNNs) are the primary tool for processing graph-structured data. Unfortunately, the most commonly used GNNs, called Message Passing Neural Networks (MPNNs), suffer from several fundamental limitations. To overcome these limitations, recent works have adapted the idea of positional encodings to graph data. This paper draws inspiration from the recent success of Laplacian-based positional encoding and defines a novel family of positional encoding schemes for graphs. We accomplish this by generalizing the optimization problem that defines the Laplace embedding to more general dissimilarity functions rather than the 2-norm used in the original formulation. This family of positional encodings is then instantiated by considering p-norms. We discuss a method for calculating these positional encoding schemes, implement it in PyTorch, and demonstrate how the resulting positional encoding captures different properties of the graph. Furthermore, we demonstrate that this novel family of positional encodings can improve the expressive power of MPNNs. Lastly, we present preliminary experimental results.
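For orientation, the standard 2-norm Laplacian positional encoding that the abstract above takes as its starting point can be sketched as follows. This is a minimal NumPy sketch, not the authors' PyTorch implementation, and it does not compute the paper's p-norm encodings. The helper names `laplacian_pe` and `edge_energy`, the unweighted connected-graph assumption, and the exact form of the p-norm energy (a sum of |x_i - x_j|^p over edges and coordinates) are all assumptions made for illustration.

```python
import numpy as np

def laplacian_pe(adj, k):
    """Standard (2-norm) Laplacian positional encoding: the k eigenvectors
    of L = D - A with the smallest nonzero eigenvalues.
    Assumes a connected, unweighted, undirected graph."""
    lap = np.diag(adj.sum(axis=1)) - adj
    eigvals, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    # Skip the trivial constant eigenvector (eigenvalue 0).
    return eigvals[1:k + 1], eigvecs[:, 1:k + 1]

def edge_energy(x, adj, p=2.0):
    """Assumed dissimilarity objective: sum over edges (i, j) and
    coordinates of |x_i - x_j|^p. For p = 2 this equals trace(X^T L X)."""
    rows, cols = np.triu_indices_from(adj, k=1)
    mask = adj[rows, cols] > 0
    diffs = x[rows[mask]] - x[cols[mask]]
    return float(np.sum(np.abs(diffs) ** p))

# Assumed toy graph: a cycle on 6 nodes.
n = 6
cycle = np.zeros((n, n))
for i in range(n):
    cycle[i, (i + 1) % n] = cycle[(i + 1) % n, i] = 1.0

vals, pe = laplacian_pe(cycle, k=2)
print(pe.shape)                       # one 2-dimensional encoding per node
print(edge_energy(pe, cycle, p=2.0))  # matches the sum of the two eigenvalues
print(edge_energy(pe, cycle, p=1.0))  # same embedding scored under p = 1
```

With p = 2 the energy of the eigenvector embedding equals the sum of the selected eigenvalues, which is the optimization view of the Laplace embedding. Changing `p` here only re-scores the same embedding; finding the minimizer for other values of p is the problem the paper addresses.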
  3. Ensemble learning, in its simplest form, entails the training of multiple models with the same training set. In a standard supervised setting, the training set can be viewed as a 'teacher' with an unbounded capacity of interactions with a single group of 'trainee' models. One can then ask the following broad question: How can we train an ensemble if the teacher has a bounded capacity of interactions with the trainees? Towards answering this question, we consider how humans learn in peer groups. The problem of how to group individuals in order to maximize outcomes via cooperative learning has been debated for a long time by social scientists and policymakers. More recently, it has attracted research attention from an algorithmic standpoint, which led to the design of grouping policies that appear to result in better aggregate learning in experiments with human subjects. Inspired by human peer learning, we hypothesize that using partially trained models as teachers to other less accurate models, i.e., viewing ensemble learning as a peer process, can provide a solution to our central question. We further hypothesize that grouping policies that match trainer models with learner models play a significant role in the overall learning outcome of the ensemble. We present a formalization and, through extensive experiments with different types of classifiers, demonstrate that (i) an ensemble can reach surprising levels of performance with little interaction with the training set, and (ii) grouping policies definitely have an impact on the ensemble performance, in agreement with previous intuition and observations in human peer learning.
  4. High-throughput technologies such as DNA microarrays and RNA-sequencing are used to measure the expression levels of large numbers of genes simultaneously. To support the extraction of biological knowledge, individual gene expression levels are transformed to Gene Co-expression Networks (GCNs). In a GCN, nodes correspond to genes, and the weight of the connection between two nodes is a measure of similarity in the expression behavior of the two genes. In general, GCN construction and analysis includes three steps: (1) calculating a similarity value for each pair of genes, (2) using these similarity values to construct a fully connected weighted network, and (3) finding clusters of genes in the network, commonly called modules. The specific implementation of these three steps can significantly impact the final output and the downstream biological analysis. GCN construction is a well-studied topic. Existing algorithms rely on relatively simple statistical and mathematical tools to implement these steps. Currently, the software package WGCNA appears to be the most widely accepted standard. We hypothesize that the raw features provided by sequencing data can be leveraged to extract modules of higher quality. A novel preprocessing step of the gene expression data set is introduced that in effect calibrates the expression levels of individual genes before computing pairwise similarities. Further, the similarity is computed as an inner product of positive vectors. In experiments, this provides a significant improvement over WGCNA, as measured by aggregate p-values of the gene ontology term enrichment of the computed modules.
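The three generic GCN steps listed in the abstract above can be sketched on synthetic data. This is a rough illustration under stated assumptions, not the paper's method and not the WGCNA pipeline: similarity is taken as absolute Pearson correlation, the weighted network uses a WGCNA-style power `beta = 6`, and modules are found by a simple stand-in (connected components above a hard cutoff) rather than hierarchical clustering. The paper's calibration step and inner-product similarity are not shown. All function names and the toy expression matrix are hypothetical.

```python
import numpy as np

def similarity_matrix(expr):
    """Step 1: absolute Pearson correlation between gene rows (assumed choice)."""
    return np.abs(np.corrcoef(expr))

def weighted_network(sim, beta=6):
    """Step 2: fully connected weighted network via a soft-threshold power."""
    return sim ** beta

def find_modules(adjacency, cutoff=0.5):
    """Step 3 (stand-in): label connected components of the graph formed by
    edges with weight >= cutoff."""
    n = adjacency.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and v != u and adjacency[u, v] >= cutoff:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels

# Synthetic expression matrix: 6 genes x 50 samples, two latent programs.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
latent = np.vstack([np.sin(t)] * 3 + [np.cos(t)] * 3)
expr = latent + 0.1 * rng.standard_normal(latent.shape)

modules = find_modules(weighted_network(similarity_matrix(expr)))
print(modules)  # genes 0-2 share one label, genes 3-5 another
```

Swapping any of the three functions changes the resulting modules, which is the sensitivity to implementation choices that the abstract points out.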
  5. Peer groups leverage the presence of knowledgeable individuals in order to increase the knowledge level of other participants. The 'smart' formation of peer groups can thus play a crucial role in educational settings, including online social networks and learning platforms. Indeed, the targeted groups formation problem, where the objective is to maximize a measure of aggregate knowledge, has received considerable attention in recent literature. In this paper, we initiate the study of a dynamic variant of the problem that, unlike previous works, allows group composition to change over time while still aiming to maximize the aggregate knowledge level. The problem is studied in a principled way, using a realistic learning gain function and two different interaction modes among the group members. On the algorithmic side, we present DyGroups, a generic algorithmic framework that is greedy in nature and highly scalable. We present non-trivial proofs to demonstrate theoretical guarantees for DyGroups in a special case. We also present real peer learning experiments with humans, and perform synthetic data experiments to demonstrate the effectiveness of our proposed solutions by comparing against multiple appropriately selected baseline algorithms.