


Title: Bayesian Self-Supervised Learning Using Local and Global Graph Information
Graph-guided learning has well-documented impact in a gamut of network science applications. A prototypical graph-guided learning task is semi-supervised learning over graphs, where the goal is to predict the values or labels of unobserved nodes by leveraging a few nodal observations along with the underlying graph structure. This is particularly challenging under privacy constraints, or more generally when acquiring nodal observations incurs a high cost. In this context, the present work puts forth a Bayesian graph-driven self-supervised learning (Self-SL) approach that: (i) learns powerful nodal embeddings from easier-to-solve auxiliary tasks that map local to global connectivity information; and (ii) adopts an ensemble of Gaussian processes (EGPs) with weights that adapt as nodal embeddings are processed online. Unlike most existing deterministic approaches, the novel approach offers accurate estimates of the unobserved nodal values along with uncertainty quantification, which is especially important in safety-critical applications. Numerical tests on synthetic and real graph datasets showcase the merits of the novel EGP-based Self-SL method.
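To make the ensemble-of-GPs idea above concrete, here is a minimal sketch rather than the authors' implementation: precomputed nodal embeddings (hypothetical stand-ins for the Self-SL embeddings) feed an ensemble of scikit-learn GPs, and the ensemble weights are adapted online via each GP's predictive likelihood of each incoming nodal observation. All data, kernels, and noise levels below are assumed for illustration.

```python
# Minimal EGP sketch (assumed data and kernels, not the paper's code):
# likelihood-weighted ensemble of GPs, updated as observations arrive online.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 16))            # hypothetical nodal embeddings
y = Z[:, 0] + 0.1 * rng.normal(size=200)  # hypothetical nodal values

kernels = [RBF(length_scale=1.0), RBF(length_scale=5.0), Matern(nu=1.5)]
gps = [GaussianProcessRegressor(kernel=k, alpha=1e-2) for k in kernels]
w = np.ones(len(gps)) / len(gps)          # uniform prior ensemble weights

seen = list(range(10))                    # a few initially observed nodes
for gp in gps:
    gp.fit(Z[seen], y[seen])

for t in range(10, 60):                   # nodal observations processed one at a time
    z_t, y_t = Z[t:t + 1], y[t]
    liks = []
    for gp in gps:                        # predictive likelihood of the new observation
        mu, sd = gp.predict(z_t, return_std=True)
        liks.append(norm.pdf(y_t, loc=mu[0], scale=max(sd[0], 1e-6)))
    w = w * np.array(liks)                # Bayesian-style adaptive weight update
    w = w / w.sum()
    seen.append(t)
    for gp in gps:                        # refit; a true online scheme would use
        gp.fit(Z[seen], y[seen])          # rank-one Cholesky updates instead

# ensemble prediction with uncertainty for the remaining (unobserved) nodes
test = Z[60:]
mus, variances = [], []
for gp in gps:
    mu, sd = gp.predict(test, return_std=True)
    mus.append(mu)
    variances.append(sd ** 2)
mu_ens = sum(wi * mi for wi, mi in zip(w, mus))
var_ens = sum(wi * (vi + mi ** 2) for wi, vi, mi in zip(w, variances, mus)) - mu_ens ** 2
```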
Award ID(s):
2103256 2312547 2126052 2128593
NSF-PAR ID:
10494145
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
Journal Name:
2023 IEEE 9th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)
ISBN:
979-8-3503-4452-3
Page Range / eLocation ID:
256 to 260
Format(s):
Medium: X
Location:
Herradura, Costa Rica
Sponsoring Org:
National Science Foundation
More Like this
  1. Graph-guided semi-supervised learning (SSL) has gained popularity in several network science applications, including biological, social, and financial ones. SSL becomes particularly challenging when the available nodal labels are scarce, which naturally motivates the active learning (AL) paradigm. AL seeks the most informative nodes to label in order to effectively estimate the nodal values of unobserved nodes. Also referred to as active sampling, it boils down to learning the sought function mapping together with an acquisition function (AF) that identifies the next node(s) to sample. To learn the mapping, this work leverages an adaptive Bayesian model comprising an ensemble (E) of Gaussian processes (GPs) with enhanced expressiveness of the function space. Unlike most alternatives, the EGP model relies only on the one-hop connectivity of each node. Capitalizing on this EGP model, a suite of novel and intuitive AFs is developed to guide the active sampling process. These AFs are then combined with weights that are adapted incrementally to further robustify performance. Numerical tests on real and synthetic datasets corroborate the merits of the novel methods.
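As a hypothetical illustration of the acquisition-function idea in the item above (not the paper's AFs): candidate nodes can be scored by the predictive variance of a GP ensemble, and the most uncertain node is queried next. The gps and weights arguments are assumed to come from an EGP model such as the one sketched earlier.

```python
# Variance-based acquisition function sketch for EGP active sampling (assumed setup).
import numpy as np

def ensemble_predictive_variance(gps, weights, Z_pool):
    """Variance of the GP mixture at each candidate node embedding."""
    mus, variances = [], []
    for gp in gps:
        mu, sd = gp.predict(Z_pool, return_std=True)
        mus.append(mu)
        variances.append(sd ** 2)
    mu_ens = sum(w * m for w, m in zip(weights, mus))
    var_ens = sum(w * (v + m ** 2) for w, v, m in zip(weights, variances, mus)) - mu_ens ** 2
    return var_ens

def next_node_to_label(gps, weights, Z_pool, pool_idx):
    # query the node whose ensemble prediction is most uncertain
    scores = ensemble_predictive_variance(gps, weights, Z_pool)
    return pool_idx[int(np.argmax(scores))]
```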
  2. Few-shot node classification is tasked with providing accurate predictions for nodes from novel classes using only a few representative labeled nodes. This problem has drawn tremendous attention owing to its relevance to prevailing real-world applications, such as categorizing products for newly added commodity categories on an e-commerce platform with scarce records, or diagnosing rare diseases on a patient similarity graph. To tackle such challenging label scarcity issues in the non-Euclidean graph domain, meta-learning has become a successful and predominant paradigm. More recently, inspired by the development of graph self-supervised learning, transferring pretrained node embeddings for few-shot node classification has emerged as a promising alternative to meta-learning, but it remains largely unexplored. In this work, we empirically demonstrate the potential of an alternative framework, Transductive Linear Probing, which transfers node embeddings pretrained with graph contrastive learning methods. We further extend the setting of few-shot node classification from the standard fully supervised one to a more realistic self-supervised setting, where meta-learning methods cannot be easily deployed due to the shortage of supervision from training classes. Surprisingly, even without any ground-truth labels, transductive linear probing with self-supervised graph contrastive pretraining can outperform state-of-the-art fully supervised meta-learning-based methods under the same protocol. We hope this work can shed new light on few-shot node classification problems and foster future research on learning from scarcely labeled instances on graphs.
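A minimal sketch of linear probing on pretrained node embeddings under assumed data (not the paper's code, datasets, or pretraining pipeline): a linear classifier is fit on the embeddings of a few labeled support nodes from novel classes and evaluated on the query nodes. The embeddings below are random stand-ins for contrastively pretrained ones.

```python
# Few-shot episode with linear probing on (stand-in) pretrained node embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
emb = rng.normal(size=(90, 32))               # stand-in for pretrained node embeddings
labels = np.repeat([0, 1, 2], 30)             # stand-in novel-class labels

support = np.r_[0:3, 30:33, 60:63]            # a 3-way 3-shot support set
query = np.setdiff1d(np.arange(90), support)  # remaining nodes form the query set

probe = LogisticRegression(max_iter=1000).fit(emb[support], labels[support])
print(f"episode accuracy: {probe.score(emb[query], labels[query]):.2f}")
```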
  3. Graph-guided semi-supervised learning (SSL) and inference have emerged as an attractive research field thanks to their documented impact in a gamut of application domains, including transportation and power networks, as well as biological, social, environmental, and financial ones. Distinct from SSL approaches that yield point estimates of the variables to be inferred, the present work puts forth a Bayesian interval learning framework that utilizes Gaussian processes (GPs) to allow for uncertainty quantification, a key component in safety-critical applications. An ensemble (E) of GPs is employed to offer an expressive model of the learning function that is updated incrementally as nodal observations become available, which also caters to delay-sensitive settings. For the first time in graph-guided SSL and inference, egonet features per node are utilized as input to the EGP learning function to account for higher-order interactions beyond the one-hop connectivity of each node. Further enhancing these attributes with random features that encrypt sensitive information per node offers scalability and privacy for the EGP-based learning approach. Numerical tests on real and synthetic datasets corroborate the effectiveness of the novel method.
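The egonet-plus-random-features idea in the item above can be sketched as follows; this is an assumed illustration, not the paper's implementation. Per-node egonet statistics are mapped through random Fourier features, which approximate an RBF kernel, replace the raw per-node attributes with a randomized representation (in the spirit of the scalability/privacy point), and reduce GP regression to Bayesian linear regression in the feature space.

```python
# Egonet features + random Fourier features + Bayesian linear regression (toy setup).
import numpy as np
import networkx as nx

G = nx.barabasi_albert_graph(300, 3, seed=0)              # toy graph

def egonet_features(G, v):
    ego = nx.ego_graph(G, v, radius=1)                    # one-hop egonet of node v
    return np.array([G.degree(v), ego.number_of_edges(), ego.number_of_nodes()], float)

X = np.stack([egonet_features(G, v) for v in G.nodes])
X = (X - X.mean(0)) / (X.std(0) + 1e-9)                   # standardize egonet statistics

rng = np.random.default_rng(0)
D = 128                                                   # number of random features
W = rng.normal(size=(X.shape[1], D))                      # RBF-kernel frequencies
b = rng.uniform(0, 2 * np.pi, size=D)
Phi = np.sqrt(2.0 / D) * np.cos(X @ W + b)                # randomized nodal representations

y = X[:, 0] + 0.1 * rng.normal(size=X.shape[0])           # hypothetical nodal values
obs = np.arange(50)                                       # few observed nodes
lam, sigma2 = 1.0, 0.01
A = Phi[obs].T @ Phi[obs] / sigma2 + lam * np.eye(D)
mean_w = np.linalg.solve(A, Phi[obs].T @ y[obs] / sigma2)
mu = Phi @ mean_w                                         # posterior mean for all nodes
var = np.einsum('nd,nd->n', Phi @ np.linalg.inv(A), Phi)  # posterior variance per node
```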
  4. Graph Neural Networks (GNNs) have recently been used for node and graph classification tasks with great success, but they model dependencies among the attributes of neighboring nodes rather than dependencies among observed node labels. In this work, we consider the task of inductive node classification using GNNs in supervised and semi-supervised settings, with the goal of incorporating label dependencies. Because current GNNs are not universal (i.e., most-expressive) graph representations, we propose a general collective learning approach to increase the representation power of any existing GNN. Our framework combines ideas from collective classification with self-supervised learning, and uses a Monte Carlo approach to sampling embeddings for inductive learning across graphs. We evaluate performance on five real-world network datasets and demonstrate consistent, significant improvement in node classification accuracy for a variety of state-of-the-art GNNs.
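As a simple point of reference for the collective-learning idea above, here is a sketch of classical iterative collective classification (in the spirit of ICA), not the paper's GNN-based Monte Carlo framework, which is considerably more expressive. Graph, attributes, and labels are synthetic.

```python
# Iterative collective classification: neighbor label beliefs are appended to node
# attributes and refined over a few rounds (toy two-community graph).
import numpy as np
import networkx as nx
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
G = nx.planted_partition_graph(2, 50, 0.2, 0.02, seed=0)  # two communities, 100 nodes
A = nx.to_numpy_array(G)
y = np.array([0] * 50 + [1] * 50)
X = rng.normal(size=(100, 5)) + y[:, None]                # attributes correlated with labels

train = np.concatenate([rng.choice(50, 10, replace=False),
                        50 + rng.choice(50, 10, replace=False)])
probs = np.full((100, 2), 0.5)                            # initial label beliefs

for _ in range(5):                                        # refine beliefs collectively
    nbr = (A @ probs) / np.maximum(A.sum(1, keepdims=True), 1)  # mean neighbor belief
    feats = np.hstack([X, nbr])
    clf = LogisticRegression(max_iter=1000).fit(feats[train], y[train])
    probs = clf.predict_proba(feats)
    probs[train] = np.eye(2)[y[train]]                    # clamp observed labels

pred = probs.argmax(1)
print(f"accuracy on unobserved nodes: {(pred == y).mean():.2f}")
```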
  5. Forecasting the block maxima of a future time window is a challenging task due to the difficulty of inferring the tail distribution of a target variable. As historical observations alone may not be sufficient to train robust models for predicting block maxima, domain-driven process models, which are available in many scientific domains, are often used to supplement the observational data and improve forecast accuracy. Unfortunately, coupling the historical observations with process model outputs is a challenge due to their disparate temporal coverage. This paper presents Self-Recover, a deep learning framework that predicts the block maxima of a time window by employing self-supervised learning to address the varying temporal data coverage problem. Specifically, Self-Recover uses a combination of contrastive and generative self-supervised learning schemes along with a denoising autoencoder to impute the missing values. The framework also combines representations of the historical observations with process model outputs via a residual learning approach and learns the generalized extreme value (GEV) distribution characterizing the block maxima values. This enables the framework to reliably estimate the block maxima of each time window along with its confidence interval. Extensive experiments on real-world datasets demonstrate the superiority of Self-Recover compared to other state-of-the-art forecasting methods.

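The GEV component of the item above can be illustrated with a minimal sketch on synthetic data (not Self-Recover itself): block maxima are extracted from a series, a generalized extreme value distribution is fit with scipy, and a point forecast with an interval is read off its quantiles.

```python
# Fitting a GEV distribution to block maxima and reporting a forecast interval.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
series = rng.gumbel(loc=0.0, scale=1.0, size=365 * 10)    # hypothetical daily values
block_max = series.reshape(-1, 365).max(axis=1)           # yearly block maxima

shape, loc, scale = genextreme.fit(block_max)             # MLE of GEV parameters
median = genextreme.median(shape, loc=loc, scale=scale)   # point estimate for a future block
lo, hi = genextreme.interval(0.9, shape, loc=loc, scale=scale)  # 90% interval
print(f"block-maximum forecast ~ {median:.2f}, 90% interval [{lo:.2f}, {hi:.2f}]")
```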