Graph-guided semi-supervised learning (SSL) and inference has emerged as an attractive research field thanks to its documented impact in a gamut of application domains, including transportation, power, biological, social, environmental, and financial networks. Distinct from SSL approaches that yield point estimates of the variables to be inferred, the present work puts forth a Bayesian interval learning framework that utilizes Gaussian processes (GPs) to allow for uncertainty quantification, a key component in safety-critical applications. An ensemble (E) of GPs is employed to offer an expressive model of the learning function that is updated incrementally as nodal observations become available, which also caters to delay-sensitive settings. For the first time in graph-guided SSL and inference, egonet features per node are utilized as input to the EGP learning function to account for higher-order interactions than the one-hop connectivity of each node. Further enhancing these attributes through random features that encrypt sensitive information per node offers scalability and privacy for the EGP-based learning approach. Numerical tests on real and synthetic datasets corroborate the effectiveness of the novel method.
Active Sampling over Graphs for Bayesian Reconstruction with Gaussian Ensembles
Graph-guided semi-supervised learning (SSL) has gained popularity in several network science applications, including biological, social, and financial ones. SSL becomes particularly challenging when the available nodal labels are scarce, which naturally motivates the active learning (AL) paradigm. AL seeks the most informative nodes to label in order to effectively estimate the values of unobserved nodes. Also referred to as active sampling, it boils down to learning the sought function mapping together with an acquisition function (AF) that identifies the next node(s) to sample. To learn the mapping, this work leverages an adaptive Bayesian model comprising an ensemble (E) of Gaussian processes (GPs) with enhanced expressiveness of the function space. Unlike most alternatives, the EGP model relies only on the one-hop connectivity of each node. Capitalizing on this EGP model, a suite of novel and intuitive AFs is developed to guide the active sampling process. These AFs are then combined with weights that are adapted incrementally to further robustify performance. Numerical tests on real and synthetic datasets corroborate the merits of the novel methods.
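The interplay between ensemble weights and an uncertainty-driven AF can be illustrated with a minimal sketch. It assumes, hypothetically, that each GP in the ensemble already supplies a Gaussian predictive mean and variance per node, that the ensemble weights follow a Bayes-rule update on each new label, and that the AF picks the node with the largest mixture variance. The function names and update form are illustrative assumptions, not taken from the paper.

```python
import math

def gaussian_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def update_weights(weights, means, variances, y):
    """Assumed Bayes-rule update: w_m <- w_m * p(y | model m), then normalize."""
    unnorm = [w * gaussian_pdf(y, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def mixture_moments(weights, means, variances):
    """Mean and variance of the Gaussian mixture formed by the ensemble."""
    mean = sum(w * m for w, m in zip(weights, means))
    var = sum(w * (v + (m - mean) ** 2) for w, m, v in zip(weights, means, variances))
    return mean, var

def pick_next_node(weights, node_means, node_variances):
    """Uncertainty-based AF (assumed form): node with the largest mixture variance."""
    best_node, best_var = None, -1.0
    for node in node_means:
        _, var = mixture_moments(weights, node_means[node], node_variances[node])
        if var > best_var:
            best_node, best_var = node, var
    return best_node

# Two hypothetical GP experts; the observed label y = 0.0 favors the first one.
weights = update_weights([0.5, 0.5], means=[0.0, 2.0], variances=[1.0, 1.0], y=0.0)

# Candidate nodes: the experts agree on "a" and disagree on "b".
next_node = pick_next_node(
    [0.5, 0.5],
    node_means={"a": [0.0, 0.0], "b": [0.0, 2.0]},
    node_variances={"a": [1.0, 1.0], "b": [1.0, 1.0]},
)
```

In this toy setting the mixture variance grows with both the per-model variance and the disagreement among model means, so node "b" is queried first.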
- PAR ID: 10424923
- Date Published:
- Journal Name: Asilomar Conference on Signals Systems and Computers
- Page Range / eLocation ID: 58 to 64
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The advent of diverse frequency bands in 5G networks has promoted measurement studies focused on 5G signal propagation, aiming to understand its pathloss, coverage, and channel quality characteristics. Nonetheless, conducting a thorough 5G measurement campaign is markedly laborious given the large number of 5G measurement samples that must be collected. To alleviate this burden, the present contribution leverages principled active learning (AL) methods to prudently select only a few, yet most informative, locations to collect 5G measurements. The core idea is to rely on a Gaussian process (GP) model to efficiently extrapolate 5G measurements throughout the coverage area. Specifically, an ensemble (E) of GP models is adopted that not only provides a rich learning function space, but also quantifies uncertainty and can offer accurate predictions. Building on this EGP model, a suite of acquisition functions (AFs) is advocated to query new locations on-the-fly. To account for realistic 5G measurement campaigns, the proposed AFs are augmented with a novel distance-based AL rule that selects informative samples while penalizing queries at long distances. Numerical tests on 5G data generated by the Sionna simulator and on real urban and suburban datasets showcase the merits of the novel EGP-AL approaches.
-
Labeled data can be expensive to acquire in several application domains, including medical imaging, robotics, computer vision, and wireless networks, to list a few. To efficiently train machine learning models under such high labeling costs, active learning (AL) judiciously selects the most informative data instances to label on-the-fly. This active sampling process can benefit from a statistical function model that is typically captured by a Gaussian process (GP), with well-documented merits especially in the regression task. While most GP-based AL approaches rely on a single kernel function, the present contribution advocates an ensemble of GP (EGP) models with weights adapted to the labeled data collected incrementally. Building on this novel EGP model, a suite of acquisition functions emerges based on the uncertainty and disagreement rules. An adaptively weighted ensemble of EGP-based acquisition functions is advocated to further robustify performance. Extensive tests on synthetic and real datasets in the regression task showcase the merits of the proposed EGP-based approaches with respect to the single GP-based AL alternatives.
-
Graph-guided learning has well-documented impact in a gamut of network science applications. A prototypical graph-guided learning task deals with semi-supervised learning over graphs, where the goal is to predict the nodal values or labels of unobserved nodes by leveraging a few nodal observations along with the underlying graph structure. This is particularly challenging under privacy constraints or, more generally, when acquiring nodal observations incurs high cost. In this context, the present work puts forth a Bayesian graph-driven self-supervised learning (Self-SL) approach that: (i) learns powerful nodal embeddings emanating from easier-to-solve auxiliary tasks that map local to global connectivity information; and (ii) adopts an ensemble of Gaussian processes (EGPs) with adaptive weights as nodal embeddings are processed online. Unlike most existing deterministic approaches, the novel approach offers accurate estimates of the unobserved nodal values along with uncertainty quantification, which is important especially in safety-critical applications. Numerical tests on synthetic and real graph datasets showcase the merits of the novel EGP-based Self-SL method.
-
Bayesian optimization (BO) has well-documented merits for optimizing black-box functions with an expensive evaluation cost. Such functions emerge in applications as diverse as hyperparameter tuning, drug discovery, and robotics. BO hinges on a Bayesian surrogate model to sequentially select query points so as to balance exploration with exploitation of the search space. Most existing works rely on a single Gaussian process (GP) based surrogate model, where the kernel function form is typically preselected using domain knowledge. To bypass such a design process, this paper leverages an ensemble (E) of GPs to adaptively select the surrogate model fit on-the-fly, yielding a GP mixture posterior with enhanced expressiveness for the sought function. Acquisition of the next evaluation input using this EGP-based function posterior is then enabled by Thompson sampling (TS) that requires no additional design parameters. To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model. The novel EGP-TS readily accommodates parallel operation. To further establish convergence of the proposed EGP-TS to the global optimum, analysis is conducted based on the notion of Bayesian regret for both sequential and parallel settings. Tests on synthetic functions and real-world applications showcase the merits of the proposed method.
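As a rough illustration of random feature-based kernel approximation, the sketch below uses the standard random Fourier feature construction for a one-dimensional RBF kernel. The kernel choice, the lengthscale, the number of features, and all function names are assumptions made here for illustration; the abstract does not specify them. The point is that an inner product of finite-dimensional feature vectors approximates the kernel value, which is what lets a GP be handled as a linear model in the features.

```python
import math
import random

def make_rff(num_features, lengthscale, seed=0):
    """Build a random Fourier feature map for a 1-D RBF kernel (assumed kernel)."""
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density N(0, 1/lengthscale^2).
    freqs = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(num_features)]
    # Phases are uniform on [0, 2*pi].
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(w * x + b) for w, b in zip(freqs, phases)]

    return features

def rbf_kernel(x1, x2, lengthscale):
    """Exact RBF kernel value, used here only for comparison."""
    return math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)

features = make_rff(num_features=4000, lengthscale=1.0)
# Inner product of feature vectors approximates the kernel value.
approx = sum(a * b for a, b in zip(features(0.3), features(1.1)))
exact = rbf_kernel(0.3, 1.1, 1.0)
```

The approximation error shrinks roughly as one over the square root of the number of features, so the feature count trades accuracy for cost.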