Large-scale graph machine learning is challenging as the complexity of learning models scales with the graph size. Subsampling the graph is a viable alternative, but sampling on graphs is nontrivial as graphs are non-Euclidean. Existing graph sampling techniques require not only computing the spectra of large matrices but also repeating these computations when the graph changes, e.g., grows. In this pa- per, we introduce a signal sampling theory for a type of graph limit—the graphon. We prove a Poincare ́ inequality for graphon signals and show that complements of node subsets satisfying this inequality are unique sampling sets for Paley-Wiener spaces of graphon signals. Exploiting connections with spectral clustering and Gaussian elimination, we prove that such sampling sets are consistent in the sense that unique sampling sets on a convergent graph sequence converge to unique sampling sets on the graphon. We then propose a related graphon signal sampling algorithm for large graphs, and demonstrate its good empirical performance on graph machine learning tasks.
more »
« less
Optimal Sampling Sets in Cographs
In this paper, we calculate the optimal sampling sets for bandlimited signals on cographs. We take into account the tree structure of the cograph to derive closed form results for the uniqueness sets of signals with a given bandwidth. These results do not require expensive spectral decompositions and represent a promising tool for the analysis of signals on graphs that can be approximated by cographs.
more »
« less
- Award ID(s):
- 1815992
- PAR ID:
- 10110223
- Date Published:
- Journal Name:
- 2019 IEEE Data Science Workshop (DSW)
- Page Range / eLocation ID:
- 165 to 169
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)A network may have weak signals and severe degree heterogeneity, and may be very sparse in one occurrence but very dense in another. SCORE (Ann. Statist. 43, 57–89, 2015) is a recent approach to network community detection. It accommodates severe degree heterogeneity and is adaptive to different levels of sparsity, but its performance for networks with weak signals is unclear. In this paper, we show that in a broad class of network settings where we allow for weak signals, severe degree heterogeneity, and a wide range of network sparsity, SCORE achieves prefect clustering and has the so-called “exponential rate” in Hamming clustering errors. The proof uses the most recent advancement on entry-wise bounds for the leading eigenvectors of the network adjacency matrix. The theoretical analysis assures us that SCORE continues to work well in the weak signal settings, but it does not rule out the possibility that SCORE may be further improved to have better performance in real applications, especially for networks with weak signals. As a second contribution of the paper, we propose SCORE+ as an improved version of SCORE. We investigate SCORE+ with 8 network data sets and found that it outperforms several representative approaches. In particular, for the 6 data sets with relatively strong signals, SCORE+ has similar performance as that of SCORE, but for the 2 data sets (Simmons, Caltech) with possibly weak signals, SCORE+ has much lower error rates. SCORE+ proposes several changes to SCORE. We carefully explain the rationale underlying each of these changes, using a mixture of theoretical and numerical study.more » « less
-
Abstract Bacteria utilize cell-cell signaling to coordinate gene expression in populations of cells. Bacterial signal exchange was originally interpreted as a mechanism bacteria use to regulate gene expression in response to changes in cell density, denoted as quorum sensing. Bacterial communication is now known to encompass the exchange of multiple chemical signals between different species of bacteria. Such signal crosstalk within communities of bacteria can have unexpected consequences. Some bacterial species even utilize more than one orthogonal signaling molecule, enabling such species to simultaneously communicate within distinct subsets of species. Such cells utilizing two sets of signals act as a bridge to link gene expression states within the community. Here, a mathematical model was implemented to investigate the consequences of multi-signal communication within heterogeneous bacterial communities. The model was inspired by simple neural networks, with nodes representing bacterial species and directed weights between nodes accounting for the impacts of inter-species signal exchange on gene expression. The activity state of such a network is defined as the gene expression state of each species within the community. Using the model, the stability of the activity states of such networks to changes in signal concentration and population size were quantified. Networks exchanging one set of signals were compared to network exchanging two orthogonal sets of signals. A multilayer neural network model was developed to analyze such networks exchanging orthogonal sets of signals. The model reveals that signal crosstalk increased the activity of the network. These networks were largely resilient to perturbation, however networks were more sensitive to perturbations of the largest population size. Bacterial species utilizing two orthogonal signals, within multilayer networks, had the potential to couple activity states of species that cannot directly communicate. These results give insight into strategies for manipulating signal exchange to predict and control gene expression within bacterial communities.more » « less
-
Abstract Measured spectral shifts due to intrinsic stellar variability (e.g., pulsations, granulation) and activity (e.g., spots, plages) are the largest source of error for extreme-precision radial-velocity (EPRV) exoplanet detection. Several methods are designed to disentangle stellar signals from true center-of-mass shifts due to planets. The Extreme-precision Spectrograph (EXPRES) Stellar Signals Project (ESSP) presents a self-consistent comparison of 22 different methods tested on the same extreme-precision spectroscopic data from EXPRES. Methods derived new activity indicators, constructed models for mapping an indicator to the needed radial-velocity (RV) correction, or separated out shape- and shift-driven RV components. Since no ground truth is known when using real data, relative method performance is assessed using the total and nightly scatter of returned RVs and agreement between the results of different methods. Nearly all submitted methods return a lower RV rms than classic linear decorrelation, but no method is yet consistently reducing the RV rms to sub-meter-per-second levels. There is a concerning lack of agreement between the RVs returned by different methods. These results suggest that continued progress in this field necessitates increased interpretability of methods, high-cadence data to capture stellar signals at all timescales, and continued tests like the ESSP using consistent data sets with more advanced metrics for method performance. Future comparisons should make use of various well-characterized data sets—such as solar data or data with known injected planetary and/or stellar signals—to better understand method performance and whether planetary signals are preserved.more » « less
-
In this work, we introduce the concept of blue noise sampling, traditionally used in imaging applications, for band limited signals on graphs. We show how the spectral and vertex domain characterization of these patterns is connected with results about the quality of the sampling sets already existing in the literature. We provide numerical evidence that shows that these patterns are also competitive with respect to the state of the art sampling techniques in terms of the reconstruction error.more » « less
An official website of the United States government

