Large-scale graph machine learning is challenging as the complexity of learning models scales with the graph size. Subsampling the graph is a viable alternative, but sampling on graphs is nontrivial as graphs are non-Euclidean. Existing graph sampling techniques require not only computing the spectra of large matrices but also repeating these computations when the graph changes, e.g., grows. In this paper, we introduce a signal sampling theory for a type of graph limit, the graphon. We prove a Poincaré inequality for graphon signals and show that complements of node subsets satisfying this inequality are unique sampling sets for Paley-Wiener spaces of graphon signals. Exploiting connections with spectral clustering and Gaussian elimination, we prove that such sampling sets are consistent, in the sense that unique sampling sets on a convergent graph sequence converge to unique sampling sets on the graphon. We then propose a related graphon signal sampling algorithm for large graphs and demonstrate its good empirical performance on graph machine learning tasks.
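As background for the sampling-set machinery mentioned in this abstract (Paley-Wiener spaces, unique sampling sets), the following is a minimal, generic graph-signal-processing sketch: greedily pick a sampling set that keeps the sampled rows of the first k Laplacian eigenvectors well conditioned, then reconstruct a k-bandlimited signal from its samples by least squares. It illustrates the standard finite-graph setup only, not the graphon algorithm proposed in the paper; the function name and the greedy criterion are generic choices, not taken from the source.

```python
import numpy as np
import networkx as nx

def sample_and_reconstruct(G, x, k, m):
    """Generic bandlimited graph-signal sampling sketch (not the paper's method).

    G : networkx graph with nodes labeled 0..n-1
    x : length-n signal assumed (approximately) k-bandlimited
    k : bandwidth (number of low Laplacian modes)
    m : number of samples to select (m >= k)
    """
    L = nx.laplacian_matrix(G).toarray().astype(float)
    _, U = np.linalg.eigh(L)          # eigenvectors, ascending eigenvalues
    Uk = U[:, :k]                     # bandlimited basis: first k modes

    S = []                            # sampling set, built greedily
    for _ in range(m):
        best, best_score = None, -np.inf
        for v in range(len(L)):
            if v in S:
                continue
            rows = Uk[S + [v], :]
            # Score: smallest singular value of the sampled rows
            # (larger means a better-conditioned reconstruction).
            score = np.linalg.svd(rows, compute_uv=False).min()
            if score > best_score:
                best, best_score = v, score
        S.append(best)

    # Reconstruct: bandlimited coefficients that best match the samples.
    coeffs, *_ = np.linalg.lstsq(Uk[S, :], np.asarray(x)[S], rcond=None)
    return S, Uk @ coeffs
```

The greedy criterion (maximizing the smallest singular value of the sampled eigenvector rows) is one common heuristic from the graph sampling literature; other selection scores would slot into the same loop.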
Optimal Sampling Sets in Cographs
In this paper, we calculate optimal sampling sets for bandlimited signals on cographs. We exploit the tree structure of the cograph to derive closed-form characterizations of the uniqueness sets for signals of a given bandwidth. These results do not require expensive spectral decompositions and represent a promising tool for the analysis of signals on graphs that can be approximated by cographs.
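For readers unfamiliar with the term, a uniqueness set for a given bandwidth is a node subset on which no nonzero bandlimited signal vanishes, so samples on that subset determine the signal. The sketch below checks this definition the expensive, generic way (via a spectral rank test); it is only an illustration of what the closed-form results characterize, since the point of the paper is precisely to avoid this spectral decomposition for cographs.

```python
import numpy as np

def is_uniqueness_set(U, k, S):
    """Generic check (not the paper's closed-form result):
    S is a uniqueness set for the span of the first k Laplacian
    eigenvectors (columns of U) iff the sampled rows of U[:, :k]
    have rank k, i.e., no nonzero k-bandlimited signal is zero on all of S."""
    return np.linalg.matrix_rank(U[np.asarray(S), :k]) == k
```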
- Award ID(s): 1815992
- PAR ID: 10110223
- Date Published:
- Journal Name: 2019 IEEE Data Science Workshop (DSW)
- Page Range / eLocation ID: 165 to 169
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- A network may have weak signals and severe degree heterogeneity, and may be very sparse in one occurrence but very dense in another. SCORE (Ann. Statist. 43, 57–89, 2015) is a recent approach to network community detection. It accommodates severe degree heterogeneity and is adaptive to different levels of sparsity, but its performance for networks with weak signals is unclear. In this paper, we show that in a broad class of network settings that allow for weak signals, severe degree heterogeneity, and a wide range of network sparsity, SCORE achieves perfect clustering and attains the so-called "exponential rate" in Hamming clustering errors. The proof uses recent advances on entry-wise bounds for the leading eigenvectors of the network adjacency matrix. The theoretical analysis assures us that SCORE continues to work well in weak-signal settings, but it does not rule out the possibility that SCORE may be further improved to perform better in real applications, especially for networks with weak signals. As a second contribution of the paper, we propose SCORE+ as an improved version of SCORE. We investigate SCORE+ on 8 network data sets and find that it outperforms several representative approaches. In particular, for the 6 data sets with relatively strong signals, SCORE+ performs similarly to SCORE, but for the 2 data sets (Simmons, Caltech) with possibly weak signals, SCORE+ has much lower error rates. SCORE+ introduces several changes to SCORE; we carefully explain the rationale underlying each of these changes using a mixture of theoretical and numerical study. (A minimal sketch of the basic SCORE procedure appears after this list.)
- Measured spectral shifts due to intrinsic stellar variability (e.g., pulsations, granulation) and activity (e.g., spots, plages) are the largest source of error for extreme-precision radial-velocity (EPRV) exoplanet detection. Several methods are designed to disentangle stellar signals from true center-of-mass shifts due to planets. The Extreme-precision Spectrograph (EXPRES) Stellar Signals Project (ESSP) presents a self-consistent comparison of 22 different methods tested on the same extreme-precision spectroscopic data from EXPRES. Methods derived new activity indicators, constructed models for mapping an indicator to the needed radial-velocity (RV) correction, or separated out shape- and shift-driven RV components. Since no ground truth is known when using real data, relative method performance is assessed using the total and nightly scatter of returned RVs and agreement between the results of different methods. Nearly all submitted methods return a lower RV rms than classic linear decorrelation, but no method yet consistently reduces the RV rms to sub-meter-per-second levels. There is a concerning lack of agreement between the RVs returned by different methods. These results suggest that continued progress in this field requires increased interpretability of methods, high-cadence data to capture stellar signals at all timescales, and continued tests like the ESSP using consistent data sets with more advanced metrics for method performance. Future comparisons should make use of various well-characterized data sets, such as solar data or data with known injected planetary and/or stellar signals, to better understand method performance and whether planetary signals are preserved. (A sketch of the linear-decorrelation baseline appears after this list.)
- In this work, we introduce the concept of blue noise sampling, traditionally used in imaging applications, for bandlimited signals on graphs. We show how the spectral- and vertex-domain characterization of these patterns connects with existing results in the literature on the quality of sampling sets. We provide numerical evidence that these patterns are also competitive with state-of-the-art sampling techniques in terms of reconstruction error. (A vertex-domain spreading sketch appears after this list.)
- A Study of Privacy Preservation in Average Consensus Algorithm via Deterministic Obfuscation Signals. This article is a study on the use of additive obfuscation signals to keep the reference values of the agents in the continuous-time Laplacian average consensus algorithm private from eavesdroppers. Obfuscation signals are perturbations that agents add to their local dynamics and their transmitted messages to conceal their private reference values. An eavesdropper is an agent inside or outside the network that has access to some subset of the inter-agent communication messages, and its knowledge set also includes the network topology. Rather than focusing on a zero-sum, vanishing additive signal, our work determines the necessary and sufficient conditions that define the set of admissible obfuscation signals that do not perturb the convergence point of the algorithm from the average of the agents' reference values. Of theoretical interest, our results show that this class includes nonvanishing signals as well. Given this broader class of admissible obfuscation signals, we define a deterministic notion of privacy preservation: privacy is preserved for an agent if neither its private reference value nor a finite set of values to which that reference value belongs can be obtained. We then evaluate the agents' privacy against eavesdroppers with different knowledge sets. (A minimal simulation sketch appears after this list.)
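For the community-detection entry above, the basic SCORE recipe can be summarized in a few lines: take the leading eigenvectors of the adjacency matrix, divide them entry-wise by the first one to cancel degree heterogeneity, and run k-means on the ratios. This is a minimal sketch of that basic procedure under standard assumptions (K known, connected graph), not a reproduction of the SCORE+ refinements discussed in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def score_communities(A, K):
    """Basic SCORE sketch: eigenvector ratios + k-means.
    Assumes a connected graph (so the leading eigenvector has no zero entries)."""
    vals, vecs = np.linalg.eigh(A)                 # adjacency spectrum
    idx = np.argsort(np.abs(vals))[::-1][:K]       # K leading eigenvectors by |eigenvalue|
    U = vecs[:, idx]
    # Entry-wise ratios of eigenvectors 2..K to the leading one remove
    # the effect of heterogeneous node degrees.
    R = U[:, 1:] / U[:, [0]]
    T = np.log(A.shape[0])
    R = np.clip(R, -T, T)                          # threshold extreme ratios
    return KMeans(n_clusters=K, n_init=10).fit_predict(R)
```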
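For context on the "classic linear decorrelation" baseline named in the EXPRES/ESSP entry, here is a hypothetical, minimal sketch: regress the measured radial velocities on a single activity indicator and subtract the fitted trend. The single-indicator setup and argument names are assumptions for illustration, not the project's pipeline.

```python
import numpy as np

def decorrelate_rv(rv, indicator):
    """Remove the component of an RV time series linearly correlated with
    one stellar-activity indicator (illustrative baseline only)."""
    slope, intercept = np.polyfit(indicator, rv, deg=1)   # least-squares line
    cleaned = rv - (slope * indicator + intercept)        # activity-corrected residuals
    return cleaned + np.mean(rv)                          # restore the mean RV level
```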
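The blue-noise entry describes sampling patterns whose selected vertices are spread apart on the graph. A simple vertex-domain surrogate for that spreading is farthest-point selection by shortest-path distance, shown below; this is a generic illustration of the idea, not the authors' construction.

```python
import numpy as np
import networkx as nx

def spread_sampling_set(G, m, seed=0):
    """Pick m nodes that are mutually far apart (blue-noise-like spacing).
    Repeatedly adds the node farthest from the current sampling set.
    Assumes G is connected."""
    rng = np.random.default_rng(seed)
    nodes = list(G.nodes())
    S = [nodes[rng.integers(len(nodes))]]                 # random initial sample
    dist = nx.single_source_shortest_path_length(G, S[0]) # distance to the sample set
    while len(S) < m:
        v = max(dist, key=dist.get)                       # farthest node from S
        S.append(v)
        d_new = nx.single_source_shortest_path_length(G, v)
        dist = {u: min(dist[u], d_new[u]) for u in dist}  # distance to nearest sample
    return S
```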
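To make the consensus setting in the last entry concrete, the sketch below Euler-simulates Laplacian average consensus in which each agent adds an exponentially vanishing obfuscation signal to the value it transmits. Because the all-ones vector is in the left null space of the Laplacian, the running average is unaffected, so the network still converges to the average of the reference values. The vanishing-signal choice is just one simple admissible example for illustration; the paper characterizes a broader class, including non-vanishing signals.

```python
import numpy as np

def private_consensus(A, x0, steps=5000, dt=1e-3, seed=0):
    """Euler simulation of x' = -L y, where each agent transmits
    y_i = x_i + d_i(t) and d_i(t) is a decaying obfuscation signal
    (one simple admissible choice; illustrative only)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    L = np.diag(A.sum(axis=1)) - A                 # graph Laplacian
    amp = rng.normal(size=len(x))                  # per-agent obfuscation amplitudes
    for k in range(steps):
        d = amp * np.exp(-5.0 * k * dt)            # vanishing additive perturbation
        y = x + d                                  # transmitted (obfuscated) values
        x = x + dt * (-L @ y)                      # consensus update on received values
    return x                                       # approaches mean(x0)

# Example: ring of 4 agents; the final state approaches the average of x0 (= 2.0).
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
print(private_consensus(A, [3.0, -1.0, 4.0, 2.0]))
```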

