skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Improvements on SCORE, Especially for Weak Signals
A network may have weak signals and severe degree heterogeneity, and may be very sparse in one occurrence but very dense in another. SCORE (Ann. Statist. 43, 57–89, 2015) is a recent approach to network community detection. It accommodates severe degree heterogeneity and is adaptive to different levels of sparsity, but its performance for networks with weak signals is unclear. In this paper, we show that in a broad class of network settings where we allow for weak signals, severe degree heterogeneity, and a wide range of network sparsity, SCORE achieves prefect clustering and has the so-called “exponential rate” in Hamming clustering errors. The proof uses the most recent advancement on entry-wise bounds for the leading eigenvectors of the network adjacency matrix. The theoretical analysis assures us that SCORE continues to work well in the weak signal settings, but it does not rule out the possibility that SCORE may be further improved to have better performance in real applications, especially for networks with weak signals. As a second contribution of the paper, we propose SCORE+ as an improved version of SCORE. We investigate SCORE+ with 8 network data sets and found that it outperforms several representative approaches. In particular, for the 6 data sets with relatively strong signals, SCORE+ has similar performance as that of SCORE, but for the 2 data sets (Simmons, Caltech) with possibly weak signals, SCORE+ has much lower error rates. SCORE+ proposes several changes to SCORE. We carefully explain the rationale underlying each of these changes, using a mixture of theoretical and numerical study.  more » « less
Award ID(s):
1943902 1925845
PAR ID:
10287983
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Sankhya A
ISSN:
0976-836X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. SCORE was introduced as a spectral approach to network community detection. Since many networks have severe degree heterogeneity, the ordinary spectral clustering (OSC) approach to community detection may perform unsatisfactorily. SCORE alleviates the effect of degree heterogeneity by introducing a new normalization idea in the spectral domain and makes OSC more effective. SCORE is easy to use and computationally fast. It adapts easily to new directions and sees an increasing interest in practice. In this paper, we review the basics of SCORE, the adaption of SCORE to network mixed membership estimation and topic modeling, and the application of SCORE in real data, including two datasets on the publications of statisticians. We also review the theoretical “ideology” underlying SCORE. We show that in the spectral domain, SCORE converts a simplicial cone to a simplex and provides a simple and direct link between the simplex and network memberships. SCORE attains an exponential rate and a sharp phase transition in community detection, and achieves optimal rates in mixed membership estimation and topic modeling. 
    more » « less
  2. Abstract In recommender systems, users rate items, and are subsequently served other product recommendations based on these ratings. Even though users usually rate a tiny percentage of the available items, the system tries to estimate unobserved preferences by finding similarities across users and across items. In this work, we treat the observed ratings data as partially observed, dense, weighted, bipartite networks. For a class of systems without outside information, we adapt an approach developed for dense, weighted networks to account for unobserved edges and the bipartite nature of the problem. The approach begins with clustering both users and items into communities, and locally estimates the patterns of ratings within each subnetwork induced by restricting attention to one community of users and one community of items community. The local fitting procedure relies on estimating local sociability parameters for every user and item, and selecting the function that determines the degree correction contours which best models the underlying data. We compare the performance of our proposed approach to existing methods on a simulated data set, as well as on a data set of joke ratings, examining model performance in both cases at differing levels of sparsity. On the joke ratings data set, our proposed model performs better than existing alternatives in relatively sparse settings, though other approaches achieve better results when more data is available. Collectively, the results indicate that despite struggling to pick up subtler signals, the proposed approach’s recovery of large scale, coarse patterns may still be useful in practical settings where high sparsity is typical. 
    more » « less
  3. Community detection, which focuses on clustering nodes or detecting communities in (mostly) a single network, is a problem of considerable practical interest and has received a great deal of attention in the research community. While being able to cluster within a network is important, there are emerging needs to be able to \emph{cluster multiple networks}. This is largely motivated by the routine collection of network data that are generated from potentially different populations. These networks may or may not have node correspondence. When node correspondence is present, we cluster networks by summarizing a network by its graphon estimate, whereas when node correspondence is not present, we propose a novel solution for clustering such networks by associating a computationally feasible feature vector to each network based on trace of powers of the adjacency matrix. We illustrate our methods using both simulated and real data sets, and theoretical justifications are provided in terms of consistency. 
    more » « less
  4. We consider the problem of multiway clustering in the presence of unknown degree​ ​heterogeneity. Such data problems arise commonly in applications such as recommendation systems, neuroimaging, community detection, and hypergraph partitions​ ​in social networks. The allowance of degree heterogeneity provides great flexibility​ ​in clustering models, but the extra complexity poses significant challenges in both​ ​statistics and computation. Here, we develop a degree-corrected tensor block model​ ​with estimation accuracy guarantees. We present the phase transition of clustering​ ​performance based on the notion of angle separability, and we characterize three​ ​signal-to-noise regimes corresponding to different statistical-computational behaviors.​ ​In particular, we demonstrate that an intrinsic statistical-to-computational gap emerges​ ​only for tensors of order three or greater.​ ​Further, we develop an efficient polynomial time algorithm that provably achieves exact​ ​clustering under mild signal conditions. The​ ​efficacy of our procedure is demonstrated​ ​through both simulations and analyses of​ ​Peru Legislation dataset. 
    more » « less
  5. Many networked systems such as electric networks, the brain, and social networks of opinion dynamics are known to obey conservation laws. Examples of this phenomenon include the Kirchoff laws in electric networks and opinion consensus in social networks. Conservation laws in networked systems are modeled as balance equations of the form X=B*Y, where the sparsity pattern of B*∈R^{p×p} captures the connectivity of the network on p nodes, and Y, X ∈ R^p are vectors of ''potentials'' and ''injected flows'' at the nodes respectively. The node potentials Y cause flows across edges which aim to balance out the potential difference, and the flows X injected at the nodes are extraneous to the network dynamics. In several practical systems, the network structure is often unknown and needs to be estimated from data to facilitate modeling, management, and control. To this end, one has access to samples of the node potentials Y, but only the statistics of the node injections X. Motivated by this important problem, we study the estimation of the sparsity structure of the matrix B* from n samples of Y under the assumption that the node injections X follow a Gaussian distribution with a known covariance Σ_X. We propose a new ℓ1-regularized maximum likelihood estimator for tackling this problem in the high-dimensional regime where the size of the network may be vastly larger than the number of samples n. We show that this optimization problem is convex in the objective and admits a unique solution. Under a new mutual incoherence condition, we establish sufficient conditions on the triple (n,p,d) for which exact sparsity recovery of B* is possible with high probability; d is the degree of the underlying graph. We also establish guarantees for the recovery of B* in the element-wise maximum, Frobenius, and operator norms. Finally, we complement these theoretical results with experimental validation of the performance of the proposed estimator on synthetic and real-world data. 
    more » « less