Title: Network global testing by counting graphlets
Consider a large social network with possibly severe degree heterogeneity and mixed memberships. We are interested in testing whether the network has only one community or more than one. The problem is known to be non-trivial, partly because of the severe degree heterogeneity. We construct a class of test statistics using the numbers of short paths and short cycles; the key to our approach is a general framework for canceling the effects of degree heterogeneity. The tests compare favorably with existing methods. We support our methods with careful analysis and a numerical study using simulated data and a real data example.
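As a rough illustration of the raw ingredients mentioned in the abstract, the following minimal sketch counts edges, length-2 paths, and 3-cycles (triangles) in a small undirected graph. It is not the paper's test statistic: the step that cancels degree heterogeneity is not reproduced here, and the function name `count_short_paths_and_cycles` and the toy graph are hypothetical choices made only for this example.

```python
from itertools import combinations


def count_short_paths_and_cycles(adj):
    """Count edges, length-2 paths, and triangles in an undirected graph.

    adj is assumed to be a symmetric 0/1 adjacency matrix (list of lists)
    with a zero diagonal. This is an illustrative helper, not the
    statistic defined in the paper.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]

    # Each edge is counted once from each endpoint.
    edges = sum(degrees) // 2

    # A length-2 path is a pair of distinct neighbors of a center node.
    # Paths that lie inside a triangle are included (not induced paths).
    two_paths = sum(d * (d - 1) // 2 for d in degrees)

    # A triangle is an unordered triple of mutually adjacent nodes.
    triangles = sum(
        1
        for i, j, k in combinations(range(n), 3)
        if adj[i][j] and adj[j][k] and adj[i][k]
    )
    return {"edges": edges, "two_paths": two_paths, "triangles": triangles}


# Toy graph: a triangle on nodes 0, 1, 2 plus a pendant node 3 attached to 2.
toy_adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
counts = count_short_paths_and_cycles(toy_adj)
```

The triple loop is cubic in the number of nodes, which is fine for a toy graph; for a large network one would presumably rely on sparse matrix products instead.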
Award ID(s):
1925845 1712958
PAR ID:
10289815
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
80
ISSN:
2640-3498
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A network may have weak signals and severe degree heterogeneity, and may be very sparse in one occurrence but very dense in another. SCORE (Ann. Statist. 43, 57–89, 2015) is a recent approach to network community detection. It accommodates severe degree heterogeneity and is adaptive to different levels of sparsity, but its performance for networks with weak signals is unclear. In this paper, we show that in a broad class of network settings where we allow for weak signals, severe degree heterogeneity, and a wide range of network sparsity, SCORE achieves perfect clustering and has the so-called “exponential rate” in Hamming clustering errors. The proof uses the most recent advances in entry-wise bounds for the leading eigenvectors of the network adjacency matrix. The theoretical analysis assures us that SCORE continues to work well in weak-signal settings, but it does not rule out the possibility that SCORE can be further improved to perform better in real applications, especially for networks with weak signals. As a second contribution of the paper, we propose SCORE+ as an improved version of SCORE. We investigate SCORE+ with 8 network data sets and find that it outperforms several representative approaches. In particular, for the 6 data sets with relatively strong signals, SCORE+ performs similarly to SCORE, but for the 2 data sets (Simmons, Caltech) with possibly weak signals, SCORE+ has much lower error rates. SCORE+ makes several changes to SCORE. We carefully explain the rationale underlying each of these changes, using a mixture of theoretical and numerical study.
  2. SCORE was introduced as a spectral approach to network community detection. Since many networks have severe degree heterogeneity, the ordinary spectral clustering (OSC) approach to community detection may perform unsatisfactorily. SCORE alleviates the effect of degree heterogeneity by introducing a new normalization idea in the spectral domain and makes OSC more effective. SCORE is easy to use and computationally fast. It adapts easily to new directions and is seeing increasing interest in practice. In this paper, we review the basics of SCORE, the adaptation of SCORE to network mixed membership estimation and topic modeling, and the application of SCORE to real data, including two datasets on the publications of statisticians. We also review the theoretical “ideology” underlying SCORE. We show that in the spectral domain, SCORE converts a simplicial cone to a simplex and provides a simple and direct link between the simplex and network memberships. SCORE attains an exponential rate and a sharp phase transition in community detection, and achieves optimal rates in mixed membership estimation and topic modeling.
  3. Kovács, Ákos T. (Ed.)
    ABSTRACT In Bacillus subtilis, the master regulator Spo0A controls several cell-differentiation pathways. Under moderate starvation, phosphorylated Spo0A (Spo0A~P) induces biofilm formation by indirectly activating genes controlling matrix production in a subpopulation of cells via an SinI-SinR-SlrR network. Under severe starvation, Spo0A~P induces sporulation by directly and indirectly regulating sporulation gene expression. However, what determines the heterogeneity of individual cell fates is not fully understood. In particular, it is still unclear why, despite being controlled by a single master regulator, biofilm matrix production and sporulation seem mutually exclusive at the single-cell level. In this work, using mathematical modeling, we show that fluctuations in the growth rate and intrinsic noise amplified by the bistability in the SinI-SinR-SlrR network could explain the single-cell distribution of matrix production. Moreover, we predict an incoherent feed-forward loop: the decrease in the cellular growth rate first activates matrix production by increasing the Spo0A phosphorylation level but then represses it by changing the relative concentrations of SinR and SlrR. Experimental data provide evidence to support the model predictions. In particular, we demonstrate how the degree to which matrix production and sporulation appear mutually exclusive is affected by genetic perturbations. IMPORTANCE The mechanisms of cell-fate decisions are fundamental to our understanding of multicellular organisms and bacterial communities. However, even for the best-studied model systems we still lack a complete picture of how phenotypic heterogeneity of genetically identical cells is controlled. Here, using B. subtilis as a model system, we employ a combination of mathematical modeling and experiments to explain the population-level dynamics and single-cell-level heterogeneity of matrix gene expression. The results demonstrate how the two cell fates, biofilm matrix production and sporulation, can appear mutually exclusive without explicitly inhibiting one another. Such a mechanism could be used in a wide range of other biological systems.
  4. Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices). However, the data distribution among clients is often non-IID in nature, making efficient optimization difficult. To alleviate this issue, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by introducing a variety of proximal terms, some incurring considerable compute and/or memory overheads, to restrain local updates with respect to the global model. Instead, we consider rethinking solutions to data heterogeneity in FL with a focus on local learning generality rather than proximal restriction. To this end, we first present a systematic study informed by second-order indicators to better understand algorithm effectiveness in FL. Interestingly, we find that standard regularization methods are surprisingly strong performers in mitigating data heterogeneity effects. Based on our findings, we further propose a simple and effective method, FedAlign, to overcome data heterogeneity and the pitfalls of previous methods. FedAlign achieves competitive accuracy with state-of-the-art FL methods across a variety of settings while minimizing computation and memory overhead. Code is available at https://github.com/mmendiet/FedAlign. 
  5. Abstract This article uses administrative tax data to estimate top wealth in the United States. We assemble new data that link people to their sources of capital income and develop new methods to estimate the degree of return heterogeneity within asset classes. Disaggregated fixed-income data reveal that rich individuals earn much more of their interest income in higher-yielding forms and have much greater exposure to credit risk. Consequently, in recent years, the interest rate on fixed income at the top is approximately 3.5 times higher than the average. We value the population of U.S. firms using firm-level characteristics and apportion this wealth using firm-owner links. We combine this new data on fixed income and pass-through business returns with refined estimates of C-corporation equity, housing, and pension wealth to deliver new capitalized wealth estimates that build upon the methods of Saez and Zucman (2016a). From 1989 to 2016, the top 1%, 0.1%, and 0.01% wealth shares increased by 6.6, 4.6, and 2.9 percentage points, respectively, to 33.7%, 15.7%, and 7.1%. Overall, although we estimate a large degree of return heterogeneity, accounting for this heterogeneity does not change the fundamental story for top wealth shares and their growth—wealth inequality is high and has risen substantially over recent decades. 