The graphon (W-graph), which includes the stochastic block model as a special case, has been widely used to model and analyze network data. Estimation of the graphon function has attracted considerable recent research interest. Most existing works focus on inference in the latent space of the model, while adopting simple maximum likelihood or Bayesian estimates for the graphon or connectivity parameters given the identified latent variables. In this work, we propose a hierarchical model and develop a novel empirical Bayes estimate of the connectivity matrix of a stochastic block model to approximate the graphon function. Based on our hierarchical model, we further introduce a new model selection criterion for choosing the number of communities. Numerical results on extensive simulations and two well-annotated social networks demonstrate the superiority of our approach in terms of parameter estimation and model selection.
Regularized estimation and testing for high-dimensional multi-block vector-autoregressive models.
Dynamical systems comprising multiple components that can be partitioned into distinct blocks arise in many scientific areas. A pertinent example is the interaction between financial assets and selected macroeconomic indicators, which has been studied extensively at the aggregate level (e.g., a stock index and an employment index) in the macroeconomics literature. A key shortcoming of this approach is that it ignores potential influences from other related components (e.g., Gross Domestic Product) that may affect the system’s dynamics and structure, and thus produces incorrect results. To mitigate this issue, we consider a multi-block linear dynamical system with Granger-causal ordering between blocks, wherein the blocks’ temporal dynamics are described by vector autoregressive processes and are influenced by blocks higher in the system hierarchy. We derive the maximum likelihood estimator for the posited model for Gaussian data in the high-dimensional setting, based on appropriate regularization schemes for the parameters of the block components. To optimize the underlying non-convex likelihood function, we develop an iterative algorithm with convergence guarantees. We establish theoretical properties of the maximum likelihood estimates, leveraging the decomposability of the regularizers and a careful analysis of the iterates. Finally, we develop testing procedures for the null hypothesis that one block does not “Granger-cause” another block of variables. The performance of the model and the testing procedures is evaluated on synthetic data and illustrated on a data set involving log-returns of the US S&P100 component stocks and key macroeconomic variables for the 2001–16 period.
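As a rough illustration of what a Granger-causal ordering between blocks means, the sketch below simulates a toy two-block system in which each block is a single scalar series. The upper block x evolves on its own, while the lower block z depends on its own past and on the past of x. Everything here is an assumption made for illustration: the coefficient names `a`, `b`, `c`, the helper functions, the scalar blocks, and the use of plain unregularized least squares in place of the regularized maximum likelihood estimator described in the abstract.

```python
import random

def simulate(a, b, c, n, seed=0):
    """Simulate x_t = a*x_{t-1} + u_t and z_t = b*x_{t-1} + c*z_{t-1} + v_t.

    Hypothetical toy system: x is the upper block and z the lower block,
    so x influences z but z never feeds back into x.
    """
    rng = random.Random(seed)
    x, z = [0.0], [0.0]
    for _ in range(n):
        x_prev, z_prev = x[-1], z[-1]
        x.append(a * x_prev + rng.gauss(0.0, 1.0))
        z.append(b * x_prev + c * z_prev + rng.gauss(0.0, 1.0))
    return x, z

def ols_two_regressors(y, r1, r2):
    """Least-squares fit of y on (r1, r2), no intercept, via the 2x2 normal equations."""
    s11 = sum(p * p for p in r1)
    s12 = sum(p * q for p, q in zip(r1, r2))
    s22 = sum(q * q for q in r2)
    t1 = sum(p * v for p, v in zip(r1, y))
    t2 = sum(q * v for q, v in zip(r2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

x, z = simulate(a=0.5, b=0.8, c=0.3, n=5000)

# Regress z_t on (x_{t-1}, z_{t-1}): the first coefficient estimates b,
# the cross-block effect of x on z.
b_hat, c_hat = ols_two_regressors(z[1:], x[:-1], z[:-1])

# Regress x_t on (x_{t-1}, z_{t-1}): the second coefficient should be close
# to zero because z does not Granger-cause x in this simulated system.
a_hat, feedback_hat = ols_two_regressors(x[1:], x[:-1], z[:-1])

print(f"b_hat={b_hat:.3f}  c_hat={c_hat:.3f}")
print(f"a_hat={a_hat:.3f}  feedback_hat={feedback_hat:.3f}")
```

In the high-dimensional setting of the abstract the blocks are vectors and the transition matrices are estimated with regularization; the scalar case is used here only to keep the ordering structure easy to see.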
- Award ID(s): 1632730
- PAR ID: 10074334
- Date Published:
- Journal Name: Journal of Machine Learning Research
- Volume: 18
- ISSN: 1533-7928
- Page Range / eLocation ID: 1-49
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Data-driven analysis and monitoring of complex dynamical systems have been gaining popularity for reasons such as ubiquitous sensing and advanced computational capabilities. A key rationale is that such systems are inherently high-dimensional and feature complex subsystem interactions, which render the majority of first-principles-based methods insufficient. We explore a recently proposed family of probabilistic graphical modeling techniques, called the spatiotemporal pattern network (STPN), to capture the Granger-causal relationships among observations in a dynamical system. We also show that this technique can be used for anomaly detection and root-cause analysis in real-life dynamical systems. In this context, we introduce the Granger-STPN (G-STPN), inspired by the notion of Granger causality, together with a new nonparametric technique for detecting causality among observations of dynamical systems. We experimentally validate our framework for detecting anomalies and analyzing root causes on a robotic arm platform and obtain superior results compared with previous frameworks that used other causality metrics.
-
Ottawa F-65 sand (supplied by US Silica, Ottawa, Illinois) was selected as the standard sand for LEAP-UCD-2017. Between December 2017 and February 2018, each LEAP research team sent 500 g samples of sand to UC Davis for grain size analysis and minimum and maximum dry density testing. The purpose of this testing was to confirm the consistency of the sand used at the various test sites and to provide updated minimum and maximum index density values. The variation of measured properties among the different samples is similar to the variation measured during repeat testing of the same sample. Modified LEAP procedures for measuring index densities are used to confirm the consistency of the sands, and the results from these procedures are compared with results from ASTM procedures. The LEAP procedures give repeatable results, with median index densities of ρmin = 1457 kg/m³ and ρmax = 1754 kg/m³. Relative densities calculated with facility-specific index densities varied by less than 4%, so we conclude that average index densities from all the sites may be used for analysis of the results. The LEAP procedures are easier to perform than the ASTM procedures and do not require specialized equipment; therefore, continued use of the LEAP procedures for frequent quality control is recommended. However, the values from ASTM procedures are expected to be more consistent with values adopted in the liquefaction literature in the past; therefore, we recommend using the median ASTM values for analysis of LEAP data. Index densities from ASTM procedures (ρmin = 1490.5 kg/m³, ρmax = 1757.0 kg/m³) produce relative densities that are 4–10% smaller than those obtained with the index densities from the LEAP procedures.
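To see how the choice of index densities shifts a computed relative density, the sketch below applies the standard density-based definition D_r = ρmax (ρ − ρmin) / (ρ (ρmax − ρmin)) to the two pairs of index densities quoted above. The in-place dry density of 1650 kg/m³ and the function name `relative_density` are hypothetical choices made only for illustration; they do not come from the LEAP data.

```python
def relative_density(rho, rho_min, rho_max):
    """Relative density from dry densities (all in kg/m^3), standard definition."""
    return rho_max * (rho - rho_min) / (rho * (rho_max - rho_min))

# Hypothetical in-place dry density, chosen only for illustration.
rho = 1650.0

# Median index densities quoted in the abstract.
dr_leap = relative_density(rho, rho_min=1457.0, rho_max=1754.0)
dr_astm = relative_density(rho, rho_min=1490.5, rho_max=1757.0)

print(f"LEAP index densities: D_r = {dr_leap:.3f}")
print(f"ASTM index densities: D_r = {dr_astm:.3f}")
print(f"Difference: {100 * (dr_leap - dr_astm):.1f} percentage points")
```

For this assumed density the ASTM-based value comes out about 5 percentage points lower than the LEAP-based value, which falls within the 4–10% range stated in the abstract.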
-
Testing for Granger causality relies on estimating the capacity of dynamics in one time series to forecast dynamics in another. The canonical test for such temporal predictive causality is based on fitting multivariate time series models and is cast in the classical null hypothesis testing framework. In this framework, we are limited to rejecting the null hypothesis or failing to reject it; we can never validly accept the null hypothesis of no Granger causality. This is poorly suited for many common purposes, including evidence integration, feature selection, and other cases where it is useful to express evidence against, rather than for, the existence of an association. Here we derive and implement the Bayes factor for Granger causality in a multilevel modeling framework. This Bayes factor summarizes the information in the data as a continuously scaled evidence ratio between the presence of Granger causality and its absence. We also introduce this procedure for the multilevel generalization of Granger causality testing. This facilitates inference when information is scarce or noisy, or when we are interested primarily in population-level trends. We illustrate our approach with an application exploring causal relationships in affect using a daily life study.
-
We prove the Marchenko–Pastur law for the eigenvalues of [Formula: see text] sample covariance matrices in two new situations where the data does not have independent coordinates. In the first scenario, the block-independent model, the [Formula: see text] coordinates of the data are partitioned into blocks in such a way that the entries in different blocks are independent, but the entries from the same block may be dependent. In the second scenario, the random tensor model, the data is the homogeneous random tensor of order [Formula: see text], i.e. the coordinates of the data are all [Formula: see text] different products of [Formula: see text] variables chosen from a set of [Formula: see text] independent random variables. We show that the Marchenko–Pastur law holds for the block-independent model as long as the size of the largest block is [Formula: see text], and for the random tensor model as long as [Formula: see text]. Our main technical tools are new concentration inequalities for quadratic forms in random variables with block-independent coordinates, and for random tensors.

