skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, May 23 until 2:00 AM ET on Friday, May 24 due to maintenance. We apologize for the inconvenience.

Title: Fast and Accurate Spreading Process Temporal Scale Estimation
Spreading processes on graphs arise in a host of application domains, from the study of online social networks to viral marketing to epidemiology. Various discrete-time probabilistic models for spreading processes have been proposed. These are used for downstream statistical estimation and prediction problems, often involving messages or other information that is transmitted along with infections caused by the process. These models generally model cascade behavior at a small time scale but are insufficiently flexible to model cascades that exhibit intermittent behavior governed by multiple scales. We argue that the presence of such time scales that are unaccounted for by a cascade model can result in degradation of performance of models on downstream statistical and time-sensitive optimization tasks. To address these issues, we formulate a model that incorporates multiple temporal scales of cascade behavior. This model is parameterized by a \emph{clock}, which encodes the times at which sessions of cascade activity start. These sessions are themselves governed by a small-scale cascade model, such as the discretized independent cascade (IC) model. Estimation of the multiscale cascade model parameters leads to the problem of \emph{clock estimation} in terms of a natural distortion measure that we formulate. Our framework is inspired by the optimization problem posed by DiTursi et al, 2017, which can be seen as providing one possible estimator (a maximum-proxy-likelihood estimator) for the parameters of our generative model. We give a clock estimation algorithm, which we call FastClock, that runs in linear time in the size of its input and is provably statistically accurate for a broad range of model parameters when cascades are generated from any spreading process model with well-concentrated session infection set sizes and when the underlying graph is at least in the semi-sparse regime. We exemplify our algorithm for the case where the small-scale model is the discretized independent cascade process and extend substantially to processes whose infection set sizes satisfy a general martingale difference property. We further evaluate the performance of FastClock empirically in comparison to the state of the art estimator from DiTursi et al, 2017. We find that in a broad parameter range on synthetic networks and on a real network, our algorithm substantially outperforms that algorithm in terms of both running time and accuracy. In all cases, our algorithm's running time is asymptotically lower than that of the baseline.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ;
Massoulie, Laurent
Date Published:
Journal Name:
Transactions on machine learning research
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Much of our conceptual understanding of midlatitude atmospheric motion comes from two-layer quasigeostrophic (QG) models. Traditionally, these QG models do not include moisture, which accounts for an estimated 30%–60% of the available energy of the atmosphere. The atmospheric moisture content is expected to increase under global warming, and therefore, a theory for how moisture modifies atmospheric dynamics is crucial. We use a two-layer moist QG model with convective adjustment as a basis for analyzing how latent heat release and large-scale moisture gradients impact the scalings of a midlatitude system at the synoptic scale. In this model, the degree of saturation can be tuned independently of other moist parameters by enforcing a high rate of evaporation from the surface. This allows for study of the effects of latent heat release at saturation, without the intrinsic nonlinearity of precipitation. At saturation, this system is equivalent to the dry QG model under a rescaling of both length and time. This predicts that the most unstable mode shifts to smaller scales, the growth rates increase, and the inverse cascade extends to larger scales. We verify these results numerically and use them to verify a framework for the complete energetics of a moist system. We examine the spectral features of the energy transfer terms. This analysis shows that precipitation generates energy at small scales, while dry dynamics drive a significant broadening to larger scales. Cascades of energy are still observed in all terms, albeit without a clearly defined inertial range. Significance Statement The effect of moist processes, especially the impact of latent heating associated with condensation, on the size and strength of midlatitude storms is not well understood. Such insight is particularly needed in the context of global warming, as we expect moisture to play a more important role in a warmer world. In this study, we provide intuition into how including condensation can result in midlatitude storms that grow faster and have features on both larger and smaller scales than their dry counterparts. We provide a framework for quantifying these changes and verify it for the special case where it is raining everywhere. These findings can be extended to the more realistic situation where it is only raining locally. 
    more » « less
  2. Woodruff, David P. (Ed.)
    Matroids are a fundamental object of study in combinatorial optimization. Three closely related and important problems involving matroids are maximizing the size of the union of $k$ independent sets (that is, \emph{$k$-fold matroid union}), computing $k$ disjoint bases (a.k.a.\ \emph{matroid base packing}), and covering the elements by $k$ bases (a.k.a.\ \emph{matroid base covering}). These problems generalize naturally to integral and real-valued capacities on the elements. This work develops faster exact and/or approximation problems for these and some other closely related problems such as optimal reinforcement and matroid membership. We obtain improved running times both for general matroids in the independence oracle model and for the graphic matroid. The main thrust of our improvements comes from developing a faster and unifying \emph{push-relabel} algorithm for the integer-capacitated versions of these problems, building on previous work by [FM12]. We then build on this algorithm in two directions. First we develop a faster augmenting path subroutine for $k$-fold matroid union that, when appended to an approximation version of the push-relabel algorithm, gives a faster exact algorithm for some parameters of $k$. In particular we obtain a subquadratic-query running time in the uncapacitated setting for the three basic problems listed above. We also obtain faster approximation algorithms for these problems with real-valued capacities by reducing to small integral capacities via randomized rounding. To this end, we develop a new randomized rounding technique for base covering problems in matroids that may also be of independent interest. 
    more » « less
  3. We consider the problem of in-hand dexterous manipulation with a focus on unknown or uncertain hand–object parameters, such as hand configuration, object pose within hand, and contact positions. In particular, in this work we formulate a generic framework for hand–object configuration estimation using underactuated hands as an example. Owing to the passive reconfigurability and the lack of encoders in the hand’s joints, it is challenging to estimate, plan, and actively control underactuated manipulation. By modeling the grasp constraints, we present a particle filter-based framework to estimate the hand configuration. Specifically, given an arbitrary grasp, we start by sampling a set of hand configuration hypotheses and then randomly manipulate the object within the hand. While observing the object’s movements as evidence using an external camera, which is not necessarily calibrated with the hand frame, our estimator calculates the likelihood of each hypothesis to iteratively estimate the hand configuration. Once converged, the estimator is used to track the hand configuration in real time for future manipulations. Thereafter, we develop an algorithm to precisely plan and control the underactuated manipulation to move the grasped object to desired poses. In contrast to most other dexterous manipulation approaches, our framework does not require any tactile sensing or joint encoders, and can directly operate on any novel objects, without requiring a model of the object a priori. We implemented our framework on both the Yale Model O hand and the Yale T42 hand. The results show that the estimation is accurate for different objects, and that the framework can be easily adapted across different underactuated hand models. In the end, we evaluated our planning and control algorithm with handwriting tasks, and demonstrated the effectiveness of the proposed framework. 
    more » « less
  4. A newly developed three-dimensional electrostatic fluid model solving continuity and current closure equations aims to study phenomena that generate ionospheric turbulence. The model is spatially discretized using a pseudo-spectral method with full Fourier basis functions and evolved in time using a four-stage, fourth-order Runge Kutta method. The 3D numerical model is used here to investigate the behavior and evolution of ionospheric plasma clouds. This problem has historically been used to study the processes governing the evolution of the irregularities in the F region of the ionosphere. It has been shown that these artificial clouds can become unstable and structure rapidly (i.e., cascade to smaller scales transverse to the ambient magnetic field). The primary mechanism which causes this structuring of ionospheric clouds is the E×B, or the gradient drift instability (GDI). The persistence and scale sizes of the resulting structures cannot be fully explained by a two-dimensional model. Therefore, we suggest here that the inclusion of three-dimensional effects is key to a successful interpretation of mid-latitude irregularities, as well as a prerequisite for a credible simulation of these processes. We investigate the results of 2D and 3D nonlinear simulations of the GDI and secondary Kelvin–Helmholtz instability (KHI) in plasma clouds for three different regimes: highly collisional (≈200 km), collisional (≈300 km), and inertial (≈450 km). The inclusion of inertial effects permits the growth of the secondary KHI. For the three different regimes, the overall evolution of structuring of plasma cloud occurs on longer timescales in 3D simulations. The inclusion of three-dimensional effects, in particular, the ambipolar potential in the current closure equation, introduces an azimuthal “twist“ about the axis of the cloud (i.e., the magnetic field B). This azimuthal “twist” is observed in the purely collisional regime, and it causes the perturbations to have a non-flute-like character (k‖≠0). However, for the 3D inertial simulations, the cloud rapidly diffuses to a state in which the sheared azimuthal flow is substantially reduced; subsequently, the cloud becomes unstable and structures, by retaining the flute-like character of the perturbations (k‖=0).

    more » « less
  5. Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from various sources. WS is theoretically well-understood for binary classification, where simple approaches enable consistent estimation of pseudolabel noise rates. Using this result, it has been shown that downstream models trained on the pseudolabels have generalization guarantees nearly identical to those trained on clean labels. While this is exciting, users often wish to use WS for\emph {structured prediction}, where the output space consists of more than a binary or multi-class label set: eg rankings, graphs, manifolds, and more. Do the favorable theoretical properties of WS for binary classification lift to this setting? We answer this question in the affirmative for a wide range of scenarios. For labels taking values in a finite metric space, we introduce techniques new to weak supervision based on pseudo-Euclidean embeddings and tensor decompositions, providing a nearly-consistent noise rate estimator. For labels in constant-curvature Riemannian manifolds, we introduce new invariants that also yield consistent noise rate estimation. In both cases, when using the resulting pseudolabels in concert with a flexible downstream model, we obtain generalization guarantees nearly identical to those for models trained on clean data. Several of our results, which can be viewed as robustness guarantees in structured prediction with noisy labels, may be of independent interest. 
    more » « less