Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency. We introduce a simple sequence model inspired by control systems that generalizes these approaches while addressing their shortcomings. The Linear State-Space Layer (LSSL) maps a sequence u ↦ y by simply simulating a linear continuous-time state-space representation ẋ = Ax + Bu, y = Cx + Du. Theoretically, we show that LSSL models are closely related to the three aforementioned families of models and inherit their strengths. For example, they generalize convolutions to continuous time, explain common RNN heuristics, and share features of NDEs such as time-scale adaptation. We then incorporate and generalize recent theory on continuous-time memorization to introduce a trainable subset of structured matrices A that endow LSSLs with long-range memory. Empirically, stacking LSSL layers into a simple deep neural network obtains state-of-the-art results across time-series benchmarks for long dependencies in sequential image classification, real-world healthcare regression tasks, and speech. On a difficult speech classification task with length-16000 sequences, the LSSL outperforms prior approaches by 24 accuracy points, and even outperforms baselines that use hand-crafted features on 100x shorter sequences.
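To make concrete what such a layer computes, here is a minimal sketch (not the paper's implementation) that discretizes the state-space system ẋ = Ax + Bu, y = Cx + Du with the bilinear (Tustin) transform and runs it as a recurrence over a scalar input sequence; the function name and the step size `dt` are illustrative assumptions.

```python
import numpy as np

def lssl_apply(A, B, C, D, u, dt):
    """Simulate y = Cx + Du with x' = Ax + Bu on a scalar input sequence u,
    after bilinear discretization at step size dt."""
    n = A.shape[0]
    I = np.eye(n)
    # Bilinear discretization: x_{k+1} = Ad x_k + Bd u_k
    Ad = np.linalg.solve(I - (dt / 2) * A, I + (dt / 2) * A)
    Bd = np.linalg.solve(I - (dt / 2) * A, dt * B)
    x = np.zeros(n)
    ys = []
    for uk in u:
        x = Ad @ x + Bd * uk
        ys.append(float(C @ x + D * uk))
    return np.array(ys)
```

With a stable A (eigenvalues in the left half-plane) and a constant input, the output settles to the steady-state gain of the system, which is a quick sanity check on the discretization.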
Trainable Time Warping: Aligning Time-Series in the Continuous-Time Domain
DTW calculates the similarity or alignment between two signals, subject to temporal warping. However, its computational complexity grows exponentially with the number of time-series. Although algorithms have been developed that are linear in the number of time-series, they are generally quadratic in the time-series length. The exception is generalized time warping (GTW), which has linear computational cost; yet it can only identify simple time warping functions. There is a need for a new fast, high-quality multi-sequence alignment algorithm. We introduce trainable time warping (TTW), whose complexity is linear in both the number and the length of time-series. TTW performs alignment in the continuous-time domain using a sinc convolutional kernel and a gradient-based optimization technique. We compare TTW and GTW on 85 UCR datasets in time-series averaging and classification. TTW outperforms GTW on 67.1% of the datasets for the averaging tasks, and 61.2% of the datasets for the classification tasks.
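For reference, the quadratic-in-length cost that TTW sidesteps comes from the classic DTW dynamic program over all pairs of time steps; a minimal sketch for two 1-D sequences:

```python
import numpy as np

def dtw(a, b):
    """Classic dynamic-programming DTW distance between 1-D sequences a and b.
    Cost is O(len(a) * len(b)) -- the quadratic bottleneck in sequence length."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of: insertion, deletion, or match of the two steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` is 0, since warping can stretch the middle sample to align the sequences perfectly.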
Award ID(s): 1651740
Publication Date:
NSF-PAR ID: 10125069
Journal Name: International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Page Range or eLocation-ID: 3502 to 3506
Sponsoring Org: National Science Foundation
More Like this


Network embedding has become the cornerstone of a variety of mining tasks, such as classification, link prediction, clustering, anomaly detection, and many more, thanks to its superior ability to encode the intrinsic network characteristics in a compact low-dimensional space. Most of the existing methods focus on a single network and/or a single resolution, generating embeddings of different network objects (node/subgraph/network) from different networks separately. A fundamental limitation of such methods is that the intrinsic relationships across different networks (e.g., two networks sharing the same or similar subgraphs) and across different resolutions (e.g., the node-subgraph membership) are ignored, resulting in disparate embeddings. Consequently, this leads to suboptimal performance or even becomes inapplicable for some downstream mining tasks (e.g., role classification, network alignment, etc.). In this paper, we propose a unified framework, MrMine, to learn the representations of objects from multiple networks at three complementary resolutions (i.e., network, subgraph and node) simultaneously. The key idea is to construct the cross-resolution cross-network context for each object. The proposed method bears two distinctive features. First, it enables and/or boosts various multi-network downstream mining tasks by having embeddings at different resolutions from different networks in the same embedding space. Second, our method is efficient …

Many real time series datasets exhibit structural changes over time. A popular model for capturing their temporal dependence is the vector autoregression (VAR), which can accommodate structural changes through time-evolving transition matrices. The problem then becomes to estimate both the (unknown) number of structural break points and the VAR model parameters. An additional challenge emerges in the presence of very large datasets, namely how to accomplish these two objectives in a computationally efficient manner. In this article, we propose a novel procedure which leverages a block segmentation scheme (BSS) that reduces the number of model parameters to be estimated through a regularized least-squares criterion. Specifically, BSS examines appropriately defined blocks of the available data, which, when combined with a fused-lasso-based estimation criterion, leads to significant computational gains without compromising statistical accuracy in identifying the number and location of the structural breaks. This procedure is further coupled with new local and exhaustive search steps to consistently estimate the number and relative location of the break points. The procedure is scalable to big high-dimensional time series datasets with a computational complexity that can achieve O(n), where n is the length of the time series (sample …
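The per-segment estimation underlying such procedures can be illustrated with a toy example: simulate a VAR(1) whose transition matrix switches at a known break point, then recover each segment's matrix by ordinary least squares. This is a sketch of the per-block regression only, not the paper's fused-lasso procedure; all names and the break location are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_var1(A_list, seg_len, p, noise=0.1):
    """Simulate a p-dimensional VAR(1) whose transition matrix switches
    between the matrices in A_list every seg_len steps (structural breaks)."""
    x = np.zeros(p)
    out = []
    for A in A_list:
        for _ in range(seg_len):
            x = A @ x + noise * rng.standard_normal(p)
            out.append(x)
    return np.array(out)

def fit_var1(X):
    """Least-squares estimate of the transition matrix from samples X:
    regress X[t+1] on X[t], i.e. solve min_A ||Y - Z A^T||_F."""
    Y, Z = X[1:], X[:-1]
    return (np.linalg.pinv(Z) @ Y).T
```

Fitting each segment separately recovers the two different transition matrices; a break-point detection procedure has to find where to place that split without being told.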

The Sylvester equation offers a powerful and unifying primitive for a variety of important graph mining tasks, including network alignment, graph kernels, node similarity, subgraph matching, etc. A major bottleneck of the Sylvester equation lies in its high computational complexity. Despite tremendous effort, state-of-the-art methods still require a complexity that is at least quadratic in the number of nodes of the graphs, even with approximations. In this paper, we propose a family of Krylov subspace based algorithms (FASTEN) to speed up and scale up the computation of the Sylvester equation for graph mining. The key idea of the proposed methods is to project the original equivalent linear system onto a Kronecker Krylov subspace. We further exploit (1) the implicit representation of the solution matrix as well as the associated computation, and (2) the decomposition of the original Sylvester equation into a set of inter-correlated Sylvester equations of smaller size. The proposed algorithms bear two distinctive features. First, they provide the exact solutions without any approximation error. Second, they significantly reduce the time and space complexity for solving the Sylvester equation, with two of the proposed algorithms having linear complexity in both time and space. Experimental evaluations on a diverse set …
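To make the cost blow-up concrete: the naive route to AX + XB = C rewrites it as one big linear system via Kronecker products, (I ⊗ A + Bᵀ ⊗ I) vec(X) = vec(C). This is exact but scales cubically in nm, which is exactly the kind of cost Krylov-subspace methods aim to avoid. A sketch for intuition only, not the paper's algorithm:

```python
import numpy as np

def solve_sylvester_kron(A, B, C):
    """Solve A X + X B = C by vectorization:
    (I ⊗ A + B^T ⊗ I) vec(X) = vec(C), with column-major vec().
    Exact, but O((n*m)^3) -- the naive baseline, not a scalable method."""
    n, m = C.shape
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, C.flatten(order="F"))
    return x.reshape((n, m), order="F")
```

A unique solution exists whenever A and -B share no eigenvalue, so the test below shifts both spectra away from each other before solving.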

Counting homomorphisms of a constant-sized pattern graph H in an input graph G is a fundamental computational problem. There is a rich history of studying the complexity of this problem under various constraints on the input G and the pattern H. Given the significance of this problem and the large sizes of modern inputs, we investigate when near-linear-time algorithms are possible. We focus on the case when the input graph has bounded degeneracy, a commonly studied and practically relevant class for homomorphism counting. It is known from previous work that, for certain classes of H, H-homomorphisms can be counted exactly in near-linear time in bounded-degeneracy graphs. Can we precisely characterize the patterns H for which near-linear-time algorithms are possible? We completely resolve this problem, discovering a clean dichotomy using fine-grained complexity. Let m denote the number of edges in G. We prove the following: if the largest induced cycle in H has length at most 5, then there is an O(m log m) algorithm for counting H-homomorphisms in bounded-degeneracy graphs. If the largest induced cycle in H has length at least 6, then (assuming standard fine-grained complexity conjectures) there is a constant γ > 0, …
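To pin down the counted quantity: a homomorphism is a map f from V(H) to V(G) that sends every edge of H to an edge of G (it need not be injective). A brute-force counter, exponential in |V(H)| and shown only to make the definition concrete, not the near-linear algorithms discussed above:

```python
from itertools import product

def count_homomorphisms(H_edges, H_nodes, G_adj):
    """Count maps f: V(H) -> V(G) with (f(u), f(v)) an edge of G for every
    edge (u, v) of H.  G_adj is a symmetric adjacency dict for undirected G.
    Brute force over all |V(G)|^|V(H)| maps."""
    G_nodes = list(G_adj)
    count = 0
    for assignment in product(G_nodes, repeat=len(H_nodes)):
        f = dict(zip(H_nodes, assignment))
        if all(f[v] in G_adj[f[u]] for u, v in H_edges):
            count += 1
    return count
```

For instance, a single edge has six homomorphisms into a triangle (one per ordered pair of adjacent vertices), and the triangle itself also has six (the 3! injective maps).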