

Title: Harvesting random embedding for high-frequency change-point detection in temporal complex systems
Abstract

Recent investigations have revealed that the dynamics of complex networks and systems depend crucially on their temporal structures. Accurately detecting the time instant at which a system changes its internal structure has therefore become an important task, one that helps in understanding the underlying mechanisms of evolving systems and in modeling and predicting their dynamics. In real-world applications, the explicit equations of an evolving system are usually unknown, so an open challenge is to develop a practical, model-free method that accomplishes this task using only the time-series data recorded from the system. Here, we develop such a model-free approach, named temporal change-point detection (TCD), which integrates dynamical and statistical methods to address this challenge. The proposed TCD approach, based on exploiting the spatial information in observed high-dimensional time series, is able not only to detect separate change points without any a priori knowledge of the system's equations, but also to harvest all change points that emerge at relatively high frequency, which existing methods and techniques cannot achieve directly. Its practical effectiveness is demonstrated comprehensively using data from representative complex dynamics and from real-world systems ranging from biology to geology and social science.
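To make the problem setting concrete, the following is a minimal sketch of generic model-free change-point detection on a high-dimensional time series. It is not the TCD method, whose details the abstract does not give. It swaps in a plain sliding-window mean-shift statistic, and the function names, the window length, and the threshold are all illustrative assumptions.

```python
import numpy as np

def change_scores(x, window):
    """Score each time index by the standardized difference between the
    mean of the next `window` samples and the mean of the previous ones,
    combined across all dimensions. (Generic statistic, NOT the TCD method.)"""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    scores = np.zeros(n)
    for t in range(window, n - window + 1):
        left, right = x[t - window:t], x[t:t + window]
        # Pooled per-dimension standard deviation; small constant avoids 0/0.
        pooled = np.sqrt(0.5 * (left.var(axis=0) + right.var(axis=0))) + 1e-12
        scores[t] = np.linalg.norm((right.mean(axis=0) - left.mean(axis=0)) / pooled)
    return scores

def detect_change_points(x, window, threshold):
    """Return indices whose score exceeds `threshold`, keeping only the
    strongest index within any span of `window` samples."""
    scores = change_scores(x, window)
    candidates = np.flatnonzero(scores > threshold)
    strongest_first = candidates[np.argsort(scores[candidates])[::-1]]
    points = []
    for t in strongest_first:
        if all(abs(int(t) - p) >= window for p in points):
            points.append(int(t))
    return sorted(points)

# Synthetic example (assumed data): 10-dimensional noise whose mean shifts
# up at t = 200 and back down at t = 400.
rng = np.random.default_rng(0)
data = rng.standard_normal((600, 10))
data[200:400] += 3.0
cps = detect_change_points(data, window=50, threshold=5.0)
print(cps)
```

A statistic of this kind needs a full window on each side of a change, so it cannot separate change points closer together than the window length. The abstract states that existing methods cannot directly handle change points emerging at relatively high frequency, which is the regime TCD targets.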

 
NSF-PAR ID:
10368382
Publisher / Repository:
Oxford University Press
Journal Name:
National Science Review
Volume:
9
Issue:
4
ISSN:
2095-5138
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Hydropower is the largest renewable energy source for electricity generation in the world, with numerous benefits in terms of environmental protection (near-zero air pollution and climate impact), cost-effectiveness (long-term use without significant impact from market fluctuations), and reliability (quick response to surges in demand). However, the effectiveness of hydropower plants is affected by multiple factors such as reservoir capacity, rainfall, temperature, and fluctuating electricity demand, and particularly by the complicated relationships among them, which makes predicting or recommending station operational output a difficult challenge. In this paper, we present DeepHydro, a novel stochastic method for modeling multivariate time series (e.g., water inflow/outflow and temperature) and forecasting the power generation of hydropower stations. DeepHydro captures temporal dependencies in co-evolving time series with a new conditioned latent recurrent neural network, which not only considers the hidden states of observations but also preserves the uncertainty of latent variables. We introduce a generative network parameterized on a continuous normalizing flow to approximate the complex posterior distribution of multivariate time series data, and further use neural ordinary differential equations to estimate the continuous-time dynamics of the latent variables constituting the observable data. This allows our model to handle discrete observations in the context of continuous dynamic systems while remaining robust to noise. We conduct extensive experiments on real-world datasets from a large power generation company consisting of cascade hydropower stations. The experimental results demonstrate that the proposed method effectively predicts power production and significantly outperforms the candidate baseline approaches.
  2. Abstract

    Transcriptome studies that provide temporal information about transcript abundance facilitate identification of gene regulatory networks (GRNs). Inferring GRNs from time series data using computational modeling remains a central challenge in systems biology. Commonly employed clustering algorithms identify modules of like-responding genes but do not provide information on how these modules are interconnected. These methods also require users to specify parameters such as cluster number and size, adding complexity to the analysis. To address these challenges, we used a recently developed algorithm, partitioned local depth (PaLD), to generate cohesive networks for 4 time series transcriptome datasets (3 hormone and 1 abiotic stress dataset) from the model plant Arabidopsis thaliana. PaLD provided a cohesive network representation of the data, revealing networks with distinct structures and varying numbers of connections between transcripts. We utilized the networks to make predictions about GRNs by examining local neighborhoods of transcripts with highly similar temporal responses. We also partitioned the networks into groups of like-responding transcripts and identified enriched functional and regulatory features in them. Comparison of groups to clusters generated by commonly used approaches indicated that these methods identified modules of transcripts that have similar temporal and biological features, but also identified unique groups, suggesting that a PaLD-based approach (supplemented with a community detection algorithm) can complement existing methods. These results revealed that PaLD could sort like-responding transcripts into biologically meaningful neighborhoods and groups while requiring minimal user input and producing cohesive network structure, offering an additional tool to the systems biology community to predict GRNs.

     
  3. A plethora of complex dynamical systems, from disordered media to biological systems, exhibit mathematical characteristics (e.g., long-range dependence, self-similarity, and power-law magnitude increments) that are well fitted by fractional partial differential equations (PDEs). For instance, some biological systems display anomalous diffusion, which is characterized by a non-linear mean-square displacement relation and can be described mathematically by fractional PDEs. In general, PDEs represent the physical laws or rules governing complex dynamical systems. Since prior knowledge of the mathematical equations describing complex dynamical systems in biology, healthcare, disaster mitigation, transportation, or environmental sciences may not be available, we aim to provide algorithmic strategies to discover the integer or fractional PDEs and their parameters from the system's evolution data. Toward deciphering the non-trivial mechanisms driving a complex system, we propose a data-driven approach that estimates the parameters of a fractional PDE model. We study the space-time fractional diffusion model that describes a complex stochastic process in which the magnitude and the time increments are stable processes. Starting from limited time-series data recorded while the system is evolving, we develop a fractional-order moments-based approach to determine the parameters of a generalized fractional PDE. We formulate two optimization problems that allow us to estimate the arguments of the fractional PDE. Through extensive simulation studies, we show that the proposed approach is effective at retrieving the relevant parameters of the space-time fractional PDE. The presented mathematical approach can be further enhanced and generalized to include additional operators that may help to identify the dominant rule governing the measurements or to determine the degree to which multiple physical laws contribute to the observed dynamics.
  4. Abstract

    Marine megafauna are difficult to observe and count because many species travel widely and spend large amounts of time submerged. As such, management programmes seeking to conserve these species are often hampered by limited information about population levels.

    Unoccupied aircraft systems (UAS, aka drones) provide a potentially useful technique for assessing marine animal populations, but a central challenge lies in analysing the vast amounts of data generated in the images or video acquired during each flight. Neural networks are emerging as a powerful tool for automating object detection across data domains and can be applied to UAS imagery to generate new population‐level insights. To explore the utility of these emerging technologies in a challenging field setting, we used neural networks to enumerate olive ridley turtles (Lepidochelys olivacea) in drone images acquired during a mass‐nesting event on the coast of Ostional, Costa Rica.

    Results revealed substantial promise for this approach; specifically, our model detected 8% more turtles than manual counts while effectively reducing the manual validation burden from 2,971,554 to 44,822 image windows. Our detection pipeline was trained on a relatively small set of turtle examples (N = 944), implying that this method can be easily bootstrapped for other applications, and is practical with real‐world UAS datasets.

    Our findings highlight the feasibility of combining UAS and neural networks to estimate population levels of diverse marine animals and suggest that the automation inherent in these techniques will soon permit monitoring over spatial and temporal scales that would previously have been impractical.

     
  5. Knowledge discovery and information extraction from large and complex datasets have attracted great attention in wide-ranging areas from statistics and biology to medicine. Tools from machine learning, data mining, and neurocomputing have been extensively explored and utilized to accomplish such data analytics tasks. However, for time-series data with active dynamic characteristics, many state-of-the-art techniques may not perform well in capturing the inherent temporal structures in these data. In this paper, integrating the Koopman operator and linear dynamical systems theory with support vector machines (SVMs), we develop a novel dynamic data mining framework to construct low-dimensional linear models that approximate the nonlinear flow of high-dimensional time-series data generated by unknown nonlinear dynamical systems. This framework immediately enables pattern recognition, e.g., classification, of complex time-series data to distinguish their dynamic behaviors by using the trajectories generated by the reduced linear systems. Moreover, we demonstrate the applicability and efficiency of this framework through time-series classification problems in bioinformatics and healthcare, including cognitive classification with fMRI data and seizure detection with EEG data. The developed Koopman dynamic learning framework lays a solid foundation for effective dynamic data mining and promises a mathematically justified method for extracting the dynamics and significant temporal structures of nonlinear dynamical systems.
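The idea in item 5 of building a low-dimensional linear model from high-dimensional time series can be illustrated with dynamic mode decomposition (DMD), a standard data-driven approximation related to the Koopman operator. This is only a sketch under that assumption: the abstract does not say which approximation the authors use, the SVM classification stage is omitted, and the function name `dmd` and the synthetic data are hypothetical.

```python
import numpy as np

def dmd(snapshots, rank):
    """Exact DMD on a (dimensions x time) snapshot matrix.

    Returns the eigenvalues and modes of a rank-`rank` linear model
    x[k+1] ~ A x[k] fitted to consecutive snapshot pairs."""
    X1, X2 = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Project the one-step map onto the leading `rank` singular directions.
    X2_V_Sinv = (X2 @ Vh.conj().T) / s          # X2 V S^{-1}
    A_tilde = U.conj().T @ X2_V_Sinv            # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2_V_Sinv @ W                       # modes in the original space
    return eigvals, modes

# Synthetic example (assumed data): a damped 2-D rotation observed through
# a random 20-dimensional linear lift.
theta, rho = 0.3, 0.95
A = rho * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
Z = np.empty((2, 100))
Z[:, 0] = [1.0, 0.0]
for k in range(1, 100):
    Z[:, k] = A @ Z[:, k - 1]
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2)) @ Z

eigvals, modes = dmd(X, rank=2)
print(np.abs(eigvals), np.angle(eigvals))
```

For this noise-free example, the recovered eigenvalues have modulus `rho` and angles `±theta`, so the two-dimensional linear model reproduces the dynamics of the 20-dimensional observations. Features of such reduced models, or trajectories generated by them, are the kind of input a classifier could then consume.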