Title: Testing for the Markov property in time series via deep conditional generative learning
Abstract: The Markov property is widely imposed in the analysis of time series data. Correspondingly, testing the Markov property and, relatedly, inferring the order of a Markov model are of paramount importance. In this article, we propose a nonparametric test for the Markov property in high-dimensional time series via deep conditional generative learning. We also apply the test sequentially to determine the order of the Markov model. We show that the test controls the type-I error asymptotically and has power approaching one. Our proposal makes novel contributions in several ways. We utilise and extend state-of-the-art deep generative learning to estimate the conditional density functions, and establish a sharp upper bound on the approximation error of the estimators. We derive a doubly robust test statistic, which employs a nonparametric estimation but achieves a parametric convergence rate. We further adopt sample splitting and cross-fitting to minimise the conditions required to ensure the consistency of the test. We demonstrate the efficacy of the test through both simulations and three data applications.
Award ID(s):
2102227
PAR ID:
10425384
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume:
85
Issue:
4
ISSN:
1369-7412
Format(s):
Medium: X Size: p. 1204-1222
Size(s):
p. 1204-1222
Sponsoring Org:
National Science Foundation
More Like this
  1. Mobile sensing and information technology have enabled us to collect a large amount of mobility data from human decision-makers, for example, GPS trajectories from taxis and Uber cars, and passenger trip data from buses and trains. Understanding and learning human decision-making strategies from such data can potentially promote individuals' well-being and improve transportation service quality. Existing works on human strategy learning, such as inverse reinforcement learning, all model the decision-making process as a Markov decision process, thus assuming the Markov property. In this work, we show that this Markov property does not hold in real-world human decision-making processes. To tackle this challenge, we develop a Trajectory Generative Adversarial Imitation Learning (TrajGAIL) framework. It captures long-term decision dependency by modeling human decision processes as variable length Markov decision processes (VLMDPs), and uses a deep-neural-network-based framework to inversely learn the decision-making strategy from the human agent's historical dataset. We validate our framework using two real-world human-generated spatial-temporal datasets: taxi driver passenger-seeking decision data and public transit trip data. Results demonstrate a significant accuracy improvement in learning human decision-making strategies compared with baselines that assume the Markov property.
  2. In this Letter, an unsupervised-learning platform—generative adversarial network (GAN)—is proposed for experimental data augmentation in a deep-learning assisted photonic-based instantaneous microwave frequency measurement (IFM) system. Only 75 sets of experimental data are required and the GAN can augment the small amount of data into 5000 sets of data for training the deep learning model. Furthermore, frequency measurement error of the estimated frequency has improved by an order of magnitude from 50 MHz to 5 MHz. The proposed use of GAN effectively reduces the amount of experimental data needed by 98.75% and reduces measurement error by 10 times. 
  3. Summary Nonparametric covariate adjustment is considered for log-rank-type tests of the treatment effect with right-censored time-to-event data from clinical trials applying covariate-adaptive randomization. Our proposed covariate-adjusted log-rank test has a simple explicit formula and a guaranteed efficiency gain over the unadjusted test. We also show that our proposed test achieves universal applicability in the sense that the same test formula can be applied to simple randomization and all commonly used covariate-adaptive randomization schemes such as the stratified permuted block and the Pocock–Simon minimization, which is not a property enjoyed by the unadjusted log-rank test. Our method is supported by novel asymptotic theory and empirical results for Type-I error and power of tests.
  4. Abstract Simulating DNA breathing dynamics, for instance with the Extended Peyrard-Bishop-Dauxois (EPBD) model, across the entire human genome using traditional biophysical methods like pyDNA-EPBD is computationally prohibitive due to intensive techniques such as Markov Chain Monte Carlo (MCMC) and Langevin dynamics. To overcome this limitation, we propose a deep surrogate generative model utilizing a conditional Denoising Diffusion Probabilistic Model (DDPM) trained on DNA sequence-EPBD feature pairs. This surrogate model efficiently generates high-fidelity DNA breathing features conditioned on DNA sequences, reducing computational time from months to hours, a speedup of over 1000 times. By integrating these features into the EPBDxDNABERT-2 model, we enhance the accuracy of transcription factor (TF) binding site predictions. Experiments demonstrate that the surrogate-generated features perform comparably to those obtained from the original EPBD framework, validating the model's efficacy and fidelity. This advancement enables real-time, genome-wide analyses, significantly accelerating genomic research and offering powerful tools for disease understanding and therapeutic development.
  5. Expectile is a generalization of the expected value in probability and statistics. In finance and risk management, the expectile is considered to be an important risk measure due to its connection with gain–loss ratio and its coherent and elicitable properties. Linear multiple expectile regression was proposed in 1987 for estimating the conditional expectiles of a response given a set of covariates. Recently, more flexible nonparametric expectile regression models were proposed based on gradient boosting and kernel learning. In this paper, we propose a new nonparametric expectile regression model by adopting the deep residual network learning framework and name it Expectile NN. Extensive numerical studies on simulated and real datasets demonstrate that Expectile NN has very competitive performance compared with existing methods. We explicitly specify the architecture of Expectile NN so that it is easy for others to reproduce and use. Expectile NN is the first deep learning model for nonparametric expectile regression.