Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

A major bottleneck of the current Machine Learning (ML) workflow is the time consuming, error prone engineering required to get data from a datastore or a database (DB) to the point an ML algorithm can be applied to it. This is further exacerbated since ML algorithms are now trained on large volumes of data, yet we need predictions in realtime, especially in a variety of timeseries applications such as finance and realtime control systems. Hence, we explore the feasibility of directly integrating prediction functionality on top of a data store or DB. Such a system ideally: (i) provides an intuitive prediction query interface which alleviates the unwieldy data engineering; (ii) provides stateoftheart statistical accuracy while ensuring incremental model update, low model training time and low latency for making predictions. As the main contribution we explicitly instantiate a proofofconcept, tspDB which directly integrates with PostgreSQL. We rigorously test tspDB’s statistical and computational performance against the stateoftheart time series algorithms, including a LongShortTermMemory (LSTM) neural network and DeepAR (industry standard deep learning library by Amazon). Statistically, on standard time series benchmarks, tspDB outperforms LSTM and DeepAR with 1.11.3x higher relative accuracy. Computationally, tspDB is 5962x and 9495x faster compared to LSTM andmore »

We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, utilizing matrix estimation to recover missing values and denoise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) lowrank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with nonoverlapping entries rather than a Hankel matrix as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide finite sample analysis for imputation and prediction, and prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise agnostic property of our approach allows us to recover the latent states when only given access to noisy and partial observations a la amore »

The property of (quasi)reversibility of Markov chains have led to elegant characterization of steadystate distribution for complex queueing networks, e.g. celebrated Jackson networks, BCMP (Baskett, Chandi, Muntz, Palacois) and Kelly theorem. In a nutshell, despite the complicated interaction, in the steadystate, the queues in such networks exhibit independence and subsequently lead to explicit calculations of distributional properties of the queuing network that may seem impossible at the outset. The model of stochastic processing network (cf. Harrison 2000) captures variety of dynamic resource allocation problems including the flowlevel networks used for modeling bandwidth sharing in the Internet, switched networks (cf. Shah, Wischik 2006) for modeling packet scheduling in the Internet router and wireless medium access, and hybrid flowpacket networks for modeling jobandpacket level scheduling in data centers. Unlike before, an appropriate resource allocation or scheduling policy is required in such networks to achieve good performance. Given the complexity, asymptotic analytic approaches such as fluid model or LyapunovFoster criteria to establish positiverecurrence and heavy traffic or diffusion approximation to characterize the scaled steadystate distribution became method of choice. A remarkable progress has been made along these lines over the past few decades, but there is a need for much more to matchmore »

We consider modelfree reinforcement learning for infinitehorizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. We consider the Nearest Neighbor QLearning (NNQL) algorithm to learn the optimal Q function using nearest neighbor regression method. As the main contribution, we provide tight finite sample analysis of the convergence rate. In particular, for MDPs with a ddimensional state space and the discounted factor in (0, 1), given an arbitrary sample path with “covering time” L, we establish that the algorithm is guaranteed to output an "accurate estimate of the optimal Qfunction nearly optimal sample complexity.