Abstract Background Blood glucose (BG) management is crucial for type1 diabetes patients resulting in the necessity of reliable artificial pancreas or insulin infusion systems. In recent years, deep learning techniques have been utilized for a more accurate BG level prediction system. However, continuous glucose monitoring (CGM) readings are susceptible to sensor errors. As a result, inaccurate CGM readings would affect BG prediction and make it unreliable, even if the most optimal machine learning model is used. Methods In this work, we propose a novel approach to predicting blood glucose level with a stacked Long shortterm memory (LSTM) based deep recurrent neural network (RNN) model considering sensor fault. We use the Kalman smoothing technique for the correction of the inaccurate CGM readings due to sensor error. Results For the OhioT1DM (2018) dataset, containing eight weeks’ data from six different patients, we achieve an average RMSE of 6.45 and 17.24 mg/dl for 30 min and 60 min of prediction horizon (PH), respectively. Conclusions To the best of our knowledge, this is the leading average prediction accuracy for the ohioT1DM dataset. Different physiological information, e.g., Kalman smoothed CGM data, carbohydrates from the meal, bolus insulin, and cumulative step counts in a fixed time interval, are crafted to represent meaningful features used as input to the model. The goal of our approach is to lower the difference between the predicted CGM values and the fingerstick blood glucose readings—the ground truth. Our results indicate that the proposed approach is feasible for more reliable BG forecasting that might improve the performance of the artificial pancreas and insulin infusion system for T1D diabetes management.
more »
« less
A Function Approximation Approach to the Prediction of Blood Glucose Levels
The problem of real time prediction of blood glucose (BG) levels based on the readings from a continuous glucose monitoring (CGM) device is a problem of great importance in diabetes care, and therefore, has attracted a lot of research in recent years, especially based on machine learning. An accurate prediction with a 30, 60, or 90 min prediction horizon has the potential of saving millions of dollars in emergency care costs. In this paper, we treat the problem as one of function approximation, where the value of the BG level at time t + h (where h the prediction horizon) is considered to be an unknown function of d readings prior to the time t . This unknown function may be supported in particular on some unknown submanifold of the d dimensional Euclidean space. While manifold learning is classically done in a semisupervised setting, where the entire data has to be known in advance, we use recent ideas to achieve an accurate function approximation in a supervised setting; i.e., construct a model for the target function. We use the stateoftheart clinically relevant PREDEGA grid to evaluate our results, and demonstrate that for a real life dataset, our method performs better than a standard deep network, especially in hypoglycemic and hyperglycemic regimes. One noteworthy aspect of this work is that the training data and test data may come from different distributions.
more »
« less
 Award ID(s):
 2012355
 NSFPAR ID:
 10333981
 Date Published:
 Journal Name:
 Frontiers in Applied Mathematics and Statistics
 Volume:
 7
 ISSN:
 22974687
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


The practicality of reinforcement learning algorithms has been limited due to poor scaling with respect to the problem size, as the sample complexity of learning an εoptimal policy is Ω(SAH/ ε2) over worst case instances of an MDP with state space S, action space A, and horizon H. We consider a class of MDPs for which the associated optimal Q* function is low rank, where the latent features are unknown. While one would hope to achieve linear sample complexity in S and A due to the low rank structure, we show that without imposing further assumptions beyond low rank of Q*, if one is constrained to estimate the Q function using only observations from a subset of entries, there is a worst case instance in which one must incur a sample complexity exponential in the horizon H to learn a near optimal policy. We subsequently show that under stronger low rank structural assumptions, given access to a generative model, Low Rank Monte Carlo Policy Iteration (LRMCPI) and Low Rank Empirical Value Iteration (LREVI) achieve the desired sample complexity of Õ((S+A)poly (d,H)/ε2) for a rank d setting, which is minimax optimal with respect to the scaling of S, A, and ε. In contrast to literature on linear and lowrank MDPs, we do not require a known feature mapping, our algorithm is computationally simple, and our results hold for long time horizons. Our results provide insights on the minimal lowrank structural assumptions required on the MDP with respect to the transition kernel versus the optimal actionvalue function.more » « less

We study modelfree reinforcement learning (RL) algorithms for infinitehorizon averagereward Markov decision process (MDP), which is more appropriate for applications that involve continuing operations not divided into episodes. In contrast to episodic/discounted MDPs, theoretical understanding of modelfree RL algorithms is relatively inadequate for the averagereward setting. In this paper, we consider both the online setting and the setting with access to a simulator. We develop computationally efficient modelfree algorithms that achieve sharper guarantees on regret/sample complexity compared with existing results. In the online setting, we design an algorithm, UCBAVG, based on an optimistic variant of variancereduced Qlearning. We show that UCBAVG achieves a regret bound $\widetilde{O}(S^5A^2sp(h^*)\sqrt{T})$ after $T$ steps, where $S\times A$ is the size of stateaction space, and $sp(h^*)$ the span of the optimal bias function. Our result provides the first computationally efficient modelfree algorithm that achieves the optimal dependence in $T$ (up to log factors) for weakly communicating MDPs, which is necessary for low regret. In contrast, prior results either are suboptimal in $T$ or require strong assumptions of ergodicity or uniformly mixing of MDPs. In the simulator setting, we adapt the idea of UCBAVG to develop a modelfree algorithm that finds an $\epsilon$optimal policy with sample complexity $\widetilde{O}(SAsp^2(h^*)\epsilon^{2} + S^2Asp(h^*)\epsilon^{1}).$ This sample complexity is nearoptimal for weakly communicating MDPs, in view of the minimax lower bound $\Omega(SAsp(^*)\epsilon^{2})$. Existing work mainly focuses on ergodic MDPs and the results typically depend on $t_{mix},$ the worstcase mixing time induced by a policy. We remark that the diameter $D$ and mixing time $t_{mix}$ are both lower bounded by $sp(h^*)$, and $t_{mix}$ can be arbitrarily large for certain MDPs. On the technical side, our approach integrates two key ideas: learning an $\gamma$discounted MDP as an approximation, and leveraging referenceadvantage decomposition for variance in optimistic Qlearning. As recognized in prior work, a naive approximation by discounted MDPs results in suboptimal guarantees. A distinguishing feature of our method is maintaining estimates of valuedifference between state pairs to provide a sharper bound on the variance of reference advantage. We also crucially use a careful choice of the discounted factor $\gamma$ to balance approximation error due to discounting and the statistical learning error, and we are able to maintain a goodquality reference value function with $O(SA)$ space complexity.more » « less

In blackbox optimization problems, we aim to maximize an unknown objective function, where the function is only accessible through feedbacks of an evaluation or simulation oracle. In reallife, the feedbacks of such oracles are often noisy and available after some unknown delay that may depend on the computation time of the oracle. Additionally, if the exact evaluations are expensive but coarse approximations are available at a lower cost, the feedbacks can have multifidelity. In order to address this problem, we propose a generic extension of hierarchical optimistic tree search (HOO), called ProCrastinated Tree Search (PCTS), that flexibly accommodates a delay and noisetolerant bandit algorithm. We provide a generic proof technique to quantify regret of PCTS under delayed, noisy, and multifidelity feedbacks. Specifically, we derive regret bounds of PCTS enabled with delayedUCB1 (DUCB1) and delayedUCBV (DUCBV) algorithms. Given a horizon T, PCTS retains the regret bound of nondelayed HOO for expected delay of O(log T), and worsens by T^((1α)/(d+2)) for expected delays of O(T^(1α)) for α ∈ (0,1]. We experimentally validate on multiple synthetic functions and hyperparameter tuning problems that PCTS outperforms the stateoftheart blackbox optimization methods for feedbacks with different noise levels, delays, and fidelity.more » « less

Bringmann, Karl ; Grohe, Martin ; Puppis, Gabriele ; Svensson, Ola (Ed.)We consider the problem of approximate counting of triangles and longer fixed length cycles in directed graphs. For triangles, Tětek [ICALP'22] gave an algorithm that returns a (1±ε)approximation in Õ(n^ω/t^{ω2}) time, where t is the unknown number of triangles in the given n node graph and ω < 2.372 is the matrix multiplication exponent. We obtain an improved algorithm whose running time is, within polylogarithmic factors the same as that for multiplying an n× n/t matrix by an n/t × n matrix. We then extend our framework to obtain the first nontrivial (1± ε)approximation algorithms for the number of hcycles in a graph, for any constant h ≥ 3. Our running time is Õ(MM(n,n/t^{1/(h2)},n)), the time to multiply n × n/(t^{1/(h2)}) by n/(t^{1/(h2)) × n matrices. Finally, we show that under popular finegrained hypotheses, this running time is optimal.more » « less