Title: Partially observable collaborative model for optimizing personalized treatment selection
Precision medicine that enables personalized treatment decision support has become an increasingly important research topic in chronic disease care. The main challenges in designing a treatment algorithm include modeling individual disease progression dynamics and designing an adaptive treatment selection strategy. This study aims to develop an adaptive treatment selection framework tailored to an individual patient’s disease progression pattern and treatment response. We propose a Partially Observable Collaborative Model (POCM) to capture the individual variations in a heterogeneous population and optimize treatment outcomes in three stages. The POCM first infers the disease progression models by subgroup patterns using population data in stage one and then fine-tunes the models for individual patients with a small number of treatment trials in stage two. In stage three, we show how treatment policies based on the Partially Observable Markov Decision Process (POMDP) can be tailored to individual patients by utilizing the disease models learned from the POCM. Using a simulated population of chronic depression patients, we show that the POCM can estimate personal disease progression more accurately than the traditional method of solving a hidden Markov model. We also compare the POMDP treatment policies with other heuristic policies and demonstrate that the POCM-based policies give the highest net monetary benefit in the majority of parameter settings. In conclusion, the POCM is a promising approach to modeling the chronic disease progression process and recommending a personalized treatment plan for individual patients in a heterogeneous population.
Award ID(s):
1824623
PAR ID:
10476335
Author(s) / Creator(s):
Publisher / Repository:
Elsevier
Date Published:
Journal Name:
European Journal of Operational Research
Volume:
309
Issue:
3
ISSN:
0377-2217
Page Range / eLocation ID:
1409 to 1419
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
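The abstract above describes a three-stage pipeline whose final step tracks each patient's hidden disease state with a POMDP and selects treatments from the resulting belief. The minimal sketch below illustrates only that belief-update-and-act loop; the states, treatments, thresholds, and all probabilities are illustrative assumptions, not parameters from the paper. In the paper's framing, the subgroup model from stage one would supply the initial belief and dynamics, and stage two would adjust them from a patient's early treatment trials.

```python
# Minimal sketch (not the authors' implementation) of a POMDP belief update for a
# hidden depression-severity state, followed by a toy threshold policy. All values
# are illustrative.
import numpy as np

STATES = ["mild", "moderate", "severe"]      # hidden disease states (assumed)
TREATMENTS = ["maintain", "intensify"]       # assumed action set

# Treatment-dependent transition matrices P[a][s, s'] (illustrative values).
P = {
    "maintain":  np.array([[0.80, 0.15, 0.05],
                           [0.10, 0.70, 0.20],
                           [0.05, 0.25, 0.70]]),
    "intensify": np.array([[0.90, 0.08, 0.02],
                           [0.30, 0.60, 0.10],
                           [0.10, 0.40, 0.50]]),
}
# Observation model O[s, o]: probability of a symptom-score bucket (low, medium,
# high) given the hidden state (illustrative values).
O = np.array([[0.70, 0.25, 0.05],
              [0.20, 0.60, 0.20],
              [0.05, 0.25, 0.70]])

def belief_update(belief, action, obs_idx):
    """Standard POMDP belief update: predict with P[action], correct with O."""
    predicted = belief @ P[action]
    unnorm = predicted * O[:, obs_idx]
    return unnorm / unnorm.sum()

def choose_treatment(belief, threshold=0.4):
    """Toy threshold policy: intensify when P(severe) exceeds a cutoff."""
    return "intensify" if belief[2] > threshold else "maintain"

belief = np.array([0.5, 0.3, 0.2])           # prior, e.g. from a subgroup model
for obs in [2, 1, 0]:                        # a short sequence of symptom observations
    action = choose_treatment(belief)
    belief = belief_update(belief, action, obs)
    print(action, np.round(belief, 3))
```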
More Like this
  1. Cancer screening is a large, population-based intervention that would benefit from tools enabling individually tailored decision making to decrease unintended consequences such as overdiagnosis. The heterogeneity of cancer screening participants underscores the need for more personalized approaches. Partially observable Markov decision processes (POMDPs) can be used to suggest optimal, individualized screening policies. However, determining an appropriate reward function can be challenging. Here, we propose the use of inverse reinforcement learning (IRL) to form reward functions for lung and breast cancer screening POMDP models. Using data from the National Lung Screening Trial and our institution's breast screening registry, we developed two POMDP models with corresponding reward functions. Specifically, the maximum entropy (MaxEnt) IRL algorithm with an adaptive step size was used to learn rewards more efficiently and was combined with a multiplicative model to learn state-action pair rewards in the POMDP. The lung and breast cancer screening models were evaluated based on their ability to recommend appropriate screening decisions before the diagnosis of cancer. Results are comparable with experts' decisions. The lung POMDP demonstrated improved performance in terms of recall and false positive rate in the second screening and post-screening stages. Precision (0.02-0.05) was comparable to the experts' (0.02-0.06). The breast POMDP has excellent recall (0.97-1.00), matching the physicians, and a satisfactory false positive rate (<0.03). The reward functions learned with the MaxEnt IRL algorithm, when combined with POMDP models in lung and breast cancer screening, demonstrate performance comparable to experts.
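Item 1 learns screening rewards with maximum entropy IRL using an adaptive step size. The sketch below shows only the generic MaxEnt IRL weight update with an AdaGrad-style adaptive step; the feature-expectation routine, which in the study would come from the screening model's dynamics, is stubbed with an illustrative function so the loop runs on its own.

```python
# Minimal sketch of the MaxEnt IRL weight update with an adaptive step size.
# policy_feature_expectations is a stand-in for soft value iteration / a forward
# pass under the current reward; all numbers are illustrative.
import numpy as np

def expert_feature_expectations():
    # Empirical feature counts from demonstrated screening decisions (illustrative).
    return np.array([0.60, 0.25, 0.15])

def policy_feature_expectations(weights):
    # Stand-in for expected feature counts under reward = weights . features.
    soft = np.exp(weights - weights.max())
    return soft / soft.sum()

def maxent_irl(n_iters=200, lr=0.5, eps=1e-8):
    w = np.zeros(3)
    grad_sq = np.zeros(3)                                    # adaptive-step accumulator
    mu_expert = expert_feature_expectations()
    for _ in range(n_iters):
        grad = mu_expert - policy_feature_expectations(w)    # MaxEnt IRL gradient
        grad_sq += grad ** 2
        w += lr * grad / (np.sqrt(grad_sq) + eps)            # per-feature adaptive step
    return w

print(np.round(maxent_irl(), 3))
```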
  2. Purpose: Personalized screening guidelines can be an effective strategy to prevent diabetic retinopathy (DR)-related vision loss. However, these strategies typically do not capture behavior-based factors such as a patient’s compliance or cost preferences. This study develops a mathematical model to identify screening policies that capture both DR progression and behavioral factors to provide personalized recommendations. Methods: A partially observable Markov decision process (POMDP) model is developed to provide personalized screening recommendations. For each patient, the model estimates the patient’s probability of having a sight-threatening diabetic eye disorder (STDED) yearly via Bayesian inference based on natural history, screening results, and compliance behavior. The model then determines a personalized, threshold-based recommendation for each patient annually, either no action (NA), teleretinal imaging (TRI), or clinical screening (CS), based on the patient’s current probability of having STDED as well as the patient-specific preference between cost saving ($) and QALY gain. The framework is applied to a hypothetical cohort of 40-year-old African American male patients. Results: For the base population with TRI and CS compliance rates of 65% and 55% and equal preference for cost and QALY, NA is identified as the optimal recommendation when the patient’s probability of having STDED is less than 0.72%, TRI when the probability is in [0.72%, 2.09%], and CS when the probability is above 2.09%. Simulated against annual clinical screening, the model-based policy finds an average decrease of 7.07% in cost/QALY (95% CI, 6.93-7.23%) and 15.05% in blindness prevalence over a patient’s lifetime (95% CI, 14.88-15.23%). For patients with equal preference for cost and QALY, the model identifies 6 different types of threshold-based policies (see Fig 1). For patients with a strong preference for QALY gain, CS-only policies increased prevalence by a factor of 19.2 (see Fig 2). Conclusions: The POMDP model is highly flexible and responsive in incorporating behavioral factors when providing personalized screening recommendations. As a decision support tool, providers can use this modeling framework to provide unique, catered recommendations.
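Item 2 hinges on a yearly Bayesian update of the probability of STDED followed by a threshold rule. A minimal sketch of that logic is below; only the base-case thresholds (0.72% and 2.09%) come from the abstract, while the incidence and test-accuracy numbers are illustrative placeholders.

```python
# Minimal sketch: one year of progression, a Bayesian correction for the screening
# result, and the base-case threshold rule (NA / TRI / CS). Parameter values other
# than the two thresholds are illustrative.
def update_stded_probability(p, screened, result_positive,
                             annual_incidence=0.01, sensitivity=0.8, specificity=0.95):
    """Advance the prior by one year, then apply Bayes' rule if a screen occurred."""
    p = p + (1.0 - p) * annual_incidence          # probability after natural progression
    if not screened:                              # non-compliant year: no new evidence
        return p
    if result_positive:
        num = sensitivity * p
        den = num + (1.0 - specificity) * (1.0 - p)
    else:
        num = (1.0 - sensitivity) * p
        den = num + specificity * (1.0 - p)
    return num / den

def recommend(p):
    """Base-case thresholds from the abstract: NA below 0.72%, TRI up to 2.09%, else CS."""
    if p < 0.0072:
        return "NA"
    return "TRI" if p <= 0.0209 else "CS"

p = 0.005
for year in range(5):
    action = recommend(p)
    p = update_stded_probability(p, screened=(action != "NA"), result_positive=False)
    print(year, action, round(p, 4))
```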
  3. Recent work has considered personalized route planning based on user profiles, but none of it accounts for human trust. We argue that human trust is an important factor to consider when planning routes for automated vehicles. This article presents a trust-based route-planning approach for automated vehicles. We formalize the human-vehicle interaction as a partially observable Markov decision process (POMDP) and model trust as a partially observable state variable of the POMDP, representing the human’s hidden mental state. We build data-driven models of human trust dynamics and takeover decisions, which are incorporated in the POMDP framework, using data collected from an online user study with 100 participants on the Amazon Mechanical Turk platform. We compute optimal routes for automated vehicles by solving for optimal policies of the POMDP and evaluate the resulting routes via human subject experiments with 22 participants on a driving simulator. The experimental results show that participants taking the trust-based route generally reported more positive responses in the after-driving survey than those taking the baseline (trust-free) route. In addition, we analyze the trade-offs between multiple planning objectives (e.g., trust, distance, energy consumption) via multi-objective optimization of the POMDP. We also identify a set of open issues and implications for real-world deployment of the proposed approach in automated vehicles.
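Item 3 models trust as a hidden state of a POMDP whose belief is revised from observed takeover decisions. The sketch below shows such a belief update over three discrete trust levels; the trust dynamics and takeover likelihoods are illustrative stand-ins for the data-driven models fit in that study.

```python
# Minimal sketch of tracking hidden trust with a POMDP-style belief update driven by
# observed takeover decisions. All probabilities are illustrative.
import numpy as np

TRUST = ["low", "medium", "high"]
# Trust dynamics after an uneventful automated driving segment (illustrative).
T = np.array([[0.7, 0.3, 0.0],
              [0.1, 0.7, 0.2],
              [0.0, 0.2, 0.8]])
# Probability the driver takes over control, given each trust level (illustrative).
P_TAKEOVER = np.array([0.8, 0.4, 0.1])

def update_trust_belief(belief, took_over):
    """Predict with the trust dynamics, then correct with the takeover observation."""
    predicted = belief @ T
    likelihood = P_TAKEOVER if took_over else 1.0 - P_TAKEOVER
    unnorm = predicted * likelihood
    return unnorm / unnorm.sum()

belief = np.array([1/3, 1/3, 1/3])            # uninformative initial trust belief
for took_over in [True, False, False]:
    belief = update_trust_belief(belief, took_over)
    print(np.round(belief, 3))
```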
  4. This paper presents a framework to learn the reward function underlying high-level sequential tasks from demonstrations. The purpose of reward learning, in the context of learning from demonstration (LfD), is to generate policies that mimic the demonstrator’s policies, thereby enabling imitation learning. We focus on a human-robot interaction (HRI) domain where the goal is to learn and model structured interactions between a human and a robot. Such interactions can be modeled as a partially observable Markov decision process (POMDP) where the partial observability is caused by uncertainties associated with the ways humans respond to different stimuli. The key challenge in finding a good policy in such a POMDP is determining the reward function that was observed by the demonstrator. Existing inverse reinforcement learning (IRL) methods for POMDPs are computationally very expensive and the problem is not well understood. In comparison, IRL algorithms for Markov decision processes (MDPs) are well defined and computationally efficient. We propose an approach of reward function learning for high-level sequential tasks from human demonstrations where the core idea is to reduce the underlying POMDP to an MDP and apply any efficient MDP-IRL algorithm. Our extensive experiments suggest that the reward function learned this way generates POMDP policies that mimic the policies of the demonstrator well.
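The core idea in item 4 is to reduce the POMDP to an MDP so that any efficient MDP-IRL algorithm applies. One simple reading of that reduction, sketched below with made-up human-robot interaction trajectories, is to treat the discretized observation as the MDP state and estimate a count-based transition model from the demonstrations, which an off-the-shelf MDP-IRL routine could then consume; the trajectory format and labels are assumptions, not the paper's representation.

```python
# Minimal sketch: build an observation-state MDP transition model from demonstrated
# (observation, action) trajectories; an MDP-IRL algorithm would take it from here.
from collections import defaultdict

# Demonstrations as (observation, action) sequences from HRI episodes (illustrative).
demos = [
    [("idle", "greet"), ("engaged", "ask"), ("responding", "confirm")],
    [("idle", "greet"), ("distracted", "wait"), ("engaged", "ask")],
]

def build_mdp_from_demos(demos):
    """Count-based transition model over observation-states and demonstrated actions."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in demos:
        for (s, a), (s_next, _) in zip(traj, traj[1:]):
            counts[(s, a)][s_next] += 1
    transitions = {}
    for (s, a), nxt in counts.items():
        total = sum(nxt.values())
        transitions[(s, a)] = {s2: c / total for s2, c in nxt.items()}
    return transitions

for key, dist in build_mdp_from_demos(demos).items():
    print(key, dist)
```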
  5. Background: Lung volume reduction surgery (LVRS) and medical therapy are 2 available treatment options in dealing with severe emphysema, which is a chronic lung disease. However, there are currently limited guidelines on the timing of LVRS for patients with different characteristics. Objective: The objective of this study is to assess the timing of receiving LVRS in terms of patient outcomes, taking into consideration a patient’s characteristics. Methods: A finite-horizon Markov decision process model for patients with severe emphysema was developed to determine the short-term (5 y) and long-term timing of emphysema treatment. Expected life expectancy, expected quality-adjusted life-years, and total expected cost of each treatment option were applied as the objective functions of the model. To estimate parameters in the model, the data provided by the National Emphysema Treatment Trial were used. Results: The results indicate that the treatment timing strategy for patients with upper-lobe predominant emphysema is to receive LVRS regardless of their specific characteristics. However, for patients with non–upper-lobe–predominant emphysema, the optimal strategy depends on age, maximum workload level, and forced expiratory volume in 1 second level. Conclusion: This study demonstrates the utilization of clinical trial data to gain insights into the timing of surgical treatment for patients with emphysema, considering patient age, observable health condition, and location of emphysema. Highlights: Both short-term and long-term Markov decision process models were developed to assess the timing of receiving lung volume reduction surgery in patients with severe emphysema. How clinical trial data can be used to estimate the parameters and obtain short-term results from the Markov decision process model is demonstrated. The results provide insights into the timing of receiving lung volume reduction surgery as a function of a patient’s characteristics, including age, emphysema location, maximum workload, and forced expiratory volume in 1 second level.
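Item 5 solves a finite-horizon Markov decision process for the timing of LVRS. The backward-induction sketch below illustrates that structure with a toy three-state model maximizing expected QALYs over a 5-year horizon; the states, rewards, and transition probabilities are illustrative and are not the NETT-derived estimates used in the study.

```python
# Minimal backward-induction sketch for a finite-horizon treatment-timing MDP:
# each year choose LVRS or continued medical therapy to maximize expected QALYs.
# All values are illustrative placeholders.
import numpy as np

STATES = ["stable", "declining", "post_lvrs"]
ACTIONS = ["medical", "lvrs"]
HORIZON = 5

# P[a][s, s'] transition probabilities (illustrative); repeat surgery is not modeled.
P = {
    "medical": np.array([[0.80, 0.20, 0.00],
                         [0.10, 0.90, 0.00],
                         [0.00, 0.00, 1.00]]),
    "lvrs":    np.array([[0.00, 0.00, 1.00],
                         [0.00, 0.00, 1.00],
                         [0.00, 0.00, 1.00]]),
}
# Per-year QALY rewards R[a][s] (illustrative).
R = {"medical": np.array([0.75, 0.55, 0.80]),
     "lvrs":    np.array([0.70, 0.50, 0.80])}

def backward_induction():
    """Finite-horizon dynamic programming: sweep backwards from the terminal year."""
    V = np.zeros(len(STATES))                      # terminal value
    policy = []
    for _ in range(HORIZON):
        Q = {a: R[a] + P[a] @ V for a in ACTIONS}  # one-step lookahead per action
        V = np.maximum(Q["medical"], Q["lvrs"])
        policy.insert(0, ["lvrs" if Q["lvrs"][s] > Q["medical"][s] else "medical"
                          for s in range(len(STATES))])
    return V, policy

values, policy = backward_induction()
print("expected QALYs by state:", np.round(values, 2))
for year, row in enumerate(policy):
    print("year", year, dict(zip(STATES, row)))
```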