skip to main content


Title: Learning Probably Approximately Complete and Safe Action Models for Stochastic Worlds
We consider the problem of learning action models for planning in unknown stochastic environments that can be defined using the Probabilistic Planning Domain Description Language (PPDDL). As input, we are given a set of previously executed trajectories, and the main challenge is to learn an action model that has a similar goal achievement probability to the policies used to create these trajectories. To this end, we introduce a variant of PPDDL in which there is uncertainty about the transition probabilities, specified by an interval for each factor that contains the respective true transition probabilities. Then, we present SAM+, an algorithm that learns such an imprecise-PPDDL environment model. SAM+ has a polynomial time and sample complexity, and guarantees that with high probability, the true environment is indeed captured by the defined intervals. We prove that the action model SAM+ outputs has a goal achievement probability that is almost as good or better than that of the policies used to produced the training trajectories. Then, we show how to produce a PPDDL model based on this imprecise-PPDDL model that has similar properties.  more » « less
Award ID(s):
1908287 1718380
NSF-PAR ID:
10357215
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
36
Issue:
9
ISSN:
2159-5399
Page Range / eLocation ID:
9795 to 9804
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Standard methods for synthesis of control policies in Markov decision processes with unknown transition probabilities largely rely on a combination of exploration and exploitation. While these methods often offer theoretical guarantees on system performance, the number of time steps and samples needed to initially explore the environment before synthesizing a well-performing control policy is impractically large. This paper partially alleviates such a burden by incorporating a priori existing knowledge into learning, when such knowledge is available. Based on prior information about bounds on the differences between the transition probabilities at different states, we propose a learning approach where the transition probabilities at a given state are not only learned from outcomes of repeatedly performing a certain action at that state, but also from outcomes of performing actions at states that are known to have similar transition probabilities. Since the directly obtained information is more reliable at determining transition probabilities than second-hand information, i.e., information obtained from similar but potentially slightly different states, samples obtained indirectly are weighted with respect to the known bounds on the differences of transition probabilities. While the proposed strategy can naturally lead to errors in learned transition probabilities, we show that, by proper choice of the weights, such errors can be reduced, and the number of steps needed to form a near-optimal control policy in the Bayesian sense can be significantly decreased. 
    more » « less
  2. Sequential decision-making under uncertainty is present in many important problems. Two popular approaches for tackling such problems are reinforcement learning and online search (e.g., Monte Carlo tree search). While the former learns a policy by interacting with the environment (typically done before execution), the latter uses a generative model of the environment to sample promising action trajectories at decision time. Decision-making is particularly challenging in non-stationary environments, where the environment in which an agent operates can change over time. Both approaches have shortcomings in such settings -- on the one hand, policies learned before execution become stale when the environment changes and relearning takes both time and computational effort. Online search, on the other hand, can return sub-optimal actions when there are limitations on allowed runtime. In this paper, we introduce \textit{Policy-Augmented Monte Carlo tree search} (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy. We compare and contrast our approach with AlphaZero, another hybrid planning approach, and Deep Q Learning on several OpenAI Gym environments. Through extensive experiments, we show that under non-stationary settings with limited time constraints, PA-MCTS outperforms these baselines. 
    more » « less
  3. Abstract STUDY QUESTION To what extent is female preconception antibiotic use associated with fecundability? SUMMARY ANSWER Preconception antibiotic use overall was not appreciably associated with fecundability. WHAT IS KNOWN ALREADY Antibiotics are commonly used by women and are generally thought to be safe for use during pregnancy. However, little is known about possible effects of antibiotic use on fecundability, the per-cycle probability of conception. Previous research on this question has been limited to occupational rather than therapeutic exposure. STUDY DESIGN, SIZE, DURATION We analyzed data from an Internet-based preconception cohort study of 9524 female pregnancy planners aged 21–45 years residing in the USA and Canada who had been attempting to conceive for six or fewer cycles at study entry. Participants enrolled between June 2013 and September 2020 and completed baseline and bimonthly follow-up questionnaires for up to 12 months or until a reported pregnancy, whichever came first. The questions pertaining to antibiotic type and indication were added to the PRESTO questionnaires in March 2016. PARTICIPANTS/MATERIALS, SETTING, METHODS We assessed antibiotic use in the previous 4 weeks at baseline and on each follow-up questionnaire. Participants provided the name of the specific antibiotic and the indication for use. Antibiotics were classified based on active ingredient (penicillins, macrolides, nitrofurantoin, nitroimidazole, cephalosporins, sulfonamides, quinolones, tetracyclines, lincosamides), and indications were classified by type of infection (respiratory, urinary tract, skin, vaginal, pelvic, and surgical). Participants reported pregnancy status on follow-up questionnaires. We used proportional probabilities regression to estimate fecundability ratios (FR), the per-cycle probability of conception comparing exposed with unexposed individuals, and 95% confidence intervals (CI), adjusting for sociodemographics, lifestyle factors, and reproductive history. MAIN RESULTS AND THE ROLE OF CHANCE Overall, women who used antibiotics in the past 4 weeks at baseline had similar fecundability to those who had not used antibiotics (FR: 0.98, 95% CI: 0.89–1.07). Sulfonamides and lincosamides were associated with slightly increased fecundability (FR: 1.39, 95% CI: 0.90–2.15, and FR: 1.58 95% CI: 0.96–2.60, respectively), while macrolides were associated with slightly reduced fecundability (FR: 0.70, 95% CI: 0.47–1.04). Analyses of the indication for antibiotic use suggest that there is likely some confounding by indication. LIMITATIONS, REASONS FOR CAUTION Findings were imprecise for some antibiotic classes and indications for use owing to small numbers of antibiotic users in these categories. There are likely heterogeneous effects of different combinations of indications and treatments, which may be obscured in the overall null results, but cannot be further elucidated in this analysis. WIDER IMPLICATIONS OF THE FINDINGS There is little evidence that use of most antibiotics is associated with reduced fecundability. Antibiotics and the infections they treat are likely associated with fecundability through differing mechanisms, resulting in their association with increased fecundability in some circumstances and decreased fecundability in others. STUDY FUNDING/COMPETING INTEREST(S) This study was supported through funds provided by the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (R01-HD086742, R21-HD072326). L.A.W. has received in-kind donations from Swiss Precision Diagnostics, Sandstone Diagnostics, Fertility Friend, and Kindara for primary data collection in PRESTO. The other authors have no conflicts of interest to disclose. TRIAL REGISTRATION NUMBER N/A. 
    more » « less
  4. This paper studies the evaluation of learning-based object detection models in conjunction with model-checking of formal specifications defined on an abstract model of an autonomous system and its environment. In particular, we define two metrics – proposition-labeled and class-labeled confusion matrices – for evaluating object detection, and we incorporate these metrics to compute the satisfaction probability of system-level safety requirements. While confusion matrices have been effective for comparative evaluation of classification and object detection models, our framework fills two key gaps. First, we relate the performance of object detection to formal requirements defined over downstream high-level planning tasks. In particular, we provide empirical results that show that the choice of a good object detection algorithm, with respect to formal requirements on the overall system, significantly depends on the downstream planning and control design. Secondly, unlike the traditional confusion matrix, our metrics account for variations in performance with respect to the distance between the ego and the object being detected. We demonstrate this framework on a car-pedestrian example by computing the satisfaction probabilities for safety requirements formalized in Linear Temporal Logic (LTL). 
    more » « less
  5. Abstract

    We consider user retention analytics for online freemium role-playing games (RPGs). RPGs constitute a very popular genre of computer-based games that, along with a player’s gaming actions, focus on the development of the player’s in-game virtual character through a persistent exploration of the gaming environment. Most RPGs follow the freemium business model in which the gamers can play for free but they are charged for premium add-on amenities. As with other freemium products, RPGs suffer from the curse of high dropout rates. This makes retention analysis extremely important for successful operation and survival of their gaming portals. Here, we develop a disciplined statistical framework for retention analysis by modelling multiple in-game player characteristics along with the dropout probabilities. We capture players’ motivations through engagement times, collaboration and achievement score at each level of the game, and jointly model them using a generalized linear mixed model (glmm) framework that further includes a time-to-event variable corresponding to churn. We capture the interdependencies in a player’s level-wise engagement, collaboration, achievement with dropout through a shared parameter model. We illustrate interesting changes in player behaviours as the gaming level progresses. The parameters in our joint model were estimated by a Hamiltonian Monte Carlo algorithm which incorporated a divide-and-recombine approach for increased scalability in glmm estimation that was needed to accommodate our large longitudinal gaming data-set. By incorporating the level-wise changes in a player’s motivations and using them for dropout rate prediction, our method greatly improves on state-of-the-art retention models. Based on data from a popular action based RPG, we demonstrate the competitive optimality of our proposed joint modelling approach by exhibiting its improved predictive performance over competitors. In particular, we outperform aggregate statistics based methods that ignore level-wise progressions as well as progression tracking non-joint model such as the Cox proportional hazards model. We also display improved predictions of popular marketing retention statistics and discuss how they can be used in managerial decision making.

     
    more » « less