Title: Deep Reinforcement Learning for Procedural Content Generation of 3D Virtual Environments
Abstract: This work presents a deep reinforcement learning (DRL) approach to procedural content generation (PCG) that automatically generates three-dimensional (3D) virtual environments with which users can interact. The primary objective of PCG methods is to algorithmically generate new content in order to improve user experience. Researchers have started exploring machine learning (ML) methods to generate content; however, these approaches frequently rely on supervised ML algorithms that require initial datasets to train their generative models. In contrast, reinforcement learning (RL) algorithms do not require training data to be collected a priori, since they take advantage of simulation to train their models. Considering these advantages, this work presents a method that generates new 3D virtual environments by training an RL agent on a 3D simulation platform. This work extends the authors' previous work and presents the results of a case study that supports the capability of the proposed method to generate new 3D virtual environments. The ability to automatically generate new content has the potential to maintain users' engagement in a wide variety of applications, such as virtual reality applications for education and training, and engineering conceptual design.
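To make the idea concrete, the following minimal sketch illustrates the kind of RL-driven generation loop described above: an agent fills a small grid layout with objects and learns, from a reward that scores the resulting layout, which placements to prefer. The grid size, object set, toy reward, and tabular Q-learning update are illustrative assumptions made here; the actual method trains a deep RL agent on a 3D simulation platform.

```python
# A minimal, hypothetical sketch of RL-based procedural content generation:
# an agent places objects on a small grid "environment", receives a reward
# that scores the resulting layout, and learns a placement policy with
# tabular Q-learning. Names, reward terms, and grid size are illustrative
# assumptions, not the authors' implementation.
import numpy as np

GRID, OBJECTS, EPISODES = 4, 3, 2000   # 4x4 floor plan, 3 object types
rng = np.random.default_rng(0)
# Q[cell, object] -> expected layout score for placing `object` at `cell`
Q = np.zeros((GRID * GRID, OBJECTS + 1))  # last action = "leave empty"

def layout_reward(layout):
    """Toy reward: favor variety of object types and avoid crowding."""
    placed = layout[layout >= 0]
    variety = len(np.unique(placed))
    crowding = max(0, placed.size - GRID * GRID // 2)
    return variety - 0.5 * crowding

for ep in range(EPISODES):
    layout = -np.ones(GRID * GRID, dtype=int)      # -1 = empty cell
    eps = max(0.05, 1.0 - ep / EPISODES)           # decaying exploration
    for cell in range(GRID * GRID):                # fill cells one by one
        if rng.random() < eps:
            action = rng.integers(OBJECTS + 1)
        else:
            action = int(np.argmax(Q[cell]))
        if action < OBJECTS:
            layout[cell] = action
        r = layout_reward(layout)                  # simulated scoring signal
        Q[cell, action] += 0.1 * (r - Q[cell, action])

# Greedy rollout: one generated environment layout
generated = np.argmax(Q, axis=1).reshape(GRID, GRID)
print(generated)
```

In the full method, the grid layout, toy reward, and tabular update would be replaced by the 3D simulation platform and a deep RL policy; the sketch only shows how a generative policy can be learned without any pre-collected dataset.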
Award ID(s):
1834465
NSF-PAR ID:
10186606
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Journal of Computing and Information Science in Engineering
Volume:
20
Issue:
5
ISSN:
1530-9827
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This work presents a Procedural Content Generation (PCG) method based on a Neural Network Reinforcement Learning (RL) approach that generates new environments for Virtual Reality (VR) learning applications. The primary objective of PCG methods is to algorithmically generate new content (e.g., environments, levels) in order to improve user experience. Researchers have started exploring the integration of Machine Learning (ML) algorithms into their PCG methods; these ML approaches help explore the design space and generate new content more efficiently. The capability to provide users with new content has great potential for learning applications. However, such ML algorithms require large datasets to train their generative models. In contrast, RL-based methods do not require any training data to be collected a priori, since they take advantage of simulation to train their models. Moreover, even though VR has become an emerging technology to engage users, few studies have explored PCG for learning purposes, and fewer still in the context of VR. Considering these limitations, this work presents a method that generates new VR environments by training an RL agent in a simulation platform. This PCG method has the potential to maintain users' engagement over time by presenting them with new environments in VR learning applications.
  2. In this work, a Deep Reinforcement Learning (RL) approach is proposed for Procedural Content Generation (PCG) that seeks to automate the generation of multiple related virtual reality (VR) environments for enhanced personalized learning. This allows the user to be exposed to multiple virtual scenarios that demonstrate a consistent theme, which is especially valuable in an educational context. RL approaches to PCG offer the advantage of not requiring training data, as opposed to PCG methods that rely on supervised learning. This work advances the state of the art in RL-based PCG by demonstrating the ability to generate a diversity of contexts that teach the same underlying concept. A case study is presented that demonstrates the feasibility of the proposed RL-based PCG method using examples of probability distributions in both manufacturing facility and grocery store virtual environments. The method demonstrated in this paper has the potential to enable the automatic generation of a variety of virtual environments that are connected by a common concept or theme.
  3. Relevance to proposal: This project evaluates the generalizability of real and synthetic training datasets that can be used to train model-free techniques for multi-agent applications. We evaluate different methods of generating training corpora and machine learning techniques, including Behavior Cloning and Generative Adversarial Imitation Learning. Our results indicate that utility-guided selection of representative scenarios to generate synthetic data can yield significant improvements in model performance. Paper abstract: Crowd simulation, the study of the movement of multiple agents in complex environments, presents a unique application domain for machine learning. One challenge in crowd simulation is to imitate the movement of expert agents in highly dense crowds. An imitation model could substitute for an expert agent if the model behaves as well as the expert, which would enable many exciting applications. However, we believe no prior studies have considered the critical question of how training data and training methods affect imitators when these models are applied to novel scenarios. In this work, a general imitation model is trained by applying either the Behavior Cloning (BC) training method or the more sophisticated Generative Adversarial Imitation Learning (GAIL) method on three typical types of data domains: standard benchmarks for evaluating crowd models, random sampling of state-action pairs, and egocentric scenarios that capture local interactions. Simulated results suggest that (i) simpler training methods are overall better than more complex training methods, and (ii) training samples with diverse agent-agent and agent-obstacle interactions are beneficial for reducing collisions when the trained models are applied to new scenarios. We additionally evaluated our models on their ability to imitate real-world crowd trajectories observed in surveillance videos. Our findings indicate that models trained on representative scenarios generalize to new, unseen situations observed in real human crowds.
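For readers unfamiliar with the simpler of the two training methods compared above, the sketch below shows Behavior Cloning in its most basic form: supervised regression from local crowd states to expert steering actions. The state and action dimensions, the linear policy, and the synthetic demonstration data are placeholders assumed for illustration; the study's own data domains and models are considerably richer.

```python
# A minimal sketch of Behavior Cloning (BC) on crowd data, assuming expert
# demonstrations are available as (state, action) pairs: the state is a
# local observation vector (e.g., neighbor positions and velocities) and
# the action is a 2D steering velocity. A linear policy fit by least
# squares stands in for the paper's learned models.
import numpy as np

rng = np.random.default_rng(1)
STATE_DIM, ACTION_DIM, N = 10, 2, 5000

# Placeholder "expert" data; in practice this would come from benchmark
# scenarios, random state-action sampling, or egocentric recordings.
states = rng.normal(size=(N, STATE_DIM))
expert_W = rng.normal(size=(STATE_DIM, ACTION_DIM))
actions = states @ expert_W + 0.05 * rng.normal(size=(N, ACTION_DIM))

# Behavior cloning = supervised regression from states to expert actions.
W, *_ = np.linalg.lstsq(states, actions, rcond=None)

def policy(state):
    """Imitation policy: predict the steering action for a new state."""
    return state @ W

new_state = rng.normal(size=STATE_DIM)
print("predicted action:", policy(new_state))
```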
  4. 3D models of objects and scenes are critical to many academic disciplines and industrial applications. Of particular interest is the emerging opportunity for 3D graphics to serve artificial intelligence: computer vision systems can benefit from synthetically-generated training data rendered from virtual 3D scenes, and robots can be trained to navigate in and interact with real-world environments by first acquiring skills in simulated ones. One of the most promising ways to achieve this is by learning and applying generative models of 3D content: computer programs that can synthesize new 3D shapes and scenes. To allow users to edit and manipulate the synthesized 3D content to achieve their goals, the generative model should also be structure-aware: it should express 3D shapes and scenes using abstractions that allow manipulation of their high-level structure. This state-of-the-art report surveys historical work and recent progress on learning structure-aware generative models of 3D shapes and scenes. We present fundamental representations of 3D shape and scene geometry and structures, describe prominent methodologies including probabilistic models, deep generative models, program synthesis, and neural networks for structured data, and cover many recent methods for structure-aware synthesis of 3D shapes and indoor scenes.
  5. Introduction

    Remote military operations require rapid response times for effective relief and critical care. Yet the military theater operates under austere conditions, so communication links are unreliable and subject to physical and virtual attacks and degradation at unpredictable times. Immediate medical care at these austere locations requires semi-autonomous teleoperated systems, which enable the completion of medical procedures even under interrupted networks while isolating medics from the dangers of the battlefield. However, to achieve autonomy for complex surgical and critical care procedures, robots require extensive programming or massive libraries of surgical skill demonstrations to learn effective policies using machine learning algorithms. Although such datasets are achievable for simple tasks, providing a large number of demonstrations for surgical maneuvers is not practical. This article presents a learning-from-demonstration method that combines knowledge from demonstrations to eliminate reward shaping in reinforcement learning (RL). In addition to reducing the data required for training, the self-supervised nature of RL, in conjunction with expert knowledge-driven rewards, produces more generalizable policies that are tolerant to dynamic environment changes. A multimodal representation of interaction enables learning complex contact-rich surgical maneuvers. The effectiveness of the approach is shown using the cricothyroidotomy task, a standard critical care procedure for opening the airway. In addition, we provide a method for segmenting the teleoperator's demonstrations into subtasks and classifying the subtasks using sequence modeling.
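As a rough illustration of how demonstrations can stand in for a hand-shaped reward, the sketch below derives a reward from the density of expert demonstration states, which is one simple way to obtain expert knowledge-driven rewards; it is not the article's formulation, and the Gaussian kernel, bandwidth, and state dimension are assumptions made here for illustration.

```python
# A hypothetical, minimal sketch of replacing a hand-shaped reward with a
# reward derived from demonstrations: expert demonstration states define a
# simple density model, and the reward for a visited state is its average
# kernel similarity to those states. All quantities are illustrative.
import numpy as np

rng = np.random.default_rng(2)
STATE_DIM, N_DEMO = 6, 200
demo_states = rng.normal(size=(N_DEMO, STATE_DIM))   # stand-in expert states

def demo_reward(state, bandwidth=0.5):
    """Reward = kernel density of `state` under the demonstration states."""
    sq_dists = np.sum((demo_states - state) ** 2, axis=1)
    return float(np.mean(np.exp(-sq_dists / (2.0 * bandwidth ** 2))))

# An RL agent would query this in place of a hand-shaped reward signal.
visited_state = rng.normal(size=STATE_DIM)
print("learned reward:", demo_reward(visited_state))
```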

    Materials and Methods

    A database of demonstrations for the cricothyroidotomy task was collected, comprising six fundamental maneuvers referred to as surgemes. The dataset was collected by teleoperating a collaborative robotic platform (SuperBaxter) with modified surgical grippers. Two learning models were then developed to process the dataset: one for automatic segmentation of the task demonstrations into a sequence of surgemes, and a second for classifying each segment into labeled surgemes. Finally, a multimodal off-policy RL approach with rewards learned from demonstrations was developed to learn surgeme execution from these demonstrations.
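As an illustration of the classification step, the sketch below uses a small recurrent sequence model to map a window of multimodal interaction features to one of the six surgeme labels. The feature dimension, window length, and architecture are assumptions made for the sake of example, not the paper's model.

```python
# A minimal, hypothetical sketch of surgeme classification with sequence
# modeling: a recurrent network maps a window of multimodal interaction
# features (e.g., gripper pose, force, and vision embeddings concatenated
# per time step) to one of six surgeme labels.
import torch
import torch.nn as nn

FEAT_DIM, HIDDEN, N_SURGEMES, SEQ_LEN = 32, 64, 6, 50

class SurgemeClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.LSTM(FEAT_DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, N_SURGEMES)

    def forward(self, x):                 # x: (batch, SEQ_LEN, FEAT_DIM)
        _, (h, _) = self.encoder(x)       # final hidden state summarizes the segment
        return self.head(h[-1])           # logits over the six surgemes

model = SurgemeClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder batch standing in for segmented demonstration data.
features = torch.randn(8, SEQ_LEN, FEAT_DIM)
labels = torch.randint(0, N_SURGEMES, (8,))

logits = model(features)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
print("batch loss:", loss.item())
```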

    Results

    The task segmentation model has an accuracy of 98.2%. The surgeme classification model using the proposed interaction features achieved a classification accuracy of 96.25% averaged across all surgemes compared to 87.08% without these features and 85.4% using a support vector machine classifier. Finally, the robot execution achieved a task success rate of 93.5% compared to baselines of behavioral cloning (78.3%) and a twin-delayed deep deterministic policy gradient with shaped rewards (82.6%).

    Conclusions

    Results indicate that the proposed interaction features improve classification accuracy for the segmentation and classification of surgical tasks. The proposed method for learning surgemes from demonstrations outperforms popular skill-learning methods. The effectiveness of the proposed approach demonstrates its potential for future remote telemedicine on battlefields.

     