Reinforcement learning (RL), a subset of machine learning (ML), could be used to optimize and control biomanufacturing processes, such as the production of therapeutic cells. Here, the process of CAR T-cell activation by antigen-presenting beads and their subsequent expansion is formulated in silico. The simulation is used as an environment to train RL agents to dynamically control the number of beads in culture and thereby maximize the population of robust effector cells at the end of the culture. At periodic decision points, the agent either adds an increment of beads or removes all beads from the culture. The simulation is designed to operate in OpenAI Gym, enabling testing of different environments, cell types, RL-agent algorithms, and state inputs to the RL agent. RL-agent training is demonstrated with three different algorithms (PPO, A2C, and DQN), each paired with three different state input types (tabular, image, mixed); PPO-tabular performs best for this simulation environment. Using this approach, training of the RL agent on different cell types is demonstrated, resulting in a unique control strategy for each type. Sensitivity to input noise (sensor performance), the number of control-step interventions, and the advantages of pre-trained RL agents are also evaluated. Overall, we present an RL framework to maximize the population of robust effector cells in CAR T-cell therapy production.
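To make the setup concrete, below is a minimal sketch of what such a bead-control environment could look like in the Gymnasium API (the maintained successor to OpenAI Gym), with a small discrete action space for incremental bead addition or complete removal, followed by PPO training on the tabular state. The cell dynamics, constants, and reward are illustrative placeholders, not the paper's in silico model.

```python
# Minimal, illustrative bead-control environment; dynamics and reward
# are placeholders, not the paper's CAR T-cell model.
import gymnasium as gym
import numpy as np
from gymnasium import spaces


class BeadControlEnv(gym.Env):
    """Toy CAR T-cell culture: periodically add beads or remove them all."""

    def __init__(self, horizon=14):
        super().__init__()
        self.horizon = horizon  # number of control-step interventions
        # 0 = do nothing, 1 = incremental bead addition, 2 = complete removal
        self.action_space = spaces.Discrete(3)
        # Tabular state: [cell count, effector fraction, bead count, day]
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf, shape=(4,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.cells, self.effector_frac, self.beads, self.day = 1e5, 0.5, 0.0, 0
        return self._obs(), {}

    def step(self, action):
        if action == 1:
            self.beads += 1e5        # add one bead increment
        elif action == 2:
            self.beads = 0.0         # remove all beads
        # Placeholder dynamics: beads drive expansion, but chronic
        # stimulation exhausts cells (lowers the effector fraction).
        stim = self.beads / (self.beads + self.cells + 1e-9)
        self.cells *= 1.0 + 0.9 * stim
        self.effector_frac = float(
            np.clip(self.effector_frac + 0.04 - 0.15 * stim, 0.0, 1.0))
        self.day += 1
        terminated = self.day >= self.horizon
        # Reward only at harvest: robust effector cells at end of culture.
        reward = self.cells * self.effector_frac / 1e6 if terminated else 0.0
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        return np.array(
            [self.cells, self.effector_frac, self.beads, self.day],
            dtype=np.float32)


if __name__ == "__main__":
    # PPO on tabular state, mirroring the best-performing PPO-tabular setup.
    from stable_baselines3 import PPO
    model = PPO("MlpPolicy", BeadControlEnv(), verbose=0)
    model.learn(total_timesteps=50_000)
```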
- Award ID(s): 2024594
- PAR ID: 10475150
- Publisher / Repository: ICML
- Date Published:
- Journal Name: ICML
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Fine-grained visual reasoning tasks in multi-agent environments such as event prediction, agent type identification, or missing data imputation are important for multiple applications (e.g., autonomous surveillance over sensor networks and subtasks for reinforcement learning (RL)). StarCraft II game replays encode intelligent (and adversarial) multi-agent behavior and could provide a testbed for these tasks; however, extracting simple and standardized representations for prototyping these tasks is laborious and hinders reproducibility. In contrast, MNIST and CIFAR10, despite their extreme simplicity, have enabled rapid prototyping and reproducibility of ML methods. Following the simplicity of these datasets, we construct a benchmark fine-grained multi-agent categorization dataset based on StarCraft II replays that exhibit complex multi-agent behaviors, while still being as easy to use as MNIST and CIFAR10. Specifically, we carefully summarize a window of 255 consecutive game states to create 3.6 million summary images from 60,000 replays, including all relevant metadata such as game outcome and player races. We develop three formats of decreasing complexity: Hyperspectral images that include one channel for every unit type (similar to multispectral geospatial images), RGB images that mimic CIFAR10, and grayscale images that mimic MNIST. We show how this dataset can be used for prototyping fine-grained multi-agent categorization methods. All datasets, code for extraction, and code for dataset loading can be found at
https://starcraftdata.davidinouye.com/.
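As a sketch of how MNIST-style use of such summary images might look, the PyTorch wrapper below loads a grayscale split from NumPy arrays. The file names, array layout, and label field are assumptions made for illustration; the dataset's own loading code is linked above.

```python
# Hypothetical loader sketch for the MNIST-style grayscale summary images.
# File names, array layout, and label semantics are assumptions.
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class StarCraftSummaryImages(Dataset):
    """Wraps (N, H, W) summary images and per-image labels, MNIST-style."""

    def __init__(self, images_npy="summary_images.npy", labels_npy="labels.npy"):
        self.images = np.load(images_npy)  # assumed uint8, shape (N, H, W)
        self.labels = np.load(labels_npy)  # assumed int labels (e.g., game outcome)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Add a channel axis and scale to [0, 1], as with torchvision MNIST.
        x = torch.from_numpy(self.images[idx]).float().unsqueeze(0) / 255.0
        return x, int(self.labels[idx])


# Usage mirrors any torchvision-style dataset:
# loader = DataLoader(StarCraftSummaryImages(), batch_size=64, shuffle=True)
```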
Educational video games can engage students in authentic STEM practices, which often involve visual representations. In particular, because most interactions within video games are mediated through visual representations, video games provide opportunities for students to experience disciplinary practices with visual representations. Prior research on learning with visual representations in non-game contexts suggests that visual representations may confuse students if they lack prerequisite representational-competencies. However, it is unclear how this research applies to game environments. To address this gap, we investigated the role of representational-competencies for students’ learning from video games. We first conducted a single-case study of a high-performing undergraduate student playing an astronomy game as an assignment in an astronomy course. We found that this student had difficulties making sense of the visual representations in the game. We interpret these difficulties as indicating a lack of representational-competencies. Further, these difficulties seemed to lead to the student’s inability to relate the game experiences to the content covered in his astronomy course. A second study investigated whether interventions that have proven successful in structured learning environments to support representational-competencies would enhance students’ learning from visual representations in the video game. We randomly assigned 45 students enrolled in an undergraduate course to two conditions. Students either received representational-competency support while playing the astronomy game or they did not receive this support. Results showed no effects of representational-competency supports. This suggests that instructional designs that are effective for representational-competency supports in structured learning environments may not be effective for educational video games. We discuss implications for future research, for designers of educational games, and for educators.
Serverless Function-as-a-Service (FaaS) offers improved programmability for customers, yet it is not server-“less” and comes at the cost of more complex infrastructure management (e.g., resource provisioning and scheduling) for cloud providers. To maintain function service-level objectives (SLOs) and improve resource utilization efficiency, recent research has been focused on applying online learning algorithms such as reinforcement learning (RL) to manage resources. Compared to rule-based solutions with heuristics, RL-based approaches eliminate humans in the loop and avoid the painstaking generation of heuristics. Despite the initial success of applying RL, we first show in this paper that the state-of-the-art single-agent RL algorithm (S-RL) suffers up to 4.8x higher p99 function latency degradation on multi-tenant serverless FaaS platforms compared to isolated environments and is unable to converge during training. We then design and implement a scalable and incremental multi-agent RL framework based on Proximal Policy Optimization (SIMPPO). Our experiments on widely used serverless benchmarks demonstrate that in multi-tenant environments, SIMPPO enables each RL agent to efficiently converge during training and provides online function latency performance comparable to that of S-RL trained in isolation (which we refer to as the baseline for assessing RL performance) with minor degradation (<9.2%). In addition, SIMPPO reduces the p99 function latency by 4.5x compared to S-RL in multi-tenant cases.
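The multi-agent decomposition can be sketched structurally: each tenant function gets its own PPO learner, and new tenants join without retraining existing agents. The snippet below uses stable-baselines3 PPO with a stand-in environment; it illustrates the per-function-agent idea only and is not SIMPPO's actual training scheme or serverless environment.

```python
# Structural sketch only: one PPO learner per tenant function, trained
# independently so tenants can be added incrementally. CartPole-v1 is a
# stand-in; this is not SIMPPO's algorithm or resource-management setup.
import gymnasium as gym
from stable_baselines3 import PPO


def make_function_env():
    # A real setup would expose per-function state such as request rate,
    # concurrency, and observed p99 latency relative to the SLO.
    return gym.make("CartPole-v1")


agents = {}
for fn in ["thumbnail", "resize", "transcode"]:  # hypothetical tenant functions
    agents[fn] = PPO("MlpPolicy", make_function_env(), verbose=0)
    agents[fn].learn(total_timesteps=5_000)

# A new tenant joins without retraining the existing agents.
agents["watermark"] = PPO("MlpPolicy", make_function_env(), verbose=0)
agents["watermark"].learn(total_timesteps=5_000)
```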
Reinforcement learning (RL) can help agents learn complex tasks that would be hard to specify using standard imperative programming. However, end users may have trouble personalizing their technology using RL due to a lack of technical expertise. Prior work has explored means of supporting end users after a problem for the RL agent to solve has been defined. Little work, however, has explored how to support end users when defining this problem. We propose a tool to provide structured support for end users defining problems for RL agents. Through this tool, users can (i) directly and indirectly specify the problem as a Markov decision process (MDP); (ii) receive automatic suggestions on possible MDP changes that would enhance training time and accuracy; and (iii) revise the MDP after training the agent to solve it. We believe this work will help reduce barriers to using RL and contribute to the existing literature on designing human-in-the-loop systems.
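As an illustration of what a direct MDP specification (point (i) above) might look like, the sketch below defines a small tabular MDP plus a validation check of the kind such a tool could surface as automatic suggestions. This interface is hypothetical, not the proposed tool's.

```python
# Hypothetical sketch of a direct MDP specification of the kind such a
# tool might let end users build; not the proposed tool's interface.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TabularMDP:
    states: List[str]
    actions: List[str]
    # transitions[(state, action)] -> list of (next_state, probability)
    transitions: Dict[Tuple[str, str], List[Tuple[str, float]]]
    rewards: Dict[Tuple[str, str], float]
    gamma: float = 0.95

    def validate(self):
        """Well-formedness checks a support tool could turn into suggestions."""
        for (s, a), dist in self.transitions.items():
            total = sum(p for _, p in dist)
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"P(.|{s},{a}) sums to {total}, expected 1.0")


# A two-state toy problem: the agent is rewarded for keeping a light on.
mdp = TabularMDP(
    states=["off", "on"],
    actions=["toggle", "wait"],
    transitions={
        ("off", "toggle"): [("on", 1.0)],
        ("off", "wait"): [("off", 1.0)],
        ("on", "toggle"): [("off", 1.0)],
        ("on", "wait"): [("on", 1.0)],
    },
    rewards={("off", "toggle"): 1.0, ("off", "wait"): 0.0,
             ("on", "toggle"): 0.0, ("on", "wait"): 1.0},
)
mdp.validate()
```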