This content will become publicly available on May 29, 2025
Reinforcement learning (RL), a subset of machine learning (ML), could optimize and control biomanufacturing processes, such as improved production of therapeutic cells. Here, the process of CAR T‐cell activation by antigen‐presenting beads and their subsequent expansion is formulated in silico. The simulation is used as an environment to train RL‐agents to dynamically control the number of beads in culture to maximize the population of robust effector cells at the end of the culture. We make periodic decisions of incremental bead addition or complete removal. The simulation is designed to operate in OpenAI Gym, enabling testing of different environments, cell types, RL‐agent algorithms, and state inputs to the RL‐agent. RL‐agent training is demonstrated with three different algorithms (PPO, A2C, and DQN), each sampling three different state input types (tabular, image, mixed); PPO‐tabular performs best for this simulation environment. Using this approach, training of the RL‐agent on different cell types is demonstrated, resulting in unique control strategies for each type. Sensitivity to input‐noise (sensor performance), number of control step interventions, and advantages of pre‐trained RL‐agents are also evaluated. Therefore, we present an RL framework to maximize the population of robust effector cells in CAR T‐cell therapy production.
more » « less- PAR ID:
- 10533228
- Publisher / Repository:
- Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name:
- Biotechnology and Bioengineering
- Volume:
- 121
- Issue:
- 9
- ISSN:
- 0006-3592
- Format(s):
- Medium: X Size: p. 2868-2880
- Size(s):
- p. 2868-2880
- Sponsoring Org:
- National Science Foundation
More Like this
-
Serverless Function-As-A-Service (FaaS) is an emerging cloud computing paradigm that frees application developers from infrastructure management tasks such as resource provisioning and scaling. To reduce the tail latency of functions and improve resource utilization, recent research has been focused on applying online learning algorithms such as reinforcement learning (RL) to manage resources. Compared to existing heuristics-based resource management approaches, RL-based approaches eliminate humans in the loop and avoid the painstaking generation of heuristics. In this paper, we show that the state-of-The-Art single-Agent RL algorithm (S-RL) suffers up to 4.6x higher function tail latency degradation on multi-Tenant serverless FaaS platforms and is unable to converge during training. We then propose and implement a customized multi-Agent RL algorithm based on Proximal Policy Optimization, i.e., multi-Agent PPO (MA-PPO). We show that in multi-Tenant environments, MA-PPO enables each agent to be trained until convergence and provides online performance comparable to S-RL in single-Tenant cases with less than 10% degradation. Besides, MA-PPO provides a 4.4x improvement in S-RL performance (in terms of function tail latency) in multi-Tenant cases.more » « less
-
Agents that are aware of the separation between themselves and their environments can leverage this understanding to form effective representations of visual input. We propose an approach for learning such structured representations for RL algorithms, using visual knowledge of the agent, such as its shape or mask, which is often inexpensive to obtain. This is incorporated into the RL objective using a simple auxiliary loss. We show that our method, Structured Environment-Agent Representations, outperforms state-of-the-art model-free approaches over 18 different challenging visual simulation environments spanning 5 different robots.more » « less
-
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent’s inputs, which raises concerns about deploying such agents in the real world. To address this issue, we propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against lp-norm bounded adversarial attacks. Our framework is compatible with popular deep reinforcement learning algorithms and we demonstrate its performance with deep Q-learning, A3C and PPO. We experiment on three deep RL benchmarks (Atari, MuJoCo and ProcGen) to show the effectiveness of our robust training algorithm. Our RADIAL-RL agents consistently outperform prior methods when tested against attacks of varying strength and are more computationally efficient to train. In addition, we propose a new evaluation method called Greedy Worst-Case Reward (GWC) to measure attack agnostic robustness of deep RL agents. We show that GWC can be evaluated efficiently and is a good estimate of the reward under the worst possible sequence of adversarial attacks. All code used for our experiments is available at https://github.com/tuomaso/radial_rl_v2.more » « less
-
ABSTRACT Chimeric antigen receptor (CAR) T cell immunotherapy has demonstrated exceptional efficacy against hematological malignancies, but notably less against solid tumors. To overcome this limitation, it is critical to investigate antitumor CAR‐T cell potency in synthetic 3D microenvironments that can simulate the physical barriers presented by solid tumors. The overall goal of this study was the preliminary assessment of a synthetic thermo‐responsive material as a substrate for in vitro co‐cultures of anti‐disialoganglioside (GD2) CAR‐T cells and patient‐derived glioblastoma (GBM) spheroids. Independent co‐culture experiments demonstrated that the encapsulation process did not adversely affect the cell cycle progression of glioma stem cells (GSCs) or CAR‐T cells. GSC spheroids grew over time within the terpolymer scaffold, when seeded in the same ratio as the suspension control. Co‐cultures of CAR‐T cells in suspension with hydrogel‐encapsulated GSC spheroids demonstrated that CAR‐T cells could migrate through the hydrogel and target the encapsulated GSC spheroids. CAR‐T cells killed approximately 80% of encapsulated GSCs, while maintaining effective CD4:CD8 T cell ratios and secreting inflammatory cytokines after interacting with GD2‐expressing GSCs. Importantly, the scaffolds also facilitated cell harvesting for downstream cellular analysis. This study demonstrated that a synthetic 3D terpolymer hydrogel can serve as an artificial scaffold to investigate cellular immunotherapeutic potency against solid tumors.
-
Reinforcement learning (RL) has recently shown promise in solving difficult numerical problems and has discovered non-intuitive solutions to existing problems. This study investigates the ability of a general RL agent to find an optimal control strategy for spacecraft attitude control problems. Two main types of Attitude Control Systems (ACS) are presented. First, the general ACS problem with full actuation is considered, but with saturation constraints on the applied torques, representing thruster-based ACSs. Second, an attitude control problem with reaction wheel based ACS is considered, which has more constraints on control authority. The agent is trained using the Proximal Policy Optimization (PPO) RL method to obtain an attitude control policy. To ensure robustness, the inertia of the satellite is unknown to the control agent and is randomized for each simulation. To achieve efficient learning, the agent is trained using curriculum learning. We compare the RL based controller to a QRF (quaternion rate feedback) attitude controller, a well-established state feedback control strategy. We investigate the nominal performance and robustness with respect to uncertainty in system dynamics. Our RL based attitude control agent adapts to any spacecraft mass without needing to re-train. In the range of 0.1 to 100,000 kg, our agent achieves 2% better performance to a QRF controller tuned for the same mass range, and similar performance to the QRF controller tuned specifically for a given mass. The performance of the trained RL agent for the reaction wheel based ACS achieved 10 higher better reward then that of a tuned QRF controllermore » « less