Title: Towards Real Time Team Optimization
Teams can often be viewed as a dynamic system in which the team configuration evolves over time (e.g., new members join the team, existing members leave, and members' skills improve). Consequently, the performance of the team may change due to such team dynamics. A natural question is how to plan (re-)staffing actions (e.g., recruiting a new team member) at each time step so as to maximize the expected cumulative performance of the team. In this paper, we address the problem of real-time team optimization by intelligently selecting the best candidates so as to increase the similarity between the current team and high-performance teams, based on the team configuration at each time step. The key idea is to formulate the problem as a Markov Decision Process (MDP) and leverage recent advances in reinforcement learning to optimize the team dynamically. The proposed method offers two main advantages: (1) dynamics, modeling the dynamics of the team to steer the initial team toward a high-performance team via performance feedback; and (2) efficacy, handling the large state/action space via deep reinforcement learning based value estimation.  We demonstrate the effectiveness of the proposed method through extensive empirical evaluations.
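As a rough illustration of this formulation (not the paper's implementation), the sketch below casts staffing as an MDP and trains a small Q-network over a candidate pool: the state is the team's aggregate skill vector, an action recruits a candidate, and the reward is an assumed similarity-to-target signal standing in for performance feedback. The toy environment, skill dimensions, and hyperparameters are all illustrative.

```python
# Minimal sketch: team re-staffing as an MDP with a deep Q-network.
# All names (TeamEnv, skill dimensions, reward) are assumptions for the example.
import random
import torch
import torch.nn as nn

N_SKILLS, N_CANDIDATES = 8, 20

class TeamEnv:
    """Toy environment: the team state is the mean skill vector of its members."""
    def __init__(self):
        self.candidates = torch.rand(N_CANDIDATES, N_SKILLS)  # candidate skill profiles
        self.target = torch.rand(N_SKILLS)                    # assumed high-performance profile
        self.reset()

    def reset(self):
        self.members = [self.candidates[random.randrange(N_CANDIDATES)]]
        return self._state()

    def _state(self):
        return torch.stack(self.members).mean(dim=0)

    def step(self, action):
        self.members.append(self.candidates[action])          # recruit the chosen candidate
        state = self._state()
        reward = -torch.dist(state, self.target).item()       # closer to target profile = better
        done = len(self.members) >= 5
        return state, reward, done

q_net = nn.Sequential(nn.Linear(N_SKILLS, 64), nn.ReLU(), nn.Linear(64, N_CANDIDATES))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
env, gamma, eps = TeamEnv(), 0.9, 0.2

for episode in range(200):
    state, done = env.reset(), False
    while not done:
        # epsilon-greedy selection over the candidate pool
        if random.random() < eps:
            action = random.randrange(N_CANDIDATES)
        else:
            action = q_net(state).argmax().item()
        next_state, reward, done = env.step(action)
        # one-step temporal-difference target for the chosen hire
        with torch.no_grad():
            target = reward + (0 if done else gamma * q_net(next_state).max().item())
        loss = (q_net(state)[action] - target) ** 2
        opt.zero_grad(); loss.backward(); opt.step()
        state = next_state
```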
Award ID(s):
1947135 1651203 1715385
PAR ID:
10159175
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE BigData
Page Range / eLocation ID:
1008 to 1017
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent algorithms have achieved superhuman performance at a number of two-player zero-sum games such as poker and Go. However, many real-world situations are multi-player games. Zero-sum two-team games, such as bridge and football, involve two teams where each member of a team shares the same reward with every other member of that team, and each team receives the negative of the other team's reward. A popular solution concept in this setting, called TMECor, assumes that teams can jointly correlate their strategies before play but cannot communicate during play. This setting is harder than two-player zero-sum games because each player on a team has different information and must use their public actions to signal to other members of the team. Prior works either have game-theoretic guarantees but only work in very small games, or are able to scale to large games but do not have game-theoretic guarantees. In this paper we introduce two algorithms: Team-PSRO, an extension of PSRO from two-player games to team games, and Team-PSRO Mix-and-Match, which improves upon Team-PSRO by better using population policies. In Team-PSRO, in every iteration both teams learn a joint best response to the opponent's meta-strategy via reinforcement learning. As the reinforcement learning joint best response approaches the optimal best response, Team-PSRO is guaranteed to converge to a TMECor. In experiments on Kuhn poker and Liar's Dice, we show that a tabular version of Team-PSRO converges to TMECor, and a version of Team-PSRO using deep cooperative reinforcement learning beats self-play reinforcement learning in the large game of Google Research Football.
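A loose sketch of the PSRO-style population loop described above, under simplifying assumptions: a random bilinear payoff stands in for the team game, a uniform meta-strategy stands in for a Nash solver, and a sampling-based search stands in for the reinforcement-learning joint best response.

```python
# Illustrative PSRO-style population loop for a two-team zero-sum setting.
import numpy as np

rng = np.random.default_rng(0)
DIM = 6
PAYOFF = rng.normal(size=(DIM, DIM))          # team A's payoff; team B gets the negative

def payoff(a_policy, b_policy):
    return float(a_policy @ PAYOFF @ b_policy)

def random_policy():
    p = rng.random(DIM)
    return p / p.sum()

def meta_strategy(pop_a, pop_b):
    """Uniform meta-strategy over each population (a simple stand-in for a Nash solver)."""
    return np.ones(len(pop_a)) / len(pop_a), np.ones(len(pop_b)) / len(pop_b)

def best_response(opponent_pop, opponent_meta, maximize, n_samples=200):
    """Approximate joint best response by sampling candidate policies."""
    best, best_val = None, -np.inf
    for _ in range(n_samples):
        cand = random_policy()
        vals = [payoff(cand, q) if maximize else -payoff(q, cand) for q in opponent_pop]
        val = float(np.dot(opponent_meta, vals))
        if val > best_val:
            best, best_val = cand, val
    return best

pop_a, pop_b = [random_policy()], [random_policy()]
for it in range(10):
    meta_a, meta_b = meta_strategy(pop_a, pop_b)
    br_a = best_response(pop_b, meta_b, maximize=True)    # team A responds to B's meta-strategy
    br_b = best_response(pop_a, meta_a, maximize=False)   # team B responds to A's meta-strategy
    pop_a.append(br_a)
    pop_b.append(br_b)
```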
  2. Multiagent teams have been shown to be effective in many domains that require coordination among team members. However, finding valuable joint-actions becomes increasingly difficult in tightly-coupled domains where each agent's performance depends on the actions of many other agents. Reward shaping partially addresses this challenge by deriving more "tuned" rewards to provide agents with additional feedback, but this approach still relies on agents randomly discovering suitable joint-actions. In this work, we introduce Counterfactual Agent Suggestions (CAS) as a method for injecting knowledge into an agent's learning process within the confines of existing reward structures. We show that CAS enables agent teams to converge towards desired behaviors more reliably. We also show that the improvement in team performance in the presence of suggestions extends to large teams and tightly-coupled domains.
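Purely as an illustration of suggestion-based knowledge injection (the paper's exact mechanism is not reproduced here), the sketch below runs tabular Q-learning on a toy chain task and, during exploration, sometimes follows an external suggestion instead of a random action, leaving the reward structure untouched. The task, the suggestion rule, and all parameters are invented for the example.

```python
# Toy illustration: suggestions override some exploratory actions; rewards are unchanged.
import random

N_STATES, N_ACTIONS, GOAL = 10, 2, 9           # 1-D chain: action 1 moves right, 0 moves left
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def suggest(state):
    return 1                                    # stand-in domain knowledge: "move toward the goal"

alpha, gamma, eps, p_suggest = 0.1, 0.95, 0.3, 0.5
for episode in range(500):
    s = 0
    for _ in range(50):
        if random.random() < eps:
            # with some probability, follow the suggestion instead of exploring randomly
            a = suggest(s) if random.random() < p_suggest else random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
        s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
        r = 1.0 if s_next == GOAL else 0.0      # existing reward structure, unchanged
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next
        if s == GOAL:
            break
```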
  3. Although teamwork is being integrated throughout engineering education because of the perceived benefits of teams, the construct of psychological safety has been largely ignored in engineering research. This omission is unfortunate because psychological safety reflects collective perceptions about how comfortable team members feel in sharing their perspectives, and it has been found to positively impact team performance in samples outside of engineering. While prior research has indicated that psychological safety is positively related to team performance and outcomes, research related to psychological safety in engineering teams is less established. There is also a lack of comprehensive methodologies that capture the dynamic changes that occur throughout the design process and at each time point. In light of this, the goal of the current study was to understand how psychological safety might be measured practically and reliably in engineering student teams over time. In addition, we sought to identify factors that impact the building and waning of psychological safety in these teams over time. This was accomplished through a study with 260 engineering students in 68 teams in a first-year engineering design class. The psychological safety of each team was captured at five time points over the course of a semester-long design project. The results of this study provide some of the first evidence on the reliability of psychological safety in engineering teams and offer insights as to how to support and improve psychological safety.
  4. Collaborative work often benefits from having teams or organizations with heterogeneous members. In this paper, we present a method to form such diverse teams from people arriving sequentially over time. We define a monotone submodular objective function that combines the diversity and quality of a team and propose an algorithm to maximize the objective while satisfying multiple constraints. This allows us to balance both how diverse the team is and how well it can perform the task at hand. Using crowd experiments, we show that, in practice, the algorithm leads to large gains in team diversity. Using simulations, we show how to quantify the additional cost of forming diverse teams and how to address the problem of simultaneously maximizing diversity for several attributes (e.g., country of origin and gender). Our method has applications in collaborative work ranging from team formation and the assignment of workers to teams in crowdsourcing to reviewer allocation for journal papers arriving sequentially. Our code is publicly accessible for further research.
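A minimal sketch in the spirit of this approach: a monotone submodular objective that mixes a modular quality term with attribute coverage, maximized over a sequential stream with a simple threshold rule. The objective, the acceptance rule, and the example data are assumptions for illustration, not the paper's algorithm.

```python
# Streaming greedy selection under a monotone submodular team objective (illustrative).
from dataclasses import dataclass

@dataclass
class Candidate:
    quality: float
    country: str
    gender: str

def objective(team):
    """Monotone submodular: modular quality term plus coverage of attribute values."""
    quality = sum(c.quality for c in team)
    coverage = len({c.country for c in team}) + len({c.gender for c in team})
    return quality + coverage

def streaming_select(stream, k, threshold):
    """Accept an arriving candidate if the marginal gain clears the threshold."""
    team = []
    for cand in stream:
        gain = objective(team + [cand]) - objective(team)
        if len(team) < k and gain >= threshold:
            team.append(cand)
    return team

arrivals = [
    Candidate(0.9, "US", "F"), Candidate(0.4, "US", "F"), Candidate(0.7, "IN", "M"),
    Candidate(0.8, "BR", "F"), Candidate(0.3, "US", "M"), Candidate(0.95, "DE", "M"),
]
team = streaming_select(arrivals, k=3, threshold=1.5)
print([(c.country, c.gender, c.quality) for c in team])
```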
  5. Multi-human multi-robot (MH-MR) teams hold tremendous potential for tackling intricate and massive missions by merging the distinct strengths and expertise of individual members. The inherent heterogeneity of these teams necessitates advanced initial task allocation (ITA) methods that align tasks with the intrinsic capabilities of team members from the outset. While existing reinforcement learning approaches show encouraging results, they may fall short in addressing the nuances of long-horizon ITA problems, particularly in settings with large-scale MH-MR teams or multifaceted tasks. To bridge this gap, we propose an attention-enhanced hierarchical reinforcement learning approach that decomposes the complex ITA problem into structured sub-problems, facilitating more efficient allocations. To bolster sub-policy learning, we introduce a hierarchical cross-attribute attention (HCA) mechanism, encouraging each sub-policy within the hierarchy to discern and leverage the specific nuances in the state space that are crucial for its respective decision-making phase. Through an extensive environmental surveillance case study, we demonstrate the benefits of our model and of the HCA mechanism within it.
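As a loose illustration (not the paper's architecture), the sketch below scores tasks against heterogeneous member capabilities with a small cross-attention module and reads off a greedy initial allocation; all dimensions, features, and the assignment step are invented for the example.

```python
# Cross-attention scoring between task features and member capabilities (illustrative).
import torch
import torch.nn as nn

D_TASK, D_MEMBER, D_ATTN = 6, 5, 16
n_tasks, n_members = 4, 3

class CrossAttentionScorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.q = nn.Linear(D_TASK, D_ATTN)      # tasks form the attention queries
        self.k = nn.Linear(D_MEMBER, D_ATTN)    # member capabilities form the keys

    def forward(self, tasks, members):
        # scaled dot-product scores: one row per task, one column per member
        scores = self.q(tasks) @ self.k(members).T / D_ATTN ** 0.5
        return scores.softmax(dim=-1)           # allocation probabilities per task

scorer = CrossAttentionScorer()
tasks = torch.rand(n_tasks, D_TASK)             # e.g., location, difficulty, sensing needs
members = torch.rand(n_members, D_MEMBER)       # e.g., speed, sensor quality, workload limit
alloc_probs = scorer(tasks, members)
assignment = alloc_probs.argmax(dim=-1)         # greedy initial allocation: task -> member index
print(assignment.tolist())
```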