Title: Identifying Correlates of Emergent Behaviors in Agent-Based Simulation Models Using Inverse Reinforcement Learning
In large agent-based models (ABMs), it is difficult to correlate system-level dynamics with individual-level attributes. In this paper, we use inverse reinforcement learning (IRL) to estimate compact representations of behaviors in large-scale pandemic simulations in the form of reward functions. We illustrate the capacity and performance of these representations in identifying agent-level attributes that correlate with the emerging dynamics of large-scale multi-agent systems. Our experiments use BESSIE, an ABM for COVID-like epidemic processes, where agents make sequential decisions (e.g., use PPE or refrain from activities) based on observations (e.g., the number of mask-wearing people) collected when visiting locations to conduct their activities. The IRL-based reformulations of simulation outputs perform significantly better in classification of agent-level attributes than direct classification of decision trajectories and are thus more capable of determining agent-level attributes with a definitive role in the collective behavior of the system. We anticipate that this IRL-based approach is broadly applicable to general ABMs.
Award ID(s):
1918656
NSF-PAR ID:
10403991
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
2022 Winter Simulation Conference (WSC)
Page Range / eLocation ID:
322 to 333
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Self-organized pattern behavior is ubiquitous throughout nature, from fish schooling to collective cell dynamics during organism development. Qualitatively these patterns display impressive consistency, yet variability inevitably exists within pattern-forming systems on both microscopic and macroscopic scales. Quantifying variability and measuring pattern features can inform the underlying agent interactions and allow for predictive analyses. Nevertheless, current methods for analyzing patterns that arise from collective behavior capture only macroscopic features or rely on either manual inspection or smoothing algorithms that lose the underlying agent-based nature of the data. Here we introduce methods based on topological data analysis and interpretable machine learning for quantifying both agent-level features and global pattern attributes on a large scale. Because the zebrafish is a model organism for skin pattern formation, we focus specifically on analyzing its skin patterns as a means of illustrating our approach. Using a recent agent-based model, we simulate thousands of wild-type and mutant zebrafish patterns and apply our methodology to better understand pattern variability in zebrafish. Our methodology is able to quantify the differential impact of stochasticity in cell interactions on wild-type and mutant patterns, and we use our methods to predict stripe and spot statistics as a function of varying cellular communication. Our work provides an approach to automatically quantifying biological patterns and analyzing agent-based dynamics so that we can now answer critical questions in pattern formation at a much larger scale. 
  2. Attributed network embedding aims to learn low-dimensional vector representations for nodes in a network, where each node contains rich attributes/features describing node content. Because network topology structure and node attributes often exhibit high correlation, incorporating node attribute proximity into network embedding is beneficial for learning good vector representations. In reality, large-scale networks often have incomplete/missing node content or linkages, yet existing attributed network embedding algorithms all operate under the assumption that networks are complete. Thus, their performance is vulnerable to missing data and suffers from poor scalability. In this paper, we propose a Scalable Incomplete Network Embedding (SINE) algorithm for learning node representations from incomplete graphs. SINE formulates a probabilistic learning framework that separately models pairs of node-context and node-attribute relationships. Different from existing attributed network embedding algorithms, SINE provides greater flexibility to make the best of useful information and mitigate negative effects of missing information on representation learning. A stochastic gradient descent based online algorithm is derived to learn node representations, allowing SINE to scale up to large-scale networks with high learning efficiency. We evaluate the effectiveness and efficiency of SINE through extensive experiments on real-world networks. Experimental results confirm that SINE outperforms state-of-the-art baselines in various tasks, including node classification, node clustering, and link prediction, under settings with missing links and node attributes. SINE is also shown to be scalable and efficient on large-scale networks with millions of nodes/edges and high-dimensional node features.
  3. The landscapes of many elementary, middle, and high school math classrooms have undergone major transformations over the last half-century, moving from drill-and-skill work to more conceptual reasoning and hands-on manipulative work. However, if you look at a college level calculus class you are likely to find the main difference is the professor now has a whiteboard marker in hand rather than a piece of chalk. It is possible that some student work may be done on the computer, but much of it contains the same type of repetitive skill building problems. This should seem strange given the advancements in technology that allow more freedom than ever to build connections between different representations of a concept. Several class activities have been developed using a combination of approaches, depending on the topic. Topics covered in the activities include Riemann Sums, Accumulation, Center of Mass, Volumes of Revolution (Discs, Washers, and Shells), and Volumes of Similar Cross-section. All activities use student note outlines that are either done in a whole group interactive-lecture approach, or in a group work inquiry-based approach. Some of the activities use interactive graphs designed on desmos.com and others use physical models that have been designed in OpenSCAD and 3D-printed for students to use in class. Tactile objects were developed because they should provide an advantage to students by enabling them to physically interact with the concepts being taught, deepening their involvement with the material, and providing more stimuli for the brain to encode the learning experience. Web-based activities were developed because the topics involved needed substantial changes in graphical representations (i.e. limits with Riemann Sums). Assessment techniques for each topic include online homework, exams, and online concept questions with an explanation response area. 
These concept questions are intended to measure students’ ability to use multiple representations in order to answer the question, and are not generally computational in nature. Students are also given surveys to rate the overall activities as well as finer grained survey questions to try and elicit student thoughts on certain aspects of the models, websites, and activity sheets. We will report on student responses to the activity surveys, looking for common themes in students’ thoughts toward specific attributes of the activities. We will also compare relevant exam question responses and online concept question results, including common themes present or absent in student reasoning. 
  4. We consider concept generalization at a large scale in the diverse and natural visual spectrum. Established computational modes (i.e., rule-based or similarity-based) are primarily studied in isolation and focus on confined and abstract problem spaces. In this work, we study these two modes when the problem space scales up, and the complexity of concepts becomes diverse. Specifically, at the representational level, we seek to answer how the complexity varies when a visual concept is mapped to the representation space. Prior psychology literature has shown that two types of complexities (i.e., subjective complexity and visual complexity) build an inverted-U relation. Leveraging the Representativeness of Attribute (RoA), we computationally confirm the following observation: Models use attributes with high RoA to describe visual concepts, and the description length falls in an inverted-U relation with the increment in visual complexity. At the computational level, we aim to answer how the complexity of representation affects the shift between the rule- and similarity-based generalization. We hypothesize that category-conditioned visual modeling estimates the co-occurrence frequency between visual and categorical attributes, thus potentially serving as the prior for the natural visual world. Experimental results show that representations with relatively high subjective complexity outperform those with relatively low subjective complexity in the rule-based generalization, while the trend is the opposite in the similarity-based generalization.
  5. Sequence stratigraphy is an observationally-based method for interpreting sedimentary cyclicity. Stacking patterns of progradation, retrogradation and degradation are related to the balance of sedimentary accommodation versus sediment supply. While often related to eustasy, accommodation is also controlled by tectono-subsidence. Based on over 50 global examples, regional subsidence and uplift rates are usually greater than rates of sea level rise/fall for durations greater than about one million years. Thus, in many basins, the larger scale patterns of sedimentary cyclicity are driven by tectonics. The Upper Cretaceous of the Western Interior is an ideal laboratory to evaluate stratigraphic response to tectono-subsidence. Based on the stratigraphic framework, geohistory analyses, mapped shorelines and interpreted 2nd order system tracts, there is a strong correlation between subsidence rates and shoreline trajectories/stacking patterns. Large scale transgressions correlate with marked increases in subsidence, while strongly regressive intervals correspond to periods of low subsidence (or uplift). For example, the widespread transgression that occurs above the Turonian (e.g., Niobrara-Baxter-Cody) is associated with a large increase in regional subsidence. And the strongly progradational interval in the Upper Campanian that occurs throughout Wyoming (e.g., Ericson-Pine Ridge-Teapot) corresponds with uplift in proximal areas and reduced subsidence rate in more distal areas. Moreover, the patterns of large-scale cyclicity change along strike. A transect through the Green River to Powder River Basin shows a complicated large-scale stacking pattern with three complete 2nd order cycles in the Upper Cretaceous, correlative to regional subsidence/uplift events. A transect through the Uinta to North Park Basin has only two cycles, with much less complexity in the Campanian-Maastrichtian stacking and subsidence.
To the south, the San Juan Basin has three cycles, but these are not coeval with those seen in the northern transects. Subsidence-driven large-scale cyclicity controls exploration play elements, especially reservoir-seal couplets. Along-strike variability in regional subsidence is important in controlling the petroleum system play elements of source, seal and reservoir. It also indicates variation in lithospheric architecture/processes. Drivers may include variations in the angle and nature of the subducting plate. 