<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Embracing Embodied Social Cognition in AI: Moving Away from Computational Theory of Mind</title></titleStmt>
			<publicationStmt>
				<publisher>ACM</publisher>
				<date>05/11/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10545917</idno>
					<idno type="doi">10.1145/3613905.3650998</idno>
					
					<author>Manoj Deshpande</author><author>Brian Magerko</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[As artificial intelligence becomes more integral to daily life, the need to design AI systems capable of understanding human interactions is increasingly important. This paper delves into the integration of social cognition in AI, tracing back to its historical foundations and examining seminal theories like Newell's Bands of Cognition, Minsky's Society of Mind, etc., which have emphasized the importance of social cognition since AI's inception. We highlight the shortcomings of traditional computational theory of mind approaches, particularly in their failure to capture the embodied nature of social cognition. Advocating for including embodied socio-cognitive perspectives, we draw on theories such as Participatory Sensemaking and frameworks like Observable Creative Sensemaking. The paper further demonstrates the practical implementation of these concepts in AI through two case studies: one in co-creative dance AI and another in text-to-image generative AI systems.
CCS CONCEPTS: • Human-centered computing → Collaborative content creation; • Computing methodologies → Theory of mind.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>The field of Human-Computer Interaction (HCI) has experienced a paradigm shift <ref type="bibr">[27]</ref>, moving from a focus on purely cognitive concerns to a more holistic approach in designing interactions focused on situated, embodied, and social factors <ref type="bibr">[17,</ref><ref type="bibr">43,</ref><ref type="bibr">52]</ref>. Artificial Intelligence (AI) is seemingly undergoing a similar transformation: while AI development historically concentrated on computational models of specific cognitive processes, it is now evolving to embrace holistic and embodied cognitive processes <ref type="bibr">[9,</ref><ref type="bibr">13,</ref><ref type="bibr">37]</ref>. Consequently, there is an increasing emphasis on developing AI systems that align more closely with the complexities of human experience <ref type="bibr">[33,</ref><ref type="bibr">49]</ref>.</p><p>Social cognition refers to the study of how individuals interpret and make sense of their interactions with others and themselves. It encompasses methods by which people perceive, assess, and classify their encounters in the social world <ref type="bibr">[18]</ref>. In AI, this translates to the development of systems that can not only process information but also understand and adapt to the social and cultural context of their interactions with humans. 
Integrating social cognition in AI, particularly for enhancing decision-making in human-AI teams and building shared understandings between humans and agents, is central to the current paradigm shift in AI development <ref type="bibr">[1,</ref><ref type="bibr">2,</ref><ref type="bibr">45,</ref><ref type="bibr">61]</ref>.</p><p>This paper aims to explore the historical context of AI development with respect to social cognition, critique the limitations of traditional computational models of social cognition, and argue for the inclusion of embodied socio-cognitive perspectives in AI, drawing upon theories such as Participatory Sensemaking (PSM) <ref type="bibr">[15]</ref>. Furthermore, this paper will examine how contemporary frameworks, mainly Observable Creative Sensemaking (OCSM) <ref type="bibr">[16]</ref>, practically offer pathways to integrate embodied social cognition perspectives into AI development. By bridging historical insights with modern theoretical frameworks, this paper aims to contribute to the discourse of the human-centered perspective of social cognition in AI systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">HISTORICAL THEORETICAL CONTEXT OF SOCIAL COGNITION IN AI</head><p>One of the primary objectives of AI has been to replicate human cognition, a concept embodied in the distinction between Weak AI (or Narrow AI) and Strong AI (or General AI). Weak AI refers to systems designed for specific tasks without emulating human cognition, while Strong AI aims to replicate human cognitive abilities <ref type="bibr">[44]</ref>. Pioneers such as Simon, Newell, Minsky, and Shannon laid the foundation for AI by developing cognitive models that mimic human problem-solving and reasoning, influencing the direction of AI research and development <ref type="bibr">[32,</ref><ref type="bibr">47,</ref><ref type="bibr">50,</ref><ref type="bibr">51]</ref>. In this section, we discuss some of the seminal theories related to social cognition in AI.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Newell's Bands of Cognition</head><p>The consideration for social cognition in AI can be traced back to foundational works, notably Newell's Bands of Cognition. In his work Unified Theories of Cognition <ref type="bibr">[35]</ref>, Newell introduced a framework that categorizes cognitive processes across various scales, acknowledging the multifaceted nature of human cognition. This framework, known as Newell's Bands of Cognition, represents a layered approach to understanding cognition, ranging from the biological to the social and cultural levels <ref type="bibr">[29,</ref><ref type="bibr">35]</ref>, with each 'band' representing a different scale at which cognitive processes operate, as shown in Figure <ref type="figure">1</ref>.</p><p>Newell's Bands of Cognition framework categorizes cognitive processes across increasingly longer temporal durations. The Biological Band, operating on milliseconds to seconds, focuses on the brain's neurophysiological processes underpinning cognition. The Cognitive Band, spanning seconds to minutes, deals with core cognitive processes like perception and problem-solving, central to cognitive psychology and traditional AI. The Rational Band, ranging from minutes to days, involves rational behavior and decision-making. The Social Band, extending from days to months, encompasses the influence of social interactions and cultural norms on cognition. Lastly, the Historical and Evolutionary Band, covering decades to centuries, examines the long-term evolution of cognitive processes, including language development and cultural shifts. Each band represents a distinct aspect and timescale of cognitive functioning.</p><p>Newell's Bands of Cognition framework acknowledges the complexity of cognitive processes and highlights the significance of social and cultural dimensions in human cognition. 
This early recognition in AI research emphasizes that social interactions and cultural contexts are integral to understanding human cognitive processes, and therefore to developing systems that aim to replicate or complement human cognitive abilities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Minsky's Society of Mind</head><p>Minsky provided another theoretical framework in Society of Mind for understanding social cognition in AI <ref type="bibr">[32]</ref>. Minsky proposes that the mind is not a single entity but rather a collection of smaller processes, which he terms 'agents.' These agents work independently and collaboratively, like a society, to produce what we perceive as thought, consciousness, and intelligence. This theory suggests that complex cognitive functions, including those involved in social cognition, emerge from the interactions and cooperation of simpler, specialized mental processes.</p><p>Minsky's concept offers a framework for developing more advanced and socially aware AI systems. By emulating the societal structure of the mind, AI systems could potentially replicate the human ability to understand and navigate social contexts. This involves creating AI agents that can perform specific cognitive tasks and enabling them to interact and integrate their functionalities to exhibit complex social cognition. Minsky's model implies that social cognition in AI could be achieved not through a single, all-encompassing algorithm but through the orchestrated functioning of multiple specialized agents, each contributing to an overall understanding of social dynamics, emotions, and interactions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Brooks' Subsumption Architecture</head><p>Rodney Brooks, an early advocate for embodied cognition in robotics and AI, introduced the Subsumption Architecture, a bottom-up, behavior-based approach <ref type="bibr">[4]</ref> for AI logic. This architecture features layered control systems, with each layer handling a specific behavior and independently interacting with the robot's sensors and actuators. Its decentralized design focuses on reactive behaviors, enabling robots to adaptively respond to environmental stimuli <ref type="bibr">[6]</ref>. The architecture's strength lies in its incremental development and the emergence of complex behaviors from simple interactions. Although initially intended for physical robot-environment interactions, the principles of the Subsumption Architecture, like decentralized control and emergent behavior, are applicable to social cognition in AI, suggesting that complex social behaviors in AI could arise from simpler, socially focused behavioral modules <ref type="bibr">[4]</ref><ref type="bibr">[5]</ref><ref type="bibr">[6]</ref>.</p></div>
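<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the idea of layered, decentralized control concrete, the following is a minimal Python sketch under stated assumptions. It approximates subsumption with a fixed-priority arbiter, which is a common simplification and not Brooks' original wiring of suppression and inhibition between finite-state machines. The Percept fields, the three behaviors (including a socially focused approach_person module), and the priority order are hypothetical and chosen only for illustration.</p>
<p><![CDATA[
```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


# Hypothetical sensor reading, invented for this illustration.
@dataclass
class Percept:
    obstacle_distance: float  # metres to the nearest obstacle
    person_visible: bool      # whether a person is currently in view


# A layer maps a percept to a command, or None when it has nothing to contribute.
Layer = Callable[[Percept], Optional[str]]


def avoid(p: Percept) -> Optional[str]:
    # Reactive safety behavior: fires only when an obstacle is close.
    return "turn_away" if p.obstacle_distance < 0.5 else None


def approach_person(p: Percept) -> Optional[str]:
    # Simple socially focused behavior: fires only when a person is visible.
    return "move_toward_person" if p.person_visible else None


def wander(p: Percept) -> Optional[str]:
    # Default behavior: always has a command to offer.
    return "wander"


def arbitrate(layers: Sequence[Layer], p: Percept) -> str:
    """Return the command of the first layer that fires (highest priority first)."""
    for layer in layers:
        command = layer(p)
        if command is not None:
            return command
    return "idle"


layers = [avoid, approach_person, wander]
```
]]></p>
<p>With the layers ordered as avoid, approach_person, wander, a nearby obstacle overrides the social behavior, and the social behavior overrides wandering. Each behavior reads the percept independently, so further behaviors can be added as new layers without rewriting the existing ones.</p></div>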
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">COMPUTATIONAL MODELS OF SOCIAL COGNITION VIA THEORY OF MIND</head><p>In the previous section, we laid out the historical foundation for social cognition in AI. In this section, we describe the prevalent computational approach to incorporating social cognition in AI, specifically through the lens of Theory of Mind (ToM). In cognitive science, ToM has historically been instrumental in understanding human social cognition. According to ToM, an individual can make inferences about others' mental states in a social situation and act accordingly <ref type="bibr">[39]</ref>.</p><p>Over the years, various approaches have been developed to create AI systems with an artificial theory of mind. One of the early approaches is the Computational Theory of Mind (CTM) <ref type="bibr">[41]</ref>. Initially suggested by Warren McCulloch and Walter Pitts in 1943 <ref type="bibr">[30]</ref>, CTM posits that the human mind functions like a computer (a Turing machine <ref type="bibr">[56]</ref>) and that cognitive processes, including thinking, reasoning, and problem-solving, can be modeled as information processing systems. Hilary Putnam introduced CTM into philosophy, advocating for machine functionalism, which identifies mental states with machine states of a probabilistic automaton <ref type="bibr">[40]</ref>. Later, Fodor combined CTM with the Representational Theory of Mind (RTM), focusing on mental representations and proposing that mental activity involves Turing-style computation over a language of thought <ref type="bibr">[19]</ref>. Overall, CTM conceptualizes thoughts as software running on the hardware of the brain, a perspective that has guided much of AI research and cognitive psychology <ref type="bibr">[20,</ref><ref type="bibr">32]</ref>.</p><p>Integrating ToM into AI remains a significant area of interest in the current landscape of AI development, attracting considerable attention among scholars. 
This line of thought, which posits that AI systems equipped with ToM could understand and interpret the mental states of others, is seen as a crucial step towards more advanced, socially aware AI <ref type="bibr">[10,</ref><ref type="bibr">60]</ref>. Various methods are being explored to develop CTM in AI. These include applying inverse reinforcement learning while treating CTM as a multi-agent problem <ref type="bibr">[7]</ref>, using game theory for CTM formulation <ref type="bibr">[23]</ref>, and employing Bayesian inference to enable agents to form mental models through observation <ref type="bibr">[59]</ref>. Additionally, theoretical frameworks like the mutual theory of mind are being developed to enhance long-term human-AI interactions <ref type="bibr">[57]</ref>.</p></div>
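<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the Bayesian route mentioned above, the following minimal Python sketch shows an observer updating a belief about another agent's goal after watching one action. It is a generic application of Bayes' rule under stated assumptions, not the method of any cited work. The goals, the observed action, and all probabilities are hypothetical.</p>
<p><![CDATA[
```python
from typing import Dict


def update_goal_belief(prior: Dict[str, float],
                       likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: posterior(goal) is proportional to prior(goal) * P(action | goal)."""
    unnormalized = {goal: prior[goal] * likelihood[goal] for goal in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed action has zero probability under every goal")
    return {goal: value / total for goal, value in unnormalized.items()}


# Hypothetical scenario: the observer is unsure whether a partner wants coffee or tea.
prior = {"coffee": 0.5, "tea": 0.5}

# Assumed probability of the observed action (walking toward the kettle) under each goal.
likelihood_walk_to_kettle = {"coffee": 0.2, "tea": 0.8}

posterior = update_goal_belief(prior, likelihood_walk_to_kettle)
```
]]></p>
<p>Note that the inference here runs entirely inside the observer and treats the partner as something to be watched, which is the third-person stance that embodied accounts of social cognition call into question.</p></div>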
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">EMBODIED SOCIAL COGNITION THEORIES</head><p>The theories and frameworks discussed earlier, while important, exhibit notable limitations <ref type="bibr">[25,</ref><ref type="bibr">34,</ref><ref type="bibr">46]</ref>. One primary limitation is their tendency to conceptualize the mind as an isolated entity, capable of simulating, theorizing, and inferring others' mental states solely through observation. This perspective, known as the inner world hypothesis, has been critiqued by Fuchs and De Jaegher <ref type="bibr">[21]</ref> for its oversimplification of mental processes. Additionally, theories such as ToM often localize social cognition within a single participant's mind, assuming a third-person observational stance rather than an interactive one. Another critical shortcoming of these approaches is their lack of emphasis on embodiment. They adhere to Cartesian dualism, the separation of mind and body as proposed by Ren&#233; Descartes, leading to a reductive view of social cognition. In contrast, the embodied social cognition approach, as discussed by Meier et al. <ref type="bibr">[31]</ref> and Niedenthal et al. <ref type="bibr">[36]</ref>, emphasizes the integral role of physical embodiment in social interactions, moving beyond the view of the body as merely a transmission device between two 'Cartesian minds.'</p><p>These limitations are also highlighted in the works of Suchman and Damasio. Suchman's work in "Plans and Situated Actions" critiques symbolic, plan-based AI systems for their failure to account for the situated nature of human interaction <ref type="bibr">[53]</ref>. She argues that human actions are more contingent and emergent in real-world contexts than what is often assumed in symbolic AI models. Similarly, Damasio argues against the traditional separation of emotion and reason, showing that emotions are integral to rational thinking <ref type="bibr">[12]</ref>. 
He demonstrates through neurological studies how emotional aspects are crucial for effective decision-making, challenging the Cartesian dualism of separating mind and body. These critiques highlight the need for AI to integrate more holistic models that encompass not only computational aspects but also the embodied and socially situated nature of cognition <ref type="bibr">[54]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Enactivism</head><p>Embodied cognition, particularly enactivism, moves away from dualism by positing that cognition emerges from sensorimotor activity and interaction with the environment rather than from passive observation <ref type="bibr">[48]</ref>. Enactivism is categorized into three varieties: autopoietic, sensorimotor, and radical <ref type="bibr">[58]</ref>. Autopoietic enactivism, as described by Di Paolo and Thompson <ref type="bibr">[54]</ref>, views cognition as an organism's active modification of its relationship with the environment to maintain its identity, blurring the line between mental and non-mental processes. Sensorimotor enactivism treats cognition as the active exploration of the environment and the formation of sensorimotor dependencies, essentially 'thinking by doing' <ref type="bibr">[48]</ref>. Radical enactivism, on the other hand, rejects the notion of mental states and internal representations, arguing that cognition is simply dynamic, adaptive interactions with the environment <ref type="bibr">[8,</ref><ref type="bibr">11,</ref><ref type="bibr">58]</ref>. This approach analyzes cognition through the interplay of biological, sensorimotor, and social dynamics without relying on internal mental representations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Participatory Sensemaking (PSM)</head><p>Participatory Sensemaking (PSM) is a cognitive framework by Di Paolo and De Jaegher, grounded in enactive cognition, for understanding social cognition. 
It positions itself close to autopoietic enactivism but incorporates elements from sensorimotor enactivism.</p><p>PSM emphasizes the embodiment of interaction, evolving levels of autonomous identity, joint sensemaking, and experience, focusing on how understanding is collaboratively built through physical exploration and interaction <ref type="bibr">[15]</ref>.</p><p>From an enactive standpoint, social cognition is seen as a byproduct of social interaction, where participants unconsciously coordinate movements and speech, akin to coupled physical and biological systems <ref type="bibr">[15,</ref><ref type="bibr">54]</ref>. PSM is formally defined as "The coordination of intentional activity in interaction, whereby individual sensemaking processes are affected, and new domains of social sensemaking can be generated that were not available to each individual on their own" <ref type="bibr">[15]</ref>.</p><p>The effect of coordination and interaction on sensemaking can be analyzed through different degrees of participation in social interaction (Figure 2), as outlined by Di Paolo and De Jaegher <ref type="bibr">[15]</ref>. For example, in a scenario where two individuals build with Lego blocks, individual sensemaking occurs when they work independently without shared meaning. Orientational sensemaking arises when one individual influences or is influenced by the other's building approach, leading to an exchange or modification of ideas. This spectrum illustrates how participation levels affect the emergence of shared understanding in social interactions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">MOVING TOWARDS EMBODIED SOCIAL COGNITION IN AI</head><p>Contemporary AI research is increasingly acknowledging the importance of embodiment and social context. This shift is evident in the development of AI systems designed to understand and interact with humans in more nuanced and contextually aware ways. For instance, in robotics, researchers are focusing on developing socially aware robots that can understand and respond to human emotions and social dynamics, as seen in the work of Breazeal <ref type="bibr">[3]</ref> on sociable robots. Additionally, AI in healthcare is being tailored to consider patients' emotional and social contexts, enhancing the effectiveness of AI-assisted therapies and diagnostics <ref type="bibr">[38]</ref>. In computational co-creativity, systems such as Drawing Apprentice <ref type="bibr">[14]</ref> and Shimon <ref type="bibr">[24]</ref> have been developed to enable embodied co-creation with AI agents. Scholars have also proposed interaction frameworks to support embodied meaning construction through bidirectional communication between AI and humans in co-creative tasks <ref type="bibr">[22,</ref><ref type="bibr">42]</ref>.</p><p>Integrating embodied social cognition into AI requires reliable methods for understanding and quantifying the dynamics of interaction. Social cognition involves more than just processing information; it also encompasses interpreting and responding to complex and dynamic social interactions. The Observable Creative Sensemaking (OCSM) <ref type="bibr">[16]</ref> framework is one example of how this can be achieved. While it is not the only method for facilitating social cognition in AI, frameworks like OCSM offer valuable tools for developing AI that can navigate and respond to the nuances of social interactions. 
Drawing on principles from Participatory Sensemaking (PSM) <ref type="bibr">[15]</ref>, OCSM offers a methodology for quantifying interaction dynamics by focusing on observable behavioral states within creative processes, especially in nuanced, non-verbal, and embodied contexts.</p><p>OCSM functions based on three key observable behavioral dimensions: participation, newness, and appropriateness, each assessed on a 4-point qualitative scale as shown in Figure <ref type="figure">3</ref>. The scale for 'participation' assesses the degree of involvement and contribution of individuals in the creative process, ranging from individual exploration to collaborative engagement. 'Newness' measures the novelty of contributions within the context of the interaction, ranging from repetition of a previous idea to introducing a new idea. Lastly, 'appropriateness' gauges the relevance and suitability of contributions to the context of the interaction, from being entirely off-topic to highly pertinent. This structure enables the continuous and systematic quantification of interaction dynamics over time, effectively capturing the nuances of social interactions. Its focus on observable behavioral states facilitates the integration of OCSM into AI agents, enabling the measurement of these critical dimensions in real-time interactions.</p></div>
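<div xmlns="http://www.tei-c.org/ns/1.0"><p>One way an AI agent might represent these three dimensions is sketched below in Python. This is an illustrative data structure under stated assumptions, not the OCSM authors' implementation: the integer coding 1 to 4 for the 4-point qualitative scale, the class name OCSMState, and the helper dimension_series are all hypothetical.</p>
<p><![CDATA[
```python
from dataclasses import dataclass
from typing import List

# Assumed integer coding of the 4-point qualitative scale (1 = low end, 4 = high end).
SCALE_MIN, SCALE_MAX = 1, 4


@dataclass(frozen=True)
class OCSMState:
    participation: int    # individual exploration (low) to collaborative engagement (high)
    newness: int          # repetition of a previous idea (low) to a new idea (high)
    appropriateness: int  # entirely off-topic (low) to highly pertinent (high)

    def __post_init__(self) -> None:
        # Reject ratings that fall outside the assumed 4-point range.
        for name in ("participation", "newness", "appropriateness"):
            value = getattr(self, name)
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(
                    f"{name} must be in {SCALE_MIN}..{SCALE_MAX}, got {value}"
                )


def dimension_series(timeline: List[OCSMState], dimension: str) -> List[int]:
    """Extract one dimension over time so its trajectory can be inspected."""
    return [getattr(state, dimension) for state in timeline]


# Hypothetical ratings for three successive moments of an interaction.
timeline = [OCSMState(1, 4, 2), OCSMState(2, 3, 3), OCSMState(4, 2, 4)]
participation_over_time = dimension_series(timeline, "participation")
```
]]></p>
<p>Storing one state per observed moment gives a time series for each dimension, which is one simple reading of the continuous quantification of interaction dynamics described above.</p></div>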
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">CASE STUDIES</head><p>In this section, we present two case studies to illustrate the practical application of OCSM in expanding AI capabilities by integrating embodied social cognition. The first case study details our ongoing project using the OCSM to improve decision-making in an embodied co-creative dance AI. The second, a speculative study, explores applying OCSM to text-to-image AI systems for enhanced socio-cognitive abilities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Case study-1: Social cognition through OCSM for LuminAI</head><div><head n="6.1.1">Background:</head><p>Our co-creative dance AI application (shown in Figure <ref type="figure">4</ref>), called LuminAI <ref type="bibr">[26,</ref><ref type="bibr">28]</ref>, integrates OCSM into its operations, utilizing its five-module software design: perception, description, learning, transformation, and selection <ref type="bibr">[55]</ref>. This integration occurs across both training and interaction phases, enhancing interactive dance capabilities.</p></div></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.2">Objective:</head><p>The goal is to evolve the co-creative dance AI agent into a more intuitive and responsive dance partner that can understand and adapt to the creative process of human dancers through OCSM.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.3">Implementation:</head><p>During the training phase, the co-creative dance AI system employs three key modules:</p><p>&#8226; Perception Module: This module processes a dataset of videos featuring dancers improvising. It focuses on distinguishing the dancer from the background and identifying keyframes based on the dancer's movements.</p><p>&#8226; Description Module: Here, expert dancers create a dataset by annotating the keyframe movements with OCSM state descriptors.</p><p>&#8226; Learning Module: In this module, we utilize the dataset created earlier to train a neural network to recognize body actions and OCSM state descriptors. This training is vital because the neural network learns to categorize the dataset into distinct clusters and has the ability to attribute similar OCSM state descriptors during a live interaction, thus enabling social cognition based on observable embodied behavioral states.</p><p>In the interaction phase, the co-creative dance AI system utilizes the remaining modules:</p><p>&#8226; Transformation Module: As dancers perform, a Kinect sensor captures their movements. The transformation module utilizes the trained model from the training phase to apply OCSM descriptors to understand the ongoing interaction by determining the appropriate movement cluster.</p><p>&#8226; Selection Module: This module selects the most similar sequence from the identified movement cluster based on the observed body action sequence. It ensures that the AI agent's responses are aligned with the ongoing interaction dynamics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.4">Outcome:</head><p>The use of OCSM in co-creative dance AI demonstrates its effectiveness in both descriptive and generative capacities. In the training phase, OCSM is used for labeling and categorizing dance movements. During live interaction, it serves as a heuristic to guide real-time improvisational responses. 
This showcases the advantages of integrating embodied social cognition in AI via OCSM, enabling it to effectively comprehend and respond to the context of ongoing interactions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Case Study-2: Enhancing a Text-to-Image Generative AI Model with OCSM</head><div><head n="6.2.1">Background:</head><p>A text-to-image generative AI model, like Midjourney, typically creates visual content from textual prompts. In this speculative case study, we propose integrating the OCSM into Midjourney to enhance its understanding of the creative process.</p></div></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.2">Objective:</head><p>The goal is to speculatively transform Midjourney into a co-creative AI partner that can adapt to and participate in the user's creative journey rather than simply acting as a tool for image generation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.3">Implementation:</head><p>The implementation of OCSM into Midjourney might involve the following three interaction mechanisms:</p><p>&#8226; Prompt Understanding: Midjourney uses the OCSM framework to evaluate 'Newness' and 'Appropriateness' in user prompts. The AI model gauges these parameters to understand the user's creative intent and expectations.</p><p>&#8226; Adaptive Image Generation: The AI model not only analyzes prompts for 'Newness' and 'Appropriateness' but also engages in dialogue with users. This interaction, aiming for 'Orientational Sensemaking', helps align the AI's output with the user's evolving creative direction, adapting the image generation process accordingly.</p><p>&#8226; Interactive Feedback Loop: Through ongoing dialogue, Midjourney refines its understanding of 'Newness' and 'Appropriateness' based on user feedback and generates multiple image options. This collaborative process, moving towards 'Joint Sensemaking', allows the AI to adjust its outputs, enhancing the co-creative experience with the user.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.4">Outcome:</head><p>Integrating OCSM into Midjourney enables AI to participate actively in the creative process. 
The model becomes more adept at interpreting creative prompts, engaging in a dynamic dialogue with users, and producing images that are not only contextually relevant but also aligned with the user's evolving creative journey. This case study demonstrates the potential of social cognition via OCSM in enriching the capabilities of generative AI models, particularly in creative and interactive applications.</p></div>
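<div xmlns="http://www.tei-c.org/ns/1.0"><p>Returning to the first case study, the selection step of Section 6.1.3 (choosing the most similar stored sequence within the identified movement cluster) can be sketched in Python as a nearest-neighbor lookup. This is a minimal sketch under stated assumptions and does not reproduce the LuminAI implementation: the text does not specify the similarity measure, so Euclidean distance over fixed-length feature vectors is assumed, and the names distance, select_response, and the example cluster contents are hypothetical.</p>
<p><![CDATA[
```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_response(observed: Vector, cluster: Dict[str, Vector]) -> str:
    """Return the name of the stored sequence closest to the observed one."""
    if not cluster:
        raise ValueError("cluster is empty")
    return min(cluster, key=lambda name: distance(observed, cluster[name]))


# Hypothetical movement cluster: each stored sequence is summarized by a feature vector.
cluster = {
    "sway": [0.1, 0.2, 0.1],
    "spin": [0.9, 0.8, 0.7],
    "reach": [0.4, 0.9, 0.2],
}
chosen = select_response([0.35, 0.85, 0.25], cluster)
```
]]></p>
<p>In this sketch the cluster is assumed to have been picked already by the transformation step, so the lookup only ranks candidates that share the OCSM descriptors attributed to the ongoing interaction.</p></div>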
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">DISCUSSION AND CONCLUSION</head><p>Integrating social cognition into AI presents significant challenges, particularly in accurately interpreting nuanced human intentions. Approaches like the computational theory of mind, while useful, have limitations, notably in their lack of emphasis on embodied and situated interaction. Theories such as participatory sensemaking and frameworks like OCSM offer solutions to these issues but depend heavily on extensive annotated data from domain experts. Furthermore, implementing models like OCSM in AI involves complex processing of non-verbal cues, which brings about technical difficulties and ethical concerns, especially regarding privacy and autonomy. Therefore, developing ethical frameworks and extending the use of models like OCSM across various domains is crucial for creating AI systems that are not only technically advanced but also socially and ethically attuned.</p><p>In this paper, we have explored the evolving landscape of Artificial Intelligence (AI), focusing on integrating social cognition. We have traced AI's historical roots in emulating human cognition and examined foundational theories like Bands of Cognition, Society of Mind, and Subsumption Architecture, highlighting the importance of social cognition in AI. Additionally, we discussed the prevalent approach of using computational models of the theory of mind for social cognition in AI, pointing out their limitations in addressing the social and embodied aspects of human cognition. We introduced alternative frameworks like PSM and OCSM, demonstrating their application in AI through case studies such as co-creative dance AI and potential uses in text-to-image AI systems.</p></div></body>
		</text>
</TEI>
