In natural language processing, most models try to learn semantic representations from text alone. The learned representations encode “distributional semantics” but fail to connect to any knowledge about the physical world. In contrast, humans learn language by grounding concepts in perception and action, and the brain encodes “grounded semantics” for cognition. Inspired by this notion and recent work in vision-language learning, we design a two-stream model for grounding language learning in vision. The model includes a VGG-based visual stream and a BERT-based language stream. The two streams merge into a joint representational space. Through cross-modal contrastive learning, the model first learns to align visual and language representations on the MS COCO dataset. The model further learns to retrieve visual objects with language queries through a cross-modal attention module and to infer the visual relations between the retrieved objects through a bilinear operator, using the Visual Genome dataset. After training, the model’s language stream is a stand-alone language model capable of embedding concepts in a visually grounded semantic space. This semantic space manifests principal dimensions explainable with human intuition and neurobiological knowledge. Word embeddings in this semantic space are predictive of human-defined norms of semantic features and are segregated into perceptually distinctive clusters. Furthermore, the visually grounded language model also enables compositional language understanding based on visual knowledge and multimodal image search with queries based on images, texts, or their combinations.
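As a rough illustration of the cross-modal contrastive step described above, the sketch below implements a symmetric InfoNCE objective over a batch of paired image/text embeddings. The loss form, the temperature value, and the names `img_emb` and `txt_emb` are assumptions for illustration; the abstract does not specify the exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    img_emb, txt_emb: (batch, dim) tensors -- outputs of the visual and
    language streams after projection into the joint space (names are
    illustrative; the paper's exact formulation is not given here).
    """
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature  # pairwise cosine similarities
    targets = torch.arange(len(logits), device=logits.device)
    # Matched pairs sit on the diagonal: pull them together while
    # pushing apart every mismatched image/text pair in the batch.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```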
Assessing the Alignment Between Word Representations in the Brain and Large Language Models
Recent developments in using Large Language Models (LLMs) to predict and align with neural representations of language can be applied to achieving a future vision of design tools that enable detection and reconstruction of designers’ mental representations of ideas. Prior work has largely explored this relationship during passive language tasks only, e.g., reading or listening. In this work, the relationship between brain activation data (functional magnetic resonance imaging, fMRI) during appropriate and novel word association generation and LLM (Llama-2 7b) word representations is tested using Representational Similarity Analysis (RSA). Findings suggest that LLM word representations align with brain activity captured during novel word association, but not when forming appropriate associates. Association formation is one cognitive process central to design. By demonstrating that brain activity during this task can align with LLM word representations, insights from this work encourage further investigation into this relationship during more complex design ideation processes.
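The RSA comparison described in this abstract can be sketched in a few lines. The version below assumes correlation distance for both representational dissimilarity matrices and a Spearman correlation between their condensed (upper-triangle) forms, which are common RSA defaults; the study's exact distance measures may differ.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(brain_patterns, llm_embeddings):
    """Representational Similarity Analysis between two systems.

    brain_patterns: (n_words, n_voxels) fMRI activation patterns,
        one row per word.
    llm_embeddings: (n_words, dim) LLM word representations in the
        same word order.
    Returns the Spearman correlation (and p-value) between the two
    representational dissimilarity matrices (condensed upper triangles).
    """
    rdm_brain = pdist(brain_patterns, metric="correlation")
    rdm_llm = pdist(llm_embeddings, metric="correlation")
    rho, p = spearmanr(rdm_brain, rdm_llm)
    return rho, p
```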
- Award ID(s):
- 2145432
- PAR ID:
- 10560521
- Editor(s):
- Gero, JS
- Publisher / Repository:
- Design Computing and Cognition’24
- Date Published:
- ISBN:
- 978-3-031-71922-6
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
Active sampling in the olfactory domain is an important aspect of mouse behaviour, and there is increasing evidence that respiration-entrained neural activity outside of the olfactory system sets an important global brain rhythm. It is therefore important to accurately measure breathing during natural behaviours. We develop a new approach to do this in freely moving animals, by implanting a telemetry-based pressure sensor into the right jugular vein, which allows for wireless monitoring of thoracic pressure. After verifying this technique against standard head-fixed respiration measurements, we combined it with EEG and EMG recording and used evolving partial coherence analysis to investigate the relationship between respiration and brain activity across a range of experiments in which the mice could move freely. During voluntary exploration of odours and objects, we found that the association between respiration and cortical delta and theta rhythms decreased, while the association between respiration and cortical alpha rhythm increased. During sleep, however, the presentation of an odour was able to cause a transient increase in sniffing without changing dominant sleep rhythms (delta and theta) in the cortex. Our data align with the emerging idea that the respiration rhythm could act as a synchronising scaffold for specific brain rhythms during wakefulness and exploration, but suggest that respiratory changes are less able to impact brain activity during sleep. Combining wireless respiration monitoring with different types of brain recording across a variety of behaviours will further increase our understanding of the important links between active sampling, passive respiration, and neural activity. (A simplified coherence-analysis sketch follows this list.)
- 
What is the relationship between language and event cognition? Past work has suggested that linguistic/aspectual distinctions encoding the internal temporal profile of events map onto nonlinguistic event representations. Here, we use a novel visual detection task to directly test the hypothesis that processing telic versus atelic sentences (e.g., “Ebony folded a napkin in 10 seconds” vs. “Ebony did some folding for 10 seconds”) can influence whether the very same visual event is processed as containing distinct temporal stages including a well‐defined endpoint or lacking such structure, respectively. In two experiments, we show that processing (a)telicity in language shifts how people later construe the temporal structure of identical visual stimuli. We conclude that event construals are malleable representations that can align with the linguistic framing of events.
- 
Background/Objectives: The Implicit Prosody Hypothesis (IPH) posits that individuals generate internal prosodic representations during silent reading, mirroring those produced in spoken language. While converging behavioral evidence supports the IPH, the underlying neurocognitive mechanisms remain largely unknown. Therefore, this study investigated the neurophysiological markers of sensitivity to speech rhythm cues during silent word reading. Methods: EEGs were recorded while participants silently read four-word sequences, each composed of either trochaic words (stressed on the first syllable) or iambic words (stressed on the second syllable). Each sequence was followed by a target word that was either metrically congruent or incongruent with the preceding rhythmic pattern. To investigate the effects of metrical expectancy and lexical stress type, we examined single-trial event-related potentials (ERPs) and time–frequency representations (TFRs) time-locked to target words. Results: The results showed significant differences based on the stress pattern expectancy and type. Specifically, words that carried unexpected stress elicited larger ERP negativities between 240 and 628 ms after the word onset. Furthermore, different frequency bands were sensitive to distinct aspects of the rhythmic structure in language. Alpha activity tracked the rhythmic expectations, and theta and beta activities were sensitive to both the expected rhythms and specific locations of the stressed syllables. Conclusions: The findings clarify neurocognitive mechanisms of phonological and lexical mental representations during silent reading using a conservative data-driven approach. Similarity with neural response patterns previously reported for spoken language contexts suggests shared neural networks for implicit and explicit speech rhythm processing, further supporting the IPH and emphasizing the centrality of prosody in reading. (A simplified single-trial amplitude sketch follows this list.)
- 
Physics-based simulations are essential for designing autonomous construction equipment, but preparing models is time-consuming, requiring the integration of mechanical and geometric data. Current automatic modeling methods for modular robots are inadequate for construction equipment. This paper explores automating the modeling process by integrating mechanical data into 3D computer-aided design (CAD) models. A template library is developed with hierarchy and joint templates specific for equipment. During model generation, appropriate templates are selected based on the equipment type. Unspecified joint template data is extracted from technical specifications using a large language model (LLM). The 3D CAD model is then converted into a Universal Scene Description (USD) model. Users can adjust the part names and hierarchy within the USD model to align with the hierarchy template, and joint data is automatically integrated, resulting in a simulation-ready model. This method reduces modeling time by over 87% compared to manual methods, while maintaining accuracy. (A sketch of the LLM-based joint-data extraction step follows this list.)
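For the respiration–brain rhythm study above, the following is a minimal stand-in for its evolving partial coherence analysis: ordinary magnitude-squared coherence between a respiration trace and one EEG channel computed on sliding windows and averaged within a frequency band, without partialling out other signals. Function and parameter names are illustrative.

```python
import numpy as np
from scipy.signal import coherence

def sliding_coherence(resp, eeg, fs, band, win_s=10.0, step_s=1.0):
    """Time-resolved respiration-EEG coherence in one frequency band.

    resp, eeg: 1-D signals sampled at fs (Hz); band: (low, high) in Hz.
    Returns window-center times (s) and band-averaged coherence values.
    """
    win, step = int(win_s * fs), int(step_s * fs)
    lo, hi = band
    times, coh = [], []
    for start in range(0, len(resp) - win + 1, step):
        f, cxy = coherence(resp[start:start + win],
                           eeg[start:start + win],
                           fs=fs, nperseg=win // 4)
        mask = (f >= lo) & (f <= hi)
        times.append((start + win / 2) / fs)
        coh.append(cxy[mask].mean())
    return np.array(times), np.array(coh)

# e.g., theta band at 1 kHz sampling:
# t, c = sliding_coherence(resp, eeg, fs=1000.0, band=(4, 8))
```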
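For the implicit prosody study, a single-trial mean-amplitude measure over the reported 240–628 ms window can be sketched as below. The actual analysis is a more conservative, data-driven single-trial approach; the sampling rate, epoch window, and group comparison here are placeholders.

```python
import numpy as np
from scipy.stats import ttest_ind

def window_mean_amplitude(epochs, fs, t_start, win=(0.240, 0.628)):
    """Mean amplitude of each single-trial epoch in a post-onset window.

    epochs: (n_trials, n_samples) baseline-corrected EEG epochs
        time-locked to target-word onset.
    t_start: epoch start time in seconds relative to onset (e.g., -0.2).
    """
    i0 = int((win[0] - t_start) * fs)
    i1 = int((win[1] - t_start) * fs)
    return epochs[:, i0:i1].mean(axis=1)

# Illustrative comparison of metrically incongruent vs. congruent targets:
# amp_con = window_mean_amplitude(congruent_epochs, fs=500, t_start=-0.2)
# amp_inc = window_mean_amplitude(incongruent_epochs, fs=500, t_start=-0.2)
# t, p = ttest_ind(amp_inc, amp_con)
```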
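For the construction-equipment modeling pipeline, the step in which an LLM fills unspecified joint-template fields from technical specifications might look like the sketch below. The joint template, the prompt, and the `llm_complete` callable are hypothetical; the paper's actual templates and the USD conversion step are not shown.

```python
import json

# Hypothetical joint template for one equipment joint; fields left as
# None are to be filled from the spec sheet by the LLM.
JOINT_TEMPLATE = {
    "name": "boom_joint",
    "type": "revolute",
    "lower_limit_deg": None,
    "upper_limit_deg": None,
}

PROMPT = (
    "Extract the rotation limits of the boom joint from this equipment "
    "specification. Answer only with JSON containing the keys "
    "lower_limit_deg and upper_limit_deg:\n\n{spec}"
)

def fill_joint_template(spec_text, llm_complete):
    """Fill unspecified joint-template fields from a technical spec.

    llm_complete: hypothetical callable wrapping whatever LLM is
    available; takes a prompt string and returns the model's reply.
    """
    reply = llm_complete(PROMPT.format(spec=spec_text))
    joint = dict(JOINT_TEMPLATE)
    joint.update(json.loads(reply))  # assumes the model returned valid JSON
    return joint
```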