A fundamental aspect of human cognition is the ability to parse our constantly unfolding experience into meaningful representations of dynamic events and to communicate about these events with others. How do we communicate about events we have experienced? Influential theories of language production assume that the formulation and articulation of a linguistic message is preceded by preverbal apprehension that captures core aspects of the event. Yet the nature of these preverbal event representations and the way they are mapped onto language are currently not well understood. Here, we review recent evidence on the link between event conceptualization and language, focusing on two core aspects of event representation: event roles and event boundaries. Empirical evidence in both domains shows that the cognitive representation of events aligns with the way these aspects of events are encoded in language, providing support for the presence of deep homologies between linguistic and cognitive event structure.
more »
« less
Aspectual Processing Shifts Visual Event Apprehension
What is the relationship between language and event cognition? Past work has suggested that linguistic/aspectual distinctions encoding the internal temporal profile of events map onto nonlinguistic event representations. Here, we use a novel visual detection task to directly test the hypothesis that processing telic versus atelic sentences (e.g., “Ebony folded a napkin in 10 seconds” vs. “Ebony did some folding for 10 seconds”) can influence whether the very same visual event is processed as containing distinct temporal stages including a well‐defined endpoint or lacking such structure, respectively. In two experiments, we show that processing (a)telicity in language shifts how people later construe the temporal structure of identical visual stimuli. We conclude that event construals are malleable representations that can align with the linguistic framing of events.
more »
« less
- Award ID(s):
- 2041171
- PAR ID:
- 10568240
- Publisher / Repository:
- Wiley
- Date Published:
- Journal Name:
- Cognitive Science
- Volume:
- 48
- Issue:
- 6
- ISSN:
- 0364-0213
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Perlman, Marcus (Ed.)Longstanding cross-linguistic work on event representations in spoken languages have argued for a robust mapping between an event’s underlying representation and its syntactic encoding, such that–for example–the agent of an event is most frequently mapped to subject position. In the same vein, sign languages have long been claimed to construct signs that visually represent their meaning, i.e., signs that are iconic. Experimental research on linguistic parameters such as plurality and aspect has recently shown some of them to be visually universal in sign, i.e. recognized by non-signers as well as signers, and have identified specific visual cues that achieve this mapping. However, little is known about what makes action representations in sign language iconic, or whether and how the mapping of underlying event representations to syntactic encoding is visually apparent in the form of a verb sign. To this end, we asked what visual cues non-signers may use in evaluating transitivity (i.e., the number of entities involved in an action). To do this, we correlated non-signer judgments about transitivity of verb signs from American Sign Language (ASL) with phonological characteristics of these signs. We found that non-signers did not accurately guess the transitivity of the signs, but that non-signer transitivity judgments can nevertheless be predicted from the signs’ visual characteristics. Further, non-signers cue in on just those features that code event representations across sign languages, despite interpreting them differently. This suggests the existence of visual biases that underlie detection of linguistic categories, such as transitivity, which may uncouple from underlying conceptual representations over time in mature sign languages due to lexicalization processes.more » « less
-
Video Paragraph Captioning aims to generate a multi-sentence description of an untrimmed video with multiple temporal event locations in a coherent storytelling. Following the human perception process, where the scene is effectively understood by decomposing it into visual (e.g. human, animal) and non-visual components (e.g. action, relations) under the mutual influence of vision and language, we first propose a visual-linguistic (VL) feature. In the proposed VL feature, the scene is modeled by three modalities including (i) a global visual environment; (ii) local visual main agents; (iii) linguistic scene elements. We then introduce an autoregressive Transformer-in-Transformer (TinT) to simultaneously capture the semantic coherence of intra- and inter-event contents within a video. Finally, we present a new VL contrastive loss function to guarantee the learnt embedding features are consistent with the captions semantics. Comprehensive experiments and extensive ablation studies on the ActivityNet Captions and YouCookII datasets show that the proposed Visual-Linguistic Transformer-in-Transform (VLTinT) outperforms previous state-of-the-art methods in terms of accuracy and diversity. The source code is made publicly available at: https://github.com/UARK-AICV/VLTinT.more » « less
-
Abstract Statistical learning (SL) is the ability to detect and learn regularities from input and is foundational to language acquisition. Despite the dominant role of SL as a theoretical construct for language development, there is a lack of direct evidence supporting the shared neural substrates underlying language processing and SL. It is also not clear whether the similarities, if any, are related to linguistic processing, or statistical regularities in general. The current study tests whether the brain regions involved in natural language processing are similarly recruited during auditory, linguistic SL. Twenty-two adults performed an auditory linguistic SL task, an auditory nonlinguistic SL task, and a passive story listening task as their neural activation was monitored. Within the language network, the left posterior temporal gyrus showed sensitivity to embedded speech regularities during auditory, linguistic SL, but not auditory, nonlinguistic SL. Using a multivoxel pattern similarity analysis, we uncovered similarities between the neural representation of auditory, linguistic SL, and language processing within the left posterior temporal gyrus. No other brain regions showed similarities between linguistic SL and language comprehension, suggesting that a shared neurocomputational process for auditory SL and natural language processing within the left posterior temporal gyrus is specific to linguistic stimuli.more » « less
-
When listening to speech, our brain responses time lock to acoustic events in the stimulus. Recent studies have also reported that cortical responses track linguistic representations of speech. However, tracking of these representations is often described without controlling for acoustic properties. Therefore, the response to these linguistic representations might reflect unaccounted acoustic processing rather than language processing. Here, we evaluated the potential of several recently proposed linguistic representations as neural markers of speech comprehension. To do so, we investigated EEG responses to audiobook speech of 29 participants (22 females). We examined whether these representations contribute unique information over and beyond acoustic neural tracking and each other. Indeed, not all of these linguistic representations were significantly tracked after controlling for acoustic properties. However, phoneme surprisal, cohort entropy, word surprisal, and word frequency were all significantly tracked over and beyond acoustic properties. We also tested the generality of the associated responses by training on one story and testing on another. In general, the linguistic representations are tracked similarly across different stories spoken by different readers. These results suggests that these representations characterize the processing of the linguistic content of speech. SIGNIFICANCE STATEMENT For clinical applications, it would be desirable to develop a neural marker of speech comprehension derived from neural responses to continuous speech. Such a measure would allow for behavior-free evaluation of speech understanding; this would open doors toward better quantification of speech understanding in populations from whom obtaining behavioral measures may be difficult, such as young children or people with cognitive impairments, to allow better targeted interventions and better fitting of hearing devices.more » « less
An official website of the United States government

