Title: The Curious Case of Combining Text and Visualization
Visualization research has made significant progress in demonstrating the value of graphical data representation. Even so, the value added by static visualization is disputed in some areas. When presenting Bayesian reasoning information, for example, some studies suggest that combining text and visualizations could have an interactive effect. In this paper, we use eye tracking to compare how people extract information from text and visualization. Using a Bayesian reasoning problem as a test bed, we provide evidence that visualization makes it easier to identify critical information, but that once identified as critical, information is more easily extracted from the text. These tendencies persist even when text and visualization are presented together, indicating that users do not integrate information well across the two representation types. We discuss these findings and argue that effective representations should consider the ease of both information identification and extraction.
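Bayesian reasoning problems of the kind used as a test bed here reduce to one application of Bayes' theorem. The abstract does not give the paper's exact stimulus, so the numbers below are the classic mammography-style values common in this literature, used purely as an illustration:

```python
# Illustrative Bayesian reasoning problem (hypothetical numbers; the
# paper's actual stimulus is not stated in this abstract).
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# 1% base rate, 80% hit rate, 9.6% false-alarm rate
p = posterior(0.01, 0.80, 0.096)
print(round(p, 3))  # 0.078
```

The counterintuitively small posterior (under 8% despite a "positive test") is exactly the quantity participants must identify and extract from the text or visualization.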
Award ID(s):
1755734
PAR ID:
10334141
Journal Name:
EuroVis 2019 - Short Papers
Sponsoring Org:
National Science Foundation
More Like this
  1. Spatial reasoning over text is challenging: models must not only extract the direct spatial information from the text but also reason over it and infer implicit spatial relations. Recent studies highlight the struggles even large language models encounter when performing spatial reasoning over text. In this paper, we explore the potential benefits of disentangling the processes of information extraction and reasoning in models to address this challenge. To explore this, we design various models that disentangle extraction and reasoning (either symbolic or neural) and compare them with state-of-the-art (SOTA) baselines with no explicit design for these parts. Our experimental results consistently demonstrate the efficacy of disentangling, showcasing its ability to enhance models' generalizability within realistic data domains.
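The extraction/reasoning split described above can be sketched very simply: one component pulls explicit facts from text, and a separate symbolic component derives the implicit relations. The relation name and pipeline below are hypothetical toys, not the paper's actual models:

```python
# Minimal sketch of disentangling extraction from reasoning
# (hypothetical relation pattern; the paper's pipeline differs).
import re

def extract(text):
    """Extraction step: pull explicit spatial facts from text."""
    return set(re.findall(r"the (\w+) is left of the (\w+)", text))

def reason(facts):
    """Symbolic reasoning step: close 'left of' under transitivity."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

facts = extract("the lamp is left of the sofa. the sofa is left of the door.")
print(("lamp", "door") in reason(facts))  # True: implicit relation inferred
```

The point of the disentangled design is that the reasoning component (here, a transitive closure) can be tested and swapped independently of the extractor.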
  2. This thesis investigates the computational modeling of belief and related cognitive states as expressed in text and speech. Understanding how speakers or authors convey commitment, certainty, and emotions is crucial for language understanding, yet poses significant challenges for current NLP systems. We present a comprehensive study spanning multiple facets of belief prediction. We begin by re-examining the widely used FactBank corpus, correcting a critical projection error and establishing new state-of-the-art results for author-only belief prediction through multi-task learning and error analysis. We then tackle the more complex task of source-and-target belief prediction, introducing a novel generative framework using Flan-T5. This includes developing a structured database representation for FactBank and proposing a linearized tree generation approach, culminating in the BeLeaf system for visualization and analysis, which achieves state-of-the-art performance on both FactBank and the MDP corpus. With the rise of large language models (LLMs), we investigate their zero-shot capabilities for the source-and-target belief task. We propose Unified and Hybrid prompting frameworks, finding that while current LLMs struggle, particularly with nested beliefs, our Hybrid approach paired with reasoning-focused LLMs achieves new state-of-the-art results on FactBank. Finally, we explore the role of multimodality among multiple cognitive states. We present the first study on multimodal belief prediction using the CB-Prosody corpus, demonstrating that integrating audio features via fine-tuned Whisper models significantly improves performance over text-only BERT models. We further introduce Synthetic Audio Data (SAD), showing that even synthetic audio generated by TTS systems provides orthogonal, beneficial signals for various cognitive state tasks (belief, emotion, sentiment). 
We conclude by presenting OmniVox, the first systematic evaluation of omni-LLMs for zero-shot emotion recognition directly from audio, demonstrating their competitiveness with fine-tuned models and analyzing their acoustic reasoning capabilities. 
  3. Proc. 2023 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (Ed.)
    Representation learning on networks aims to derive a meaningful vector representation for each node, thereby facilitating downstream tasks such as link prediction, node classification, and node clustering. In heterogeneous text-rich networks, this task is more challenging due to (1) presence or absence of text: Some nodes are associated with rich textual information, while others are not; (2) diversity of types: Nodes and edges of multiple types form a heterogeneous network structure. As pretrained language models (PLMs) have demonstrated their effectiveness in obtaining widely generalizable text representations, a substantial amount of effort has been made to incorporate PLMs into representation learning on text-rich networks. However, few of them can jointly consider heterogeneous structure (network) information as well as rich textual semantic information of each node effectively. In this paper, we propose Heterformer, a Heterogeneous Network-Empowered Transformer that performs contextualized text encoding and heterogeneous structure encoding in a unified model. Specifically, we inject heterogeneous structure information into each Transformer layer when encoding node texts. Meanwhile, Heterformer is capable of characterizing node/edge type heterogeneity and encoding nodes with or without texts. We conduct comprehensive experiments on three tasks (i.e., link prediction, node classification, and node clustering) on three large-scale datasets from different domains, where Heterformer outperforms competitive baselines significantly and consistently. 
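The core idea in the abstract above — injecting network-structure information into each text-encoding layer rather than concatenating it afterward — can be gestured at with a toy stack of layers. All names, shapes, and the mixing rule below are illustrative assumptions, not Heterformer's actual architecture:

```python
import numpy as np

# Toy sketch of per-layer structure injection (illustrative only;
# Heterformer's real layers are Transformer blocks, not this).
rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d)) / np.sqrt(d)   # shared layer weight

def layer(h, neighbor_summary):
    """One encoding step: transform text state, then mix in structure."""
    h = np.tanh(h @ W)                      # text transformation
    return h + 0.5 * neighbor_summary       # structure injected every layer

h = rng.normal(size=d)                      # node's text embedding
neighbors = rng.normal(size=(3, d))         # embeddings of 3 neighbors
summary = neighbors.mean(axis=0)            # simple neighbor aggregation

for _ in range(4):                          # stack of 4 layers
    h = layer(h, summary)
print(h.shape)
```

Because the structure summary re-enters at every layer, the text encoding is conditioned on the network throughout, rather than only at a final fusion step — the design choice the abstract contrasts with prior PLM-plus-network pipelines.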
  4. Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities, notably in connecting ideas and adhering to logical rules to solve problems. These models have evolved to accommodate various data modalities, including sound and images, known as multimodal LLMs (MLLMs), which are capable of describing images or sound recordings. Previous work has demonstrated that when the LLM component in MLLMs is frozen, the audio or visual encoder serves to caption the sound or image input, facilitating text-based reasoning with the LLM component. We are interested in using the LLM's reasoning capabilities to facilitate classification. In this paper, we demonstrate through a captioning/classification experiment that an audio MLLM cannot fully leverage its LLM's text-based reasoning when generating audio captions. We also consider whether this may be because MLLMs represent auditory and textual information separately, severing the reasoning pathway from the LLM to the audio encoder.
  5. Knowledge representation and reasoning (KRR) is key to the vision of the intelligent Web. Unfortunately, wide deployment of KRR is hindered by the difficulty of specifying the requisite knowledge, which requires skills that most domain experts lack. A way around this problem could be to acquire knowledge automatically from documents. The difficulty is that KRR requires high-precision knowledge and is sensitive even to small amounts of error. Although most automatic information extraction systems developed for general text understanding have achieved remarkable results, their accuracy is still woefully inadequate for logical reasoning. A promising alternative is to ask the domain experts to author knowledge in Controlled Natural Language (CNL). Nonetheless, the quality of knowledge construction even through CNL is still grossly inadequate, the main obstacle being the multiplicity of ways the same information can be described even in a controlled language. Our previous work addressed the problem of high-accuracy knowledge authoring for KRR from CNL documents by introducing the Knowledge Authoring Logic Machine (KALM). This paper develops the query aspect of KALM with the aim of getting high-precision answers to CNL questions against previously authored knowledge while being tolerant of linguistic variations in the queries. To make queries more expressive and easier to formulate, we propose a hybrid CNL, i.e., a CNL with elements borrowed from formal query languages. We show that KALM achieves superior accuracy in semantic parsing of such queries.