Title: Exploring the Capability of LLMs in Performing Low-Level Visual Analytic Tasks on SVG Data Visualizations
Data visualizations help extract insights from datasets, but reaching those insights requires decomposing high-level goals into low-level analytic tasks, a process that can be challenging given users' varying degrees of data literacy and visualization experience. Recent advances in large language models (LLMs) have shown promise in lowering barriers for users to accomplish tasks such as writing code, and may likewise facilitate visualization insight. Scalable Vector Graphics (SVG), a text-based image format common in data visualizations, aligns well with the text-sequence processing of transformer-based LLMs. In this paper, we explore the capability of LLMs to perform 10 low-level visual analytic tasks, as defined by Amar, Eagan, and Stasko, directly on SVG-based visualizations. Using zero-shot prompts, we instruct the models to provide responses or to modify the SVG code based on given visualizations. Our findings demonstrate that LLMs can effectively modify existing SVG visualizations for some tasks, such as Cluster, but perform poorly on tasks requiring mathematical operations, such as Compute Derived Value. We also find that LLM performance varies with factors such as the number of data points, the presence of value labels, and the chart type. Our findings contribute to gauging the general capabilities of LLMs and highlight the need for further exploration and development to fully harness their potential in supporting visual analytic tasks.
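To make the setup concrete, here is a minimal sketch of the zero-shot approach the abstract describes, applied directly to SVG source for one low-level task (Find Extremum). The SVG snippet, prompt wording, and commented-out model call are illustrative assumptions, not the paper's actual materials.

```python
# Sketch of a zero-shot prompt for a low-level task ("Find Extremum")
# performed directly on SVG text. The SVG and the model call are
# illustrative stand-ins, not the paper's actual prompt or harness.

BAR_CHART_SVG = """<svg width="200" height="120">
  <rect x="10"  y="60" width="30" height="50"/><text x="25"  y="115">A</text>
  <rect x="60"  y="20" width="30" height="90"/><text x="75"  y="115">B</text>
  <rect x="110" y="80" width="30" height="30"/><text x="125" y="115">C</text>
</svg>"""

def build_prompt(svg: str, task: str) -> str:
    """Compose a zero-shot prompt: task instruction plus raw SVG source."""
    return (
        f"The following SVG code renders a bar chart.\n{svg}\n"
        f"Task: {task}\n"
        "Answer using only the information encoded in the SVG."
    )

prompt = build_prompt(BAR_CHART_SVG, "Which bar represents the maximum value?")
# response = llm_client.complete(prompt)  # send to any chat-style LLM API
```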
Award ID(s):
2311574
PAR ID:
10533379
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Data visualizations typically show a representation of a data set with little to no focus on the repeatability or generalizability of the displayed trends and patterns. However, insights gleaned from these visualizations are often used as the basis for decisions about future events. Visualizations of retrospective data therefore often serve as “visual predictive models.” However, this visual predictive model approach can lead to invalid inferences. In this article, we describe an approach to visual model validation called Inline Replication. Inline Replication is closely related to the statistical techniques of bootstrap sampling and cross-validation and, like those methods, provides a non-parametric and broadly applicable technique for assessing the variance of findings from visualizations. This article describes the overall Inline Replication process and outlines how it can be integrated into both traditional and emerging “big data” visualization pipelines. It also provides examples of how Inline Replication can be integrated into common visualization techniques such as bar charts and linear regression lines. Results from an empirical evaluation of the technique and two prototype Inline Replication–based visual analysis systems are also described. The empirical evaluation demonstrates the impact of Inline Replication under different conditions, showing that both (1) the level of partitioning and (2) the approach to aggregation have a major influence on its behavior. The results highlight the trade-offs in choosing Inline Replication parameters but suggest that using [Formula: see text] partitions is a reasonable default.
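As a rough illustration of the partition-and-aggregate idea behind Inline Replication, here is a minimal sketch assuming a simple mean statistic; the partition count, seed, and function names are arbitrary choices for the example, not the paper's recommended parameters.

```python
import random
from statistics import mean, stdev

def inline_replication(data, statistic=mean, n_partitions=5, seed=0):
    """Partition the data, compute the statistic per partition, and
    aggregate, exposing the variance a single-view chart would hide."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    # Round-robin assignment of shuffled items to partitions.
    parts = [shuffled[i::n_partitions] for i in range(n_partitions)]
    estimates = [statistic(p) for p in parts]
    return {"estimates": estimates,
            "aggregate": mean(estimates),
            "spread": stdev(estimates)}

print(inline_replication([4, 8, 15, 16, 23, 42, 7, 11, 3, 19]))
```

In a chart, the per-partition estimates would be drawn alongside (or instead of) the single aggregate mark, so readers see the spread directly.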
  2. The rise of Large Language Models (LLMs) and generative visual analytics systems has transformed data-driven insights, yet significant challenges persist in accurately interpreting users' analytical and interaction intents. While language inputs offer flexibility, they often lack precision, making the expression of complex intents inefficient, error-prone, and time-intensive. To address these limitations, we investigate the design space of multimodal interactions for generative visual analytics through a literature review and pilot brainstorming sessions. Building on these insights, we introduce a highly extensible workflow that integrates multiple LLM agents for intent inference and visualization generation. We develop InterChat, a generative visual analytics system that combines direct manipulation of visual elements with natural language inputs. This integration enables precise intent communication and supports progressive, visually driven exploratory data analyses. By employing effective prompt engineering and contextual interaction linking, alongside intuitive visualization and interaction designs, InterChat bridges the gap between user interactions and LLM-driven visualizations, enhancing both interpretability and usability. Extensive evaluations, including two usage scenarios, a user study, and expert feedback, demonstrate the effectiveness of InterChat. Results show significant improvements in the accuracy and efficiency of handling complex visual analytics tasks, highlighting the potential of multimodal interactions to redefine user engagement and analytical depth in generative visual analytics.
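A hedged sketch of the multimodal fusion step this abstract describes: a direct-manipulation selection is serialized alongside a natural-language request before intent inference. The field names and prompt wording are assumptions for illustration, not InterChat's actual interface.

```python
# Illustrative fusion of a visual selection with a natural-language command
# into one LLM prompt, in the spirit of multimodal intent inference.

def fuse_intent(selected_marks: list[dict], utterance: str) -> str:
    """Serialize the selection so the NL request gains precise referents."""
    context = "\n".join(
        f"- mark id={m['id']} category={m['category']} value={m['value']}"
        for m in selected_marks
    )
    return (
        "The user selected these marks by direct manipulation:\n"
        f"{context}\n"
        f'User request: "{utterance}"\n'
        "Infer the analytic intent and propose the next visualization."
    )

selection = [{"id": 3, "category": "B", "value": 90},
             {"id": 7, "category": "C", "value": 30}]
print(fuse_intent(selection, "compare these two over time"))
```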
  3. The increasing integration of Visual Language Models (VLMs) into visualization systems demands a comprehensive understanding of their visual interpretation capabilities and constraints. While existing research has examined individual models, systematic comparisons of VLMs' visualization literacy remain unexplored. We bridge this gap through a rigorous, first-of-its-kind evaluation of four leading VLMs (GPT-4, Claude, Gemini, and Llama) using standardized assessments: the Visualization Literacy Assessment Test (VLAT) and the Critical Thinking Assessment for Literacy in Visualizations (CALVI). Our methodology uniquely combines randomized trials with structured prompting techniques to control for order effects and response variability, a critical consideration overlooked in many VLM evaluations. Our analysis reveals that while specific models demonstrate competence in basic chart interpretation (Claude achieving 67.9% accuracy on VLAT), all models exhibit substantial difficulties in identifying misleading visualization elements (maximum 30.0% accuracy on CALVI). We uncover distinct performance patterns: strong capabilities in interpreting conventional charts such as line charts (76-96% accuracy) and detecting hierarchical structures (80-100% accuracy), but consistent difficulties with data-dense visualizations involving multiple encodings (bubble charts: 18.6-61.4%) and anomaly detection (25-30% accuracy). Significantly, we observe distinct uncertainty-management behavior across models, with Gemini displaying heightened caution (22.5% question omission) compared to others (7-8%). These findings provide crucial insights for the visualization community by establishing reliable VLM evaluation benchmarks, identifying areas where current models fall short, and highlighting the need for targeted improvements in VLM architectures for visualization tasks. To promote reproducibility, encourage further research, and facilitate benchmarking of future VLMs, our complete evaluation framework, including code, prompts, and analysis scripts, is available at https://github.com/washuvis/VisLit-VLM-Eval.
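The order-randomization control this abstract highlights can be illustrated with a small scoring loop that also tracks omissions separately from errors; ask_model() and the item schema are stand-ins for illustration, not the released evaluation framework.

```python
import random

def run_trial(items, ask_model, seed):
    """Score one trial with a freshly randomized item order, counting
    declined answers (omissions) separately from wrong answers."""
    order = items[:]
    random.Random(seed).shuffle(order)  # new order per trial controls order effects
    correct = omitted = 0
    for item in order:
        answer = ask_model(item["question"])
        if answer is None:            # model declined to answer
            omitted += 1
        elif answer == item["gold"]:
            correct += 1
    n = len(items)
    return {"accuracy": correct / n, "omission_rate": omitted / n}
```

Averaging these metrics over many seeds separates genuine capability differences from response variability.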
  4. Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap between Code LLMs' reliance on static text data and the need for semantic understanding in complex tasks like debugging and program repair. We introduce a novel strategy, monologue reasoning, to train Code LLMs to reason about comprehensive semantics, encompassing high-level functional descriptions, local execution effects of individual statements, and overall input/output behavior, thereby linking static code text with dynamic execution states. We begin by collecting PyX, a clean Python corpus of fully executable code samples with functional descriptions and test cases. We propose training Code LLMs not only to write code but also to understand code semantics by reasoning about key properties, constraints, and execution behaviors using natural language, mimicking human verbal debugging, i.e., rubber-duck debugging. This approach led to the development of SemCoder, a Code LLM with only 6.7B parameters, which shows competitive performance with GPT-3.5-turbo on code generation and execution reasoning tasks. SemCoder achieves 79.3% on HumanEval (GPT-3.5-turbo: 76.8%), 63.6% on CRUXEval-I (GPT-3.5-turbo: 50.3%), and 63.9% on CRUXEval-O (GPT-3.5-turbo: 59.0%). We also study the effectiveness of SemCoder's monologue-style execution reasoning compared to concrete scratchpad reasoning, showing that our approach integrates semantics from multiple dimensions more smoothly. Finally, we demonstrate the potential of applying learned semantics to improve Code LLMs' debugging and self-refining capabilities. Our data, code, and models are available at: https://github.com/ARiSE-Lab/SemCoder.
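A minimal sketch of what a monologue-style training example might look like, pairing code with a natural-language walk-through of its execution; the field names are assumptions for illustration, not SemCoder's actual data schema.

```python
# Illustrative shape of a monologue-reasoning training example: static code
# paired with a verbal execution trace that links text to dynamic state,
# in the rubber-duck-debugging style the paper describes.

example = {
    "code": "def double_evens(xs):\n"
            "    return [x * 2 for x in xs if x % 2 == 0]",
    "test": "double_evens([1, 2, 3, 4]) == [4, 8]",
    "monologue": (
        "The comprehension walks xs left to right. 1 is odd, so it is "
        "skipped. 2 is even and doubled to 4. 3 is skipped. 4 is doubled "
        "to 8. The function returns [4, 8], so the test passes."
    ),
}
```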
  5. Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized natural language understanding and generation. They possess deep language comprehension, human-like text generation capabilities, contextual awareness, and robust problem-solving skills, making them invaluable in various domains (e.g., search engines, customer support, translation). In the meantime, LLMs have also gained traction in the security community, revealing security vulnerabilities and showcasing their potential in security-related tasks. This paper explores the intersection of LLMs with security and privacy. Specifically, we investigate how LLMs positively impact security and privacy, the potential risks and threats associated with their use, and the inherent vulnerabilities within LLMs. Through a comprehensive literature review, the paper categorizes the surveyed work into “The Good” (beneficial LLM applications), “The Bad” (offensive applications), and “The Ugly” (vulnerabilities of LLMs and their defenses). We report several interesting findings. For example, LLMs have been shown to enhance code security (code vulnerability detection) and data privacy (data confidentiality protection), outperforming traditional methods. However, they can also be harnessed for various attacks (particularly user-level attacks) because of their human-like reasoning abilities. We have also identified areas that require further research. For example, research on model and parameter extraction attacks is limited and often theoretical, hindered by LLM parameter scale and confidentiality, and safe instruction tuning, a recent development, requires more exploration. We hope that our work can shed light on LLMs' potential both to bolster and to jeopardize cybersecurity.