
Distributional Semantics of Line Charts for Trend Classification

https://doi.org/10.1007/978-3-031-20716-7_20

Onweller, Connor; O’Brien, Andrew; Kim, Edward; McCoy, Kathleen F. (October 2022, International Symposium on Visual Computing)

Line charts are often used to convey high-level information about time series data. Unfortunately, these charts are not always described in text, and as a result are often inaccessible to users with visual impairments who rely on screen readers. In these situations, an automated system that can describe the overall trend in a chart would be desirable. This paper presents a novel approach to classifying trends in line chart images, for use in existing chart summarization tools. Previous projects have introduced approaches to automatically summarize line charts, but have thus far been unable to describe chart trends with sufficient accuracy for real-world applications. Instead of classifying an image’s trend via a convolutional neural network (CNN) system, as has been done previously, we present an architecture similar to bag-of-words (BoW) techniques for computer vision, mapping the image classification problem to an analogous natural language problem. We divided each image into a matrix of image patches and treated the patches as a series of “visual words,” which were then used to classify the image. We used natural language processing (NLP) word embedding techniques to create embeddings of visual words that allowed us to model contextual similarity between patches. We trained a linear support vector machine (SVM) model using these patch embeddings as inputs to classify the chart trend. We compared this method against a ResNet classifier pre-trained on ImageNet. Our experimental results showed that the novel approach presented in this paper outperforms existing approaches.
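The "visual words" idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how patches are quantized into words or which embedding method is used, so everything below is an assumption made for illustration: patches are matched exactly to form a vocabulary, and contextual similarity is approximated by counting which words occur next to each other in the patch grid (the raw statistic that word embedding methods are usually trained on). All function names are hypothetical, and the linear SVM stage is left out.

```python
from collections import Counter, defaultdict

def split_into_patches(image, patch_size):
    """Split a 2D list of pixel values into a grid of square patches."""
    rows, cols = len(image), len(image[0])
    grid = []
    for r in range(0, rows - patch_size + 1, patch_size):
        grid_row = []
        for c in range(0, cols - patch_size + 1, patch_size):
            patch = tuple(
                tuple(image[r + dr][c + dc] for dc in range(patch_size))
                for dr in range(patch_size)
            )
            grid_row.append(patch)
        grid.append(grid_row)
    return grid

def build_vocabulary(patch_grids):
    """Give each distinct patch an integer ID.
    Assumption: exact-match quantization, chosen only to keep the sketch short."""
    vocab = {}
    for grid in patch_grids:
        for grid_row in grid:
            for patch in grid_row:
                vocab.setdefault(patch, len(vocab))
    return vocab

def to_word_grid(grid, vocab):
    """Replace every patch in the grid with its visual-word ID."""
    return [[vocab[patch] for patch in grid_row] for grid_row in grid]

def bag_of_words(word_grid, vocab_size):
    """Histogram of visual-word counts for one image."""
    hist = [0] * vocab_size
    for row in word_grid:
        for word in row:
            hist[word] += 1
    return hist

def cooccurrence_counts(word_grid):
    """Count how often each word has each other word as a 4-neighbour."""
    counts = defaultdict(Counter)
    n_rows, n_cols = len(word_grid), len(word_grid[0])
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    counts[word_grid[r][c]][word_grid[rr][cc]] += 1
    return counts

# Toy 4x4 binary "chart" containing a rising diagonal line.
image = [
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]
grid = split_into_patches(image, patch_size=2)
vocab = build_vocabulary([grid])
word_grid = to_word_grid(grid, vocab)
histogram = bag_of_words(word_grid, len(vocab))
context = cooccurrence_counts(word_grid)
```

In a fuller pipeline, the histogram or learned patch embeddings would serve as the feature vector passed to a linear classifier; a real system would also need a tolerant quantizer (for example, clustering patch features) rather than exact matching.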


We present a multimodal deep learning framework that can generate summary text conveying the main idea of an information graphic for presentation to a person who is blind or visually impaired. The framework uses the visual, textual, positional, and size characteristics extracted from the image to create the summary. Different and complementary neural architectures are optimized for each task using crowdsourced training data. From our quantitative experiments and results, we explain the reasoning behind our framework and show the effectiveness of our models. Our qualitative results showcase text generated by our framework and show that Mechanical Turk participants prefer it to other automatic and human-generated summaries. We describe the design and results of an experiment to evaluate the utility of our system for people who have visual impairments in the context of understanding tweets containing line graphs.
