Title: Using Machine Learning and Visualization for Qualitative Inductive Analyses of Big Data
Many domains require analyst expertise to determine what patterns and data are interesting in a corpus. However, most analytics tools attempt to prequalify “interestingness” using algorithmic approaches to provide exploratory overviews. This overview-driven workflow precludes the use of qualitative analysis methodologies in large datasets. This paper discusses a preliminary visual analytics approach demonstrating how visual analytics tools can instead enable expert-driven qualitative analyses at scale by supporting computer-in-the-loop mixed-initiative approaches. We argue that visual analytics tools can support rich qualitative inference by using machine learning methods to continually model and refine what features correlate to an analyst’s on-going qualitative observations and by providing transparency into these features in order to aid analysts in navigating large corpora during qualitative analyses. We illustrate these ideas through an example from social media analysis and discuss open opportunities for designing visualizations that support qualitative inference through computer-in-the-loop approaches.
Award ID(s):
1764089
PAR ID:
10176175
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the 2019 Workshop on Machine Learning from User Interaction
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The rise of Large Language Models (LLMs) and generative visual analytics systems has transformed data‐driven insights, yet significant challenges persist in accurately interpreting users' analytical and interaction intents. While language inputs offer flexibility, they often lack precision, making the expression of complex intents inefficient, error‐prone, and time‐intensive. To address these limitations, we investigate the design space of multimodal interactions for generative visual analytics through a literature review and pilot brainstorming sessions. Building on these insights, we introduce a highly extensible workflow that integrates multiple LLM agents for intent inference and visualization generation. We develop InterChat, a generative visual analytics system that combines direct manipulation of visual elements with natural language inputs. This integration enables precise intent communication and supports progressive, visually driven exploratory data analyses. By employing effective prompt engineering and contextual interaction linking, alongside intuitive visualization and interaction designs, InterChat bridges the gap between user interactions and LLM‐driven visualizations, enhancing both interpretability and usability. Extensive evaluations, including two usage scenarios, a user study, and expert feedback, demonstrate the effectiveness of InterChat. Results show significant improvements in the accuracy and efficiency of handling complex visual analytics tasks, highlighting the potential of multimodal interactions to redefine user engagement and analytical depth in generative visual analytics.
  2. This study proposes and demonstrates how computer‐aided methods can be used to extend qualitative data analysis by quantifying qualitative data and then applying exploration, categorization, grouping, and validation. Computer‐aided approaches to inquiry have gained important ground in educational research, mostly through data analytics and large data set processing. We argue that qualitative data analysis methods can also be supported and extended by computer‐aided methods. In particular, we posit that computing capacities rationally applied can expand the innate human ability to recognize patterns and group qualitative information based on similarities. We propose a principled approach to using machine learning in qualitative education research based on the three interrelated elements of the assessment triangle: cognition, observation, and interpretation. Through the lens of the assessment triangle, the study presents three examples of qualitative studies in engineering education that have used computer‐aided methods for visualization and grouping. The first study focuses on characterizing students' written explanations of programming code, using tile plots and hierarchical clustering with binary distances to identify the different approaches that students used to self‐explain. The second study looks into students' modeling and simulation process and elicits the types of knowledge that they used in each step through a think‐aloud protocol. For this purpose, we used a bubble plot and a k‐means clustering algorithm. The third and final study explores engineering faculty's conceptions of teaching, using data from semi‐structured interviews. We grouped these conceptions based on coding similarities, using Jaccard's similarity coefficient, and visualized them using a treemap. We conclude this manuscript by discussing some implications for engineering education qualitative research.
  3. We introduce MooBot, a RAG-based video querying system powered by GPT-4o designed to bridge the gap between what complex cattle video data can provide and what dairy farmers need through a natural language web interface. MooBot applies computer vision inference on barn videos to detect cows, identify individuals, and classify their behaviors, transforming visual data into a structured schema containing useful insights. Our results demonstrate the potential of MooBot for enhancing accessibility to video-derived insights in precision livestock farming, bringing advanced computer vision analytics within reach of farmers without requiring technical expertise. 
  4. Visualization tools facilitate exploratory data analysis, but fall short at supporting hypothesis-based reasoning. We conducted an exploratory study to investigate how visualizations might support a concept-driven analysis style, where users can optionally share their hypotheses and conceptual models in natural language, and receive customized plots depicting the fit of their models to the data. We report on how participants leveraged these unique affordances for visual analysis. We found that a majority of participants articulated meaningful models and predictions, utilizing them as entry points to sensemaking. We contribute an abstract typology representing the types of models participants held and externalized as data expectations. Our findings suggest ways for rearchitecting visual analytics tools to better support hypothesis- and model-based reasoning, in addition to their traditional role in exploratory analysis. We discuss the design implications and reflect on the potential benefits and challenges involved. 
  5. Over the past decade, there has been a significant increase in the development of visual analytics systems dedicated to addressing urban issues. These systems distill intricate urban analysis workflows into intuitive, interactive visual representations and interfaces, enabling users to explore, understand, and derive insights from large and complex data, including street-level imagery, street networks, and building geometries. Developing urban visual analytics systems, however, is a challenging endeavor that requires considerable programming expertise and interaction between various multidisciplinary stakeholders. This situation often leads to monolithic and isolated prototypes that are hard to reproduce, combine, or extend. Concurrently, there has been an increase in the availability of general and urban-specific toolkits, frameworks, and authoring tools that are open source and abstract away the need to implement low-level visual analytics functionalities. This paper provides a hierarchical taxonomy of urban visual analytics systems to contextualize how they are usually designed, implemented, and evaluated. We develop this taxonomy across three distinct levels (i.e., dimensions, categories, and tags), juxtaposing visualization with analytics, data, and system dimensions. We then assess the extent to which current open-source toolkits, frameworks, and authoring tools can effectively support the development of components tailored to urban visual analytics, identifying their strengths and limitations in addressing the unique challenges posed by urban data. In doing so, we offer a roadmap that can guide the effective employment of existing resources and chart a pathway for developing and refining future systems.
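
Item 2 in the list above mentions grouping interview-derived conceptions by coding similarity using Jaccard's similarity coefficient. As a rough illustration of that coefficient only, and not of the cited study's actual pipeline, the sketch below computes pairwise Jaccard similarity between sets of qualitative codes. The participant labels, the code names, and the `jaccard` helper are all hypothetical and chosen purely for illustration; the convention of returning 1.0 for two empty sets is likewise an assumption.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|.

    Assumption: two empty sets are treated as identical (similarity 1.0).
    """
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical qualitative codes assigned to three interview transcripts.
codes = {
    "faculty_a": {"lecturing", "assessment", "feedback"},
    "faculty_b": {"lecturing", "feedback", "mentoring"},
    "faculty_c": {"lab_work", "mentoring"},
}

# Pairwise similarities, keyed by alphabetically ordered participant pairs.
pairs = {
    (p, q): jaccard(codes[p], codes[q])
    for p in codes
    for q in codes
    if p < q
}

for (p, q), score in sorted(pairs.items()):
    print(f"{p} vs {q}: {score:.2f}")
```

A matrix of such scores could then be passed to a clustering or treemap layout step; which grouping method the cited study used beyond the coefficient itself is not stated in the abstract.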