Title: Using Machine Learning and Visualization for Qualitative Inductive Analyses of Big Data
Many domains require analyst expertise to determine what patterns and data are interesting in a corpus. However, most analytics tools attempt to prequalify “interestingness” using algorithmic approaches to provide exploratory overviews. This overview-driven workflow precludes the use of qualitative analysis methodologies in large datasets. This paper discusses a preliminary visual analytics approach demonstrating how visual analytics tools can instead enable expert-driven qualitative analyses at scale by supporting computer-in-the-loop mixed-initiative approaches. We argue that visual analytics tools can support rich qualitative inference by using machine learning methods to continually model and refine what features correlate to an analyst’s on-going qualitative observations and by providing transparency into these features in order to aid analysts in navigating large corpora during qualitative analyses. We illustrate these ideas through an example from social media analysis and discuss open opportunities for designing visualizations that support qualitative inference through computer-in-the-loop approaches.
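The abstract's core loop — modeling which features correlate with an analyst's ongoing qualitative observations and surfacing them transparently — can be sketched minimally. The example below is an illustrative stand-in, not the paper's implementation: it scores word features by smoothed log-odds of appearing in analyst-coded documents versus the rest of the corpus, so the top-ranked features could be shown to the analyst to guide further navigation. All function and variable names here are hypothetical.

```python
import math
from collections import Counter

def feature_signals(coded_docs, uncoded_docs, smoothing=1.0):
    """Rank word features by smoothed log-odds of occurring in documents
    the analyst has tagged with a qualitative code vs. the remainder."""
    coded = Counter(w for d in coded_docs for w in set(d.lower().split()))
    other = Counter(w for d in uncoded_docs for w in set(d.lower().split()))
    n_c, n_o = len(coded_docs), len(uncoded_docs)
    scores = {}
    for w in set(coded) | set(other):
        p_c = (coded[w] + smoothing) / (n_c + 2 * smoothing)
        p_o = (other[w] + smoothing) / (n_o + 2 * smoothing)
        scores[w] = math.log(p_c / p_o)
    # Highest-scoring features are most associated with the analyst's code.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Re-running this scorer as the analyst codes more documents is one simple way to "continually model and refine" the correlated features the abstract describes.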
Award ID(s):
1764089
PAR ID:
10176175
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 2019 Workshop on Machine Learning from User Interaction
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: The rise of Large Language Models (LLMs) and generative visual analytics systems has transformed data‐driven insights, yet significant challenges persist in accurately interpreting users' analytical and interaction intents. While language inputs offer flexibility, they often lack precision, making the expression of complex intents inefficient, error‐prone, and time‐intensive. To address these limitations, we investigate the design space of multimodal interactions for generative visual analytics through a literature review and pilot brainstorming sessions. Building on these insights, we introduce a highly extensible workflow that integrates multiple LLM agents for intent inference and visualization generation. We develop InterChat, a generative visual analytics system that combines direct manipulation of visual elements with natural language inputs. This integration enables precise intent communication and supports progressive, visually driven exploratory data analyses. By employing effective prompt engineering and contextual interaction linking, alongside intuitive visualization and interaction designs, InterChat bridges the gap between user interactions and LLM‐driven visualizations, enhancing both interpretability and usability. Extensive evaluations, including two usage scenarios, a user study, and expert feedback, demonstrate the effectiveness of InterChat. Results show significant improvements in the accuracy and efficiency of handling complex visual analytics tasks, highlighting the potential of multimodal interactions to redefine user engagement and analytical depth in generative visual analytics.
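The key design idea above — fusing a direct-manipulation selection with a natural-language utterance into one structured intent before invoking an LLM agent — can be sketched as follows. This is a hypothetical data shape, not InterChat's actual API; the class and function names are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class AnalyticIntent:
    """Hypothetical structured intent combining both input modalities."""
    utterance: str                                       # natural-language request
    selected_marks: list = field(default_factory=list)   # direct-manipulation selection
    chart_context: dict = field(default_factory=dict)    # current chart encoding

def build_prompt(intent: AnalyticIntent) -> str:
    """Linearize the multimodal intent into context for an LLM agent, so
    ambiguous references like 'these points' resolve to the selection."""
    sel = ", ".join(map(str, intent.selected_marks)) or "none"
    return (f"User request: {intent.utterance}\n"
            f"Selected marks: {sel}\n"
            f"Chart encoding: {intent.chart_context}")
```

Grounding the prompt in the selection is what lets imprecise language stay imprecise while the intent stays unambiguous.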
  2. This study proposes and demonstrates how computer‐aided methods can extend qualitative data analysis by first quantifying qualitative data and then supporting exploration, categorization, grouping, and validation. Computer‐aided approaches to inquiry have gained important ground in educational research, mostly through data analytics and large data set processing. We argue that qualitative data analysis methods can also be supported and extended by computer‐aided methods. In particular, we posit that computing capacities rationally applied can expand the innate human ability to recognize patterns and group qualitative information based on similarities. We propose a principled approach to using machine learning in qualitative education research based on the three interrelated elements of the assessment triangle: cognition, observation, and interpretation. Through the lens of the assessment triangle, the study presents three examples of qualitative studies in engineering education that have used computer‐aided methods for visualization and grouping. The first study focuses on characterizing students' written explanations of programming code, using tile plots and hierarchical clustering with binary distances to identify the different approaches that students used to self‐explain. The second study looks into students' modeling and simulation process and elicits the types of knowledge that they used in each step through a think‐aloud protocol. For this purpose, we used a bubble plot and a k‐means clustering algorithm. The third and final study explores engineering faculty's conceptions of teaching, using data from semi‐structured interviews. We grouped these conceptions based on coding similarities, using Jaccard's similarity coefficient, and visualized them using a treemap. We conclude this manuscript by discussing some implications for engineering education qualitative research.
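The third study's grouping step — comparing coded interviews by Jaccard's similarity coefficient — is straightforward to illustrate. The sketch below computes Jaccard similarity between sets of qualitative codes and then applies a simple greedy grouping; the greedy pass is an illustrative stand-in for whatever clustering the authors actually used, and the threshold value is an assumption.

```python
def jaccard(codes_a, codes_b):
    """Jaccard similarity coefficient between two sets of qualitative codes."""
    a, b = set(codes_a), set(codes_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def group_by_similarity(coded_items, threshold=0.5):
    """Greedy single-pass grouping: place each (name, codes) item in the
    first group whose first member shares enough codes, else start a group."""
    groups = []
    for name, codes in coded_items:
        for g in groups:
            if jaccard(codes, g[0][1]) >= threshold:
                g.append((name, codes))
                break
        else:
            groups.append([(name, codes)])
    return groups
```

The resulting groups are what a treemap visualization, as in the study, would then lay out by group size.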
  3. We introduce MooBot, a RAG-based video querying system powered by GPT-4o designed to bridge the gap between what complex cattle video data can provide and what dairy farmers need through a natural language web interface. MooBot applies computer vision inference on barn videos to detect cows, identify individuals, and classify their behaviors, transforming visual data into a structured schema containing useful insights. Our results demonstrate the potential of MooBot for enhancing accessibility to video-derived insights in precision livestock farming, bringing advanced computer vision analytics within reach of farmers without requiring technical expertise. 
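MooBot's pipeline turns vision inference into "a structured schema containing useful insights" that natural-language queries can run against. A minimal sketch of what such a schema and one derived query might look like is below; the record fields and function name are hypothetical, not MooBot's actual schema.

```python
from dataclasses import dataclass

@dataclass
class BehaviorEvent:
    """One vision-derived record: which cow did what, and when."""
    cow_id: str      # individual identification from the detector
    behavior: str    # classified behavior, e.g. "eating", "lying"
    start_s: float   # seconds into the barn video
    end_s: float

def total_time(events, cow_id, behavior):
    """Answer a farmer-style question like 'how long did cow 12 spend
    eating?' from the structured records, without touching raw video."""
    return sum(e.end_s - e.start_s
               for e in events
               if e.cow_id == cow_id and e.behavior == behavior)
```

In a RAG setup, records like these (or summaries of them) would be retrieved and handed to the LLM as context for answering the farmer's question.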
  4. Abstract: Background: Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at a single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Result: Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in a Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of a human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that characterize these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions: Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.
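Clustering reads "based on electric-current similarities" implies some distance over raw current traces. One plausible minimal form — not necessarily the measure Sequoia uses — is to z-normalize each trace so absolute pore-current offset is ignored, then compare normalized shapes:

```python
import math
from statistics import mean, pstdev

def znorm(signal):
    """Z-normalize a raw current trace so comparisons reflect signal
    shape rather than absolute pore-current level."""
    m, s = mean(signal), pstdev(signal)
    return [(x - m) / s if s else 0.0 for x in signal]

def current_distance(sig_a, sig_b):
    """Euclidean distance between two equal-length normalized traces;
    an illustrative stand-in for the similarity used to cluster reads."""
    a, b = znorm(sig_a), znorm(sig_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

A pairwise distance matrix built this way is the kind of input a clustering or dimensionality-reduction view (as described in the abstract) would consume.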
  5. Visualization tools facilitate exploratory data analysis, but fall short at supporting hypothesis-based reasoning. We conducted an exploratory study to investigate how visualizations might support a concept-driven analysis style, where users can optionally share their hypotheses and conceptual models in natural language, and receive customized plots depicting the fit of their models to the data. We report on how participants leveraged these unique affordances for visual analysis. We found that a majority of participants articulated meaningful models and predictions, utilizing them as entry points to sensemaking. We contribute an abstract typology representing the types of models participants held and externalized as data expectations. Our findings suggest ways for rearchitecting visual analytics tools to better support hypothesis- and model-based reasoning, in addition to their traditional role in exploratory analysis. We discuss the design implications and reflect on the potential benefits and challenges involved. 
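The study above returns plots "depicting the fit of their models to the data" when a user states a hypothesis such as "y rises with x." A minimal sketch of the underlying computation, assuming a linear expectation, is an ordinary least-squares slope and an R² fit score; this is an illustration of the general idea, not the study's actual system.

```python
def model_fit(xs, ys):
    """Least-squares line and R^2: how well the data match a user's
    stated linear expectation (e.g. 'sales rise with temperature')."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot if ss_tot else 1.0
    return slope, r2
```

A positive slope with high R² would confirm the user's stated expectation; a customized plot could then overlay this fitted line on the data as the study describes.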