Title: Visualize Music Using Generative Arts
Music is one of the most universal forms of communication and entertainment across cultures. This can largely be credited to synesthesia, the blending of one sense with another. Building on this concept, we explore whether generative AI can create visual representations of music, with the aim of inspiring the listener's imagination and enhancing the experience of enjoying music. Our approach has the following steps: (a) the music is analyzed and classified along multiple dimensions (including instruments, emotion, tempo, pitch range, harmony, and dynamics) to produce textual descriptions; (b) these texts serve as inputs to machine learning models that predict the genre of the input audio; (c) the resulting prompts are fed to generative models that create visual representations. The visuals are continuously updated as the music plays, so that the visual effects mirror the musical changes. A comprehensive user study with 88 participants confirms that our approach generates visual art that reflects the music. From a list of images covering both abstract and realistic styles, users judged our system-generated images to represent the music better than human-chosen images. This suggests that generative art is a promising way to enhance the listening experience, and our method offers a new approach to visualizing and enjoying music through generative art.
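The abstract describes the pipeline only at a high level. Below is a minimal sketch of such a music-to-image loop, assuming librosa for audio analysis; `classify_genre` and `generate_image` are hypothetical placeholders for the paper's genre-prediction and generative models, and all thresholds and prompt wording are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a music-to-visuals pipeline: audio window -> text description
# -> genre -> prompt -> image, regenerated as the music plays.
import librosa
import numpy as np

def describe_music(path: str, offset: float, duration: float = 10.0) -> str:
    """Analyze one window of audio and produce a short textual description."""
    y, sr = librosa.load(path, offset=offset, duration=duration)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    tempo = float(np.atleast_1d(tempo)[0])
    rms = float(np.mean(librosa.feature.rms(y=y)))                     # loudness proxy
    centroid = float(np.mean(librosa.feature.spectral_centroid(y=y, sr=sr)))
    pace = "fast" if tempo > 120 else "slow"                           # illustrative threshold
    dynamics = "loud and energetic" if rms > 0.1 else "soft and calm"  # illustrative threshold
    brightness = "bright" if centroid > 2000 else "dark"               # illustrative threshold
    return f"{pace} tempo ({tempo:.0f} BPM), {dynamics}, {brightness} timbre"

def classify_genre(description: str) -> str:
    # Placeholder: the paper uses a learned genre model; here a trivial heuristic.
    return "electronic" if "fast" in description else "ambient"

def generate_image(prompt: str) -> str:
    # Placeholder: the paper uses a generative image model (not specified here).
    return f"<image for: {prompt}>"

def visualize(path: str, window: float = 10.0):
    """Regenerate the visual for each successive window as the music plays."""
    total = librosa.get_duration(path=path)
    offset = 0.0
    while offset < total:
        desc = describe_music(path, offset, window)
        prompt = f"{classify_genre(desc)} music, {desc}, abstract art"
        yield generate_image(prompt)
        offset += window
```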
Award ID(s): 2326198
PAR ID: 10533890
Publisher / Repository: IEEE
ISBN: 979-8-3503-5409-6
Page Range / eLocation ID: 1516 to 1521
Format(s): Medium: X
Location: Singapore, Singapore
Sponsoring Org: National Science Foundation
More Like this
  1. Drawing is an art that enables people to express their imagination and emotions. However, individuals often face challenges in drawing, especially in translating conceptual ideas into visually coherent representations and bridging the gap between mental visualization and practical execution. In response, we propose ARtVista, a novel system integrating AR and generative AI technologies. ARtVista not only recommends reference images aligned with users' abstract ideas and generates sketches for users to draw, but also crafts vibrant paintings in various styles. ARtVista further offers an alternative way to create striking paintings by simulating the paint-by-number concept on reference images, empowering users to create visually stunning artwork without advanced drawing skills. A pilot study revealed positive feedback on its usability, emphasizing its effectiveness in visualizing user ideas and aiding the painting process.
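ARtVista's paint-by-number mode is described only at a high level. One plausible way to derive numbered paint regions from a reference image is palette quantization, sketched below with Pillow; the color count, function name, and file path are assumptions, not the authors' implementation.

```python
# Sketch: turn a reference image into paint-by-number regions via color
# quantization (one plausible reading of ARtVista's paint-by-number mode).
from PIL import Image
import numpy as np

def paint_by_number(path: str, n_colors: int = 12):
    """Quantize an image to n_colors; each pixel's palette index is its
    paint number, and the palette gives the paint color per number."""
    img = Image.open(path).convert("RGB")
    quantized = img.quantize(colors=n_colors)     # adaptive palette (median cut)
    numbers = np.array(quantized)                 # H x W array of paint numbers
    palette = np.array(quantized.getpalette()[:3 * n_colors]).reshape(-1, 3)
    return numbers, palette                       # region map + color legend

numbers, palette = paint_by_number("reference.jpg")  # hypothetical input file
print(f"{len(palette)} paints; region map shape {numbers.shape}")
```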
  2. Freehand gesture is an essential input modality for modern Augmented Reality (AR) user experiences. However, developing AR applications with customized hand interactions remains challenging for end-users. We therefore propose GesturAR, an end-to-end authoring tool that enables users to create in-situ freehand AR applications through embodied demonstration and visual programming. During authoring, users intuitively demonstrate customized gesture inputs while referring to the spatial and temporal context. Based on a taxonomy of gestures in AR, we propose a hand interaction model that maps gesture inputs to reactions of the AR content. Users can thus author comprehensive freehand applications using trigger-action visual programming and instantly experience the results in AR. Further, we demonstrate multiple application scenarios enabled by GesturAR, such as interactive virtual objects, robots, and avatars; room-level interactive AR spaces; and embodied AR presentations. Finally, we evaluate the performance and usability of GesturAR through a user study.
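GesturAR's hand interaction model maps demonstrated gestures to content reactions through trigger-action rules. A minimal, library-free sketch of such a mapping follows; the gesture labels, target names, and callback scheme are illustrative assumptions rather than GesturAR's actual API.

```python
# Sketch of a trigger-action table in the spirit of GesturAR's hand
# interaction model: (gesture, target) pairs trigger content reactions.
from typing import Callable, Dict, Tuple

class TriggerActionMap:
    def __init__(self):
        self.rules: Dict[Tuple[str, str], Callable[[], None]] = {}

    def author(self, gesture: str, target: str, reaction: Callable[[], None]):
        """Authoring step: bind a demonstrated gesture on a target to a reaction."""
        self.rules[(gesture, target)] = reaction

    def on_gesture(self, gesture: str, target: str):
        """Runtime step: a recognized gesture fires the authored reaction."""
        reaction = self.rules.get((gesture, target))
        if reaction:
            reaction()

# Hypothetical usage: a pinch on a virtual door plays its opening animation.
app = TriggerActionMap()
app.author("pinch", "virtual_door", lambda: print("door opens"))
app.on_gesture("pinch", "virtual_door")
```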
  3. This paper explores the feasibility of using sonification to deliver and communicate health and wellness status on personal devices. Ambient displays have proven effective at informing users of their health and wellness and helping them make healthier decisions, yet little technology conveys health assessments through sound, which can be even more pervasive than visual displays. We developed a method to generate music from user preferences and evaluated it in a two-step user study. In the first step, we acquired general healthiness impressions from each user; in the second, we generated customized melodies from the music preferences gathered in the first step and captured participants' perceived healthiness of those melodies. We deployed our surveys to 55 participants, who completed them on their own over 31 days. We analyzed the data to understand commonalities and differences in users' perceptions of music as an expression of health. Our findings show clear associations between perceived healthiness and specific music features. We provide insights into how different musical features impact the perceived healthiness of music, how perceptions of healthiness vary between users, what trends exist across users' impressions, and what does (or does not) influence a user's perception of healthiness in a melody. Overall, our results indicate that presenting health data through personalized music models is valid, and the findings can inform the design of behavior-management applications on personal and ubiquitous devices.
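The melody-generation step is described only abstractly. Below is a minimal sketch of generating a melody from a couple of preference parameters (tempo and major/minor mode); the parameter set, scales, and random-walk scheme are assumptions for illustration, not the study's actual model.

```python
# Sketch: generate a simple melody (MIDI pitch, duration pairs) from
# user-preference parameters. Parameters and scales are illustrative.
import random

MAJOR = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets of a major scale
MINOR = [0, 2, 3, 5, 7, 8, 10]   # natural minor scale

def generate_melody(tempo_bpm: int, mode: str = "major",
                    root: int = 60, length: int = 16, seed: int = 0):
    """Random walk over a scale; note duration follows the preferred tempo."""
    rng = random.Random(seed)
    scale = MAJOR if mode == "major" else MINOR
    beat = 60.0 / tempo_bpm                     # seconds per quarter note
    degree, melody = 0, []
    for _ in range(length):
        degree = max(0, min(len(scale) - 1, degree + rng.choice([-1, 0, 1])))
        melody.append((root + scale[degree], beat))
    return melody  # list of (MIDI pitch, duration in seconds)

print(generate_melody(tempo_bpm=96, mode="minor")[:4])
```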
  4. Navigating unfamiliar websites is challenging for users with visual impairments. Although many websites offer visual cues that ease access to the pages and features most websites are expected to have (e.g., a log-in link at the top right), such visual shortcuts are not accessible to users with visual impairments. Moreover, although these pages serve the same function across websites (e.g., logging in, signing up), the location, wording, and navigation path of the corresponding links vary from one website to another. Such inconsistencies are challenging for users with visual impairments, especially screen-reader users, who often must listen linearly through page content to figure out how to reach certain website features. To study how to improve access to main website features, we iteratively designed and tested a command-based approach delivered through a browser extension powered by machine learning and human input. The extension gives users keyboard commands for high-level website features (e.g., log in, find stores, contact). We tested it in a lab setting with 15 Internet users, 9 with visual impairments and 6 without. Our study showed that commands for main website features can greatly improve the experience of users with visual impairments. Participants without visual impairments also found command-based access helpful on unfamiliar, cluttered, or infrequently visited websites, suggesting that the approach supports users with visual impairments while also benefiting other groups (i.e., universal design). The study also revealed concerns about handling unsupported commands and about the availability and trustworthiness of human input. We discuss how websites, browsers, and assistive technologies could incorporate a command-based paradigm to enhance web accessibility and provide more consistency on the web, benefiting users with varied abilities when navigating unfamiliar or complex websites.
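The extension combines machine learning and human input; a toy sketch of the underlying command-to-feature dispatch idea follows, matching a keyboard command to labeled page links by text similarity. The command names, link labels, and similarity threshold are assumptions for illustration, not the study's classifier.

```python
# Toy sketch of command-based access: map high-level commands to the
# page link whose text best matches the command's keywords.
from difflib import SequenceMatcher

COMMANDS = {"login": "log in sign in",
            "stores": "find stores locations",
            "contact": "contact us support"}

def resolve(command: str, link_texts: list[str]) -> str | None:
    """Return the link whose text is most similar to the command keywords."""
    keywords = COMMANDS.get(command, command)
    scored = [(SequenceMatcher(None, keywords.lower(), t.lower()).ratio(), t)
              for t in link_texts]
    best_score, best_link = max(scored)
    return best_link if best_score > 0.3 else None  # illustrative threshold

links = ["Sign in", "Store locator", "Contact us", "Careers"]
print(resolve("login", links))    # -> "Sign in"
```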
  5. Line charts are often used to convey high-level information about time-series data. Unfortunately, these charts are not always described in text, and as a result are often inaccessible to users with visual impairments who rely on screen readers. In these situations, an automated system that can describe the overall trend in a chart would be desirable. This paper presents a novel approach to classifying trends in line-chart images, for use in existing chart-summarization tools. Previous projects have introduced approaches to automatically summarize line charts but have thus far been unable to describe chart trends with sufficient accuracy for real-world applications. Instead of classifying an image's trend with a convolutional neural network (CNN), as has been done previously, we present an architecture similar to bag-of-words (BoW) techniques from computer vision, mapping the image-classification problem to an analogous natural-language problem. We divided each image into a matrix of patches and treated the patches as a sequence of "visual words" used to classify the image. We applied natural language processing (NLP) word-embedding techniques to these visual words, allowing us to model contextual similarity between patches. We then trained a linear support vector machine (SVM) on the patch embeddings to classify the chart trend, and compared this method against a ResNet classifier pre-trained on ImageNet. Our experimental results show that the novel approach presented in this paper outperforms existing approaches.
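A compact sketch of the visual-words pipeline described above: patches are extracted from each chart image, clustered into a vocabulary, embedded from the resulting patch sequences, and the averaged embeddings feed a linear SVM. The library choices (scikit-learn's KMeans for the vocabulary, gensim's Word2Vec for embeddings) and all hyperparameters are assumptions; the paper's exact configuration is not given here.

```python
# Sketch of the bag-of-visual-words trend classifier: image patches ->
# KMeans "visual words" -> Word2Vec embeddings -> averaged vector -> SVM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC
from gensim.models import Word2Vec

PATCH = 8  # patch side length in pixels (illustrative)

def to_patches(img: np.ndarray) -> np.ndarray:
    """Cut a grayscale image (H, W) into flattened PATCH x PATCH patches."""
    h, w = (img.shape[0] // PATCH) * PATCH, (img.shape[1] // PATCH) * PATCH
    img = img[:h, :w].reshape(h // PATCH, PATCH, w // PATCH, PATCH)
    return img.transpose(0, 2, 1, 3).reshape(-1, PATCH * PATCH)

def train(images, labels, vocab=64, dim=32):
    """images: list of 2-D grayscale arrays; labels: trend class per image."""
    patches = [to_patches(im) for im in images]
    kmeans = KMeans(n_clusters=vocab, n_init=10).fit(np.vstack(patches))
    # Each image becomes a "sentence" of visual-word ids for Word2Vec.
    sentences = [[str(w) for w in kmeans.predict(p)] for p in patches]
    w2v = Word2Vec(sentences, vector_size=dim, window=5, min_count=1)
    feats = np.array([np.mean([w2v.wv[w] for w in s], axis=0)
                      for s in sentences])
    svm = LinearSVC().fit(feats, labels)   # linear SVM on averaged embeddings
    return kmeans, w2v, svm
```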