We explore computational strategies for matching human vocal imitations of birdsong to actual birdsong recordings. We recorded human vocal imitations of birdsong and subsequently analysed these data using three categories of audio features for matching imitations to original birdsong: spectral, temporal, and spectrotemporal. These exploratory analyses suggest that spectral features can help distinguish imitation strategies (e.g. whistling vs. singing) but are insufficient for distinguishing species. Similarly, although temporal features are correlated between human imitations and natural birdsong, they too are insufficient for distinguishing species. Spectrotemporal features showed the greatest promise, in particular when used to extract a representation of the pitch contour of birdsong and human imitations. This finding suggests a link between the task of matching human imitations to birdsong and retrieval tasks in the music domain such as query-by-humming and cover song retrieval; we borrow from such existing methodologies to outline directions for future research.
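As a rough illustration of the spectrotemporal approach described above, the sketch below (not taken from the study) extracts pitch contours with pYIN and aligns an imitation against a birdsong recording using dynamic time warping, in the spirit of query-by-humming systems. The file names and frequency ranges are placeholder assumptions.

```python
# Minimal sketch, assuming librosa is available: compare the pitch contour of a human
# imitation to that of a birdsong recording with dynamic time warping (DTW).
# "imitation.wav" / "birdsong.wav" and the frequency ranges are placeholder assumptions.
import numpy as np
import librosa

def pitch_contour(path, fmin, fmax, sr=22050):
    """Extract a z-scored log-frequency pitch contour from the voiced frames."""
    y, sr = librosa.load(path, sr=sr)
    f0, voiced, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    contour = np.log2(f0[voiced])                 # log frequency ~ perceived pitch
    return (contour - contour.mean()) / (contour.std() + 1e-8)

imitation = pitch_contour("imitation.wav", fmin=80.0, fmax=2000.0)    # human vocal range
birdsong = pitch_contour("birdsong.wav", fmin=500.0, fmax=10000.0)    # typical bird range

# DTW alignment of the two contours; a lower normalized cost suggests a better match.
D, wp = librosa.sequence.dtw(imitation[np.newaxis, :], birdsong[np.newaxis, :],
                             metric="euclidean")
print("normalized DTW cost:", D[-1, -1] / len(wp))
```

Z-scoring the log-frequency contour discards the absolute register (humans imitate far below most birds), so only the shape of the contour drives the match.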
This content will become publicly available on April 6, 2026
Sketch2Sound: Controllable Audio Generation via Time-Varying Signals and Sonic Imitations
We present Sketch2Sound, a generative audio model capable of creating high-quality sounds from a set of interpretable time-varying control signals: loudness, brightness, and pitch, as well as text prompts. Sketch2Sound can synthesize arbitrary sounds from sonic imitations (i.e., a vocal imitation or a reference sound shape). Sketch2Sound can be implemented on top of any text-to-audio latent diffusion transformer (DiT), and requires only 40k steps of fine-tuning and a single linear layer per control, making it more lightweight than existing methods like ControlNet. To synthesize from sketch-like sonic imitations, we propose applying random median filters to the control signals during training, allowing Sketch2Sound to be prompted using controls with flexible levels of temporal specificity. We show that Sketch2Sound can synthesize sounds that follow the gist of the input controls from a vocal imitation while retaining adherence to the input text prompt and audio quality comparable to a text-only baseline. Sketch2Sound allows sound artists to create sounds with the semantic flexibility of text prompts and the expressivity and precision of a sonic gesture or vocal imitation.
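For intuition about the control signals and the median-filter trick described in the abstract, here is a hedged sketch (not the authors' implementation) that computes loudness, brightness, and pitch tracks with librosa and coarsens them with a randomly sized median filter; the file name, pitch range, and kernel sizes are assumptions.

```python
# Hedged sketch, not the authors' code: compute the three control signals named in the
# abstract (loudness, brightness, pitch) and coarsen them with a randomly sized median
# filter, approximating the training-time trick for sketch-like imitations.
import numpy as np
import librosa
from scipy.signal import medfilt

def control_signals(y, sr, hop_length=512):
    """Frame-wise loudness (RMS), brightness (spectral centroid), and pitch (pYIN)."""
    loudness = librosa.feature.rms(y=y, hop_length=hop_length)[0]
    brightness = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop_length)[0]
    f0, _, _ = librosa.pyin(y, fmin=60.0, fmax=2000.0, sr=sr, hop_length=hop_length)
    pitch = np.nan_to_num(f0)                       # unvoiced frames -> 0
    return np.stack([loudness, brightness, pitch])

def random_median_filter(controls, rng, max_kernel=31):
    """Median-filter each control track with a randomly chosen odd kernel width."""
    filtered = []
    for track in controls:
        k = int(rng.choice(np.arange(1, max_kernel + 1, 2)))
        filtered.append(medfilt(track, kernel_size=k))
    return np.stack(filtered)

y, sr = librosa.load("vocal_imitation.wav", sr=44100)      # placeholder input file
controls = control_signals(y, sr)
smoothed = random_median_filter(controls, np.random.default_rng(0))
```

Wider kernels keep only the coarse shape of each control, which is what lets a model trained this way accept loose, sketch-like gestures at inference time.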
- Award ID(s): 2222369
- PAR ID: 10638308
- Publisher / Repository: IEEE
- Date Published:
- ISBN: 979-8-3503-6874-1
- Page Range / eLocation ID: 1 to 5
- Format(s): Medium: X
- Location: Hyderabad, India
- Sponsoring Org: National Science Foundation
More Like this
- This work introduces Text2FX, a method that leverages CLAP embeddings and differentiable digital signal processing to control audio effects, such as equalization and reverberation, using open-vocabulary natural language prompts (e.g., "make this sound in-your-face and bold"). Text2FX operates without retraining any models, relying instead on single-instance optimization within the existing embedding space, thus enabling a flexible, scalable approach to open-vocabulary sound transformations through interpretable and disentangled FX manipulation. We show that CLAP encodes valuable information for controlling audio effects and propose two optimization approaches using CLAP to map text to audio effect parameters. While we demonstrate with CLAP, this approach is applicable to any shared text-audio embedding space. Similarly, while we demonstrate with equalization and reverberation, any differentiable audio effect may be controlled. We conduct a listener study with diverse text prompts and source audio to evaluate the quality and alignment of these methods with human perception. Demos and code are available at anniejchu.github.io/text2fx. (A minimal sketch of the single-instance optimization loop follows this list.)
- Generating realistic audio for human actions is critical for applications such as film sound effects and virtual reality games. Existing methods assume complete correspondence between video and audio during training, but in real-world settings, many sounds occur off-screen or weakly correspond to visuals, leading to uncontrolled ambient sounds or hallucinations at test time. This paper introduces AV-LDM, a novel ambient-aware audio generation model that disentangles foreground action sounds from ambient background noise in in-the-wild training videos. The approach leverages a retrieval-augmented generation framework to synthesize audio that aligns both semantically and temporally with the visual input. Trained and evaluated on Ego4D and EPIC-KITCHENS datasets, along with the newly introduced Ego4D-Sounds dataset (1.2M curated clips with action-audio correspondence), the model outperforms prior methods, enables controllable ambient sound generation, and shows promise for generalization to synthetic video game clips. This work is the first to emphasize faithful video-to-audio generation focused on observed visual content despite noisy, uncurated training data. (A retrieval-conditioning sketch follows this list.)
- Pleasure in music has been linked to predictive coding of melodic and rhythmic patterns, subserved by connectivity between regions in the brain's auditory and reward networks. Specific musical anhedonics derive little pleasure from music and have altered auditory-reward connectivity, but no difficulties with music perception abilities and no generalized physical anhedonia. Recent research suggests that specific musical anhedonics experience pleasure in nonmusical sounds, suggesting that the implicated brain pathways may be specific to music reward. However, this work used sounds with clear real-world sources (e.g., babies laughing, crowds cheering), so positive hedonic responses could be based on the referents of these sounds rather than the sounds themselves. We presented specific musical anhedonics and matched controls with isolated short pleasing and displeasing synthesized sounds of varying timbres with no clear real-world referents. While the two groups found displeasing sounds equally displeasing, the musical anhedonics gave substantially lower pleasure ratings to the pleasing sounds, indicating that their sonic anhedonia is not limited to musical rhythms and melodies. Furthermore, across a large sample of participants, mean pleasure ratings for pleasing synthesized sounds predicted significant and similar variance in six dimensions of musical reward considered to be relatively independent, suggesting that pleasure in sonic timbres plays a role in eliciting reward-related responses to music. We replicate the earlier findings of preserved pleasure ratings for semantically referential sounds in musical anhedonics and find that pleasure ratings of semantic referents, when presented without sounds, correlated with ratings for the sounds themselves. This association was stronger in musical anhedonics than in controls, suggesting the use of semantic knowledge as a compensatory mechanism for affective sound processing. Our results indicate that specific musical anhedonia is not entirely specific to melodic and rhythmic processing, and suggest that timbre merits further research as a source of pleasure in music.
- Acoustic behavior is widespread across vertebrates, including fishes. We report robust acoustic displays during aggressive interactions for a laboratory colony of Danionella dracula, a miniature and transparent species of teleost fish closely related to zebrafish (Danio rerio), which are hypothesized to be sonic based on the presence of a hypertrophied muscle associated with the male swim bladder. Males produce bursts of pulsatile sounds and a distinct postural display – extension of a hypertrophied lower jaw, a morphological trait not present in other Danionella species – during aggressive but not courtship interactions. Females show no evidence of sound production or jaw extension in such contexts. Novel pairs of size-matched or -mismatched males were combined in resident–intruder assays where sound production and jaw extension could be linked to individuals. In both dyad contexts, resident males produced significantly more sound pulses than intruders. During heightened sonic activity, the majority of the highest sound producers also showed increased jaw extension. Residents extended their jaw more than intruders in size-matched but not -mismatched contexts. Larger males in size-mismatched dyads produced more sounds and jaw extensions compared with their smaller counterparts, and sounds and jaw extensions increased with increasing absolute body size. These studies establish D. dracula as a sonic species that modulates putatively acoustic and postural displays during aggressive interactions based on residency and body size, providing a foundation for further investigating the role of multimodal displays in a new model clade for neurogenomic and neuroimaging studies of aggression, courtship and other social interactions.
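Referring back to the Text2FX entry above: a minimal, self-contained sketch of the single-instance optimization idea, in which one differentiable effect parameter is tuned so the processed audio moves toward a text prompt in a shared text-audio embedding space. The embed_text/embed_audio toys and the soft low-pass stand in for a real CLAP-style encoder and a real differentiable EQ or reverb; nothing here is the authors' code.

```python
# Hedged sketch, not the authors' implementation: single-instance optimization of one
# differentiable effect parameter against a text prompt in a shared embedding space.
# embed_text / embed_audio are toy stand-ins for a CLAP-style encoder, and soft_lowpass
# stands in for a real differentiable EQ or reverb.
import torch
import torch.nn.functional as F

def embed_text(prompt: str) -> torch.Tensor:
    # Toy stand-in: a deterministic pseudo-embedding derived from the prompt text.
    g = torch.Generator().manual_seed(abs(hash(prompt)) % (2**31))
    return torch.rand(64, generator=g)

def embed_audio(audio: torch.Tensor) -> torch.Tensor:
    # Toy stand-in: magnitude spectrum pooled to 64 bins (differentiable w.r.t. audio).
    spec = torch.fft.rfft(audio).abs()
    return F.adaptive_avg_pool1d(spec[None, None, :], 64)[0, 0]

def soft_lowpass(audio: torch.Tensor, cutoff01: torch.Tensor) -> torch.Tensor:
    # A tiny differentiable "effect": frequency-domain low-pass with a soft knee.
    spec = torch.fft.rfft(audio)
    freqs = torch.linspace(0.0, 1.0, spec.shape[-1])
    mask = torch.sigmoid((cutoff01 - freqs) * 50.0)
    return torch.fft.irfft(spec * mask, n=audio.shape[-1])

def text2fx_style_opt(audio, prompt, steps=200, lr=1e-2):
    text_emb = embed_text(prompt).detach()
    cutoff = torch.tensor(0.5, requires_grad=True)          # the single effect parameter
    opt = torch.optim.Adam([cutoff], lr=lr)
    for _ in range(steps):
        processed = soft_lowpass(audio, cutoff.clamp(0.01, 0.99))
        loss = -F.cosine_similarity(embed_audio(processed), text_emb, dim=-1)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return soft_lowpass(audio, cutoff.detach().clamp(0.01, 0.99)), float(cutoff)

audio = torch.randn(48000)                                  # placeholder 1 s clip @ 48 kHz
processed, cutoff = text2fx_style_opt(audio, "make this sound warm and muffled")
```

Swapping the toy encoders for a real shared text-audio embedding model and the low-pass for a bank of differentiable effects recovers the overall structure described in the abstract.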
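And for the AV-LDM entry: a hedged sketch of the retrieval step in a retrieval-augmented conditioning scheme, in which the ambient-audio embedding closest to a clip-level video embedding is fetched from a pre-built bank and concatenated as extra conditioning. All tensors are random placeholders; the encoders and the generator itself are omitted.

```python
# Hedged sketch, not the authors' code: nearest-neighbor retrieval of an ambient-audio
# embedding to condition generation alongside the video embedding. All tensors below
# are random placeholders standing in for learned video/audio encoders and a real bank.
import torch
import torch.nn.functional as F

def retrieve_ambient(video_emb: torch.Tensor, bank: torch.Tensor, k: int = 1):
    """Return the k ambient embeddings in the bank most similar to the video embedding."""
    sims = F.cosine_similarity(bank, video_emb[None, :], dim=-1)
    idx = sims.topk(k).indices
    return bank[idx], idx

video_emb = torch.randn(512)              # placeholder clip-level video embedding
ambient_bank = torch.randn(10_000, 512)   # placeholder bank of ambient-audio embeddings
ambient_cond, idx = retrieve_ambient(video_emb, ambient_bank)
conditioning = torch.cat([video_emb, ambient_cond[0]])   # joint conditioning vector
```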