Few-shot fine-tuning of text-to-image (T2I) generation models enables people to create unique images in their own style using natural language, without extensive prompt engineering. However, fine-tuning with only a handful of image-text pairs, sometimes as few as one, prevents fine-grained control of style attributes at generation time. In this paper, we present FineStyle, a few-shot fine-tuning method that enables enhanced controllability for style-personalized text-to-image generation. To overcome the lack of training data for fine-tuning, we propose a novel concept-oriented data scaling that amplifies the number of image-text pairs, each of which focuses on a different concept (e.g., an object) in the style reference image. We also identify the benefit of parameter-efficient adapter tuning of the key and value kernels of cross-attention layers. Extensive experiments show the effectiveness of FineStyle at following fine-grained text prompts and delivering visual quality faithful to the specified style, as measured by CLIP scores and human raters.
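As a rough illustration of the adapter-tuning idea mentioned in the abstract, the sketch below adds trainable low-rank adapters to the key and value projections of a toy cross-attention block in PyTorch; the module structure, names, and rank are illustrative assumptions, not FineStyle's actual implementation.

```python
# Illustrative PyTorch sketch (not FineStyle's code): low-rank adapters on the
# key/value projections of a cross-attention layer; all names are assumptions.
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # keep the pretrained kernel frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)        # adapter starts as a no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

class CrossAttention(nn.Module):
    """Toy cross-attention block; only the K/V projections carry adapters."""
    def __init__(self, dim: int, text_dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = LowRankAdapter(nn.Linear(text_dim, dim, bias=False))
        self.to_v = LowRankAdapter(nn.Linear(text_dim, dim, bias=False))

    def forward(self, image_tokens, text_tokens):
        q = self.to_q(image_tokens)
        k, v = self.to_k(text_tokens), self.to_v(text_tokens)
        attn = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v
```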
Efficient and consistent zero-shot video generation with diffusion models
Recent diffusion-based generative models employ methods such as one-shot fine-tuning of an image diffusion model for video generation. However, this leads to long video generation times and suboptimal efficiency. To avoid this long generation time, zero-shot text-to-video models eliminate fine-tuning entirely and can generate novel videos from a text prompt alone. While zero-shot generation greatly reduces generation time, many models rely on inefficient cross-frame attention processors, hindering the use of diffusion models for real-time video generation. We address this issue by introducing more efficient attention processors into a video diffusion model. Specifically, we use attention processors (i.e., xFormers, FlashAttention, and HyperAttention) that are highly optimized for efficiency and hardware parallelization. We then apply these processors to a video generator and test with both older diffusion models, such as Stable Diffusion 1.5, and newer, high-quality models, such as Stable Diffusion XL. Our results show that using efficient attention processors alone can reduce generation time by around 25% without any change in video quality. Combined with higher-quality models, the use of efficient attention processors in zero-shot generation yields a substantial increase in efficiency and quality, greatly expanding the applicability of video diffusion models to real-time video generation.
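As a simplified illustration of swapping in an efficient attention processor, the sketch below enables xFormers memory-efficient attention on a Stable Diffusion 1.5 pipeline via the diffusers library; only the processor swap is shown, the cross-frame video machinery described in the abstract is omitted, and the prompt and settings are placeholders.

```python
# Simplified sketch using the diffusers library: only the attention-processor
# swap is shown; the cross-frame video generator is omitted. Assumes xformers
# is installed and a CUDA GPU is available.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Replace the default attention implementation with xFormers' optimized kernels.
pipe.enable_xformers_memory_efficient_attention()

# Placeholder prompt/settings; a video pipeline would generate a frame sequence.
image = pipe("a corgi surfing a wave at sunset", num_inference_steps=30).images[0]
image.save("frame_000.png")
```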
- Award ID(s):
- 2050731
- PAR ID:
- 10539732
- Editor(s):
- Kehtarnavaz, Nasser; Shirvaikar, Mukul V
- Publisher / Repository:
- SPIE
- Date Published:
- Volume:
- 13034
- ISBN:
- 9781510673861
- Page Range / eLocation ID:
- 8
- Format(s):
- Medium: X
- Location:
- National Harbor, United States
- Sponsoring Org:
- National Science Foundation
More Like this
- We focus on addressing the object counting limitations of vision-language models, with a particular emphasis on Contrastive Language-Image Pre-training (CLIP) models. Centered on our hypothesis that counting knowledge can be abstracted into linear vectors within the text embedding space, we develop a parameter-efficient fine-tuning method and several zero-shot methods to improve CLIP's counting accuracy. Through comprehensive experiments, we demonstrate that our learning-based method not only outperforms full-model fine-tuning in counting accuracy but also retains the broad capabilities of pre-trained CLIP models. Our zero-shot text embedding editing techniques are also effective in situations where training data is scarce, and can be extended to improve Stable Diffusion's ability to generate images with precise object counts. We also contribute two specialized datasets to train and evaluate CLIP's counting capabilities. Our code is available at https://github.com/UW-Madison-Lee-Lab/CLIP_Counting.
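A minimal sketch of the "counting knowledge as a linear direction" hypothesis, assuming the Hugging Face CLIP implementation: a count-shift vector is estimated from paired prompts and added to a new text embedding. The prompt templates and averaging scheme are illustrative, not the paper's exact recipe.

```python
# Hedged sketch of editing CLIP text embeddings along a "counting" direction.
# Prompt templates and the averaging scheme are illustrative assumptions.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def text_embed(prompts):
    inputs = processor(text=prompts, return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

objects = ["apples", "dogs", "cars", "chairs"]
two = text_embed([f"a photo of two {o}" for o in objects])
three = text_embed([f"a photo of three {o}" for o in objects])

# Averaging paired differences gives an approximate "two -> three" direction.
count_direction = (three - two).mean(dim=0)

edited = text_embed(["a photo of two birds"]) + count_direction
edited = edited / edited.norm(dim=-1, keepdim=True)  # re-normalize before scoring
```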
- The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive compositional generalization abilities. Almost all use cases thus far have solely focused on sampling; however, diffusion models can also provide conditional density estimates, which are useful for tasks beyond image generation. In this paper, we show that the density estimates from large-scale text-to-image diffusion models like Stable Diffusion can be leveraged to perform zero-shot classification without any additional training. Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models. Although a gap remains between generative and discriminative approaches on zero-shot recognition tasks, our diffusion-based approach has significantly stronger multimodal compositional reasoning ability than competing discriminative approaches. Finally, we use Diffusion Classifier to extract standard classifiers from class-conditional diffusion models trained on ImageNet. Our models achieve strong classification performance using only weak augmentations and exhibit qualitatively better "effective robustness" to distribution shift. Overall, our results are a step toward using generative over discriminative models for downstream tasks.
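A minimal sketch of diffusion-based zero-shot classification in the spirit described above (not the official Diffusion Classifier code): each candidate class is scored by the average noise-prediction error under its text prompt, and the class with the lowest error is chosen. The model choice, timestep sampling, and prompt template are assumptions.

```python
# Sketch of classification via per-class denoising error (hyperparameters,
# model choice, and prompt template are assumptions).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

@torch.no_grad()
def class_score(latents, prompt, n_samples=16):
    """Average epsilon-prediction error for one candidate class prompt."""
    ids = pipe.tokenizer(prompt, padding="max_length",
                         max_length=pipe.tokenizer.model_max_length,
                         return_tensors="pt").input_ids.to("cuda")
    text_emb = pipe.text_encoder(ids)[0]
    errors = []
    for _ in range(n_samples):
        t = torch.randint(0, pipe.scheduler.config.num_train_timesteps, (1,), device="cuda")
        noise = torch.randn_like(latents)
        noisy = pipe.scheduler.add_noise(latents, noise, t)
        pred = pipe.unet(noisy, t, encoder_hidden_states=text_emb).sample
        errors.append(torch.nn.functional.mse_loss(pred, noise))
    return torch.stack(errors).mean()

# `latents` would come from encoding the test image with pipe.vae; the
# predicted label is the class whose prompt yields the lowest score, e.g.:
# label = min(classes, key=lambda c: class_score(latents, f"a photo of a {c}"))
```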
- The potential for pre-trained large language models (LLMs) to use natural language feedback at inference time has been an exciting recent development. We build upon this observation by formalizing an algorithm for learning from natural language feedback at training time instead, which we call Imitation learning from Language Feedback (ILF). ILF requires only a small amount of human-written feedback during training and does not require the same feedback at test time, making it both user-friendly and sample-efficient. We further show that ILF can be seen as a form of minimizing the KL divergence to the target distribution and demonstrate proof-of-concepts on text summarization and program synthesis tasks. For code generation, ILF improves a CodeGen-Mono 6.1B model's pass@1 rate from 22% to 36% on the MBPP benchmark, outperforming both fine-tuning on MBPP and on human-written repaired programs. For summarization, we show that ILF can be combined with learning from human preferences to improve a GPT-3 model's summarization performance to be comparable to human quality, outperforming fine-tuning on human-written summaries. Overall, our results suggest that ILF is both more effective and sample-efficient than training exclusively on demonstrations for improving an LLM's performance on a variety of tasks.
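The training loop can be sketched at a high level as follows; every helper here (model.generate, human_feedback, passes_unit_tests, finetune) is a hypothetical placeholder rather than a real API, and the prompt layout is an assumption.

```python
# High-level sketch of one ILF-style round; all helpers are hypothetical
# placeholders, not real library calls.
def ilf_round(model, tasks, human_feedback, finetune):
    refinement_data = []
    for task in tasks:
        draft = model.generate(task.prompt)        # initial, possibly incorrect program
        feedback = human_feedback(task, draft)     # short natural-language critique
        refined = model.generate(                  # refinement conditioned on the critique
            f"{task.prompt}\n\nDraft:\n{draft}\n\nFeedback:\n{feedback}\n\nImproved solution:"
        )
        if task.passes_unit_tests(refined):        # keep only verified refinements
            refinement_data.append((task.prompt, refined))
    # Supervised fine-tuning on the refinements approximately minimizes the KL
    # divergence to the feedback-improved target distribution.
    return finetune(model, refinement_data)
```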
- We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculations, we show that our strongest model (fine-tuned LLaMA-2 70B) can generate materials predicted to be metastable at about twice the rate (49% vs. 28%) of CDVAE, a competing diffusion model. Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable materials, infilling of partial structures, and text-conditional generation. Finally, we show that language models' ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are surprisingly well-suited for atomistic data.
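As an illustration of what text-encoded atomistic data might look like, the sketch below serializes lattice parameters and fractional coordinates into a plain-text training example; the exact format and the toy NaCl structure are assumptions, not the paper's specification.

```python
# Illustrative serialization of a crystal structure as plain text; the format
# and the toy NaCl example are assumptions, not the paper's specification.
def encode_structure(lengths, angles, sites):
    """Serialize lattice parameters and fractional coordinates into text."""
    lines = [" ".join(f"{x:.1f}" for x in lengths),
             " ".join(f"{x:.0f}" for x in angles)]
    for element, (x, y, z) in sites:
        lines.append(element)
        lines.append(f"{x:.2f} {y:.2f} {z:.2f}")
    return "\n".join(lines)

# Simplified rock-salt NaCl cell with two sites, for illustration only.
example = encode_structure(
    lengths=(5.6, 5.6, 5.6), angles=(90, 90, 90),
    sites=[("Na", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))],
)
print(example)  # this string would serve as one fine-tuning example for the LLM
```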