FineStyle: Fine-grained Controllable Style Personalization for Text-to-image Models

Zhang, Gong; Sohn, Kihyuk; Hahn, Meera; Shi, Humphrey; Essa, Irfan

Citation Details

Few-shot fine-tuning of text-to-image (T2I) generation models enables people to create unique images in their own style using natural languages without requiring extensive prompt engineering. However, fine-tuning with only a handful, as little as one, of image-text paired data prevents fine-grained control of style attributes at generation. In this paper, we present FineStyle, a few-shot fine-tuning method that allows enhanced controllability for style personalized text-to-image generation. To overcome the lack of training data for fine-tuning, we propose a novel conceptoriented data scaling that amplifies the number of image-text pair, each of which focuses on different concepts (e.g., objects) in the style reference image. We also identify the benefit of parameter-efficient adapter tuning of key and value kernels of cross-attention layers. Extensive experiments show the effectiveness of FineStyle at following fine-grained text prompts and delivering visual quality faithful to the specified style, measured by CLIP scores and human raters. more »

Award ID(s):: 2427478

PAR ID:: 10611028

Author(s) / Creator(s):: Zhang, Gong; Sohn, Kihyuk; Hahn, Meera; Shi, Humphrey; Essa, Irfan

Publisher / Repository:: NeurIPS 2024

Date Published:: 2024-12-10

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Proceeding:
The DOI is not currently available.

More Like this