Diffusion-based Text-to-Image (T2I) models have achieved impressive success in generating high-quality images from textual prompts. While large language models (LLMs) effectively leverage Direct Preference Optimization (DPO) to fine-tune on human preference data without a reward model, this direction remains underexplored for diffusion models. Current preference learning methods for T2I diffusion models directly adapt existing techniques from LLMs. However, this direct adaptation introduces an estimated loss specific to T2I diffusion models, and our empirical results show that this estimation can lead to suboptimal performance. In this work, we propose Direct Score Preference Optimization (DSPO), a novel algorithm that aligns the pretraining and fine-tuning objectives of diffusion models by leveraging score matching, the same objective used during pretraining, and thereby introduces a new perspective on preference learning for diffusion models. Specifically, DSPO distills the score function of human-preferred image distributions into pretrained diffusion models, fine-tuning the model to generate outputs that align with human preferences. We theoretically show that DSPO shares the same optimization direction as reinforcement learning algorithms for diffusion models under certain conditions. Our experimental results demonstrate that DSPO outperforms preference learning baselines for T2I diffusion models on human preference evaluation tasks and enhances both the visual appeal and prompt alignment of generated images.
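The core mechanism the abstract describes, fitting a model's score function to data drawn from a preferred distribution via the same denoising score-matching objective used in pretraining, can be illustrated with a toy one-dimensional NumPy sketch. Everything here is an illustrative assumption (the Gaussian model, the noise level, the optimizer), not DSPO's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: score of a unit-variance Gaussian, d/dx log N(x; mu, 1) = mu - x.
# "Pretraining" left mu = 0; human-preferred samples concentrate near mu = 2.
def score(x, mu):
    return mu - x

preferred = rng.normal(2.0, 1.0, size=20_000)   # stand-in "preferred" data
sigma = 0.5                                     # perturbation noise level
noise = rng.normal(0.0, sigma, size=preferred.shape)
noisy = preferred + noise
target = (preferred - noisy) / sigma**2         # score of perturbation kernel

def dsm_loss(mu):
    # Denoising score matching: regress the model score at the noisy
    # point onto the known score of the Gaussian perturbation kernel.
    return np.mean((score(noisy, mu) - target) ** 2)

mu = 0.0                                        # "pretrained" parameter
for _ in range(100):
    eps = 1e-3                                  # central finite difference
    grad = (dsm_loss(mu + eps) - dsm_loss(mu - eps)) / (2 * eps)
    mu -= 0.25 * grad

print(round(mu, 1))  # the model's mode has moved toward the preferred data
```

Minimizing the score-matching loss on preferred samples pulls the model's distribution toward theirs, which is the distillation idea in miniature.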
Mechano-diffusion of particles in stretchable hydrogels
We report a mechano-diffusion mechanism that harnesses mechanical deformation to control particle diffusion in stretchable hydrogels with a significantly enlarged tuning ratio and highly expanded tuning freedom.
- PAR ID: 10599451
- Publisher / Repository: Royal Society of Chemistry
- Date Published:
- Journal Name: Soft Matter
- Volume: 21
- Issue: 12
- ISSN: 1744-683X
- Page Range / eLocation ID: 2230 to 2241
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Kehtarnavaz, Nasser; Shirvaikar, Mukul V (Eds.): Recent diffusion-based generative models employ methods such as one-shot fine-tuning of an image diffusion model for video generation. However, this leads to long video generation times and suboptimal efficiency. Zero-shot text-to-video models eliminate the fine-tuning step entirely and can generate novel videos from a text prompt alone. While zero-shot generation greatly reduces generation time, many models rely on inefficient cross-frame attention processors, hindering the diffusion model's use for real-time video generation. We address this issue by introducing more efficient attention processors to a video diffusion model. Specifically, we use attention processors (i.e., xFormers, FlashAttention, and HyperAttention) that are highly optimized for efficiency and hardware parallelization. We then apply these processors to a video generator and test them with both older diffusion models such as Stable Diffusion 1.5 and newer, high-quality models such as Stable Diffusion XL. Our results show that using efficient attention processors alone reduces generation time by around 25% with no change in video quality. Combined with higher-quality models, this use of efficient attention processors in zero-shot generation yields a substantial efficiency and quality increase, greatly expanding the video diffusion model's applicability to real-time video generation.
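The processors named in this abstract (xFormers, FlashAttention, HyperAttention) all compute the same mathematical operation, scaled dot-product attention, but tile it so the full attention matrix is never materialized. A minimal NumPy sketch of the reference computation being swapped out (shapes and names are illustrative, not taken from any of these libraries):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Naive reference attention: optimized processors compute this same
    # function without materializing the full (n x n) weight matrix.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
n, d = 8, 16   # e.g. tokens attended across frames, per-head dimension
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (8, 16)
```

Because the optimized kernels are drop-in replacements for this computation, swapping them in changes speed and memory use but not the generated output, which is consistent with the reported unchanged video quality.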
-
While diffusion models have recently demonstrated remarkable progress in generating realistic images, privacy risks also arise: published models or APIs could generate training images and thus leak privacy-sensitive training information. In this paper, we reveal a new risk, Shake-to-Leak (S2L), that fine-tuning the pre-trained models with manipulated data can amplify the existing privacy risks. We demonstrate that S2L could occur in various standard fine-tuning strategies for diffusion models, including concept-injection methods (DreamBooth and Textual Inversion) and parameter-efficient methods (LoRA and Hypernetwork), as well as their combinations. In the worst case, S2L can amplify the state-of-the-art membership inference attack (MIA) on diffusion models by 5.4% (absolute difference) AUC and can increase extracted private samples from almost 0 samples to 16.3 samples on average per target domain. This discovery underscores that the privacy risk with diffusion models is even more severe than previously recognized. Codes are available at https://github.com/VITA-Group/Shake-to-Leak.
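The AUC figure quoted in this abstract measures how well a membership inference attack separates training members from non-members. A sketch of the textbook loss-thresholding attack and its AUC, as a hedged illustration of the metric rather than the specific attack the paper amplifies:

```python
import numpy as np

def mia_auc(member_losses, nonmember_losses):
    # Loss-threshold membership inference: samples with lower model loss
    # are predicted to be training members. The AUC equals the chance a
    # random member scores below a random non-member (ties count half).
    m = np.asarray(member_losses, dtype=float)[:, None]
    n = np.asarray(nonmember_losses, dtype=float)[None, :]
    return float(np.mean((m < n) + 0.5 * (m == n)))

# Perfect separation gives AUC 1.0; identical losses give chance level 0.5.
print(mia_auc([0.1, 0.2], [0.8, 0.9]))  # 1.0
print(mia_auc([0.5], [0.5]))            # 0.5
```

On this scale, a 5.4% absolute AUC amplification is a meaningful shift toward perfect membership separation.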
-
Diffusion models have emerged as powerful tools for generative modeling, demonstrating exceptional capability in capturing target data distributions from large datasets. However, fine-tuning these massive models for specific downstream tasks, constraints, and human preferences remains a critical challenge. While recent advances have leveraged reinforcement learning algorithms to tackle this problem, much of the progress has been empirical, with limited theoretical understanding. To bridge this gap, we propose a stochastic control framework for fine-tuning diffusion models. Building on denoising diffusion probabilistic models as the pre-trained reference dynamics, our approach integrates linear dynamics control with Kullback–Leibler regularization. We establish the well-posedness and regularity of the stochastic control problem and develop a policy iteration algorithm (PI-FT) for numerical solution. We show that PI-FT achieves global convergence at a linear rate. Unlike existing work that assumes regularities throughout training, we prove that the control and value sequences generated by the algorithm preserve the desired regularity. Finally, we extend our framework to parametric settings for efficient implementation and demonstrate the practical effectiveness of the proposed PI-FT algorithm through numerical experiments.
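One building block of KL-regularized fine-tuning has a closed form in the simplest possible setting: a single-step decision with a discrete reference policy. The NumPy sketch below shows that closed form only as intuition; it is a toy assumption-laden stand-in, not the paper's continuous-time stochastic control setting or the PI-FT algorithm:

```python
import numpy as np

def kl_tilted_policy(rewards, ref, beta):
    # Maximizer of  E_pi[r(a)] - beta * KL(pi || pi_ref)  over discrete
    # policies: an exponential tilt of the reference,
    #   pi*(a) proportional to pi_ref(a) * exp(r(a) / beta),
    # computed in log space for numerical stability.
    logits = np.log(ref) + rewards / beta
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

def objective(pi, rewards, ref, beta):
    return float(pi @ rewards - beta * np.sum(pi * np.log(pi / ref)))

ref = np.full(4, 0.25)                      # pretrained reference policy
rewards = np.array([1.0, 0.0, 0.5, -1.0])   # preference / task reward
pi_star = kl_tilted_policy(rewards, ref, beta=0.5)
print(np.round(pi_star, 3))
# Larger beta keeps pi_star closer to ref; smaller beta concentrates it
# on the highest-reward action.
```

The KL term plays the same role here as in the paper's framework: it anchors the fine-tuned policy to the pretrained reference dynamics so reward-seeking cannot drift arbitrarily far from them.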
-
Abstract Graphene-based electrodes have been extensively investigated for supercapacitor applications. However, their ion diffusion efficiency is often hindered by the graphene restacking phenomenon. Even though holey graphene is fabricated to address this issue by providing ion transport channels, those channels can still be blocked by densely stacked graphene nanosheets. To tackle this challenge, this research aims to improve the ion diffusion efficiency of microwave-synthesized holey graphene films by tuning the water interlayer spacer toward improved supercapacitor performance. By controlling the vacuum filtration during graphene-based electrode fabrication, we obtain dry films with dense packing and wet films with sparse packing. The SEM images reveal that the wet film has an interlayer distance roughly 20 times larger than that of its dry counterpart. The holey graphene wet film delivers a specific capacitance of 239 F/g, an ~82% enhancement over the dry film (131 F/g). Through an integrated experimental and computational study, we quantitatively show that the interlayer spacing, in combination with the nanoholes in the basal plane, dominates the ion diffusion rate in holey graphene-based electrodes. Our study concludes that novel hierarchical structures should be further considered even in holey graphene thin films to fully exploit the superior advantages of graphene-based supercapacitors.
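The reported ~82% enhancement follows directly from the two capacitance values in the abstract; a quick arithmetic check:

```python
# Sanity-check the reported ~82% capacitance enhancement (values from
# the abstract: 239 F/g for the wet film vs 131 F/g for the dry film).
wet_Fg, dry_Fg = 239.0, 131.0
enhancement_pct = (wet_Fg - dry_Fg) / dry_Fg * 100
print(f"{enhancement_pct:.0f}%")  # 82%
```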