<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>DRAGON: Distributional Rewards Optimize Diffusion Generative Models</title></titleStmt>
			<publicationStmt>
				<publisher>JMLR Inc.</publisher>
				<date>10/01/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10657901</idno>
					<idno type="doi"></idno>
					<title level='j'>Transactions on Machine Learning Research</title>
<idno>2835-8856</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Yatong Bai</author><author>Jonah Casebeer</author><author>Somayeh Sojoudi</author><author>Nicholas J Bryan</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[We present Distributional RewArds for Generative OptimizatioN (DRAGON), a versatile framework for fine-tuning media generation models towards a desired outcome. Compared with traditional reinforcement learning with human feedback (RLHF) or pairwise preference approaches such as direct preference optimization (DPO), DRAGON is more flexible. It can optimize reward functions that evaluate either individual examples or distributions of them, making it compatible with a broad spectrum of instance-wise, instance-to-distribution, and distribution-to-distribution rewards. Leveraging this versatility, we construct novel reward functions by selecting an encoder and a set of reference examples to create an exemplar distribution. When cross-modal encoders such as CLAP are used, the reference may be of a different modality (e.g., text versus audio). Then, DRAGON gathers online and on-policy generations, scores them with the reward function to construct a positive demonstration set and a negative set, and leverages the contrast between the two finite sets to approximate distributional reward optimization. For evaluation, we fine-tune an audio-domain text-to-music diffusion model with 20 reward functions, including a custom music aesthetics model, CLAP score, Vendi diversity, and Fréchet audio distance (FAD). We further compare instance-wise (per-song) and full-dataset FAD settings while ablating multiple FAD encoders and reference sets. Over all 20 target rewards, DRAGON achieves an 81.45% average win rate. Moreover, reward functions based on exemplar sets indeed enhance generations and are comparable to model-based rewards. With an appropriate exemplar set, DRAGON achieves a 60.95% human-voted music quality win rate without training on human preference annotations. As such, DRAGON exhibits a new approach to designing and optimizing reward functions for improving human-perceived quality. Example generations can be found at https://ml-dragon.github.io/web.

1 Introduction

Recent advances in diffusion models have transformed content generation across media domains, establishing new standards for generating high-quality images, video, and audio (Rombach et al., 2022; Ho et al., 2022; Liu et al., 2023; Ghosal et al., 2023). While these models achieve impressive results through sophisticated

† Work done as an intern at Adobe Research, supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under Grant W911NF2010219, Office of Naval Research, and NSF.

Published in Transactions on Machine Learning Research (10/2025)]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
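<div xmlns="http://www.tei-c.org/ns/1.0"><p>The abstract describes scoring online, on-policy generations with a reward function and splitting them into a positive demonstration set and a negative set. For an instance-wise reward, that step could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper name split_by_reward, the assumption that generations arrive in pairs, the tie rule, and the toy reward are all hypothetical.</p>
<p><![CDATA[
```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_reward(
    pairs: Sequence[Tuple[T, T]],
    reward: Callable[[T], float],
) -> Tuple[List[T], List[T]]:
    """Put the higher-scoring item of each pair in the positive set (hypothetical helper)."""
    positives: List[T] = []
    negatives: List[T] = []
    for first, second in pairs:
        # Ties keep the first item as the positive; this tie rule is an assumption.
        if reward(first) >= reward(second):
            positives.append(first)
            negatives.append(second)
        else:
            positives.append(second)
            negatives.append(first)
    return positives, negatives


# Toy usage: strings stand in for generations, and length stands in for a reward model.
toy_pairs = [("aa", "b"), ("c", "ddd")]
toy_pos, toy_neg = split_by_reward(toy_pairs, reward=len)
print(toy_pos, toy_neg)
```
]]></p>
<p>An instance-wise reward scores each generation on its own, so every pair can be ordered independently. Rewards that evaluate a whole set, such as full-dataset FAD, do not decompose this way and need a different construction.</p></div>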
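<div xmlns="http://www.tei-c.org/ns/1.0"><p><formula notation="TeX">(D^{+}, D^{-}) \leftarrow (D_1, D_2) \text{ if } r_{\mathrm{dist}}(D_1) &gt; r_{\mathrm{dist}}(D_2) \text{ else } (D_2, D_1)</formula>.</p>
<p>3: for <formula notation="TeX">i = 0, 1, \ldots, n</formula> do</p>
<p>4: At step <formula notation="TeX">i</formula>, swap the <formula notation="TeX">i</formula>-th generation pair in D</p>
<p>5 42.8% 47.8% 61.9% 59.4% 0.0% 0.2%</p>
<table>
<row><cell>Dataset</cell><cell>CLAP-FAD</cell><cell>ALIM-Audio</cell><cell>73.6%</cell><cell>.214/.207</cell><cell>58.3%</cell><cell>45.7%</cell><cell>61.5%</cell><cell>50.2%</cell><cell>73.5%</cell><cell>29.9%</cell></row>
<row><cell>Dataset</cell><cell>CLAP-FAD</cell><cell>SDNV-Audio</cell><cell>83.2%</cell><cell>.260/.251</cell><cell>47.7%</cell><cell>48.8%</cell><cell>0.0%</cell><cell>0.0%</cell><cell>42.4%</cell><cell>83.2%</cell></row>
<row><cell>Dataset</cell><cell>CLAP-FAD</cell><cell>ALIM-Text</cell><cell>85.4%</cell><cell>.983/.967</cell><cell>68.8%</cell><cell>59.5%</cell><cell>88.2%</cell><cell>84.7%</cell><cell>2.1%</cell><cell>0.9%</cell></row>
<row><cell>Dataset</cell><cell>CLAP-FAD</cell><cell>SDNV-Text</cell><cell>81.6%</cell><cell>.799/.788</cell><cell>41.4%</cell><cell>52.4%</cell><cell>1.2%</cell><cell>1.9%</cell><cell>6.2%</cell><cell>5.8%</cell></row>
<row><cell>Dataset</cell><cell>CLAP-FAD</cell><cell>Human-Text</cell><cell>98.4%</cell><cell>.837/.813</cell><cell>57.4%</cell><cell>55.8%</cell><cell>26.5%</cell><cell>38.9%</cell><cell>26.2%</cell><cell>14.1%</cell></row>
<row><cell>Dataset</cell><cell>CLAP-FAD</cell><cell>Mixtral-Text</cell><cell>99.8%</cell><cell>.832/.786</cell><cell>64.6%</cell><cell>61.1%</cell><cell>8.3%</cell><cell>14.1%</cell><cell>1.3%</cell><cell>0.0%</cell></row>
</table>
<p>On average, DRAGON achieves an 81.4% win rate across all per-song FAD runs. Many runs (especially those using ALIM as the reference) improve the predicted aesthetics score, reaching an average aesthetics win rate of 59.9% without any human rating.</p>
<p>where <formula notation="TeX">A_{\theta}(x_0, x_t, t) := x_0 - f_{\mathrm{ref}}(x_t, t)</formula></p>
<figure><head>Per-Song FAD Reference Statistics</head><figDesc>Labels: Mixtral, Human, ALIM, SDNV.</figDesc></figure>
<table>
<row><cell>Reference (40 inference steps)</cell><cell>0.214</cell><cell>0.260</cell><cell>0.983</cell><cell>0.799</cell><cell>0.837</cell><cell>0.832</cell><cell>8.261</cell><cell>8.297</cell><cell>12.705</cell></row>
<row><cell>Reference (10 inference steps)</cell><cell>0.344</cell><cell>0.374</cell><cell>1.069</cell><cell>0.876</cell><cell>0.904</cell><cell>0.925</cell><cell>13.360</cell><cell>12.502</cell><cell>10.401</cell></row>
<row><cell>KTO Aesthetics</cell><cell>0.243</cell><cell>0.311</cell><cell>1.004</cell><cell>0.831</cell><cell>0.863</cell><cell>0.859</cell><cell>9.135</cell><cell>9.182</cell><cell>10.968</cell></row>
<row><cell>DPO Aesthetics (40/40 train/inference steps)</cell><cell>0.216</cell><cell>0.262</cell><cell>0.987</cell><cell>0.826</cell><cell>0.864</cell><cell>0.848</cell><cell>9.616</cell><cell>9.288</cell><cell>10.601</cell></row>
<row><cell>DPO Aesthetics (40/10 train/inference steps)</cell><cell>0.295</cell><cell>0.321</cell><cell>1.029</cell><cell>0.876</cell><cell>0.906</cell><cell>0.899</cell><cell>12.604</cell><cell>11.844</cell><cell>8.736</cell></row>
<row><cell>DPO Aesthetics (10/40 train/inference steps)</cell><cell>0.217</cell><cell>0.265</cell><cell>0.969</cell><cell>0.832</cell><cell>0.865</cell><cell>0.843</cell><cell>9.245</cell><cell>9.022</cell><cell>10.731</cell></row>
<row><cell>DPO Aesthetics (10/10 train/inference steps)</cell><cell>0.262</cell><cell>0.297</cell><cell>1.006</cell><cell>0.876</cell><cell>0.904</cell><cell>0.888</cell><cell>13.657</cell><cell>12.798</cell><cell>8.710</cell></row>
<row><cell>KTO-Unpaired Aesthetics</cell><cell>0.242</cell><cell>0.303</cell><cell>0.963</cell><cell>0.806</cell><cell>0.850</cell><cell>0.819</cell><cell>9.703</cell><cell>9.560</cell><cell>10.664</cell></row>
<row><cell>KTO CLAP Score</cell><cell>0.222</cell><cell>0.309</cell><cell>0.926</cell><cell>0.804</cell><cell>0.838</cell><cell>0.806</cell><cell>6.955</cell><cell>8.135</cell><cell>10.</cell></row>
</table>
<table>
<row><cell>Reference (40 inference steps)</cell><cell>0.107</cell><cell>0.155</cell><cell>0.901</cell><cell>0.684</cell><cell>0.716</cell><cell>0.734</cell><cell>7.470</cell><cell>7.503</cell><cell>12.230</cell></row>
<row><cell>Reference (10 inference steps)</cell><cell>0.267</cell><cell>0.297</cell><cell>1.002</cell><cell>0.781</cell><cell>0.804</cell><cell>0.845</cell><cell>12.898</cell><cell>12.025</cell><cell>8.702</cell></row>
<row><cell>KTO Aesthetics</cell><cell>0.129</cell><cell>0.198</cell><cell>0.922</cell><cell>0.716</cell><cell>0.740</cell><cell>0.761</cell><cell>8.399</cell><cell>8.443</cell><cell>10.782</cell></row>
<row><cell>DPO Aesthetics (40/40 train/inference steps)</cell><cell>0.118</cell><cell>0.164</cell><cell>0.912</cell><cell>0.721</cell><cell>0.753</cell><cell>0.758</cell><cell>8.965</cell><cell>8.635</cell><cell>9.648</cell></row>
<row><cell>DPO Aesthetics (40/10 train/inference steps)</cell><cell>0.221</cell><cell>0.247</cell><cell>0.966</cell><cell>0.789</cell><cell>0.813</cell><cell>0.825</cell><cell>12.255</cell><cell>11.475</cell><cell>7.059</cell></row>
<row><cell>DPO Aesthetics (10/40 train/inference steps)</cell><cell>0.110</cell><cell>0.161</cell><cell>0.891</cell><cell>0.723</cell><cell>0.749</cell><cell>0.749</cell><cell>8.528</cell><cell>8.291</cell><cell>10.068</cell></row>
<row><cell>DPO Aesthetics (10/10 train/inference steps)</cell><cell>0.184</cell><cell>0.219</cell><cell>0.941</cell><cell>0.786</cell><cell>0.809</cell><cell>0.811</cell><cell>13.267</cell><cell>12.</cell></row>
</table></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>While SAC now has over</p></note>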
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>38,000 images, it only had 4,000-5,000 when used to train the LAION aesthetics predictor.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2"><p><code lang="python">if new_fad_pos &lt; fad_pos:  # If dataset FAD improved, accept the swap
    fad_pos = new_fad_pos
else:  # If dataset FAD did not improve, revert the swap</code></p></note>
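<div xmlns="http://www.tei-c.org/ns/1.0"><p>The algorithm fragment and the code fragment in the note above describe a greedy pass over generation pairs: start from the split with the higher distributional reward, swap one pair at a time, and keep a swap only if the dataset FAD of the positive set improves. The following is a minimal, self-contained sketch of that loop under stated assumptions. The names greedy_swap and toy_distance are hypothetical, and toy_distance (squared distance between the set mean and a target) is only a stand-in for FAD, with lower values treated as higher reward. It is not the authors' implementation.</p>
<p><![CDATA[
```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def toy_distance(items: Sequence[Vector], target: Vector) -> float:
    """Hypothetical stand-in for dataset FAD: squared distance from the set mean to a target."""
    mean = [sum(v[d] for v in items) / len(items) for d in range(len(target))]
    return sum((m - t) ** 2 for m, t in zip(mean, target))


def greedy_swap(
    pairs: Sequence[Tuple[Vector, Vector]],
    distance: Callable[[Sequence[Vector]], float],
) -> Tuple[List[Vector], List[Vector], float]:
    """Greedily swap pair members so the positive set's distance decreases."""
    set_1 = [a for a, _ in pairs]
    set_2 = [b for _, b in pairs]
    # Lower distance is treated as higher distributional reward (assumption).
    if distance(set_1) < distance(set_2):
        pos, neg = set_1, set_2
    else:
        pos, neg = set_2, set_1
    fad_pos = distance(pos)
    for i in range(len(pos)):
        # Tentatively swap the i-th generation pair between the two sets.
        pos[i], neg[i] = neg[i], pos[i]
        new_fad_pos = distance(pos)
        if new_fad_pos < fad_pos:
            # The distance improved, so accept the swap.
            fad_pos = new_fad_pos
        else:
            # The distance did not improve, so revert the swap.
            pos[i], neg[i] = neg[i], pos[i]
    return pos, neg, fad_pos


# Toy usage with one-dimensional "embeddings" and a target mean of 1.0.
toy_pairs = [((0.0,), (2.0,)), ((4.0,), (1.0,)), ((1.0,), (6.0,))]
pos, neg, fad = greedy_swap(toy_pairs, lambda items: toy_distance(items, (1.0,)))
print(pos, neg, fad)
```
]]></p>
<p>Each pair is visited once and the distance is recomputed after every tentative swap, so the pass makes one distance evaluation per pair plus the initial ones. Because swaps are accepted greedily, the result depends on the visiting order and is not guaranteed to be the best possible split.</p></div>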
		</body>
		</text>
</TEI>
