<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Conditional score-based diffusion models for Bayesian inference in infinite dimensions</title></titleStmt>
			<publicationStmt>
				<publisher>NeurIPS</publisher>
				<date>12/14/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10518799</idno>
					<idno type="doi"></idno>
					
					<author>L Baldassari</author><author>A Siahkoohi</author><author>J Garnier</author><author>K Solna</author><author>V M de_Hoop</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Since their initial introduction, score-based diffusion models (SDMs) have been successfully applied to solve a variety of linear inverse problems in finite-dimensional vector spaces due to their ability to efficiently approximate the posterior distribution. However, using SDMs for inverse problems in infinite-dimensional function spaces has only been addressed recently, primarily through methods that learn the unconditional score. While this approach is advantageous for some inverse problems, it is mostly heuristic and involves numerous computationally costly forward operator evaluations during posterior sampling. To address these limitations, we propose a theoretically grounded method for sampling from the posterior of infinite-dimensional Bayesian linear inverse problems based on amortized conditional SDMs. In particular, we prove that one of the most successful approaches for estimating the conditional score in finite dimensions, the conditional denoising estimator, can also be applied in infinite dimensions. A significant part of our analysis is dedicated to demonstrating that extending infinite-dimensional SDMs to the conditional setting requires careful consideration, as the conditional score typically blows up for small times, in contrast to the unconditional score. We conclude by presenting stylized and large-scale numerical examples that validate our approach, offer additional insights, and demonstrate that our method enables large-scale, discretization-invariant Bayesian inference.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Inverse problems seek to estimate unknown parameters using noisy observations or measurements. One of the main challenges is that they are often ill-posed. A problem is ill-posed if there are no solutions, or there are many (two or more) solutions, or the solution is unstable in relation to small errors in the observations <ref type="bibr">[1]</ref>. A common approach to transform the original ill-posed problem into a well-posed one is to formulate it as a least-squares optimization problem that minimizes the difference between observed and predicted data. However, minimization of the data misfit alone negatively impacts the quality of the obtained solution due to the presence of noise in the data and the inherent nullspace of the forward operator <ref type="bibr">[2,</ref><ref type="bibr">3]</ref>. Casting the inverse problem into a Bayesian probabilistic framework allows, instead, for a full characterization of all the possible solutions <ref type="bibr">[4]</ref><ref type="bibr">[5]</ref><ref type="bibr">[6]</ref>. The Bayesian approach consists of placing a prior probability distribution on the parameters of interest to describe their uncertainty, and finding the posterior distribution over these parameters <ref type="bibr">[7]</ref>. The prior must be chosen appropriately in order to mitigate the ill-posedness of the problem and facilitate computation of the posterior. By adopting the Bayesian formulation, rather than finding one single solution to the inverse problem (e.g., the maximum a posteriori estimator <ref type="bibr">[8]</ref>), one obtains a distribution of solutions, the posterior, whose samples are consistent with the observed data. 
The posterior distribution can then be sampled to extract statistical information that allows for uncertainty quantification <ref type="bibr">[9]</ref>.</p><p>Over the past few years, deep learning-based methods have been successfully applied to analyze linear inverse problems in a Bayesian fashion. In particular, recently introduced score-based diffusion models (SDMs) <ref type="bibr">[10]</ref> have become increasingly popular, due to their ability to produce approximate samples from the posterior distribution <ref type="bibr">[11,</ref><ref type="bibr">12]</ref>. An SDM consists of a diffusion process, which gradually perturbs the data distribution toward a tractable distribution according to a prescribed stochastic differential equation (SDE) by progressively injecting Gaussian noise, and a generative model, which entails a denoising process defined by approximating the time-reversal of the diffusion. Crucially, the denoising stage is also a diffusion process <ref type="bibr">[13]</ref> whose drift depends on the logarithmic gradients of the noised data densities (the scores), which are estimated by Song et al. <ref type="bibr">[10]</ref> using a neural network. Among the advantages of SDMs over other deep generative models is that they produce high-quality samples, matching the performance of generative adversarial networks <ref type="bibr">[14]</ref>, without suffering from training instabilities and mode collapse <ref type="bibr">[10,</ref><ref type="bibr">15]</ref>. Additionally, SDMs are not restricted to invertible architectures like normalizing flows <ref type="bibr">[16]</ref>, a restriction that often limits the complexity of the distributions that can be learned. 
Finally, and most importantly for the scope of this work, SDMs have demonstrated superior performance in a variety of inverse problems, such as image inpainting <ref type="bibr">[10,</ref><ref type="bibr">17]</ref>, image colorization <ref type="bibr">[10]</ref>, compressed sensing, and medical imaging <ref type="bibr">[12,</ref><ref type="bibr">18]</ref>.</p><p>In the aforementioned cases, SDMs have been applied by assuming that the data distribution of interest is supported on a finite-dimensional vector space. However, in many inverse problems, especially those governed by partial differential equations (PDEs), the unknown parameters to be estimated are functions (e.g., coefficient functions, boundary and initial conditions, or source functions) that exist in a suitable function space, typically an infinite-dimensional Hilbert space. The inverse heat equation or the elliptic inverse source problem presented in <ref type="bibr">[19]</ref> are typical examples of ill-posed inverse problems that are naturally formulated in infinite-dimensional Hilbert spaces. In addition to these PDE-based examples, other interesting cases that are not PDE-based include geometric inverse problems (e.g., determining the Riemannian metric from geodesic information or the background velocity map from travel time information in geophysics <ref type="bibr">[20]</ref>) and inverse problems involving singular integral operators <ref type="bibr">[21]</ref>. A potential solution for all of these problems could be to discretize the input and output functions into finite-dimensional vectors and apply SDMs to sample from the posterior. However, theoretical studies of current diffusion models suggest that performance guarantees do not generalize well to increasing dimensions <ref type="bibr">[22]</ref><ref type="bibr">[23]</ref><ref type="bibr">[24]</ref>. 
This is precisely why Andrew Stuart's guiding principle for studying a Bayesian inverse problem for functions, "avoid discretization until the last possible moment" <ref type="bibr">[5]</ref>, is critical to the use of SDMs.</p><p>Motivated by Stuart's principle, in this work we define a conditional score in the infinite-dimensional setting, a critical step for studying Bayesian inverse problems directly in function spaces through SDMs. In particular, we show that using this newly defined score as a reverse drift of the diffusion process yields a generative stage that samples, under specified conditions, from the correct target conditional distribution. We carry out the analysis by focusing on two cases: the case of a Gaussian prior measure and the case of a general class of priors given as a density with respect to a Gaussian measure. Studying the model for a Gaussian prior measure provides illuminating insight, not only because it yields an analytic formula of the score, but also because it gives a full characterization of SDMs in the infinite-dimensional setting, showing under which conditions we are sampling from the correct target conditional distribution and how fast the reverse SDE converges to it. It also serves as a guide for the analysis in the case of a general class of prior measures. Finally, we present, in Section 6, stylized and large-scale numerical examples that demonstrate the applicability of our SDM. 
Specifically, we show that our SDM model (i) is able to approximate non-Gaussian multi-modal distributions, a challenging task that poses difficulties for many generative models <ref type="bibr">[25]</ref>; (ii) is discretization-invariant, a property that is a consequence of our theoretical and computational framework being built on the infinite-dimensional formulation proposed by Stuart <ref type="bibr">[5]</ref>; and (iii) is applicable to solve large-scale Bayesian inverse problems, which we demonstrate by applying it to a large-scale problem in geophysics, i.e., the linearized wave-equation-based imaging via the Born approximation that involves estimating a 256&#215;256-dimensional unknown parameter.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Related works</head><p>Our work is primarily motivated by Andrew Stuart's comprehensive mathematical theory for studying PDE-governed inverse problems in a Bayesian fashion <ref type="bibr">[5]</ref>. In particular, we are interested in the infinite-dimensional analysis <ref type="bibr">[7,</ref><ref type="bibr">26]</ref>, which emphasizes the importance of analyzing PDE-governed inverse problems directly in function space before discretization.</p><p>Our paper builds upon a rich and ever-expanding body of theoretical and applied works dedicated to SDMs. Song et al. <ref type="bibr">[10]</ref> defined SDMs integrating both score-based (Hyv&#228;rinen <ref type="bibr">[27]</ref>; Song and Ermon <ref type="bibr">[17]</ref>) and diffusion (Sohl-Dickstein et al. <ref type="bibr">[28]</ref>; Ho et al. <ref type="bibr">[29]</ref>) models into a single continuous-time framework based on stochastic differential equations. The generative stage in SDMs is based on a result from Anderson <ref type="bibr">[13]</ref> proving that the denoising process is also a diffusion process whose drift depends on the scores. This result holds only in vector spaces, which explains the difficulty of extending SDMs to more general function spaces. Initially, there were attempts to project the input functions into a finite-dimensional feature space and then apply SDMs (Dupont et al. <ref type="bibr">[30]</ref>; Phillips et al. <ref type="bibr">[31]</ref>). However, these approaches are not discretization-invariant. It is only very recently that SDMs have been directly studied in function spaces, specifically infinite-dimensional Hilbert spaces. Kerrigan et al. <ref type="bibr">[32]</ref> generalized diffusion models to operate directly in function spaces, but they did not consider the time-continuous limit based on SDEs (Song et al. <ref type="bibr">[10]</ref>). Dutordoir et al. 
<ref type="bibr">[33]</ref> proposed a denoising diffusion generative model for performing Bayesian inference of functions. Lim et al. <ref type="bibr">[34]</ref> generalized score matching for trace-class noise corruptions that live in the Hilbert space of the data. However, like Kerrigan et al. <ref type="bibr">[32]</ref> and Dutordoir et al. <ref type="bibr">[33]</ref>, they did not investigate the connection to the forward and backward SDEs as Song et al. <ref type="bibr">[10]</ref> did in finite dimensions. Three recent works, Pidstrigach et al. <ref type="bibr">[24]</ref>, Franzese et al. <ref type="bibr">[35]</ref> and Lim et al. <ref type="bibr">[36]</ref>, finally established this connection for the unconditional setting. In particular, Franzese et al. <ref type="bibr">[35]</ref> used results from infinite-dimensional SDE theory (F&#246;llmer and Wakolbinger <ref type="bibr">[37]</ref>; Millet et al. <ref type="bibr">[38]</ref>) close to Anderson <ref type="bibr">[13]</ref>.</p><p>Among the mentioned works, Pidstrigach et al. <ref type="bibr">[24]</ref> is the closest to ours. We adopt their formalism to establish theoretical guarantees for sampling from the conditional distribution. Another crucial contribution comes from Batzolis et al. <ref type="bibr">[39]</ref>, as we build upon their proof to show that the score can be estimated by using a denoising score matching objective conditioned on the observed data <ref type="bibr">[17,</ref><ref type="bibr">40]</ref>. A key element in Pidstrigach et al. <ref type="bibr">[24]</ref>, emphasized also in our analysis, is obtaining an estimate on the expected square norm of the score that needs to be uniform in time. We explicitly compute the expected square norm of the conditional score in the case of a Gaussian prior measure, which shows that a uniform-in-time estimate is not always possible in the conditional setting. 
This is not surprising, given that the singularity in the conditional score as noise vanishes is a well-known phenomenon in finite dimensions and has been investigated in many works, both from a theoretical and a practical standpoint <ref type="bibr">[41,</ref><ref type="bibr">42]</ref>. In our paper, we provide a set of concrete conditions to be satisfied to ensure a uniform estimate in time for a general class of prior measures in infinite dimensions.</p><p>Pidstrigach et al. <ref type="bibr">[24]</ref> have also proposed a method for performing conditional sampling, building upon the approach introduced by Song et al. <ref type="bibr">[12]</ref> in a finite-dimensional setting. Like our approach, their method can be viewed as a contribution to the literature on likelihood-free, simulation-based inference <ref type="bibr">[43,</ref><ref type="bibr">44]</ref>. Specifically, the algorithm proposed by Pidstrigach et al. <ref type="bibr">[24]</ref> relies on a projection-type approach that incorporates the observed data into the unconditional sampling process via a proximal optimization step to generate intermediate samples consistent with the measurement acquisition process. This allows Pidstrigach et al. <ref type="bibr">[24]</ref> to avoid defining the conditional score<ref type="foot">foot_0</ref>. While their method has been shown to work well with specific inverse problems, such as medical imaging <ref type="bibr">[12]</ref>, it is primarily heuristic, and its computational efficiency varies depending on the specific inverse problem at hand. Notably, their algorithm may require numerous computationally costly forward operator evaluations during posterior sampling. Furthermore, their implementation does not fully exploit the discretization-invariance property achieved by studying the problem in infinite dimensions since they employ a U-Net to parametrize their score, limiting the evaluation of their score function to the training interval. 
The novelty of our work is then twofold. First, we provide theoretically grounded guarantees for an approach that is not heuristic and can be implemented such that it is not constrained to the grid on which we trained our network. As a result, we show that we effectively take advantage of the discretization-invariance property achieved by adopting the infinite-dimensional formulation proposed by Stuart <ref type="bibr">[5]</ref>. Second, we perform discretization-invariant Bayesian inference by learning an amortized version of the conditional score. This is done by making the score function depend on the observations. As a result, provided that we have access to high-quality training data, during sampling we can input any new observation that we wish to condition on directly during simulation of the reverse SDE. In this sense, our method is data-driven, as the information about the forward model is implicitly encoded in the data pairs used to learn the conditional score. This addresses a critical gap in the existing literature, as the other approach using infinite-dimensional SDMs resorts to projections onto the measurement subspace for sampling from the posterior, a method that not only lacks theoretical interpretation but may also yield unsatisfactory performance due to costly forward operator computations. There are well-documented instances in the literature where amortized methods can be a preferred option in Bayesian inverse problems <ref type="bibr">[45]</ref><ref type="bibr">[46]</ref><ref type="bibr">[47]</ref><ref type="bibr">[48]</ref><ref type="bibr">[49]</ref><ref type="bibr">[50]</ref>, as they reduce inference computational costs by incurring an offline initial training cost for a deep neural network that is capable of approximating the posterior for unseen observed data, provided that one has access to a set of data pairs that adequately represent the underlying joint distribution.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Main contributions</head><p>The main contribution of this work is the analysis of conditional SDMs in infinite-dimensional Hilbert spaces. More specifically,</p><p>&#8226; We introduce the conditional score in an infinite-dimensional setting (Section 3).</p><p>&#8226; We provide a comprehensive analysis of the forward-reverse conditional SDE framework in the case of a Gaussian prior measure. We explicitly compute the expected square norm of the conditional score, which shows that a uniform in time estimate is not always possible for the conditional score. We prove that as long as we start from the invariant distribution of the diffusion process, the reverse SDE converges to the target distribution exponentially fast (Section 4).</p><p>&#8226; We provide a set of conditions to be satisfied to ensure a uniform in time estimate for a general class of prior measures that are given as a density with respect to a Gaussian measure. Under these conditions, the conditional score, used as a reverse drift of the diffusion process in SDMs, yields a generative stage that samples from the target conditional distribution (Section 5).</p><p>&#8226; We prove that the conditional score can be estimated via a conditional denoising score matching objective in infinite dimensions (Section 5).</p><p>&#8226; We present examples that validate our approach, offer additional insights, and demonstrate that our method enables large-scale, discretization-invariant Bayesian inference (Section 6).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Background</head><p>Here, we review the definition of unconditional score-based diffusion models (SDMs) in infinite-dimensional Hilbert spaces proposed by Pidstrigach et al. <ref type="bibr">[24]</ref>, as we will adopt the same formalism to define SDMs for conditional settings. We refer to Appendix A for a brief introduction to key tools of probability theory in function spaces.</p><p>Let <formula notation="TeX">\mu_{\mathrm{data}}</formula> be the target measure, supported on a separable Hilbert space <formula notation="TeX">(H, \langle \cdot, \cdot \rangle)</formula>. Consider a forward infinite-dimensional diffusion process <formula notation="TeX">(X_t)_{t \in [0,T]}</formula> for continuous time variable <formula notation="TeX">t \in [0,T]</formula>, where <formula notation="TeX">X_0</formula> is the starting variable and <formula notation="TeX">X_t</formula> its perturbation at time <formula notation="TeX">t</formula>. The diffusion process is defined by the following SDE:</p><p><formula n="1" notation="TeX">dX_t = -\frac{1}{2} X_t \, dt + \sqrt{C} \, dW_t,</formula></p><p>where <formula notation="TeX">C : H \to H</formula> is a fixed trace class, positive-definite, symmetric covariance operator and <formula notation="TeX">W_t</formula> is a Wiener process on <formula notation="TeX">H</formula>. Here and throughout the paper, the initial conditions and the driving Wiener processes in (1) are assumed independent.</p><p>The forward SDE evolves <formula notation="TeX">X_0 \sim \mu_0</formula> towards the Gaussian measure <formula notation="TeX">\mathcal{N}(0, C)</formula> as <formula notation="TeX">t \to \infty</formula>. The goal of score-based diffusion models is to convert the SDE in (1) to a generative model by first sampling <formula notation="TeX">X_T \sim \mathcal{N}(0, C)</formula>, and then running the corresponding reverse-time SDE. In the finite-dimensional case, Song et al. <ref type="bibr">[10]</ref> show that the reverse-time SDE requires the knowledge of the score function <formula notation="TeX">\nabla \log p_t(X_t)</formula>, where <formula notation="TeX">p_t(X_t)</formula> is the density of the marginal distribution of <formula notation="TeX">X_t</formula> (from now on denoted <formula notation="TeX">P_t</formula>) with respect to the Lebesgue measure. In infinite-dimensional Hilbert spaces, there is no natural analogue of the Lebesgue measure (for additional details, see <ref type="bibr">[51]</ref>) and the density is thus no longer well defined. However, Pidstrigach et al. 
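<ref type="bibr">[24, Lemma 1]</ref> notice that, in the finite-dimensional setting where <formula notation="TeX">H = \mathbb{R}^D</formula>, the score can be expressed as follows:</p><p><formula n="2" notation="TeX">C \nabla_x \log p_t(x) = -(1 - e^{-t})^{-1} \left( x - e^{-t/2} \, \mathbb{E}[X_0 \mid X_t = x] \right)</formula></p><p>for <formula notation="TeX">t &gt; 0</formula>. Since the right-hand side of the expression above is also well-defined in infinite dimensions, Pidstrigach et al. <ref type="bibr">[24]</ref> formally define the score as follows:</p><p>Definition 1. In the infinite-dimensional setting, the score or reverse drift is defined by</p><p><formula n="3" notation="TeX">S(t, x) := -(1 - e^{-t})^{-1} \left( x - e^{-t/2} \, \mathbb{E}[X_0 \mid X_t = x] \right).</formula></p><p>Assuming that the expected square norm of the score is uniformly bounded in time, Pidstrigach et al. <ref type="bibr">[24, Theorem 1]</ref> show that the following SDE</p><p><formula n="4" notation="TeX">dZ_t = \left( \frac{1}{2} Z_t + S(T - t, Z_t) \right) dt + \sqrt{C} \, dW_t, \qquad Z_0 \sim P_T,</formula></p><p>is the time-reversal of (<ref type="formula">1</ref>) and the distribution of <formula notation="TeX">Z_T</formula> is thus equal to <formula notation="TeX">\mu_0</formula>, proving that the forward-reverse SDE framework of Song et al. <ref type="bibr">[10]</ref> generalizes to the infinite-dimensional setting. The reverse SDE requires the knowledge of this newly defined score, and one approach for estimating it is, similarly to <ref type="bibr">[10]</ref>, by using the denoising score matching loss <ref type="bibr">[40]</ref></p><p><formula n="5" notation="TeX">\mathbb{E}\left[ \left\| \widetilde{S}(t, X_t) + (1 - e^{-t})^{-1} \left( X_t - e^{-t/2} X_0 \right) \right\|^2 \right],</formula></p><p>where the estimate <formula notation="TeX">\widetilde{S}(t, X_t)</formula> of the score is typically obtained by training a neural network.</p></div>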
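<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough numerical illustration of the denoising score matching loss, the following Python sketch works in a truncated eigenbasis of C. It is a toy under stated assumptions, not the authors' implementation: the spectrum lam_j = j^(-2), the truncation to 8 modes, the time t = 0.5, the sample size, and all function names are hypothetical, and the forward transition law N(exp(-t/2) x_0, (1 - exp(-t)) C) is assumed. For a Gaussian prior equal to the invariant law N(0, C), the exact score is S(t, x) = -x, so its empirical loss should be the smallest among the candidates compared.</p>
<eg><![CDATA[
```python
import math
import random


def forward_sample(x0, t, lam, rng):
    """Sample X_t given X_0 = x0, mode by mode.

    Assumes the transition N(exp(-t/2) * x0_j, (1 - exp(-t)) * lam_j),
    where lam_j are the eigenvalues of the covariance operator C.
    """
    decay = math.exp(-t / 2.0)
    noise_std = math.sqrt(1.0 - math.exp(-t))
    return [decay * a + noise_std * math.sqrt(l) * rng.gauss(0.0, 1.0)
            for a, l in zip(x0, lam)]


def dsm_target(x0, xt, t):
    """Regression target -(1 - exp(-t))^{-1} (x_t - exp(-t/2) x_0)."""
    decay = math.exp(-t / 2.0)
    scale = 1.0 / (1.0 - math.exp(-t))
    return [-scale * (b - decay * a) for a, b in zip(x0, xt)]


def dsm_loss(score_fn, pairs, t):
    """Monte Carlo estimate of E || score_fn(t, X_t) - target ||^2."""
    total = 0.0
    for x0, xt in pairs:
        pred = score_fn(t, xt)
        target = dsm_target(x0, xt, t)
        total += sum((p - g) ** 2 for p, g in zip(pred, target))
    return total / len(pairs)


def make_pairs(lam, t, n_samples, seed=0):
    """Draw (X_0, X_t) pairs with X_0 ~ N(0, C), the invariant law."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        x0 = [math.sqrt(l) * rng.gauss(0.0, 1.0) for l in lam]
        pairs.append((x0, forward_sample(x0, t, lam, rng)))
    return pairs


# Hypothetical trace-class spectrum lam_j = j^{-2}, truncated to 8 modes.
lam = [1.0 / (j * j) for j in range(1, 9)]
t = 0.5
pairs = make_pairs(lam, t, n_samples=5000)

# With X_0 ~ N(0, C) the exact score is -x; compare it to two wrong guesses.
loss_exact = dsm_loss(lambda t, x: [-v for v in x], pairs, t)
loss_half = dsm_loss(lambda t, x: [-0.5 * v for v in x], pairs, t)
loss_zero = dsm_loss(lambda t, x: [0.0 for _ in x], pairs, t)
print(loss_exact, loss_half, loss_zero)
```
]]></eg>
<p>In this toy setting the gap between the loss of the zero guess and the loss of the exact score should be close to the trace of C, which is one way to sanity-check an implementation before replacing the hand-written candidates with a trained network.</p></div>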
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">The conditional score in infinite dimensions</head><p>Analogous to the score function relative to the unconditional SDM in infinite dimensions, we now define the score corresponding to the reverse drift of an SDE when conditioned on observations. We consider a setting where <formula notation="TeX">X_0</formula> is an <formula notation="TeX">H</formula>-valued random variable and <formula notation="TeX">H</formula> is an infinite-dimensional Hilbert space. Denote by</p><p><formula n="6" notation="TeX">Y = A X_0 + B</formula></p><p>a noisy observation given by <formula notation="TeX">n</formula> linear measurements, where the measurement acquisition process is represented by a linear operator <formula notation="TeX">A : H \to \mathbb{R}^n</formula>, and <formula notation="TeX">B \sim \mathcal{N}(0, C_B)</formula> represents the noise, with <formula notation="TeX">C_B</formula> an <formula notation="TeX">n \times n</formula> nonnegative matrix. Within a Bayesian probabilistic framework, solving (6) amounts to putting an appropriately chosen prior probability distribution <formula notation="TeX">\mu_0</formula> on <formula notation="TeX">X_0</formula>, and sampling from the conditional distribution of <formula notation="TeX">X_0</formula> given <formula notation="TeX">Y = y</formula>.</p><p>To the best of our knowledge, the only existing algorithm which performs conditional sampling using infinite-dimensional diffusion models on Hilbert spaces is based on the work of Song et al. <ref type="bibr">[12]</ref>. The idea, adapted to infinite dimensions by Pidstrigach et al. <ref type="bibr">[24]</ref>, is to incorporate the observations into the unconditional sampling process of the SDM via a proximal optimization step to generate intermediate samples that are consistent with the measurement acquisition process. Our method relies instead on utilizing the score of infinite-dimensional SDMs conditioned on observed data, which we introduce in this work. We begin by defining the conditional score, by first noticing that, in finite dimensions, we have the following lemma:</p><p>Lemma 1. In the finite-dimensional setting where <formula notation="TeX">H = \mathbb{R}^D</formula>, we can express the conditional score function for <formula notation="TeX">t &gt; 0</formula> as</p><p><formula n="7" notation="TeX">C \nabla_x \log p_t(x \mid y) = -(1 - e^{-t})^{-1} \left( x - e^{-t/2} \, \mathbb{E}[X_0 \mid Y = y, X_t = x] \right).</formula></p><p>Since the right-hand side of (<ref type="formula">7</ref>) is well-defined in infinite dimensions, by following the same line of thought as Pidstrigach et al. 
<ref type="bibr">[24]</ref> we formally define the score as follows:</p><p>Definition 2. In the infinite-dimensional setting, the conditional score is defined by</p><p><formula n="8" notation="TeX">S(t, x, y) := -(1 - e^{-t})^{-1} \left( x - e^{-t/2} \, \mathbb{E}[X_0 \mid Y = y, X_t = x] \right).</formula></p><p>Remark 1. It is possible to define the conditional score in infinite-dimensional Hilbert spaces by resorting to the results of <ref type="bibr">[37,</ref><ref type="bibr">38]</ref>, see Appendix C.1.</p><p>For Definition 2 to make sense, we need to show that if we use (8) as the drift of the time-reversal of the SDE in (1) conditioned on <formula notation="TeX">y</formula>, then it will sample the correct conditional distribution of <formula notation="TeX">X_0</formula> given <formula notation="TeX">Y = y</formula> in infinite dimensions. In the next sections, we will carry out the analysis by focusing on two cases: the case of a Gaussian prior measure <formula notation="TeX">\mathcal{N}(0, C_\mu)</formula>, and the case where the prior of <formula notation="TeX">X_0</formula> is given as a density with respect to a Gaussian measure, i.e.,</p><p>where <formula notation="TeX">C_\mu</formula> is positive and trace class and <formula notation="TeX">\Phi</formula> is bounded with</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Forward-reverse conditional SDE framework for a Gaussian prior measure</head><p>We begin our analysis of the forward-reverse conditional SDE framework by examining the case where the prior of <formula notation="TeX">X_0</formula> is a Gaussian measure. This case provides illuminating insight, not only because it is possible to get an analytic formula of the score, but also because it offers a full characterization of SDMs in the infinite-dimensional setting, showing under which conditions we are sampling from the correct target conditional distribution and how fast the reverse SDE converges to it. We also show that the conditional score can have a singular behavior at small times when the observations are noiseless, in contrast with the unconditional score under similar hypotheses.</p><p>We assume that <formula notation="TeX">\Phi = 0</formula> in (<ref type="formula">9</ref>). All distributions in play are Gaussian:</p><p>where <ref type="bibr">[52]</ref>, there exist <formula notation="TeX">(\mu_j)</formula> in <formula notation="TeX">[0, +\infty)</formula> and an orthonormal basis <formula notation="TeX">(v_j)</formula> in <formula notation="TeX">H</formula> such that <formula notation="TeX">C_\mu v_j = \mu_j v_j</formula> for all <formula notation="TeX">j</formula>. 
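We consider the infinite-dimensional case with <formula notation="TeX">\mu_j &gt; 0</formula> for all <formula notation="TeX">j</formula>. We assume that <formula notation="TeX">C_\mu</formula> is trace class so that <formula notation="TeX">\sum_j \mu_j &lt; +\infty</formula>. We assume that the functions <formula notation="TeX">v_j</formula> are eigenfunctions of <formula notation="TeX">C</formula> and we denote by <formula notation="TeX">\lambda_j</formula> the corresponding eigenvalues.</p><p>We assume an observational model corresponding to observing a finite-dimensional subspace of <formula notation="TeX">H</formula> spanned by <formula notation="TeX">v_{\eta(1)}, \ldots, v_{\eta(n)}</formula> corresponding to <formula notation="TeX">g_k = v_{\eta(k)}</formula>, <formula notation="TeX">k = 1, \ldots, n</formula>, where <formula notation="TeX">g_j \in H</formula> is such that <formula notation="TeX">(Af)_j = \langle g_j, f \rangle</formula>. We denote <formula notation="TeX">I^{(n)} = \{\eta(1), \ldots, \eta(n)\}</formula>. We assume moreover <formula notation="TeX">C_B = \sigma_B^2 I_n</formula>. Let <formula notation="TeX">Z_t</formula> be the solution of the reverse-time SDE:</p><p><formula n="14" notation="TeX">dZ_t = \left( \frac{1}{2} Z_t + S(T - t, Z_t, y) \right) dt + \sqrt{C} \, dW_t, \qquad Z_0 \sim X_T \mid Y = y.</formula></p><p>We want to show that the reverse SDE we have just formulated in (<ref type="formula">14</ref>) indeed constitutes a reversal of the stochastic dynamics from the forward SDE in (1) conditioned on <formula notation="TeX">y</formula>. To this aim, we will need the following lemma:</p><p>Lemma 2. We define <formula notation="TeX">Z^{(j)} = \langle v_j, Z \rangle</formula>, <formula notation="TeX">p^{(j)} = \lambda_j / \mu_j</formula> for all <formula notation="TeX">j</formula>. We also define <formula notation="TeX">y^{(j)} = y_{\eta(j)}</formula> for <formula notation="TeX">j \in I^{(n)}</formula> and <formula notation="TeX">y^{(j)} = 0</formula> otherwise, and <formula notation="TeX">q^{(j)} = \mu_j / \sigma_B^2</formula> for <formula notation="TeX">j \in I^{(n)}</formula> and <formula notation="TeX">q^{(j)} = 0</formula> otherwise. Then we can write for all <formula notation="TeX">j</formula></p><p><formula notation="TeX">dZ^{(j)}_t = \mu^{(x,j)}(T - t) \, Z^{(j)}_t \, dt + \mu^{(y,j)}(T - t) \, y^{(j)} \, dt + \sqrt{\lambda_j} \, dW^{(j)}_t,</formula></p><p>with <formula notation="TeX">W^{(j)}</formula> independent and identically distributed standard Brownian motions,</p><p><formula notation="TeX">\mu^{(x,j)}(t) = \frac{1}{2} - \frac{e^{t} p^{(j)} (1 + q^{(j)})}{1 + (e^{t} - 1) p^{(j)} (1 + q^{(j)})}, \qquad \mu^{(y,j)}(t) = \frac{e^{t/2} p^{(j)} q^{(j)}}{1 + (e^{t} - 1) p^{(j)} (1 + q^{(j)})}.</formula></p><p>Proof. The proof is a Gaussian calculation. It relies on computing <formula notation="TeX">\langle S, v_j \rangle</formula>, which yields an analytic formula. See Appendix B.</p><p>Lemma 2 enables us to discuss when we are sampling from the correct target conditional distribution of <formula notation="TeX">X_0</formula> given <formula notation="TeX">Y = y</formula>. We can make a few remarks:</p><p>&#8226; In the limit <formula notation="TeX">T \to \infty</formula>, we get <formula notation="TeX">\mu^{(x,j)}(T - t) \to -1/2</formula> and <formula notation="TeX">\mu^{(y,j)}(T - t) \to 0</formula>.</p><p>&#8226; If <formula notation="TeX">j \notin I^{(n)}</formula> then we have the same mode dynamics as in the unconditional case. 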
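Thus we sample from the correct target distribution if <formula notation="TeX">T</formula> is large or if we start from <formula notation="TeX">Z^{(j)}_0 \sim \mathcal{N}(0, \mu_j e^{-T} + \lambda_j (1 - e^{-T}))</formula>.</p><p>&#8226; If <formula notation="TeX">j \in I^{(n)}</formula> and we start from <formula notation="TeX">Z^{(j)}_0 \sim \mathcal{N}(\bar{z}^{(j)}_0, \Sigma^{(j)}_0)</formula>, then <formula notation="TeX">Z^{(j)}_T \sim \mathcal{N}(\bar{z}^{(j)}_T, \Sigma^{(j)}_T)</formula> with</p><p><formula notation="TeX">\bar{z}^{(j)}_T = \frac{\bar{z}^{(j)}_0 e^{T/2}}{1 + (e^{T} - 1) p^{(j)} (1 + q^{(j)})} + \frac{y^{(j)} q^{(j)}}{1 + q^{(j)}} \left( 1 - \frac{1}{1 + (e^{T} - 1) p^{(j)} (1 + q^{(j)})} \right), \qquad \Sigma^{(j)}_T = \frac{\Sigma^{(j)}_0 e^{T}}{\left( 1 + (e^{T} - 1) p^{(j)} (1 + q^{(j)}) \right)^2} + \frac{\mu_j}{1 + q^{(j)}} \left( 1 - \frac{1}{1 + (e^{T} - 1) p^{(j)} (1 + q^{(j)})} \right).</formula></p><p>The distribution of <formula notation="TeX">X^{(j)}_0 = \langle X_0, v_j \rangle</formula> given <formula notation="TeX">Y = y</formula> is <formula notation="TeX">\mathcal{N}(y^{(j)} q^{(j)} / (1 + q^{(j)}), \mu_j / (1 + q^{(j)}))</formula>. As <formula notation="TeX">\bar{z}^{(j)}_T \to y^{(j)} q^{(j)} / (1 + q^{(j)})</formula> and <formula notation="TeX">\Sigma^{(j)}_T \to \mu_j / (1 + q^{(j)})</formula> as <formula notation="TeX">T \to +\infty</formula>, this shows that we sample from the exact target distribution (the one of <formula notation="TeX">X_0</formula> given <formula notation="TeX">Y = y</formula>) for <formula notation="TeX">T</formula> large.</p><p>&#8226; If we start the reverse-time SDE from the correct model</p><p><formula notation="TeX">\bar{z}^{(j)}_0 = e^{-T/2} \frac{y^{(j)} q^{(j)}}{1 + q^{(j)}}, \qquad \Sigma^{(j)}_0 = e^{-T} \frac{\mu_j}{1 + q^{(j)}} + (1 - e^{-T}) \lambda_j,</formula></p><p>then indeed <formula notation="TeX">Z^{(j)}_T \sim \mathcal{N}(y^{(j)} q^{(j)} / (1 + q^{(j)}), \mu_j / (1 + q^{(j)}))</formula>. This shows that, for any <formula notation="TeX">T</formula>, <formula notation="TeX">Z_T</formula> has the same distribution as <formula notation="TeX">X_0</formula> given <formula notation="TeX">Y = y</formula>, which is the exact target distribution. We can show similarly that <formula notation="TeX">Z_{T-t}</formula> has the same distribution as <formula notation="TeX">X_t</formula> given <formula notation="TeX">Y = y</formula> for any <formula notation="TeX">t \in [0, T]</formula>.</p><p>&#8226; In the case that <formula notation="TeX">\sigma_B = 0</formula> so that we observe the mode values perfectly for <formula notation="TeX">j \in I^{(n)}</formula>, then</p><p><formula notation="TeX">\mu^{(x,j)}(t) = \frac{1}{2} - \frac{e^{t}}{e^{t} - 1}, \qquad \mu^{(y,j)}(t) = \frac{e^{t/2}}{e^{t} - 1},</formula></p><p>and indeed <formula notation="TeX">\lim_{t \uparrow T} Z^{(j)}_t = y^{(j)}</formula> a.s. Indeed the <formula notation="TeX">t^{-1}</formula> singularity at the origin drives the process to the origin like in the Brownian bridge.</p><p>Our analysis shows that, as long as we start from the invariant distribution of the diffusion process, we are able to sample from the correct target conditional distribution and that happens exponentially fast. This proves that the score of Definition 2 is the reverse drift of the SDE in (<ref type="formula">14</ref>). Additionally, the analysis shows that the score is uniformly bounded, except when there is no noise in the observations, in which case it blows up near <formula notation="TeX">t = 0</formula>.</p><p>Remark 2. Note that, for <formula notation="TeX">q^{(j)} = 0</formula>, we obtain the unconditional model:</p><p><formula notation="TeX">dZ^{(j)}_t = \mu^{(j)}(T - t) \, Z^{(j)}_t \, dt + \sqrt{\lambda_j} \, dW^{(j)}_t, \qquad \mu^{(j)}(t) = \frac{1}{2} - \frac{e^{t} p^{(j)}}{1 + (e^{t} - 1) p^{(j)}}.</formula></p><p>If <formula notation="TeX">C = C_\mu</formula>, the square expectation of the norm and the Lipschitz constant of the score are uniformly bounded in time:</p><p>Proof. 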
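The proof is a Gaussian calculation given in Appendix B.</p><p>In the unconditional setting, we have <formula notation="TeX">\mathbb{E}[\| S(t, X_t) \|_H^2] = \sum_j \frac{e^{t}}{1 + (e^{t} - 1) p^{(j)}} \frac{\lambda_j^2}{\mu_j}</formula>, which is equal to <formula notation="TeX">\sum_j \lambda_j</formula> when <formula notation="TeX">C = C_\mu</formula>. It is indeed uniformly bounded in time.</p><p>In the conditional and noiseless setting (<formula notation="TeX">\sigma_B = 0</formula>), we have</p><p><formula notation="TeX">\mathbb{E}[\| S(t, X_t, y) \|_H^2 \mid Y = y] = \sum_{j \notin I^{(n)}} \frac{e^{t}}{1 + (e^{t} - 1) p^{(j)}} \frac{\lambda_j^2}{\mu_j} + \sum_{j \in I^{(n)}} \frac{\lambda_j}{1 - e^{-t}},</formula></p><p>which blows up as <formula notation="TeX">1/t</formula> as <formula notation="TeX">t \to 0</formula>. This result shows that the extension of the score-based diffusion models to the conditional setting is not trivial.</p></div>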
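<div xmlns="http://www.tei-c.org/ns/1.0"><p>The mode-wise picture of this section can be checked numerically. The Python sketch below integrates the mean and variance equations of one observed mode of the reverse SDE with a classical Runge-Kutta scheme. It rests on assumptions that should be read as such: the coefficient of Z in the drift is taken to be 1/2 - e^t p (1 + q) / (1 + (e^t - 1) p (1 + q)), which follows from a short Gaussian computation and agrees with the stated limit -1/2; the law of the noised mode given Y = y is taken to be Gaussian with mean exp(-T/2) y q / (1 + q) and variance exp(-T) mu / (1 + q) + (1 - exp(-T)) lam; and the numerical values and function names are hypothetical.</p>
<eg><![CDATA[
```python
import math


def denom(t, p, q):
    """Common denominator 1 + (e^t - 1) p (1 + q) of the drift coefficients."""
    return 1.0 + (math.exp(t) - 1.0) * p * (1.0 + q)


def mu_x(t, p, q):
    """Assumed coefficient of Z in the reverse drift."""
    return 0.5 - math.exp(t) * p * (1.0 + q) / denom(t, p, q)


def mu_y(t, p, q):
    """Coefficient of y in the reverse drift, as stated in Lemma 2."""
    return math.exp(t / 2.0) * p * q / denom(t, p, q)


def reverse_moments(T, lam, mu, sigma_b, y, mean0, var0, steps=2000):
    """Integrate mean and variance of one observed mode of the reverse SDE.

    Applies classical RK4 to
        m'(t) = mu_x(T - t) m + mu_y(T - t) y,
        v'(t) = 2 mu_x(T - t) v + lam,
    on [0, T] and returns (m(T), v(T)).
    """
    p = lam / mu
    q = mu / sigma_b ** 2

    def rhs(t, m, v):
        a = mu_x(T - t, p, q)
        b = mu_y(T - t, p, q)
        return a * m + b * y, 2.0 * a * v + lam

    h = T / steps
    m, v = mean0, var0
    for k in range(steps):
        t = k * h
        k1 = rhs(t, m, v)
        k2 = rhs(t + h / 2, m + h / 2 * k1[0], v + h / 2 * k1[1])
        k3 = rhs(t + h / 2, m + h / 2 * k2[0], v + h / 2 * k2[1])
        k4 = rhs(t + h, m + h * k3[0], v + h * k3[1])
        m += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return m, v


# Hypothetical values for one observed mode.
lam, mu, sigma_b, y, T = 0.5, 2.0, 0.7, 1.3, 3.0
q = mu / sigma_b ** 2
post_mean, post_var = y * q / (1.0 + q), mu / (1.0 + q)

# Start from the assumed law of the noised mode X_T given Y = y.
mean0 = math.exp(-T / 2.0) * post_mean
var0 = math.exp(-T) * post_var + (1.0 - math.exp(-T)) * lam
m_T, v_T = reverse_moments(T, lam, mu, sigma_b, y, mean0, var0)
print(m_T - post_mean, v_T - post_var)

# Start instead from the invariant law N(0, lam) with a long horizon.
m_inf, v_inf = reverse_moments(12.0, lam, mu, sigma_b, y, 0.0, lam)
print(m_inf - post_mean, v_inf - post_var)
```
]]></eg>
<p>Both printed pairs should be close to zero: the first because the reverse dynamics started from the correct conditional law reproduces the posterior moments for any T, the second because the influence of the initial law decays as T grows. Setting sigma_b to zero is deliberately not supported here, since the drift coefficients then become singular as T - t tends to 0.</p></div>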
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Well-posedness for the reverse SDE for a general class of prior measures</head><p>We are now ready to consider the case of a general class of prior measures given as a density with respect to a Gaussian measure. The analysis of this case resembles the one of Pidstrigach et al. <ref type="bibr">[24]</ref> for the unconditional setting. The main challenge is the singularity of the score for small times, a phenomenon that in the Gaussian case was observed in the noiseless setting. In this section we will provide a set of conditions to be satisfied by <formula notation="TeX">\Phi</formula> in (<ref type="formula">9</ref>), so that the conditional score is bounded uniformly in time. The existence of this bound is needed to make sense of the forward-reverse conditional SDE, and to prove the accuracy and stability of the conditional sampling.</p><p>We start the analysis by recalling that, in the infinite-dimensional case, the conditional score is (<ref type="formula">8</ref>). It is easy to get a first estimate:</p><p>The proof follows from Jensen's inequality and the law of total expectation; see Appendix C. Note that (<ref type="formula">23</ref>) is indeed an upper bound of (<ref type="formula">22</ref>) since <formula notation="TeX">\mathrm{Tr}(C) = \sum_j \lambda_j</formula>.</p><p>Note that the bound (<ref type="formula">23</ref>) is also valid for the unconditional score.</p><p>We can observe that the upper bound (23) blows up in the limit of small times. We can make a few comments:</p><p>&#8226; The bound (<ref type="formula">23</ref>) is convenient for positive times, but the use of Jensen's inequality results in a very crude bound for small times. As shown in the previous section, we know that there exists a bound (<ref type="formula">21</ref>) for the unconditional score in the Gaussian case that is uniform in time.</p><p>&#8226; The singular behavior as <formula notation="TeX">1/t</formula> at small time <formula notation="TeX">t</formula> is, however, not artificial. 
Such a behavior is needed in order to drive the state to the deterministic initial condition when there are exact observations. This behavior has been exhibited by (<ref type="formula">20</ref>) and (<ref type="formula">22</ref>) in the Gaussian case when &#963; B = 0. This indicates that the following assumption (<ref type="formula">24</ref>) is not trivial in the conditional setting.</p><p>For Definition 2 to make sense in the more general case where the prior of X 0 is given as a density with respect to a Gaussian measure, we will need to make the following assumption. Assumption 1. For any y &#8712; R n , we have sup</p><p>We are now ready to state the analogous result to Pidstrigach et al. <ref type="bibr">[24,</ref><ref type="bibr">Theorem 1]</ref>. Proposition 2. Under Assumption 1, the solution of the reverse-time SDE</p><p>satisfies Z T &#8764; X 0 |Y = y.</p><p>Proof. Given Assumption 1, the proof follows the same steps as the one given in <ref type="bibr">[24]</ref> for the unconditional score. See Appendix C for the full proof.</p><p>Assumption 1 is satisfied under some appropriate conditions. In the following proposition, we provide a set of conditions that ensure that this assumption holds. It shows that it is possible to get an upper bound in <ref type="bibr">(23)</ref> that is uniform in time provided some additional conditions are fulfilled. Proposition 3. We assume that C &#181; in (9) and C in (1) have the same basis of eigenfunctions (v j ) and we define X (j) t = &#10216;X t , v j &#10217; and S (j) (t, x, y) = &#10216;S(t, x, y), v j &#10217; so that in (1) $S(t, x, y) = \sum_j S^{(j)}(t, x, y) v_j$. We assume an observational model as described in Section 4 and that the p (j) (1 + q (j) ) are uniformly bounded with respect to j and that C is of trace class. We modify the assumption in (9) as follows. 
We assume that 1) the conditional distribution of X 0 given Y = y is absolutely continuous with respect to the Gaussian measure &#181; with a Radon-Nikodym derivative proportional to exp(-&#936;(x 0 , y)); 2) we have $\Psi(x_0, y) = \sum_j \Psi^{(j)}(x_0^{(j)}, y)$, with $x_0^{(j)} = \langle x_0, v_j \rangle$; 3) for $\psi^{(j)}(x^{(j)}, y) = \exp(-\Psi^{(j)}(x^{(j)}, y))$ we have</p><p>where K and L do not depend on j. Then Assumption 1 holds true.</p><p>Proof. The proof is given in Appendix C.</p><p>To use the new score function of Definition 2 for sampling from the posterior, we need a way to estimate it. In other words, we need to define a loss function over which the difference between the true score and a neural network s &#952; (t, x t , y) is minimized in &#952;. A natural choice for the loss function is</p><p>however, it cannot be minimized directly since we do not have access to the ground truth conditional score S(t, x t , y). Therefore, in practice, a different objective has to be used. Batzolis et al. <ref type="bibr">[39]</ref> proved that, in finite dimensions, a denoising score matching loss can be used:</p><p>This expression involves only $\nabla_{x_t} \log p(x_t | x_0)$, which can be computed analytically from the transition kernel of the forward diffusion process, also in infinite dimensions. In the following proposition, we build on the arguments of Batzolis et al. <ref type="bibr">[39]</ref> and provide a proof that the conditional denoising estimator is a consistent estimator of the conditional score in infinite dimensions. Proposition 4. Under Assumption 1, the minimizer in &#952; of</p><p>is the same as the minimizer of</p><p>The same result holds if we add t &#8764; U(0, T) in the expectations.</p><p>Proof. The proof combines some of the arguments of Batzolis et al. <ref type="bibr">[39]</ref> and steps of the proof of Lemma 2 in <ref type="bibr">[24]</ref>, see Appendix C.</p><p>Remark 3. 
A statement of robustness can be written as in <ref type="bibr">[24,</ref><ref type="bibr">Theorem 2]</ref>.</p></div>
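<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the conditional denoising objective concrete, the following is a minimal Monte Carlo sketch at a fixed time t. It is not the authors' implementation. It works in a truncated eigenbasis in which C is diagonal, uses the regression target -(1 - e -t ) -1 (x t - e -t/2 x 0 ) that appears in the proof in Appendix C.6, and treats the score model as an arbitrary callable. The names forward_sample, denoising_target, denoising_loss and zero_score, as well as the toy data, are hypothetical.</p>
<p><![CDATA[
```python
import math
import random

def forward_sample(x0, lam, t, rng):
    """Draw x_t given x_0 from N(e^{-t/2} x_0, (1 - e^{-t}) C), with C diagonal
    with eigenvalues lam in a truncated eigenbasis (an assumption of this sketch)."""
    decay = math.exp(-t / 2.0)
    scale = math.sqrt(-math.expm1(-t))
    return [decay * x + scale * math.sqrt(l) * rng.gauss(0.0, 1.0)
            for x, l in zip(x0, lam)]

def denoising_target(x0, xt, t):
    """Regression target -(1 - e^{-t})^{-1} (x_t - e^{-t/2} x_0), mode by mode."""
    decay = math.exp(-t / 2.0)
    denom = -math.expm1(-t)
    return [-(b - decay * a) / denom for a, b in zip(x0, xt)]

def denoising_loss(score_fn, pairs, lam, t, rng):
    """Monte Carlo estimate of the conditional denoising objective at time t."""
    total = 0.0
    for x0, y in pairs:
        xt = forward_sample(x0, lam, t, rng)
        target = denoising_target(x0, xt, t)
        pred = score_fn(t, xt, y)
        total += sum((u - v) ** 2 for u, v in zip(target, pred))
    return total / len(pairs)

rng = random.Random(0)
lam = [1.0 / j**2 for j in range(1, 9)]  # illustrative spectrum
pairs = [([rng.gauss(0.0, 1.0) for _ in lam], 0.0) for _ in range(100)]  # toy (x0, y) pairs
zero_score = lambda t, xt, y: [0.0 for _ in xt]  # placeholder for a trained network
print(denoising_loss(zero_score, pairs, lam, 0.5, rng))
```
]]></p>
<p>Only joint samples of (X 0 , Y) and the Gaussian transition kernel are needed; the true conditional score never has to be evaluated.</p></div>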
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Numerical experiments</head><p>To put the presented theoretical results into practice, we provide two examples. The first, stylized example aims to showcase (i) the ability of our method to capture nontrivial conditional distributions; and (ii) the discretization-invariance property of the learned conditional SDM. In the second example, we sample from the posterior distribution of a linearized seismic imaging problem in order to demonstrate the applicability of our method to large-scale problems. In both examples, in order to enable learning in function spaces, we parameterize the conditional score using Fourier neural operators <ref type="bibr">[53]</ref>. Details regarding our experiments and implementation<ref type="foot">foot_2</ref> are presented in Appendix D.</p><p>Stylized example. Inspired by Phillips et al. <ref type="bibr">[31]</ref>, we define the target density via the relation  and <ref type="figure">1c</ref> show the predicted samples for grid sizes of 25 and 35, respectively. The marginal conditionals associated with y = -1.0, 0.0, 0.5 are shown in Figures <ref type="figure">1d-1f</ref>, respectively. The gray shaded density in the bottom row of Figure <ref type="figure">1</ref> indicates the ground truth density, and the colored estimated densities correspond to different discretizations of the horizontal axis. Visual inspection of the samples and estimated densities indicates that our approach is indeed discretization-invariant.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Linearized seismic imaging example</head><p>In this experiment, we address the problem of estimating the short-wavelength component of the Earth's subsurface squared-slowness model (i.e., seismic image; cf. Figure <ref type="figure">2a</ref>) given surface measurements and a long-wavelength, smooth squared-slowness model (cf. Figure <ref type="figure">2b</ref>). Following Orozco et al. <ref type="bibr">[54]</ref>, in order to reduce the high dimensionality of surface measurements, we apply the adjoint of the forward operator, the Born scattering operator, to the measurements and use the outcome (cf.</p><p>Figure 2: Seismic imaging and uncertainty quantification. (a) Ground-truth seismic image. (b) Background squared-slowness. (c) Data after applying the adjoint Born operator. (d) Conditional (posterior) mean. (e) Pointwise standard deviation. (f) Absolute error between Figures <ref type="figure">2a</ref> and <ref type="figure">2d</ref>.</p><p>samples to estimate the conditional mean (cf. Figure <ref type="figure">2d</ref>), which corresponds to the minimum-variance estimate <ref type="bibr">[55]</ref>, and the pointwise standard deviation (cf. Figure <ref type="figure">2e</ref>), which we use to quantify the uncertainty. As expected, the pointwise standard deviation highlights areas of high uncertainty, particularly in regions with complex geological structures, such as near intricate reflectors and areas with limited illumination (deep and close to boundaries). We also observe a strong correlation between the pointwise standard deviation and the error in the conditional mean estimate (Figure <ref type="figure">2f</ref>), confirming the accuracy of our Bayesian inference method.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusions</head><p>We introduced a theoretically grounded method that is able to perform conditional sampling in infinite-dimensional Hilbert (function) spaces using score-based diffusion models. This is a foundational step in using diffusion models to perform Bayesian inference. To achieve this, we learned the infinite-dimensional score function, as defined by Pidstrigach et al. <ref type="bibr">[24]</ref>, conditioned on the observed data. Under mild assumptions on the prior, this newly defined score, used as the reverse drift of the diffusion process, yields a generative model that samples from the posterior of a linear inverse problem. In particular, the well-known singularity in the conditional score for small times can be avoided. Building on these results, we presented stylized and large-scale examples that showcase the validity of our method and its discretization-invariance, a property that is a consequence of our theoretical and computational framework being built on infinite-dimensional spaces.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A Probability measures on infinite-dimensional Hilbert spaces</head><p>In this section, we briefly present some fundamental notions related to probability measures on infinite-dimensional spaces, specifically separable Hilbert spaces (H, &#10216;&#183;, &#183;&#10217;). There is abundant literature on the subject. For more details we refer to Stuart <ref type="bibr">[5]</ref>, Pidstrigach et al. <ref type="bibr">[24]</ref>, Kerrigan et al. <ref type="bibr">[32]</ref>, Da Prato <ref type="bibr">[51]</ref> and references therein.</p><p>A.1 Gaussian measures on Hilbert spaces Definition 3. Let (&#8486;, F , P) be a probability space. A measurable function X : &#8486; &#8594; H is called a Gaussian random element (GRE) if for any h &#8712; H, the random variable &#10216;h, X&#10217; has a scalar Gaussian distribution.</p><p>Every GRE X has a mean element m &#8712; H defined by</p><p>and a linear covariance operator C : H &#8594; H defined by</p><p>We write X &#8764; N (m, C) for a GRE in H with mean element m and covariance operator C. It can be shown that the covariance operator of a GRE is trace class, positive-definite and symmetric. Conversely, for any trace class, positive-definite and symmetric linear operator C : H &#8594; H and every m &#8712; H, there exists a GRE with X &#8764; N (m, C). This leads us to the following definition: Definition 4. If X is a GRE, the pushforward of P through X, denoted by P X , is called a Gaussian probability measure on H. We will write P X = N (m, C).</p><p>Let X &#8764; N (m, C). We can make a few remarks: 1) For any h &#8712; H, we have &#10216;h, X&#10217; &#8764; N (&#10216;h, m&#10217;, &#10216;Ch, h&#10217;).</p><p>2) C is compact. By Mercer's theorem <ref type="bibr">[52]</ref> there exist (&#955; j ) and an orthonormal basis of eigenfunctions (v j ) such that &#955; j &#8805; 0 and Cv j = &#955; j v j &#8704;j. 
We consider the infinite-dimensional case in which &#955; j &gt; 0 &#8704;j.</p><p>3) Suppose m = 0 (in which case the Gaussian measure of X is called centered). The expected square norm of X is given by</p><p>$\mathbb{E}[\|X\|_H^2] = \sum_j \lambda_j = \mathrm{Tr}(C)$, which is finite since C is trace class.</p></div>
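<div xmlns="http://www.tei-c.org/ns/1.0"><p>Remarks 2) and 3) suggest a simple way to draw a centered Gaussian random element on a computer: expand it in the eigenbasis (v j ) with independent coefficients of variance &#955; j. The following is a minimal sketch under the assumption of a truncation to finitely many modes. The spectrum &#955; j = j -2 and the function names are illustrative. It compares the empirical mean of the squared norm with the trace of the truncated covariance.</p>
<p><![CDATA[
```python
import math
import random

def sample_gre_coefficients(lam, rng):
    """Coefficients of X ~ N(0, C) in the eigenbasis (v_j): sqrt(lam_j) * xi_j,
    with xi_j independent standard normal variables."""
    return [math.sqrt(l) * rng.gauss(0.0, 1.0) for l in lam]

def mean_square_norm(lam, n_samples, rng):
    """Empirical mean of ||X||^2, computed from the coefficients (Parseval)."""
    total = 0.0
    for _ in range(n_samples):
        total += sum(c * c for c in sample_gre_coefficients(lam, rng))
    return total / n_samples

lam = [1.0 / j**2 for j in range(1, 21)]  # illustrative trace-class spectrum, truncated
rng = random.Random(0)
print(mean_square_norm(lam, 4000, rng), sum(lam))  # the two values should be close
```
]]></p>
<p>If the eigenvalues were not summable, the squared norm would have infinite expectation, which is why the trace-class condition matters here.</p></div>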
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2 Absolutely continuous measures and the Feldman-Hajek theorem</head><p>Here we introduce the notion of absolute continuity for measures. Definition 5. Let &#181; and &#957; be two probability measures on H equipped with its Borel &#963;-algebra B(H). Measure &#181; is absolutely continuous with respect to &#957; (we write &#181; &#8810; &#957;) if &#181;(&#931;) = 0 for all &#931; &#8712; B(H) such that &#957;(&#931;) = 0. Definition 6. If &#181; &#8810; &#957; and &#957; &#8810; &#181; then &#181; and &#957; are said to be equivalent and we write &#181; &#8764; &#957;. If &#181; and &#957; are concentrated on disjoint sets then they are called singular; in this case we write &#181; &#8869; &#957;.</p><p>Another notion that will be used throughout the paper is the Radon-Nikodym derivative. Theorem 1. Let &#181; and &#957; be two measures on (H, B(H)) and &#957; be &#963;-finite. If &#181; &#8810; &#957;, then there exists a &#957;-measurable function f on H such that</p><p>Furthermore, f is unique &#957;-a.e. and is called the Radon-Nikodym derivative of &#181; with respect to &#957;. It is denoted by d&#181;/d&#957;.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C Proofs of Section 5 C.1 Discussion about an alternative approach</head><p>The following lemma is a complementary result related to Remark 1. It shows that we can actually derive the expression of the score from the results contained in Millet et al. <ref type="bibr">[38]</ref>. The result is powerful, but requires the verification of technical conditions.</p><p>Lemma 3. Under the conditions stated in Proposition 3 the score is defined by</p><p>and the time reversed diffusion takes the form in <ref type="bibr">(14)</ref>.</p><p>Proof. Define</p><p>Then</p><p>and where we assume that C is of trace class. This is then an infinite-dimensional system of the type considered in Millet et al. <ref type="bibr">[38]</ref>. We proceed to verify some conditions stated in Millet et al. <ref type="bibr">[38]</ref>: (i) the coefficients of the system (33) satisfy standard growth and Lipschitz continuity conditions (assumptions (H1, H4) satisfied); (ii) the coefficients depend on finitely many coordinates (assumption (H2) satisfied); (iii) the system is time independent and diagonal (assumption (H5) satisfied). Moreover, define x(j) = (x 1 , . . . , x j-1 , x j+1 , . . .); then the law of X (j) t given X(j) t has for t &gt; 0 density p t (x (j) | X(j) t = x(j) , Y = y) with respect to Lebesgue measure and so that for t 0 &gt; 0 and each j:</p><p>Then it follows from Theorems 3.1 and 4.3 in Millet et al. <ref type="bibr">[38]</ref> that the time reversed problem is associated with the well-posed martingale problem defined by the coefficients in <ref type="bibr">(14)</ref> for the score being:</p><p>with the convention that the right hand side is null on the set {p t (x (j) | X(j)</p><p>It then follows for t &gt; 0</p><p>.</p><p>We then get</p><p>$\frac{x_j - e^{-t/2} x_0^{(j)}}{1 - e^{-t}}$.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.2 A preliminary lemma</head><p>The following lemma is the equivalent of <ref type="bibr">[24,</ref><ref type="bibr">Lemma 3]</ref>. It is used in the forthcoming proof of Proposition 2. Lemma 4. In the finite-dimensional setting $x \in \mathbb{R}^D$, we have for any $0 \le s \le t \le T$:</p><p>$$\nabla \log p_{t,y}(x_t) = e^{(t-s)/2}\, \mathbb{E}_y\big[\nabla \log p_{s,y}(X_s) \,\big|\, X_t = x_t\big],$$</p><p>where $\mathbb{E}_y$ is the expectation with respect to the distribution of $X_0$ and $W$ given $Y = y$ and $p_{t,y}$ is the pdf of $X_t$ under this distribution.</p><p>Proof. We can write</p><p>$$p_{t,y}(x_t) = \int_{\mathbb{R}^D} p_{s,y}(x_s)\, p_{t|s,y}(x_t | x_s)\, dx_s,$$</p><p>where $p_{t|s,y}(\cdot | x_s)$ is the pdf of $X_t$ given $Y = y$ and $X_s = x_s$. It is, in fact, equal to the pdf of $X_t$ given $X_s = x_s$, which is the pdf of the multivariate Gaussian distribution with mean $\exp(-(t-s)/2)\, x_s$ and covariance $(1 - \exp(-(t-s)))\, C$. Therefore $p_{t,y}(x_t) = \int_{\mathbb{R}^D} p_{s,y}(x_s)\, p_{t|s}(x_t | x_s)\, dx_s$.</p><p>Since the Gaussian kernel satisfies $\nabla_{x_t} p_{t|s}(x_t | x_s) = -e^{(t-s)/2}\, \nabla_{x_s} p_{t|s}(x_t | x_s)$, an integration by parts in $x_s$ shows that</p><p>$$\nabla p_{t,y}(x_t) = e^{(t-s)/2} \int_{\mathbb{R}^D} dx_s\, \nabla_{x_s} p_{s,y}(x_s)\, p_{t|s}(x_t | x_s),$$</p><p>which gives $\nabla p_{t,y}(x_t) = e^{(t-s)/2} \int_{\mathbb{R}^D} p_{t|s}(x_t | x_s)\, p_{s,y}(x_s)\, \nabla \log p_{s,y}(x_s)\, dx_s$. Using again that $p_{t|s,y}(\cdot | x_s) = p_{t|s}(\cdot | x_s)$ and $p_{t|s,y}(x_t | x_s) = \frac{p_{(s,t),y}(x_s, x_t)}{p_{s,y}(x_s)}$, we get</p><p>$$\nabla p_{t,y}(x_t) = e^{(t-s)/2} \int_{\mathbb{R}^D} p_{t|s,y}(x_t | x_s)\, p_{s,y}(x_s)\, \nabla \log p_{s,y}(x_s)\, dx_s = e^{(t-s)/2} \int_{\mathbb{R}^D} p_{(s,t),y}(x_s, x_t)\, \nabla \log p_{s,y}(x_s)\, dx_s.$$</p><p>Since $\nabla \log p_{t,y}(x_t) = \frac{\nabla p_{t,y}(x_t)}{p_{t,y}(x_t)}$ and $p_{s|t,y}(x_s | x_t) = \frac{p_{(s,t),y}(x_s, x_t)}{p_{t,y}(x_t)}$, we get that</p><p>$$\nabla \log p_{t,y}(x_t) = e^{(t-s)/2} \int_{\mathbb{R}^D} p_{s|t,y}(x_s | x_t)\, \nabla \log p_{s,y}(x_s)\, dx_s = e^{(t-s)/2}\, \mathbb{E}_y\big[\nabla \log p_{s,y}(X_s) \,\big|\, X_t = x_t\big].$$</p></div>
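<div xmlns="http://www.tei-c.org/ns/1.0"><p>The identity of Lemma 4 can be checked in closed form in the simplest case. The following is a sketch under strong simplifying assumptions that are not made in the text: a single mode (D = 1), and a conditional law of X 0 given Y = y that is a centered Gaussian with variance mu, so that all marginals are Gaussian. The covariance C reduces to a scalar lam, and the function names are hypothetical.</p>
<p><![CDATA[
```python
import math

def marginal_variance(t, mu, lam):
    """Variance of X_t when X_0 ~ N(0, mu): e^{-t} mu + (1 - e^{-t}) lam."""
    return math.exp(-t) * mu + (-math.expm1(-t)) * lam

def lemma4_sides(s, t, x_t, mu, lam):
    """Return both sides of the identity of Lemma 4 in the scalar Gaussian case."""
    v_s = marginal_variance(s, mu, lam)
    v_t = marginal_variance(t, mu, lam)
    lhs = -x_t / v_t  # grad log p_t at x_t
    cov_st = math.exp(-(t - s) / 2.0) * v_s  # Cov(X_s, X_t)
    cond_mean_s = cov_st / v_t * x_t  # E[X_s | X_t = x_t] by Gaussian conditioning
    rhs = math.exp((t - s) / 2.0) * (-cond_mean_s / v_s)  # e^{(t-s)/2} E[grad log p_s(X_s) | X_t]
    return lhs, rhs

print(lemma4_sides(0.2, 1.0, 0.7, mu=2.0, lam=0.5))  # the two entries should agree
```
]]></p>
<p>The factor e (t-s)/2 is exactly compensated by the decay of the correlation between X s and X t, which is the reverse-time martingale property used in the proof of Proposition 2.</p></div>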
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.3 Proof of Proposition 2</head><p>The proof adapts the one of <ref type="bibr">[24]</ref> to the conditional setting. The only difference is that the expectation is E y , which affects the distribution of X 0 but not the one of W . Moreover, Lemma 4 shows that the key to the proof (the reverse-time martingale property of the finite-dimensional score) is still valid.</p><p>Here E y is the expectation with respect to the distribution of X 0 and W given Y = y.</p><p>To prove Proposition 2, we are left to show that the solution of the reverse-time SDE</p><p>satisfies Z T &#8764; X 0 |Y = y. We recall that X t is the solution to the SDE</p><p>We first notice that X t is given by the following stochastic convolution:</p><p>For P D the orthogonal projection on the subspace of H spanned by v 1 , . . . , v D (the eigenfunctions of C), X D t = P D (X t ) are solutions to</p><p>where the superscript D : M indicates the projection onto span{v D+1 , . . . , v M }. It holds that</p><p>as D &#8594; &#8734;, where we used Doob's L 2 inequality to bound the stochastic integral. Therefore (X N t ) is a Cauchy sequence and converges to X t in L 2 (P y ). Consequently, the distribution of X N t given Y = y converges to the distribution of X t given Y = y as N &#8594; +&#8734;.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Recall that</p><p>In particular, due to the tower property of the conditional expectations,</p><p>are bounded in L 2 (P y ) and will converge to the limit, E y [S(t, X t , y) | X t ] = S(t, X t , y), by the martingale convergence theorem. We get rid of the projection P D by</p><p>The first term vanishes due to our previous discussion. The second term vanishes since</p><p>We now make use of the fact that &#8711; log p D t,y is a square-integrable martingale in the reverse-time direction by Lemma 4. We therefore get a sequence of continuous L 2 -bounded martingales converging to a stochastic process. Since the space of continuous L 2 -bounded martingales is closed and pointwise convergence translates to uniform convergence, we get that S is an L 2 -bounded martingale, with the convergence of C D &#8711; log p D t,y to S being uniform in time. We have that</p><p>Since all the terms on the left-hand side converge in L 2 , uniformly in t, so does the right-hand side. Using again the closedness of the spaces of martingales and L&#233;vy's characterization of the Wiener process, we find that (C D )W D t converges to</p><p>Therefore, Z t is indeed a solution to (34) and Z T &#8764; X 0 |Y = y. Using uniqueness of the solution we then conclude that this holds for any solution Z t .</p><p>C.4 Proof of (<ref type="formula">23</ref>)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.5 Proof of Proposition 3</head><p>Note that under the assumptions in Proposition 3, with C &#181; and C having the same basis of eigenfunctions and the separability assumption on the Radon-Nikodym derivative for the modes, the system for the modes again diagonalizes. However, in this case the (conditional) distribution for X (j) 0 is non-Gaussian in general and the change of measure with respect to the Gaussian measure is characterized by &#968; (j) (x (j) , y). We let the superscript g denote the Gaussian case with &#968; &#8801; 1, then p (j) (1 + q (j) ) &#8593; &#8734; would happen for instance in a limit of perfect mode observation so that &#963; B &#8595; 0 and thus q (j) &#8593; &#8734;. Indeed, in the limit of small (conditional) target mode variability relative to the diffusion noise parameter, the score drift becomes large for small times to drive the mode to the conditional target distribution. We thus assume here that p (j) (1 + q (j) ) is uniformly bounded with respect to the mode index j and, moreover, that C is of trace class. We then find that Assumption 1 is satisfied with the following bound &#955; j e T K 4 p (j) (1 + q (j) ) + 2&#955; j e T (LK) 2 1 + 3 p (j) (1 + q (j) )(e T -1)</p><p>We remark that, in the case that we do not have a uniform bound on the p (j) (1 + q (j) )'s, it follows from <ref type="bibr">(23)</ref> that the rate of divergence of the expected square norm of the score is at most t -1 as t &#8595; 0 with C of trace class.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.6 Proof of Proposition 4</head><p>We start from (<ref type="formula">30</ref>):</p><p>$$\mathbb{E}_{x_t,y \sim \mathcal{L}(X_t,Y)} \|S(t,x_t,y) - s_\theta(t,x_t,y)\|_H^2 = \mathbb{E}_{x_t,y \sim \mathcal{L}(X_t,Y)} \|S(t,x_t,y)\|_H^2 + \mathbb{E}_{x_t,y \sim \mathcal{L}(X_t,Y)} \|s_\theta(t,x_t,y)\|_H^2 - 2\, \mathbb{E}_{x_t,y \sim \mathcal{L}(X_t,Y)} \langle S(t,x_t,y), s_\theta(t,x_t,y) \rangle.$$</p><p>From Definition 1 we have</p><p>$$\mathbb{E}_{x_t,y \sim \mathcal{L}(X_t,Y)} \langle S(t,x_t,y), s_\theta(t,x_t,y) \rangle = -(1-e^{-t})^{-1}\, \mathbb{E}_{x_t,y \sim \mathcal{L}(X_t,Y)} \big\langle x_t - e^{-t/2}\, \mathbb{E}_{x_0 \sim \mathcal{L}(X_0 | X_t = x_t, Y = y)}[x_0],\ s_\theta(t,x_t,y) \big\rangle$$</p><p>$$= -(1-e^{-t})^{-1}\, \mathbb{E}_{x_t,y \sim \mathcal{L}(X_t,Y)}\, \mathbb{E}_{x_0 \sim \mathcal{L}(X_0 | X_t = x_t, Y = y)} \big\langle x_t - e^{-t/2} x_0,\ s_\theta(t,x_t,y) \big\rangle = -(1-e^{-t})^{-1}\, \mathbb{E}_{(x_0,x_t,y) \sim \mathcal{L}(X_0,X_t,Y)} \big\langle x_t - e^{-t/2} x_0,\ s_\theta(t,x_t,y) \big\rangle.$$</p><p>We obtain that</p><p>$$\mathbb{E}_{x_t,y \sim \mathcal{L}(X_t,Y)} \|S(t,x_t,y) - s_\theta(t,x_t,y)\|_H^2 = B + \mathbb{E}_{(x_0,x_t,y) \sim \mathcal{L}(X_0,X_t,Y)} \big\| -(1-e^{-t})^{-1} (x_t - e^{-t/2} x_0) - s_\theta(t,x_t,y) \big\|_H^2,$$</p><p>with $B = \mathbb{E}_{x_t,y \sim \mathcal{L}(X_t,Y)} \|S(t,x_t,y)\|_H^2 - \mathbb{E}_{(x_0,x_t) \sim \mathcal{L}(X_0,X_t)} \|(1-e^{-t})^{-1} (x_t - e^{-t/2} x_0)\|_H^2$, which does not depend on $\theta$. Since $\mathcal{L}(X_t | X_0 = x_0, Y = y) = \mathcal{L}(X_t | X_0 = x_0)$ we finally get that</p><p>$$\mathbb{E}_{x_t,y \sim \mathcal{L}(X_t,Y)} \|S(t,x_t,y) - s_\theta(t,x_t,y)\|_H^2 = B + \mathbb{E}_{x_0,y \sim \mathcal{L}(X_0,Y),\ x_t \sim \mathcal{L}(X_t | X_0 = x_0)} \big\| -(1-e^{-t})^{-1} (x_t - e^{-t/2} x_0) - s_\theta(t,x_t,y) \big\|_H^2.$$</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D Numerical experiments: details and additional results</head><p>In this section, we provide additional details regarding our numerical experiments. In both experiments, we parameterize the conditional score s &#952; (t, x t , y) using discretization-invariant Fourier neural operators <ref type="bibr">[FNOs;</ref><ref type="bibr">53]</ref>. This parameterization maps input triplets (t, x t , y) to the score conditioned on y at time t. Once trained (by minimizing the objective function in equation (<ref type="formula">28</ref>) with respect to &#952;), we use the FNO as an approximation to the conditional score to sample new realizations of the conditional distribution by simulating the reverse-time SDE in equation <ref type="bibr">(14)</ref>.</p></div>
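<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sampling step described above can be sketched with an Euler-Maruyama discretization. The following is an illustration under stated assumptions and not the released implementation. It assumes that the reverse-time SDE has the form dZ t = (Z t /2 + S(T - t, Z t , y)) dt + C 1/2 dW t, which is the time reversal of a forward process with transition mean e -t/2 x 0 and covariance (1 - e -t )C. It works in a truncated eigenbasis where C is diagonal, and initializes Z 0 from N(0, C) as an approximation of the law of X T. The score is a generic callable standing in for the trained network, and all names are hypothetical.</p>
<p><![CDATA[
```python
import math
import random

def reverse_sde_sample(score_fn, y, lam, T, n_steps, rng):
    """Euler-Maruyama for dZ = (Z/2 + S(T - t, Z, y)) dt + sqrt(C) dW (assumed form),
    with C diagonal with eigenvalues lam."""
    h = T / n_steps
    # Z_0 ~ N(0, C), used here as an approximation of the law of X_T
    z = [math.sqrt(l) * rng.gauss(0.0, 1.0) for l in lam]
    for k in range(n_steps):
        tau = T - k * h  # forward time at which the score is evaluated (always positive)
        s = score_fn(tau, z, y)
        z = [zj + h * (0.5 * zj + sj) + math.sqrt(h * l) * rng.gauss(0.0, 1.0)
             for zj, sj, l in zip(z, s, lam)]
    return z

# Sanity check with a prior N(0, C) and no data: the exact score is then S(t, x, y) = -x.
gaussian_score = lambda t, x, y: [-xj for xj in x]
rng = random.Random(0)
samples = [reverse_sde_sample(gaussian_score, None, [1.0], 1.0, 100, rng)[0]
           for _ in range(2000)]
print(sum(v * v for v in samples) / len(samples))  # should be close to lam = 1.0
```
]]></p>
<p>The score is never evaluated at forward time zero in this scheme, which sidesteps the small-time region where the conditional score may become large.</p></div>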
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D.1 Stylized example</head><p>In this example, the conditional distribution that we approximate is defined using the relation</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>We note that, in a newer version of their paper submitted to arXiv on October 3, 2023 (four months after our submission to arXiv and NeurIPS 2023), Pidstrigach et al. <ref type="bibr">[24]</ref> abandoned the projection-type approach. Instead, they invoke the conditional score function to perform posterior sampling and solve an inverse problem in a similar fashion to ours (we refer to Section 8.3 of their paper for details). However, they still do not address the well-posedness of the forward-reverse conditional SDE and the singularity of the conditional score, and their implementation is based on UNets and, thus, is not discretization-invariant.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2"><p>Code to reproduce results can be found at https://github.com/alisiahkoohi/csgm.</p></note>
		</body>
		</text>
</TEI>
