<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Event-horizon-scale Imaging of M87* under Different Assumptions via Deep Generative Image Priors</title></titleStmt>
			<publicationStmt>
				<publisher>American Astronomical Society</publisher>
				<date>11/01/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10490772</idno>
					<idno type="doi">10.3847/1538-4357/ad737f</idno>
					<title level='j'>The Astrophysical Journal</title>
<idno>0004-637X</idno>
<biblScope unit="volume">975</biblScope>
<biblScope unit="issue">2</biblScope>					

					<author>Berthy T Feng</author><author>Katherine L. Bouman</author><author>William T. Freeman</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<title>Abstract</title> <p>Reconstructing images from the Event Horizon Telescope (EHT) observations of M87*, the supermassive black hole at the center of the galaxy M87, depends on a prior to impose desired image statistics. However, given the impossibility of directly observing black holes, there is no clear choice for a prior. We present a framework for flexibly designing a range of priors, each bringing different biases to the image reconstruction. These priors can be weak (e.g., impose only basic natural-image statistics) or strong (e.g., impose assumptions of black hole structure). Our framework uses Bayesian inference with score-based priors, which are data-driven priors arising from a deep generative model that can learn complicated image distributions. Using our Bayesian imaging approach with sophisticated data-driven priors, we can assess how visual features and uncertainty of reconstructed images change depending on the prior. In addition to simulated data, we image the real EHT M87* data and discuss how recovered features are influenced by the choice of prior.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In 2019, the Event Horizon Telescope (EHT) Collaboration obtained the first picture of M87 * through computational imaging methods (Event Horizon Telescope <ref type="bibr">Collaboration et al. 2019a</ref><ref type="bibr">Collaboration et al. , 2019d, 2019f), 2019f)</ref>. The published images gave humans a glimpse of the shadow cast by the supermassive black hole (J.-P. <ref type="bibr">Luminet 1979;</ref><ref type="bibr">H. Falcke et al. 2000</ref>; R.-S. <ref type="bibr">Lu et al. 2014)</ref> in the galaxy M87 based on data that EHT telescopes had collected in 2017 April (Event Horizon Telescope <ref type="bibr">Collaboration et al. 2019b</ref><ref type="bibr">Collaboration et al. , 2019c))</ref>, and more recent images showed the persistence of the shadow a year later (K. <ref type="bibr">Akiyama et al. 2024</ref>). However, these images necessarily incorporated imaging assumptions that were independent of telescope data. Because measurements obtained from very long baseline interferometry (VLBI; P. H. van Cittert 1934; A. R. <ref type="bibr">Thompson et al. 2017)</ref> with EHT telescopes are corrupted and limited in number, infinitely many imagesmany of them implausible and not interpretable-would agree with a given set of measurements. Therefore, reconstructing an image from VLBI data requires assumptions about plausible image statistics in order to constrain the space of possible images (Event Horizon Telescope <ref type="bibr">Collaboration et al. 2019d)</ref>.</p><p>Imaging assumptions can be formalized as a prior, or a probability distribution of images that are acceptable regardless of observations (J. A. <ref type="bibr">Scales &amp; L. Tenorio 2001)</ref>. More formally, we define a prior as a distribution of images x with a probability density function p(x). In Bayesian inference the prior helps determine the image posterior p(x|y) given observations y. However, designing a prior is not a straightforward task, especially considering the absence of true images of black holes.</p><p>We address this problem with a principled strategy: we collect a range of priors that each impose different visual biases and plug these priors into a Bayesian imaging algorithm along with EHT VLBI data. Whereas the EHT Collaboration explored different imaging assumptions via the use of different imaging pipelines (e.g., CLEAN (J. <ref type="bibr">H&#246;gbom 1974;</ref><ref type="bibr">U. Schwarz 1978;</ref><ref type="bibr">B. Clark 1980;</ref><ref type="bibr">F. Schwab 1984;</ref><ref type="bibr">T. Cornwell et al. 1999;</ref><ref type="bibr">M. Shepherd 2011;</ref><ref type="bibr">H. M&#252;ller &amp; A. Lobanov 2023)</ref> and regularized maximum likelihood (RML) methods (A. A. <ref type="bibr">Chael et al. 2016</ref>; K. L. <ref type="bibr">Bouman et al. 2016;</ref><ref type="bibr">K. Akiyama et al. 2019)</ref>, we explore different priors within the same imaging pipeline. Our imaging approach allows us to assess how visual characteristics and uncertainty, as quantified through a Bayesian posterior, vary with the choice of prior.</p><p>An aim of our method is to easily move along the spectrum between strong constraints and weak constraints on the image. On one side of the spectrum lie model-fitting strategies, which find the parameters of an underlying geometric (Event Horizon Telescope Collaboration et al. 2019f; K. <ref type="bibr">Nalewajko et al. 2020;</ref><ref type="bibr">F. Vincent et al. 2021;</ref><ref type="bibr">H. Sun et al. 2022</ref>; W. Lockhart &amp; S. E. Gralla 2022), physical (K. <ref type="bibr">Gebhardt et al. 2011</ref>; J. L. <ref type="bibr">Walsh et al. 2013;</ref><ref type="bibr">R. Nemmen 2019;</ref><ref type="bibr">Event Horizon Telescope Collaboration et al. 2019e;</ref><ref type="bibr">T. Kawashima et al. 2021;</ref><ref type="bibr">F. Yuan et al. 2022)</ref>, or statistical (L. <ref type="bibr">Medeiros et al. 2023</ref>) model that best match the observations. On the other side lie traditional imaging approaches using weak regularizers like total variation (K. <ref type="bibr">Akiyama et al. 2017;</ref><ref type="bibr">K. Kuramochi et al. 2018</ref>) and maximum entropy (R. <ref type="bibr">Narayan &amp; R. Nityananda 1986)</ref>. However, each side has its own limitations: model fitting prevents discovering new features that cannot be explained by the assumed model, whereas traditional regularizers struggle to produce visually rich images. This motivates a method for imaging under a diverse array of priors, ranging from those akin to model fitting (e.g., by assuming black hole structure) to those similar to weak regularizers (e.g., by assuming basic properties of natural images). One can in principle obtain different priors by changing the regularization function (H. <ref type="bibr">M&#252;ller et al. 2023)</ref> or tuning regularization parameters <ref type="bibr">(Event Horizon Telescope Collaboration et al. 2019d</ref>), but it is impractical to hand-design a regularizer for every desired prior.</p><p>A promising avenue is to use a data-driven prior that is fit to a training set of images with the desired statistics. Parameterizing the data-driven prior to be expressive enough is crucial. For example, principal component analysis offers a simple probabilistic model of images, but it has been shown to not accurately express complicated image distributions (B. T. <ref type="bibr">Feng et al. 2023)</ref>. We choose to parameterize the prior by a scorebased diffusion model (score-based prior), which has been demonstrated as a powerful deep generative model capable of modeling a variety of image probability distributions (J. <ref type="bibr">Ho et al. 2020;</ref><ref type="bibr">P. Dhariwal &amp; A. Nichol 2021;</ref><ref type="bibr">Y. Song et al. 2021b)</ref>, no matter how simple or complicated. That is, given a desired image prior p(x), it can be assumed that the learned prior p &#952; (x) with parameters &#952; trained on samples from p(x) satisfies p &#952; (x) &#8776; p(x) (B. T. <ref type="bibr">Feng et al. 2023)</ref>.</p><p>With score-based priors, we achieve a collection of M87 * images that all fit the observed data but differ in certain visual characteristics. Specifically, we trained a score-based prior on each of the following data sets: CIFAR-10 (A. <ref type="bibr">Krizhevsky &amp; G. Hinton 2009</ref>; generic natural images), general relativistic magnetohydrodynamic (GRMHD) simulations (G. N. <ref type="bibr">Wong et al. 2022)</ref>, radially inefficient accretion flow (RIAF) simulations (A. E. <ref type="bibr">Broderick et al. 2011)</ref>, and CelebA (Z. <ref type="bibr">Liu et al. 2015</ref>; celebrity faces). We use a Bayesian imaging technique to apply each prior to the M87 * observations, resulting in a collection of image posteriors. Each posterior is a probability distribution of images conditioned on the M87 * data but incorporating a different prior. The visual biases of images from different posteriors are strikingly different, yet the images share structure that is prior invariant, such as the ring shape and brightness asymmetry. We thus present two contributions based on our results: (1) a collection of possible M87 * images that humans can selectively interpret based on their trust of the assumed biases, and (2) analysis of which extracted black hole features are robust to the prior and can be reliably used in scientific analysis.</p><p>In this manuscript, we first describe relevant background in Section 2 and our Bayesian imaging method involving scorebased priors in Section 3. In Section 4, we validate the imaging approach on simulated data using a collection of score-based priors ranging from weak biases (e.g., a prior trained on generic natural images) to strong biases (e.g., a prior trained on RIAF images). In Section 5, we present image posteriors of M87 * based on the same collection of priors. Next, in Section 6, we analyze the influence of the prior on characteristic ring features, including diameter, width, and orientation, by performing tests on both the simulated-data and M87 * images. Finally, in Section 7, we summarize our findings and conclude that our proposed imaging strategy allows one to define any collection of priors and analyze their effect on reconstructed images.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Bayesian Imaging</head><p>Given measurements y = f(x * ), where f is a known forward model and x * is an unknown source image, we would like to recover an image x such that f(x) &#8776; y. However, when y is sparse and corrupted, there is inherent uncertainty in the inverse problem of finding x (J. A. <ref type="bibr">Scales &amp; L. Tenorio 2001)</ref>. Bayesian imaging accounts for this uncertainty by modeling a probability distribution known as the posterior, or p(x|y). According to Bayes's rule, the log probability of an image under the posterior is given by</p><p>We refer to p(y|x) as the measurement likelihood, or simply likelihood, and we refer to p(x) as the prior. In this work, the posterior is based on a forward model for interferometric data and a score-based prior.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Score-based Priors</head><p>A score-based diffusion model is a deep generative model that learns to sample from an image distribution (J. Sohl-Dickstein et al. 2015; J. <ref type="bibr">Ho et al. 2020;</ref><ref type="bibr">Y. Song et al. 2021b</ref>). We refer to its generative image distribution as a score-based prior and denote it as p &#952; , where &#952; are the learned parameters. Given training images from a target prior p data , the diffusion model is trained so that p &#952; &#8776; p data .</p><p>Many methods have been developed to solve inverse problems with a pretrained diffusion model (A. <ref type="bibr">Jalal et al. 2021;</ref><ref type="bibr">J. Choi et al. 2021;</ref><ref type="bibr">A. Adam et al. 2022;</ref><ref type="bibr">A. Graikos et al. 2022;</ref><ref type="bibr">B. Kawar et al. 2022;</ref><ref type="bibr">H. Chung &amp; J. C. Ye 2022;</ref><ref type="bibr">H. Chung et al. 2022a</ref><ref type="bibr">H. Chung et al. , 2022b</ref><ref type="bibr">H. Chung et al. , 2023;;</ref><ref type="bibr">Y. Song et al. 2022;</ref><ref type="bibr">J. Song et al. 2023;</ref><ref type="bibr">M. Mardani et al. 2023;</ref><ref type="bibr">N. Dia et al. 2023</ref>). However, these methods typically produce a conditional distribution that does not correspond to an exact posterior. Since we instead prioritize posterior estimation, we turn to a more accurate approach that approximates the posterior through variational inference with a stand-alone score-based prior (B. T. Feng &amp; K. L. Bouman 2023; B. T. <ref type="bibr">Feng et al. 2023)</ref>.</p><p>The crucial benefit of a score-based diffusion model is that it allows for computing image probabilities under p &#952; (Y. <ref type="bibr">Song et al. 2021b)</ref>. That is, given any image x, we can compute ( ) q x p log through an analytical, differentiable formula. To sample from a posterior whose prior is a score-based prior, we can use any posterior sampling approach that requires the value or gradient of the posterior log density. While (&#8226;) q p log has been used for posterior sampling (B. T. <ref type="bibr">Feng et al. 2023)</ref>, the function is slow to compute and only feasible for images with at most 32 &#215; 32 pixels. We therefore appeal to a recently proposed surrogate score-based prior that is more computationally efficient (B. T.</p><p>Feng &amp; K. L. Bouman 2023). The surrogate score-based prior is based on the evidence lower bound (ELBO) b &#952; ( &#8226; ) of a score-based prior. Instead of evaluating ( ) q x p log , we evaluate ( ) ( ) q q &#61572; x x b p log as the surrogate log density. Please refer to Appendix C for details about score-based diffusion models, including the formulae for (&#8226;) station-dependent phase errors (Event Horizon Telescope Collaboration et al. 2019c). To overcome station-dependent errors, we use robust data products known as closure quantities. A closure phase (R. Jennison 1958) is given by a triplet of telescopes i, j, k and computed as ( <ref type="bibr">Twiss et al. 1960</ref>) is given by a combination of four telescopes i, j, k, &#8467; and computed as</p><p>. We denote the set of all linearly independent observed closure phases as &#206; &#61522; y N cp cp and that of log closure amplitudes as &#206; &#61522; y N logca logca . In the case of visibilities with a high signalto-noise ratio (i.e., SNR &gt; 1), closure phases and log closure amplitudes approximately experience mean-zero Gaussian thermal noise (A. E. <ref type="bibr">Rogers et al. 1995;</ref><ref type="bibr">A. E. Broderick &amp; D. W. Pesce 2020;</ref><ref type="bibr">L. Blackburn et al. 2020</ref>) with standard deviations s &#206; &#61522; N cp cp and s &#206; &#61522; N logca logca , respectively. We assume the high-SNR setting. Conditioned on an image x, the measurement distribution can be modeled as Gaussian with log likelihoods</p><p>where f cp and f logca are nonlinear forward models. We note that the Gaussian noise is not purely (statistically)</p><p>independent, but it is approximately independent under high-SNR visibilities or can be made independent under a linear transformation (A. E. Broderick &amp; D. W. Pesce 2020; L. Blackburn et al. 2020; P. Arras et al. 2022). Please refer to Appendix D for details about interferometric data products and their forward models. It is possible to use the same imaging algorithm with visibility amplitudes instead of log closure amplitudes. Visibility amplitudes, which have been used for other imaging results (Event Horizon Telescope Collaboration et al. 2019d; L. <ref type="bibr">Medeiros et al. 2023)</ref>, are more constraining than closure amplitudes, but they require calibration according to assumptions such as station-dependent systematic noise. In this work, we focus on using log closure amplitudes in order to avoid tuning the calibration assumptions. The original M87 * work includes reconstructions from both types of data products for reference <ref type="bibr">(Event Horizon Telescope Collaboration et al. 2019d)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Method</head><p>Here we describe how to sample from an image posterior given EHT measurements and a score-based prior. The method is that of B. T. Feng &amp; K. L. <ref type="bibr">Bouman (2023)</ref> with a measurement likelihood based on closure quantities (Equation ( <ref type="formula">2</ref>)). We formulate the following posterior log density:</p><p>where p &#952; (x|y) is given a &#952; subscript to clarify its dependence on the score-based prior p &#952; . We include a flux constraint objective</p><p>V 2 , where V(x) is the total flux (i.e., the sum of the pixel values) of the image x and V is the target total flux, because closure quantities do not constrain the total flux. We set V as the median total flux of images sampled from the score-based prior p &#952; and then scale posterior images to the original total flux as measured in the zero-baseline visibility. Please see Appendix A for a discussion on the flux constraint objective. Note that since our priors were trained on compact centered images, we do not need an explicit center-of-light constraint.</p><p>We use variational inference to sample from p &#952; (x|y), approximating the complicated posterior with a tractable distribution known as the variational distribution. We use a RealNVP (L. <ref type="bibr">Dinh et al. 2016)</ref>, a type of deep generative model known as a normalizing flow (G. <ref type="bibr">Papamakarios et al. 2021)</ref>, with parameters f as the variational distribution (H. Sun &amp; K. L. <ref type="bibr">Bouman 2021)</ref>. Samples from the RealNVP are images x from a distribution q f , and we optimize f to minimize an upper bound on the Kullback-Leibler (KL) divergence D KL (q f &#8741;p &#952; ( &#8226; |y)):</p><p>where ( ) ( ) We approximately solve Equation (4) with stochastic gradient descent, iteratively Monte Carlo approximating the KL upper bound with a batch of samples from q f and computing the gradient with respect to f. We find that annealing the weight of the data-fit terms gradually from 0 to 1 helps prevent bad local minima (see Appendix A.2 for a discussion on data annealing). Furthermore, we find that optimization can be sensitive to the chosen target flux V and data annealing schedule. One way to mitigate this is to make sure the diffusion model has a median total flux that is close to the median total flux of the training images. The data annealing may need to be tuned to achieve a local minimum at which the posterior images exhibit characteristic features of the prior (e.g., posterior images should be centered if all the training images for the prior are centered). Once f is optimized, samples x &#8764; q f can be efficiently obtained as</p><p>x q cp logca 2</p><p>samples from an approximate posterior. The RealNVP occasionally outputs slightly negative pixel values, so we clip samples at inference time to a minimum value of 0 to impose a positivity constraint. Please refer to Appendix F for implementation details. Figure <ref type="figure">1</ref> illustrates the essential components of our imaging method: the RealNVP variational posterior and the score-based prior.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Score-based Priors Used in This Work</head><p>In this work, we focus on the following score-based priors, each trained on a data set of images assuming a 128 &#956;as field of view (FOV). Unless stated otherwise, we use the terms "prior" and "score-based prior" synonymously. Figure <ref type="figure">2</ref> shows samples from each prior.  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>=</head><p>f f q q i are evaluated. The average gradient is computed with respect to f to update f ( i) . In other words, q f is optimized to produce samples that have high probability under both measurement likelihood and prior. Panel (b) zooms in to the score-based prior. A score-based prior is based on a score-based diffusion model, a deep generative model with parameters &#952;, that is trained on images from a target prior. Once trained, the diffusion model generates samples from a generative image distribution p &#952; . There is an analytical formula for computing the ELBO b &#952; (x) of the log probability ( ) q x p log for any image x, even for out-of-distribution images and images of pure noise. zoomed between -16.7% (zoomed in) and +14.5% (zoomed out), thus varying the diameter of the thin ring to be between 35 and 48 &#956;as. 3. The RIAF prior was trained on 9070 images of RIAF (A. E. <ref type="bibr">Broderick et al. 2011)</ref> simulations. The RIAF images were downloaded<ref type="foot">foot_1</ref> with all available spin and inclination parameters and resized to 32 &#215; 32. During training, they were also randomly zoomed between -16.7% and +14.5%, randomly flipped horizontally, and randomly rotated between -2&#960; and +2&#960;. 4. The CelebA prior was trained on the CelebA (Z. <ref type="bibr">Liu et al. 2015)</ref> data set of celebrity faces. We used a training set of 160K images that were resized to 32 &#215; 32. The same tapering effect that was used for CIFAR-10 was applied.</p><p>Although far from a reasonable prior for astronomical images, a prior trained on faces helps us see what happens when strong but probably incorrect assumptions are made.</p><p>The RIAF and GRMHD priors incorporate strong assumptions about a ring structure. The CIFAR-10 and CelebA priors, on the other hand, do not assume any ring structure or even the presence of an astronomical object. One might make the following order of priors from weak to strong assumptions: CIFAR-10, GRMHD, RIAF. In addition, we have the CelebA prior, which makes specific assumptions against our expectations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Simulated-data Results</head><p>We validate our imaging approach on simulated observations of synthetic source images. A crucial advantage of our approach is that it does not involve parameter tuning. As we do not need to hand-tune hyperparameters based on a calibration data set, the experiments presented in this section are simply meant to verify the efficacy of the approach.</p><p>Figure <ref type="figure">3</ref> shows results for a data set of eight source images. Among the images are validation images used in the original M87 * imaging work (Event Horizon Telescope Collaboration et al. 2019d) and two images of an elliptical object used in the Sgr A * imaging work (E. H. T. <ref type="bibr">Collaboration et al. 2022)</ref>. All observations were simulated based on the April 6 observing array using code provided by Event Horizon Telescope <ref type="bibr">Collaboration et al. (2019d)</ref>. We used closure phases and log closure amplitudes of the combined low-band and high-band data and followed the same preprocessing steps as the ehtimaging algorithm (assuming nonclosing fractional systematic noise of 0.03), except we did not add station-dependent systematic noise since we do not need to calibrate the visibility amplitudes. Although imaging was done with a priordependent total flux and either 32 &#215; 32 pixels or 64 &#215; 64 pixels depending on the prior, we rescale images to have a total flux of 0.6 Jy and resize them to 128 &#215; 128 for visualization.</p><p>The quality of image reconstruction heavily depends on the prior. For example, the GRMHD reconstruction of GRMHD 1 appears more convincing than the GRMHD reconstruction of the Double image in Figure <ref type="figure">3</ref>. On the other hand, the RML methods used in previous EHT imaging efforts <ref type="bibr">(Event Horizon Telescope Collaboration et al. 2019d;</ref><ref type="bibr">E. H. T. Collaboration et al. 2022</ref>) achieve overall cleaner reconstructions of synthetic data. One reason for the better performance of those RML methods is that they use regularization parameters chosen based on a calibration data set that is very similar to their test images. In contrast, we consider priors that may be profoundly different from the true source image (e.g., CelebA prior applied to the Double data). Another reason is that RML methods produce a mean image that tends to be cleaner than individual posterior samples, which are shown in Figure <ref type="figure">3</ref>. We emphasize that the goal of our work is not to achieve the cleanest or most accurate reconstructions; rather, we aim to showcase the effects of different priors, even when those priors might not lead to the best reconstruction owing to mismatch with the data.</p><p>Overall, the reconstructed images make sense according to the biases of the prior. The CelebA prior introduces face-like features, especially when the ground-truth source object is fairly "flat" (e.g., the Disk and Elliptical images), and it is the only prior that leads to multimodal estimated posteriors. Images under the RIAF prior are always centered and ring-or disklike. The GRMHD prior always prefers the presence of a thin ring at the center of the image. The CIFAR-10 prior imposes weak biases and appears to assemble images from small, locally smooth patches.</p><p>Table <ref type="table">1</ref> quantifies the performance of the various priors on each source image. As in previous work <ref type="bibr">(Event Horizon Telescope Collaboration et al. 2019d;</ref><ref type="bibr">E. H. T. Collaboration et al. 2022)</ref>, we evaluate the normalized cross-correlation (NCC). Since our approach does not explicitly constrain the center of light, we use a shift-invariant NCC metric, which is computed as the maximum NCC between all shifted versions of the reconstructed image and the ground-truth image. As expected, the closer the prior is to the ground-truth image, the more accurately it recovers the ground-truth image. For example, the GRMHD prior excels at recovering the Ring, Crescent, and GRMHD images but struggles with the nonringlike images. The RIAF prior performs well on ringlike images and disklike images. The CelebA prior, unsurprisingly, performs poorly on this data set of images. The CIFAR-10 prior does best compared to the other priors at recovering the non-ringlike images (e.g., Double and Point+Elliptical). It performs generally well across all the source images, suggesting that it serves as an effective "general-purpose" prior.</p><p>Table <ref type="table">2</ref> quantifies agreement with the simulated data using the reduced &#967; 2 metric-we note that it is not a true reduced &#967; 2 since we only incorporate image pixels as degrees of freedom, but it is useful as a proxy metric of data consistency. There does not appear to be a correlation between the &#967; 2 statistics in Table <ref type="table">2</ref> and the NCC statistics in Table <ref type="table">1</ref>. The results in Table <ref type="table">2</ref> simply confirm data consistency of the reconstructed images, as &#967; 2 values are consistently less than 2 and often close to 1 (lower &#967; 2 corresponds to more data consistency, and &#967; 2 &#8776; 1 is considered a sign of a good balance between data and prior). The RIAF prior results in the highest &#967; 2 values, perhaps because it is the most constraining prior. Overall, our tests on simulated data confirm that the score-based priors impose the expected biases on the image reconstruction while allowing for reasonable data consistency.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Characterizing the Simulated-data Posteriors</head><p>In addition to evaluating single samples from the posterior (Figure <ref type="figure">3</ref>), we can assess aspects of the posterior distribution such as uncertainty and multimodality. Figure <ref type="figure">4</ref> shows the mean and pixel-wise standard deviation of posteriors under the CIFAR-10, GRMHD, and RIAF priors. We find that uncertainty decreases as the prior becomes stronger. For a weak prior like CIFAR-10, which leads to high posterior uncertainty, it can be helpful to consider the mean reconstruction instead of noisier individual samples.</p><p>The CelebA prior leads to bimodal posteriors, which are characterized in Figure <ref type="figure">5</ref>. We note that the number of modes in the estimated posterior may be due to the variational family being used; in the future, a more expressive family of distributions parameterized through a better network architecture may identify more modes. It is perhaps reassuring that a prior with such erroneous assumptions (i.e., that face statistics well describe these particular source images) is able to account for mismatches with the data through a wider posterior. For example, for the Ring, Double, and GRMHD 1 data, the posterior covers two modes, one of which accurately recovers the ground-truth image.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Biases of the CIFAR-10 Prior</head><p>While CIFAR-10 represents a "generic" natural-image prior, the data set itself still contains biases. The CIFAR-10 data set comprises upright images of animals and man-made objects, which tend to exhibit horizontal or vertical lines. As Figure <ref type="figure">6(a)</ref> shows, the average log power spectrum of CIFAR-10 images Qualitatively, the CIFAR-10 prior adds the least amount of bias, producing reasonable reconstructions of each image in this data set. The GRMHD prior strongly prefers a centered ring in the image. The RIAF prior prefers a centered ring-or disklike structure in the image. The CelebA prior struggles to recover these source images, in some cases adding face features, and it leads to the most multimodal posteriors. However, it performs decently well on certain images like the Crescent and GRMHD images. When the source image is known to be well approximated by a GRMHD or RIAF model, the more constrained GRMHD or RIAF prior may be the best choice.</p><p>has most power in the purely horizontal or vertical spatial frequencies. This preference for vertically or horizontally oriented edges results in images of objects that look somewhat rectangular instead of circular, even given measurements of a ring structure. See, for example, the CIFAR-10 reconstructions from the Crescent and GRMHD 1 data in Figures <ref type="figure">3</ref> and <ref type="figure">4</ref> or the April 10 and 11 CIFAR-10 reconstructions of M87 * in the following section. In Figure <ref type="figure">6</ref>, we demonstrate on the Crescent data how boxy artifacts can be mitigated with a prior trained on warped CIFAR-10 images or a prior trained on images with a 1/f 2 spectral distribution. By distorting CIFAR-10 images with warped random affine transforms, we expect to reduce the presence of straight lines by perturbing them to have more curvature. Alternatively, by randomly sampling from a 1/f 2 spectral distribution, we create a data set of images that follow a simplified statistical model without any preference for straight lines.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">M87 * Results</head><p>We estimated image posteriors of M87 * given the EHT observations from 2017 April. For each of the four observation days, we gathered the closure phases and log closure amplitudes from the combined low-band and high-band public data. <ref type="foot">5</ref> We used the same data preprocessing and visualization steps as for the synthetic data. Note. The avg. &#177; std. dev. of the NCC metric for 128 samples from the posterior is reported (highest NCC in each row is shown in bold). In general, the closer the prior is to the ground-truth image, the closer its posterior samples are to the ground truth. For example, the Ring, Crescent, and GRMHD images are best reconstructed with the GRMHD prior, whereas non-ringlike images are poorly reconstructed with the GRMHD prior. The CIFAR-10 prior may be the best "general-purpose" prior, giving NCC values between about 0.80 and 0.95 across these images. Note. &#967; cp and &#967; logca are the &#967; 2 metrics for closure phases and log closure amplitudes, respectively. The avg. &#177; std. dev. of 128 samples from the estimated posterior is reported. Lower &#967; 2 is a sign of higher data consistency. &#967; 2 &#8776; 1 is considered an indication of a good balance between the observed data and the prior. The Disk, Elliptical, and Point+Elliptical images are the most challenging cases for these particular priors, as evidenced by the high &#967; 2 values that indicate data-fitting difficulty.</p><p>Figure <ref type="figure">7</ref> shows a posterior sample for each of the observation days, under each prior. The CIFAR-10 and CelebA priors, which incorporate no assumptions about black holes, lead to images with a ringlike structure. The CelebA reconstructions include some face-like details, especially for April 10, the day with the fewest observations and thus the day whose image is least constrained by measurements. Nonetheless, the fact that both the black-hole-agnostic priors recover ring shapes is strong evidence of a ringlike structure in the measurements. It also reveals that basic natural-image statistics shared by CIFAR-10 and CelebA may be sufficient to retrieve the ring structure under a constrained FOV.</p><p>The GRMHD and RIAF priors lead to images with a welldefined ring structure. The GRMHD prior is visually richer, encouraging a thin ring with wispy features in the bright region of the ring. The RIAF prior constrains the image according to a simplified geometric model without adding any lower-level details.</p><p>Table <ref type="table">3</ref>, which reports reduced &#967; 2 values, confirms that all the reconstructed images are consistent with the EHT measurements. Across priors and observation days, the &#967; 2 values are &lt;1.5, which is considered a good sign of fitting the measured data. As with the simulated data, the RIAF prior struggles most to fit the data (most of its &#967; 2 values are greater than 1) because it imposes the most constraining black hole model. The rest of the priors tend to result in &#967; 2 &lt; 1, which means that they are flexible enough to somewhat overfit the data and that any differences between their posteriors are likely due to the visual biases of the priors that are not constrained by the data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Characterizing the Posteriors</head><p>Our imaging approach leads to single-mode M87 * image posteriors, except for some CelebA posteriors that are bimodal. Figure <ref type="figure">8</ref> shows Gaussian fits of the posteriors estimated under the CIFAR-10, GRMHD, and RIAF priors. Under all these  priors, the mean image shows a clear progression of the bright spot of the ring moving counterclockwise over the four observation days. The pixel-wise standard deviation shows areas of uncertainty around the mean. Similarly to our results with simulated data, the results in Figure <ref type="figure">8</ref> show that as the prior becomes stronger (i.e., from CIFAR-10 to GRMHD to Figure <ref type="figure">6</ref>. Reducing the CIFAR-10 bias toward horizontal lines and vertical lines. For example, given simulated data of the Crescent image from Figure <ref type="figure">3</ref>, this bias results in posterior images containing a boxy object, even though the actual Crescent is a ring without any straight edges. This is likely due to the statistics of the CIFAR-10 data set, which contains mostly man-made objects and animals. Such images include many sharp corners and lines arising from boxy objects like cars, the legs of standing animals, or horizon lines. Although CIFAR-10 is our choice of a generic natural-image prior, it is possible to define an alternative natural-image prior with less preference for vertically or horizontally oriented edges. We tested a prior trained on CIFAR-10 images that underwent warped random affine transforms and a prior trained on 32 &#215; 32 images with power spectral density proportional to 1/f 2 (since the power spectra of natural images have been shown to follow a 1/f &#945; trend, where f is the spatial frequency in cycles per image (A. van der Schaaf &amp; J. van Hateren 1996), and &#945; is typically between 1 and 3). The CIFAR-10, Warped CIFAR-10, and 1/f 2 priors were all trained with the same number of training images (45K), with the same tapering effect described in Section 3.1, and for the same number of iterations (100K). (a) Nine training samples and the average log power spectrum of 10K training samples (without the taper) for each prior. The average spectral power of CIFAR-10 images is relatively high in the horizontal and vertical frequencies. Warping the CIFAR-10 images seems to reduce the preference for vertical frequencies, but a large presence of horizontal frequencies remains (perhaps in part because horizons are still distinguishable after warping, as can be seen in some of the training samples). The 1/f 2 noise images have isotropically distributed spectral power. (b) Results of imaging the Crescent simulated data under the different priors. A posterior sample and the average of 128 posterior samples are shown for each prior. With the regular CIFAR-10 prior, there is a sharp corner at the top right of the ringlike object that makes it look squarish. This artifact is not present in the Warped CIFAR-10 reconstructions, which more resemble a smooth ring. The 1/f 2 prior, which does not have a preference for any frequency orientation, also recovers an object that resembles a smooth ring. Note. &#967; cp and &#967; logca are the &#967; 2 metrics for closure phases and log closure amplitudes, respectively. The avg. &#177; std. dev. of 128 samples from the estimated posterior is reported. &#967; 2 &#8776; 1 indicates a good balance between fitting the observed data and fitting the prior. Lower &#967; 2 means more data consistency. The RIAF prior leads to the highest &#967; 2 values, meaning that the strength of this prior relative to the data is strongest. This is probably because the RIAF model is the most constraining of the priors.</p><p>RIAF), the uncertainty goes down. Under the CIFAR-10 and GRMHD priors there is uncertainty throughout the ring, whereas under the RIAF prior uncertainty lies mainly along the edges of the ring. Figure <ref type="figure">9</ref> visualizes the bimodal posteriors under the CelebA prior. Like with CIFAR-10, the magnitude of the uncertainty is higher than that of the GRMHD and RIAF priors. The presence of multiple modes further reflects higher uncertainty under the CelebA prior, which makes sense for a prior that is so mismatched with the observed data.</p><p>A feature that stands out in the CIFAR-10, GRMHD, and CelebA reconstructions is a southwest region of extended flux outside the ring. It is especially noticeable in the April 5 and 6 images. See, for example, the faint spot of brightness in the CIFAR-10 reconstructions and the faint wisp to the southwest in the GRMHD reconstructions in Figures <ref type="figure">7</ref> and <ref type="figure">8</ref>. A disconnected southwest region of brightness also appears in both modes of the CelebA posterior on April 5 and in the second mode of the April 6 posterior in Figure <ref type="figure">9</ref>. Such a feature is not visible in the RIAF reconstructions. With previous imaging results that only incorporated one prior, it would have been difficult to conclude whether this feature was an artifact of imaging or a clue from the data. Our results in Figures <ref type="figure">8</ref> and <ref type="figure">9</ref> suggest that it is a prior-dependent feature, with different priors placing different amounts of brightness in that southwest region.</p><p>To summarize our findings from the estimated M87 * posteriors, the most notable result is that all priors recover ringlike structure. The priors that do not assume a black hole (i.e., CIFAR-10 and CelebA) exhibit the most uncertainty in the posterior and are most flexible with adding flux outside of the ring. The priors based on a black hole model (i.e., GRMHD A random posterior image is shown for each observation day and prior. The CIFAR-10 prior was trained on images of everyday objects and makes no assumptions of black hole structure, yet it recovers a ringlike structure for all four observation days. The GRMHD prior assumes a fluid-flow model of black holes, which helps it recover visually striking images of a thin ring with wisps. The RIAF prior assumes a simplified crescent model of black holes, which results in simplified crescent images of M87 * . The CelebA prior was trained on images of human faces, so its preferred images are presumably far away from the true source image. Even with its incorrect and strong biases, the CelebA prior recovers a ringlike structure, here favoring an eye from the face prior to explain the ring. These images under various priors all fit the EHT observations but incorporate different visual biases. and RIAF) reconstruct the clearest rings, with the GRMHD prior providing the most visual detail. In general, the more constraining the prior, the less uncertainty there is in the posterior, but the more potential there is to overfit to prior assumptions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Extracted Ring Features</head><p>An important stage of analysis is to extract ring features from reconstructed images, distilling any ring structure into a few parameters dictating its geometry and brightness profile. In Event Horizon Telescope <ref type="bibr">Collaboration et al. (2019d</ref>, the EHT Collaboration extracted, among other quantities, the following characteristic ring features: diameter, width, orientation, asymmetry, and fractional central brightness. It found the ring diameter and orientation angle to be most consistent across imaging methods, with the other quantities varying depending on the imaging pipeline. In this paper, we focus on these   equations provided in Event Horizon Telescope <ref type="bibr">Collaboration et al. 2019d)</ref>. Appendix E contains the exact equations we used to compute these features. Figure <ref type="figure">10</ref> helps visualize some of the features.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.">Ring Features of Simulated-data Reconstructions</head><p>We performed feature extraction on images from simulated data, focusing on the source images with ringlike structure: Crescent, Ring, GRMHD 1, and GRMHD 2. We analyzed the recovered posteriors under all the score-based priors but excluded the non-ringlike samples from the CelebA posteriors for the Ring and GRMHD 1 data (see Figure <ref type="figure">3</ref>, which shows that the second mode of the Ring posterior and the first mode of the GRMHD 1 posterior do not have a ringlike structure).</p><p>Figure <ref type="figure">11</ref> shows all the extracted features and their error bars. In the following paragraphs, we discuss each feature and its dependence (or lack thereof) on the prior.   <ref type="figure">12</ref>. * eht-imaging is not exactly comparable to the score-based priors because it is a different imaging algorithm that uses hand-crafted regularizers instead of a data-driven prior. There are small differences from the values for eht-imaging reported in Table <ref type="table">7</ref> of Event Horizon Telescope <ref type="bibr">Collaboration et al. (2019d)</ref>. This may be due to differences in implementation/hardware and in the exact definitions of the features (see Appendix E for details).</p><p>Diameter. All priors recover the diameter of the source image (within one standard deviation from the mean extracted diameter). The only exception is that the RIAF prior leads to a larger diameter for GRMHD 1. A possible explanation is that the RIAF model is too strong of a prior for these data, as it must account for all the flux in the image with a thick crescent (as evidenced by the relatively high extracted diameter and width of the RIAF-reconstructed images). The RIAF prior also leads to the highest &#967; 2 values (Table <ref type="table">2</ref>), further evidence that it has difficulty fitting the GRMHD 1 observations. On the other hand, the RIAF prior correctly recovers the diameter and has lower &#967; 2 values for the Ring and Crescent, two objects that can be well approximated with an RIAF model. Even though the diameter should be well constrained by the measurements, our results demonstrate how a strong-enough prior may recover a ring structure but with an incorrect diameter.</p><p>The CIFAR-10 prior gives the most accurate mean diameter, although with greater uncertainty than the GRMHD and RIAF priors. This makes sense, as the CIFAR-10 prior, in making the weakest assumptions, is most flexible with the image reconstruction. The CelebA prior exhibits significantly high uncertainty for GRMHD 2, perhaps because its recovered images have the least ringlike structure (Figure <ref type="figure">3</ref>). The GRMHD prior accurately recovers the diameter across all these ringlike data and with relatively little uncertainty.</p><p>Width. The ring width varies significantly with the prior, which supports previous findings that the width is less well constrained by the measurements than the diameter is (Event Horizon Telescope <ref type="bibr">Collaboration et al. 2019d</ref>. The GRMHD prior recovers the thinnest rings as a result of being trained on GRMHD images that exhibit thin rings.</p><p>Orientation. All priors recover the orientation angle of the source image, except the CelebA prior given GRMHD 2 data. Like the high diameter uncertainty, this may be due to the reconstructed images having relatively weak ring structure. As can be seen in the two samples from this posterior in Figure <ref type="figure">3</ref>, there is actually some brightness outside of the ringlike area. These results further highlight that strong and incorrect assumptions in the prior may inhibit correct recovery of ring features that should be constrained by the observations. Besides this extreme case of applying a CelebA prior to GRMHD data, the ring orientation appears to be independent of the prior.</p><p>Asymmetry. The brightness asymmetry appears to be fairly robust to the prior. Again, the exception is the CelebA prior applied to the GRMHD 2 data, which can be explained by the weak ring structure present in the CelebA-recovered images. We note that for the Ring data, all priors produce a slight asymmetry even though the ground-truth object has no asymmetry, which is unavoidable with most imaging algorithms <ref type="bibr">(Event Horizon Telescope Collaboration et al. 2019d)</ref>.</p><p>Fractional central brightness. This feature varies extremely with the prior, and the original M87 * imaging work also found that f C is not well constrained by the data (Event Horizon Telescope <ref type="bibr">Collaboration et al. 2019d</ref>). The GRMHD prior recovers the lowest fractional central brightness for the GRMHD observations. Along with smaller ring widths, these low f C values indicate that using a well-matched GRMHD prior on GRMHD observations gives the benefit of sharper images than could be obtained with a weaker prior like CIFAR-10.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.1.">Discussion</head><p>To summarize our results on simulated data, we find that the recovered ring diameter, orientation angle, and asymmetry are fairly robust to the image prior, while the width and fractional central brightness are tied to the prior. However, it is possible to overly bias the diameter with an overly biased prior. In particular, the RIAF prior applied to observations of a GRMHD simulation with flux extending beyond the ring may be too constraining and cause overestimation of the diameter. Reassuringly, priors that are less constraining or more accurate dependably recover the diameter, orientation, and asymmetry. On the other hand, features like the ring width and fractional central brightness can be adjusted by imposing different priors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.">Ring Features of M87 * Reconstructions</head><p>Figure <ref type="figure">12</ref> and Table <ref type="table">4</ref> show the results of feature extraction for the M87 * image reconstructions. Table <ref type="table">4</ref> includes the results of using the eht-imaging algorithm with fiducial parameters (Event Horizon Telescope Collaboration et al. 2019d) for reference, although eht-imaging is not directly comparable to our score-based priors since it utilizes handcrafted regularizers as a proxy for a prior and visibility amplitudes instead of closure amplitudes. Figure <ref type="figure">10</ref> visualizes the ring features on April 6 reconstructions. We assumed ring structure in all the posterior samples under the different scorebased priors. As shown in Figures <ref type="figure">7</ref> and <ref type="figure">9</ref>, the CelebA images have the weakest ring structure, which may have caused higher variance in the extracted features.</p><p>The CIFAR-10 prior recovers a mean diameter of 41.2-42.5 &#956;as across the four observations days. Accounting for error bars, all score-based priors agree on a range of possible diameters. The diameters recovered by our scorebased priors are consistent with the diameter recovered by eht-imaging, although with slightly higher means and larger error bars. Like with the simulated data, the RIAF prior  <ref type="figure">11</ref>. The different priors (CIFAR, GRMHD, RIAF, CelebA) all agree on the diameter, orientation, and asymmetry up to error bars. There is some disagreement in asymmetry for April 10, which is the day with fewest observations. We note a slight upward trend in diameter and orientation angle over the observation days. The width and fractional central brightness change with the prior, with the GRMHD prior providing the thinnest rings and most brightness contrast. Please see Figure <ref type="figure">7</ref> for image samples. leads to relatively high diameters. Combined with the fact that the RIAF prior has the highest &#967; 2 values in Table <ref type="table">3</ref>, this suggests that the RIAF model is likely too constraining for M87 * .</p><p>The ring width depends on the prior, with the GRMHD prior resulting in the lowest width (about 9 &#956;as). The RIAF prior causes the largest width (about 20 &#956;as), which is only slightly larger than the width recovered by eht-imaging (about 16 &#956;as, as listed in Table <ref type="table">4</ref>).</p><p>The orientation angle is consistent across the priors, roughly ranging from 150&#176;on April 5 to 170&#176;on April 11. Like previous work, we find that both the diameter and orientation angle have an upward trend over the observation days as brightness shifts to the southwest (Event Horizon Telescope <ref type="bibr">Collaboration et al. 2019d</ref>.</p><p>The amount of asymmetry is also roughly consistent across the priors. Interestingly, there is most discrepancy between the priors in the April 10 reconstructions. April 10 is the day with fewest observations, which may cause the brightness asymmetry to be more flexible under the data.</p><p>As expected, the fractional central brightness varies significantly with the prior. As with the simulated data, the GRMHD prior achieves the greatest brightness contrast.</p><p>Overall, the conclusions regarding the effects of priors on the M87 * reconstructions are consistent with the conclusions of previous analyses (Event Horizon Telescope <ref type="bibr">Collaboration et al. 2019d</ref> and our simulated-data analysis. The diameter, orientation, and asymmetry are robust to the prior, although some priors result in greater variability than others. There is a slight upward bias in diameter from the RIAF prior, due to wider rings that account for all the flux in one RIAF model. The most prior-dependent features are the width and fractional central brightness.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Discussion</head><p>We have presented a strategy for probing the influence of different priors on imaging from EHT VLBI data. Our imaging approach allows one to infer a diverse collection of image posteriors, each incorporating different assumptions. From these results, one can analyze the effect of the prior on the visual quality and uncertainty of reconstructed images, as well as on the recovered ring features. We applied this strategy to EHT observations of M87 * and found that a GRMHD prior is most effective at recovering a visually detailed image with a thin ring and sharp contrast. However, it exhibits less posterior uncertainty than some other priors do, meaning that one should only trust the GRMHD reconstructions if one believes in the existence of an underlying compact ring structure. On the other hand, it is possible to allow for more flexibility in the posterior with a weaker prior. For example, a CIFAR-10 prior still recovers a ring with similar diameter, orientation, and brightness asymmetry, albeit with blurrier quality. Our proposed strategy allows scientists to reliably interpret reconstructed images and account for the role of prior assumptions in their analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendix A Flux Constraint and Data Annealing</head><p>As mentioned in Section 3, we make two additions to a vanilla VI approach: a flux constraint objective and a dataweight annealing schedule. Figure <ref type="figure">13</ref> demonstrates the effect of each addition through an ablation study. A flux constraint imposes the assumption that all posterior images have the same total flux, which reduces the complexity of the posterior distribution. Data annealing helps prevent falling into bad local minima when optimizing the variational distribution (Equation ( <ref type="formula">4</ref>)), since the magnitude of the data loss is much higher than the magnitude of the prior loss at the beginning of optimization. As Figure <ref type="figure">13</ref> shows, both additions help make optimization more stable.  <ref type="figure">3</ref>. The posterior samples are shown with their original number of pixels (i.e., 32 &#215; 32 or 64 &#215; 64) and with their original pixel values (which should be between 0 and 1). "None" is vanilla optimization without data-weight annealing or a flux constraint objective. Without a flux constraint, the total flux V(x) of samples varies widely; furthermore, a posterior that is unconstrained in the total flux may be too complicated to model with VI. Without data annealing, optimization may become unstable, as is the case for the GRMHD prior here. Optimization is generally more stable when both data annealing and a flux constraint are used.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.1. Flux Constraint</head><p>Recall that a flux constraint is imposed through the following objective term in Equation (4):</p><p>where V(x) is the total flux of the image x and V is the target total flux. The flux constraint should be considered an additional prior on top of the score-based prior. It imposes the assumption of a constant total flux across the posterior (since closure quantities in the log likelihood do not constrain total flux). The value of V also influences the posterior: a lower V encourages sparser, more compact images than a higher V . Figure <ref type="figure">14</ref> demonstrates this when using the CelebA prior on simulated data of the Crescent image.</p><p>Our approach is to set V according to the preferred total flux of the score-based prior. That is, we set it as the median total flux of 512 samples from the prior. This results in &#175;= V 38, 173, 90, and 112 for the CIFAR-10, GRMHD, RIAF, and CelebA priors, respectively. Images can be scaled to the actual target flux Vorig measured by the zero-baseline (e.g., </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2. Data Annealing</head><p>The variational distribution is randomly initialized to solve the VI optimization problem (Equation ( <ref type="formula">4</ref>)), which means that the data loss, or negative log likelihood, is quite high at the beginning of optimization. This can cause convergence to posterior samples that are not preferred by the score-based prior (see the GRMHD example in Figure <ref type="figure">13</ref> when only a flux constraint is used). To avoid poor local minima, we initialize the weight of the data loss at 0 and then gradually anneal it to 1 with the following annealing schedule:</p><p>is the optimization step, r &#228; [0, 1] is the annealing rate, and &#206; + &#61530; s is the pivot step at which &#955; data (i) = 0.5. We use r = 0.002 and s = 12000. Therefore, strictly speaking, our optimization approach is to iteratively apply the gradient of the following loss function with respect to the RealNVP parameters f at each step i:</p><p>Since we run optimization for 100K steps, &#955; data (i) is mostly equal to 1 except at the very beginning of optimization.</p><p>x q data cp logca 2 </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendix C Score-based Priors</head><p>To obtain a score-based prior, we train a score-based diffusion model with parameters &#952; on samples from the target prior p data . Once trained, the diffusion model can sample from a learned distribution p &#952; &#8776; p data . In this appendix, we provide technical background on score-based diffusion models and how to evaluate p &#952; .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.1. Score-based Diffusion Models</head><p>A score-based diffusion model learns to gradually denoise samples from the tractable distribution ( ) p = &#61518; I 0, into samples from a clean image distribution p data . Denoising is treated as a continuous-time process such that an image is a time-dependent variable x(t) for t &#228; [0, T]. A higher t corresponds to higher independent Gaussian noise in x(t).</p><p>The following stochastic differential equation (SDE) dictates how x(t) is "diffused" as t goes from 0 to T:</p><p>is the drift coefficient that controls the deterministic evolution of x(t), and ( ) &#206; &#61522; g t is the diffusion coefficient that controls the rate of noise increase in x(t). (Note that f( &#8226; , t) here is distinct from the forward model f( &#8226; ) in Section 2.1.) The stochastic trajectory { ( )} = x t t T 0 gives rise to a time-dependent marginal probability distribution p t . In addition, f( &#8226; , t) and g(t) are constructed such that if p 0 = p data then p T &#8776; &#960;.</p><p>A diffusion model learns to reverse the diffusion process. One can sample from the complicated distribution p data by first sampling from the tractable &#960; distribution and then using the diffusion model to transform those samples into clean images from p data . The reverse of the diffusion process is given by the following reverse-time SDE:</p><p>x t g t p x t g t w t T d , l o g d d , 0 , . C2 x t 2 The gradient ( ) &#61649; x p log x t is known as the Stein score (C. Stein 1972) of x under p t , and it helps bring noisy images closer to the clean image distribution. It is also the component in Equation (C2) that is learned by the diffusion model. A convolutional neural network with parameters &#952;, known as the score model s &#952; , is trained to approximate the true score such that ( ) ( ) &#187; &#61649; q s x x t p , l o g x t . The score model appears in the following learned approximation of Equation (C2), which is used to sample from an approximation of p data : [ ( ) ( ) ( )] ( ) &#175;[ ] ( ) = -+ &#206; q x f x s x t g t t t g t t T w d , , d d , 0 , . C3 2</p><p>For x(T) &#8764; &#960;, we denote the time-dependent marginal distribution of x(t) according to Equation (C3) as pt . We denote the diffusion model prior as &#8788; q p p 0 . For a well-trained score model, we have that p &#952; &#8776; p data .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.2. Image Probabilities under a Score-based Diffusion Model</head><p>To use p &#952; as a prior in an inference algorithm that optimizes the posterior log density, we need access to the function (&#8226;)</p><p>. Computing the probability of an image x under p &#952; requires inverting x(0) = x to x(T) (i.e., we need to find the x(T) that would result in x(0) through the reverse diffusion defined by Equation (C3)). However, although we can use Equation (C3) to sample from p &#952; , the presence of Brownian motion makes the sampling function not invertible. As a result, there is no tractable way to compute ( ) q x p log</p><p>. Instead, we can appeal to an ordinary differential equation (ODE) for tractable log probabilities or to the ELBO as an efficient proxy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.2.1. Computing Probabilities with an ODE</head><p>The following probability flow ODE defines a generative image distribution q p ODE theoretically equal to p &#952; :</p><p>g t t t d d , 1 2 , : , . C4 2 Like the reverse-time SDE, this ODE can be used to sample ( ) ~q x p 0 ODE by starting with x(T) &#8764; &#960;. Furthermore, an ODE can be solved in both directions of time, making the sampling function invertible. To compute ( ) q x p log ODE</p><p>for an image x, we map x(0) = x to x(T) by solving the ODE in the forward time direction. The log probability is given by the log probability of x(T) under &#960; (which is tractable to evaluate) plus a normalization factor accounting for the change in log density throughout the ODE: Equation (C5) is tractable to evaluate with an ODE solver, but it is computationally expensive. It can be prohibitively expensive when the image is large or when used in an iterative optimization algorithm.</p><p>C.2.2. Surrogate Probabilities with an Evidence Lower Bound Y. Song et al. (2021a) derived the ELBO b &#952; of a score-based diffusion model such that ( ) ( )</p><p>for any x. The lower bound, which is similar to the denoising-based training objective of diffusion models, is given by</p><p>t p p g t t , , l o g log 2 , . x x x x x p t t 0 2 2 0 2 2 2 t 0 &#61520; &#61520; &#61520; &#61520; Here, ( | ) &#162; x x p t 0 denotes the transition distribution of ( ) = &#162; x x t given x(0) = x under the diffusion SDE (Equation (C1)). In denoising diffusion models, which use a linear ( ( )) ( ( )) &#732;( ( ) ) ( ) ( ) &#242; p = + &#61649;&#8901; = q q x x f x x x p T t t t log 0 log , d , 0 . C5 T ODE 0 drift coefficient f( &#8226; , t), this transition distribution is Gaussian: ( | ) ( ( ) ( ) ) a b &#162; = &#162; &#61518; x x x x I p t t ; , t 0 2 . This in turn makes ( ) ( ) &#242; x g t h t t , d T 0 2 similar to a denoising scorematching objective. In fact, Equation (C6) is known as the "denoising score-matching loss" in the original theoretical work (Y. Song et al. 2021a). To see the correspondence to denoising, note that for ( | ) ( ( ) ( ) ) a b &#162; = &#162; &#61518; x x x x I p t t ; , t 0 2 we can sample &#162; x as &#945; t x + &#946; t z for ( ) ~&#61518; z I 0, . Then the gradient ( | ) &#61649; &#162; &#162; x x p log x t 0 in Equation (C7) is given by</p><p>The first term in Equation (C7), (</p><p>&#61520; , thus can be thought of as the denoising error of the score model. A lower error corresponds to a higher ELBO according to Equation (C6). The remaining two terms in Equation (C7) are normalizing factors independent of &#952;. Loosely speaking, the ELBO indicates how well the diffusion model can denoise the given image: an image with high probability under the diffusion model is easy to denoise, whereas a low-probability image is difficult to denoise.</p><p>The quantity b &#952; (x) is efficient to compute by adding Gaussian noise to x and seeing how well the score model estimates the noise. We Monte Carlo approximate the expectation over ( | ) &#162; x x p T 0 with N z noise samples, and we Monte Carlo approximate the time integral in Equation (C6) with importance sampling of N t time samples. In practice, N z = N t = 1 is sufficient for our imaging algorithm.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendix D Interferometric Data Products</head><p>In this appendix, we provide background on the data products obtained with VLBI. In VLBI, a network of radio telescopes collects spatial frequency measurements of the sky's image. We denote the source image as I(x, y), where (x,y) are 2D spatial coordinates. Each pair of telescopes is called a baseline and provides a Fourier measurement called a visibility. The van Cittert-Zernike Theorem (P. H. <ref type="bibr">van Cittert 1934;</ref><ref type="bibr">F. Zernike 1938t)</ref> states that the ideal visibility v ij * measured by the baseline b ij between telescopes i and j is a single (u,v) measurement on the complex 2D Fourier plane (A. R. <ref type="bibr">Thompson et al. 2017)</ref></p><p>y e x y : , , d d . D1 ij xu yv 2 i (Here i is used to denote the imaginary unit to avoid confusion with the telescope index i.) The coordinates (u,v) (measured in wavelengths) are the projected baseline orthogonal to the line of sight. An array of N s telescopes, or stations, has &#9115; &#9117; &#9118; &#9120; N 2 s independent baselines, each providing a visibility at each point in time. In practice, ideal visibilities are corrupted owing to multiple factors: (1) baseline-dependent thermal noise, (2) stationdependent gain errors, and (3) station-dependent phase errors. Baseline-dependent thermal noise is modeled as a Gaussian random variable ( ) e s ~&#61518; 0, ij ij 2</p><p>, where &#963; ij is based on the system equivalent flux density (SEFD) of each telescope:</p><p>The station-dependent gain error g i arises from each telescope i using its own time-dependent 2 &#215; 2 Jones matrix (J. <ref type="bibr">Hamaker et al. 1996)</ref>. The station-dependent phase error f i arises from atmospheric turbulence that causes light to travel at different velocities toward each telescope (G. I. <ref type="bibr">Taylor 1938;</ref><ref type="bibr">R. Hinder 1970;</ref><ref type="bibr">A. N. Kolmogorov 1991)</ref>. Other sources of corruption, including polarization leakage and bandpass errors, may introduce baseline-dependent errors, but they are slow-varying and assumed to be removable with a priori calibration (A. A. <ref type="bibr">Chael et al. 2018</ref>). The measured visibility of baseline b ij can be written as</p><p>* where all systematic errors (i.e., those besides thermal noise) are wrapped into station-dependent gain/phase errors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D.1. Closure Quantities</head><p>Station-dependent errors are difficult to remove owing to the absence of corroborating information from other stations. Calibrating the measured visibilities calls for an iterative self-calibration process that introduces many a priori assumptions and becomes infeasible at high telescope frequencies (A. A. <ref type="bibr">Chael et al. 2018</ref>). An alternative avenue is to use closure quantities that are unchanged by stationdependent errors.</p><p>Closure phases (R. Jennison 1958) are robust to stationdependent phase errors. They arise from a data product known as the complex bispectrum, which is formed by multiplying the three baselines within a triangle of telescopes i, j, k:</p><p>ijk ij jk ki ijk 2 * * * where &#949; ijk is the thermal noise in the measured bispectrum. Importantly, Equation (D3) does not include any phase errors. Thus the closure phase is given by the phase of the bispectrum and is robust to phase corruption. While the total number of triplets in the telescope array is &#9115; &#9117; &#9118; &#9120; N 3 s , the total number of linearly independent closure phases is</p><p>. To understand this number, see that the set of independent closure phases can be formed by selecting one station as a reference and then creating the set of all triangles that contain that station.</p><p>( ) ( ) ( ) ( ) ( ) ( ) ( ) e e e = + + + f f f f f f --v v v g g e v g g e v g g e v D3 ij jk ki i j i ij ij j k i jk jk k i i ki ki i j j k k j * * *</p><p>The resulting closure phases are independent in that no one closure phase can be formed as a linear combination of other closure phases. Closure amplitudes (R. <ref type="bibr">Twiss et al. 1960)</ref> address the issue of station-dependent gain errors. A closure amplitude arises from a combination of four telescopes i, j, k, &#8467;: .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendix E Ring Feature Extraction</head><p>We used the REx feature extraction algorithm (A. A. Chael 2019) to compute the characteristic features in Section 6. The diameter and fractional central brightness follow the same formulae as in Event Horizon Telescope <ref type="bibr">Collaboration et al. (2019d)</ref>. Except for the fractional central brightness, which is not included in REx, all features were computed exactly according to the latest implementation of REx<ref type="foot">foot_3</ref> (as of 2023 October). The REx implementation corresponds to slightly different equations for computing features than those given in Event Horizon Telescope <ref type="bibr">Collaboration et al. (2019d)</ref>. In this appendix, we will note any differences from the equations used in Event Horizon Telescope <ref type="bibr">Collaboration et al. (2019d)</ref>.</p><p>REx first preprocesses the image by blurring it with a 2 &#956;as FWHM Gaussian and regridding it to 160 &#215; 160 pixels. It identifies the ring center based on the image thresholded to 5% of the peak brightness, and then it computes characteristic features. The characteristic features, which we define in the following paragraphs, are all computed based on radialangular profiles I(r, &#952;) of the centered image, where I(r, &#952;) is the brightness at radius r and azimuthal angle &#952; from the measured center. The profiles are interpolated over the domains r &#228; [0, 50] &#956;as and &#952; &#228; [0, 2&#960;] rad.</p><p>The diameter d is measured as twice the mean radial distance of the peak brightness: We define the fractional central brightness f C (which is not included in REx) as the ratio of an "interior" mean flux to the mean flux along the ring:</p><p>, 2, . E 5 C r 0,5 , 0,2 0,2</p><p>Here the inside is defined as the inner disk of radius 5 &#956;as, and the outside is defined as the region with radius larger than the measured radius d/2.</p><p>There is no uncertainty quantification for f C . This definition of f C exactly agrees with Equation (23) in Event Horizon Telescope Collaboration et al. (2019d).</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>The Astrophysical Journal, 975:201 (22pp), 2024 November 10 Feng, Bouman, &amp; Freeman</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1"><p>http://vlbiimaging.csail.mit.edu/myData</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2"><p>https://datacommons.cyverse.org/browse/iplant/home/shared/commons_ repo/curated/EHTC_FirstM87Results_Apr2019</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_3"><p>https://github.com/achael/eht-imaging/blob/main/ehtim/features/rex.py</p></note>
		</body>
		</text>
</TEI>
