<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Diffusion models in bioinformatics and computational biology</title></titleStmt>
			<publicationStmt>
				<publisher>Springer Nature</publisher>
				<date>02/01/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10496833</idno>
					<idno type="doi">10.1038/s44222-023-00114-9</idno>
					<title level='j'>Nature Reviews Bioengineering</title>
<idno>2731-6092</idno>
<biblScope unit="volume">2</biblScope>
<biblScope unit="issue">2</biblScope>					

					<author>Zhiye Guo</author><author>Jian Liu</author><author>Yanli Wang</author><author>Mengrui Chen</author><author>Duolin Wang</author><author>Dong Xu</author><author>Jianlin Cheng</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Denoising diffusion models embody a type of generative artificial intelligence that can be applied in computer vision, natural language processing and bioinformatics. In this Review, we introduce the key concepts and theoretical foundations of three diffusion modelling frameworks (denoising diffusion probabilistic models, noise-conditioned scoring networks and score stochastic differential equations). We then explore their applications in bioinformatics and computational biology, including protein design and generation, drug and small-molecule design, protein–ligand interaction modelling, cryo-electron microscopy image data analysis and single-cell data analysis. Finally, we highlight open-source diffusion model tools and consider the future applications of diffusion models in bioinformatics.
Key points
• Diffusion models are a generative artificial intelligence technology that can be applied in natural language processing, image synthesis and bioinformatics.
• Diffusion models have contributed greatly to computational protein design and generation, drug and small-molecule design, protein–ligand interaction modelling, cryo-electron microscopy data enhancement and single-cell data analysis.
• Many diffusion models are also available as open-source tools.
• Although diffusion models may potentially outperform other generative approaches, such as generative adversarial networks and variational auto-encoders, their computational resource requirements remain high.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>probabilities through deep generative models with up to thousands of layers or time steps, as well as the computing of conditional and posterior probabilities under the learned model. Based on this concept, denoising diffusion probabilistic models (DDPMs) <ref type="bibr">33</ref> can achieve performance comparable to or better than other generative models (for example, decoder, energy-based models and GANs) <ref type="bibr">46,</ref><ref type="bibr">94–96</ref> in image generation tasks. The diffusion network structure and training strategy can be further improved to boost performance <ref type="bibr">50</ref>, surpassing GANs in image synthesis. For example, a multi-head attention mechanism and the BigGAN residual module <ref type="bibr">95</ref> can be applied for up-sampling and down-sampling of data to improve the resolution and quality of generated images. In addition, a denoising diffusion implicit model (DDIM) <ref type="bibr">97</ref> can be used to increase the sampling speed.</p><p>Importantly, diffusion models can be applied in bioinformatics, for example, for denoising cryo-electron microscopy (cryo-EM) data <ref type="bibr">98</ref>, single-cell gene-expression analysis <ref type="bibr">99,</ref><ref type="bibr">100</ref>, protein design and generation <ref type="bibr">84,</ref><ref type="bibr">91,</ref><ref type="bibr">101–107</ref>, drug and small-molecule design <ref type="bibr">54,</ref><ref type="bibr">108–113</ref> and protein–ligand interaction modelling <ref type="bibr">114–118</ref>. Diffusion models have the advantage of being able to handle high-dimensional data with high diversity and scalability.</p><p>In this Review, we provide a detailed survey of diffusion models, including denoising diffusion models, noise-conditioned score networks (NCSNs) and stochastic differential equations (SDEs), and discuss their applications in bioinformatics. 
We further highlight possible future developments of diffusion models and propose challenging bioinformatics problems that new diffusion models may help to tackle.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The concept of diffusion models</head><p>Diffusion models learn to reverse the process of data destruction or corruption (for example, introduced by noise), allowing the generation of realistic, clean data samples (for example, restoration of uncorrupted data). Thus, diffusion models can learn from data that has been progressively destroyed or degraded to generate new samples from a given distribution or to estimate the distribution from which a given sample is drawn (Box 2).</p><p>Diffusion models are based mainly on three frameworks, each with a different formulation of the forward and reverse processes (Fig. <ref type="figure">2</ref>), that is, DDPMs <ref type="bibr">32,</ref><ref type="bibr">33</ref> , NCSNs <ref type="bibr">34,</ref><ref type="bibr">119</ref> and score SDEs <ref type="bibr">35,</ref><ref type="bibr">120</ref> .</p></div>
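<div xmlns="http://www.tei-c.org/ns/1.0"><head>Denoising diffusion probabilistic models</head><p>DDPMs, which were the first diffusion models able to generate high-resolution data, typically contain two Markov chains (Box 2): the forward chain gradually adds noise to scramble the original data, followed by a reverse chain that removes the noise from the data to recover the original data. If <formula notation="TeX">q(x_0)</formula> denotes the distribution of the original data, in which <formula notation="TeX">x_0</formula> denotes uncorrupted data, the transition kernel <formula notation="TeX">q(x_t \mid x_{t-1})</formula> of the forward Markov process adding Gaussian perturbation at time t is denoted <formula notation="TeX">q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t \mathbf{I}\right) \quad (1)</formula> in which <formula notation="TeX">t \in \{1, \ldots, T\}</formula>. Here T represents the number of diffusion steps; <formula notation="TeX">\beta_t \in [0, 1)</formula> is the hyperparameter denoting the variance schedule across diffusion steps; <formula notation="TeX">\mathbf{I}</formula> is the identity matrix; and <formula notation="TeX">\mathcal{N}(x; \mu, \sigma)</formula> is the normal distribution of x with mean <formula notation="TeX">\mu</formula> and covariance <formula notation="TeX">\sigma</formula>.</p><p>With <formula notation="TeX">\alpha_t = 1 - \beta_t</formula> and <formula notation="TeX">\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s</formula>, a noisy sample <formula notation="TeX">x_t</formula> can be obtained directly from the distribution conditioned on the original input <formula notation="TeX">x_0</formula>: <formula notation="TeX">q(x_t \mid x_0) = \mathcal{N}\left(x_t; \sqrt{\bar{\alpha}_t}\, x_0, (1-\bar{\alpha}_t) \mathbf{I}\right) \quad (2)</formula></p></div>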
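<div xmlns="http://www.tei-c.org/ns/1.0"><p>The closed-form forward process of DDPMs described above can be illustrated with a minimal sketch. It assumes a linear variance schedule with illustrative end points (1e-4 and 0.02) and T = 1,000 steps; these values and all function names are assumptions made for this example, not settings taken from any of the tools reviewed here.</p>
<p><![CDATA[
```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced variance schedule beta_1..beta_T (an assumed, common choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for t = 1..T."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot; t is 1-indexed."""
    a_bar = alpha_bars[t - 1]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
           for x, e in zip(x0, noise)]
    return x_t, noise


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [1.0, -2.0, 0.5]                             # toy "clean" data point
x_mid, _ = q_sample(x0, 500, alpha_bars, rng)     # partially corrupted
x_end, _ = q_sample(x0, 1000, alpha_bars, rng)    # close to pure noise
```
]]></p><p>Because the cumulative product of (1 - beta) shrinks towards zero, the contribution of the original data to the noisy sample vanishes as t approaches T, which is why the final sample is close to standard Gaussian noise.</p></div>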
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Deep learning <ref type="bibr">1</ref> was introduced to the field of bioinformatics and computational biology in 2012 (ref. 2) (Box 1) and has been applied to many bioinformatics problems, such as protein structure prediction <ref type="bibr">3</ref>, protein function prediction <ref type="bibr">4–9</ref>, protein–ligand interaction prediction <ref type="bibr">10–14</ref>, gene-expression prediction <ref type="bibr">15–20</ref> and gene regulatory network modelling <ref type="bibr">21–25</ref>. Various deep learning architectures, including convolutional neural networks <ref type="bibr">26</ref>, long short-term memory networks <ref type="bibr">27</ref>, residual networks <ref type="bibr">28</ref>, generative adversarial networks (GANs) <ref type="bibr">29</ref>, graph neural networks (GNNs) <ref type="bibr">30</ref> (Box 2) and transformers <ref type="bibr">31</ref>, have been developed for bioinformatics data analysis.</p><p>Diffusion models leverage deep learning technology <ref type="bibr">32–35</ref> and outperform other deep learning methods in many domains, including image generation <ref type="bibr">36–42</ref>, image inpainting <ref type="bibr">43,</ref><ref type="bibr">44</ref> and speech synthesis <ref type="bibr">45</ref>. Diffusion models are deep learning-based generative models <ref type="bibr">32–35</ref> (Box 2) that aim to generate artificial yet realistic data (for example, a computer-generated Picasso painting or an answer to a user's question) from input parameters. Compared to other generative models, such as autoregressive models <ref type="bibr">46</ref>, normalizing flows <ref type="bibr">47</ref>, energy-based models <ref type="bibr">48</ref>, variational auto-encoders (VAEs) <ref type="bibr">49</ref> or GANs <ref type="bibr">29</ref>, diffusion-based generative models are well suited to learning complex distributions, handling high-dimensional data and generating diverse data <ref type="bibr">50–55</ref>. 
In particular, diffusion models can surpass GANs <ref type="bibr">29</ref>, which consist of a generator that generates data and a discriminator that distinguishes generated data from real data, in the challenging task of image synthesis <ref type="bibr">33,</ref><ref type="bibr">50</ref>. In addition, diffusion models can be applied in computer vision <ref type="bibr">43,</ref><ref type="bibr">51,</ref><ref type="bibr">56–71</ref>, natural language processing <ref type="bibr">55,</ref><ref type="bibr">72–75</ref>, temporal data modelling <ref type="bibr">76–81</ref>, multi-modal modelling <ref type="bibr">36,</ref><ref type="bibr">37,</ref><ref type="bibr">82,</ref><ref type="bibr">83</ref> and medical image reconstruction <ref type="bibr">84–93</ref>.</p><p>Diffusion models were originally introduced <ref type="bibr">32</ref> to address a central problem in machine learning, that of modelling complex datasets using highly flexible families of probability distributions while ensuring that learning, sampling, inference and evaluation remain analytically or computationally tractable (Fig. <ref type="figure">1</ref>). Inspired by non-equilibrium statistical physics, this approach systematically and slowly destroys the structure of data through an iterative forward diffusion process. Then, a reverse diffusion process is applied to restore the structure in the data, yielding a highly flexible and tractable generative model of the data, thereby enabling rapid learning, data sampling and evaluating</p><p>The forward process gradually introduces noise into the original data until it is completely replaced by noise. The reverse process is the opposite operation, resulting in the generation of new samples. This process typically starts with unstructured noise obeying the prior distribution, and then, by applying a model (typically a trainable neural network), noise is removed step by step to restore the original data. 
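The neural network can be formulated as: <formula notation="TeX">p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right) \quad (3)</formula></p><p>Given the starting point data of the reverse process as <formula notation="TeX">p(x_T) = \mathcal{N}(x_T; 0, \mathbf{I})</formula>, the distribution of <formula notation="TeX">x_0</formula> conditioned on <formula notation="TeX">x_T</formula> is given by: <formula notation="TeX">p_\theta(x_{0:T-1} \mid x_T) = \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t) \quad (4)</formula></p><p>Eventually, a marginal distribution of <formula notation="TeX">x_0</formula> close to the original data distribution can be obtained by <formula notation="TeX">p_\theta(x_0) = \int p_\theta(x_{0:T})\, dx_{1:T}</formula>. To train the model parameterized with <formula notation="TeX">\theta</formula> so that it can learn the pattern of the original data and make <formula notation="TeX">p_\theta(x_0)</formula> close to the true data distribution <formula notation="TeX">q(x_0)</formula>, the loss function to be minimized is set as the negative log-likelihood (equation (<ref type="formula">5</ref>)). We note that the process of minimizing the negative log-likelihood of the observed data under the model is equivalent to minimizing the Kullback–Leibler (KL) divergence between the empirical distribution defined by the original data <formula notation="TeX">q(x_0, x_1, \cdots, x_T)</formula> and the model distribution <formula notation="TeX">p_\theta(x_0, x_1, \cdots, x_T)</formula>: <formula notation="TeX">\mathbb{E}\left[-\log p_\theta(x_0)\right] \le \mathbb{E}_q\left[-\log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\right] = \mathbb{E}_q\left[-\log p(x_T) - \sum_{t=1}^{T} \log \frac{p_\theta(x_{t-1} \mid x_t)}{q(x_t \mid x_{t-1})}\right] = L_{\mathrm{VLB}} \quad (5)</formula></p><p>The objective of DDPM training is to minimize <formula notation="TeX">L_{\mathrm{VLB}}</formula>, a variational bound on the negative log-likelihood that is commonly referred to as the variational lower bound of the log-likelihood. <formula notation="TeX">L_{\mathrm{VLB}}</formula> can also be parameterized to increase the quality of sample generation <ref type="bibr">33</ref>.</p></div>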
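<div xmlns="http://www.tei-c.org/ns/1.0"><p>The reverse chain described above can be sketched as ancestral sampling. The example below assumes the noise-prediction parameterization of ref. 33, in which the network predicts the noise contained in the current sample, with the reverse variance fixed to the forward variance. The function eps_model is a hypothetical stand-in for a trained network; here it is replaced by an analytic oracle for a toy dataset in which every data point equals the same constant, so that the result can be checked by hand.</p>
<p><![CDATA[
```python
import math
import random


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for t = 1..T."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def p_sample_step(x_t, t, betas, alpha_bars, eps_model, rng):
    """One ancestral step x_t -> x_{t-1}; t is 1-indexed."""
    beta_t = betas[t - 1]
    alpha_t = 1.0 - beta_t
    a_bar_t = alpha_bars[t - 1]
    eps = eps_model(x_t, t)  # predicted noise (hypothetical network output)
    coef = beta_t / math.sqrt(1.0 - a_bar_t)
    mean = [(x - coef * e) / math.sqrt(alpha_t) for x, e in zip(x_t, eps)]
    if t == 1:
        return mean  # no noise is added at the final step
    sigma_t = math.sqrt(beta_t)  # assumed fixed reverse standard deviation
    return [m + sigma_t * rng.gauss(0.0, 1.0) for m in mean]


def sample(dim, betas, eps_model, rng):
    """Run the full reverse chain starting from x_T ~ N(0, I)."""
    alpha_bars = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(len(betas), 0, -1):
        x = p_sample_step(x, t, betas, alpha_bars, eps_model, rng)
    return x


def point_mass_oracle(c, alpha_bars):
    """Exact noise predictor when every data point equals the constant c."""
    def eps_model(x_t, t):
        a_bar = alpha_bars[t - 1]
        return [(x - math.sqrt(a_bar) * c) / math.sqrt(1.0 - a_bar) for x in x_t]
    return eps_model


# A short schedule (T = 100) keeps the toy example quick; it is not a recommended setting.
betas = [1e-4 + i * (0.02 - 1e-4) / 99 for i in range(100)]
oracle = point_mass_oracle(1.5, cumulative_alphas(betas))
x0_hat = sample(3, betas, oracle, random.Random(0))
```
]]></p><p>With the oracle, the final denoising step maps any intermediate sample back to the constant data value, which provides a simple sanity check of the update rule. In practice, the oracle would be replaced by a neural network trained on real data.</p></div>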
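<div xmlns="http://www.tei-c.org/ns/1.0"><head>Noise-conditioned score networks</head><p>In NCSNs, the score function of a probability density function <formula notation="TeX">p(x)</formula> is the gradient of the log density with respect to the input, <formula notation="TeX">\nabla_x \log p(x)</formula>. To learn and estimate the score function, a score-matching neural network <formula notation="TeX">s_\theta</formula> is trained. The goal of this neural network is to make <formula notation="TeX">s_\theta(x) \approx \nabla_x \log p(x)</formula>. Therefore, the objective function of the scoring network can be defined as: <formula notation="TeX">\mathbb{E}_{x \sim p(x)} \left\| s_\theta(x) - \nabla_x \log p(x) \right\|_2^2 \quad (6)</formula></p><p>Even though the problem is well defined, equation (<ref type="formula">6</ref>) cannot be optimized directly because the value of <formula notation="TeX">\nabla_x \log p(x)</formula> is unknown. However, score functions can be learned from data by applying score matching <ref type="bibr">121</ref>, denoising score matching <ref type="bibr">122–124</ref> or sliced score matching <ref type="bibr">125</ref>.</p><p>Moreover, training remains difficult because the learned score functions are unreliable when the data lie on a low-dimensional manifold embedded in a high-dimensional space (the manifold hypothesis) <ref type="bibr">34</ref>. This challenge can be addressed by introducing Gaussian noise to the data at various scales, which improves the data distribution's suitability for score-based generative modelling. Thus, a single NCSN can be applied to estimate the score corresponding to each noise level. If <formula notation="TeX">\sigma_1 &lt; \sigma_2 &lt; \cdots &lt; \sigma_T</formula> is a sequence of Gaussian noise levels, the perturbed data follow <formula notation="TeX">p_{\sigma_t}(x_t \mid x) = \mathcal{N}(x_t; x, \sigma_t^2 \mathbf{I})</formula>. The NCSN <formula notation="TeX">s_\theta(x, \sigma_t)</formula> with denoising score matching can then approximate the gradient of the log density function, making <formula notation="TeX">s_\theta(x_t, \sigma_t) \approx \nabla_{x_t} \log p_{\sigma_t}(x_t)</formula>. And for <formula notation="TeX">x_t</formula>, <formula notation="TeX">\nabla_{x_t} \log p_{\sigma_t}(x_t \mid x)</formula> is derived as: <formula notation="TeX">\nabla_{x_t} \log p_{\sigma_t}(x_t \mid x) = -\frac{x_t - x}{\sigma_t^2} \quad (7)</formula></p><p>Consequently, the optimization objective function in equation (<ref type="formula">6</ref>) can be transformed into: <formula notation="TeX">\frac{1}{T} \sum_{t=1}^{T} \lambda(\sigma_t)\, \mathbb{E}_{p(x)} \mathbb{E}_{x_t \sim p_{\sigma_t}(x_t \mid x)} \left\| s_\theta(x_t, \sigma_t) + \frac{x_t - x}{\sigma_t^2} \right\|_2^2 \quad (8)</formula> in which <formula notation="TeX">\lambda(\sigma_t)</formula> is a weighting function. The Langevin method recursively computes <formula notation="TeX">x_i</formula> as follows: <formula notation="TeX">x_i = x_{i-1} + \frac{\gamma}{2} \nabla_x \log p(x) + \sqrt{\gamma}\, \omega_i \quad (9)</formula></p></div>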
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Deep learning</head><p>Deep learning is a machine learning technology that applies artificial neural networks with many layers of neurons (hence, 'deep') to model and extract complex patterns in data. Deep learning can then learn patterns and features from complex data to perform intelligent tasks, such as speech and image recognition, natural language processing and protein structure prediction. The artificial neurons in each layer receive input from the neurons in the previous layers until the final output layer produces a prediction (for example, classifying an image into a category or generating a sentence of text). During training of a deep learning model, the weights associated with the connections between neurons are adjusted to fit the training data. A major advantage of deep learning models over other machine learning methods is their ability to automatically learn hierarchical representations from raw data through multiple layers of abstraction. This enables deep learning models to achieve high prediction accuracy in many domains, such as precision medicine and healthcare (for example, medical image segmentation <ref type="bibr">237,</ref><ref type="bibr">258–261</ref> and disease diagnosis <ref type="bibr">262–265</ref>), finance (for example, algorithmic trading <ref type="bibr">266,</ref><ref type="bibr">267</ref> and risk management <ref type="bibr">268</ref>) and agriculture (for example, crop monitoring <ref type="bibr">269,</ref><ref type="bibr">270</ref> and pest detection <ref type="bibr">271</ref>). Some notable applications of deep learning are ChatGPT <ref type="bibr">272</ref> for natural language processing, DALL-E-2 (ref. 83) and GLIDE <ref type="bibr">273</ref> for image generation, and AlphaFold2 (ref. 163) for protein structure prediction.</p><p>where <formula notation="TeX">\gamma</formula> determines the amplitude of the update in the score's direction; <formula notation="TeX">x_0</formula> is sampled from the prior distribution; and the noise is drawn according to <formula notation="TeX">\omega_i \sim \mathcal{N}(0, \mathbf{I})</formula>. NCSNs and DDPMs both operate on the principle of converting a basic noise distribution into a more intricate data distribution by collecting information during the introduction of noise, which is then reapplied when removing the noise. Both models are trained to tackle a noise regression problem, based on the principle of maximum likelihood estimation. Notably, the objective formulation of score matching with Langevin dynamics in NCSN aligns with that of the re-weighted variant of the evidence lower bound of DDPM <ref type="bibr">35,</ref><ref type="bibr">126,</ref><ref type="bibr">127</ref>. In terms of sample generation, both models employ ancestral sampling, which progressively transforms a noise sample into a data sample, guided by data distribution gradients.</p></div>
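<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Langevin update used for sampling in NCSNs can be sketched in a few lines. The example assumes a one-dimensional Gaussian target, for which the score is known analytically and therefore stands in for a trained score network; the step size, number of steps, uniform prior and function names are illustrative assumptions only.</p>
<p><![CDATA[
```python
import math
import random


def langevin_sample(score, x_init, step_size, num_steps, rng):
    """Iterate x_i = x_{i-1} + (step_size / 2) * score(x_{i-1}) + sqrt(step_size) * w_i."""
    x = x_init
    for _ in range(num_steps):
        noise = rng.gauss(0.0, 1.0)  # w_i ~ N(0, 1)
        x = x + 0.5 * step_size * score(x) + math.sqrt(step_size) * noise
    return x


def gaussian_score(mean, std):
    """Analytic score of N(mean, std^2); stands in for a trained network s_theta."""
    return lambda x: -(x - mean) / (std * std)


rng = random.Random(0)
score = gaussian_score(mean=3.0, std=1.0)
# Each chain starts from an assumed uniform prior on [-5, 5].
samples = [langevin_sample(score, rng.uniform(-5.0, 5.0), 0.05, 500, rng)
           for _ in range(200)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)
```
]]></p><p>After enough iterations, the chains forget their starting points and the collected samples approximately follow the target distribution, so the sample mean and variance should be close to 3 and 1, respectively. An NCSN applies the same update at a sequence of noise levels, using the network output for the current noise level in place of the analytic score.</p></div>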
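<div xmlns="http://www.tei-c.org/ns/1.0"><head>Score stochastic differential equations</head><p>With unlimited time steps or noise levels, DDPMs and NCSNs can be further generalized to a continuous-time setting in which the perturbation and denoising processes are described by SDEs. This generalized approach <ref type="bibr">35</ref> of gradually transforming data into noise is called score SDE. Score SDE describes both processes with SDEs, and the reverse process requires an estimated score function of the noisy data distribution. The forward process is the solution of an Itô SDE <ref type="bibr">128</ref>, which consists of a drift component for mean transformation and a diffusion coefficient for describing noise: <formula notation="TeX">dx = f(x, t)\, dt + g(t)\, dw \quad (10)</formula></p><p>where <formula notation="TeX">w</formula> represents the standard Wiener process, also known as Brownian motion, and <formula notation="TeX">f(x, t)</formula> and <formula notation="TeX">g(t)</formula> are the drift and diffusion coefficients of the SDE, respectively. The forward processes in DDPMs and score-based generative models are discretizations of special cases of this SDE.</p><p>The formulation of the reverse diffusion process of the SDE is given by equation (11): <formula notation="TeX">dx = \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t)\, d\bar{w} \quad (11)</formula> in which <formula notation="TeX">\bar{w}</formula> is the Wiener process when time runs backwards. The objective function can be defined as: <formula notation="TeX">\mathbb{E}_{t \sim \mathcal{U}([0, T])} \left[ \lambda(t)\, \mathbb{E}_{p(x_0)} \mathbb{E}_{p_t(x_t \mid x_0)} \left\| s_\theta(x_t, t) - \nabla_{x_t} \log p_t(x_t \mid x_0) \right\|_2^2 \right] \quad (12)</formula></p></div>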
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Key concepts relevant to diffusion models</head><p>Diffusion: the movement of molecules, atoms, ions or energy from a region of higher concentration to a region of lower concentration along a concentration gradient until the concentration becomes equal in both regions. Diffusion, which is driven by a gradient in Gibbs free energy or chemical potential, is a stochastic process owing to the inherent randomness in the movement of the diffusing entities.</p><p>Generative model: a type of machine learning model that aims at learning the underlying distribution of data to generate new, similar data. These models can approximate the joint probability distribution of input features and labels, if available, and generate new data points by sampling from the learned distribution.</p><p>Markov chain: a stochastic model that describes a sequence of possible states, in which the probability of a state depends (or is conditioned) only on its previous state.</p><p>Markov chain Monte Carlo: a statistical or computational simulation method that constructs a Markov chain to iteratively generate a sequence of samples according to a conditional probability distribution between two consecutive states. After running the Markov chain for enough iterations, the generated samples converge to the desired posterior distribution.</p><p>Graph neural network (GNN) <ref type="bibr">30</ref>: a type of deep learning model for processing graph-structured data (for example, molecular graphs and biological networks). Each node in a GNN receives messages from its neighbouring nodes, which are used to update its hidden representation. 
By iteratively updating node representations, the GNN can aggregate information from both the local neighbourhood and remotely connected nodes in the graph.</p><p>Equivariant GNN <ref type="bibr">161</ref>: a special type of GNN that is equivariant to a transformation (for example, translation and rotation) in the input data (for example, of a three-dimensional object, such as a protein structure). For example, translating an object in the input space translates the output generated for that object by the equivariant GNN in the same way, without otherwise changing the output.</p></div>
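<div xmlns="http://www.tei-c.org/ns/1.0"><p>The message-passing idea behind GNNs described in the box above can be made concrete with a minimal sketch. It assumes the simplest possible update rule (each node adds the sum of its neighbours' features to its own) and has no learnable weights; real GNN layers apply trained transformations to messages and updates. The function name and the toy graph are hypothetical.</p>
<p><![CDATA[
```python
def message_passing_round(node_features, edges):
    """One round of sum-aggregation message passing on an undirected graph.

    node_features: dict mapping node -> list of floats
    edges: list of (u, v) pairs
    Update rule (an assumed, minimal choice): h_v <- h_v + sum of neighbour features.
    """
    neighbours = {node: [] for node in node_features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    updated = {}
    for node, h in node_features.items():
        aggregated = [0.0] * len(h)
        for nb in neighbours[node]:
            aggregated = [a + b for a, b in zip(aggregated, node_features[nb])]
        updated[node] = [x + a for x, a in zip(h, aggregated)]
    return updated


# Path graph A - B - C with one-hot node features.
features = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
edges = [("A", "B"), ("B", "C")]
after_one = message_passing_round(features, edges)
after_two = message_passing_round(after_one, edges)
```
]]></p><p>After one round, node A has only received information from its direct neighbour B. After two rounds, the third feature component of A becomes non-zero, showing that information from node C, which is two hops away, has reached A through repeated updates.</p></div>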
<div xmlns="http://www.tei-c.org/ns/1.0"><head>SE(3)-equivariant networks</head><p>SE(3)-transformer <ref type="bibr">198</ref>: a specific implementation of SE(3)-equivariant networks using the transformer's self-attention mechanism to achieve SE(3) symmetry, including three-dimensional rotations and translations. The SE(3)-transformer is particularly useful for tasks involving three-dimensional structures, such as protein structure prediction and protein design, where different (x, y, z) coordinates of the same protein structure appearing in different orientations and positions can be treated as the exact same object.</p><p>where <formula notation="TeX">t \sim \mathcal{U}([0, T])</formula> denotes the uniform distribution over <formula notation="TeX">[0, T]</formula> and <formula notation="TeX">\lambda</formula> is a weighting function. In addition, several sampling techniques, such as the predictor–corrector sampler, can be employed to generate good samples. This procedure uses a score-based method (that is, annealed Langevin dynamics) as a corrector after using a numerical approach to sample data from the reverse-time SDE.</p></div>
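<div xmlns="http://www.tei-c.org/ns/1.0"><p>The predictor–corrector procedure described above can be sketched for a toy problem. The example assumes a variance-preserving forward SDE with a constant noise rate, one-dimensional Gaussian data and an analytic score that stands in for a trained score network; all constants and function names are assumptions chosen for illustration. The predictor is an Euler–Maruyama step of the reverse-time SDE, and the corrector is a Langevin step at the new time.</p>
<p><![CDATA[
```python
import math
import random

BETA = 2.0                       # constant noise rate (assumed)
T_END = 5.0                      # final diffusion time (assumed)
DATA_MEAN, DATA_STD = 2.0, 0.5   # toy one-dimensional Gaussian "dataset" (assumed)


def drift(x, t):
    """f(x, t) of a variance-preserving SDE with constant beta."""
    return -0.5 * BETA * x


def diffusion(t):
    """g(t) of the same SDE."""
    return math.sqrt(BETA)


def score(x, t):
    """Analytic score of p_t for Gaussian data; stands in for s_theta(x, t)."""
    decay = math.exp(-BETA * t)
    mean_t = DATA_MEAN * math.sqrt(decay)
    var_t = DATA_STD ** 2 * decay + (1.0 - decay)
    return -(x - mean_t) / var_t


def pc_sample(num_steps, rng, corrector_steps=1, corrector_eps=0.01):
    """Draw one sample by integrating the reverse-time SDE from T_END to 0."""
    dt = T_END / num_steps
    x = rng.gauss(0.0, 1.0)  # the prior p_T is close to N(0, 1)
    for i in range(num_steps):
        t = T_END - i * dt
        g = diffusion(t)
        # Predictor: Euler-Maruyama step of the reverse-time SDE (time runs backwards).
        x = (x - (drift(x, t) - g * g * score(x, t)) * dt
             + g * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        # Corrector: Langevin step(s) targeting the marginal at the new time.
        t_next = t - dt
        for _ in range(corrector_steps):
            x = (x + corrector_eps * score(x, t_next)
                 + math.sqrt(2.0 * corrector_eps) * rng.gauss(0.0, 1.0))
    return x


rng = random.Random(0)
samples = [pc_sample(250, rng) for _ in range(200)]
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1))
```
]]></p><p>Because the score is exact in this toy setting, the generated samples should approximately recover the mean and standard deviation of the data distribution. With a learned score, the corrector step helps to reduce errors introduced by the numerical predictor.</p></div>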
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Improving diffusion models</head><p>The aforementioned diffusion models can be further improved by increasing training speed <ref type="bibr">126,</ref><ref type="bibr">130–133</ref>, increasing data sampling (data generation) speed <ref type="bibr">97,</ref><ref type="bibr">134–139</ref>, integrating them with other neural networks <ref type="bibr">38,</ref><ref type="bibr">120,</ref><ref type="bibr">140–142</ref> and extending them to different data types <ref type="bibr">53,</ref><ref type="bibr">73,</ref><ref type="bibr">143–151</ref>. Many of these improvement strategies are available as open-source tools <ref type="bibr">152</ref> (Table <ref type="table">1</ref>), which has opened up their application to a diverse range of bioinformatics problems (Box 3). Importantly, diffusion models can handle different data types, such as one-dimensional (1D) DNA and protein sequences, two-dimensional (2D) biomedical images, three-dimensional (3D) protein structures and vectorized gene-expression data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Protein design and generation</head><p>The computational generation of new, physically foldable protein structures allows the design of proteins with specific functions or structural properties for protein engineering and drug discovery. However, deep generative models (Box 2), such as VAEs and GANs <ref type="bibr">153–159</ref>, are limited to generating only small proteins or domains of large proteins (for example, of immunoglobulins). Alternatively, diffusion models can be applied to protein design and generation, because large and diverse proteins can be generated by guiding the model at each step of the iterative generation process.</p><p>Protein structures in protein generation <ref type="bibr">153,</ref><ref type="bibr">154</ref> are typically described by a 2D matrix (map) that contains the pairwise distances and angles between all the residues in the protein. For example, ProteinSGM, based on a score-based generative model <ref type="bibr">91</ref>, applies a diffusion model of 2D image generation using such a representation to create protein structures: a score-based generative diffusion model with SDEs is used to generate a series of 2D matrices that include inter-residue pairwise distances d, and the &#969;, &#952; and &#966; angles between two residues. These constraints are then fed into Rosetta <ref type="bibr">160</ref> to build native-like protein structures. For unconditional protein structure generation, ProteinSGM can generate proteins from random noise. For conditional protein structure generation, such as scaffold inpainting and functional site inpainting, the tool can generate protein structures that satisfy user-defined constraints, similar to solving an image inpainting problem. 
However, ProteinSGM requires post-processing by Rosetta using Markov chain Monte Carlo (Box 2), which makes the prediction computationally expensive.</p><p>Unlike ProteinSGM, Foldingdiff <ref type="bibr">101</ref> represents the protein backbone structures (only N–C&#945;–C atoms for each residue) with a series of consecutive angles to capture the relative orientation of the constituent amino acid residues. A simple language transformer model <ref type="bibr">31</ref> with DDPM can then be applied to generate protein structures unconditionally, as the angles are invariant to translation and rotation. However, using a transformer to predict sequence-like consecutive angles has the drawback that errors from early predictions accumulate and considerably affect the final structure, including collisions between atoms. In addition, the approach cannot be generalized to generate complex structures with more than one chain.</p><p>Inspired by Foldingdiff, DiffSDS <ref type="bibr">102</ref> introduces a 1D directional representation derived from invariant atom features, similar to torsion angle representation, which enables an encoder–decoder language model to perform the diffusion process. In the language model, the encoder (with a hidden atom-direction-space layer) transforms the invariant features into equivalent direction vectors, whereas the decoder reverses the transformation. 
By performing the diffusion process in this direction and by conditioning angle spaces on geometric restraints, DiffSDS can restore protein backbone structures of higher quality than the deep-learning-based protein design method RFDesign <ref type="bibr">156</ref>: compared with RFDesign, DiffSDS is two times better at generating proteins that resemble natural proteins (protein likeness), as measured by Rosetta energies, about 18 times better in terms of connectivity errors and 60% better at generating non-overlapping scores with existing backbones.</p><p>The integration of diffusion models with GNNs <ref type="bibr">30</ref> (Box 2) enables the direct generation of 3D protein coordinates, resulting in an end-to-end generative model. SE(3)-equivariant <ref type="bibr">161,</ref><ref type="bibr">162</ref> (Box 2) DDPMs, which are usually used in small-molecule generation, can also be applied to generate protein structures in a representation-frame-independent manner <ref type="bibr">163</ref>. For example, independent DDPM models equipped with invariant point attention <ref type="bibr">163</ref> structural modules can be trained with the distribution of atom features (for example, coordinates in a canonical frame with respect to backbone atoms, residue type and side-chain angles) to generate a protein's backbone, sequence and side-chain rotamers <ref type="bibr">84</ref>. By jointly diffusing the structure and sequence, while incorporating coarse structural constraints, the model can gradually generate the fully atomistic protein structure and sequence, allowing controllable protein backbone generation and protein structure inpainting. The sequence recovery rate of this method is comparable to that of other machine-learning-based and physics-based methods, such as 3DConv <ref type="bibr">164</ref>, RosettaFixBB and RosettaReIBB <ref type="bibr">165</ref>. 
Similarly, Genie <ref type="bibr">103</ref> makes use of the SE(3)-equivariant feature from the invariant point attention module in conjunction with DDPM to generate protein backbones unconditionally, also introducing geometric asymmetry with an invariant encoder to directly inject noise into residue coordinates, as well as an SE(3)-equivariant decoder with an invariant point attention module to predict noise.</p><p>SMCDiff <ref type="bibr">104</ref> applies a similar deep learning architecture (that is, an SE(3)-equivariant GNN) (Box 2) to the motif-scaffolding generation problem, dividing the problem into two parts: unconditional protein backbone generation (ProtDiff) and conditional sampling in diffusion models based on a protein motif (SMCDiff), similar to inpainting. Unconditional protein generation is achieved by training an SE(3)-equivariant GNN (Box 2), built from residue coordinates and embedded features from the protein sequence, to generate protein backbones. By contrast, conditional sampling is formulated on an unconditional diffusion model as a sequential Monte Carlo simulation problem, which may be solved by particle filtering. However, the network does not include torsion angles as features and may therefore generate unnatural proteins (for example, left-handed helices). SMCDiff was the first deep generative model that leveraged the power of diffusion models to address the motif-scaffolding generation problem.</p><p>RFdiffusion <ref type="bibr">105</ref>, which integrates a conditional DDPM diffusion model with the pre-trained protein 3D structure prediction model RoseTTAFold <ref type="bibr">166</ref>, can directly generate final 3D coordinates. Inspired by the recycling process in AlphaFold2, a self-conditioning prediction strategy is applied, in which the current prediction is conditioned on the prediction from the previous time steps, thereby considerably improving the performance of the model. 
Starting from random noise, First di usion probabilistic models <ref type="bibr">32</ref> First-score-based graph generative models <ref type="bibr">151</ref> Denoising di usion implicit models <ref type="bibr">97</ref> Score-based generative modelling through SDE <ref type="bibr">35</ref> Improved denoising di usion probabilistic models <ref type="bibr">130</ref> Discrete denoising di usion model <ref type="bibr">72</ref> Analytic DPM: improve log likelihood, speed up DPM <ref type="bibr">138</ref> Di usion distillation for rapid sampling <ref type="bibr">133</ref> FoldingDi : single-chain protein structure generation <ref type="bibr">101</ref> Di SDS: protein backbone structure inpainting <ref type="bibr">102</ref> Chroma <ref type="bibr">107</ref> and Rfdi usion <ref type="bibr">105</ref> : protein complex generation DPM-solver: a fast ODE solver for DPM sampling in 10 steps <ref type="bibr">136</ref> EDM: small-molecule generation <ref type="bibr">54</ref> DGSM: predict 3D conformations from 2D graphs <ref type="bibr">110</ref> Di BP: ligand generation with known protein pocket <ref type="bibr">114</ref> Di SBDD: protein-ligand docking with known pocket <ref type="bibr">115</ref> High-resolution image synthesis with latent di usion models 36 ProteinSGM: protein structure generation <ref type="bibr">91</ref> ProSSDG: protein structure and sequence generation <ref type="bibr">84</ref> SMCDi : motif sca olding <ref type="bibr">104</ref> SDEGen: molecule generation from conformation and graph <ref type="bibr">111</ref> NeuralPLexer: protein-ligand structure prediction <ref type="bibr">117</ref> Di Bridge: molecule structure generation <ref type="bibr">109</ref> Di Dock: protein-ligand structure prediction <ref type="bibr">116</ref> Di Linker: molecule linkers generation <ref type="bibr">113</ref> Genie: protein backbone structure generation <ref type="bibr">103</ref> NERE: protein-ligand a inity prediction <ref type="bibr">118</ref> CDGS: molecular graph generation 
<ref type="bibr">108</ref> DreamFusion: text to 3D using 2D di usion <ref type="bibr">256</ref> Di MD: molecular dynamics simulation <ref type="bibr">112</ref> Video di usion models <ref type="bibr">61</ref> Enhancing DPM sample quality with self-attention <ref type="bibr">255</ref> DISPR: single-cell image reconstruction <ref type="bibr">99</ref> Analog bits: generating discrete data using di usion models <ref type="bibr">150</ref> DPM-solver++: fast solver for guided sampling of DPM <ref type="bibr">139</ref>  CryoDRGN: cryo-EM data reconstruction <ref type="bibr">236</ref> FrameDi : protein backbone generation <ref type="bibr">106</ref> RFdiffusion can generate large protein structures unconditionally, which can then be used in the design of protein monomers. Using protein motif coordinates as input, RFdiffusion can also construct scaffolds conditionally for functional motif and enzyme active site scaffolding <ref type="bibr">105</ref> . Given a point group symmetry, RFdiffusion can maintain the symmetry during the prediction owing to the equivariance design of RoseTTAFold. Therefore, this approach can be applied to symmetric protein oligomer and motif scaffolding (for example, for the design of therapeutic <ref type="bibr">167</ref> and metal-binding proteins <ref type="bibr">168,</ref><ref type="bibr">169</ref> ). We note that compared to the other methods discussed in this section, some proteins designed by RFdiffusion have not only been validated in silico, but also by biochemical and biophysical experiments <ref type="bibr">170,</ref><ref type="bibr">171</ref> , making it one of the first generative artificial intelligence methods of protein design that have been experimentally validated. Furthermore, RFdiffusion outperforms other methods, such as RFDesign, in the design of large protein structures and high-order protein oligomers, demonstrating the advantage of diffusion models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Diffusion equation development</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Optimization of diffusion equations</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Tipping point of diffusion applications</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Applications in bioinformatics</head><p>FrameDiff <ref type="bibr">106</ref> applies diffusion models to explore whether a pre-trained protein structure predictor is necessary for protein backbone generation. Here, using denoising score matching, a principled SE(3) diffusion model can better formulate the protein backbone generation problem. Compared to RFdiffusion, FrameDiff achieves comparable performance with four-fold fewer network weights and without the need to train another protein structure prediction network.</p><p>Chroma <ref type="bibr">107</ref> is a GNN <ref type="bibr">30</ref> -based conditional diffusion model designed to generate large single-chain proteins and protein complexes with desired properties and functions. This model can generate protein structures that are over 3,000 residues in size, which surpasses the size limit (&lt;2,000 residues) for proteins generated by several other networks (that is, ProteinSGM, FoldingDiff, DiffSDS and SMCDiff). To reduce computational complexity, Chroma uses a random graph generation procedure that preserves both short- and long-range interactions. As a result, Chroma can produce high-quality, diverse new protein structures, and enables the programmable generation of proteins that are conditioned on several different properties, such as residue&#8211;residue distances, symmetry and shape.</p></div>
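<div xmlns="http://www.tei-c.org/ns/1.0"><p>The SE(3) formulation used for backbone generation and the residue&#8211;residue distance conditioning mentioned above both rest on the fact that rigid motions leave inter-residue distances unchanged. The following minimal sketch illustrates this property on toy coordinates; the function names and the choice of a rotation about the z axis are illustrative assumptions and are not taken from FrameDiff or Chroma.</p>
<eg><![CDATA[
```python
import math


def rigid_transform(coords, angle, shift):
    """Rotate points about the z axis by `angle` (radians), then translate by `shift`.

    Hypothetical helper: a single-axis rotation stands in for a general SE(3) motion.
    """
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]


def pairwise_distances(coords):
    """All inter-point distances, in a fixed (i < j) order."""
    return [math.dist(coords[i], coords[j])
            for i in range(len(coords)) for j in range(i + 1, len(coords))]


# Toy "backbone" of four points (arbitrary coordinates, in angstroms).
backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.1, 4.2, 2.0)]
moved = rigid_transform(backbone, angle=0.7, shift=(10.0, -4.0, 2.5))

# The coordinates change, but every pairwise distance is preserved.
max_change = max(abs(a - b) for a, b in
                 zip(pairwise_distances(backbone), pairwise_distances(moved)))
```
]]></eg>
<p>In this sketch, <code>max_change</code> is zero up to floating-point error, which is why distance-based conditions can be specified independently of how a structure is placed in space.</p></div>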
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Small-molecule generation and drug design</head><p>Drug discovery involves the identification and optimization of small molecules that can interact with specific biological targets, such as enzymes or receptors, to modulate their activity and ultimately achieve a therapeutic effect. Deep learning, particularly deep generative models, enables the rapid generation and evaluation of a large number of such potential drug candidates <ref type="bibr">172&#8211;175</ref> .</p><p>The conditional diffusion model based on discrete graph structures (CDGS), a deep learning method, allows the generation of molecular graphs of small molecules with similar data distributions to real-number molecular graphs <ref type="bibr">108</ref> . This method employs a hybrid message-passing block architecture, which comprises a standard message-passing layer for collecting local features, such as node-edge dependencies, and an attention-based message-passing layer for extracting and transmitting global information in the architecture. The molecular graphs are embedded with distinct components for node features and edge matrices, with channels for edge existence and edge types. The CDGS model has enabled the application of diffusion models in the molecular graph domain, which is crucial for drug discovery and material science. This approach accurately models the complex dependency between graph structures and features during the generative process, using SDEs to describe the graph diffusion process. The continuous forward process is applied directly to edge existence variables, and the reverse process first decodes discrete graph structures, which serve as the condition for each sampling step. A specialized hybrid graph noise prediction model is used to extract global and local node-edge dependencies from intermediate graph states. 
This diffusion-based model can obtain high-fidelity samples in 200 steps of network evaluations using the Euler&#8211;Maruyama method <ref type="bibr">176</ref> . In addition, a fast ordinary differential equation solver, which applies the semi-linear structure of probability flow ordinary differential equations for graphs, promotes rapid, high-quality graph sampling. CDGS outperforms other methods in molecular graph generation, including flow-based methods (for example, GraphAF <ref type="bibr">177</ref> , GraphDF <ref type="bibr">178</ref> , MoFlow <ref type="bibr">179</ref> and GraphCNF <ref type="bibr">180</ref> ) and other diffusion models (EDP-GNN <ref type="bibr">151</ref> , GraphEBM <ref type="bibr">181</ref> and GDSS <ref type="bibr">148</ref> ). CDGS also performs better in generic graph generation than ER <ref type="bibr">182</ref> , VGAE <ref type="bibr">183</ref> , GraphRNN <ref type="bibr">184</ref> and GRAN <ref type="bibr">185</ref> , demonstrating its potential to facilitate drug discovery and material design by representing molecular structures and restricting the molecule search space.</p><p>The E(3)-equivariant diffusion model (EDM) <ref type="bibr">54</ref> (Box 2) performs the diffusion process on atom coordinates and atom types in the Euclidean space to generate small molecule structures with up to 29 atoms, compared to nine heavy atoms that can be achieved with equivariant normalizing flows <ref type="bibr">186</ref> . An EDM represents each small molecule as a point cloud that can be described by a graph with nodes v_i &#8712; V representing atoms in the molecule based on an equivalent transformation, thereby combining the equivariant GNN and the diffusion process. 
The former contains L layers of equivariant graph convolutional layers that take each atom's 3D coordinates and features as input to model molecule structures with geometric symmetries, whereas the latter gradually adds Gaussian noise to both the coordinates and features of the atom, thereby improving training, performance and scalability, compared to other E(3)-equivariant models, such as G-Schnet <ref type="bibr">187</ref> and equivariant normalizing flows <ref type="bibr">186</ref> as well as graph-based molecule-generative models, such as GraphVAE <ref type="bibr">188</ref> , GraphTransformer <ref type="bibr">189</ref> and Set2GraphVAE <ref type="bibr">190</ref> .</p><p>Based on the equivariant GNN architecture and inspired by the physics governing the formation of small molecules, the Lyapunov function applies physical and statistical prior information (diffusion informative prior bridge) <ref type="bibr">109</ref> to guide the diffusion process in model training and generate high-quality and realistic molecules. In this approach, problem-dependent prior information, in particular, physical and statistical information, is injected into the diffusion process instead of imposing or improving deep learning architectures. Several energy functions, integrated with the physical and statistical prior information, are then used as a prior bridge to guide the model training without any extra modification of the equivariant GNN architecture. 
Thereby, the Lyapunov function shows better molecule-generation performance in terms of physical energy and molecule stability <ref type="bibr">109</ref> and better uniformity-promoted 3D point cloud generation compared to EDM <ref type="bibr">54</ref> and point cloud diffusion <ref type="bibr">143</ref> , which apply the traditional Gaussian noise in model training, as well as equivariant normalizing flows <ref type="bibr">186</ref> .</p><p>Dynamic graph score matching (DGSM) <ref type="bibr">110</ref> is a deep learning model developed for predicting stable 3D conformations from 2D molecular graphs, primarily used in computational chemistry. The model can also be extended to protein sidechain conformation prediction and complex multi-molecular prediction (for example, predicting the interaction of more than three small molecules without explicit bonds) <ref type="bibr">110</ref> . Deep learning methods often consider only the local interactions between bonded atoms, while neglecting the long-range interactions among unbound atoms, which are crucial for constructing accurate 3D molecular structures. To overcome this limitation, DGSM treats each molecule as a graph g = &lt;v, e&gt;, where a node in v represents an atom and its features (for example, coordinates), and an edge in e represents a bond between two atoms. The distance D_ij between each pair of atoms, that is, the edge length in the graph, can then be computed from their coordinates. For each pair of unbound atoms, the distance D_ij can be perturbed by a Gaussian noise level at each training step. A message passing neural network <ref type="bibr">191</ref> is then applied, using edge length and edge type in the graph as inputs to dynamically embed the molecular 2D graph by adding Gaussian noise to the distance between pairs of unbound atoms. Using the score-matching method, the model can then directly estimate the gradient fields of the logarithm density of atomic coordinates. 
Importantly, the model can be trained in an end-to-end fashion, thereby addressing the limitation of physics-based simulation methods that do not account for long-range interactions between non-bonded atoms. Thus, DGSM outperforms other methods, including RDKit <ref type="bibr">192</ref> , CGCF <ref type="bibr">193</ref> and ConfGF <ref type="bibr">144</ref> in terms of matching score and coverage score, confirming the benefit of modelling long-range interactions.</p><p>SDEGen <ref type="bibr">111</ref> is a multi-stage diffusion model that can generate molecules by adopting multiple architectures in different stages with different purposes; here, molecular conformations, including distances between two atoms within three-hop edges, edge type and atom type, and their corresponding graphs, are used as inputs for three different multilayer perceptrons to generate their embeddings. The distance embeddings are corrupted by Gaussian noise and the atom-type embeddings are then updated by a GNN (Box 2). The noisy distance embeddings, edge-type embeddings and the updated atom-type embeddings are then combined into final bond embeddings. Finally, the SDE network is parameterized. This multi-stage model is not as streamlined as end-to-end models, but it outperforms several other models, including DGSM <ref type="bibr">194</ref> , CGCF <ref type="bibr">193</ref> , ConfGF <ref type="bibr">144</ref> , CVGAE <ref type="bibr">195</ref> and DMCG <ref type="bibr">196</ref> , by multiple metrics, such as coverage score and matching score, in particular, when considering long-range interactions in molecules.</p><p>DiffMD <ref type="bibr">112</ref> is a score-based denoising diffusion model that can be applied to improve molecular dynamics simulations. Deep-learning-based molecular dynamics models typically depend on intermediate force fields and can thus only be applied to static molecules, not considering thermodynamics. 
DiffMD addresses this problem by applying score-based conditional diffusion models, employing the equivariant geometric transformer to take atomic coordinates, velocity and features embedded in molecular dynamics trajectories directly as input. In each layer, the model introduces velocities, directions and other geometric information using the spherical Fourier&#8211;Bessel transformation to update the input information. During the diffusion process, the conditional noise, based on the accelerations of atoms in previous frames, is added to the inputs for the equivariant geometric transformer to estimate the score function, that is, the gradient of the log density of the biomolecule conformations. DiffMD outperforms several deep-learning-based molecular dynamics methods, including tensor field networks <ref type="bibr">162</ref> , radial fields <ref type="bibr">197</ref> , SE(3)-transformers <ref type="bibr">198</ref> , graph mechanics networks <ref type="bibr">199</ref> and SCFNN <ref type="bibr">200</ref> in terms of average root-mean-squared error.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>[Figure: forward and reverse diffusion processes between bioinformatics data and noise, including annealed Langevin dynamics. Caption fragment: w represents the standard Wiener process, known as Brownian motion; f(x, t) and g(t) are the drift and diffusion coefficients of the SDE, respectively; and p(x) is the probability density function.]</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Fragment-based drug design can also be used for the discovery of new small molecules in a 3D space. Here, the aim is to design linkers consisting of atoms that can connect molecular fragments into a complete molecule. DiffLinker <ref type="bibr">113</ref> uses an E(3)-equivariant 3D conditional diffusion model to generate these molecular linkers and to connect multiple molecular fragments to form a single connected molecule. The prediction is made by applying a GNN to predict the linker size (the atom number of the linker) and atom types. The coordinates of the atoms are sampled from the normal distribution, followed by a reverse diffusion process of the atom features conditioned on the input fragments. 
Compared to DeLinker <ref type="bibr">201</ref> and 3DLinker <ref type="bibr">202</ref> , DiffLinker can perform better in terms of average quantitative estimation of drug-likeness, synthetic accessibility, the average number of rings in the linker, and the validity, uniqueness and novelty of the samples, thereby generating more realistic molecules.</p></div>
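<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Euler&#8211;Maruyama sampling mentioned above for CDGS can be illustrated on a toy problem. The sketch below is not the CDGS sampler: it assumes a one-dimensional Gaussian data distribution N(m, s&#178;) and a variance-preserving forward SDE with a constant rate <code>beta</code>, so that the score of every noisy marginal is available in closed form and no trained network is required. All function names and parameter values are illustrative assumptions.</p>
<eg><![CDATA[
```python
import math
import random


def reverse_sde_sample(m, s, beta=8.0, t_end=1.0, n_steps=200, rng=None):
    """Draw one sample by integrating a reverse-time SDE with Euler-Maruyama.

    Assumed forward SDE: dx = -0.5 * beta * x dt + sqrt(beta) dw.
    Assumed data distribution: N(m, s^2), which gives an analytic score.
    """
    rng = rng or random.Random()
    dt = t_end / n_steps
    x = rng.gauss(0.0, 1.0)  # start from the (approximately) N(0, 1) prior
    for i in range(n_steps):
        t = t_end - i * dt
        decay = math.exp(-beta * t)
        mean_t = m * math.sqrt(decay)            # mean of the marginal p_t
        var_t = s * s * decay + (1.0 - decay)    # variance of the marginal p_t
        score = -(x - mean_t) / var_t            # d/dx log p_t(x)
        drift = -0.5 * beta * x                  # forward drift f(x, t)
        noise = rng.gauss(0.0, 1.0)
        # Reverse-time step: x <- x - [f - g^2 * score] dt + g * sqrt(dt) * z
        x = x - (drift - beta * score) * dt + math.sqrt(beta * dt) * noise
    return x


def sample_many(n, m=2.0, s=0.5, seed=0):
    """Draw n independent samples with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [reverse_sde_sample(m, s, rng=rng) for _ in range(n)]
```
]]></eg>
<p>With 200 steps, the empirical mean and standard deviation of <code>sample_many(2000)</code> should lie close to the assumed target values of 2.0 and 0.5; in a real model, the analytic <code>score</code> would be replaced by the output of a trained score network.</p></div>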
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Protein-ligand interaction modelling</head><p>Predicting the conformation of a ligand bound to a protein is important in the investigation of protein&#8211;ligand interactions and protein function as well as for the discovery of new drugs. Various protein&#8211;ligand docking, machine learning and auto-regressive models have been developed to address this problem <ref type="bibr">10,</ref><ref type="bibr">203&#8211;206</ref> ; however, these approaches are limited by their low geometrical accuracy. Alternatively, DiffBP <ref type="bibr">114</ref> can generate ligands that bind to a specific protein pocket without requiring the ligand structure as input; here, a pre-generation network is used to generate the centre of mass and atom number of the ligand, followed by diffusion models in conjunction with equivariant GNNs <ref type="bibr">161,</ref><ref type="bibr">207</ref> to generate high-quality ligand candidates <ref type="bibr">33,</ref><ref type="bibr">35</ref> . 
Compared to auto-regressive methods, such as 3DSBDD <ref type="bibr">205</ref> , Pocket2Mol <ref type="bibr">206</ref> and GraphBP <ref type="bibr">203</ref> , which generate one atom at a time without considering interactions among all atoms, DiffBP can generate all atoms of a ligand that bind to a target protein, exhibiting high binding affinities (for example, 41.07% <ref type="bibr">114</ref> for DiffBP, compared to 12.22% <ref type="bibr">114</ref> for 3DSBDD, 23.98% <ref type="bibr">114</ref> for Pocket2Mol and 29.54% <ref type="bibr">114</ref> for GraphBP) on the CrossDocked <ref type="bibr">208</ref> dataset curated from protein&#8211;ligand complex structures in the Protein Data Bank (PDB).</p><p>DiffSBDD <ref type="bibr">115</ref> adopts a DDPM equipped with an E(3)-equivariant neural network to generate new ligands, including atomic features binding to specific protein pockets; here, ligand generation can either be protein-conditioned, based on the binding site to the protein, or the ligand can be inpainted after learning the joint distribution of the protein&#8211;ligand complexes. Compared to 3DSBDD and Pocket2Mol, DiffSBDD can generate more diverse ligands with higher affinity on the CrossDocked dataset <ref type="bibr">115</ref> .</p><p>Unlike diffusion models applied for protein pocket docking, DiffDock <ref type="bibr">116</ref> uses the structure of the protein and ligand as input and does not require knowledge of the location of the binding site (that is, blind docking); here, the diffusion process is applied to ligand positions, represented by ligand translation and rotation, sampling multiple positions, which are then ranked based on a confidence score using a trained scoring model and a trained confidence model, which are built on top of SE(3)-equivariant GNNs (Box 2). 
The scoring model samples different positions of the ligand, and the confidence model selects the ligand positions with the highest confidence score, similar to the structural and scoring modules of AlphaFold2 <ref type="bibr">163</ref> for protein structure prediction. DiffDock has been tested on the PDBBind dataset, outperforming search-based methods, such as SMINA <ref type="bibr">209</ref> , QuickVina-W <ref type="bibr">210</ref> , GLIDE <ref type="bibr">211</ref> and GNINA <ref type="bibr">212</ref> , and the deep learning methods EquiBind <ref type="bibr">213</ref> and TANKBind <ref type="bibr">214</ref> . Specifically, DiffDock achieved a top-1 success rate of 38.2% (the percentage of top-1 predictions with root-mean-square deviation &lt;2 &#197;).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Box 3</head><p>A practical guide for applying diffusion models in bioinformatics</p><p>Diffusion models are particularly useful in the generation, design or analysis of small molecules, proteins and biological images.</p><p>To decide which diffusion model to apply to a specific problem, the data type (for example, small molecules) must be represented in a form that is suitable for processing by a deep learning model in the diffusion process. The conformations of small molecules and drugs can be represented in several ways to facilitate the diffusion process; for example, they can be treated as a string, such as the SELF-referencing embedded string (SELFIES), which can be converted into a two-dimensional (2D) matrix. This matrix can be used as input for graph neural networks (GNNs) under a diffusion model framework to generate three-dimensional (3D) molecular graphs, as exemplified by dynamic graph score matching <ref type="bibr">110</ref> . Alternatively, they can be presented as 3D graphs that contain spatial direction and torsion angles between atoms, which can be used by a combination of SE(3)-equivariant GNNs <ref type="bibr">162</ref> and diffusion models, such as the E(3)-equivariant diffusion model <ref type="bibr">54</ref> to capture their essential properties. In addition, small molecules can be represented as 3D atomic point clouds to be processed by equivariant GNNs, as in DiffLinker <ref type="bibr">113</ref> . Proteins can be represented as either one-dimensional (1D) sequential features suitable for a 1D transformer or 2D contact and distance maps suitable for processing by convolutional neural networks. The 3D structure of proteins is usually represented as graphs that consist of nodes denoting residues and edges that represent residue pairs in contact, which can be handled by both standard GNNs and SE(3)-equivariant GNNs in combination with diffusion models. 
For imaging data, such as cryo-electron microscopy images, various diffusion models initially developed for image generation, such as CascadedDiff <ref type="bibr">60</ref> , can be applied. Biomolecules or cell shapes may also be represented by 3D images, which can be reconstructed from 2D images by a combination of autoencoder or U-Net <ref type="bibr">237</ref> with a diffusion model, as in CryoDRGN <ref type="bibr">98</ref> and DISPR <ref type="bibr">99</ref> . These can model the distribution of ground-truth data to generate higher-quality 3D images than other generative artificial intelligence methods. For example, DISPR outperforms a VAE-based deep generative model SHAPR <ref type="bibr">238</ref> in the context of 3D cell shape reconstruction.</p><p>DiffDock's top-1 success rate of 38.2% is significantly better than that of the energetics-based method GLIDE (21.8%; P = 2.7 &#215; 10&#8315;&#8311;) and the geometric deep-learning-based method TANKBind (20.4%; P = 1.0 &#215; 10&#8315;&#185;&#178;) <ref type="bibr">116</ref> .</p><p>Similar to DiffDock, NeuralPLexer <ref type="bibr">117</ref> is a deep generative network that leverages SDEs to predict complex protein&#8211;ligand structures based on the protein structure and molecular graphs of the ligand as input in blind docking. The key component in the model is an equivariant structure diffusion module, which predicts the atomic coordinates on a heterogeneous graph formed by protein atoms, ligand atoms, protein backbone frames and ligand local frames. Using SDEs, the model can handle unbound or predicted protein structure inputs and can automatically accommodate changes in the protein structure in response to ligand binding. 
Compared with the deep learning method EquiBind <ref type="bibr">213</ref> and the physics-based method CB-Dock <ref type="bibr">215</ref> on the PDBBind <ref type="bibr">216</ref> dataset, NeuralPLexer can generate a more accurate ligand structure with higher geometrical accuracy, with an approximately 70% success rate for a ligand with root-mean-square deviation &lt;2 &#197;, which is higher than that of EquiBind (about 40%) and CB-Dock (about 38%) and has a lower steric clash rate of 0.105.</p><p>Finally, a deep generative energy-based diffusion model can predict the binding affinity for a protein&#8211;ligand pair, if trained with a set of protein&#8211;ligand complexes, without requiring labels for binding affinities <ref type="bibr">118</ref> . During training, the network first predicts the rotation score for the perturbed ligand with respect to the protein pocket using an equivariant rotation prediction network, called Neural Euler's Rotation Equation (NERE). By training the model with the SE(3) denoising score matching, the log-likelihood is considered to be the binding affinity between the protein and ligand in a pair. Tested on the protein&#8211;ligand dataset PDBbind <ref type="bibr">216</ref> and the structural antibody database SAbDab <ref type="bibr">217</ref> , the model achieves an accuracy of 0.656 in predicting protein&#8211;ligand binding affinity, which is better than that of other unsupervised methods: 0.647 for Molecular Mechanics Generalized Born Surface Area <ref type="bibr">218</ref> (MM/GBSA), 0.617 for Astex Statistical Potential <ref type="bibr">219</ref> (ASP) and 0.602 for DrugScore2018 (ref. 220). 
This model further performs comparably to other supervised methods in predicting antibody&#8211;antigen binding <ref type="bibr">118</ref> : Zlab RerANK <ref type="bibr">221</ref> , ZRANK2 <ref type="bibr">222</ref> , RosettaDock <ref type="bibr">223</ref> , PyDock <ref type="bibr">224</ref> , Scoring by Intermolecular Pairwise Propensities of Exposed Residues (SIPPER) <ref type="bibr">225</ref> , Atomic Potential Protein Interactions Scored Atomically (AP_PISA) <ref type="bibr">226</ref> , Coarse Grained Protein Interaction Energy (CP_PIE) <ref type="bibr">227</ref> and FIREDOCK <ref type="bibr">228</ref> .</p></div>
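<div xmlns="http://www.tei-c.org/ns/1.0"><p>The docking success rates quoted above for DiffDock and NeuralPLexer are defined as the percentage of predictions with a root-mean-square deviation below 2 &#197;. A minimal sketch of this metric is given below. It assumes that predicted and reference ligand poses list the same atoms in the same order and ignores molecular symmetry, which dedicated evaluation tools may correct for; the function names are hypothetical.</p>
<eg><![CDATA[
```python
import math


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses with matched atom order."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    sq = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b):
        sq += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(sq / len(pose_a))


def top1_success_rate(ranked_predictions, references, threshold=2.0):
    """Fraction of complexes whose top-ranked pose has RMSD < threshold (angstroms).

    ranked_predictions[k] is a list of poses for complex k, best-ranked first.
    """
    hits = 0
    for poses, ref in zip(ranked_predictions, references):
        if rmsd(poses[0], ref) < threshold:
            hits += 1
    return hits / len(references)


# Toy example: a two-atom ligand, one near-native pose and one displaced pose.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]   # shifted by 1 angstrom
far = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]    # shifted by 3 angstroms
rate = top1_success_rate([[near, far], [far, near]], [reference, reference])
```
]]></eg>
<p>Only the first pose of each ranked list counts towards the top-1 rate, so the second toy complex is scored as a failure even though a near-native pose appears lower in its ranking.</p></div>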
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Cryo-electron microscopy data analysis</head><p>Single-particle cryo-electron microscopy (cryo-EM) <ref type="bibr">229&#8211;235</ref> is a key imaging technique for determining and visualizing the 3D conformation (structure) of large biomolecular complexes (for example, protein complexes) at atomic resolution; here, the images of protein complexes obtained by cryo-EM are used to reconstruct their 3D conformation represented by 3D density maps.</p><p>The protein structure reconstruction method CryoDRGN <ref type="bibr">236</ref> introduces a latent variable Z to define a conformational space V for a protein complex on cryo-EM density maps. CryoDRGN is based on a VAE framework that learns a continuous distribution in the latent space for protein structures from cryo-EM data. However, although CryoDRGN can simulate complicated structural dynamics, the Gaussian prior distribution of VAE does not match the posterior aggregate approximation, which limits the generative capability of the model <ref type="bibr">236</ref> . Alternatively, a continuous-time diffusion model (that is, score SDEs) can be implemented in CryoDRGN to learn a high-quality generative model for capturing protein conformations directly from cryo-EM imaging data. This CryoDRGN <ref type="bibr">98</ref> model is first trained with the standard VAE model using cryo-EM images in Fourier space. The latent space Z, which is predicted by the encoder of the trained VAE, is then fed into the denoising diffusion model based on a ResNet architecture <ref type="bibr">28</ref> to approximate the distribution of the latent variable Z. 
Finally, the synthesized latent variable Z, which is sampled from the diffusion model and is similar to the target protein's distribution, is used as input for the decoder of the VAE to generate protein structures with better quality (higher similarity with the target proteins' distribution) than a VAE, which directly reconstructs protein structures by learning continuous distribution in latent space.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Single-cell image and gene-expression analysis</head><p>Reconstructing the 3D shape of a cell from a single-cell 2D microscopy image using computational methods is useful for studying the morphological features of cells. However, each 2D image may permit multiple 3D reconstructions, and therefore, different 2D slices may lead to different predictions of the 3D shape. To tackle this issue, DISPR <ref type="bibr">99</ref> employs the U-net architecture <ref type="bibr">237</ref> and a diffusion process to generate a single-cell 3D shape from 2D images. During training and evaluation, this approach uses a 2D image of an individual cell as an inductive bias. The 2D image is then concatenated with its 3D Gaussian noisy segmentation mask as input for the diffusion-based model to predict realistic 3D cell shapes. DISPR benefits from this training approach and its stochastic property. Unlike VAE-based architectures used in SHAPR <ref type="bibr">238</ref> and its variants <ref type="bibr">239</ref> , which produce a single, deterministic reconstruction, DISPR employs a stochastic model trained on Gaussian noise and is thus capable of predicting an infinite number of cell shapes, providing a more comprehensive representation of dynamic cell structures. DISPR represents the first use of a diffusion model in the context of 3D cell shape reconstruction, outperforming VAE-based deep generative models, such as SHAPR <ref type="bibr">238</ref> , in terms of volume, surface area and roughness reconstruction <ref type="bibr">99</ref> .</p><p>Single-cell RNA sequencing can assess the expression of genes in individual cells. However, cells typically contain low quantities of RNA, which may cause noisy measurements (for example, varied measurements and experimental bias) of gene expression; moreover, values may be missed (dropouts). Therefore, it is important to denoise single-cell RNA-sequencing data and impute missing values. 
DEWAKSS <ref type="bibr">100</ref> applies a diffusion model with a K-nearest-neighbour (KNN) graph to select denoising hyperparameters using the noise2self self-supervision method, thereby not depending on an explicit noise model but on an invariant function of data features. Unlike heuristic-based methods, such as MAGIC <ref type="bibr">240</ref> and KNN-smoothing <ref type="bibr">241</ref> , which also use KNN graph architecture but may lead to over-smoothing of data variance, DEWAKSS can preserve variances across multiple gene-expression dimensions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Open-source diffusion model tools</head><p>Some diffusion models that can be applied to bioinformatics have been implemented as open-source tools (Table <ref type="table">2</ref>). However, these tools do not use NCSNs <ref type="bibr">34</ref> as the diffusion framework, mainly because NCSNs face problems in terms of sampling and training and thus cannot achieve high accuracy in image generation. Therefore, NCSNs are less adopted in bioinformatics and computational biology than DDPMs <ref type="bibr">33</ref> and score SDEs <ref type="bibr">35</ref> , which are equipped with efficient sampling and training methods <ref type="bibr">32</ref> for high-definition image generation. Nevertheless, as the first diffusion model, NCSN has made substantial contributions to the development of the field. Furthermore, many bioinformatics applications also include deep learning components to deal with data generation and denoising challenges specific to their application.</p></div>
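<div xmlns="http://www.tei-c.org/ns/1.0"><p>The self-supervised selection of denoising hyperparameters described above for DEWAKSS can be illustrated with a simplified sketch. The code below is not the DEWAKSS implementation: it replaces graph diffusion with plain averaging over K nearest neighbours, and it selects only the neighbourhood size. What it retains is the noise2self idea that each cell is predicted from its neighbours without using its own measured values, so that the prediction error can be evaluated without an explicit noise model. All names and the toy data are assumptions.</p>
<eg><![CDATA[
```python
import random


def knn_indices(data, i, k):
    """Indices of the k nearest cells to cell i (Euclidean), excluding i itself."""
    dists = []
    for j, row in enumerate(data):
        if j != i:
            d = sum((a - b) ** 2 for a, b in zip(data[i], row))
            dists.append((d, j))
    dists.sort()
    return [j for _, j in dists[:k]]


def denoise(data, k):
    """Replace each cell by the mean of its k neighbours (requires k < number of cells).

    The cell's own values are never averaged in; the neighbour search itself
    still uses the cell's profile, as is usual for kNN graphs.
    """
    n_genes = len(data[0])
    out = []
    for i in range(len(data)):
        nbrs = knn_indices(data, i, k)
        out.append([sum(data[j][g] for j in nbrs) / k for g in range(n_genes)])
    return out


def self_supervised_mse(data, k):
    """Mean squared error between neighbour-based predictions and observed values."""
    pred = denoise(data, k)
    n = len(data) * len(data[0])
    return sum((p - x) ** 2 for pr, row in zip(pred, data) for p, x in zip(pr, row)) / n


def select_k(data, candidates):
    """Pick the neighbourhood size with the lowest self-supervised error."""
    return min(candidates, key=lambda k: self_supervised_mse(data, k))


def toy_cells(seed=0, n_per_cluster=20):
    """Two noisy clusters of cells with two 'genes' each (synthetic data)."""
    rng = random.Random(seed)
    return [[c + rng.gauss(0, 0.3), -c + rng.gauss(0, 0.3)]
            for c in (0.0, 5.0) for _ in range(n_per_cluster)]
```
]]></eg>
<p>Calling <code>select_k(toy_cells(), [1, 3, 5, 10])</code> returns the candidate with the smallest held-out error; too few neighbours leave noise in the prediction, whereas too many risk the over-smoothing noted above for heuristic methods.</p></div>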
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Outlook</head><p>Diffusion models can be applied in several bioinformatics applications and may be further extended to other computational biology areas owing to their ability to denoise data and generate realistic new data (Table <ref type="table">3</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3D genomics data analysis</head><p>High-throughput chromosome conformation capture (Hi-C) is a key technology for studying 3D conformations of chromosomes and genomes, applying next-generation sequencing techniques to sequence chromosomal regions that are spatially close to each other (that is, in contact) <ref type="bibr">242</ref> . Thus, Hi-C data captures the interactions between chromosomal regions of a genome to build 3D conformations of the genome <ref type="bibr">243,</ref><ref type="bibr">244</ref> and study long-range gene-enhancer interactions. This approach typically requires the data to be converted into 2D chromosomal contact matrices (maps), which store the frequency at which chromosome region i interacts with chromosome region j, where i and j are the indices of chromosome regions. Therefore, a Hi-C contact matrix can be considered an image. However, Hi-C data, in particular, single-cell Hi-C data, are usually noisy and incomplete, so that chromosomal interactions in chromosomal contact matrices may be false positives or interactions may be missing in the matrices. Deep learning methods (for example, GANs) can be applied to denoise Hi-C data <ref type="bibr">245</ref> ; in addition, diffusion models (for example, DDPM) may enable denoising of Hi-C chromosomal contact matrices to improve 3D genome conformation modelling and to study spatial interactions between genes and regulatory elements (for example, enhancers). However, the deep learning architecture of DDPM is typically the U-Net, which may not be as powerful as the deep residual network used in the Hi-C data denoising method ScHiCEDRN <ref type="bibr">246</ref> . Thus, if applied to Hi-C data denoising, the architecture of DDPM would have to be updated to deep residual networks to improve its denoising ability.</p></div>
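<div xmlns="http://www.tei-c.org/ns/1.0"><p>The conversion of Hi-C data into a 2D contact matrix that can be treated as an image, as described above, can be sketched as follows. The sketch assumes that read pairs have already been mapped and binned into chromosome-region indices; the log scaling and the rescaling to the range [-1, 1] are illustrative preprocessing choices and are not prescribed by any specific tool.</p>
<eg><![CDATA[
```python
import math


def contact_matrix(contacts, n_bins):
    """Build a symmetric contact-frequency matrix from (i, j) region-index pairs."""
    mat = [[0] * n_bins for _ in range(n_bins)]
    for i, j in contacts:
        mat[i][j] += 1
        if i != j:
            mat[j][i] += 1  # region i contacting j is the same event as j contacting i
    return mat


def to_image(mat):
    """Log-scale the counts and rescale them to [-1, 1] (assumed image-style input range)."""
    logged = [[math.log1p(v) for v in row] for row in mat]
    peak = max(max(row) for row in logged) or 1.0  # avoid division by zero for empty maps
    return [[2.0 * v / peak - 1.0 for v in row] for row in logged]


# Toy example with three chromosome regions and four observed contacts.
pairs = [(0, 1), (1, 0), (2, 2), (0, 1)]
matrix = contact_matrix(pairs, n_bins=3)
image = to_image(matrix)
```
]]></eg>
<p>Here, entry (i, j) of <code>matrix</code> stores how often region i was observed in contact with region j, and <code>image</code> is the corresponding single-channel array that an image-based denoising model could take as input.</p></div>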
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Single-cell reconstruction and inference</head><p>The activity of a single cell can be captured by various 'omics data, such as transcriptomics (RNA-seq), proteomics, chromosome accessibility (ATAC-Seq) and epigenetics (bisulfite sequencing), which may benefit from diffusion models; for example, data could be inferred to one modality (for example, RNA-Seq) from another modality, such as ATAC-Seq data and genome methylation data; missing spots in single-cell spatial transcriptomic data could be calculated; spots (each consisting of multiple cells) in 10&#215; spatial transcriptomic data could be decomposed into single cell data (super-resolution); and single-cell data could be used to build 3D models of the spatial arrangement of cells. Moreover, diffusion models designed to denoise images could also be applied to denoise single-cell 'omics data, such as transcriptomics, proteomics, metabolomics and epigenetics data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>DNA regulatory element design</head><p>The expression of genes is modulated by short DNA sequences on genomes, called regulatory elements, such as enhancers and promoters. Designing regulatory elements is an important approach to designing synthetic cells using synthetic biology. Generative models, such as GANs <ref type="bibr">247</ref> , can be applied to design enhancers that regulate the expression of genes and the development of cell types. However, diffusion models have shown better performance in image synthesis than GANs <ref type="bibr">50</ref> and may thus be more suitable for the design of enhancers and other gene regulatory elements.   <ref type="bibr">118</ref> Benchmarks on protein3ligand and antibody3antigen dataset; outperforms other unsupervised methods</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Protein3ligand complex</head><p>Binding affinity of the protein3ligand structure</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Cryo-electron microscopy image denoising</head><p>Diffusion models can reconstruct complex protein structures from 3D cryo-EM density maps, which are typically built from noisy and low-contrast 2D cryo-EM protein particle images isolated from large 2D cryo-EM protein images (also called cryo-EM micrographs). However, denoising original cryo-EM images to build better 3D cryo-EM density maps remains challenging. Although image preprocessing techniques, such as EMAN2 (ref. 248), can denoise cryo-EM images <ref type="bibr">249</ref>, diffusion models trained on many noisy images at various noise levels and their clean counterparts may allow the recovery of clean cryo-EM images more effectively than conventional image processing techniques that have not been trained to learn the noise distribution of cryo-EM images <ref type="bibr">250</ref>.</p></div>
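<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea of training on noisy images at various noise levels paired with their clean counterparts can be sketched as follows. This is a minimal illustration of how one DDPM-style training example and its noise-prediction loss might be formed; the toy 2 x 2 image, the four noise levels and the placeholder predictor (which always outputs zero) are assumptions made for illustration and do not represent a trained model or real cryo-EM data.</p>
<p><![CDATA[
```python
import math
import random


def make_training_example(clean, alpha_bars, rng):
    """Pick a random noise level; return (noisy image, step index, injected noise)."""
    t = rng.randrange(len(alpha_bars))
    a = alpha_bars[t]
    eps = [[rng.gauss(0.0, 1.0) for _ in row] for row in clean]
    noisy = [
        [math.sqrt(a) * c + math.sqrt(1.0 - a) * e for c, e in zip(c_row, e_row)]
        for c_row, e_row in zip(clean, eps)
    ]
    return noisy, t, eps


def mse(pred, target):
    """Mean squared error between two equally sized 2D lists."""
    diffs = [(p - q) ** 2 for p_row, q_row in zip(pred, target)
             for p, q in zip(p_row, q_row)]
    return sum(diffs) / len(diffs)


def zero_predictor(noisy, t):
    """Hypothetical stand-in for a trained network: always predicts zero noise."""
    return [[0.0 for _ in row] for row in noisy]


rng = random.Random(0)
alpha_bars = [0.99, 0.9, 0.5, 0.1]   # illustrative noise levels only
clean = [[0.0, 1.0], [1.0, 0.0]]     # toy stand-in for a clean particle image
noisy, t, eps = make_training_example(clean, alpha_bars, rng)

# A real model would be optimised to make this loss small across many examples.
loss = mse(zero_predictor(noisy, t), eps)
```
]]></p>
<p>Because the injected noise is known for every training example, the clean image can be recovered exactly from the noisy image and the noise; the network is trained to estimate that noise when only the noisy image and the noise level are given.</p></div>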
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Peptide design</head><p>Peptides, which are short, typically unfolded amino acid sequences, can bind to proteins to modulate their function, a property that has been explored for drug design. Diffusion models can be designed to generate new proteins and could also be adapted to create peptides that modulate protein function. For example, diffusion models pretrained for protein design may be retrained on a peptide dataset to design peptides through transfer learning.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Protein structure refinement and mutation prediction</head><p>The tertiary and quaternary structures of many proteins and protein complexes can be fairly accurately predicted by AlphaFold2 (ref. 163) and AlphaFold-multimer <ref type="bibr">251</ref>, respectively. However, such predicted structures may contain structural errors and may thus need to be refined. Conditioned on a predicted structure as input, diffusion models may be able to remove noise from the predicted structure to bring it closer to the native structure. Similarly, it remains challenging to predict how an amino acid mutation alters the structure of a protein, which may affect protein function and phenotype. Diffusion models have generative capability, as demonstrated in protein design, and may therefore be able to transform the known structure of a protein without mutation (wild-type protein) into the structure of the same protein with mutations (mutant) to predict structural changes induced by mutations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Limitations of current diffusion models</head><p>Although diffusion models may be applied to various bioinformatics problems, potentially outperforming GANs and VAEs, some limitations remain to be addressed. First, training a diffusion model involves adding Gaussian noise to the data at many noise levels, resulting in a long training time. Second, although considerable efforts have been directed towards increasing the sampling speed of diffusion models, the sampling time of most models still exceeds that of other deep generative models (for example, GANs and VAEs). The long sampling time hinders some real-time applications of diffusion models. Developing a streamlined approach of single-step noise addition and removal may reduce the training and sampling time. Third, the computational resource requirements of diffusion models are higher than those of GANs and VAEs. Therefore, the trade-off between performance improvement and computational resource demand needs to be evaluated when deciding which model to use. Furthermore, applying diffusion models to new problems is often non-trivial and may require validation of suitable data representations (embeddings), types of diffusion model and deep learning architectures.</p></div>
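<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sampling-time limitation can be illustrated with a minimal sketch of DDPM ancestral sampling, in which every reverse step requires one evaluation of the denoising network, whereas a GAN generator or VAE decoder produces a sample in a single forward pass. The 50-step linear schedule, the variance choice (sigma_t squared equal to beta_t) and the placeholder denoiser that always predicts zero noise are assumptions for illustration only; with this placeholder the output is not a meaningful sample.</p>
<p><![CDATA[
```python
import math
import random


def ddpm_sample(denoiser, betas, dim, rng):
    """Ancestral DDPM sampling; returns the sample and the number of denoiser calls."""
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure Gaussian noise
    calls = 0
    for t in reversed(range(len(betas))):
        eps_hat = denoiser(x, t)  # one network evaluation per reverse step
        calls += 1
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * e) / math.sqrt(alphas[t]) for xi, e in zip(x, eps_hat)]
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise is added on the final step
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return x, calls


def zero_denoiser(x, t):
    """Hypothetical stand-in for a trained noise-prediction network."""
    return [0.0 for _ in x]


num_steps = 50
betas = [1e-4 + (0.02 - 1e-4) * t / (num_steps - 1) for t in range(num_steps)]
sample, n_calls = ddpm_sample(zero_denoiser, betas, dim=8, rng=random.Random(0))
```
]]></p>
<p>Here n_calls equals the number of diffusion steps, so the sampling cost grows linearly with the length of the noise schedule; accelerated samplers aim to reduce this number of sequential evaluations.</p></div>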
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Citation diversity statement</head><p>We acknowledge that papers authored by scholars from historically excluded groups are systematically under-cited. Here, we have made every attempt to reference relevant papers in a manner that is equitable in terms of racial, ethnic, gender and geographical representation.</p></div>
</body>
		</text>
</TEI>
