
Title: Autocorrelation analysis for cryo-EM with sparsity constraints: Improved sample complexity and projection-based algorithms
The number of noisy images required for molecular reconstruction in single-particle cryo-electron microscopy (cryo-EM) is governed by the autocorrelations of the observed, randomly oriented, noisy projection images. In this work, we consider the effect of imposing sparsity priors on the molecule. We use techniques from signal processing, optimization, and applied algebraic geometry to obtain theoretical and computational contributions for this challenging nonlinear inverse problem with sparsity constraints. We prove that molecular structures modeled as sums of Gaussians are uniquely determined by the second-order autocorrelation of their projection images, implying that the sample complexity is proportional to the square of the variance of the noise. This theory improves upon the nonsparse case, where the third-order autocorrelation is required for uniformly oriented particle images and the sample complexity scales with the cube of the noise variance. Furthermore, we build a computational framework to reconstruct molecular structures that are sparse in the wavelet basis. This method combines the sparse representation of the molecule with projection-based techniques used for phase retrieval in X-ray crystallography.
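As a rough numerical illustration of the averaging behind this sample-complexity claim, the sketch below (Python with NumPy; a schematic, not the authors' implementation, and all function names are ours) accumulates the second-order autocorrelations of simulated noisy projection images and debiases the noise contribution at zero lag. It assumes i.i.d. Gaussian noise of known variance and uses the circular autocorrelation computed with the FFT; the subsequent fitting of a sparse model (sums of Gaussians, or a wavelet-sparse structure) to the averaged autocorrelation is the harder, nonconvex step and is not shown.

```python
# Schematic: estimate the second-order autocorrelation from noisy projections.
import numpy as np

rng = np.random.default_rng(0)

def second_order_autocorrelation(image):
    # Circular autocorrelation via the Wiener-Khinchin relation.
    F = np.fft.fft2(image)
    return np.real(np.fft.ifft2(np.abs(F) ** 2))

def averaged_autocorrelation(noisy_images, sigma):
    # Average per-image autocorrelations. For i.i.d. N(0, sigma^2) noise the
    # expected noise contribution is (number of pixels) * sigma^2 at lag zero
    # only, so that single entry is debiased.
    n, L, _ = noisy_images.shape
    acc = np.zeros((L, L))
    for y in noisy_images:
        acc += second_order_autocorrelation(y)
    acc /= n
    acc[0, 0] -= sigma ** 2 * L * L
    return acc

# Toy usage: many noisy copies of a single L x L stand-in "projection".
L, n, sigma = 32, 10_000, 2.0
clean = np.zeros((L, L))
clean[12:20, 12:20] = 1.0
noisy = clean[None, :, :] + sigma * rng.standard_normal((n, L, L))
a2 = averaged_autocorrelation(noisy, sigma)
```

Because the noise enters the averaged autocorrelation through fluctuations on the order of sigma^2 / sqrt(n), holding the estimation error fixed requires n on the order of sigma^4, the square-of-the-noise-variance scaling stated in the abstract; the analogous argument for third-order autocorrelations gives the sigma^6 (cubic) scaling of the nonsparse case.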
Award ID(s):
2009753
NSF-PAR ID:
10417197
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
120
Issue:
18
ISSN:
0027-8424
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In single-molecule super-resolution microscopy, engineered point-spread functions (PSFs) are designed to efficiently encode new molecular properties, such as 3D orientation, into complex spatial features captured by a camera. To fully benefit from their optimality, algorithms must estimate multi-dimensional parameters such as molecular position and orientation in the presence of PSF overlap and model-experiment mismatches. Here, we present a novel joint sparse deconvolution algorithm based on the decomposition of fluorescence images into six basis images that characterize molecular orientation. The proposed algorithm exploits a group-sparsity structure across these basis images and applies a pooling strategy on corresponding spatial features for robust simultaneous estimates of the number, brightness, 2D position, and 3D orientation of fluorescent molecules. We demonstrate this method by imaging DNA transiently labeled with the intercalating dye YOYO-1. Imaging the position and orientation of each molecule reveals orientational order and disorder within DNA with nanoscale spatial precision. 
  2. Biological samples are radiation-sensitive and require imaging under low-dose conditions to minimize damage. As a result, images contain a high level of noise and exhibit signal-to-noise ratios that are typically significantly smaller than 1. Averaging techniques, either implicit or explicit, are used to overcome the limitations imposed by the high level of noise. Averaging of 2D images showing the same molecule in the same orientation results in highly significant projections. A high-resolution structure can be obtained by combining the information from many single-particle images to determine a 3D structure. Similarly, averaging of multiple copies of macromolecular assembly subvolumes extracted from tomographic reconstructions can lead to a virtually noise-free high-resolution structure. Cross-correlation methods are often used in the alignment and classification steps of averaging processes for both 2D images and 3D volumes. However, the high noise level can bias alignment and certain classification results. While other approaches may be implicitly affected, sensitivity to noise is most apparent in multireference alignments, 3D reference-based projection alignments and projection-based volume alignments. Here, the influence of the image signal-to-noise ratio on the value of the cross-correlation coefficient is analyzed and a method for compensating for this effect is provided. 
  3. In this thesis we propose novel estimation techniques for localization and planning problems, which are key challenges in long-term autonomy. We take inspiration in our methods from non-parametric estimation and use tools such as kernel density estimation, non-linear least-squares optimization, binary masking, and random sampling. We show that these methods, by avoiding explicit parametric models, outperform existing methods that use them. Despite the seeming differences between localization and planning, we demonstrate in this thesis that the problems share core structural similarities. When real or simulation-sampled measurements are expensive, noisy, or high variance, non-parametric estimation techniques give higher-quality results in less time. We first address two localization problems. In order to permit localization with a set of ad hoc-placed radios, we propose an ultra-wideband (UWB) graph realization system to localize the radios. Our system achieves high accuracy and robustness by using kernel density estimation for measurement probability densities, by explicitly modeling antenna delays, and by optimizing this combination with a non-linear least squares formulation. Next, in order to then support robotic navigation, we present a flexible system for simultaneous localization and mapping (SLAM) that combines elements from both traditional dense metric SLAM and topological SLAM, using a binary "masking function" to focus attention. This masking function controls which lidar scans are available for loop closures. We provide several masking functions based on approximate topological class detectors. We then examine planning problems in the final chapter and in the appendix. In order to plan with uncertainty around multiple dynamic agents, we describe Monte-Carlo Policy-Tree Decision Making (MCPTDM), a framework for efficiently computing policies in partially-observable, stochastic, continuous problems. MCPTDM composes a sequence of simpler closed-loop policies and uses marginal action costs and particle repetition to improve cost estimates and sample efficiency by reducing variance. Finally, in the appendix we explore Learned Similarity Monte-Carlo Planning (LSMCP), where we seek to enhance the sample efficiency of partially observable Monte Carlo tree search-based planning by taking advantage of similarities in the final outcomes of similar states and actions. We train a multilayer perceptron to learn a similarity function which we then use to enhance value estimates in the planning. Collectively, we show in this thesis that non-parametric methods promote long-term autonomy by reducing error and increasing robustness across multiple domains. 
  4. Principal Components Analysis (PCA) is a dimension-reduction technique widely used in machine learning and statistics. However, due to the dependence of the principal components on all the dimensions, the components are notoriously hard to interpret. Therefore, a variant known as sparse PCA is often preferred. Sparse PCA learns principal components of the data but enforces that such components must be sparse. This has applications in diverse fields such as computational biology and image processing. To learn sparse principal components, it’s well known that standard PCA will not work, especially in high dimensions, and therefore algorithms for sparse PCA are often studied as a separate endeavor. Various algorithms have been proposed for Sparse PCA over the years, but given how fundamental it is for applications in science, the limits of efficient algorithms are only partially understood. In this work, we study the limits of the powerful Sum of Squares (SoS) family of algorithms for Sparse PCA. SoS algorithms have recently revolutionized robust statistics, leading to breakthrough algorithms for long-standing open problems in machine learning, such as optimally learning mixtures of gaussians, robust clustering, robust regression, etc. Moreover, it is believed to be the optimal robust algorithm for many statistical problems. Therefore, for sparse PCA, it’s plausible that it can beat simpler algorithms such as diagonal thresholding that have been traditionally used. In this work, we show that this is not the case, by exhibiting strong tradeoffs between the number of samples required, the sparsity and the ambient dimension, for which SoS algorithms, even if allowed sub-exponential time, will fail to optimally recover the component. Our results are complemented by known algorithms in literature, thereby painting an almost complete picture of the behavior of efficient algorithms for sparse PCA. Since SoS algorithms encapsulate many algorithmic techniques such as spectral or statistical query algorithms, this solidifies the message that known algorithms are optimal for sparse PCA. Moreover, our techniques are strong enough to obtain similar tradeoffs for Tensor PCA, another important higher order variant of PCA with applications in topic modeling, video processing, etc. 
  5. We consider the multi-target detection problem of estimating a two-dimensional target image from a large noisy measurement image that contains many randomly rotated and translated copies of the target image. Motivated by single-particle cryo-electron microscopy, we focus on the low signal-to-noise regime, where it is difficult to estimate the locations and orientations of the target images in the measurement. Our approach uses autocorrelation analysis to estimate rotationally and translationally invariant features of the target image. We demonstrate that, regardless of the level of noise, our technique can be used to recover the target image when the measurement is sufficiently large.

     
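For related record 2 above, the bias of the cross-correlation coefficient at low signal-to-noise ratio can be made concrete with a standard back-of-the-envelope relation: with uncorrelated additive noise, the measured coefficient is attenuated by a factor of sqrt(SNR / (1 + SNR)) per image. The sketch below illustrates that attenuation and its inversion; it is an assumed textbook-style relation used for illustration, not necessarily the compensation derived in that work.

```python
# Illustration: attenuation of the cross-correlation coefficient by noise.
import numpy as np

def attenuated_cc(cc_signal, snr1, snr2):
    # Expected coefficient between two noisy images whose noise-free versions
    # correlate at cc_signal, assuming uncorrelated additive noise.
    return cc_signal / np.sqrt((1.0 + 1.0 / snr1) * (1.0 + 1.0 / snr2))

def compensated_cc(cc_measured, snr1, snr2):
    # Invert the attenuation to estimate the signal-only correlation.
    return cc_measured * np.sqrt((1.0 + 1.0 / snr1) * (1.0 + 1.0 / snr2))

# At SNR = 0.1, a true correlation of 0.9 is measured as roughly 0.08, which is
# why raw correlation values can mislead alignment and classification at low dose.
print(attenuated_cc(0.9, 0.1, 0.1))      # ~0.082
print(compensated_cc(0.082, 0.1, 0.1))   # ~0.90
```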
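Related record 4 contrasts sum-of-squares relaxations with "simpler algorithms such as diagonal thresholding". For readers unfamiliar with that baseline, here is a minimal sketch of the standard diagonal-thresholding estimator for sparse PCA (the sparsity level k and all names are ours):

```python
# Diagonal thresholding for sparse PCA: restrict to the k highest-variance
# coordinates, then take the top eigenvector of the restricted covariance.
import numpy as np

def diagonal_thresholding_spca(X, k):
    # X: (n_samples, d) data matrix assumed (approximately) centered.
    n, d = X.shape
    cov = (X.T @ X) / n
    support = np.argsort(np.diag(cov))[-k:]      # k largest sample variances
    sub = cov[np.ix_(support, support)]
    eigvals, eigvecs = np.linalg.eigh(sub)       # ascending eigenvalues
    v = np.zeros(d)
    v[support] = eigvecs[:, -1]                  # top eigenvector of submatrix
    return v

# Toy usage on data with a planted 5-sparse spike direction.
rng = np.random.default_rng(1)
d, n, k = 100, 2_000, 5
u = np.zeros(d); u[:k] = 1.0 / np.sqrt(k)
X = rng.standard_normal((n, d)) + 3.0 * rng.standard_normal((n, 1)) * u
v_hat = diagonal_thresholding_spca(X, k)
```

The record's finding, roughly, is that even the far more powerful sum-of-squares hierarchy cannot substantially beat this kind of simple estimator in the hard parameter regimes it identifies.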
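Related record 5 builds on the same idea as the main record: estimate rotationally and translationally invariant features of the target from autocorrelations of the measurement. As a schematic example of one such invariant (a simple choice for illustration, not that paper's full feature set), the radially averaged power spectrum below is the Fourier-domain counterpart of a second-order autocorrelation; it is insensitive to where a target copy sits in the image and, after radial averaging, to its in-plane rotation.

```python
# Schematic rotation- and translation-insensitive feature of a 2-D measurement:
# the radially averaged power spectrum (Fourier transform of the autocorrelation).
import numpy as np

def radial_power_spectrum(measurement, n_bins=64):
    P = np.abs(np.fft.fftshift(np.fft.fft2(measurement))) ** 2
    ny, nx = P.shape
    yy, xx = np.indices(P.shape)
    r = np.hypot(yy - ny // 2, xx - nx // 2)
    bins = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins.ravel(), weights=P.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    return sums / np.maximum(counts, 1)

# Toy usage: a large noisy measurement containing shifted copies of a small patch.
rng = np.random.default_rng(2)
measurement = rng.standard_normal((256, 256))
patch = np.ones((8, 8))
for y0, x0 in [(30, 40), (120, 200), (190, 70)]:
    measurement[y0:y0 + 8, x0:x0 + 8] += patch
feature = radial_power_spectrum(measurement)
```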