Title: Fast principal component analysis for cryo-electron microscopy images
Abstract: Principal component analysis (PCA) plays an important role in the analysis of cryo-electron microscopy (cryo-EM) images for various tasks such as classification, denoising, compression, and ab initio modeling. We introduce a fast method for estimating a compressed representation of the 2-D covariance matrix of noisy cryo-EM projection images affected by radial point spread functions, which enables fast PCA computation. Our method is based on a new algorithm for expanding images in the Fourier–Bessel basis (the harmonics on the disk), which provides a convenient way to handle the effect of the contrast transfer functions. For $N$ images of size $L \times L$, our method has time complexity $O(NL^3 + L^4)$ and space complexity $O(NL^2 + L^3)$. In contrast to previous work, these complexities are independent of the number of different contrast transfer functions of the images. We demonstrate our approach on synthetic and experimental data and show acceleration by factors of up to two orders of magnitude.
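To make the complexity claims concrete, here is a minimal dense-covariance PCA baseline in plain NumPy. This is not the paper's Fourier–Bessel algorithm; it is the naive approach the paper improves on, and the function name and sizes are illustrative assumptions.

```python
# Minimal dense-covariance PCA baseline for images, for illustration only.
# The paper's method instead works in a Fourier-Bessel (disk-harmonics)
# basis, where radial CTFs act conveniently; this naive version costs
# O(N L^4) to form the covariance and O(L^6) to diagonalize it, which is
# what the paper's O(N L^3 + L^4) algorithm avoids.
import numpy as np

def naive_image_pca(images, num_components=10):
    """images: (N, L, L) array of noisy projection images."""
    N, L, _ = images.shape
    X = images.reshape(N, L * L)
    X = X - X.mean(axis=0)              # subtract the sample mean image
    C = (X.T @ X) / N                   # dense (L^2 x L^2) covariance
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:num_components]
    return eigvals[order], eigvecs[:, order].T.reshape(num_components, L, L)

# Example on synthetic data (L kept tiny: the covariance is L^2 x L^2).
rng = np.random.default_rng(0)
vals, comps = naive_image_pca(rng.standard_normal((500, 16, 16)))
```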
Award ID(s):
2009753
PAR ID:
10417219
Journal Name:
Biological Imaging
Volume:
3
ISSN:
2633-903X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
1. Proteins and the complexes they form are central to nearly all cellular processes. Their flexibility, expressed through a continuum of states, provides a window into their biological functions. Cryogenic electron microscopy (cryo-EM) is an ideal tool to study these dynamic states as it captures specimens in noncrystalline conditions and enables high-resolution reconstructions. However, analyzing the heterogeneous distributions of conformations from cryo-EM data is challenging. We present RECOVAR, a method for analyzing these distributions based on principal component analysis (PCA) computed using a REgularized COVARiance estimator. RECOVAR is fast, robust, interpretable, expressive, and competitive with state-of-the-art neural network methods on heterogeneous cryo-EM datasets. The regularized covariance method efficiently computes a large number of high-resolution principal components that can encode rich heterogeneous distributions of conformations, and does so robustly thanks to an automatic regularization scheme. The reconstruction method based on adaptive kernel regression resolves conformational states to a higher resolution than all other tested methods on extensive independent benchmarks while remaining highly interpretable. Additionally, we exploit favorable properties of the PCA embedding to estimate the conformational density accurately. This density allows for better interpretability of the latent space by identifying stable states and low free-energy motions. Finally, we present a scheme to navigate the high-dimensional latent space by automatically identifying these low free-energy trajectories. We make the code freely available at https://github.com/ma-gilles/recovar.
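The following is a minimal sketch of the general idea of PCA via a regularized covariance estimator. The shrinkage-toward-identity regularizer and the helper name are illustrative assumptions; RECOVAR's actual estimator is cryo-EM specific and chooses its regularization automatically.

```python
# Illustrative regularized-covariance PCA embedding (not RECOVAR itself):
# shrink the sample covariance toward a scaled identity before taking
# principal components, then embed each observation by projection.
import numpy as np

def regularized_pca_embedding(X, num_components=4, shrinkage=0.1):
    """X: (N, D) data matrix; returns (components, latent coordinates)."""
    Xc = X - X.mean(axis=0)
    C = (Xc.T @ Xc) / len(X)
    # Ridge-style shrinkage toward the average variance on the diagonal.
    D = C.shape[0]
    C_reg = (1 - shrinkage) * C + shrinkage * (np.trace(C) / D) * np.eye(D)
    eigvals, eigvecs = np.linalg.eigh(C_reg)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:num_components]]
    return top, Xc @ top                # latent coordinates per observation

rng = np.random.default_rng(1)
components, latent = regularized_pca_embedding(rng.standard_normal((200, 50)))
```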
2. Memory-hard functions (MHFs) are a key cryptographic primitive underlying the design of moderately expensive password hashing algorithms and egalitarian proofs of work. Over the past few years, several increasingly stringent goals for an MHF have been proposed, including the requirement that the MHF have high sequential space-time (ST) complexity, parallel space-time complexity, amortized area-time (aAT) complexity, and sustained space complexity. Data-independent memory-hard functions (iMHFs) are of special interest in the context of password hashing as they naturally resist side-channel attacks. iMHFs can be specified using a directed acyclic graph (DAG) $G$ with $N = 2^n$ nodes and low indegree, and the complexity of the iMHF can be analyzed using a pebbling game. Recently, Alwen et al. [CCS'17] constructed a DAG called DRSample which has aAT complexity at least $\Omega(N^2/\log N)$. Asymptotically, DRSample outperformed all prior iMHF constructions, including Argon2i, winner of the Password Hashing Competition (aAT cost $\mathcal{O}(N^{1.767})$), though the constants in these bounds are poorly understood. We show that the greedy pebbling strategy of Boneh et al. [ASIACRYPT'16] is particularly effective against DRSample; e.g., the aAT cost is $\mathcal{O}(N^2/\log N)$. In fact, our empirical analysis reverses the prior conclusion of Alwen et al. that DRSample provides stronger resistance to known pebbling attacks for practical values of $N \leq 2^{24}$. We construct a new iMHF candidate (DRSample+BRG) by using the bit-reversal graph to extend DRSample. We then prove that the construction is asymptotically optimal under every MHF criterion, and we empirically demonstrate that our iMHF provides the best resistance to known pebbling attacks. For example, we show that any parallel pebbling attack either has aAT cost $\omega(N^2)$ or requires at least $\Omega(N)$ steps with $\Omega(N/\log N)$ pebbles on the DAG. This makes our construction the first practical iMHF with a strong sustained space-complexity guarantee and immediately implies that any parallel pebbling has aAT complexity $\Omega(N^2/\log N)$. We also prove that any sequential pebbling (including the greedy pebbling attack) has aAT cost $\Omega(N^2)$ and, if a plausible conjecture holds, any parallel pebbling has aAT cost $\Omega(N^2 \log\log N/\log N)$, the best possible bound for an iMHF. We implement our new iMHF and demonstrate that it is just as fast as Argon2. Along the way, we propose a simple modification to the Argon2 round function that increases an attacker's aAT cost by nearly an order of magnitude without increasing running time on a CPU. Finally, we give a pebbling reduction which proves that in the parallel random oracle model (PROM), the cost of evaluating an iMHF like Argon2i or DRSample+BRG is given by the pebbling cost of the underlying DAG. Prior pebbling reductions assumed that the iMHF round function concatenates input labels before hashing, and so did not apply to practical iMHFs such as Argon2i, DRSample, or DRSample+BRG, where input labels are instead XORed together.
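For readers unfamiliar with the bit-reversal gadget used to extend DRSample, here is a toy construction of a bit-reversal graph on $N = 2^n$ nodes. The edge set shown (a path plus bit-reversal edges) is a simplified illustration, not the exact DRSample+BRG edge distribution from the paper.

```python
# Toy bit-reversal graph (BRG) on N = 2^n nodes: a path 0 -> 1 -> ... -> N-1
# plus an edge into each node v from the node whose index is the n-bit
# reversal of v (kept only when it points forward, so the graph stays a DAG).
def bit_reverse(i, n):
    """Reverse the n-bit binary representation of i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def brg_edges(n):
    N = 1 << n
    edges = [(v - 1, v) for v in range(1, N)]            # path edges
    edges += [(bit_reverse(v, n), v) for v in range(N)
              if bit_reverse(v, n) < v]                  # bit-reversal edges
    return edges

print(brg_edges(3))   # e.g. includes (1, 4) since reverse(100) = 001
```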
3. A Boolean $k$-monotone function defined over a finite poset domain $\mathcal{D}$ alternates between the values $0$ and $1$ at most $k$ times on any ascending chain in $\mathcal{D}$. Therefore, $k$-monotone functions are natural generalizations of the classical monotone functions, which are the $1$-monotone functions. Motivated by the recent interest in $k$-monotone functions in the context of circuit complexity and learning theory, and by the central role that monotonicity testing plays in the context of property testing, we initiate a systematic study of $k$-monotone functions in the property testing model. In this model, the goal is to distinguish functions that are $k$-monotone (or are close to being $k$-monotone) from functions that are far from being $k$-monotone. Our results include the following: (1) we demonstrate a separation between testing $k$-monotonicity and testing monotonicity on the hypercube domain $\{0,1\}^d$ for $k \geq 3$; (2) we demonstrate a separation between testing and learning on $\{0,1\}^d$ for $k = \omega(\log d)$: testing $k$-monotonicity can be performed with $2^{O(\sqrt{d} \cdot \log d \cdot \log(1/\varepsilon))}$ queries, while learning $k$-monotone functions requires $2^{\Omega(k \cdot \sqrt{d} \cdot 1/\varepsilon)}$ queries (Blais et al., RANDOM 2015); (3) we present a tolerant test for functions $f \colon [n]^d \to \{0,1\}$ with complexity independent of $n$, which makes progress on a problem left open by Berman et al. (STOC 2014). Our techniques exploit the testing-by-learning paradigm, use novel applications of Fourier analysis on the grid $[n]^d$, and draw connections to distribution-testing techniques.
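On any single ascending chain, the $k$-monotonicity criterion reduces to counting value alternations; a few lines of Python (with illustrative helper names) make this concrete.

```python
# A function is k-monotone iff its restriction to every ascending chain in
# the poset alternates between 0 and 1 at most k times. On one chain this
# is just a count of adjacent disagreements.
def alternations(values):
    """values: f evaluated along an ascending chain, e.g. [0, 0, 1, 1, 0]."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)

def is_k_monotone_on_chain(values, k):
    return alternations(values) <= k

print(alternations([0, 0, 1, 1, 0, 1]))        # 3: 3-monotone on this chain
print(is_k_monotone_on_chain([0, 0, 1, 1, 0, 1], 2))   # False
```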
4. We propose a heuristic algorithm to solve the underlying hard problem of the CSIDH cryptosystem (and other isogeny-based cryptosystems using elliptic curves with endomorphism ring isomorphic to an imaginary quadratic order $\mathcal{O}$). Let $\Delta = \mathrm{Disc}(\mathcal{O})$ (in CSIDH, $\Delta = -4p$ for $p$ the security parameter). Let $0 < \alpha < 1/2$; our algorithm requires: a classical circuit of size $2^{\tilde{O}(\log(|\Delta|)^{1-\alpha})}$, a quantum circuit of size $2^{\tilde{O}(\log(|\Delta|)^{\alpha})}$, and polynomial classical and quantum memory. Essentially, we propose to reduce the size of the quantum circuit below the state-of-the-art complexity $2^{\tilde{O}(\log(|\Delta|)^{1/2})}$ at the cost of increasing the required classical circuit size. The required classical circuit remains subexponential, which is a superpolynomial improvement over the exponential classical state-of-the-art solutions to these problems. Our method requires polynomial memory, both classical and quantum.
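To make the classical/quantum tradeoff concrete, the following sketch evaluates the two circuit-size exponents for a hypothetical 512-bit discriminant. Soft-O factors and constants are dropped, so the numbers are only indicative.

```python
# Indicative exponents for the tradeoff: classical circuit size is roughly
# 2^(log|Delta|^(1-alpha)) and quantum circuit size 2^(log|Delta|^alpha),
# soft-O factors omitted. |Delta| ~ 2^512 is an assumed example parameter.
import math

log_delta = 512 * math.log(2)           # natural log of |Delta| ~ 2^512
for alpha in (0.25, 1 / 3, 0.5):
    classical_exp = log_delta ** (1 - alpha)
    quantum_exp = log_delta ** alpha
    print(f"alpha={alpha:.2f}: classical exponent ~ {classical_exp:.1f}, "
          f"quantum exponent ~ {quantum_exp:.1f}")
# alpha = 0.5 recovers the balanced state-of-the-art point; smaller alpha
# shrinks the quantum circuit while the classical one grows subexponentially.
```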
5. Background: Cryo-EM data generated by electron tomography (ET) contains images of individual protein particles in different orientations and at tilted angles. Individual cryo-EM particles can be aligned to reconstruct a 3D density map of a protein structure. However, low contrast and high noise in particle images make it challenging to build 3D density maps at intermediate to high resolution (1–3 Å). To overcome this problem, we propose a fully automated cryo-EM 3D density map reconstruction approach based on deep learning particle picking. Results: A perfect 2D particle mask is generated fully automatically for every single particle. A computer vision image alignment algorithm (image registration) then aligns the particle masks fully automatically, and the difference between particle image orientation angles is used to align the original particle images. Finally, a localized 3D density map is reconstructed between every pair of single-particle images that share the largest number of corresponding features, and these localized 3D density maps are averaged to reconstruct a final 3D density map. The reconstructed maps illustrate the potential to determine molecular structures from a few samples of good particles. In addition, using the localized particle samples (with no background) to generate the localized 3D density maps can improve resolution evaluation in experimental cryo-EM maps. Tested on two widely used datasets, Auto3DCryoMap is able to reconstruct good 3D density maps using only a few thousand protein particle images, far fewer than the hundreds of thousands of particles required by existing methods. Conclusions: We design a fully automated approach for cryo-EM 3D density map reconstruction (Auto3DCryoMap). Instead of increasing the signal-to-noise ratio by 2D class averaging, our approach uses 2D particle masks to produce locally aligned particle images. Auto3DCryoMap accurately aligns structural particle shapes and constructs a decent 3D density map from only a few thousand aligned particle images, while existing tools require hundreds of thousands. Finally, by using the pre-processed particle images, Auto3DCryoMap reconstructs a better 3D density map than with the original particle images.
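The registration step can be illustrated with off-the-shelf phase cross-correlation from scikit-image. This sketch handles translational alignment only (the Auto3DCryoMap pipeline also recovers orientation-angle differences and builds localized 3D maps), and the helper name is an assumption.

```python
# Simplified translational image registration between two particle images,
# using scikit-image's phase cross-correlation; illustrative only, not the
# Auto3DCryoMap alignment pipeline itself.
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

def align_to_reference(reference, moving):
    """Estimate the shift registering `moving` to `reference` and apply it."""
    estimated_shift, error, _ = phase_cross_correlation(reference, moving)
    return nd_shift(moving, shift=estimated_shift), estimated_shift

rng = np.random.default_rng(2)
ref = rng.random((64, 64))
mov = nd_shift(ref, shift=(3, -5))       # synthetic misaligned copy
aligned, est = align_to_reference(ref, mov)
print(est)                               # approx. (-3., 5.): the undo shift
```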