skip to main content

This content will become publicly available on January 1, 2024

Title: Fast principal component analysis for cryo-electron microscopy images
Abstract Principal component analysis (PCA) plays an important role in the analysis of cryo-electron microscopy (cryo-EM) images for various tasks such as classification, denoising, compression, and ab initio modeling. We introduce a fast method for estimating a compressed representation of the 2-D covariance matrix of noisy cryo-EM projection images affected by radial point spread functions that enables fast PCA computation. Our method is based on a new algorithm for expanding images in the Fourier–Bessel basis (the harmonics on the disk), which provides a convenient way to handle the effect of the contrast transfer functions. For $ N $ images of size $ L\times L $ , our method has time complexity $ O\left({NL}^3+{L}^4\right) $ and space complexity $ O\left({NL}^2+{L}^3\right) $ . In contrast to previous work, these complexities are independent of the number of different contrast transfer functions of the images. We demonstrate our approach on synthetic and experimental data and show acceleration by factors of up to two orders of magnitude.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Biological Imaging
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Memory-hard functions (MHFs) are a key cryptographic primitive underlying the design of moderately expensive password hashing algorithms and egalitarian proofs of work. Over the past few years several increasingly stringent goals for an MHF have been proposed including the requirement that the MHF have high sequential space-time (ST) complexity, parallel space-time complexity, amortized area-time (aAT) complexity and sustained space complexity. Data-Independent Memory Hard Functions (iMHFs) are of special interest in the context of password hashing as they naturally resist side-channel attacks. iMHFs can be specified using a directed acyclic graph (DAG) $G$ with $N=2^n$ nodes and low indegree and the complexity of the iMHF can be analyzed using a pebbling game. Recently, Alwen et al. [CCS'17] constructed an DAG called DRSample which has aAT complexity at least $\Omega\left( N^2/\log N\right)$. Asymptotically DRSample outperformed all prior iMHF constructions including Argon2i, winner of the password hashing competition (aAT cost $\mathcal{O}\left(N^{1.767}\right)$), though the constants in these bounds are poorly understood. We show that the the greedy pebbling strategy of Boneh et al. [ASIACRYPT'16] is particularly effective against DRSample e.g., the aAT cost is $\mathcal{O}\left( N^2/\log N\right)$. In fact, our empirical analysis {\em reverses} the prior conclusion of Alwen et al. that DRSample provides stronger resistance to known pebbling attacks for practical values of $N \leq 2^{24}$. We construct a new iMHF candidate (DRSample+BRG) by using the bit-reversal graph to extend DRSample. We then prove that the construction is asymptotically optimal under every MHF criteria, and we empirically demonstrate that our iMHF provides the best resistance to {\em known} pebbling attacks. For example, we show that any parallel pebbling attack either has aAT cost $\omega(N^2)$ or requires at least $\Omega(N)$ steps with $\Omega(N/\log N)$ pebbles on the DAG. This makes our construction the first practical iMHF with a strong sustained space-complexity guarantee and immediately implies that any parallel pebbling has aAT complexity $\Omega(N^2/\log N)$. We also prove that any sequential pebbling (including the greedy pebbling attack) has aAT cost $\Omega\left( N^2\right)$ and, if a plausible conjecture holds, any parallel pebbling has aAT cost $\Omega(N^2 \log \log N/\log N)$ --- the best possible bound for an iMHF. We implement our new iMHF and demonstrate that it is just as fast as Argon2. Along the way we propose a simple modification to the Argon2 round function which increases an attacker's aAT cost by nearly an order of magnitude without increasing running time on a CPU. Finally, we give a pebbling reduction which proves that in the parallel random oracle model (PROM) the cost of evaluating an iMHF like Argon2i or DRSample+BRG is given by the pebbling cost of the underlying DAG. Prior pebbling reductions assumed that the iMHF round function concatenates input labels before hashing and did not apply to practical iMHFs such as Argon2i, DRSample or DRSample+BRG where input labels are instead XORed together. 
    more » « less
  2. A Boolean {\em $k$-monotone} function defined over a finite poset domain ${\cal D}$ alternates between the values $0$ and $1$ at most $k$ times on any ascending chain in ${\cal D}$. Therefore, $k$-monotone functions are natural generalizations of the classical {\em monotone} functions, which are the {\em $1$-monotone} functions. Motivated by the recent interest in $k$-monotone functions in the context of circuit complexity and learning theory, and by the central role that monotonicity testing plays in the context of property testing, we initiate a systematic study of $k$-monotone functions, in the property testing model. In this model, the goal is to distinguish functions that are $k$-monotone (or are close to being $k$-monotone) from functions that are far from being $k$-monotone. Our results include the following: \begin{enumerate} \item We demonstrate a separation between testing $k$-monotonicity and testing monotonicity, on the hypercube domain $\{0,1\}^d$, for $k\geq 3$; \item We demonstrate a separation between testing and learning on $\{0,1\}^d$, for $k=\omega(\log d)$: testing $k$-monotonicity can be performed with $2^{O(\sqrt d \cdot \log d\cdot \log{1/\eps})}$ queries, while learning $k$-monotone functions requires $2^{\Omega(k\cdot \sqrt d\cdot{1/\eps})}$ queries (Blais et al. (RANDOM 2015)). \item We present a tolerant test for functions $f\colon[n]^d\to \{0,1\}$ with complexity independent of $n$, which makes progress on a problem left open by Berman et al. (STOC 2014). \end{enumerate} Our techniques exploit the testing-by-learning paradigm, use novel applications of Fourier analysis on the grid $[n]^d$, and draw connections to distribution testing techniques. 
    more » « less
  3. null (Ed.)
    Drilling and milling operations are material removal processes involved in everyday conventional productions, especially in the high-speed metal cutting industry. The monitoring of tool information (wear, dynamic behavior, deformation, etc.) is essential to guarantee the success of product fabrication. Many methods have been applied to monitor the cutting tools from the information of cutting force, spindle motor current, vibration, as well as sound acoustic emission. However, those methods are indirect and sensitive to environmental noises. Here, the in-process imaging technique that can capture the cutting tool information while cutting the metal was studied. As machinists judge whether a tool is worn-out by the naked eye, utilizing the vision system can directly present the performance of the machine tools. We proposed a phase shifted strobo-stereoscopic method (Figure 1) for three-dimensional (3D) imaging. The stroboscopic instrument is usually applied for the measurement of fast-moving objects. The operation principle is as follows: when synchronizing the frequency of the light source illumination and the motion of object, the object appears to be stationary. The motion frequency of the target is transferring from the count information of the encoder signals from the working rotary spindle. If small differences are added to the frequency, the object appears to be slowly moving or rotating. This effect can be working as the source for the phase-shifting; with this phase information, the target can be whole-view 3D reconstructed by 360 degrees. The stereoscopic technique is embedded with two CCD cameras capturing images that are located bilateral symmetrically in regard to the target. The 3D scene is reconstructed by the location information of the same object points from both the left and right images. In the proposed system, an air spindle was used to secure the motion accuracy and drilling/milling speed. As shown in Figure 2, two CCDs with 10X objective lenses were installed on a linear rail with rotary stages to capture the machine tool bit raw picture for further 3D reconstruction. The overall measurement process was summarized in the flow chart (Figure 3). As the count number of encoder signals is related to the rotary speed, the input speed (unit of RPM) was set as the reference signal to control the frequency (f0) of the illumination of the LED. When the frequency was matched with the reference signal, both CCDs started to gather the pictures. With the mismatched frequency (Δf) information, a sequence of images was gathered under the phase-shifted process for a whole-view 3D reconstruction. The study in this paper was based on a 3/8’’ drilling tool performance monitoring. This paper presents the principle of the phase-shifted strobe-stereoscopic 3D imaging process. A hardware set-up is introduced, , as well as the 3D imaging algorithm. The reconstructed image analysis under different working speeds is discussed, the reconstruction resolution included. The uncertainty of the imaging process and the built-up system are also analyzed. As the input signal is the working speed, no other information from other sources is required. This proposed method can be applied as an on-machine or even in-process metrology. With the direct method of the 3D imaging machine vision system, it can directly offer the machine tool surface and fatigue information. This presented method can supplement the blank for determining the performance status of the machine tools, which further guarantees the fabrication process. 
    more » « less
  4. In many real-life image analysis applications, particularly in biomedical research domains, the objects of interest undergo multiple transformations that alter their visual properties while keeping the semantic content unchanged. Disentangling images into semantic content factors and transformations can provide significant benefits into many domain-specific image analysis tasks. To this end, we propose a generic unsupervised framework, Harmony, that simultaneously and explicitly disentangles semantic content from multiple parameterized transformations. Harmony leverages a simple cross-contrastive learning framework with multiple explicitly parameterized latent representations to disentangle content from transformations. To demonstrate the efficacy of Harmony, we apply it to disentangle image semantic content from several parameterized transformations (rotation, translation, scaling, and contrast). Harmony achieves significantly improved disentanglement over the baseline models on several image datasets of diverse domains. With such disentanglement, Harmony is demonstrated to incentivize bioimage analysis research by modeling structural heterogeneity of macromolecules from cryo-ET images and learning transformation-invariant representations of protein particles from single-particle cryo-EM images. Harmony also performs very well in disentangling content from 3D transformations and can perform coarse and fast alignment of 3D cryo-ET subtomograms. Therefore, Harmony is generalizable to many other imaging domains and can potentially be extended to domains beyond imaging as well. 
    more » « less
  5. null (Ed.)
    Abstract Background Cryo-EM data generated by electron tomography (ET) contains images for individual protein particles in different orientations and tilted angles. Individual cryo-EM particles can be aligned to reconstruct a 3D density map of a protein structure. However, low contrast and high noise in particle images make it challenging to build 3D density maps at intermediate to high resolution (1–3 Å). To overcome this problem, we propose a fully automated cryo-EM 3D density map reconstruction approach based on deep learning particle picking. Results A perfect 2D particle mask is fully automatically generated for every single particle. Then, it uses a computer vision image alignment algorithm (image registration) to fully automatically align the particle masks. It calculates the difference of the particle image orientation angles to align the original particle image. Finally, it reconstructs a localized 3D density map between every two single-particle images that have the largest number of corresponding features. The localized 3D density maps are then averaged to reconstruct a final 3D density map. The constructed 3D density map results illustrate the potential to determine the structures of the molecules using a few samples of good particles. Also, using the localized particle samples (with no background) to generate the localized 3D density maps can improve the process of the resolution evaluation in experimental maps of cryo-EM. Tested on two widely used datasets, Auto3DCryoMap is able to reconstruct good 3D density maps using only a few thousand protein particle images, which is much smaller than hundreds of thousands of particles required by the existing methods. Conclusions We design a fully automated approach for cryo-EM 3D density maps reconstruction (Auto3DCryoMap). Instead of increasing the signal-to-noise ratio by using 2D class averaging, our approach uses 2D particle masks to produce locally aligned particle images. Auto3DCryoMap is able to accurately align structural particle shapes. Also, it is able to construct a decent 3D density map from only a few thousand aligned particle images while the existing tools require hundreds of thousands of particle images. Finally, by using the pre-processed particle images,Auto3DCryoMap reconstructs a better 3D density map than using the original particle images. 
    more » « less