
Title: RTExtract: Time-series NMR spectra quantification based on 3D surface ridge tracking
Abstract. Motivation: Time-series NMR has advanced our knowledge of metabolic dynamics. Before compounds can be analyzed through modeling or statistical methods, chemical features need to be tracked and quantified. However, because of peak overlap and peak shifting, the available protocols are time-consuming at best and infeasible for some regions of NMR spectra. Results: We introduce RTExtract (Ridge Tracking based Extract), a computer vision-based algorithm, to quantify time-series NMR spectra. The NMR spectra of multiple time points were formulated as a 3D surface. Candidate points were first filtered using local curvature and optima, then connected into ridges by a greedy algorithm. Interactive steps were implemented to refine the results. Among 173 simulated ridges, 115 can be tracked (RMSD < 0.001). To reproduce previous results, RTExtract took less than two hours instead of ∼48 hours, and two instead of seven parameters needed tuning. Multiple regions with overlapping and changing chemical shifts are accurately tracked. Availability: Source code is freely available within the Metabolomics toolbox GitHub repository and is implemented in MATLAB and R. Supplementary information: Supplementary data are available at Bioinformatics online.
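The pipeline described in the abstract (treat the stacked spectra as a surface, keep candidate points that pass a curvature and local-optimum filter, then link them greedily into ridges) can be illustrated with a minimal Python sketch. This is not the RTExtract implementation, which is in MATLAB and R. The function names, the `min_height` and `max_shift` parameters, the nearest-neighbor linking rule, and the synthetic two-peak surface are all assumptions made for illustration.

```python
import numpy as np


def find_candidates(surface, min_height=0.0):
    """For each time point (row), return column indices of candidate ridge points.

    A candidate is a local maximum along the chemical-shift axis with negative
    curvature and an intensity above min_height (a hypothetical noise threshold).
    """
    candidates = []
    for row in surface:
        curvature = np.gradient(np.gradient(row))  # numerical second derivative
        i = np.arange(1, len(row) - 1)             # interior points only
        is_peak = (
            (row[i] > row[i - 1])
            & (row[i] >= row[i + 1])
            & (curvature[i] < 0)
            & (row[i] > min_height)
        )
        candidates.append(i[is_peak])
    return candidates


def track_ridges(candidates, max_shift=2):
    """Greedily link candidates across consecutive time points.

    Each active ridge takes the nearest unused candidate in the next row if it
    lies within max_shift columns; otherwise the ridge ends. Unmatched
    candidates start new ridges. Returns a list of [(time_index, column), ...].
    """
    ridges = []
    active = []
    for t, cols in enumerate(candidates):
        unused = sorted(int(c) for c in cols)
        still_active = []
        for ridge in active:
            if not unused:
                break  # no candidates left: remaining ridges end here
            last_col = ridge[-1][1]
            nearest = min(unused, key=lambda c: abs(c - last_col))
            if abs(nearest - last_col) <= max_shift:
                ridge.append((t, nearest))
                unused.remove(nearest)
                still_active.append(ridge)
        for c in unused:
            ridge = [(t, c)]
            ridges.append(ridge)
            still_active.append(ridge)
        active = still_active
    return ridges


def synthetic_surface(n_times=5, n_points=100):
    """Two Gaussian peaks drifting toward each other over time (made-up data)."""
    x = np.arange(n_points)
    rows = [
        np.exp(-(x - (30 + t)) ** 2 / 8.0) + np.exp(-(x - (70 - t)) ** 2 / 8.0)
        for t in range(n_times)
    ]
    return np.array(rows)


if __name__ == "__main__":
    surface = synthetic_surface()
    for ridge in track_ridges(find_candidates(surface, min_height=0.1)):
        print([col for _, col in ridge])
```

A purely greedy nearest-neighbor link like this can mislabel ridges where peaks cross, which is presumably one reason the abstract mentions interactive refinement steps.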
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This dataset contains the raw data, processed data, and data-processing code used in our manuscript, from our Fourier-transform infrared (FTIR) spectroscopy, nuclear magnetic resonance (NMR), Raman spectroscopy, and X-ray diffraction (XRD) experiments. The data and code for the fits of our unpolarized Raman spectra to polypeptide spectra are also included. The folder structure of the dataset is explained below and in the file ReadMe.txt. Browsing the data in Tree view is recommended.

     Folder contents

     Codes
     Raman Data Processing: The MATLAB script file RamanDecomposition.m contains the code to decompose the sub-peaks across the different polarized Raman spectra (XX, XZ, ZX, ZZ, and YY), subject to a set of pre-determined restrictions. The helper functions used in RamanDecomposition.m are included in the Helpers folder. RamanDecomposition.pdf is a PDF printout of the MATLAB code and output.
     P Value Simulation: 31_helix.ipynb and a_helix.ipynb are two Jupyter Notebook files that contain the intrinsic P value simulation for the 31-helix and alpha-helix structures. The simulation results were used to prepare Supplementary Table 4. The accompanying Python files contain the class definitions used in 31_helix.ipynb and a_helix.ipynb. See the comments in the files for more details.

     FTIR
     FTIR Raw Transmission.opj: This Origin data file contains the raw transmission data measured on a single silk strand and used for the FTIR spectra analysis.
     FTIR Deconvoluted Oscillators.opj: This Origin data file was generated from the data contained in the previous file using W-VASE software from J. A. Woollam, Inc.
     FTIR Unpolarized MultiStrand Raw Transmission.opj: This Origin data file contains the raw transmission data measured on multiple silk strands.
     The datasets contained in the first two files above were used to plot Figure 2a-b, the FTIR data points in Figure 4a, and Supplementary Figure 6. The datasets contained in the third file above were used to plot Supplementary Figure 3a.

     NMR
     Raw data files of the 13C MAS NMR spectra:
     ascii-spec_CP.txt: cross-polarized spectrum
     ascii-spec_DP.txt: direct-polarized spectrum
     Data are in ASCII format (comma-separated values) with the following columns: data point number, intensity, frequency [Hz], frequency [ppm].

     Polypeptide Spectrum Fits
     MATLAB scripts (.m files) and Helpers: The MATLAB script files Raman_Fitting_Process_Part_1.m and Raman_Fitting_Process_Part_2.m contain step-by-step instructions for fitting our calculated unpolarized Raman spectrum using digitized model polypeptide Raman spectra. The Helper folder contains two helper functions used by these scripts. See the scripts for further instructions and information.
     Data:
     aPA.csv, bPA.csv, GlyI.csv, GlyII.csv: These csv files contain the digitized Raman spectra of poly-alanine, beta-alanine, poly-glycine-I, and poly-glycine-II.
     Raman_Exp_Data.mat: This MATLAB data file contains the processed, polarized Raman spectra obtained from our experiments. The variable freq holds the wavenumber information of each collected spectrum. The variables xx, yy, zz, xz, and zx hold the collected polarized Raman spectra. These variables are used to calculate the unpolarized Raman spectrum in Raman_Fitting_Process_Part_2.m. See the scripts for further instructions and information.

     Raman
     Raman Raw Data.mat: This MATLAB data file contains all the raw data used for the Raman spectra analysis. All variables are of the MATLAB structure data type. Each variable has two fields, Freq and Raw: Freq contains the wavenumber information of the measured spectra, and Raw contains 5 measured Raman signal strengths. The variables XX, XZ, ZX, ZZ, and YY were used for plotting and sub-peak analysis in Figure 2c-d, the Raman data points in Figure 4a, Figure 5b, Supplementary Figure 2, and Supplementary Figure 7. The variable WideRange was used to plot and identify the peaks in Supplementary Figure 3b.

     X-Ray
     X-Ray.mat: This MATLAB data file contains the raw X-ray data used for the diffraction analysis in Supplementary Figure 5.
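     The four-column ASCII layout described for the 13C MAS NMR files (data point number, intensity, frequency in Hz, frequency in ppm) could be read along the following lines. This is a minimal Python sketch, not code from the dataset. The `SpectrumPoint` and `parse_spectrum` names are hypothetical, the sample rows are invented, and whether the real files carry a header line is unknown, so non-numeric lines are simply skipped.

```python
import csv
import io
from typing import NamedTuple


class SpectrumPoint(NamedTuple):
    index: int        # data point number
    intensity: float
    freq_hz: float    # frequency [Hz]
    freq_ppm: float   # frequency [ppm]


def parse_spectrum(text):
    """Parse comma-separated spectrum text into a list of SpectrumPoint."""
    points = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        try:
            points.append(
                SpectrumPoint(int(row[0]), float(row[1]), float(row[2]), float(row[3]))
            )
        except ValueError:
            continue  # skip a header line, if the file has one
    return points


# Invented sample rows, for illustration only.
SAMPLE = """1,120.5,30000.0,198.8
2,130.25,29990.0,198.7
3,128.0,29980.0,198.6
"""

points = parse_spectrum(SAMPLE)
strongest = max(points, key=lambda p: p.intensity)
print(strongest.freq_ppm)
```

     To read an actual file, the same function could be called on the contents of ascii-spec_CP.txt or ascii-spec_DP.txt after opening it in text mode.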
  2. Yann, Ponty (Ed.)
    Abstract. Motivation: Correlated Nuclear Magnetic Resonance (NMR) chemical shift changes identified through CHEmical Shift Projection Analysis (CHESPA) and CHEmical Shift Covariance Analysis (CHESCA) reveal pathways of allosteric transitions in biological macromolecules. To address the need for an automated platform that implements CHESPA and CHESCA and integrates them with other NMR analysis software packages, we introduce integrated plugins for NMRFAM-SPARKY that provide seamless detection and visualization of allosteric networks. Availability and implementation: CHESCA-SPARKY and CHESPA-SPARKY are available in the latest version of NMRFAM-SPARKY from the National Magnetic Resonance Facility at Madison, from the NMRbox Project, and to subscribers to the SBGrid. The assigned spectra involved in this study and tutorial videos using this dataset are available online. Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Abstract

    Protein regions that are intrinsically disordered exist as an ensemble of rapidly interconverting structures. Cooling proteins to cryogenic temperatures for dynamic nuclear polarization (DNP) magic angle spinning (MAS) NMR studies suspends most of the motions, resulting in peaks that are broad but not featureless. To demonstrate that detailed conformational restraints can be retrieved from the peak shapes of frozen proteins alone, we developed and used a simulation framework to assign peak features to conformers in the ensemble. We validated our simulations by comparing them to spectra of α‐synuclein acquired under different experimental conditions. Our assignments of peaks to discrete dihedral angle populations suggest that structural constraints are attainable under cryogenic conditions. The ability to infer ensemble populations from peak shapes has important implications for DNP MAS NMR studies of proteins with regions of disorder in living cells because chemical shifts are the most accessible measured parameter.

  4. Background subtraction is a general problem in spectroscopy often addressed with application-specific techniques, or methods that introduce a variety of implementation barriers such as having to specify peak-free regions of the spectrum. An iterative dual-tree complex wavelet transform-based background subtraction method (DTCWT-IA) was recently developed for the analysis of ultrafast electron diffraction patterns. The method was designed to require minimal user intervention, to support streamlined analysis of many diffraction patterns with complex overlapping peaks and time-varying backgrounds, and is implemented in an open-source computer program. We examined the performance of DTCWT-IA for the analysis of spectra acquired by a range of optical spectroscopies including ultraviolet–visible spectroscopy (UV–Vis), X-ray photoelectron spectroscopy (XPS), and surface-enhanced Raman spectroscopy (SERS). A key benefit of the method is that the user need not specify regions of the spectrum where no peaks are expected to occur. SER spectra were used to investigate the robustness of DTCWT-IA to signal-to-noise levels in the spectrum and to user operation, specifically to two of the algorithm parameter settings: decomposition level and iteration number. The single, general DTCWT-IA implementation performs well in comparison to the different conventional approaches to background subtraction for UV–Vis, XPS, and SERS, while requiring minimal input. The method thus holds the same potential for optical spectroscopy as for ultrafast electron diffraction, namely streamlined analysis of spectra with complex distributions of peaks and varying signal levels, thus supporting real-time spectral analysis or the analysis of data acquired from different sources. 
  5. Resolving small molecule mixtures by nuclear magnetic resonance (NMR) spectroscopy has long been of great interest because of its precision, reproducibility, and efficiency. However, spectral analysis of such mixtures is often highly challenging because of overlapping resonance lines and limited chemical shift windows. The existing experimental and theoretical methods for producing shift NMR spectra to address this problem have limited applicability owing to sensitivity issues, inconsistency, and/or the requirement of prior knowledge. Recently, we resolved the problem by decoupling multiplet structures in NMR spectra with the wavelet packet transform (WPT) technique. In this work, we developed a scheme that uses the method to generate highly resolved WPT NMR spectra and to predict the composition of the corresponding molecular mixtures from their 1H NMR spectra in an automated fashion. The four-step spectral analysis scheme consists of calculating the WPT spectrum, matching peaks against a WPT shift NMR library, and two optimization steps that produce the predicted molecular composition of a mixture. The robustness of the method was tested on an augmented dataset of 1000 molecular mixtures, each containing 3 to 7 molecules. The method predicted the constituent molecules with a median true positive rate of 1.0 across the varying compositions and a median false positive rate of 0.04. The approach can be scaled easily to much larger datasets.
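     The true positive and false positive rates quoted above can be made concrete with a short Python sketch. It assumes the standard definitions (TPR = TP / (TP + FN), FPR = FP / (FP + TN)), with the negatives taken to be library molecules that are absent from the mixture. The abstract does not state how its rates were computed, and the `detection_rates` function and the example mixtures are hypothetical.

```python
from statistics import median


def detection_rates(predicted, actual, library):
    """Return (true positive rate, false positive rate) for one mixture.

    predicted: molecules the method reports; actual: molecules really present;
    library: every molecule the method could report (assumed to contain both).
    """
    predicted, actual, library = set(predicted), set(actual), set(library)
    tp = len(predicted & actual)            # correctly identified
    fn = len(actual - predicted)            # present but missed
    fp = len(predicted - actual)            # reported but absent
    tn = len(library - predicted - actual)  # absent and not reported
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Invented example: a 10-molecule library and two mixtures.
library = set("abcdefghij")
results = [
    detection_rates({"a", "b", "d"}, {"a", "b", "c"}, library),
    detection_rates({"e", "f", "g"}, {"e", "f", "g"}, library),
]
median_tpr = median(tpr for tpr, _ in results)
median_fpr = median(fpr for _, fpr in results)
print(median_tpr, median_fpr)
```

     Reporting medians over many mixtures, as the abstract does, summarizes typical per-mixture performance without letting a few hard mixtures dominate.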