Title: RTExtract: Time-series NMR spectra quantification based on 3D surface ridge tracking
Abstract
Motivation: Time-series NMR has advanced our knowledge about metabolic dynamics. Before analyzing compounds through modeling or statistical methods, chemical features need to be tracked and quantified. However, because of peak overlap and peak shifting, the available protocols are time consuming at best or even impossible for some regions in NMR spectra.
Results: We introduce RTExtract (Ridge Tracking based Extract), a computer vision-based algorithm, to quantify time-series NMR spectra. The NMR spectra of multiple time points were formulated as a 3D surface. Candidate points were first filtered using local curvature and optima, then connected into ridges by a greedy algorithm. Interactive steps were implemented to refine results. Among 173 simulated ridges, 115 can be tracked (RMSD < 0.001). For reproducing previous results, RTExtract took less than two hours instead of ∼48 hours, and two instead of seven parameters need tuning. Multiple regions with overlapping and changing chemical shifts are accurately tracked.
Availability: Source code is freely available within Metabolomics toolbox GitHub repository (https://github.com/artedison/Edison_Lab_Shared_Metabolomics_UGA/tree/master/metabolomics_toolbox/code/ridge_tracking) and is implemented in MATLAB and R.
Supplementary information: Supplementary data are available at Bioinformatics online.
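The ridge-tracking idea summarized in the abstract (treat the stacked spectra as a surface, keep candidate points that are local optima with suitable curvature, then connect them greedily across time points) can be illustrated with a small sketch. This is not the RTExtract implementation, which is written in MATLAB and R; the function names, the `min_height` and `max_shift` parameters, and the synthetic two-peak surface are all assumptions made for illustration.

```python
import numpy as np


def candidate_points(surface, min_height=0.0):
    """For each time row, return column indices that are local maxima
    with negative discrete curvature and intensity above min_height."""
    candidates = []
    for row in surface:
        left, mid, right = row[:-2], row[1:-1], row[2:]
        is_peak = (mid > left) & (mid >= right)
        curvature = left - 2.0 * mid + right  # discrete second derivative
        keep = is_peak & (curvature < 0) & (mid > min_height)
        candidates.append(np.flatnonzero(keep) + 1)  # +1 restores the row index
    return candidates


def link_ridges(candidates, max_shift=2):
    """Greedily extend each ridge with the nearest unused candidate in the
    next row, provided it lies within max_shift columns (hypothetical rule)."""
    finished, active = [], []
    for t, cols in enumerate(candidates):
        unused = set(cols.tolist())
        next_active = []
        for ridge in active:
            last_col = ridge[-1][1]
            # Nearest candidate; ties are broken by the smaller column index.
            best = min(unused, key=lambda c: (abs(c - last_col), c), default=None)
            if best is not None and abs(best - last_col) <= max_shift:
                ridge.append((t, best))
                unused.remove(best)
                next_active.append(ridge)
            else:
                finished.append(ridge)  # ridge ends at the previous row
        # Candidates that no ridge claimed start new ridges.
        next_active.extend([(t, c)] for c in sorted(unused))
        active = next_active
    return finished + active


def demo_surface(n_times=20, n_points=200):
    """Synthetic surface: one peak drifting by one column per time point,
    one peak fixed at column 140."""
    x = np.arange(n_points)
    rows = []
    for t in range(n_times):
        drifting = np.exp(-0.5 * ((x - (60 + t)) / 3.0) ** 2)
        fixed = 0.8 * np.exp(-0.5 * ((x - 140) / 3.0) ** 2)
        rows.append(drifting + fixed)
    return np.array(rows)


surface = demo_surface()
ridges = link_ridges(candidate_points(surface, min_height=0.1))
for ridge in ridges:
    print(len(ridge), ridge[0], ridge[-1])
```

Each ridge is a list of `(time_index, column_index)` pairs. A greedy linker like this one is simple but can mislink where ridges cross, which is presumably one reason the abstract mentions interactive refinement steps.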
Award ID(s): 1713746
PAR ID: 10156043
Author(s) / Creator(s): ; ; ; ; ;
Date Published:
Journal Name: Bioinformatics
ISSN: 1367-4803
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. This dataset contains raw data, processed data, and the code used for data processing in our manuscript, from our Fourier-transform infrared (FTIR) spectroscopy, nuclear magnetic resonance (NMR), Raman spectroscopy, and X-ray diffraction (XRD) experiments. The data and code for the fits of our unpolarized Raman spectra to polypeptide spectra are also included. The following explains the folder structure of the data provided in this dataset, which is also explained in the file ReadMe.txt. Browsing the data in Tree view is recommended.
    Folder contents:
    Codes
      Raman Data Processing: The MATLAB script file RamanDecomposition.m contains the code to decompose the sub-peaks across different polarized Raman spectra (XX, XZ, ZX, ZZ, and YY), considering a set of pre-determined restrictions. The helper functions used in RamanDecomposition.m are included in the Helpers folder. RamanDecomposition.pdf is a PDF printout of the MATLAB code and output.
      P Value Simulation:
        31_helix.ipynb and a_helix.ipynb: These two Jupyter Notebook files contain the intrinsic P value simulation for the 31-helix and alpha-helix structures. The simulation results were used to prepare Supplementary Table 4. See the comments in the files for more details.
        Vector.py, Atom.py, Amino.py, and Helpers.py: These Python files contain the class definitions used in 31_helix.ipynb and a_helix.ipynb. See the comments in the files for more details.
    FTIR
      FTIR Raw Transmission.opj: This Origin data file contains the raw transmission data measured on a single silk strand and used for FTIR spectra analysis.
      FTIR Deconvoluted Oscillators.opj: This Origin data file was generated from the data contained in the previous file using W-VASE software from J. A. Woollam, Inc.
      FTIR Unpolarized MultiStrand Raw Transmission.opj: This Origin data file contains the raw transmission data measured on multiple silk strands.
      The datasets contained in the first two files above were used to plot Figure 2a-b, the FTIR data points in Figure 4a, and Supplementary Figure 6. The datasets contained in the third file above were used to plot Supplementary Figure 3a.
    NMR
      Raw data files of the 13C MAS NMR spectra:
        ascii-spec_CP.txt: cross-polarized spectrum
        ascii-spec_DP.txt: direct-polarized spectrum
      Data is in ASCII format (comma separated values) using the following columns: data point number, intensity, frequency [Hz], frequency [ppm].
    Polypeptide Spectrum Fits
      MATLAB scripts (.m files) and Helpers: The MATLAB script files Raman_Fitting_Process_Part_1.m and Raman_Fitting_Process_Part_2.m contain the step-by-step instructions to perform the fitting process of our calculated unpolarized Raman spectrum, using digitized model polypeptide Raman spectra. The Helper folder contains two helper functions used by the above scripts. See the scripts for further instruction and information.
      Data:
        aPA.csv, bPA.csv, GlyI.csv, GlyII.csv: These csv files contain the digitized Raman spectra of poly-alanine, beta-alanine, poly-glycine-I, and poly-glycine-II.
        Raman_Exp_Data.mat: This MATLAB data file contains the processed, polarized Raman spectra obtained from our experiments. Variable freq is the wavenumber information of each collected spectrum. The variables xx, yy, zz, xz, zx represent the polarized Raman spectra collected. These variables are used to calculate the unpolarized Raman spectrum in Raman_Fitting_Process_Part_2.m. See the scripts for further instruction and information.
    Raman
      Raman Raw Data.mat: This MATLAB data file contains all the raw data used for Raman spectra analysis. All variables are of MATLAB structure data type. Each variable has fields called Freq and Raw, where Freq contains the wavenumber information of the measured spectra and Raw contains 5 measured Raman signal strengths. Variables XX, XZ, ZX, ZZ, and YY were used for plotting and sub-peak analysis in Figure 2c-d, the Raman data points in Figure 4a, Figure 5b, Supplementary Figure 2, and Supplementary Figure 7. Variable WideRange was used to plot and identify the peaks for Supplementary Figure 3b.
    X-Ray
      X-Ray.mat: This MATLAB data file contains the raw X-ray data used for the diffraction analysis in Supplementary Figure 5.
  2. Yann, Ponty (Ed.)
    Abstract
    Motivation: Correlated Nuclear Magnetic Resonance (NMR) chemical shift changes identified through the CHEmical Shift Projection Analysis (CHESPA) and CHEmical Shift Covariance Analysis (CHESCA) reveal pathways of allosteric transitions in biological macromolecules. To address the need for an automated platform that implements CHESPA and CHESCA and integrates them with other NMR analysis software packages, we introduce here integrated plugins for NMRFAM-SPARKY that implement the seamless detection and visualization of allosteric networks.
    Availability and implementation: CHESCA-SPARKY and CHESPA-SPARKY are available in the latest version of NMRFAM-SPARKY from the National Magnetic Resonance Facility at Madison (http://pine.nmrfam.wisc.edu/download_packages.html), the NMRbox Project (https://nmrbox.org) and to subscribers to the SBGrid (https://sbgrid.org). The assigned spectra involved in this study and tutorial videos using this dataset are available at https://sites.google.com/view/chescachespa-sparky.
    Supplementary information: Supplementary data are available at Bioinformatics Online.
  3. Background subtraction is a general problem in spectroscopy often addressed with application-specific techniques, or methods that introduce a variety of implementation barriers such as having to specify peak-free regions of the spectrum. An iterative dual-tree complex wavelet transform-based background subtraction method (DTCWT-IA) was recently developed for the analysis of ultrafast electron diffraction patterns. The method was designed to require minimal user intervention, to support streamlined analysis of many diffraction patterns with complex overlapping peaks and time-varying backgrounds, and is implemented in an open-source computer program. We examined the performance of DTCWT-IA for the analysis of spectra acquired by a range of optical spectroscopies including ultraviolet–visible spectroscopy (UV–Vis), X-ray photoelectron spectroscopy (XPS), and surface-enhanced Raman spectroscopy (SERS). A key benefit of the method is that the user need not specify regions of the spectrum where no peaks are expected to occur. SER spectra were used to investigate the robustness of DTCWT-IA to signal-to-noise levels in the spectrum and to user operation, specifically to two of the algorithm parameter settings: decomposition level and iteration number. The single, general DTCWT-IA implementation performs well in comparison to the different conventional approaches to background subtraction for UV–Vis, XPS, and SERS, while requiring minimal input. The method thus holds the same potential for optical spectroscopy as for ultrafast electron diffraction, namely streamlined analysis of spectra with complex distributions of peaks and varying signal levels, thus supporting real-time spectral analysis or the analysis of data acquired from different sources. 
  4. Abstract Without imposing prior distributional knowledge underlying multivariate time series of interest, we propose a nonparametric change-point detection approach to estimate the number of change points and their locations along the temporal axis. We develop a structural subsampling procedure such that the observations are encoded into multiple sequences of Bernoulli variables. A maximum likelihood approach in conjunction with a newly developed search algorithm is implemented to detect change points on each Bernoulli process separately. Then, aggregation statistics are proposed to collectively synthesize change-point results from all individual univariate time series into consistent and stable location estimates. We also study a weighting strategy to measure the degree of relevance for different subsampled groups. Simulation studies show that the proposed change-point methodology for multivariate time series performs favorably compared with currently available state-of-the-art nonparametric methods under various settings with different degrees of complexity. Real data analyses are finally performed on categorical, ordinal, and continuous time series taken from the fields of genetics, climate, and finance. 
  5. Neural networks (NNs) are emerging as a rapid and scalable method for quantifying metabolites directly from nuclear magnetic resonance (NMR) spectra, but the nonlinear nature of NNs precludes understanding of how a model makes predictions. This study implements an explainable artificial intelligence algorithm called integrated gradients (IG) to elucidate which regions of input spectra are the most important for the quantification of specific analytes. The approach is first validated in simulated mixture spectra of eight aqueous metabolites and then investigated in experimentally acquired lipid spectra of a reference standard mixture and a murine hepatic extract. The IG method revealed that, like a human spectroscopist, NNs recognize and quantify analytes based on an analyte’s respective resonance line-shapes, amplitudes, and frequencies. NNs can compensate for peak overlap and prioritize specific resonances most important for concentration determination. Further, we show how modifying a NN training dataset can affect how a model makes decisions, and we provide examples of how this approach can be used to debug issues with model performance. Overall, results show that the IG technique facilitates a visual and quantitative understanding of how model inputs relate to model outputs, potentially making NNs a more attractive option for targeted and automated NMR-based metabolomics. 
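The integrated gradients (IG) attribution mentioned in item 5 above can be illustrated with a minimal sketch. IG attributes a model output F(x) to each input point by averaging the gradient of F along a straight path from a baseline to the input and scaling by the input-minus-baseline difference. The code below is not the study's code: the toy model `F(x) = tanh(w · x)` stands in for a neural network, and the weights, the two Lorentzian peaks, the all-zero baseline, and the midpoint-rule step count are all assumptions chosen for illustration.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Approximate IG with a midpoint Riemann sum along the straight
    path from baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.array([grad_fn(p) for p in path])
    return (x - baseline) * grads.mean(axis=0)


# Toy "quantifier" (hypothetical): responds only to spectral points 40..60.
n_points = 200
w = np.zeros(n_points)
w[40:61] = 0.05


def model(x):
    return np.tanh(w @ x)


def model_grad(x):
    # Analytic gradient of tanh(w . x) with respect to x.
    return (1.0 - np.tanh(w @ x) ** 2) * w


# Synthetic spectrum: one Lorentzian peak inside the weighted region
# (index 50) and one outside it (index 150).
idx = np.arange(n_points)
spectrum = 1.0 / (1.0 + ((idx - 50) / 3.0) ** 2) + 1.0 / (1.0 + ((idx - 150) / 3.0) ** 2)
baseline = np.zeros(n_points)

attributions = integrated_gradients(model_grad, spectrum, baseline)

# Completeness check: attributions should sum to F(x) - F(baseline).
print(attributions.sum(), model(spectrum) - model(baseline))
print("most important point:", int(np.argmax(attributions)))
```

In this toy setting the peak at index 150 receives zero attribution because the model ignores that region, which mirrors the kind of region-level insight the abstract describes. For a real network the gradient would come from automatic differentiation rather than a hand-written formula.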