Title: DEEP Picker is a Deep Neural Network for Accurate Deconvolution of Complex Two-Dimensional NMR Spectra
The analysis of nuclear magnetic resonance (NMR) spectra for the comprehensive and unambiguous identification and characterization of peaks is a difficult, but critically important step in all NMR analyses of complex biological molecular systems. Here, we introduce DEEP Picker, a deep neural network (DNN)-based approach for peak picking and spectral deconvolution which semi-automates the analysis of two-dimensional NMR spectra. DEEP Picker includes 8 hidden convolutional layers and was trained on a large number of synthetic spectra of known composition with variable degrees of crowdedness. We show that our method is able to correctly identify overlapping peaks, including ones that are challenging for expert spectroscopists and existing computational methods alike. We demonstrate the utility of DEEP Picker on NMR spectra of folded and intrinsically disordered proteins as well as a complex metabolomics mixture, and show how it provides access to valuable NMR information. DEEP Picker should facilitate the semi-automation and standardization of protocols for better consistency and sharing of results within the scientific community.
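The abstract mentions training on synthetic spectra of known composition with variable crowdedness. As a rough illustration of what such training data could look like, the sketch below builds a small 2D grid from a known list of peaks. It is not DEEP Picker's actual data generator: the function name `synthetic_spectrum`, the Gaussian line shape, the parameter ranges, and the grid size are all assumptions made for illustration.

```python
import math
import random


def synthetic_spectrum(n_peaks, size=64, width_range=(1.5, 4.0), seed=0):
    """Return (spectrum, peaks) for a hypothetical synthetic 2D spectrum.

    spectrum is a size x size list of lists; peaks is the ground-truth
    peak list. Gaussian line shapes are an assumption made here for
    simplicity, not a statement about the published method.
    """
    rng = random.Random(seed)
    peaks = []
    for _ in range(n_peaks):
        peaks.append({
            "x": rng.uniform(0, size - 1),
            "y": rng.uniform(0, size - 1),
            "height": rng.uniform(0.2, 1.0),
            "sigma_x": rng.uniform(*width_range),
            "sigma_y": rng.uniform(*width_range),
        })

    spectrum = [[0.0] * size for _ in range(size)]
    for p in peaks:
        for iy in range(size):
            gy = math.exp(-0.5 * ((iy - p["y"]) / p["sigma_y"]) ** 2)
            for ix in range(size):
                gx = math.exp(-0.5 * ((ix - p["x"]) / p["sigma_x"]) ** 2)
                # Overlapping peaks simply add, which is what makes
                # crowded regions hard to deconvolve.
                spectrum[iy][ix] += p["height"] * gx * gy
    return spectrum, peaks


# More peaks on the same grid means a more crowded spectrum.
sparse, sparse_peaks = synthetic_spectrum(n_peaks=3, size=32, seed=1)
crowded, crowded_peaks = synthetic_spectrum(n_peaks=30, size=32, seed=1)
```

Because the peak list is returned alongside the grid, each spectrum comes with its own ground truth, which is the property the abstract's "known composition" refers to.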
Award ID(s):
2103637
NSF-PAR ID:
10332410
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Nature communications
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We demonstrate that natural isotopic abundance 2D heteronuclear correlation (HETCOR) solid-state NMR spectra can be used to significantly reduce or eliminate the broadening of ¹H and ¹³C solid-state NMR spectra of organic solids due to anisotropic bulk magnetic susceptibility (ABMS). ABMS often manifests in solids with aromatic groups, such as active pharmaceutical ingredients (APIs), and inhomogeneously broadens the NMR peaks of all nuclei in the sample. Inhomogeneous peaks with full widths at half maximum (FWHM) of ∼1 ppm typically result from ABMS broadening, and the low spectral resolution impedes the analysis of solid-state NMR spectra. ABMS broadening of solid-state NMR spectra has previously been eliminated using 2D multiple-quantum correlation experiments, or by performing NMR experiments on diluted materials or single crystals. However, these experiments are often infeasible due to their poor sensitivity and/or provide limited gains in resolution. 2D ¹H–¹³C HETCOR experiments have previously been applied to reduce susceptibility broadening in paramagnetic solids, and we show that this strategy can significantly reduce ABMS broadening in diamagnetic organic solids. Comparisons of 1D solid-state NMR spectra and ¹H and ¹³C solid-state NMR spectra obtained from 2D ¹H–¹³C HETCOR NMR spectra show that the HETCOR spectrum directly increases resolution by a factor of 1.5 to 8. The direct gain in resolution is determined by the ratio of the inhomogeneous ¹³C/¹H linewidth to the homogeneous ¹H linewidth, with the former depending on the magnitude of the ABMS broadening and the strength of the applied field and the latter on the efficiency of homonuclear decoupling. The direct gains in resolution obtained using the 2D HETCOR experiments are better than those obtained by dilution.
For solids with long proton longitudinal relaxation times, dynamic nuclear polarization (DNP) was applied to enhance sensitivity and enable the acquisition of 2D ¹H–¹³C HETCOR NMR spectra. 2D ¹H–¹³C HETCOR experiments were applied to resolve and partially assign the NMR signals of the form I and form II polymorphs of aspirin in a sample containing both forms. These findings have important implications for ultra-high field NMR experiments, optimization of decoupling schemes and assessment of the fundamental limits on the resolution of solid-state NMR spectra.
  2. Abstract

    Stannous fluoride (SnF₂) is an effective fluoride source and antimicrobial agent that is widely used in commercial toothpaste formulations. The antimicrobial activity of SnF₂ is partly attributed to the presence of Sn(II) ions. However, it is challenging to directly determine the Sn speciation and oxidation state within commercially available toothpaste products due to the low weight loading of SnF₂ (0.454 wt% SnF₂, 0.34 wt% Sn) and the amorphous, semi-solid nature of the toothpaste. Here, we show that dynamic nuclear polarization (DNP) enables ¹¹⁹Sn solid-state NMR experiments that can probe the Sn speciation within commercially available toothpaste. Solid-state NMR experiments on SnF₂ and SnF₄ show that the ¹⁹F isotropic chemical shift and ¹¹⁹Sn chemical shift anisotropy (CSA) are highly sensitive to the Sn oxidation state. DNP-enhanced ¹¹⁹Sn magic-angle turning (MAT) 2D NMR spectra of toothpastes resolve Sn(II) and Sn(IV) by their ¹¹⁹Sn chemical shift tensor parameters. Fits of DNP-enhanced 1D ¹H → ¹¹⁹Sn solid-state NMR spectra allow the populations of Sn(II) and Sn(IV) within the toothpastes to be estimated. This analysis reveals that three of the four commercially available toothpastes contained at least 80% Sn(II), whereas one of the toothpastes contained a significantly higher amount of Sn(IV).
  3. This dataset contains the raw data, processed data, and code used for data processing in our manuscript, from our Fourier-transform infrared (FTIR) spectroscopy, nuclear magnetic resonance (NMR), Raman spectroscopy, and X-ray diffraction (XRD) experiments. The data and code for the fits of our unpolarized Raman spectra to polypeptide spectra are also included. The following explains the folder structure of the data provided in this dataset, which is also explained in the file ReadMe.txt. Browsing the data in Tree view is recommended.
    Folder contents
    Codes
    Raman Data Processing: The MATLAB script file RamanDecomposition.m contains the code to decompose the sub-peaks across the different polarized Raman spectra (XX, XZ, ZX, ZZ, and YY), subject to a set of pre-determined restrictions. The helper functions used in RamanDecomposition.m are included in the Helpers folder. RamanDecomposition.pdf is a PDF printout of the MATLAB code and output.
    P Value Simulation: 31_helix.ipynb and a_helix.ipynb: These two Jupyter Notebook files contain the intrinsic P value simulation for the 31-helix and alpha-helix structures. The simulation results were used to prepare Supplementary Table 4. See the comments in the files for more details. Vector.py, Atom.py, Amino.py, and Helpers.py: These Python files contain the class definitions used in 31_helix.ipynb and a_helix.ipynb. See the comments in the files for more details.
    FTIR
    FTIR Raw Transmission.opj: This Origin data file contains the raw transmission data measured on a single silk strand and used for FTIR spectra analysis. FTIR Deconvoluted Oscillators.opj: This Origin data file was generated from the data contained in the previous file using W-VASE software from J. A. Woollam, Inc. FTIR Unpolarized MultiStrand Raw Transmission.opj: This Origin data file contains the raw transmission data measured on multiple silk strands. The datasets contained in the first two files above were used to plot Figure 2a-b, the FTIR data points in Figure 4a, and Supplementary Figure 6. The datasets contained in the third file above were used to plot Supplementary Figure 3a.
    NMR
    Raw data files of the 13C MAS NMR spectra: ascii-spec_CP.txt: cross-polarized spectrum. ascii-spec_DP.txt: direct-polarized spectrum. Data is in ASCII format (comma separated values) using the following columns: data point number, intensity, frequency [Hz], frequency [ppm].
    Polypeptide Spectrum Fits
    MATLAB scripts (.m files) and Helpers: The MATLAB script files Raman_Fitting_Process_Part_1.m and Raman_Fitting_Process_Part_2.m contain the step-by-step instructions to perform the fitting process of our calculated unpolarized Raman spectrum, using digitized model polypeptide Raman spectra. The Helper folder contains two helper functions used by the above scripts. See the scripts for further instruction and information. Data: aPA.csv, bPA.csv, GlyI.csv, GlyII.csv files: These csv files contain the digitized Raman spectra of poly-alanine, beta-alanine, poly-glycine-I, and poly-glycine-II. Raman_Exp_Data.mat: This MATLAB data file contains the processed, polarized Raman spectra obtained from our experiments. Variable freq is the wavenumber information of each collected spectrum. The variables xx, yy, zz, xz, zx represent the polarized Raman spectra collected. These variables are used to calculate the unpolarized Raman spectrum in Raman_Fitting_Process_Part_2.m. See the scripts for further instruction and information.
    Raman
    Raman Raw Data.mat: This MATLAB data file contains all the raw data used for Raman spectra analysis. All variables are of MATLAB structure data type. Each variable has fields called Freq and Raw, where Freq contains the wavenumber information of the measured spectra and Raw contains 5 measured Raman signal strengths. Variables XX, XZ, ZX, ZZ, and YY were used for plotting and sub-peak analysis in Figure 2c-d, the Raman data points in Figure 4a, Figure 5b, Supplementary Figure 2, and Supplementary Figure 7. Variable WideRange was used to plot and identify the peaks for Supplementary Figure 3b.
    X-Ray
    X-Ray.mat: This MATLAB data file contains the raw X-ray data used for the diffraction analysis in Supplementary Figure 5.
  4. Abstract

    We describe the preparation, dynamics, and assembly characteristics of vase‐shaped basket 1³⁻ along with its ability to form an inclusion complex with the anticancer drug mitoxantrone in abiotic and biotic systems. This novel cavitand has a deep nonpolar pocket consisting of three naphthalimide sides fused to a bicyclic platform at the bottom while carrying polar glycines at the top. The results of ¹H Nuclear Magnetic Resonance (NMR), ¹H NMR Chemical Exchange Saturation Transfer (CEST), Calorimetry, Hybrid Replica Exchange Molecular Dynamics (REMD), and Microcrystal Electron Diffraction (MicroED) measurements are in line with 1 forming dimer [1₂]⁶⁻, in equilibrium with monomers 1(R)³⁻ (relaxed) and 1(S)³⁻ (squeezed). Through simultaneous line‐shape analysis of ¹H NMR data, kinetic and thermodynamic parameters characterizing these equilibria were quantified. Basket 1(R)³⁻ includes the anticancer drug mitoxantrone (MTO²⁺) in its pocket to give stable binary complex [MTO1] (Kd = 2.1 μM) that can be precipitated in vitro with UV light or pH as stimuli. Both in vitro and in vivo studies showed that the basket is nontoxic, while at a higher proportion with respect to MTO it reduced its cytotoxicity in vitro. With well‐characterized internal dynamics and dimerization, the ability to include mitoxantrone, and biocompatibility, the stage is set to develop sequestering agents from deep‐cavity baskets.
  5. Abstract

    Raman spectroscopy provides label‐free, specific analysis of biomolecular structure and interactions. It could have a greater impact with improved characterization of complex fingerprint vibrations. Many Raman peaks have been assigned to cholesterol, for example, but the molecular vibrations associated with those peaks are not known. In this report, time‐dependent density functional theory calculations of the Raman spectrum of cholesterol are compared to measurements on microcrystalline powder to identify 23 peaks in the Raman spectrum. Among them, a band of six peaks is found to be sensitive to the conformational structure of cholesterol's iso‐octyl chain. Calculations on 10 conformers in this spectral band are fit to experimental spectra to probe the cholesterol chain structure in purified powder and in phospholipid vesicles. In vesicles, the chain is found to bend perpendicular to the steroid rings, supporting the case that the chain is a dynamic structure that contributes to lipid condensation and other effects of cholesterol in biomembranes.

    Statement of Significance: Here we use density functional theory to identify a band of six peaks in cholesterol's Raman spectrum that is sensitive to the conformational structure of cholesterol's chain. Raman spectra were analyzed to show that in fluid‐phase lipid membranes, about half of the cholesterol chains point perpendicular to the steroid rings. This new method of label‐free structural analysis could make significant contributions to our understanding of cholesterol's critical role in biomembrane structure and function. More broadly, the results show that Raman spectroscopy can make significant new contributions to molecular structure determination when spectra are interpreted with computational quantum chemistry.
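Item 3 above describes its 13C MAS NMR raw files as comma-separated ASCII with four columns: data point number, intensity, frequency in Hz, and frequency in ppm. A minimal sketch of reading such a file in Python might look like the following. The names `SpectrumPoint` and `read_ascii_spectrum` are hypothetical, the sample values are made up, and the sketch assumes one data point per line with rows that do not parse as four numbers (for example a header) being skipped.

```python
import csv
import io
from typing import List, NamedTuple


class SpectrumPoint(NamedTuple):
    index: int          # data point number
    intensity: float    # signal intensity
    freq_hz: float      # frequency [Hz]
    freq_ppm: float     # frequency [ppm]


def read_ascii_spectrum(text: str) -> List[SpectrumPoint]:
    """Parse comma-separated spectrum text into a list of points.

    Rows that do not contain exactly four numeric fields are skipped,
    which is an assumption about how headers or blank lines may appear.
    """
    points = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 4:
            continue
        try:
            points.append(SpectrumPoint(
                int(float(row[0])),
                float(row[1]),
                float(row[2]),
                float(row[3]),
            ))
        except ValueError:
            continue  # non-numeric row, e.g. a header line
    return points


# Made-up sample rows in the described column order.
sample = "1,100.5,15000.0,200.0\n2,250.0,14990.0,199.9\n3,80.25,14980.0,199.8\n"
points = read_ascii_spectrum(sample)
strongest = max(points, key=lambda p: p.intensity)
```

To read one of the files named in item 3, the text could be loaded with `open("ascii-spec_CP.txt").read()` and passed to the same function, provided the file follows the layout described there.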