Title: ISREA: An Efficient Peak-Preserving Baseline Correction Algorithm for Raman Spectra
Baseline correction is a critical step in Raman spectroscopy: it removes the background signal generated by residual Rayleigh scattering or fluorescence. Recent baseline correction procedures rely on asymmetric loss functions, which place a reduced penalty on positive spectral deviations and thereby keep the baseline estimate from intruding into Raman peak regions. However, these procedures are coupled with polynomial fitting, which may not suit the whole spectral domain and can yield inconsistent baselines. They also require a threshold to be specified, and the non-convexity of the corresponding objective function further complicates the computation. Learning from these pros and cons, we have developed a novel baseline correction procedure, iterative smoothing-splines with root error adjustment (ISREA), that has three distinct advantages. First, ISREA estimates the baseline with smoothing splines, which are more flexible than polynomials and can capture complicated trends over the whole spectral domain. Second, ISREA mimics the asymmetric square root loss and removes the need for a threshold. Finally, ISREA avoids direct optimization of a non-convex loss function by iteratively updating prediction errors and refitting baselines. Through extensive numerical experiments on a wide variety of spectra, including simulated, mineral, and dialysate spectra, we show that ISREA is simple and fast and yields consistent, accurate baselines that preserve all the meaningful Raman peaks.
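The iterate-and-refit idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' published implementation. The abstract does not give the update formula, so replacing each positive prediction error by its square root is an assumed reading of "root error adjustment". A Whittaker smoother (a discrete second-difference penalty, written in NumPy) stands in for the smoothing spline. The names `whittaker_smooth`, `root_adjusted_baseline`, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np


def whittaker_smooth(y, lam):
    """Stand-in smoother: minimizes ||y - z||^2 + lam * ||D2 z||^2."""
    n = len(y)
    d2 = np.diff(np.eye(n), n=2, axis=0)  # (n-2, n) second-difference operator
    return np.linalg.solve(np.eye(n) + lam * d2.T @ d2, y)


def root_adjusted_baseline(y, lam=1e4, n_iter=30):
    """Iteratively refit a smooth baseline to root-adjusted data (assumed scheme)."""
    y = np.asarray(y, dtype=float)
    baseline = whittaker_smooth(y, lam)  # initial fit to the raw spectrum
    for _ in range(n_iter):
        err = y - baseline  # prediction errors of the current baseline
        # Assumed adjustment: positive errors (likely peaks) are replaced by
        # their square roots; non-positive errors are kept unchanged.
        adjusted = np.where(err > 0, np.sqrt(np.clip(err, 0.0, None)), err)
        # Refit the smoother to the baseline plus the adjusted errors.
        baseline = whittaker_smooth(baseline + adjusted, lam)
    return baseline


# Synthetic example: a linear baseline plus a single narrow Gaussian peak.
x = np.linspace(0.0, 1.0, 201)
true_baseline = 10.0 + 5.0 * x
spectrum = true_baseline + 100.0 * np.exp(-((x - 0.5) / 0.02) ** 2)

plain = whittaker_smooth(spectrum, 1e4)             # symmetric smoother only
estimate = root_adjusted_baseline(spectrum, 1e4)    # with root adjustment
corrected = spectrum - estimate                     # baseline-corrected spectrum
```

In this synthetic case the plain smoother is pulled up under the peak, while the root-adjusted refits keep the estimate closer to the underlying line. No threshold is specified, and each pass solves only a linear system, which is the sense in which a scheme like this sidesteps direct optimization of a non-convex loss. The penalty weight `lam` and the fixed iteration count are illustrative choices, not values from the paper.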
Journal Name:
Applied Spectroscopy
Page Range / eLocation ID:
34 to 45
Sponsoring Org:
National Science Foundation
More Like this
  1. We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. For univariate regression, we show that the solution of training a width-n shallow ReLU network is within n^(−1/2) of the function which fits the training data and whose difference from the initial function has the smallest 2-norm of the second derivative weighted by a curvature penalty that depends on the probability distribution that is used to initialize the network parameters. We compute the curvature penalty function explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and thence the solution function is the natural cubic spline interpolation of the training data. For stochastic gradient descent we obtain the same implicit bias result. We obtain a similar result for different activation functions. For multivariate regression we show an analogous result, whereby the second derivative is replaced by the Radon transform of a fractional Laplacian. For initialization schemes that yield a constant penalty function, the solutions are polyharmonic splines. Moreover, we show that the training trajectories are captured by trajectories of smoothing splines with decreasing regularization strength.
  2. Continuous advancements in LiDAR technology have enabled compelling wind turbulence measurements within the atmospheric boundary layer with range gates shorter than 20 m and sampling frequencies of the order of 10 Hz. However, estimates of the radial velocity from the back-scattered laser beam are inevitably affected by an averaging process within each range gate, generally modeled as a convolution between the actual velocity projected along the LiDAR line-of-sight and a weighting function representing the energy distribution of the laser pulse along the range gate. As a result, the spectral energy of the turbulent velocity fluctuations is damped within the inertial sub-range, with a corresponding reduction of the velocity variance, which prevents full use of the spatio-temporal resolution achieved by the LiDAR technology. In this article, we propose to correct this turbulent energy damping on the LiDAR measurements by reversing the effect of a low-pass filter, which can be estimated directly from the LiDAR measurements. LiDAR data acquired from three different field campaigns are analyzed to describe the proposed technique, investigate the variability of the filter parameters and, for one dataset, assess the procedure for spectral LiDAR correction against sonic anemometer data. It is found that the order of the low-pass filter used for modeling the energy damping on the LiDAR velocity measurements has negligible effects on the correction of the second-order statistics of the wind velocity. In contrast, its cutoff frequency plays a significant role in the spectral correction encompassing the smoothing effects connected with the LiDAR gate length.
  3. Abstract Raman spectroscopy is widely used to identify mineral and fluid inclusions in host crystals, as well as to calculate pressure-temperature (P-T) conditions with mineral inclusion elastic thermobarometry, for example quartz-in-garnet barometry (QuiG) and zircon-in-garnet thermometry (ZiG). For thermobarometric applications, P-T precision and accuracy depend crucially on the reproducibility of Raman peak position measurements. In this study, we monitored long-term instrument stability and varied analytical parameters to quantify peak position reproducibility for Raman spectra from quartz and zircon inclusions and reference crystals. Our ultimate goal was to determine the reproducibility of calculated inclusion pressures (“Pinc”) and entrapment pressures (“Ptrap”) or temperatures (“Ttrap”) by quantifying diverse analytical errors, as well as to identify optimal measurement conditions and provide a baseline for interlaboratory comparisons. Most tests emphasized 442 nm (blue) and 532 nm (green) laser sources, although repeated analysis of a quartz inclusion in garnet additionally used a 632.8 nm (red) laser. Power density was varied from <1 to >100 mW and acquisition time from 3 to 270 s. A correction is proposed to suppress interference on the ~206 cm⁻¹ peak in quartz spectra by a broad nearby (~220 cm⁻¹) peak in garnet spectra. Rapid peak drift up to 1 cm⁻¹/h occurred after powering the laser source, followed by minimal drift (<0.2 cm⁻¹/h) for several hours thereafter. However, abrupt shifts in peak positions as large as 2–3 cm⁻¹ sometimes occurred within periods of minutes, commonly either positively or negatively correlated to changes in room temperature. An external Hg-emission line (fluorescent light) can be observed in spectra collected with the green laser and shows highly correlated but attenuated directional shifts compared to quartz and zircon peaks.
Varying power density and acquisition time did not affect Raman peak positions of either quartz or zircon grains, possibly because power densities at the levels of inclusions were low. However, some zircon inclusions were damaged at higher power levels of the blue laser source, likely because of laser-induced heating. Using a combination of 1, 2, or 3 peak positions for the ~128, ~206, and ~464 cm⁻¹ peaks in quartz to calculate Pinc and Ptrap showed that use of the blue laser source results in the most reproducible Ptrap values for all methods (0.59 to 0.68 GPa at an assumed temperature of 450 °C), with precisions for a single method as small as ±0.03 GPa (2σ). Using the green and red lasers, some methods of calculating Ptrap produce nearly identical estimates as the blue laser with similarly good precision (±0.02 GPa for green laser, ±0.03 GPa for red laser). However, using 1- and 2-peak methods to calculate Ptrap can yield values that range from 0.52 ± 0.06 to 0.93 ± 0.16 GPa for the green laser, and 0.53 ± 0.08 GPa to 1.00 ± 0.45 GPa for the red laser. Semiquantitative calculations for zircon, assuming a typical error of ±0.25 cm⁻¹ in the position of the ~1008 cm⁻¹ peak, imply reproducibility in temperature (at an assumed pressure) of approximately ±65 °C.
For optimal applications to elastic thermobarometry, analysts should: (1) delay data collection approximately one hour after laser startup, or leave lasers on; (2) collect a Hg-emission line simultaneously with Raman spectra when using a green laser to correct for externally induced shifts in peak positions; (3) correct for garnet interference on the quartz 206 cm⁻¹ peak; and either (4a) use a short wavelength (blue) laser for quartz and zircon crystals for P-T calculations, but use very low laser power (<12 mW) to avoid overheating and damage or (4b) use either the intermediate wavelength (green; quartz and zircon) or long wavelength (red; zircon) laser for P-T calculations, but restrict calculations to specific methods. Implementation of our recommendations should optimize reproducibility for elastic geothermobarometry, especially QuiG barometry and ZiG thermometry.
  4. Abstract

    NASA's Mars 2020 and ESA's ExoMars will collect Raman measurements in dusty field conditions that obscure underlying rocks. This presents a challenge for remote Raman measurements at distances where mechanical or ablative sample cleaning is not straightforward. Historically, probing broad lithostratigraphic suites has been thwarted by the need for pristine targets and high‐quality spectra. We provide a means of identifying Raman spectra of common rock‐forming silicate, carbonate, and sulfate minerals under low signal‐to‐noise ratio, Mars‐like conditions using a convolutional neural network (CNN). The CNN was trained on the Machine Learning Raman Open Dataset with 500,000+ Raman spectra of hand samples/powder mixtures (5,000+ spectra/mineral class). Diversity in sample microtopography, orientation, and crystallinity simulated varying laser focuses and spectral quality, and no traditional spectral preprocessing such as cosmic ray or baseline removal was employed. The CNN identified low‐intensity Raman scatterers (micas and amphiboles) and mixed minerals, and distinguished between mineral endmembers with +99% success. We present among the first known implementations of “big data” machine learning using varied, high‐volume Raman spectral datasets. The pattern recognition abilities of CNNs can facilitate scientists' interpretation of Raman spectra on Earth and autonomous rover decision‐making on planets like Mars: increasing scientific yield, correcting human classification errors, reducing the need for thorough target dust removal during evaluative measurements, and streamlining the data communications pipeline, saving time and resources. This study examines an end‐to‐end development process for creating a deep learning algorithm sensitive to varieties of Raman spectra and provides guidelines for CNN model development at the interface of Raman spectroscopy, deep learning, and planetary science.

  5. Abstract

    We present the largest and most homogeneous collection of near-infrared (NIR) spectra of Type Ia supernovae (SNe Ia): 339 spectra of 98 individual SNe obtained as part of the Carnegie Supernova Project-II. These spectra, obtained with the FIRE spectrograph on the 6.5 m Magellan Baade telescope, have a spectral range of 0.8–2.5 μm. Using this sample, we explore the NIR spectral diversity of SNe Ia and construct a template of spectral time series as a function of the light-curve-shape parameter, color stretch s_BV. Principal component analysis is applied to characterize the diversity of the spectral features and reduce data dimensionality to a smaller subspace. Gaussian process regression is then used to model the subspace dependence on phase and light-curve shape and the associated uncertainty. Our template is able to predict spectral variations that are correlated with s_BV, such as the hallmark NIR features: Mg II at early times and the H-band break after peak. Using this template reduces the systematic uncertainties in K-corrections by ∼90% compared to those from the Hsiao template. These uncertainties, defined as the mean K-correction differences computed with the color-matched template and observed spectra, are on the level of 4 × 10⁻⁴ mag on average. This template can serve as the baseline spectral energy distribution for light-curve fitters and can identify peculiar spectral features that might point to compelling physics. The results presented here will substantially improve future SN Ia cosmological experiments, for both nearby and distant samples.
