Title: ISREA: An Efficient Peak-Preserving Baseline Correction Algorithm for Raman Spectra
A critical step in Raman spectroscopy is baseline correction, which eliminates the background signal generated by residual Rayleigh scattering or fluorescence. Baseline correction procedures relying on asymmetric loss functions have been employed recently. They apply a reduced penalty to positive spectral deviations, which essentially keeps the baseline estimate from invading Raman peak areas. However, their coupling with polynomial fitting may not be suitable over the whole spectral domain and can yield inconsistent baselines. The need to specify a threshold and the non-convexity of the corresponding objective function further complicate the computation. Learning from their pros and cons, we have developed a novel baseline correction procedure called iterative smoothing splines with root error adjustment (ISREA) that has three distinct advantages. First, ISREA estimates the baseline with smoothing splines, which are more flexible than polynomials and capable of capturing complicated trends over the whole spectral domain. Second, ISREA mimics the asymmetric square root loss and removes the need for a threshold. Finally, ISREA avoids directly optimizing a non-convex loss function by iteratively updating prediction errors and refitting the baseline. Through extensive numerical experiments on a wide variety of spectra, including simulated spectra, mineral spectra, and dialysate spectra, we show that ISREA is simple, fast, and yields consistent and accurate baselines that preserve all the meaningful Raman peaks.
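As a rough illustration of the iterative refit-and-adjust idea described in the abstract, the Python sketch below damps positive residuals with a square root before refitting a smoothing spline. The smoothing parameter, iteration count, and damping rule are illustrative assumptions, not the published ISREA implementation.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def isrea_style_baseline(wavenumber, intensity, smooth=None, n_iter=30):
    """Sketch of an ISREA-style iteration (not the authors' code).

    Assumes `wavenumber` is strictly increasing. At each step, a smoothing
    spline is refit to an adjusted target in which positive residuals
    (candidate Raman peaks) enter only through their square root, so the
    baseline is discouraged from invading peak regions.
    """
    target = intensity.astype(float).copy()
    for _ in range(n_iter):
        spline = UnivariateSpline(wavenumber, target, s=smooth)
        baseline = spline(wavenumber)
        residual = intensity - baseline
        # Root error adjustment (illustrative): damp positive residuals,
        # keep negative ones unchanged for the next refit.
        pos = np.clip(residual, 0.0, None)
        neg = np.clip(residual, None, 0.0)
        target = baseline + np.sqrt(pos) + neg
    return baseline

# Usage (hypothetical arrays):
# corrected = intensity - isrea_style_baseline(wavenumber, intensity)
```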
Award ID(s): 1916174
PAR ID: 10281312
Author(s) / Creator(s):
Date Published:
Journal Name: Applied Spectroscopy
Volume: 75
Issue: 1
ISSN: 0003-7028
Page Range / eLocation ID: 34 to 45
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
1. Abstract: This paper presents a procedure for, and evaluation of, using a semantic similarity metric as a loss function for neural source code summarization. Code summarization is the task of writing natural language descriptions of source code. Neural code summarization refers to automated techniques for generating these descriptions using neural networks. Almost all current approaches involve neural networks either as standalone models or as part of a pretrained large language model, for example GPT, Codex, or LLaMA. Yet almost all also use a categorical cross-entropy (CCE) loss function for network optimization. Two problems with CCE are that (1) it computes loss over each word prediction one at a time, rather than evaluating a whole sentence, and (2) it requires a perfect prediction, leaving no room for partial credit for synonyms. In this paper, we extend our previous work on semantic similarity metrics to show a procedure for using semantic similarity as a loss function to alleviate these problems, and we evaluate this procedure in several settings in both metrics-driven and human studies. In essence, we propose to use a semantic similarity metric to calculate loss over the whole output sentence prediction per training batch, rather than just the loss for each word. We also propose to combine our loss with CCE for each word, which streamlines the training process compared to baselines. We evaluate our approach against several baselines and report improvement in the vast majority of conditions.
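A minimal sketch of one way to mix a sentence-level semantic-similarity signal into the usual per-word loss is shown below. The encoder interface, weighting scheme, and hyperparameters are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def similarity_weighted_loss(logits, target_ids, sentence_encoder,
                             pad_id=0, alpha=0.5):
    """Illustrative only: per-word cross-entropy reweighted by a
    sentence-level similarity between the greedy prediction and the
    reference. `sentence_encoder` is an assumed frozen module mapping
    token-id tensors [batch, seq] to embeddings [batch, dim]."""
    vocab = logits.size(-1)
    # (1) Per-word categorical cross-entropy, as in the usual CCE baseline.
    per_tok = F.cross_entropy(logits.reshape(-1, vocab),
                              target_ids.reshape(-1),
                              ignore_index=pad_id, reduction="none")
    per_tok = per_tok.reshape(target_ids.shape)                # [batch, seq]
    mask = (target_ids != pad_id).float()
    per_sent = (per_tok * mask).sum(1) / mask.sum(1).clamp(min=1)
    # (2) Whole-sentence similarity, computed without gradients and used to
    # down-weight sentences whose predictions are already close in meaning,
    # giving partial credit for synonyms that exact-match CCE would punish.
    with torch.no_grad():
        pred_ids = logits.argmax(dim=-1)
        sim = F.cosine_similarity(sentence_encoder(pred_ids),
                                  sentence_encoder(target_ids), dim=-1)
    weights = 1.0 - alpha * sim.clamp(0.0, 1.0)  # high similarity -> smaller loss
    return (weights * per_sent).mean()
```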
2. We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. For univariate regression, we show that the solution of training a width-n shallow ReLU network is within n^(-1/2) of the function which fits the training data and whose difference from the initial function has the smallest 2-norm of the second derivative weighted by a curvature penalty that depends on the probability distribution that is used to initialize the network parameters. We compute the curvature penalty function explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and thence the solution function is the natural cubic spline interpolation of the training data. For stochastic gradient descent we obtain the same implicit bias result. We obtain a similar result for different activation functions. For multivariate regression we show an analogous result, whereby the second derivative is replaced by the Radon transform of a fractional Laplacian. For initialization schemes that yield a constant penalty function, the solutions are polyharmonic splines. Moreover, we show that the training trajectories are captured by trajectories of smoothing splines with decreasing regularization strength.
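As a toy illustration of the limiting object in the univariate case, the sketch below constructs the natural cubic spline interpolant directly with SciPy; the training points are made-up placeholders and no network is trained.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical 1-D training data, for illustration only.
x_train = np.array([-1.0, -0.4, 0.1, 0.5, 1.0])
y_train = np.array([0.2, -0.3, 0.4, 0.1, -0.2])

# Natural cubic spline: second derivative vanishes at the endpoints, and it
# minimizes the L2 norm of the second derivative among all interpolants --
# the limiting function described above for a constant curvature penalty.
spline = CubicSpline(x_train, y_train, bc_type="natural")

x_grid = np.linspace(-1.2, 1.2, 200)
prediction = spline(x_grid)  # what a very wide trained network would approach
```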
3. Abstract: Aqueous electrolytes are promising for large-scale energy storage applications due to their intrinsic low toxicity, non-flammability, high ionic conductivity, and low cost. However, the narrow electrochemical stability window (ESW) of pure water limits the energy density of aqueous rechargeable batteries. The water-in-salt electrolyte (WiSE) approach has expanded the ESW to over 3 V by changing the electrolyte solvation structure. Limited solubility and electrolyte crystallization have been persistent concerns for imide-based lithium salts in WiSE. Asymmetric lithium salts compensate for these flaws; however, studies of the solvation structure of asymmetric-salt aqueous electrolytes are rare. Here, we applied small-angle x-ray scattering (SAXS) and Raman spectroscopy to reveal the solvation structure of imide-based asymmetric lithium salts. The SAXS spectra show blue shifts of the lower-q peak, with decreasing intensity, as the concentration increases, indicating a decrease in the average distance between solvated anions. Significantly, an exponential decrease in the d-spacing as a function of concentration was observed. In addition, we applied Raman spectroscopy to study the evolution of solvent-separated ion pairs (SSIPs), contact ion pairs (CIPs), and aggregates (AGGs) in the solvation structure of asymmetric salt solutions.
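A small, hypothetical illustration of the peak-position analysis is sketched below; the concentrations and q values are invented placeholders, and only the relation d = 2π/q and the exponential-decay fit reflect the analysis described above.

```python
import numpy as np
from scipy.optimize import curve_fit

# Placeholder values for demonstration only (not measured data):
# lower-q SAXS peak positions (1/Angstrom) at a few salt concentrations (mol/kg).
concentration = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
q_peak = np.array([0.25, 0.28, 0.33, 0.38, 0.42])

# Average spacing between solvated anions from the peak position.
d_spacing = 2.0 * np.pi / q_peak  # Angstrom

def exp_decay(m, d_inf, a, k):
    """d-spacing decaying exponentially toward a limit with concentration."""
    return d_inf + a * np.exp(-k * m)

params, _ = curve_fit(exp_decay, concentration, d_spacing, p0=(15.0, 10.0, 0.2))
```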
4. A comprehensive method is provided for smoothing noisy, irregularly sampled data with non-Gaussian noise using smoothing splines. We demonstrate how the spline order and tension parameter can be chosen a priori from physical reasoning. We also show how to allow for non-Gaussian noise and outliers that are typical in global positioning system (GPS) signals. We demonstrate the effectiveness of our methods on GPS trajectory data obtained from oceanographic floating instruments known as drifters.
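The sketch below shows one common way to make a smoothing-spline fit robust to irregular sampling and outliers, via iterative reweighting with a Tukey biweight; the weighting function and parameters are illustrative assumptions rather than the paper's specific spline-tension method.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def robust_smooth(t, x, smooth=None, n_iter=5, c=4.685):
    """Illustrative robust smoothing-spline fit for irregularly sampled,
    outlier-prone data (e.g., one GPS coordinate of a drifter track).
    Assumes `t` is strictly increasing."""
    w = np.ones_like(x, dtype=float)
    for _ in range(n_iter):
        spl = UnivariateSpline(t, x, w=w, s=smooth)
        r = x - spl(t)
        # Robust scale estimate from the median absolute deviation.
        scale = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12
        u = np.clip(r / (c * scale), -1.0, 1.0)
        # Tukey biweight: large residuals (outliers) get near-zero weight.
        w = np.maximum((1.0 - u**2) ** 2, 1e-6)
    return spl

# Usage (hypothetical arrays): smoothed = robust_smooth(t, x)(t)
```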