

Title: Structure Assisted NMF Methods for Separation of Degenerate Mixture Data with Application to NMR Spectroscopy
In this paper, we develop structure assisted nonnegative matrix factorization (NMF) methods for blind source separation of degenerate data. The motivation originates from nuclear magnetic resonance (NMR) spectroscopy, where multiple mixture NMR spectra are recorded to identify chemical compounds with similar structures. Considering the linear mixing model (LMM), we aim to identify the chemical compounds involved when the mixing process is known to be nearly singular. We first consider a class of data with dominant interval(s) (DI), where each source signal has dominant peaks over the others. In addition, a nearly singular mixing process produces degenerate mixtures. The DI condition implies clustering structures in the data points; hence, the mixing matrix can be estimated by data clustering. Because of noise and the degeneracy of the data, a small deviation in this estimate may introduce errors in the output. To resolve this problem and improve the robustness of the separation, methods are developed in two aspects. One is to find a better estimate of the mixing matrix by allowing a constrained perturbation to the clustering output, which can be achieved by quadratic programming. The other is to seek sparse source signals by exploiting the DI condition, which leads to an ℓ1 optimization. If no source information is available, we propose to adopt the nonnegative matrix factorization approach by incorporating the matrix structure (parallel columns of the mixing matrix) into the cost function, and we develop multiplicative iteration rules for the numerical solutions. We present experimental results on NMR data to show the performance and reliability of the method in applications arising in NMR spectroscopy.
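Why the DI condition turns mixing-matrix estimation into clustering can be shown with a minimal sketch. If only one source is active at sample point j, the mixture column x_j is a positive multiple of one column of A, so unit-normalized columns pile up on a few directions. The sketch assumes this idealized DI case and uses plain spherical k-means, which is our choice and not necessarily the clustering used in the paper. It omits the noise handling, the quadratic-programming refinement, and the ℓ1 step. The function names and data are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def estimate_mixing_directions(X, k, iters=20):
    """Estimate the k column directions of A from mixtures X = A S.

    X is a list of m rows (one per mixture spectrum), each of length n.
    Assumes (hypothetically) that at many sample points a single source
    dominates, so each such data column is nearly parallel to one column
    of A. Returns k unit vectors, each estimating a column of A up to a
    positive scale factor.
    """
    m, n = len(X), len(X[0])
    # Unit-normalize every nonzero data column.
    points = []
    for j in range(n):
        col = [X[i][j] for i in range(m)]
        if dot(col, col) > 1e-12:
            points.append(normalize(col))
    # Deterministic farthest-point initialization: start from the first
    # point, then repeatedly add the point least similar to all centers.
    centers = [points[0]]
    while len(centers) < k:
        nxt = min(points, key=lambda p: max(dot(p, c) for c in centers))
        centers.append(nxt)
    for _ in range(iters):
        # Assign each point to the center with the largest cosine similarity.
        groups = [[] for _ in range(k)]
        for p in points:
            best = max(range(k), key=lambda c: dot(p, centers[c]))
            groups[best].append(p)
        # New center = normalized sum of its group; empty groups keep the old one.
        centers = [
            normalize([sum(p[i] for p in g) for i in range(m)]) if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers


if __name__ == "__main__":
    A = [[1.0, 0.4], [0.5, 1.0], [0.2, 0.6]]      # hypothetical 3 x 2 mixing matrix
    S = [[2, 1, 3, 0, 0, 0], [0, 0, 0, 1, 2, 4]]  # only source 1 active at j = 0..2
    X = [[sum(A[i][k] * S[k][j] for k in range(2)) for j in range(6)] for i in range(3)]
    print(estimate_mixing_directions(X, 2))
```

Normalizing first discards source intensity and keeps only direction, which is what the DI argument relies on. With noise or overlapping peaks the cluster means drift away from the true columns, which is the sensitivity the constrained-perturbation step in the abstract is meant to address.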
Award ID(s):
1924548
NSF-PAR ID:
10339326
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
International journal of mathematics and computation
Volume:
33
Issue:
1
ISSN:
0974-570X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. End-member mixing analysis (EMMA) is a method of interpreting stream water chemistry variations and is widely used for chemical hydrograph separation. It is based on the assumption that stream water is a conservative mixture of varying contributions from well-characterized source solutions (end-members). These end-members are typically identified by collecting samples of potential end-member source waters from within the watershed and comparing these to the observations. Here we introduce a complementary data-driven method (convex hull end-member mixing analysis, CHEMMA) to infer the end-member compositions and their associated uncertainties from the stream water observations alone. The method involves two steps. The first uses convex hull nonnegative matrix factorization (CH-NMF) to infer possible end-member compositions by searching for a simplex that optimally encloses the stream water observations. The second step uses constrained K-means clustering (COP-KMEANS) to classify the results from repeated applications of CH-NMF and analyzes the uncertainty associated with the algorithm. In an example application utilizing the 1986 to 1988 Panola Mountain Research Watershed dataset, CHEMMA is able to robustly reproduce the three field-measured end-members found in previous research using only the stream water chemical observations. CHEMMA also suggests that a fourth and a fifth end-member can be (less robustly) identified. Using both real and synthetic data, we examine uncertainties in end-member identification arising from the non-uniqueness of the CH-NMF solutions, which is related to the data structure, and from the number of samples. The results suggest that the mixing space can be identified robustly when the dataset includes samples that contain extremely small contributions of one end-member; samples containing extremely large contributions from one end-member are not necessary but do reduce uncertainty about the end-member composition.
  2. There are currently no methods for the acquisition of ultra-wideline (UW) solid-state NMR spectra under static conditions that enable reliable separation and resolution of overlapping powder patterns arising from magnetically distinct nuclei. This stands in contrast to the variety of techniques available for spin-1/2 or half-integer quadrupolar nuclei with narrow central transition patterns under magic-angle spinning (MAS). Resolution of overlapping signals is routinely achieved in MRI and solution-state NMR by exploiting relaxation differences between nonequivalent sites. Preliminary studies of relaxation assisted separation (RAS) for separating overlapping UW NMR patterns using pseudo-inverse Laplace transforms have reported two-dimensional spectra featuring relaxation rates correlated to NMR interaction frequencies. However, RAS methods are inherently sensitive to experimental noise and require that the relaxation rates associated with overlapped patterns be significantly different from one another. Herein, principal component analysis (PCA) denoising is implemented to increase the signal-to-noise ratios of the relaxation datasets, and RAS routines are stabilized with truncated singular value decomposition (TSVD) and elastic net (EN) regularization to resolve overlapped patterns with a larger tolerance for differences in relaxation rates. We extend these methods for improved pattern resolution by utilizing 3D frequency–R₁–R₂ correlation spectra. Synthetic and experimental datasets, including ³⁵Cl (I = 3/2), ²H (I = 1), and ¹⁴N (I = 1) NMR of organic and biological compounds, are explored with both regularized 2D RAS and 3D RAS; comparison of these data reveals improved resolution in the latter case. These methods have great potential for separating overlapping powder patterns under both static and MAS conditions.
  3. Unsupervised mixture learning (UML) aims at identifying linearly or nonlinearly mixed latent components in a blind manner. UML is known to be challenging: even learning linear mixtures requires highly nontrivial analytical tools, e.g., independent component analysis or nonnegative matrix factorization. In this work, the post-nonlinear (PNL) mixture model, where unknown element-wise nonlinear functions are imposed onto a linear mixture, is revisited. The PNL model is widely employed in fields ranging from brain signal classification, speech separation, and remote sensing to causal discovery. To identify and remove the unknown nonlinear functions, existing works often assume different properties of the latent components (e.g., statistical independence or probability-simplex structures). This work shows that, under a carefully designed UML criterion, the existence of a nontrivial null space associated with the underlying mixing system suffices to guarantee identification/removal of the unknown nonlinearity. Compared to prior works, our finding largely relaxes the conditions for attaining PNL identifiability and thus may benefit applications where no strong structural information on the latent components is known. A finite-sample analysis is offered to characterize the performance of the proposed approach under realistic settings. To implement the proposed learning criterion, a block coordinate descent algorithm is proposed. A series of numerical experiments corroborates our theoretical claims.
  4. Abstract

    To implement equilibrium hard‐modeling of spectroscopic titration data, the analyst must make a variety of crucial data processing choices that address negative absorbance and molar absorptivity values. The efficacy of three such methodological options is evaluated via high‐throughput Monte Carlo simulations, root‐mean‐square error surface mapping, and two mathematical theorems. Accuracy of the calculated binding constant values constitutes the key figure of merit used to compare different data analysis approaches. First, using singular value decomposition to filter the raw absorbance data prior to modeling often reduces the number of negative values involved but has little effect on the calculated binding constant despite its ability to address spectrometer noise. Second, both truncation of negative molar absorptivity values and the fast nonnegative least squares algorithms are superior to unconstrained regression because they avoid local minima; however, they introduce bias into the calculated binding constants in the presence of negative baseline offsets. Finally, we establish two theorems showing that negative values are best addressed when all the chemical solutions leading to the raw absorbance data are the result of mixing exactly two distinct stock solutions. This allows the raw absorbance data to be shifted up, eliminating negative baseline offsets, without affecting the concentration matrix, residual matrix, or calculated binding constants. Otherwise, the data cannot be safely upshifted. A comprehensive protocol for analyzing experimental absorbance datasets with is included.

     
  5. The study of human chemical communication benefits from comparative perspectives that relate humans, conceptually and empirically, to other primates. All major primate groups rely on intraspecific chemosignals, but strepsirrhines present the greatest diversity and specialization, providing a rich framework for examining design, delivery and perception. Strepsirrhines actively scent mark, possess a functional vomeronasal organ, investigate scents via olfactory and gustatory means, and are exquisitely sensitive to chemically encoded messages. Variation in delivery, scent mixing and multimodality alters signal detection, longevity and intended audience. Based on an integrative, 19-species review, the main scent source used (excretory versus glandular) differentiates nocturnal from diurnal or cathemeral species, reflecting differing socioecological demands and evolutionary trajectories. Condition-dependent signals reflect immutable (species, sex, identity, genetic diversity, immunity and kinship) and transient (health, social status, reproductive state and breeding history) traits, consistent with socio-reproductive functions. Sex reversals in glandular elaboration, marking rates or chemical richness in female-dominant species implicate sexual selection of olfactory ornaments in both sexes. Whereas some compounds may be endogenously produced and modified (e.g. via hormones), microbial analyses of different odorants support the fermentation hypothesis of bacterial contribution. The intimate contexts of information transfer and varied functions provide important parallels applicable to olfactory communication in humans. This article is part of the Theo Murphy meeting issue ‘Olfactory communication in humans’. 
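For item 1 above, the conservative-mixing assumption behind classic EMMA can be written as a small linear system: a sample's tracer concentrations equal a fraction-weighted sum of the end-member compositions, and the fractions sum to one. This minimal sketch assumes the end-member compositions are already known (the situation CHEMMA is designed to avoid) and uses exactly one fewer tracer than end-members. It is not CH-NMF or COP-KMEANS, and all names and values are hypothetical.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular system: end-members are affinely dependent")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def emma_fractions(end_members, sample):
    """Mixing fractions of each end-member in one stream-water sample.

    end_members: p compositions, each a list of p - 1 conservative tracer
    concentrations. sample: the p - 1 tracer concentrations in stream water.
    Solves sum_k f_k * e_k = sample together with sum_k f_k = 1.
    """
    p = len(end_members)
    if len(sample) != p - 1:
        raise ValueError("need exactly one fewer tracer than end-members")
    rows = [[em[t] for em in end_members] for t in range(p - 1)]
    rows.append([1.0] * p)  # fractions sum to one
    return solve_linear(rows, list(sample) + [1.0])


if __name__ == "__main__":
    ends = [[10.0, 1.0], [2.0, 8.0], [5.0, 3.0]]  # hypothetical tracer values
    print(emma_fractions(ends, [4.5, 5.1]))
    print(emma_fractions(ends, [12.0, 0.0]))      # a sample outside the triangle
```

A negative fraction, or one above 1, means the sample lies outside the triangle spanned by the assumed end-members. Searching for a simplex that encloses the observations, as CH-NMF does in item 1, sidesteps having to fix the end-members in advance.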
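For item 2 above, a toy two-site model shows why relaxation-based separation needs well-separated rates. It assumes a single frequency point whose intensity decays as a1·exp(−r1·t) + a2·exp(−r2·t) with known rates and fits only the amplitudes by linear least squares. It is not an inverse Laplace transform, TSVD, or elastic-net implementation; the function name and values are hypothetical.

```python
import math


def fit_two_amplitudes(times, signal, r1, r2):
    """Least-squares amplitudes (a1, a2) for signal ≈ a1*exp(-r1*t) + a2*exp(-r2*t).

    Also returns the condition number of the 2 x 2 normal matrix, which
    grows as r1 approaches r2 and the two decay curves become nearly
    parallel; noise in the signal is then amplified in a1 and a2.
    """
    u = [math.exp(-r1 * t) for t in times]
    v = [math.exp(-r2 * t) for t in times]
    uu = sum(x * x for x in u)
    uv = sum(x * y for x, y in zip(u, v))
    vv = sum(y * y for y in v)
    hu = sum(x * s for x, s in zip(u, signal))
    hv = sum(y * s for y, s in zip(v, signal))
    det = uu * vv - uv * uv
    if det <= 0.0:
        raise ValueError("decay curves are numerically parallel; rates too close")
    a1 = (vv * hu - uv * hv) / det
    a2 = (uu * hv - uv * hu) / det
    # Largest eigenvalue of [[uu, uv], [uv, vv]]; the smallest is det / lam_max.
    lam_max = 0.5 * (uu + vv + math.sqrt((uu - vv) ** 2 + 4.0 * uv * uv))
    cond = lam_max * lam_max / det
    return a1, a2, cond


if __name__ == "__main__":
    times = [0.1 * i for i in range(50)]
    for r2 in (5.0, 1.1):
        sig = [2.0 * math.exp(-1.0 * t) + 0.5 * math.exp(-r2 * t) for t in times]
        print(r2, fit_two_amplitudes(times, sig, 1.0, r2))
```

With noise-free data both fits recover the amplitudes, but the condition number for rates 1.0 and 1.1 is far larger than for 1.0 and 5.0, so the same experimental noise would produce much larger amplitude errors in the close-rate case.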
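For item 3 above, a toy example illustrates how a nontrivial null space can expose an unknown nonlinearity. We assume here that the null space refers to a vector w with wᵀA = 0 for a tall mixing matrix A. Then the linear mixture A s satisfies wᵀ(A s) = 0 for every source s, the element-wise nonlinearity breaks this identity, and applying the correct inverses restores it. The matrix, the nonlinearities, and the function names are hypothetical; this is not the paper's learning criterion or its block coordinate descent algorithm.

```python
import math

# Hypothetical 3 x 2 mixing matrix and a vector W with W^T A = 0.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [1.0, 1.0, -1.0]

# Assumed element-wise nonlinearities g_i (invertible for z >= 0) and inverses.
G = [math.exp, lambda z: z ** 3, math.sqrt]
G_INV = [math.log, lambda y: y ** (1.0 / 3.0), lambda y: y * y]
IDENTITY = [lambda y: y] * 3


def pnl_mix(s):
    """Post-nonlinear mixture x_i = g_i((A s)_i) for a nonnegative source s."""
    z = [sum(a * sk for a, sk in zip(row, s)) for row in A]
    return [g(zi) for g, zi in zip(G, z)]


def null_space_residual(x, inverses):
    """W^T f(x); vanishes for every sample when f = g^{-1}."""
    return sum(w * f(xi) for w, f, xi in zip(W, inverses, x))


if __name__ == "__main__":
    for s in [(0.3, 0.7), (0.0, 0.5), (1.0, 0.2)]:
        x = pnl_mix(s)
        print(s, null_space_residual(x, IDENTITY), null_space_residual(x, G_INV))
```

The raw mixtures violate the linear constraint, while the correctly inverted ones satisfy it to rounding error. A learning criterion could, in principle, search for element-wise functions that restore such a constraint, which is the intuition the abstract points to.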
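For item 4 above, the linear subproblem inside hard-modeling (fitting molar absorptivities at one wavelength for a fixed concentration matrix) shows how truncating negative values differs from a true nonnegative fit. This sketch uses plain cyclic coordinate descent rather than the fast nonnegative least squares algorithm evaluated in the abstract, and the concentrations and absorbances are hypothetical.

```python
def coordinate_descent_ls(C, a, nonneg, sweeps=500):
    """Minimize ||C eps - a||^2 by cyclic coordinate descent.

    With nonneg=True each coordinate update is clipped at zero, giving a
    simple (slow) nonnegative least squares solver; with nonneg=False it
    is Gauss-Seidel on the normal equations, i.e. unconstrained regression.
    """
    s = len(C[0])
    G = [[sum(row[i] * row[j] for row in C) for j in range(s)] for i in range(s)]
    h = [sum(row[i] * ai for row, ai in zip(C, a)) for i in range(s)]
    eps = [0.0] * s
    for _ in range(sweeps):
        for j in range(s):
            # Exact minimizer along coordinate j with the others held fixed.
            grad = sum(G[j][k] * eps[k] for k in range(s)) - h[j]
            eps[j] -= grad / G[j][j]
            if nonneg:
                eps[j] = max(0.0, eps[j])
    return eps


def residual(C, eps, a):
    """Squared residual ||C eps - a||^2."""
    return sum((sum(c * e for c, e in zip(row, eps)) - ai) ** 2 for row, ai in zip(C, a))


if __name__ == "__main__":
    C = [[1.0, 0.0], [0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]  # hypothetical concentrations
    a = [0.50, 0.32, 0.15, -0.02]                          # includes a negative reading
    free = coordinate_descent_ls(C, a, nonneg=False)
    trunc = [max(0.0, e) for e in free]
    nnls = coordinate_descent_ls(C, a, nonneg=True)
    for name, eps in (("free", free), ("truncated", trunc), ("nnls", nnls)):
        print(name, eps, residual(C, eps, a))
```

Truncation zeroes the negative coefficient but keeps the other coefficient at its unconstrained value, whereas NNLS refits it with the constraint active. Because the NNLS solution is optimal over all nonnegative vectors, its residual can never exceed that of the truncated vector.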