
Title: Structure Assisted NMF Methods for Separation of Degenerate Mixture Data with Application to NMR Spectroscopy
In this paper, we develop structure assisted nonnegative matrix factorization (NMF) methods for blind source separation of degenerate data. The motivation originates from nuclear magnetic resonance (NMR) spectroscopy, where multiple mixture NMR spectra are recorded to identify chemical compounds with similar structures. Considering the linear mixing model (LMM), we aim to identify the chemical compounds involved when the mixing process is known to be nearly singular. We first consider a class of data with dominant interval(s) (DI), where each source signal has dominant peaks over the others. In addition, a nearly singular mixing process produces degenerate mixtures. The DI condition implies clustering structures in the data points; hence, the mixing matrix can be estimated by data clustering. Due to the presence of noise and the degeneracy of the data, a small deviation in this estimate may introduce errors in the output. To resolve this problem and improve the robustness of the separation, methods are developed along two lines. One is to find a better estimate of the mixing matrix by allowing a constrained perturbation to the clustering output, which can be achieved by quadratic programming. The other is to seek sparse source signals by exploiting the DI condition, which leads to an ℓ1 optimization problem. If no source information is available, we propose to adopt the NMF approach by incorporating the matrix structure (parallel columns of the mixing matrix) into the cost function, and we develop multiplicative iteration rules for the numerical solution. We present experimental results on NMR data to show the performance and reliability of the method in applications arising in NMR spectroscopy.
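As a rough illustration of the clustering idea in the abstract (not the authors' code), the sketch below assumes the model X = A S with nonnegative sources whose dominant intervals are disjoint and whose peaks are Gaussian-shaped (a simplification of NMR line shapes). Under DI, data points where source k dominates lie on the ray spanned by column a_k, so clustering unit-normalized data points recovers the column directions of A. Spherical k-means is a stand-in for the paper's clustering method, and clipped least squares is a naive substitute for the paper's quadratic-programming perturbation and ℓ1 steps. The function names `estimate_mixing_by_clustering` and `recover_sources` are hypothetical.

```python
import numpy as np


def estimate_mixing_by_clustering(X, n_sources, n_iter=50, rel_tol=1e-3):
    """Hypothetical helper: estimate columns of A (up to scale/order) from X = A @ S.

    Assumes dominant intervals: where source k dominates, x(nu) ~ s_k(nu) * a_k,
    so those points lie near the ray of a_k. Spherical k-means on unit-normalized
    points is used as a stand-in clustering method.
    """
    norms = np.linalg.norm(X, axis=0)
    keep = norms > rel_tol * norms.max()      # drop near-zero baseline points
    U = X[:, keep] / norms[keep]              # unit-norm data directions

    # Deterministic farthest-point initialization: start from the first kept
    # point, then add the point least similar to all centers chosen so far.
    centers = [U[:, 0]]
    for _ in range(1, n_sources):
        sim = np.max(np.stack(centers, axis=1).T @ U, axis=0)
        centers.append(U[:, np.argmin(sim)])
    C = np.stack(centers, axis=1)

    for _ in range(n_iter):
        labels = np.argmax(C.T @ U, axis=0)   # assign by cosine similarity
        for k in range(n_sources):
            members = U[:, labels == k]
            if members.shape[1] > 0:          # keep old center if a cluster empties
                c = members.sum(axis=1)
                C[:, k] = c / np.linalg.norm(c)
    return C


def recover_sources(C, X):
    """Hypothetical helper: least-squares sources, clipped to be nonnegative.

    NOT the paper's QP perturbation or l1 step; it amplifies noise when C is
    nearly singular, which is exactly the degenerate case.
    """
    S_hat, *_ = np.linalg.lstsq(C, X, rcond=None)
    return np.clip(S_hat, 0.0, None)


# Synthetic toy data: two Gaussian peaks with disjoint dominant intervals,
# mixed by a nearly singular 2x2 matrix whose columns are almost parallel.
nu = np.arange(200)
S = np.vstack([np.exp(-((nu - 50) / 8.0) ** 2),
               np.exp(-((nu - 150) / 8.0) ** 2)])
A = np.array([[1.0, 0.9],
              [0.9, 1.0]])
X = A @ S

C = estimate_mixing_by_clustering(X, n_sources=2)
S_hat = recover_sources(C, X)
```

In this noise-free toy case the true columns of A have a cosine similarity of about 0.994, yet the clustering still separates them because every retained data point lies on one of the two rays. With added noise, the least-squares step can degrade because C is ill-conditioned; that sensitivity is what the abstract's perturbation and sparsity methods are meant to address.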
Authors:
; ;
Award ID(s):
1924548
Publication Date:
NSF-PAR ID:
10339326
Journal Name:
International Journal of Mathematics and Computation
Volume:
33
Issue:
1
ISSN:
0974-570X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. End-member mixing analysis (EMMA) is a method of interpreting stream water chemistry variations and is widely used for chemical hydrograph separation. It is based on the assumption that stream water is a conservative mixture of varying contributions from well-characterized source solutions (end-members). These end-members are typically identified by collecting samples of potential end-member source waters from within the watershed and comparing these to the observations. Here we introduce a complementary data-driven method (convex hull end-member mixing analysis, CHEMMA) to infer the end-member compositions and their associated uncertainties from the stream water observations alone. The method involves two steps. The first uses convex hull nonnegative matrix factorization (CH-NMF) to infer possible end-member compositions by searching for a simplex that optimally encloses the stream water observations. The second step uses constrained K-means clustering (COP-KMEANS) to classify the results from repeated applications of CH-NMF and analyzes the uncertainty associated with the algorithm. In an example application utilizing the 1986 to 1988 Panola Mountain Research Watershed dataset, CHEMMA is able to robustly reproduce the three field-measured end-members found in previous research using only the stream water chemical observations. CHEMMA also suggests that a fourth and a fifth end-member can be (less robustly) identified. Using both real and synthetic data, we examine uncertainties in end-member identification arising from the non-uniqueness of the CH-NMF solutions (which is related to the data structure) and from the number of samples.
The results suggest that the mixing space can be identified robustly when the dataset includes samples that contain extremely small contributions of one end-member; i.e., samples containing extremely large contributions from one end-member are not necessary, but they do reduce uncertainty about the end-member composition.
  2. There are currently no methods for the acquisition of ultra-wideline (UW) solid-state NMR spectra under static conditions that enable reliable separation and resolution of overlapping powder patterns arising from magnetically distinct nuclei. This stands in contrast to the variety of techniques available for spin-1/2 or half-integer quadrupolar nuclei with narrow central transition patterns under magic-angle spinning (MAS). Resolution of overlapping signals is routinely achieved in MRI and solution-state NMR by exploiting relaxation differences between nonequivalent sites. Preliminary studies of relaxation assisted separation (RAS) of overlapping UW NMR patterns using pseudo-inverse Laplace transforms have reported two-dimensional spectra that correlate relaxation rates with NMR interaction frequencies. However, RAS methods are inherently sensitive to experimental noise and require that the relaxation rates associated with overlapped patterns differ significantly from one another. Herein, principal component analysis (PCA) denoising is implemented to increase the signal-to-noise ratios of the relaxation datasets, and RAS routines are stabilized with truncated singular value decomposition (TSVD) and elastic net (EN) regularization to resolve overlapped patterns with a larger tolerance for differences in relaxation rates. We extend these methods for improved pattern resolution by utilizing 3D frequency-R1-R2 correlation spectra. Synthetic and experimental datasets, including ³⁵Cl (I = 3/2), ²H (I = 1), and ¹⁴N (I = 1) NMR of organic and biological compounds, are explored with both regularized 2D RAS and 3D RAS; comparison of these data reveals improved resolution in the latter case. These methods have great potential for separating overlapping powder patterns under both static and MAS conditions.
  3. Unsupervised mixture learning (UML) aims at identifying linearly or nonlinearly mixed latent components in a blind manner. UML is known to be challenging: even learning linear mixtures requires highly nontrivial analytical tools, e.g., independent component analysis or nonnegative matrix factorization. In this work, the post-nonlinear (PNL) mixture model, where unknown element-wise nonlinear functions are imposed onto a linear mixture, is revisited. The PNL model is widely employed in fields ranging from brain signal classification, speech separation, and remote sensing to causal discovery. To identify and remove the unknown nonlinear functions, existing works often assume different properties of the latent components (e.g., statistical independence or probability-simplex structures). This work shows that under a carefully designed UML criterion, the existence of a nontrivial null space associated with the underlying mixing system suffices to guarantee identification/removal of the unknown nonlinearity. Compared to prior works, this finding largely relaxes the conditions for attaining PNL identifiability and thus may benefit applications where no strong structural information on the latent components is known. A finite-sample analysis is offered to characterize the performance of the proposed approach under realistic settings. To implement the proposed learning criterion, a block coordinate descent algorithm is proposed. A series of numerical experiments corroborates our theoretical claims.
  4. Abstract

    Nuclear magnetic resonance (NMR) spectroscopy is a powerful tool for obtaining precise information about the local bonding of materials, but its spectra are difficult to interpret without a well-vetted dataset of reference spectra. The ability to predict NMR parameters and connect them to three-dimensional local environments is critical for understanding more complex, long-range interactions. New computational methods have revealed structural information available from ²⁹Si solid-state NMR by generating computed reference spectra for solids. Such predictions are useful for the identification of new silicon-containing compounds and serve as a starting point for determining the local environments present in amorphous structures. In this study, we have used 42 silicon sites as a benchmarking set to compare experimentally reported ²⁹Si solid-state NMR spectra with those computed by CASTEP-NMR and the Vienna Ab initio Simulation Package (VASP). Data-driven approaches enable us to identify the source of discrepancies across a range of experimental and computational results. The information from NMR (in the form of an NMR tensor) has been validated, and in some cases corrected, in an effort to catalog these tensors for the local spectroscopy database infrastructure (LSDI), where over 10,000 ²⁹Si NMR tensors for crystalline materials have been computed. Knowledge of specific tensor values can serve as the basis for executing NMR experiments with precision, optimizing conditions to capture the elements accurately. The ability to predict and compare experimental observables from a wide range of structures can aid researchers in their chemical assignments and structure determination, since the computed values enable the extension beyond tables of typical chemical shift (or shielding) ranges.

  5. The study of human chemical communication benefits from comparative perspectives that relate humans, conceptually and empirically, to other primates. All major primate groups rely on intraspecific chemosignals, but strepsirrhines present the greatest diversity and specialization, providing a rich framework for examining design, delivery and perception. Strepsirrhines actively scent mark, possess a functional vomeronasal organ, investigate scents via olfactory and gustatory means, and are exquisitely sensitive to chemically encoded messages. Variation in delivery, scent mixing and multimodality alters signal detection, longevity and intended audience. Based on an integrative, 19-species review, the main scent source used (excretory versus glandular) differentiates nocturnal from diurnal or cathemeral species, reflecting differing socioecological demands and evolutionary trajectories. Condition-dependent signals reflect immutable (species, sex, identity, genetic diversity, immunity and kinship) and transient (health, social status, reproductive state and breeding history) traits, consistent with socio-reproductive functions. Sex reversals in glandular elaboration, marking rates or chemical richness in female-dominant species implicate sexual selection of olfactory ornaments in both sexes. Whereas some compounds may be endogenously produced and modified (e.g. via hormones), microbial analyses of different odorants support the fermentation hypothesis of bacterial contribution. The intimate contexts of information transfer and varied functions provide important parallels applicable to olfactory communication in humans. This article is part of the Theo Murphy meeting issue ‘Olfactory communication in humans’.