Title: Characterizing sediment sources by non-negative matrix factorization of detrital geochronological data
This paper explores an inverse approach to the problem of characterizing the age distributions of sediment sources (“source” samples) based on samples from a particular depocenter (“sink” samples) using non-negative matrix factorization (NMF). It also outlines a method to determine the optimal number of sources to factorize from a set of sink samples (i.e., the optimum factorization rank). We demonstrate the power of this method by generating sink samples as random mixtures of known sources, factorizing them, and recovering the number of known sources, their age distributions, and the weighting functions used to generate the sink samples. Sensitivity testing indicates that similarity between factorized and known sources is positively correlated with 1) the number of sink samples, 2) the dissimilarity among sink samples, and 3) sink sample size. Specifically, the algorithm yields consistent, close similarity between factorized and known sources when the number of sink samples is more than ∼3 times the number of source samples, sink data sets are internally dissimilar (cross-correlation coefficient range >0.3, Kuiper V value range >0.35), and sink samples are well-characterized (>150–225 data points). However, similarity between known and factorized sources can be maintained while decreasing some of these variables if other variables are increased. Factorization of three empirical detrital zircon U–Pb data sets from the Book Cliffs, the Grand Canyon, and the Gulf of Mexico yields plausible source age distributions and weights. Factorization of the Book Cliffs data set yields five sources very similar to those recently independently proposed as the primary sources for Book Cliffs strata, confirming the utility of the NMF approach. The Grand Canyon data set exemplifies two general considerations when applying the NMF algorithm.
First, although the NMF algorithm is able to identify source age distributions, additional geological details are required to discriminate between primary and recycled sources. Second, the NMF algorithm will identify the most basic elements of the mixed sink samples and so may subdivide sources that are themselves heterogeneous mixtures of more basic elements into those basic elements. Finally, application to a large Gulf of Mexico data set highlights the increased contribution from Appalachian sources during Cretaceous and Holocene time, potentially attributable to drainage reorganization. Although the algorithm reproduces known sources and yields reasonable sources for empirical data sets, inversions are inherently non-unique. Consequently, the results of NMF and their interpretations should be evaluated in light of independent geological evidence. The NMF algorithm is provided both as MATLAB code and a stand-alone graphical user interface for Windows and macOS (.exe and .app) along with all data sets discussed in this contribution.
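The synthetic test described in the abstract (mix known sources into sink samples, factorize, and recover sources and weights) can be illustrated with a small sketch. This is not the authors' MATLAB implementation. It assumes age distributions discretized into a handful of bins, uses the classic Lee and Seung multiplicative-update rule as a stand-in solver, and all names (`nmf`, `known_sources`, `sinks`, and so on) are hypothetical.

```python
import random

def matmul(A, B):
    """Plain-Python product of two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius norm of the residual V - W H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

def nmf(V, rank, n_iter=500, seed=0, eps=1e-12):
    """Factorize V (sinks x age bins) into W (sinks x rank) and H (rank x age bins)
    with multiplicative updates; every entry stays non-negative."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(rh, ra, rd)]
             for rh, ra, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(rw, ra, rd)]
             for rw, ra, rd in zip(W, num, den)]
    return W, H

# Two made-up "known" source age distributions over six age bins (assumed values).
known_sources = [
    [0.6, 0.3, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.2, 0.3, 0.4],
]

# Eight sink samples generated as random mixtures of the known sources.
rng = random.Random(42)
true_weights = []
for _ in range(8):
    a = rng.random()
    true_weights.append([a, 1.0 - a])
sinks = matmul(true_weights, known_sources)

W, H = nmf(sinks, rank=2)

# Rescale so each factorized source sums to 1; the product W H is unchanged.
scales = [sum(row) for row in H]
H_norm = [[h / s for h in row] for row, s in zip(H, scales)]
W_norm = [[w * s for w, s in zip(row, scales)] for row in W]

print("residual:", frob_error(sinks, W, H))
```

The factorized sources in `H_norm` may come out in a different order than `known_sources`, so any comparison has to match them up first. Consistent with the abstract's caution that inversions are non-unique, a different `seed` can yield a different but equally well-fitting factorization.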
Award ID(s): 1742952
PAR ID: 10092509
Author(s) / Creator(s):
Date Published:
Journal Name: Earth and Planetary Science Letters
Volume: 512
Issue: 15
ISSN: 0012-821X
Page Range / eLocation ID: 46-58
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1.
    Abstract Orthoquartzite detrital source regions in the Cordilleran interior yield clast populations with distinct spectra of paleomagnetic inclinations and detrital zircon ages that can be used to trace the provenance of gravels deposited along the western margin of the Cordilleran orogen. An inventory of characteristic remanent magnetizations (CRMs) from >700 sample cores from orthoquartzite source regions defines a low-inclination population of Neoproterozoic–Paleozoic age in the Mojave Desert–Death Valley region (and in correlative strata in Sonora, Mexico) and a moderate- to high-inclination population in the 1.1 Ga Shinumo Formation in eastern Grand Canyon. Detrital zircon ages can be used to distinguish Paleoproterozoic to mid-Mesoproterozoic (1.84–1.20 Ga) clasts derived from the central Arizona highlands region from clasts derived from younger sources that contain late Mesoproterozoic zircons (1.20–1.00 Ga). Characteristic paleomagnetic magnetizations were measured in 44 densely cemented orthoquartzite clasts, sampled from lower Miocene portions of the Sespe Formation in the Santa Monica and Santa Ana mountains and from a middle Eocene section in Simi Valley. Miocene Sespe clast inclinations define a bimodal population with modes near 15° and 45°. Eight samples from the steeper Miocene mode for which detrital zircon spectra were obtained all have spectra with peaks at 1.2, 1.4, and 1.7 Ga. One contains Paleozoic and Mesozoic peaks and is probably Jurassic. The remaining seven define a population of clasts with the distinctive combination of moderate to high inclination and a cosmopolitan age spectrum with abundant grains younger than 1.2 Ga. The moderate to high inclinations rule out a Mojave Desert–Death Valley or Sonoran region source population, and the cosmopolitan detrital zircon spectra rule out a central Arizona highlands source population.
The Shinumo Formation, presently exposed only within a few hundred meters elevation of the bottom of eastern Grand Canyon, thus remains the only plausible, known source for the moderate- to high-inclination clast population. If so, then the Upper Granite Gorge of the eastern Grand Canyon had been eroded to within a few hundred meters of its current depth by early Miocene time (ca. 20 Ma). Such an unroofing event in the eastern Grand Canyon region is independently confirmed by (U-Th)/He thermochronology. Inclusion of the eastern Grand Canyon region in the Sespe drainage system is also independently supported by detrital zircon age spectra of Sespe sandstones. Collectively, these data define a mid-Tertiary, SW-flowing “Arizona River” drainage system between the rapidly eroding eastern Grand Canyon region and coastal California. 
  2.
    Abstract The provocative hypothesis that the Shinumo Sandstone in the depths of Grand Canyon was the source for clasts of orthoquartzite in conglomerate of the Sespe Formation of coastal California, if verified, would indicate that a major river system flowed southwest from the Colorado Plateau to the Pacific Ocean prior to opening of the Gulf of California, and would imply that Grand Canyon had been carved to within a few hundred meters of its modern depth at the time of this drainage connection. The proposed Eocene Shinumo-Sespe connection, however, is supported by neither detrital zircon nor paleomagnetic-inclination data and is refuted by thermochronology that shows that the Shinumo Sandstone of eastern Grand Canyon was >60 °C (∼1.8 km deep) and hence not incised at this time. A proposed 20 Ma (Miocene) Shinumo-Sespe drainage connection based on clasts in the Sespe Formation is also refuted. We point out numerous caveats and non-unique interpretations of paleomagnetic data from clasts. Further, our detrital zircon analysis requires diverse sources for Sespe clasts, with better statistical matches for the four “most-Shinumo-like” Sespe clasts with quartzites of the Big Bear Group and Ontario Ridge metasedimentary succession of the Transverse Ranges, Horse Thief Springs Formation from Death Valley, and Troy Quartzite of central Arizona. Diverse thermochronologic and geologic data also refute a Miocene river pathway through western Grand Canyon and Grand Wash trough. Thus, Sespe clasts do not require a drainage connection from Grand Canyon or the Colorado Plateau and provide no constraints for the history of carving of Grand Canyon. Instead, abundant evidence refutes the “old” (70–17 Ma) Grand Canyon models and supports a <6 Ma Grand Canyon.
  3. Abstract. End-member mixing analysis (EMMA) is a method of interpreting stream water chemistry variations and is widely used for chemical hydrograph separation. It is based on the assumption that stream water is a conservative mixture of varying contributions from well-characterized source solutions (end-members). These end-members are typically identified by collecting samples of potential end-member source waters from within the watershed and comparing these to the observations. Here we introduce a complementary data-driven method (convex hull end-member mixing analysis – CHEMMA) to infer the end-member compositions and their associated uncertainties from the stream water observations alone. The method involves two steps. The first uses convex hull nonnegative matrix factorization (CH-NMF) to infer possible end-member compositions by searching for a simplex that optimally encloses the stream water observations. The second step uses constrained K-means clustering (COP-KMEANS) to classify the results from repeated applications of CH-NMF and analyzes the uncertainty associated with the algorithm. In an example application utilizing the 1986 to 1988 Panola Mountain Research Watershed dataset, CHEMMA is able to robustly reproduce the three field-measured end-members found in previous research using only the stream water chemical observations. CHEMMA also suggests that a fourth and a fifth end-member can be (less robustly) identified. Using both real and synthetic data, we examine uncertainties in end-member identification arising from the non-uniqueness of the CH-NMF solutions, which is related to the data structure, and from the number of samples. The results suggest that the mixing space can be identified robustly when the dataset includes samples that contain extremely small contributions of one end-member, i.e., samples containing extremely large contributions from one end-member are not necessary but do reduce uncertainty about the end-member composition.
  4. We study the top-k set similarity search problem using semantic overlap. While vanilla overlap requires exact matches between set elements, semantic overlap allows elements that are syntactically different but semantically related to increase the overlap. The semantic overlap is the maximum matching score of a bipartite graph, where an edge weight between two set elements is defined by a user-defined similarity function, e.g., cosine similarity between embeddings. Common techniques like token indexes fail for semantic search since similar elements may be unrelated at the character level. Further, verifying candidates is expensive (cubic versus linear for syntactic overlap), calling for highly selective filters. We propose Koios, the first exact and efficient algorithm for semantic overlap search. Koios leverages sophisticated filters to minimize the number of required graph-matching calculations. Our experiments show that for medium to large sets less than 5% of the candidate sets need verification, and more than half of those sets are further pruned without requiring the expensive graph matching. We show the efficiency of our algorithm on four real datasets and demonstrate the improved result quality of semantic over vanilla set similarity search. 
  5. Abstract We introduce a new method based on nonnegative matrix factorization, Neural NMF, for detecting latent hierarchical structure in data. Datasets with hierarchical structure arise in a wide variety of fields, such as document classification, image processing, and bioinformatics. Neural NMF recursively applies NMF in layers to discover overarching topics encompassing the lower-level features. We derive a backpropagation optimization scheme that allows us to frame hierarchical NMF as a neural network. We test Neural NMF on a synthetic hierarchical dataset, the 20 Newsgroups dataset, and the MyLymeData symptoms dataset. Numerical results demonstrate that Neural NMF outperforms other hierarchical NMF methods on these data sets and offers better learned hierarchical structure and interpretability of topics. 