Title: A Deterministic Self-Organizing Map Approach and its Application on Satellite Data based Cloud Type Classification
A self-organizing map (SOM) is a type of competitive artificial neural network that projects the high-dimensional input space of the training samples into a low-dimensional space while preserving topological relations. This makes SOMs well suited to organizing and visualizing complex data sets, and they have been widely used across numerous disciplines. Despite its wide application, the self-organizing map is hampered by its inherent randomness: it produces dissimilar SOM patterns even when trained on identical training samples with the same parameters. This raises usability concerns for domain practitioners and discourages potential users from exploring SOM-based applications more broadly. Motivated by this practical concern, we propose a deterministic approach as a supplement to the standard self-organizing map. Consistent with the theoretical design, the experimental results with satellite cloud data demonstrate the effective and efficient organization and simplification capabilities of the proposed approach.
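The randomness concern described in the abstract can be illustrated with a minimal sketch. It is not the authors' method, which the abstract does not specify. It assumes a tiny 1-D SOM written with the Python standard library only, and the function name `train_som`, the toy `data`, the decay schedules, and the "evenly spaced initial weights, fixed sample order" variant are all hypothetical choices made for illustration.

```python
import math
import random


def train_som(samples, n_nodes=4, epochs=20, seed=None):
    """Train a tiny 1-D SOM (a chain of n_nodes >= 2 units) on equal-length vectors.

    seed=None -> illustrative deterministic variant: weights start evenly spaced
                 between the per-feature min and max, samples keep their order.
    seed=int  -> standard variant: random initial weights, shuffled sample order.
    """
    dim = len(samples[0])
    lo = [min(s[d] for s in samples) for d in range(dim)]
    hi = [max(s[d] for s in samples) for d in range(dim)]
    rng = random.Random(seed) if seed is not None else None

    if rng is None:
        # Deterministic init: spread the nodes evenly between the data extremes.
        weights = [[lo[d] + (hi[d] - lo[d]) * i / (n_nodes - 1) for d in range(dim)]
                   for i in range(n_nodes)]
    else:
        # Random init: this is one source of run-to-run variation.
        weights = [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
                   for _ in range(n_nodes)]

    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                          # decaying learning rate
        sigma = max(0.5, (n_nodes / 2) * (1.0 - frac))   # shrinking neighborhood
        order = list(range(len(samples)))
        if rng is not None:
            rng.shuffle(order)                           # second source of variation
        for idx in order:
            x = samples[idx]
            # Best-matching unit: the node whose weight vector is closest to x.
            bmu = min(range(n_nodes), key=lambda i: math.dist(weights[i], x))
            for i in range(n_nodes):
                # Gaussian neighborhood on the 1-D chain of nodes.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    weights[i][d] += lr * h * (x[d] - weights[i][d])
    return weights


# Toy 2-D data, made up for this example.
data = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8], [0.5, 0.5], [0.1, 0.9]]

run_a = train_som(data, seed=1)
run_b = train_som(data, seed=2)
det_1 = train_som(data)
det_2 = train_som(data)

print("runs with different seeds identical:", run_a == run_b)
print("deterministic runs identical:", det_1 == det_2)
```

In this sketch, the two sources of randomness are the initial weights and the order in which samples are presented. Removing both makes repeated training runs reproduce the same map, which is the usability property the abstract argues for.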
Award ID(s):
1730250
PAR ID:
10110748
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of 2018 IEEE International Conference on Big Data (BigData 2018)
Page Range / eLocation ID:
2027 to 2034
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Aims. We present a method for refining photometric redshift galaxy catalogs based on a comparison of their color-space matching with overlapping spectroscopic calibration data. We focus on cases where photometric redshifts (photo-z) are estimated empirically. Identifying galaxies that are poorly represented in spectroscopic data is crucial, as their photo-z may be unreliable due to extrapolation beyond the training sample. Methods. Our approach uses a self-organizing map (SOM) to project a multidimensional parameter space of magnitudes and colors onto a 2D manifold, allowing us to analyze the resulting patterns as a function of various galaxy properties. Using SOM, we compared the Kilo-Degree Survey’s bright galaxy sample (KiDS-Bright), limited to r < 20 mag, with various spectroscopic samples, including the Galaxy And Mass Assembly (GAMA). Results. Our analysis reveals that GAMA tends to underrepresent KiDS-Bright at its faintest (r ≳ 19.5) and highest-redshift (z ≳ 0.4) ranges; however, no strong trends are seen in terms of color or stellar mass. By incorporating additional spectroscopic data from the SDSS, 2dF, and early DESI, we identified SOM cells where the photo-z values are estimated suboptimally. We derived a set of SOM-based criteria to refine the photometric sample and improve photo-z statistics. For the KiDS-Bright sample, this improvement is modest: excluding the least represented 20% of the sample reduces photo-z scatter by less than 10%. Conclusions. We conclude that GAMA, used for KiDS-Bright photo-z training, is sufficiently representative for reliable redshift estimation across most of the color space. Future spectroscopic data from surveys such as DESI should be better suited for exploiting the full improvement potential of our method.
  2. We extend the self-organizing mapping algorithm to the problem of visualizing data on Grassmann manifolds. In this setting, a collection of k points in n dimensions is represented by a k-dimensional subspace, e.g., via the singular value or QR decompositions. Data assembled in this way is challenging to visualize, given that abstract points on the Grassmannian do not reside in Euclidean space. The extension of the SOM algorithm to this geometric setting only requires that distances between two points can be measured and that any given point can be moved towards a presented pattern. The similarity between two points on the Grassmannian is measured in terms of the principal angles between subspaces, e.g., the chordal distance. Further, we employ a formula for moving one subspace towards another along the shortest path, i.e., the geodesic between two points on the Grassmannian. This enables a faithful implementation of the SOM approach for visualizing data consisting of k-dimensional subspaces of n-dimensional Euclidean space. We illustrate the resulting algorithm on a hyperspectral imaging application.
  3. Abstract Biofilm formation is a major cause of hospital‐acquired infections. Research into biofilm‐resistant materials is therefore critical to reduce the frequency of these events. Polymer microarrays offer a high‐throughput approach to enable the efficient discovery of novel biofilm‐resistant polymers. Herein, bacterial attachment and surface chemistry are studied for a polymer microarray to improve the understanding of Pseudomonas aeruginosa biofilm formation on a diverse set of polymeric surfaces. The relationships between time‐of‐flight secondary ion mass spectrometry (ToF‐SIMS) data and biofilm formation are analyzed using linear multivariate analysis (partial least squares [PLS] regression) and a nonlinear self‐organizing map (SOM). The SOM models revealed several combinations of fragment ions that are positively or negatively associated with bacterial biofilm formation, which are not identified by PLS. With these insights, a second PLS model is calculated, in which interactions between key fragments (identified by the SOM) are explicitly considered. Inclusion of these terms improved the PLS model performance and shows that, without such terms, certain key fragment ions correlated with bacterial attachment may not be identified. The chemical insights provided by the combination of PLS regression and SOM will be useful for the design of materials that support negligible pathogen attachment.
  4. Abstract Stable isotope‐based reconstructions of past ocean salinity and hydroclimate depend on accurate, regionally constrained relationships between the stable oxygen isotopic composition of seawater (δ18Osw) and salinity in the surface ocean. An increasing number of δ18Osw observations suggest greater spatial variability in this relationship than previously considered, highlighting the need to reassess these relationships on a global scale. Here, we use available, paired δ18Osw and salinity data (N = 11,119) to create global interpolations of each variable. We then use a self‐organizing map, a specialized form of machine learning, to define 19 regions with unique δ18Osw‐salinity relationships in the surface (<50 m) ocean. Inclusion of atmospheric moisture‐related variables and oceanic tracer data in additional self‐organizing map experiments indicates global surface δ18Osw‐salinity spatial patterns are strongly forced by the atmosphere, as the SOM spatial output is highly similar despite no overlapping input data. Our approach is a useful update to the previously delimited regions, and highlights the utility of neural network pattern extraction in spatiotemporally sparse data sets.
  5. Abstract. The Great Plains and southwest regions of the US are highly vulnerable to precipitation-related climate disasters such as droughts and floods. In this study, we propose a self-organizing map–analogue (SOMA) approach to empirically quantify the contribution of atmospheric moist circulation (mid-tropospheric geopotential and column moisture transport) to the regional precipitation anomalies, variability, and multi-decadal changes. Our results indicate that moist circulation contributes significantly to short-term precipitation variability, accounting for 54 %–61 % of the total variance in these regions, though these contributions vary significantly across seasons. As indicated in previous research, Pacific Decadal Oscillation (PDO) is one of the major climate modes influencing the long-term multi-decadal variation in precipitation. By contrasting three multi-decadal periods (1950–1976, 1977–1998, 1999–2021) with shifting PDO phases and linking the phase shift to self-organizing map (SOM) nodes, we found that circulation changes contribute considerably to the multi-decadal changes in precipitation anomaly in terms of the mean and days of dry and wet extremes, especially for the southern Great Plains (GP) and southwest. However, these circulation-induced changes are not totally related to the PDO phase shift (mostly less than half) since internal variability or anthropogenically induced changes in circulation can also be potential contributors. Our approach improves upon flow analogue and SOM-based methods and provides insights into the contribution of atmospheric circulation to regional precipitation anomalies and variability. 