Title: Cluster Appearance Glyphs: A Methodology for Illustrating High-Dimensional Data Patterns in 2-D Data Layouts
Two-dimensional space embeddings such as Multi-Dimensional Scaling (MDS) are a popular means of gaining insight into high-dimensional data relationships. However, in all but the simplest cases these embeddings suffer from significant distortions, which can lead to misinterpretations of the high-dimensional data. These distortions occur at both the global inter-cluster level and the local intra-cluster level. The former leads to misinterpretation of the distances between the various N-D cluster populations, while the latter hampers the appreciation of their individual shapes and composition, which we call cluster appearance. The distortion of cluster appearance incurred in the 2-D embedding is unavoidable, since such low-dimensional embeddings always come at the loss of some of the intra-cluster variance. In this paper, we propose techniques to overcome these limitations by conveying the N-D cluster appearance via a framework inspired by illustrative design. Here we make use of Scagnostics, which offers a set of intuitive feature descriptors to describe the appearance of 2-D scatterplots. We extend the Scagnostics analysis to N-D and then devise and test, via crowd-sourced user studies, a set of parameterizable texture patterns that map to the various Scagnostics descriptors. Finally, we embed these N-D Scagnostics-informed texture patterns into shapes derived from N-D statistics to yield what we call Cluster Appearance Glyphs. We demonstrate our framework with a dataset acquired to analyze program execution times in file systems.
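The abstract's point that a 2-D embedding necessarily loses intra-cluster variance can be illustrated with a small sketch. This is not the paper's method and it is not MDS: it assumes a synthetic, isotropic Gaussian cluster in 6 dimensions and uses a plain orthogonal projection onto two coordinate axes as a stand-in for a 2-D layout. The helper names (`total_variance`, `project`, `retained_fraction`) are hypothetical.

```python
import random


def total_variance(points):
    """Sum of the per-axis variances (the trace of the covariance matrix)."""
    n = len(points)
    dims = len(points[0])
    total = 0.0
    for d in range(dims):
        column = [p[d] for p in points]
        mean = sum(column) / n
        total += sum((v - mean) ** 2 for v in column) / n
    return total


def project(points, axes):
    """Orthogonal projection onto the chosen coordinate axes."""
    return [[p[a] for a in axes] for p in points]


def retained_fraction(points, axes=(0, 1)):
    """Share of the cluster's total variance that survives the projection."""
    return total_variance(project(points, axes)) / total_variance(points)


# Assumption: one synthetic cluster, 6-D, unit variance along every axis.
rng = random.Random(0)
cluster = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(2000)]

frac = retained_fraction(cluster)
print(f"fraction of intra-cluster variance kept in 2-D: {frac:.2f}")
```

For a cluster that spreads equally along all six axes, any two-axis projection keeps roughly a third of the variance, so most of the cluster's shape is not visible in the 2-D view. Real embeddings choose their directions more carefully, but they cannot show variance that lies outside the two dimensions they draw.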
Award ID(s): 1650499
NSF-PAR ID: 10317224
Journal Name: Information
Volume: 13
Issue: 1
ISSN: 2078-2489
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Recently, a new research field of quasi-one-dimensional (1D) van der Waals quantum materials has emerged from earlier work on low-dimensional systems [1-2]. The quasi-1D van der Waals materials have 1D motifs in their crystal structure [1]. Many of these materials reveal strongly correlated phenomena such as charge density waves (CDW) [1-2]. The CDW phase is a periodic modulation of the electronic charge density, accompanied by distortions in the underlying crystal lattice. Potential uses for CDW materials include memory storage and oscillators [3]. Raman spectroscopy can identify the CDW transitions to different phases via the appearance of phonon peaks due to emerging superstructure or the disappearance of certain peaks due to the loss of translation symmetry in the crystal lattice [3]. In this presentation, we report the results of the angle- and temperature-dependent Raman scattering spectroscopy investigation of the mechanically exfoliated nanowires of the quasi-1D Nb van der Waals material. It is known that Nb forms in a tetragonal crystal structure with space group 124 (P4/mcc). Recently, this material attracted attention as a CDW material with multiple phase transitions, some of them, possibly, near room temperature. Little information is known on the Raman characteristics of this material. Our Raman data for different polarization angles show strong anisotropy in the response depending on the crystal direction. The most pronounced Raman peaks reveal strong temperature dependence. The results of the measurements will be compared with the theoretical predictions. Our data are important for further investigation of this quasi-1D CDW material for possible applications in phase-change memory and reconfigurable devices.
    A.A.B. acknowledges the support of the Vannevar Bush Faculty Fellowship (VBFF) from the Office of Naval Research (ONR) contract N00014-21-1-2947 “One-Dimensional Quantum Materials” and the National Science Foundation (NSF) program Designing Materials to Revolutionize and Engineer our Future (DMREF) via a project DMR-1921958 “Data-Driven Discovery of Synthesis Pathways and Distinguishing Electronic Phenomena of 1D van der Waals Bonded Solids”. A. D. and S. K. acknowledge support through the Materials Genome Initiative funding allocated to the National Institute of Standards and Technology.
    [1] A. A. Balandin, F. Kargar, T. T. Salguero, and R. Lake, “One-dimensional van der Waals quantum materials,” Mater. Today, 55, 74 (2022).
    [2] A. A. Balandin, R. K. Lake, and T. T. Salguero, “One-dimensional van der Waals materials - Advent of a new research field,” Appl. Phys. Lett., 121, 040401 (2022).
    [3] A. A. Balandin, S. V. Zaitsev-Zotov, and G. Grüner, “Charge-density-wave quantum materials and devices—New developments and future prospects,” Appl. Phys. Lett., 119, 170401 (2021).
    [4] R. Samnakay, et al., “Zone-folded phonons and the charge-density-wave transition in 1T-TaSe2 thin films,” Nano Lett., 15, 2965 (2015).
  3. Abstract Background

    In Alzheimer’s Disease (AD) research, multimodal imaging analysis can unveil complementary information from multiple imaging modalities and further our understanding of the disease. One application is to discover disease subtypes using unsupervised clustering. However, existing clustering methods are often applied to input features directly and can suffer from the curse of dimensionality with high-dimensional multimodal data. The purpose of our study is to identify multimodal imaging-driven subtypes in Mild Cognitive Impairment (MCI) participants using a multiview learning framework based on Deep Generalized Canonical Correlation Analysis (DGCCA), which learns a shared low-dimensional latent representation from 3 neuroimaging modalities.

    Results

    DGCCA applies non-linear transformations to the input views using neural networks and is able to learn correlated low-dimensional embeddings that capture more variance than its linear counterpart, generalized CCA (GCCA). We designed experiments to compare DGCCA embeddings with single-modality features and GCCA embeddings by generating 2 subtypes from each feature set using unsupervised clustering. In our validation studies, we found that amyloid PET imaging has the most discriminative features compared with structural MRI and FDG PET, and that DGCCA, unlike GCCA, learns from these features. DGCCA subtypes show differential measures in 5 cognitive assessments, 6 brain volume measures, and patterns of conversion to AD. In addition, DGCCA MCI subtypes confirmed AD genetic markers with strong signals that the existing late MCI group did not identify.

    Conclusion

    Overall, DGCCA is able to learn effective low-dimensional embeddings from multimodal data by learning non-linear projections. MCI subtypes generated from DGCCA embeddings differ from the existing early and late MCI groups and are most similar to those identified by amyloid PET features. In our validation studies, DGCCA subtypes show distinct patterns in cognitive measures and brain volumes, and are able to identify AD genetic markers. These findings indicate the promise of the imaging-driven subtypes and their power in revealing disease structures beyond early and late stage MCI.

     
  4. Abstract Given a Banach space $X$ and a real number $\alpha \ge 1$, we write: (1) $D(X) \le \alpha$ if, for any locally finite metric space $A$, all finite subsets of which admit bilipschitz embeddings into $X$ with distortions $\le C$, the space $A$ itself admits a bilipschitz embedding into $X$ with distortion $\le \alpha \cdot C$; (2) $D(X) = \alpha^{+}$ if, for every $\epsilon > 0$, the condition $D(X) \le \alpha + \epsilon$ holds, while $D(X) \le \alpha$ does not; (3) $D(X) \le \alpha^{+}$ if $D(X) = \alpha^{+}$ or $D(X) \le \alpha$. It is known that $D(X)$ is bounded by a universal constant, but the available estimates for this constant are rather large. The following results have been proved in this work: (1) $D\big((\oplus_{n=1}^{\infty} X_n)_p\big) \le 1^{+}$ for every nested family of finite-dimensional Banach spaces $\{X_n\}_{n=1}^{\infty}$ and every $1 \le p \le \infty$. (2) $D\big((\oplus_{n=1}^{\infty} \ell_\infty^n)_p\big) = 1^{+}$ for $1 < p < \infty$. (3) $D(X) \le 4^{+}$ for every Banach space $X$ with no nontrivial cotype. Statement (3) is a strengthening of the Baudier–Lancien result (2008).
  5. Abstract The t-distributed stochastic neighbor embedding (t-SNE) method is one of the leading techniques for data visualization and clustering. This method finds a lower-dimensional embedding of data points while minimizing distortions in distances between neighboring data points. By construction, t-SNE discards information about the large-scale structure of the data. We show that adding a global cost function to the t-SNE cost function makes it possible to cluster the data while preserving global intercluster data structure. We test the new global t-SNE (g-SNE) method on one synthetic data set and two real data sets, one on flower shapes and one on human brain cells. We find that significant and meaningful global structure exists in both the plant and human brain data sets. In all cases, g-SNE outperforms t-SNE and UMAP in preserving the global structure. Topological analysis of the clustering result makes it possible to find an appropriate trade-off of data distribution across scales. We find differences in how data are distributed across scales between the two subjects that were part of the human brain data set. Thus, by striving to produce both accurate clustering and positioning between clusters, the g-SNE method can identify new aspects of data organization across scales.
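The idea in the last related abstract, adding a global cost term to a neighbor-preserving cost, can be sketched as follows. This is a simplified illustration rather than the g-SNE implementation: it assumes a single shared Gaussian bandwidth `sigma` instead of perplexity-calibrated neighborhoods, and it assumes a global affinity proportional to `1 + d^2` so that distant pairs dominate the second term. The weight `lam` and all function names are hypothetical.

```python
import math


def pairwise_sq_dists(points):
    """Squared Euclidean distance for every ordered pair (i, j) with i != j."""
    n = len(points)
    return {
        (i, j): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        for i in range(n)
        for j in range(n)
        if i != j
    }


def normalize(weights):
    """Scale non-negative pair weights so that they sum to one."""
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def kl_divergence(p, q):
    """KL(p || q) over pairs; terms with p == 0 contribute nothing."""
    return sum(pv * math.log(pv / q[pair]) for pair, pv in p.items() if pv > 0.0)


def local_kl(high, low, sigma=1.0):
    """Neighbor-focused term: Gaussian affinities in N-D, Student-t in 2-D."""
    p = normalize({k: math.exp(-d / (2.0 * sigma ** 2))
                   for k, d in pairwise_sq_dists(high).items()})
    q = normalize({k: 1.0 / (1.0 + d)
                   for k, d in pairwise_sq_dists(low).items()})
    return kl_divergence(p, q)


def global_kl(high, low):
    """Assumed global term: affinities grow with distance in both spaces."""
    p_hat = normalize({k: 1.0 + d for k, d in pairwise_sq_dists(high).items()})
    q_hat = normalize({k: 1.0 + d for k, d in pairwise_sq_dists(low).items()})
    return kl_divergence(p_hat, q_hat)


def combined_cost(high, low, lam=1.0):
    """Local cost plus lam times the global cost."""
    return local_kl(high, low) + lam * global_kl(high, low)


# Two small clusters in 3-D that sit far apart along the first axis.
high = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0], [11.0, 0.0, 0.0]]
# One 2-D layout keeps the gap between clusters; the other squeezes it.
faithful = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
squeezed = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]

for name, layout in (("faithful", faithful), ("squeezed", squeezed)):
    print(name, "global term:", global_kl(high, layout),
          "combined:", combined_cost(high, layout, lam=1.0))
```

In this toy setup the global term is zero for the layout that preserves the inter-cluster gap and positive for the squeezed one, which is the kind of penalty a global cost adds on top of a purely local objective.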