

Title: HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters
Projection algorithms such as t-SNE or UMAP are useful for the visualization of high dimensional data, but depend on hyperparameters which must be tuned carefully. Unfortunately, iteratively recomputing projections to find the optimal hyperparameter values is computationally intensive and unintuitive due to the stochastic nature of such methods. In this paper we propose HyperNP, a scalable method that allows for real-time interactive hyperparameter exploration of projection methods by training neural network approximations. A HyperNP model can be trained on a fraction of the total data instances and hyperparameter configurations that one would like to investigate and can compute projections for new data and hyperparameters at interactive speeds. HyperNP models are compact in size and fast to compute, thus allowing them to be embedded in lightweight visualization systems. We evaluate the performance of HyperNP across three datasets in terms of performance and speed. The results suggest that HyperNP models are accurate, scalable, interactive, and appropriate for use in real-world settings.
Award ID(s):
2118201
NSF-PAR ID:
10339649
Journal Name:
Eurographics Conference on Visualization (EuroVis)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Projection algorithms such as t‐SNE or UMAP are useful for the visualization of high dimensional data, but depend on hyperparameters which must be tuned carefully. Unfortunately, iteratively recomputing projections to find the optimal hyperparameter values is computationally intensive and unintuitive due to the stochastic nature of such methods. In this paper we propose HyperNP, a scalable method that allows for real‐time interactive hyperparameter exploration of projection methods by training neural network approximations. A HyperNP model can be trained on a fraction of the total data instances and hyperparameter configurations that one would like to investigate and can compute projections for new data and hyperparameters at interactive speeds. HyperNP models are compact in size and fast to compute, thus allowing them to be embedded in lightweight visualization systems. We evaluate the performance of HyperNP across three datasets in terms of performance and speed. The results suggest that HyperNP models are accurate, scalable, interactive, and appropriate for use in real‐world settings.

     
  2. We present FastRP, a scalable and performant algorithm for learning distributed node representations in a graph. FastRP is over 4,000 times faster than state-of-the-art methods such as DeepWalk and node2vec, while achieving comparable or even better performance as evaluated on several real-world networks on various downstream tasks. We observe that most network embedding methods consist of two components: construct a node similarity matrix and then apply dimension reduction techniques to this matrix. We show that the success of these methods should be attributed to the proper construction of this similarity matrix, rather than the dimension reduction method employed. FastRP is proposed as a scalable algorithm for network embeddings. Two key features of FastRP are: 1) it explicitly constructs a node similarity matrix that captures transitive relationships in a graph and normalizes matrix entries based on node degrees; 2) it utilizes very sparse random projection, which is a scalable optimization-free method for dimension reduction. An extra benefit from combining these two design choices is that it allows the iterative computation of node embeddings so that the similarity matrix need not be explicitly constructed, which further speeds up FastRP. FastRP is also advantageous for its ease of implementation, parallelization and hyperparameter tuning. The source code is available at https://github.com/GTmac/FastRP.
  3. Abstract

    The rapid expansion of Earth system model (ESM) data available from the Coupled Model Intercomparison Project Phase 6 (CMIP6) necessitates new methods to evaluate the performance and suitability of ESMs used for hydroclimate applications as these extremely large data volumes complicate stakeholder efforts to use new ESM outputs in updated climate vulnerability and impact assessments. We develop an analysis framework to inform ESM sub‐selection based on process‐oriented considerations and demonstrate its performance for a regional application in the US Pacific Northwest. First, a suite of global and regional metrics is calculated, using multiple historical observation datasets to assess ESM performance. These metrics are then used to rank CMIP6 models, and a culled ensemble of models is selected using a trend‐related diagnostics approach. This culling strategy does not dramatically change climate scenario trend projections in this region, despite retaining only 20% of the CMIP6 ESMs in the final model ensemble. The reliability of the culled trend projection envelope and model response similarity is also assessed using a perfect model framework. The absolute difference in temperature trend projections is reduced relative to the full ensemble compared to the model for each SSP scenario, while precipitation trend errors are largely unaffected. In addition, we find that the spread of the culled ensemble temperature and precipitation trends includes the trend of the “truth” model ∼83%‐92% of the time. This analysis demonstrates a reliable method to reduce ESM ensemble size that can ease use of ESMs for creating and understanding climate vulnerability and impact assessments.

     
  4. Modern Internet of Things (IoT) applications, from contextual sensing to voice assistants, rely on ML-based training and serving systems using pre-trained models to render predictions. However, real-world IoT environments are diverse, with rich IoT sensors, and need ML models to be personalized for each setting using relatively less training data. Most existing general-purpose ML systems are optimized for specific and dedicated hardware resources and do not adapt to changing resources and different IoT application requirements. To address this gap, we propose MLIoT, an end-to-end Machine Learning System tailored towards supporting the entire lifecycle of IoT applications. MLIoT adapts to different IoT data sources, IoT tasks, and compute resources by automatically training, optimizing, and serving models based on expressive application-specific policies. MLIoT also adapts to changes in IoT environments or compute resources by enabling re-training and updating models served on the fly while maintaining accuracy and performance. Our evaluation across a set of benchmarks shows that MLIoT can handle multiple IoT tasks, each with individual requirements, in a scalable manner while maintaining high accuracy and performance. We compare MLIoT with two state-of-the-art hand-tuned systems and a commercial ML system, showing that MLIoT improves accuracy from 50% - 75% while reducing or maintaining latency.
  5. Variable binding is a cornerstone of symbolic reasoning and cognition. But how binding can be implemented in connectionist models has puzzled neuroscientists, cognitive psychologists, and neural network researchers for many decades. One type of connectionist model that naturally includes a binding operation is vector symbolic architectures (VSAs). In contrast to other proposals for variable binding, the binding operation in VSAs is dimensionality-preserving, which enables representing complex hierarchical data structures, such as trees, while avoiding a combinatoric expansion of dimensionality. Classical VSAs encode symbols by dense randomized vectors, in which information is distributed throughout the entire neuron population. By contrast, in the brain, features are encoded more locally, by the activity of single neurons or small groups of neurons, often forming sparse vectors of neural activation. Following Laiho et al. (2015), we explore symbolic reasoning with a special case of sparse distributed representations. Using techniques from compressed sensing, we first show that variable binding in classical VSAs is mathematically equivalent to tensor product binding between sparse feature vectors, another well-known binding operation which increases dimensionality. This theoretical result motivates us to study two dimensionality-preserving binding methods that include a reduction of the tensor matrix into a single sparse vector. One binding method for general sparse vectors uses random projections, the other, block-local circular convolution, is defined for sparse vectors with block structure, sparse block-codes. Our experiments reveal that block-local circular convolution binding has ideal properties, whereas random projection based binding also works, but is lossy. We demonstrate in example applications that a VSA with block-local circular convolution and sparse block-codes reaches similar performance as classical VSAs. Finally, we discuss our results in the context of neuroscience and neural networks.