skip to main content

This content will become publicly available on June 13, 2023

Title: HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters
Projection algorithms such as t-SNE or UMAP are useful for the visualization of high dimensional data, but depend on hyperpa- rameters which must be tuned carefully. Unfortunately, iteratively recomputing projections to find the optimal hyperparameter values is computationally intensive and unintuitive due to the stochastic nature of such methods. In this paper we propose Hy- perNP, a scalable method that allows for real-time interactive hyperparameter exploration of projection methods by training neural network approximations. A HyperNP model can be trained on a fraction of the total data instances and hyperparameter configurations that one would like to investigate and can compute projections for new data and hyperparameters at interactive speeds. HyperNP models are compact in size and fast to compute, thus allowing them to be embedded in lightweight visualiza- tion systems. We evaluate the performance of HyperNP across three datasets in terms of performance and speed. The results suggest that HyperNP models are accurate, scalable, interactive, and appropriate for use in real-world settings.
Authors:
; ; ; ; ; ;
Award ID(s):
2118201
Publication Date:
NSF-PAR ID:
10339649
Journal Name:
Eurographics Conference on Visualization (EuroVis)
Sponsoring Org:
National Science Foundation
More Like this
  1. We present FastRP, a scalable and performant algorithm for learning distributed node representations in a graph. FastRP is over 4,000 times faster than state-of-the-art methods such as DeepWalk and node2vec, while achieving comparable or even better performance as evaluated on several real-world networks on various downstream tasks. We observe that most network embedding methods consist of two components: construct a node similarity matrix and then apply dimension reduction techniques to this matrix. We show that the success of these methods should be attributed to the proper construction of this similarity matrix, rather than the dimension reduction method employed. FastRP is proposed as a scalable algorithm for network embeddings. Two key features of FastRP are: 1) it explicitly constructs a node similarity matrix that captures transitive relationships in a graph and normalizes matrix entries based on node degrees; 2) it utilizes very sparse random projection, which is a scalable optimization-free method for dimension reduction. An extra benefit from combining these two design choices is that it allows the iterative computation of node embeddings so that the similarity matrix need not be explicitly constructed, which further speeds up FastRP. FastRP is also advantageous for its ease of implementation, parallelization and hyperparameter tuning.more »The source code is available at https://github.com/GTmac/FastRP.« less
  2. Modern Internet of Things (IoT) applications, from contextual sensing to voice assistants, rely on ML-based training and serving systems using pre-trained models to render predictions. However, real-world IoT environments are diverse, with rich IoT sensors and need ML models to be personalized for each setting using relatively less training data. Most existing general-purpose ML systems are optimized for specific and dedicated hardware resources and do not adapt to changing resources and different IoT application requirements. To address this gap, we propose MLIoT, an end-to-end Machine Learning System tailored towards supporting the entire lifecycle of IoT applications. MLIoT adapts to different IoT data sources, IoT tasks, and compute resources by automatically training, optimizing, and serving models based on expressive applicationspecific policies. MLIoT also adapts to changes in IoT environments or compute resources by enabling re-training, and updating models served on the fly while maintaining accuracy and performance. Our evaluation across a set of benchmarks show that MLIoT can handle multiple IoT tasks, each with individual requirements, in a scalable manner while maintaining high accuracy and performance. We compare MLIoT with two state-of-the-art hand-tuned systems and a commercial ML system showing that MLIoT improves accuracy from 50% - 75% while reducing ormore »maintaining latency.« less
  3. Variable binding is a cornerstone of symbolic reasoning and cognition. But how binding can be implemented in connectionist models has puzzled neuroscientists, cognitive psychologists, and neural network researchers for many decades. One type of connectionist model that naturally includes a binding operation is vector symbolic architectures (VSAs). In contrast to other proposals for variable binding, the binding operation in VSAs is dimensionality-preserving, which enables representing complex hierarchical data structures, such as trees, while avoiding a combinatoric expansion of dimensionality. Classical VSAs encode symbols by dense randomized vectors, in which information is distributed throughout the entire neuron population. By contrast, in the brain, features are encoded more locally, by the activity of single neurons or small groups of neurons, often forming sparse vectors of neural activation. Following Laiho et al. (2015), we explore symbolic reasoning with a special case of sparse distributed representations. Using techniques from compressed sensing, we first show that variable binding in classical VSAs is mathematically equivalent to tensor product binding between sparse feature vectors, another well-known binding operation which increases dimensionality. This theoretical result motivates us to study two dimensionality-preserving binding methods that include a reduction of the tensor matrix into a single sparse vector. One bindingmore »method for general sparse vectors uses random projections, the other, block-local circular convolution, is defined for sparse vectors with block structure, sparse block-codes. Our experiments reveal that block-local circular convolution binding has ideal properties, whereas random projection based binding also works, but is lossy. We demonstrate in example applications that a VSA with block-local circular convolution and sparse block-codes reaches similar performance as classical VSAs. Finally, we discuss our results in the context of neuroscience and neural networks.« less
  4. In statistical inference, the information-theoretic performance limits can often be expressed in terms of a statistical divergence between the underlying statistical models (e.g., in binary hypothesis testing, the error probability is related to the total variation distance between the statistical models). As the data dimension grows, computing the statistics involved in decision-making and the attendant performance limits (divergence measures) face complexity and stability challenges. Dimensionality reduction addresses these challenges at the expense of compromising the performance (the divergence reduces by the data-processing inequality). This paper considers linear dimensionality reduction such that the divergence between the models is maximally preserved. Specifically, this paper focuses on Gaussian models where we investigate discriminant analysis under five f-divergence measures (Kullback–Leibler, symmetrized Kullback–Leibler, Hellinger, total variation, and χ2). We characterize the optimal design of the linear transformation of the data onto a lower-dimensional subspace for zero-mean Gaussian models and employ numerical algorithms to find the design for general Gaussian models with non-zero means. There are two key observations for zero-mean Gaussian models. First, projections are not necessarily along the largest modes of the covariance matrix of the data, and, in some situations, they can even be along the smallest modes. Secondly, under specific regimes, themore »optimal design of subspace projection is identical under all the f-divergence measures considered, rendering a degree of universality to the design, independent of the inference problem of interest.« less
  5. Vehicle to Vehicle (V2V) communication allows vehicles to wirelessly exchange information on the surrounding environment and enables cooperative perception. It helps prevent accidents, increase the safety of the passengers, and improve the traffic flow efficiency. However, these benefits can only come when the vehicles can communicate with each other in a fast and reliable manner. Therefore, we investigated two areas to improve the communication quality of V2V: First, using beamforming to increase the bandwidth of V2V communication by establishing accurate and stable collaborative beam connection between vehicles on the road; second, ensuring scalable transmission to decrease the amount of data to be transmitted, thus reduce the bandwidth requirements needed for collaborative perception of autonomous driving vehicles. Beamforming in V2V communication can be achieved by utilizing image-based and LIDAR’s 3D data-based vehicle detection and tracking. For vehicle detection and tracking simulation, we tested the Single Shot Multibox Detector deep learning-based object detection method that can achieve a mean Average Precision of 0.837 and the Kalman filter for tracking. For scalable transmission, we simulate the effect of varying pixel resolutions as well as different image compression techniques on the file size of data. Results show that without compression, the file size formore »only transmitting the bounding boxes containing detected object is up to 10 times less than the original file size. Similar results are also observed when the file is compressed by lossless and lossy compression to varying degrees. Based on these findings using existing databases, the impact of these compression methods and methods of effectively combining feature maps on the performance of object detection and tracking models will be further tested in the real-world autonomous driving system.« less