Abstract Projection algorithms such as t‐SNE or UMAP are useful for the visualization of high dimensional data, but depend on hyperparameters which must be tuned carefully. Unfortunately, iteratively recomputing projections to find the optimal hyperparameter values is computationally intensive and unintuitive due to the stochastic nature of such methods. In this paper we propose HyperNP, a scalable method that allows for real‐time interactive hyperparameter exploration of projection methods by training neural network approximations. A HyperNP model can be trained on a fraction of the total data instances and hyperparameter configurations that one would like to investigate and can compute projections for new data and hyperparameters at interactive speeds. HyperNP models are compact in size and fast to compute, thus allowing them to be embedded in lightweight visualization systems. We evaluate the performance of HyperNP across three datasets in terms of performance and speed. The results suggest that HyperNP models are accurate, scalable, interactive, and appropriate for use in real‐world settings.
more »
« less
HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters
Projection algorithms such as t-SNE or UMAP are useful for the visualization of high dimensional data, but depend on hyperpa- rameters which must be tuned carefully. Unfortunately, iteratively recomputing projections to find the optimal hyperparameter values is computationally intensive and unintuitive due to the stochastic nature of such methods. In this paper we propose Hy- perNP, a scalable method that allows for real-time interactive hyperparameter exploration of projection methods by training neural network approximations. A HyperNP model can be trained on a fraction of the total data instances and hyperparameter configurations that one would like to investigate and can compute projections for new data and hyperparameters at interactive speeds. HyperNP models are compact in size and fast to compute, thus allowing them to be embedded in lightweight visualiza- tion systems. We evaluate the performance of HyperNP across three datasets in terms of performance and speed. The results suggest that HyperNP models are accurate, scalable, interactive, and appropriate for use in real-world settings.
more »
« less
- Award ID(s):
- 2118201
- PAR ID:
- 10339649
- Date Published:
- Journal Name:
- Eurographics Conference on Visualization (EuroVis)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)We present FastRP, a scalable and performant algorithm for learning distributed node representations in a graph. FastRP is over 4,000 times faster than state-of-the-art methods such as DeepWalk and node2vec, while achieving comparable or even better performance as evaluated on several real-world networks on various downstream tasks. We observe that most network embedding methods consist of two components: construct a node similarity matrix and then apply dimension reduction techniques to this matrix. We show that the success of these methods should be attributed to the proper construction of this similarity matrix, rather than the dimension reduction method employed. FastRP is proposed as a scalable algorithm for network embeddings. Two key features of FastRP are: 1) it explicitly constructs a node similarity matrix that captures transitive relationships in a graph and normalizes matrix entries based on node degrees; 2) it utilizes very sparse random projection, which is a scalable optimization-free method for dimension reduction. An extra benefit from combining these two design choices is that it allows the iterative computation of node embeddings so that the similarity matrix need not be explicitly constructed, which further speeds up FastRP. FastRP is also advantageous for its ease of implementation, parallelization and hyperparameter tuning. The source code is available at https://github.com/GTmac/FastRP.more » « less
-
The variability and biases in the real-world performance benchmarking of deep learning models for medical imaging compromise their trustworthiness for real-world deployment. The common approach of holding out a single fixed test set fails to quantify the variance in the estimation of test performance metrics. This study introduces NACHOS (Nested and Automated Cross-validation and Hyperparameter Optimization using Supercomputing) to reduce and quantify the variance of test performance metrics of deep learning models. NACHOS integrates Nested Cross-Validation (NCV) and Automated Hyperparameter Optimization (AHPO) within a parallelized high-performance computing (HPC) framework. NACHOS was demonstrated on a chest X-ray repository and an Optical Coherence Tomography (OCT) dataset under multiple data partitioning schemes. Beyond performance estimation, DACHOS (Deployment with Automated Cross-validation and Hyperparameter Optimization using Supercomputing) is introduced to leverage AHPO and cross-validation to build the final model on the full dataset, improving expected deployment performance. The findings underscore the importance of NCV in quantifying and reducing estimation variance, AHPO in optimizing hyperparameters consistently across test folds, and HPC in ensuring computational feasibility. By integrating these methodologies, NACHOS and DACHOS provide a scalable, reproducible, and trustworthy framework for DL model evaluation and deployment in medical imaging.more » « less
-
Fan, J; Pan, J. (Ed.)Testing whether the mean vector from some population is zero or not is a fundamental problem in statistics. In the high-dimensional regime, where the dimension of data p is greater than the sample size n, traditional methods such as Hotelling’s T2 test cannot be directly applied. One can project the high-dimensional vector onto a space of low dimension and then traditional methods can be applied. In this paper, we propose a projection test based on a new estimation of the optimal projection direction Σ^{−1}μ. Under the assumption that the optimal projection Σ^{−1}μ is sparse, we use a regularized quadratic programming with nonconvex penalty and linear constraint to estimate it. Simulation studies and real data analysis are conducted to examine the finite sample performance of different tests in terms of type I error and power.more » « less
-
null (Ed.)Applied macroeconomists often compute confidence intervals for impulse responses using local projections, that is, direct linear regressions of future outcomes on current covariates. This paper proves that local projection inference robustly handles two issues that commonly arise in applications: highly persistent data and the estimation of impulse responses at long horizons. We consider local projections that control for lags of the variables in the regression. We show that lag‐augmented local projections with normal critical values are asymptotically valid uniformly over (i) both stationary and non‐stationary data, and also over (ii) a wide range of response horizons. Moreover, lag augmentation obviates the need to correct standard errors for serial correlation in the regression residuals. Hence, local projection inference is arguably both simpler than previously thought and more robust than standard autoregressive inference, whose validity is known to depend sensitively on the persistence of the data and on the length of the horizon.more » « less
An official website of the United States government

