This content will become publicly available on May 28, 2026

Title: DyESP: Accelerating Hyperparameter-Architecture Search via Dynamic Exploration and Space Pruning
In this work, we introduce DyESP, a novel approach that unites dynamic exploration with space pruning to expedite the joint search over hyperparameters and architectures, improving the efficiency and accuracy of hyperparameter-architecture search (HAS). Central to DyESP are two components: a meta-scheduler that customizes the search strategy for varying spaces, and a pruner that shrinks the hyperparameter space by discarding suboptimal configurations. The meta-scheduler leverages historical data to dynamically refine the search direction, targeting the most promising regions while minimizing unnecessary exploration. The pruner employs a surrogate model, specifically a fine-tuned multilayer perceptron (MLP), to predict and eliminate inferior configurations based on static metrics, thereby streamlining the search and conserving computational resources. The pruner's results, identifying which configurations underperform, are fed back into the meta-scheduler: they update its historical dataset, letting it adjust the exploration degree and refine the sampling strategy for subsequent iterations. This integration keeps the meta-scheduler continually supplied with relevant data, allowing more accurate and timely adjustments to the exploration strategy. Experiments show that DyESP outperforms existing methods in terms of both speed and stability on almost all benchmarks.
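Since the paper itself is embargoed until May 28, 2026, the following is only a rough sketch of the pruner-scheduler loop the abstract describes: a surrogate MLP scores candidate configurations from static features and discards the predicted-worst half before any real evaluation, and the surviving evaluations update the history that drives a sampler whose exploration shrinks over time. The feature encoding, synthetic objective, pruning fraction, and spread schedule are all illustrative assumptions, not DyESP's actual algorithm.

```python
# Illustrative sketch of a pruner/meta-scheduler loop (assumed details,
# not the authors' implementation).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def static_features(config):
    # Placeholder: encode a hyperparameter/architecture config as a vector.
    return np.asarray(config, dtype=float)

def evaluate(config):
    # Placeholder for a real training run; here a synthetic objective.
    x = static_features(config)
    return float(-np.sum((x - 0.5) ** 2) + rng.normal(scale=0.01))

history_X, history_y = [], []          # data the meta-scheduler consumes
surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000)

for step in range(5):
    # Meta-scheduler: sample more tightly around the best known region
    # as history accumulates (exploration degree shrinks over time).
    if history_y:
        center = history_X[int(np.argmax(history_y))]
        spread = 0.5 / (1 + step)
        candidates = [np.clip(center + rng.normal(scale=spread, size=4), 0, 1)
                      for _ in range(40)]
    else:
        candidates = [rng.uniform(0, 1, size=4) for _ in range(40)]

    # Pruner: the surrogate scores candidates from static features and
    # discards the predicted-worst half before any real evaluation.
    if history_y:
        surrogate.fit(np.vstack(history_X), np.asarray(history_y))
        preds = surrogate.predict(np.vstack([static_features(c) for c in candidates]))
        keep = np.argsort(preds)[len(candidates) // 2:]
        candidates = [candidates[i] for i in keep]

    # Evaluate survivors; their results update the scheduler's history.
    for c in candidates[:8]:
        history_X.append(static_features(c))
        history_y.append(evaluate(c))

print("best score found:", max(history_y))
```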
Award ID(s):
2416846 2417850
PAR ID:
10618777
Publisher / Repository:
Association for the Advancement of Artificial Intelligence (AAAI)
Date Published:
Journal Name:
Proceedings of the AAAI Symposium Series
Volume:
5
Issue:
1
ISSN:
2994-4317
Page Range / eLocation ID:
172 to 179
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Tuning hyperparameters is a crucial but arduous part of the machine learning pipeline. Hyperparameter optimization is even more challenging in federated learning, where models are learned over a distributed network of heterogeneous devices; here, the need to keep data on device and perform local training makes it difficult to efficiently train and evaluate configurations. In this work, we investigate the problem of federated hyperparameter tuning. We first identify key challenges and show how standard approaches may be adapted to form baselines for the federated setting. Then, by making a novel connection to the neural architecture search technique of weight-sharing, we introduce a new method, FedEx, to accelerate federated hyperparameter tuning that is applicable to widely-used federated optimization methods such as FedAvg and recent variants. Theoretically, we show that a FedEx variant correctly tunes the on-device learning rate in the setting of online convex optimization across devices. Empirically, we show that FedEx can outperform natural baselines for federated hyperparameter tuning by several percentage points on the Shakespeare, FEMNIST, and CIFAR-10 benchmarks, obtaining higher accuracy using the same training budget. 
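As a rough sketch of the weight-sharing connection behind FedEx (not its actual estimator), the server can maintain a distribution over candidate on-device hyperparameters, have clients sample from it each communication round, and reweight it with an exponentiated-gradient step on the reported validation losses. The candidate grid, step size, toy client objective, and the simple per-configuration loss average below are all illustrative assumptions; the real method also handles importance weighting and model aggregation alongside FedAvg.

```python
# Illustrative exponentiated-gradient update over candidate on-device
# learning rates (assumed details, not the FedEx estimator).
import numpy as np

rng = np.random.default_rng(1)
configs = [0.01, 0.03, 0.1, 0.3]              # candidate on-device learning rates
theta = np.ones(len(configs)) / len(configs)  # server's distribution over configs
eta = 1.0                                     # exponentiated-gradient step size

def client_val_loss(lr):
    # Placeholder for local training + validation on one device;
    # a synthetic bowl with its minimum near lr = 0.1.
    return (np.log10(lr) + 1.0) ** 2 + rng.normal(scale=0.05)

for rnd in range(50):                         # communication rounds, as in FedAvg
    sampled = rng.choice(len(configs), size=10, p=theta)   # 10 participating clients
    loss_sum = np.zeros(len(configs))
    counts = np.zeros(len(configs))
    for i in sampled:
        loss_sum[i] += client_val_loss(configs[i])
        counts[i] += 1
    # Simplified per-config loss estimate (0 for unsampled configs).
    mean_loss = np.where(counts > 0, loss_sum / np.maximum(counts, 1), 0.0)
    # Exponentiated-gradient reweighting toward low-loss configurations.
    theta = theta * np.exp(-eta * mean_loss)
    theta /= theta.sum()

print("most probable learning rate:", configs[int(np.argmax(theta))])
```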
  2. We introduce Ordalia, a novel approach for speeding up deep learning hyperparameter optimization search through early pruning of less promising configurations. Our method leverages empirical and theoretical results characterizing the shape of the generalization error curve for increasing training data size and number of epochs. We show that with relatively small computational resources one can estimate the dominant parameters of neural networks' learning curves, yielding evaluations of the learning process reliable enough to early-eliminate non-promising configurations. By iterating this process with increasing training resources, Ordalia rapidly converges to a small candidate set that includes many of the most promising configurations. We compare the performance of Ordalia with Hyperband, the state-of-the-art model-free hyperparameter optimization algorithm, and show that Ordalia consistently outperforms it on a variety of deep learning tasks. Ordalia's conservative use of computational resources and its ability to evaluate neural networks' learning progress lead to much better exploration and coverage of the search space, which ultimately produces superior neural network configurations.
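A minimal sketch of the learning-curve idea: fit a simple parametric error curve to cheap, small-data evaluations of each configuration, extrapolate to the full budget, and early-eliminate dominated configurations. The power-law form, the synthetic "observed" errors, and the elimination margin are illustrative assumptions, not Ordalia's exact model.

```python
# Illustrative learning-curve extrapolation for early pruning
# (assumed functional form, not the paper's model).
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)

def power_law(n, a, b, c):
    # Generalization error as a function of training-set size n.
    return a * n ** (-b) + c

# Synthetic "observed" validation errors for three configurations,
# measured at small training-set sizes only.
sizes = np.array([100, 200, 400, 800])
observed = {
    "cfg_A": power_law(sizes, 2.0, 0.5, 0.10) + rng.normal(0, 0.005, 4),
    "cfg_B": power_law(sizes, 1.5, 0.4, 0.25) + rng.normal(0, 0.005, 4),
    "cfg_C": power_law(sizes, 3.0, 0.6, 0.12) + rng.normal(0, 0.005, 4),
}

budget_n = 50_000   # full training-set size we want to predict for
predicted = {}
for name, errs in observed.items():
    # Estimate the dominant curve parameters from the cheap evaluations.
    (a, b, c), _ = curve_fit(power_law, sizes, errs,
                             p0=(1.0, 0.5, 0.1), maxfev=10_000)
    predicted[name] = power_law(budget_n, a, b, c)

# Early-eliminate configurations whose extrapolated error is clearly
# dominated; survivors would receive more training resources next round.
best = min(predicted.values())
survivors = [n for n, e in predicted.items() if e < best + 0.05]
print(predicted, "->", survivors)
```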
  3. The interaction and dimension of points are two important axes in designing point operators to serve hierarchical 3D models. Yet, these two axes are heterogeneous and challenging to explore fully. Existing works craft point operators under a single axis and reuse the crafted operator in all parts of 3D models. This overlooks the opportunity to better combine point interactions and dimensions by exploiting the varying geometry/density of 3D point clouds. In this work, we establish PIDS, a novel paradigm that jointly explores point interactions and point dimensions to serve semantic segmentation on point cloud data. We establish a large search space to jointly consider versatile point interactions and point dimensions, supporting point operators with various geometry/density considerations. The enlarged search space with heterogeneous search components calls for a better ranking of candidate models. To achieve this, we improve search space exploration by leveraging predictor-based Neural Architecture Search (NAS), and enhance the quality of prediction by assigning a unique encoding to each heterogeneous search component based on its priors. We thoroughly evaluate the networks crafted by PIDS on two semantic segmentation benchmarks, showing ~1% mIoU improvement on SemanticKITTI and S3DIS over state-of-the-art 3D models.
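A hedged sketch of the predictor-based ranking step: each heterogeneous component gets its own one-hot segment of the encoding so the predictor does not conflate, say, interaction type with channel width; a regressor is fit on a handful of trained architectures and then ranks the rest of the space. The component choices, encoding, predictor, and synthetic mIoU score below are illustrative assumptions, not the PIDS search space.

```python
# Illustrative predictor-based NAS with per-component encodings
# (assumed components and scores, not the PIDS search space).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)

INTERACTIONS = ["self_attn", "conv", "pool"]   # heterogeneous component 1
DIMENSIONS = [32, 64, 128]                     # heterogeneous component 2

def encode(arch):
    # Unique one-hot segment per heterogeneous component.
    inter, dim = arch
    v = np.zeros(len(INTERACTIONS) + len(DIMENSIONS))
    v[INTERACTIONS.index(inter)] = 1.0
    v[len(INTERACTIONS) + DIMENSIONS.index(dim)] = 1.0
    return v

def true_miou(arch):
    # Placeholder for an expensive segmentation training run.
    inter, dim = arch
    base = {"self_attn": 0.60, "conv": 0.58, "pool": 0.52}[inter]
    return base + 0.02 * np.log2(dim / 32) + rng.normal(0, 0.003)

space = [(i, d) for i in INTERACTIONS for d in DIMENSIONS]
trained = rng.choice(len(space), size=6, replace=False)   # small labeled set
X = np.array([encode(space[i]) for i in trained])
y = np.array([true_miou(space[i]) for i in trained])

# Fit the predictor on the labeled subset, then rank the whole space.
predictor = GradientBoostingRegressor().fit(X, y)
scores = predictor.predict(np.array([encode(a) for a in space]))
print("predicted-best architecture:", space[int(np.argmax(scores))])
```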
  4. Projection algorithms such as t-SNE or UMAP are useful for the visualization of high-dimensional data, but depend on hyperparameters which must be tuned carefully. Unfortunately, iteratively recomputing projections to find the optimal hyperparameter values is computationally intensive and unintuitive due to the stochastic nature of such methods. In this paper we propose HyperNP, a scalable method that allows real-time interactive hyperparameter exploration of projection methods by training neural network approximations. A HyperNP model can be trained on a fraction of the total data instances and hyperparameter configurations that one would like to investigate, and can compute projections for new data and hyperparameters at interactive speeds. HyperNP models are compact in size and fast to compute, allowing them to be embedded in lightweight visualization systems. We evaluate HyperNP across three datasets in terms of accuracy and speed. The results suggest that HyperNP models are accurate, scalable, interactive, and appropriate for use in real-world settings.
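A minimal sketch of the HyperNP recipe: precompute reference projections for a few hyperparameter values on a fraction of the instances, train a network mapping (data point, hyperparameter) to 2D coordinates, then query it at interactive speed for unseen hyperparameter values. The toy data and the stand-in "projection" below are illustrative assumptions; the paper targets real t-SNE/UMAP outputs.

```python
# Illustrative HyperNP-style approximation of a projection family
# (assumed toy data and stand-in projection, not t-SNE/UMAP itself).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 10))                 # high-dimensional data

def reference_projection(X, p):
    # Stand-in for an expensive t-SNE/UMAP run at hyperparameter p;
    # a p-dependent linear map keeps the example cheap and deterministic.
    W = np.stack([np.sin(p * np.arange(10)), np.cos(p * np.arange(10))], axis=1)
    return X @ W

# Precompute reference projections for a few hyperparameter values,
# using only a fraction of the instances, as in the paper's setup.
train_ps = [0.5, 1.0, 2.0]
rows, targets = [], []
for p in train_ps:
    Y = reference_projection(X, p)
    idx = rng.choice(len(X), size=200, replace=False)
    rows.append(np.hstack([X[idx], np.full((200, 1), p)]))
    targets.append(Y[idx])

# Network maps (data point, hyperparameter) -> 2D coordinates.
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000)
model.fit(np.vstack(rows), np.vstack(targets))

# Unseen hyperparameter value: approximate the projection at
# interactive speed instead of recomputing it from scratch.
p_new = 1.5
approx = model.predict(np.hstack([X, np.full((len(X), 1), p_new)]))
print(approx.shape)   # (500, 2) predicted embedding
```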