skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, October 10 until 2:00 AM ET on Friday, October 11 due to maintenance. We apologize for the inconvenience.


Title: FlexiBO: A Decoupled Cost-Aware Multi-Objective Optimization Approach for Deep Neural Networks
The design of machine learning systems often requires trading off different objectives, for example, prediction error and energy consumption for deep neural networks (DNNs). Typically, no single design performs well in all objectives; therefore, finding Pareto-optimal designs is of interest. The search for Pareto-optimal designs involves evaluating designs in an iterative process, and the measurements are used to evaluate an acquisition function that guides the search process. However, measuring different objectives incurs different costs. For example, the cost of measuring the prediction error of DNNs is orders of magnitude higher than that of measuring the energy consumption of a pre-trained DNN as it requires re-training the DNN. Current state-of-the-art methods do not consider this difference in objective evaluation cost, potentially incurring expensive evaluations of objective functions in the optimization process. In this paper, we develop a novel decoupled and cost-aware multi-objective optimization algorithm, which we call Flexible Multi-Objective Bayesian Optimization (FlexiBO) to address this issue. For evaluating each design, FlexiBO selects the objective with higher relative gain by weighting the improvement of the hypervolume of the Pareto region with the measurement cost of each objective. This strategy, therefore, balances the expense of collecting new information with the knowledge gained through objective evaluations, preventing FlexiBO from performing expensive measurements for little to no gain. We evaluate FlexiBO on seven state-of-the-art DNNs for image recognition, natural language processing (NLP), and speech-to-text translation. Our results indicate that, given the same total experimental budget, FlexiBO discovers designs with 4.8% to 12.4% lower hypervolume error than the best method in state-of-the-art multi-objective optimization.  more » « less
Award ID(s):
2107463 2233873 2007202 2038080
NSF-PAR ID:
10443135
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Journal of Artificial Intelligence Research
Volume:
77
ISSN:
1076-9757
Page Range / eLocation ID:
645 to 682
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Machine learning system design frequently necessitates balancing multiple objectives, such as prediction error and energy consumption for deep neural networks (DNNs). Typically, no single design performs well across all objectives; thus, finding Pareto-optimal designs is of interest. Measuring different objectives frequently incurs different costs; for example, measuring the prediction error of DNNs is significantly more expensive than measuring the energy consumption of a pre-trained DNN because it requires re-training the DNN. Current state-of-the-art methods do not account for this difference in objective evaluation cost, potentially wasting costly evaluations of objective functions for little information gain. To address this issue, we propose a novel cost-aware decoupled approach that weights the improvement of the hypervolume of the Pareto region by the measurement cost of each objective. We perform experiments on a of range of DNN applications for comprehensive evaluation of our approach. 
    more » « less
  2. Machine learning system design frequently necessitates balancing multiple objectives, such as prediction error and energy consumption, for deep neural networks (DNNs). Typically, no single design performs well across all objectives; thus, finding Pareto-optimal designs is of interest. Measuring different objectives frequently incurs different costs; for example, measuring the prediction error of DNNs is significantly more expensive than measuring the energy consumption of a pre-trained DNN because it requires re-training the DNN. Current state-of-the-art methods do not account for this difference in objective evaluation cost, potentially wasting costly evaluations of objective functions for little information gain. To address this issue, we propose a novel cost-aware decoupled approach that weights the improvement of the hypervolume of the Pareto region by the measurement cost of each objective. To evaluate our approach, we perform experiments on several machine learning systems deployed on energy constraints environments. 
    more » « less
  3. Neural architecture search (NAS) is a promising technique to design efficient and high-performance deep neural networks (DNNs). As the performance requirements of ML applications grow continuously, the hardware accelerators start playing a central role in DNN design. This trend makes NAS even more complicated and time-consuming for most real applications. This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform. As the main theoretical contribution, we first propose the NN-Degree, an analytical metric to quantify the topological characteristics of DNNs with skip connections (e.g., DenseNets, ResNets, Wide-ResNets, and MobileNets). The newly proposed NN-Degree allows us to do training-free NAS within one second and build an accuracy predictor by training as few as 25 samples out of a vast search space with more than 63 billion configurations. Second, by performing inference on the target hardware, we fine-tune and validate our analytical models to estimate the latency, area, and energy consumption of various DNN architectures while executing standard ML datasets. Third, we construct a hierarchical algorithm based on simplicial homology global optimization (SHGO) to optimize the model-architecture co-design process, while considering the area, latency, and energy consumption of the target hardware. We demonstrate that, compared to the state-of-the-art NAS approaches, our proposed hierarchical SHGO-based algorithm enables more than four orders of magnitude speedup (specifically, the execution time of the proposed algorithm is about 0.1 seconds). Finally, our experimental evaluations show that FLASH is easily transferable to different hardware architectures, thus enabling us to do NAS on a Raspberry Pi-3B processor in less than 3 seconds. 
    more » « less
  4. In multi-objective search, edges are annotated with cost vectors consisting of multiple cost components. A path dominates another path with the same start and goal vertices iff the component-wise sum of the cost vectors of the edges of the former path is 'less than' the component-wise sum of the cost vectors of the edges of the latter path. The Pareto-optimal solution set is the set of all undominated paths from a given start vertex to a given goal vertex. Its size can be exponential in the size of the graph being searched, which makes multi-objective search time-consuming. In this paper, we therefore study how to find an approximate Pareto-optimal solution set for a user-provided vector of approximation factors. The size of such a solution set can be significantly smaller than the size of the Pareto-optimal solution set, which enables the design of approximate multi-objective search algorithms that are efficient and produce small solution sets. We present such an algorithm in this paper, called A*pex. A*pex builds on PPA*, a state-of-the-art approximate bi-objective search algorithm (where there are only two cost components) but (1) makes PPA* more efficient for bi-objective search and (2) generalizes it to multi-objective search for any number of cost components. We first analyze the correctness of A*pex and then experimentally demonstrate its efficiency advantage over existing approximate algorithms for bi- and tri-objective search. 
    more » « less
  5. The design of optical devices is a complex and time-consuming process. To simplify this process, we present a novel framework of multi-fidelity multi-objective Bayesian optimization with warm starts, called Multi-BOWS. This approach automatically discovers new nanophotonic structures by managing multiple competing objectives and utilizing multi-fidelity evaluations during the design process. We employ our Multi-BOWS method to design an optical device specifically for transparent electromagnetic shielding, a challenge that demands balancing visible light transparency and effective protection against electromagnetic waves. Our approach leverages the understanding that simulations with a coarser mesh grid are faster, albeit less accurate than those using a denser mesh grid. Unlike the earlier multi-fidelity multi-objective method, Multi-BOWS begins with faster, less accurate evaluations, which we refer to as “warm-starting,” before shifting to a dense mesh grid to increase accuracy. As a result, Multi-BOWS demonstrates 3.2–89.9% larger normalized area under the Pareto frontier, which measures a balance between transparency and shielding effectiveness, than low-fidelity only and high-fidelity only techniques for the nanophotonic structures studied in this work. Moreover, our method outperforms an existing multi-fidelity method by obtaining 0.5–10.3% larger normalized area under the Pareto frontier for the structures of interest. 
    more » « less