skip to main content

Title: DANCE: DAta-Network Co-optimization for Efficient Segmentation Model Training and Inference
Semantic segmentation for scene understanding is nowadays widely demanded, raising significant challenges for the algorithm efficiency, especially its applications on resource-limited platforms. Current segmentation models are trained and evaluated on massive high-resolution scene images (“data-level”) and suffer from the expensive computation arising from the required multi-scale aggregation (“network level”). In both folds, the computational and energy costs in training and inference are notable due to the often desired large input resolutions and heavy computational burden of segmentation models. To this end, we propose DANCE, general automated DA ta- N etwork C o-optimization for E fficient segmentation model training and inference . Distinct from existing efficient segmentation approaches that focus merely on light-weight network design, DANCE distinguishes itself as an automated simultaneous data-network co-optimization via both input data manipulation and network architecture slimming. Specifically, DANCE integrates automated data slimming which adaptively downsamples/drops input images and controls their corresponding contribution to the training loss guided by the images’ spatial complexity. Such a downsampling operation, in addition to slimming down the cost associated with the input size directly, also shrinks the dynamic range of input object and context scales, therefore motivating us to also adaptively slim the network to match the downsampled data. Extensive experiments and ablating studies (on four SOTA segmentation models with three popular segmentation datasets under two training settings) demonstrate that DANCE can achieve “all-win” towards efficient segmentation (reduced training cost, less expensive inference, and better mean Intersection-over-Union (mIoU)). Specifically, DANCE can reduce ↓25%–↓77% energy consumption in training, ↓31%–↓56% in inference, while boosting the mIoU by ↓0.71%–↑ 13.34%.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
ACM Transactions on Design Automation of Electronic Systems
Page Range / eLocation ID:
1 to 20
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. 3D instance segmentation for unlabeled imaging modalities is a challenging but essential task as collecting expert annotation can be expensive and time-consuming. Existing works segment a new modality by either deploying pre-trained models optimized on diverse training data or sequentially conducting image translation and segmentation with two relatively independent networks. In this work, we propose a novel Cyclic Segmentation Generative Adversarial Network (CySGAN) that conducts image translation and instance segmentation simultaneously using a unified network with weight sharing. Since the image translation layer can be removed at inference time, our proposed model does not introduce additional computational cost upon a standard segmentation model. For optimizing CySGAN, besides the CycleGAN losses for image translation and supervised losses for the annotated source domain, we also utilize self-supervised and segmentation-based adversarial objectives to enhance the model performance by leveraging unlabeled target domain images. We benchmark our approach on the task of 3D neuronal nuclei segmentation with annotated electron microscopy (EM) images and unlabeled expansion microscopy (ExM) data. The proposed CySGAN outperforms pre-trained generalist models, feature-level domain adaptation models, and the baselines that conduct image translation and segmentation sequentially. Our implementation and the newly collected, densely annotated ExM zebrafish brain nuclei dataset, named NucExM, are publicly available at 
    more » « less
  2. High-quality 3D image recognition is an important component of many vision and robotics systems. However, the accurate processing of these images requires the use of compute-expensive 3D Convolutional Neural Networks (CNNs). To address this challenge, we propose the use of Spiking Neural Networks (SNNs) that are generated from iso-architecture CNNs and trained with quantization-aware gradient descent to optimize their weights, membrane leak, and firing thresholds. During both training and inference, the analog pixel values of a 3D image are directly applied to the input layer of the SNN without the need to convert to a spike-train. This significantly reduces the training and inference latency and results in high degree of activation sparsity, which yields significant improvements in computational efficiency. However, this introduces energy-hungry digital multiplications in the first layer of our models, which we propose to mitigate using a processing-in-memory (PIM) architecture. To evaluate our proposal, we propose a 3D and a 3D/2D hybrid SNN-compatible convolutional architecture and choose hyperspectral imaging (HSI) as an application for 3D image recognition. We achieve overall test accuracy of 98.68, 99.50, and 97.95% with 5 time steps (inference latency) and 6-bit weight quantization on the Indian Pines, Pavia University, and Salinas Scene datasets, respectively. In particular, our models implemented using standard digital hardware achieved accuracies similar to state-of-the-art (SOTA) with ~560.6× and ~44.8× less average energy than an iso-architecture full-precision and 6-bit quantized CNN, respectively. Adopting the PIM architecture in the first layer, further improves the average energy, delay, and energy-delay-product (EDP) by 30, 7, and 38%, respectively. 
    more » « less
  3. 3D LiDAR scanners are playing an increasingly important role in autonomous driving as they can generate depth information of the environment. However, creating large 3D LiDAR point cloud datasets with point-level labels requires a significant amount of manual annotation. This jeopardizes the efficient development of supervised deep learning algorithms which are often data-hungry. We present a framework to rapidly create point clouds with accurate pointlevel labels from a computer game. To our best knowledge, this is the first publication on LiDAR point cloud simulation framework for autonomous driving. The framework supports data collection from both auto-driving scenes and user-configured scenes. Point clouds from auto-driving scenes can be used as training data for deep learning algorithms, while point clouds from user-configured scenes can be used to systematically test the vulnerability of a neural network, and use the falsifying examples to make the neural network more robust through retraining. In addition, the scene images can be captured simultaneously in order for sensor fusion tasks, with a method proposed to do automatic registration between the point clouds and captured scene images. We show a significant improvement in accuracy (+9%) in point cloud segmentation by augmenting the training dataset with the generated synthesized data. Our experiments also show by testing and retraining the network using point clouds from user-configured scenes, the weakness/blind spots of the neural network can be fixed. 
    more » « less
  4. Fast inference of numerical model parameters from data is an important prerequisite to generate predictive models for a wide range of applications. Use of sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive. New approaches combining variational inference with normalizing flow are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space, and rely on gradient-based optimization instead of sampling, providing a more efficient approach for Bayesian inference about the model parameters. Moreover, the cost of frequently evaluating an expensive likelihood can be mitigated by replacing the true model with an offline trained surrogate model, such as neural networks. However, this approach might generate significant bias when the surrogate is insufficiently accurate around the posterior modes. To reduce the computational cost without sacrificing inferential accuracy, we propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternatively updates the normalizing flow parameters and surrogate model parameters. We also propose an efficient sample weighting scheme for surrogate model training that preserves global accuracy while effectively capturing high posterior density regions. We demonstrate the inferential and computational superiority of NoFAS against various benchmarks, including cases where the underlying model lacks identifiability. The source code and numerical experiments used for this study are available at 
    more » « less
  5. null (Ed.)
    Heat loss quantification (HLQ) is an essential step in improving a building’s thermal performance and optimizing its energy usage. While this problem is well-studied in the literature, most of the existing studies are either qualitative or minimally driven quantitative studies that rely on localized building envelope points and are, thus, not suitable for automated solutions in energy audit applications. This research work is an attempt to fill this gap of knowledge by utilizing intensive thermal data (on the order of 100,000 plus images) and constitutes a relatively new area of analysis in energy audit applications. Specifically, we demonstrate a novel process using deep-learning methods to segment more than 100,000 thermal images collected from an unmanned aerial system (UAS). To quantify the heat loss for a building envelope, multiple stages of computations need to be performed: object detection (using Mask-RCNN/Faster R-CNN), estimating the surface temperature (using two clustering methods), and finally calculating the overall heat transfer coefficient (e.g., the U-value). The proposed model was applied to eleven academic campuses across the state of North Dakota. The preliminary findings indicate that Mask R-CNN outperformed other instance segmentation models with an mIOU of 73% for facades, 55% for windows, 67% for roofs, 24% for doors, and 11% for HVACs. Two clustering methods, namely K-means and threshold-based clustering (TBC), were deployed to estimate surface temperatures with TBC providing consistent estimates across all times of the day over K-means. Our analysis demonstrated that thermal efficiency not only depended on the accurate acquisition of thermal images but also relied on other factors, such as the building geometry and seasonal weather parameters, such as the outside/inside building temperatures, wind, time of day, and indoor heating/cooling conditions. Finally, the resultant U-values of various building envelopes were compared with recommendations from the American Society of Heating, Refrigerating, and Air-conditioning Engineers (ASHRAE) building standards. 
    more » « less