From signal processing to emerging deep neural networks, a range of applications exhibit intrinsic error resilience. For such applications, approximate computing opens up new possibilities for energy-efficient computing by producing slightly inaccurate results using greatly simplified hardware. Adopting this approach, a variety of basic arithmetic units, such as adders and multipliers, have been effectively redesigned to generate approximate results for many error-resilient applications. In this work, we propose SECO, an approximate exponential function unit (EFU). Exponentiation is a key operation in many signal processing applications and, more importantly, in spiking neuron models, but its energy-efficient implementation has been inadequately explored. We also introduce a cross-layer design method for SECO to optimize the energy-accuracy trade-off. At the algorithm level, SECO offers runtime scaling between energy efficiency and accuracy based on an approximate Taylor expansion, where the error is minimized by optimizing parameters using discrete gradient descent at design time. At the circuit level, our error analysis method efficiently explores the design space to select the energy-accuracy-optimal approximate multiplier at design time. In tandem, the cross-layer design and runtime optimization method are able to generate energy-efficient and accurate approximate EFU designs that are up to 99.7% accurate at a power consumption of 3.73 pJ […]
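The runtime-scalable Taylor-expansion idea can be illustrated in software: truncating the series at a configurable number of terms trades arithmetic work (a rough proxy for energy) against accuracy. This is a minimal sketch, not SECO's hardware design; the function name and the term-count parameter are illustrative.

```python
import math

def approx_exp(x: float, terms: int) -> float:
    """Truncated Taylor expansion of e^x around 0.

    `terms` sets the energy/accuracy trade-off: fewer terms means
    less arithmetic (lower energy) but a larger approximation error.
    """
    result = 1.0
    term = 1.0
    for n in range(1, terms):
        term *= x / n  # builds x^n / n! incrementally
        result += term
    return result

# More terms -> smaller error relative to the exact exponential.
for t in (2, 4, 8):
    err = abs(approx_exp(1.0, t) - math.exp(1.0))
    print(t, round(err, 6))
```

Each added term costs one multiply, one divide, and one add, which is why a hardware EFU built on this scheme can scale its energy consumption at runtime simply by stopping the series early.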
Proxima: An Approach for Time or Accuracy Budgeted Collision Proximity Queries
Many applications in robotics require computing a robot manipulator's "proximity" to a collision state in a given configuration. This collision proximity is commonly framed as a summation over closest Euclidean distances between many pairs of rigid shapes in a scene. Computing many such pairwise distances is inefficient, while more efficient approximations of this procedure, such as through supervised learning, lack accuracy and robustness. In this work, we present an approach for computing a collision proximity function for robot manipulators that formalizes the trade-off between efficiency and accuracy and provides an algorithm that gives control over it. Our algorithm, called Proxima, works in one of two ways: (1) given a time budget as input, the algorithm returns an as-accurate-as-possible proximity approximation value in this time; or (2) given an accuracy budget, the algorithm returns an as-fast-as-possible proximity approximation value that is within the given accuracy bounds. We show the robustness of our approach through analytical investigation and simulation experiments on a wide set of robot models ranging from 6 to 132 degrees of freedom. We demonstrate that controlling the trade-off between efficiency and accuracy in proximity computations via our approach can enable safe and accurate real-time robot motion optimization even on high-dimensional […]
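The time-budgeted mode can be sketched as an anytime computation: accumulate pairwise distances until the budget expires, then extrapolate from the pairs processed so far. This is a simplified illustration under assumed names (`proximity_budgeted`, `dist_fn`), not Proxima's actual algorithm, which bounds the approximation error far more carefully.

```python
import math
import time

def proximity_budgeted(pairs, dist_fn, time_budget_s):
    """Anytime-style proximity estimate (illustrative sketch, not
    Proxima's algorithm): sum pairwise distances until the time budget
    expires, then extrapolate the partial sum to all pairs."""
    start = time.monotonic()
    total, done = 0.0, 0
    for a, b in pairs:
        total += dist_fn(a, b)
        done += 1
        if time.monotonic() - start > time_budget_s:
            break
    return total * len(pairs) / done  # crude extrapolation from `done` pairs

pairs = [((0, 0), (3, 4)), ((1, 1), (1, 2)), ((0, 0), (0, 5))]
estimate = proximity_budgeted(pairs, lambda a, b: math.dist(a, b), 0.01)
print(estimate)  # 11.0 when all three pairs fit within the budget
```

The key property this sketch shares with the budgeted formulation is graceful degradation: shrinking the budget never fails outright, it only coarsens the estimate.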
 Award ID(s):
 1830242
 Publication Date:
 NSFPAR ID:
 10340412
 Journal Name:
 Proceedings of Robotics: Science and Systems (RSS)
 Sponsoring Org:
 National Science Foundation
More Like this


We consider the problem of designing sublinear time algorithms for estimating the cost of a minimum metric traveling salesman (TSP) tour. Specifically, given access to an n × n distance matrix D that specifies pairwise distances between n points, the goal is to estimate the TSP cost by performing only sublinear (in the size of D) queries. For the closely related problem of estimating the weight of a metric minimum spanning tree (MST), it is known that for any epsilon > 0, there exists an O~(n/epsilon^O(1))-time algorithm that returns a (1 + epsilon)-approximate estimate of the MST cost. This result immediately implies an O~(n/epsilon^O(1))-time algorithm to estimate the TSP cost to within a (2 + epsilon) factor for any epsilon > 0. However, no o(n^2)-time algorithms are known to approximate metric TSP to a factor strictly better than 2. On the other hand, there were also no known barriers ruling out the existence of (1 + epsilon)-approximate estimation algorithms for metric TSP with O~(n) time for any fixed epsilon > 0. In this paper, we make progress on both algorithms and lower bounds for estimating the metric TSP cost. On the algorithmic side, we first consider the graphic TSP problem […]
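The (2 + epsilon) factor comes from a classical relationship: for metric instances, the optimal tour cost is sandwiched between the MST weight and twice the MST weight (every tour contains a spanning tree, and shortcutting a doubled MST yields a tour). The sketch below computes the MST exactly in O(n^2) time purely to illustrate that sandwich; the paper's contribution is estimating these quantities in sublinear time, which this sketch does not attempt.

```python
def mst_weight(D):
    """Exact MST weight via Prim's algorithm on a full n x n distance
    matrix (O(n^2) work; for illustration only)."""
    n = len(D)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest edge from the tree to each vertex
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v] and D[u][v] < best[v]:
                best[v] = D[u][v]
    return total

def tsp_bounds_from_mst(D):
    """Metric TSP cost lies in [MST, 2 * MST]."""
    w = mst_weight(D)
    return w, 2.0 * w

# Four points on a unit square: MST weight 3, optimal tour cost 4.
s2 = 2 ** 0.5
D = [[0, 1, 1, s2],
     [1, 0, s2, 1],
     [1, s2, 0, 1],
     [s2, 1, 1, 0]]
print(tsp_bounds_from_mst(D))  # (3.0, 6.0) brackets the true cost of 4
```

A (1 + epsilon)-approximate MST weight estimate therefore immediately gives a (2 + epsilon)-approximate TSP cost estimate, which is exactly the implication the abstract describes.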

With the success of deep neural networks (DNNs), many recent works have focused on developing hardware accelerators for power- and resource-limited systems via model compression techniques such as quantization, pruning, and low-rank approximation. However, almost all existing compressed DNNs are fixed after deployment and lack a runtime-adaptive structure that can adapt to dynamic hardware resource allocation, power budget, throughput requirements, and dynamic workload. As a countermeasure, to construct a novel runtime-dynamic DNN structure, we propose a novel DNN sub-network sampling method that uses non-uniform channel selection to generate sub-networks. Users can thus trade off between power, speed, computing load, and accuracy on the fly after deployment, depending on the dynamic requirements or specifications of the given system. We verify the proposed model on both the CIFAR-10 and ImageNet datasets using ResNets; it outperforms the same sub-networks trained individually as well as other related works. Our method achieves latency trade-offs among 13.4, 24.6, 41.3, and 62.1 ms on a GPU with batch size 128, and among 30.5, 38.7, 51, and 65.4 ms on a CPU, on ImageNet using ResNet-18.
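Non-uniform channel selection can be sketched as ranking a layer's output channels by an importance score and keeping only the top fraction at inference time, shrinking the layer's compute without retraining. This is an illustrative NumPy sketch under assumed names (`sample_subnet`, `keep_ratio`); the paper's sampling and training procedure is more involved.

```python
import numpy as np

def sample_subnet(weights, keep_ratio, importance):
    """Illustrative channel-wise sub-network sampling (not the paper's
    exact method): keep the top `keep_ratio` fraction of output channels
    ranked by a per-channel importance score."""
    out_ch = weights.shape[0]
    k = max(1, int(round(out_ch * keep_ratio)))
    keep = np.argsort(importance)[::-1][:k]  # most important channels first
    return weights[np.sort(keep)]            # smaller layer -> fewer MACs

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3, 3))    # conv weights: (out_ch, in_ch, kH, kW)
imp = np.abs(w).mean(axis=(1, 2, 3))     # L1-style importance per channel
small = sample_subnet(w, 0.5, imp)
print(small.shape)  # (4, 3, 3, 3): half the channels, roughly half the compute
```

Because the selection is driven by a score rather than a fixed stride, different channels survive at different widths, which is what makes the selection non-uniform.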

Autonomous mobile robots (AMRs) have been widely used in industry to execute various onboard computer-vision applications, including autonomous guidance, security patrol, object detection, and face recognition. Most of the applications executed by an AMR involve analyzing camera images with trained machine learning models. Many research studies on machine learning focus either on performance, without considering energy efficiency, or on techniques such as pruning and compression to make models more energy-efficient. However, most previous work does not study the root causes of energy inefficiency in the execution of those applications on AMRs. The computing stack on an AMR accounts for 33% of the total energy consumption and can thus strongly impact the battery life of the robot. Because recharging an AMR may disrupt the application execution, it is important to use the available energy efficiently to maximize battery life. In this paper, we first analyze the breakdown of power dissipation for the execution of computer-vision applications on AMRs and identify three main root causes of energy inefficiency: uncoordinated access to sensor data, performance-oriented model inference execution, and uncoordinated execution of concurrent jobs. To fix these three inefficiencies, we propose E2M, an energy-efficient middleware software stack for […]

With the success of deep neural networks (DNNs), many recent works have focused on developing hardware accelerators for power- and resource-limited embedded systems via model compression techniques such as quantization, pruning, and low-rank approximation. However, almost all existing DNN structures are fixed after deployment, lacking a runtime-adaptive DNN structure that can adapt to dynamic hardware resources, power budget, throughput requirements, and dynamic workload. Correspondingly, there is no runtime-adaptive hardware platform to support dynamic DNN structures. To address this problem, we first propose a dynamic channel-adaptive deep neural network (CA-DNN) that can adjust the number of active convolution channels (i.e., model size and computing load) at runtime (i.e., at the inference stage, without retraining) to dynamically trade off between power, speed, computing load, and accuracy. Further, we use knowledge distillation to optimize the model and quantize it to 8 bits and 16 bits, respectively, for hardware-friendly mapping. We test the proposed model on the CIFAR-10 and ImageNet datasets using ResNet. Compared with individually trained models of the same size, our CA-DNN achieves better accuracy. Moreover, to the best of our knowledge, we are the first to propose a Processing-in-Memory accelerator for such adaptive neural network structures based on Spin Orbit […]