skip to main content


Title: Choosing preemption points to minimize typical running times
The problem of selecting "effective preemption points" in a program --- points in the code at which to permit preemption --- in order to minimize overall running time is considered. Prior solutions that have been proposed for this problem are based on workload models in which worst-case known upper bounds are assumed for the duration needed to perform preemptions at particular points in the code, and of the time needed to non-preemptively execute the code between preemption points. Since these solutions are based on worst-case assumptions, they tend to select effective preemption points in a conservative manner; consequently the overall execution time of the program may be needlessly large under most typical run-time circumstances. We consider a more general workload model in which "typical" values, as well as upper bounds, are assumed to be known for the preemption durations and the non-preemptive code-execution durations; given such information, we derive algorithms for the optimal placement of preemption points in a manner that minimizes the typical overall running time (while continuing to guarantee, if needed, upper bounds on the worst-case over-all running time). Both off-line solutions (in which all preemption points are selected prior to run-time) and on-line solutions (where the selection of some of the preemption points is made during run-time and therefore can exploit knowledge of the actual durations of prior preemptions and of the executions of already executed pieces of code) are presented and proved optimal.  more » « less
Award ID(s):
1724227
NSF-PAR ID:
10183410
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the International Conference on Real-Time Networks and Systems (RTNS)
Page Range / eLocation ID:
198 to 208
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The fixed preemption point (FPP) model has been studied as an alternative to fully preemptive and non-preemptive models, as restricting preemptions to specific, predictable locations within a task’s execution can simplify overhead analysis without disallowing preemptions entirely. Prior work has produced response-time analyses for global Earliest Deadline First (G-EDF) scheduling under the FPP model. However, scheduling decisions based solely on task deadlines may be too coarsegrained and may not lead to the lowest response times. In this paper, we propose global FPP EDF-like (G-FPP-EL) scheduling, which assigns a priority point in time for each non-preemptive region of a task. We adapt compliant-vector analysis (CVA) to our model and present general response-time bounds for G-FPPEL schedulers. We then demonstrate that it is possible to design G-FPP-EL schedulers acheiving response-time bounds optimal under CVA and argue that such schedulers should replace global FPP EDF. 
    more » « less
  2. We prove several hardness results for training depth-2 neural networks with the ReLU activation function; these networks are simply weighted sums (that may include negative coefficients) of ReLUs. Our goal is to output a depth-2 neural network that minimizes the square loss with respect to a given training set. We prove that this problem is NP-hard already for a network with a single ReLU. We also prove NP-hardness for outputting a weighted sum of k ReLUs minimizing the squared error (for k>1) even in the realizable setting (i.e., when the labels are consistent with an unknown depth-2 ReLU network). We are also able to obtain lower bounds on the running time in terms of the desired additive error ϵ. To obtain our lower bounds, we use the Gap Exponential Time Hypothesis (Gap-ETH) as well as a new hypothesis regarding the hardness of approximating the well known Densest k-Subgraph problem in subexponential time (these hypotheses are used separately in proving different lower bounds). For example, we prove that under reasonable hardness assumptions, any proper learning algorithm for finding the best fitting ReLU must run in time exponential in (1/epsilon)^2. Together with a previous work regarding improperly learning a ReLU (Goel et al., COLT'17), this implies the first separation between proper and improper algorithms for learning a ReLU. We also study the problem of properly learning a depth-2 network of ReLUs with bounded weights giving new (worst-case) upper bounds on the running time needed to learn such networks both in the realizable and agnostic settings. Our upper bounds on the running time essentially matches our lower bounds in terms of the dependency on epsilon. 
    more » « less
  3. Abstract This paper introduces and evaluates a general construct for trading off accuracy and overall execution duration in classification-based machine perception problems—namely, the generalized IDK classifier cascade . The aim is to select the optimal sequence of classifiers required to minimize the expected (i.e. average) execution duration needed to achieve successful classification, subject to a constraint on quality, and optionally a latency constraint on the worst-case execution duration. An IDK classifier is a software component that attempts to categorize each input provided to it into one of a fixed set of classes, returning “I Don’t Know” (IDK) if it is unable to do so with the required level of confidence. An ensemble of several different IDK classifiers may be available for the same classification problem, offering different trade-offs between effectiveness (i.e. the probability of successful classification) and timeliness (i.e. execution duration). A model for representing such characteristics is defined, and a method is proposed for determining the values of the model parameters for a given ensemble of IDK classifiers. Optimal algorithms are developed for sequentially ordering IDK classifiers into an IDK cascade, such that the expected duration to successfully classify an input is minimized, optionally subject to a latency constraint on the worst-case overall execution duration of the IDK cascade. The entire methodology is applied to two real-world case studies. In contrast to prior work, the methodology developed in this paper caters for arbitrary dependences between the probabilities of successful classification for different IDK classifiers. Effective practical solutions are developed considering both single and multiple processors. 
    more » « less
  4. null (Ed.)
    Finding locally optimal solutions for MAX-CUT and MAX- k -CUT are well-known PLS-complete problems. An instinctive approach to finding such a locally optimum solution is the FLIP method. Even though FLIP requires exponential time in worst-case instances, it tends to terminate quickly in practical instances. To explain this discrepancy, the run-time of FLIP has been studied in the smoothed complexity framework. Etscheid and Röglin (ACM Transactions on Algorithms, 2017) showed that the smoothed complexity of FLIP for max-cut in arbitrary graphs is quasi-polynomial. Angel, Bubeck, Peres, and Wei (STOC, 2017) showed that the smoothed complexity of FLIP for max-cut in complete graphs is ( O Φ 5 n 15.1 ), where Φ is an upper bound on the random edge-weight density and Φ is the number of vertices in the input graph. While Angel, Bubeck, Peres, and Wei’s result showed the first polynomial smoothed complexity, they also conjectured that their run-time bound is far from optimal. In this work, we make substantial progress toward improving the run-time bound. We prove that the smoothed complexity of FLIP for max-cut in complete graphs is O (Φ n 7.83 ). Our results are based on a carefully chosen matrix whose rank captures the run-time of the method along with improved rank bounds for this matrix and an improved union bound based on this matrix. In addition, our techniques provide a general framework for analyzing FLIP in the smoothed framework. We illustrate this general framework by showing that the smoothed complexity of FLIP for MAX-3-CUT in complete graphs is polynomial and for MAX - k - CUT in arbitrary graphs is quasi-polynomial. We believe that our techniques should also be of interest toward showing smoothed polynomial complexity of FLIP for MAX - k - CUT in complete graphs for larger constants k . 
    more » « less
  5. Neuromorphic computing systems execute machine learning tasks designed with spiking neural networks. These systems are embracing non-volatile memory to implement high-density and low-energy synaptic storage. Elevated voltages and currents needed to operate non-volatile memories cause aging of CMOS-based transistors in each neuron and synapse circuit in the hardware, drifting the transistor’s parameters from their nominal values. If these circuits are used continuously for too long, the parameter drifts cannot be reversed, resulting in permanent degradation of circuit performance over time, eventually leading to hardware faults. Aggressive device scaling increases power density and temperature, which further accelerates the aging, challenging the reliable operation of neuromorphic systems. Existing reliability-oriented techniques periodically de-stress all neuron and synapse circuits in the hardware at fixed intervals, assuming worst-case operating conditions, without actually tracking their aging at run-time. To de-stress these circuits, normal operation must be interrupted, which introduces latency in spike generation and propagation, impacting the inter-spike interval and hence, performance (e.g., accuracy). We observe that in contrast to long-term aging, which permanently damages the hardware, short-term aging in scaled CMOS transistors is mostly due to bias temperature instability. The latter is heavily workload-dependent and, more importantly, partially reversible. We propose a new architectural technique to mitigate the aging-related reliability problems in neuromorphic systems by designing an intelligent run-time manager (NCRTM), which dynamically de-stresses neuron and synapse circuits in response to the short-term aging in their CMOS transistors during the execution of machine learning workloads, with the objective of meeting a reliability target. NCRTM de-stresses these circuits only when it is absolutely necessary to do so, otherwise reducing the performance impact by scheduling de-stress operations off the critical path. We evaluate NCRTM with state-of-the-art machine learning workloads on a neuromorphic hardware. Our results demonstrate that NCRTM significantly improves the reliability of neuromorphic hardware, with marginal impact on performance. 
    more » « less