Many computer-vision (CV) applications used in autonomous vehicles rely on historical results, which introduce cycles in processing graphs. However, existing response-time analysis breaks down in the presence of cycles, either by failing completely or by drastically sacrificing parallelism or CV accuracy. To address this situation, this paper presents a new graph-based task model, based on the recently ratified OpenVX standard, that includes historical requirements and their induced cycles as first-class concepts. Using this model, response-time bounds for graphs that may contain cycles are derived. These bounds expose a tradeoff between responsiveness and CV accuracy that hinges on the extent of allowed parallelism. This tradeoff is illustrated via a CV case study involving pedestrian tracking. In this case study, the methods proposed in this paper enabled significant improvements in both analytical and observed response times, with acceptable CV accuracy, compared to prior methods.
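The abstract's central modeling idea is that historical requirements add back edges, and hence cycles, to an otherwise acyclic processing graph. A toy sketch of how such a cycle shows up: a depth-first search classifies edges, and any edge into the active DFS stack is a back edge that closes a cycle. The graph and node names below are hypothetical, purely for illustration.

```python
def find_back_edges(graph):
    """Return edges (u, v) that close a cycle in a directed graph
    given as {node: [successors]}."""
    back, state = [], {}  # state: 1 = on DFS stack, 2 = finished

    def dfs(u):
        state[u] = 1
        for v in graph.get(u, []):
            if state.get(v) == 1:      # edge into the active stack: a cycle
                back.append((u, v))
            elif state.get(v) is None:
                dfs(v)
        state[u] = 2

    for u in graph:
        if state.get(u) is None:
            dfs(u)
    return back

# Hypothetical tracking pipeline: 'track' feeds its result back to 'detect',
# modeling a historical requirement.
g = {'cam': ['detect'], 'detect': ['track'], 'track': ['out', 'detect'], 'out': []}
print(find_back_edges(g))  # [('track', 'detect')]
```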
Making OpenVX Really "Real Time"
OpenVX is a recently ratified standard that was expressly proposed to facilitate the design of computer-vision (CV) applications used in real-time embedded systems. Despite its real-time focus, OpenVX presents several challenges when validating real-time constraints. Many of these challenges are rooted in the fact that OpenVX only implicitly defines any notion of a schedulable entity. Under OpenVX, CV applications are specified in the form of processing graphs that are inherently considered to execute monolithically end-to-end. This monolithic execution hinders parallelism and can lead to significant processing-capacity loss. Prior work partially addressed this problem by treating graph nodes as schedulable entities, but under OpenVX, these nodes represent rather coarse-grained CV functions, so the available parallelism that can be obtained in this way is quite limited. In this paper, a much more fine-grained approach for scheduling OpenVX graphs is proposed. This approach was designed to enable additional parallelism and to eliminate schedulability-related processing-capacity loss that arises when programs execute on both CPUs and graphics processing units (GPUs). Response-time analysis for this new approach is presented and its efficacy is evaluated via a case study involving an actual CV application.
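The contrast the abstract draws between monolithic end-to-end execution and scheduling nodes as separate entities can be illustrated with a standard pipelining argument (this is a generic sketch, not the paper's actual response-time analysis): a monolithic graph processes one frame in the sum of its node costs, while per-node scheduling lets successive frames overlap, so steady-state frame separation is limited only by the slowest stage.

```python
def monolithic_response(costs):
    """Monolithic end-to-end execution: one job runs every node in sequence,
    so a frame occupies the graph for the sum of all node costs."""
    return sum(costs)

def pipelined_separation(costs):
    """If nodes are independently schedulable entities, successive frames can
    pipeline; steady-state inter-frame separation is bounded by the slowest
    stage (a textbook pipelining bound, assumed here for illustration)."""
    return max(costs)

# Hypothetical per-node costs (ms) for a three-node OpenVX-style graph.
stages = [5, 10, 5]
print(monolithic_response(stages), pipelined_separation(stages))  # 20 10
```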
- Award ID(s):
- 1717589
- PAR ID:
- 10108191
- Date Published:
- Journal Name:
- 2018 IEEE Real-Time Systems Symposium (RTSS)
- Page Range / eLocation ID:
- 80 to 93
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Autonomous vehicles often employ computer-vision (CV) algorithms that track the movements of pedestrians and other vehicles to maintain safe distances from them. These algorithms are usually expressed as real-time processing graphs that have cycles due to back edges that provide history information. If immediate back history is required, then such a cycle must execute sequentially. Due to this requirement, any graph that contains a cycle with utilization exceeding 1.0 is categorically unschedulable, i.e., bounded graph response times cannot be guaranteed. Unfortunately, such cycles can occur in practice, particularly if conservative execution-time assumptions are made, as befits a safety-critical system. This dilemma can be obviated by allowing older back history, which enables parallelism in cycle execution at the expense of possibly affecting the accuracy of tracking. However, the efficacy of this solution hinges on the resulting history-vs.-accuracy trade-off that it exposes. In this paper, this trade-off is explored in depth through an experimental study conducted using the open-source CARLA autonomous driving simulator. Somewhat surprisingly, easing away from always requiring immediate back history proved to have only a marginal impact on accuracy in this study.
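The utilization argument above can be made concrete. Below is an illustrative rule of thumb, not the paper's exact analysis: a cycle's utilization is the sum of WCET/period over its nodes; with immediate back history the cycle runs sequentially and so must have utilization at most 1.0, while history that is h frames old lets up to h cycle instances overlap, suggesting h must be at least the ceiling of the utilization. The node costs are hypothetical.

```python
from math import ceil

def cycle_utilization(nodes):
    """Total utilization of the nodes on a cycle: sum of WCET / period."""
    return sum(c / t for c, t in nodes)

def min_history_age(nodes):
    """Smallest history age h (in frames) allowing the cycle's instances to
    overlap enough to be schedulable, under the illustrative assumption that
    age-h history permits up to h concurrent cycle instances."""
    return max(1, ceil(cycle_utilization(nodes)))

# Hypothetical tracking cycle at 30 fps: (WCET ms, period ms) per node.
cycle = [(20, 33), (25, 33), (15, 33)]
u = cycle_utilization(cycle)        # ~1.82 > 1.0: unschedulable if sequential
print(round(u, 2), min_history_age(cycle))  # 1.82 2
```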
Edge computing has enabled a large set of emerging edge applications by exploiting data proximity and offloading computation-intensive workloads to nearby edge servers. However, supporting edge application users at scale poses challenges due to limited point-of-presence edge sites and constrained elasticity. In this paper, we introduce a densely-distributed edge resource model that leverages capacity-constrained volunteer edge nodes to support elastic computation offloading. Our model also enables the use of geo-distributed edge nodes to further support elasticity. Collectively, these features raise the issue of edge selection. We present a distributed edge selection approach that relies on client-centric views of available edge nodes to optimize average end-to-end latency, with considerations of system heterogeneity, resource contention and node churn. Elasticity is achieved by fine-grained performance probing, dynamic load balancing, and proactive multi-edge node connections per client. Evaluations are conducted in both real-world volunteer environments and emulated platforms to show how a common edge application, namely AR-based cognitive assistance, can benefit from our approach and deliver low-latency responses to distributed users at scale.
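The client-centric selection step described above can be sketched minimally: each client probes candidate edge nodes and keeps proactive connections to the few with the lowest observed latency. The node names, latency values, and the choice of k are assumptions for illustration only.

```python
def select_edges(probe_latencies, k=2):
    """Given a client's probed latencies {node: ms}, return the k
    lowest-latency edge nodes to hold proactive connections to."""
    ranked = sorted(probe_latencies.items(), key=lambda kv: kv[1])
    return [node for node, _ in ranked[:k]]

# Hypothetical probe results for one client (ms).
probes = {'edge-a': 30.0, 'edge-b': 10.5, 'edge-c': 21.0}
print(select_edges(probes))  # ['edge-b', 'edge-c']
```

Re-running the probe and selection periodically is one simple way to react to the heterogeneity and node churn the abstract mentions.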
Stochastic block partitioning (SBP) is a community detection algorithm that is highly accurate even on graphs with a complex community structure, but its inherently serial nature hinders its widespread adoption by the wider scientific community. To make it practical to analyze large real-world graphs with SBP, there is a growing need to parallelize and distribute the algorithm. The current state-of-the-art distributed SBP algorithm is a divide-and-conquer approach that limits communication between compute nodes until the end of inference. This leads to the breaking of computational dependencies, which causes convergence issues as the number of compute nodes increases and when the graph is sufficiently sparse. To address this shortcoming, we introduce EDiSt — an exact distributed stochastic block partitioning algorithm. Under EDiSt, compute nodes periodically share community assignments during inference. Due to this additional communication, EDiSt improves upon the divide-and-conquer algorithm by allowing it to scale out to a larger number of compute nodes without suffering from convergence issues, even on sparse graphs. We show that EDiSt provides speedups of up to 26.9x over the divide-and-conquer approach and of up to 44.0x over shared-memory parallel SBP when scaled out to 64 compute nodes.
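The periodic-sharing step that distinguishes EDiSt from the divide-and-conquer approach can be sketched abstractly. The merge rule below (a dict union in which later senders win ties) is an assumption for illustration; the actual EDiSt exchange and reconciliation logic is not specified in the abstract.

```python
def merge_assignments(local_views):
    """Combine per-compute-node community assignments {vertex: community}
    into one shared view, as in a periodic all-to-all exchange. Ties are
    resolved in favor of later senders (an illustrative assumption)."""
    merged = {}
    for view in local_views:
        merged.update(view)
    return merged

# Two hypothetical compute nodes, each holding assignments for its vertices.
node0 = {1: 'a', 2: 'a'}
node1 = {2: 'b', 3: 'b'}
print(merge_assignments([node0, node1]))  # {1: 'a', 2: 'b', 3: 'b'}
```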
Fixed-priority multicore schedulers are often preferable to dynamic-priority ones because they entail less overhead, are easier to implement, and enable certain tasks to be favored over others. Under global fixed-priority (G-FP) scheduling, as applied to the standard sporadic task model, response times for low-priority tasks may be unbounded, even if total task-system utilization is low. In this paper, it is shown that this negative result can be circumvented if different jobs of the same task are allowed to execute in parallel. In particular, a response-time bound is presented for task systems that allow intra-task parallelism. This bound merely requires that total utilization does not exceed the overall processing capacity; individual task utilizations need not be further restricted. This result implies that G-FP is optimal for scheduling soft real-time tasks that require bounded tardiness, if intra-task parallelism is allowed.
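The sufficient condition stated in this abstract is simple to express: with intra-task parallelism allowed, bounded tardiness under G-FP only requires total utilization not to exceed the core count, with no per-task utilization cap. The check below is a direct transcription of that condition as summarized above; the task parameters are hypothetical.

```python
def bounded_tardiness_under_gfp(tasks, m):
    """Sufficient condition from the abstract: given sporadic tasks as
    (WCET, period) pairs with intra-task parallelism allowed, tardiness is
    bounded under G-FP on m cores iff total utilization <= m. Note that an
    individual task may have utilization above 1.0."""
    return sum(c / t for c, t in tasks) <= m

# Hypothetical task set: the first task alone has utilization 1.5 > 1.0,
# which is permitted because its jobs may run in parallel.
tasks = [(3, 2), (2, 5)]            # utilizations 1.5 and 0.4
print(bounded_tardiness_under_gfp(tasks, 2))  # True  (1.9 <= 2)
```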