<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Towards Elasticity in Heterogeneous Edge-dense Environments</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>09/09/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10354987</idno>
					<idno type="doi"></idno>
					<title level='j'>ICDCS</title>
					<idno></idno>
					<biblScope unit="volume"></biblScope>
					<biblScope unit="issue"></biblScope>

					<author>Zhiying Liang</author>
					<author>Lei Huang</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Edge computing has enabled a large set of emerging edge applications by exploiting data proximity and offloading computation-intensive workloads to nearby edge servers. However, supporting edge application users at scale poses challenges due to limited point-of-presence edge sites and constrained elasticity. In this paper, we introduce a densely-distributed edge resource model that leverages capacity-constrained volunteer edge nodes to support elastic computation offloading. Our model also enables the use of geo-distributed edge nodes to further support elasticity. Collectively, these features raise the issue of edge selection. We present a distributed edge selection approach that relies on client-centric views of available edge nodes to optimize average end-to-end latency, with considerations of system heterogeneity, resource contention and node churn. Elasticity is achieved by fine-grained performance probing, dynamic load balancing, and proactive multi-edge node connections per client. Evaluations are conducted in both real-world volunteer environments and emulated platforms to show how a common edge application, namely AR-based cognitive assistance, can benefit from our approach and deliver low-latency responses to distributed users at scale.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Edge computing, a computing paradigm that brings computation closer to data sources and end-users, has enabled the deployment of emerging edge applications such as AR/VR, cognitive assistance, and autonomous vehicles. Offloading workload from devices to powerful edge servers that can run complex machine learning algorithms is necessary to overcome device-side limitations. The demand for these latency-sensitive and compute-intensive applications is starting to increase rapidly and requires the edge to be highly available and scalable.</p><p>However, elasticity is a well-known limitation of edge resources. Edge applications are typically pinned to a single pre-deployed edge site. A burst of incoming workload can easily overwhelm an edge site, causing service performance degradation. While resource allocation optimizations <ref type="bibr">[1]</ref>, <ref type="bibr">[2]</ref> and workload adaptation strategies <ref type="bibr">[3]</ref>, <ref type="bibr">[4]</ref>, <ref type="bibr">[5]</ref> can mitigate the contention and improve utilization, edge resource capacity is still a bottleneck without hardware scaling. Furthermore, geo-distributed users require wide edge availability to provide low-latency edge access. Existing public edge infrastructures <ref type="bibr">[6]</ref>, <ref type="bibr">[7]</ref>, <ref type="bibr">[8]</ref> only serve major metropolitan areas, and on-premise edge deployment solutions <ref type="bibr">[9]</ref>, <ref type="bibr">[10]</ref>, <ref type="bibr">[11]</ref> rely on costly/private resources to only serve certain users at specific geo-locations. Today's edge presence density <ref type="bibr">[12]</ref> is far from delivering the expected geographical coverage.</p><p>With the prevalence of powerful personal PCs/laptops, we argue that sufficient compute power with low latency is already in the vicinity of end users. 
Volunteer resources are widely distributed with vast potential to scale cost-efficiently under appropriate incentive models. They tend to reside in the same local ISP networks as nearby end users, which minimizes the number of routing hops and network jitter. Figure <ref type="figure">1</ref> shows that nearby volunteer edge nodes can deliver lower RTT propagation delay compared to the latest AWS Local Zone infrastructure in the specified area.</p><p>In this paper, we propose using volunteer resources and geo-distributed edge resources as an enabler of elastic edge computing, and introduce the concept of a fine-grained edge resource model, namely geo-distributed heterogeneous edge-dense environments. This model relies on heterogeneous, unreliable yet densely-distributed commodity machines on the edge to provision latency-sensitive services. We then formulate a latency optimization problem, and present a distributed edge selection approach that leverages client-centric views to locate the best-performing edge node for each user and optimize average end-to-end latency. Specifically, we answer the following questions:</p><p>&#8226; How to locate the best-performing edge node for each user in the presence of system dynamics?</p><p>&#8226; How to guarantee continuous service in the presence of high node churn and failure rate?</p><p>Our contributions are as follows: First, we present the notion of geo-distributed heterogeneous edge-dense environments, a novel edge resource model, using (but not limited to) volunteer resources and non-local resources (as appropriate) as an enabler. Second, we design and implement a 2-step distributed edge selection approach that optimizes global average end-to-end latency through a simple local selection heuristic. We use a user-side performance probing strategy as a key mechanism to accurately evaluate edge candidates, with a focus on environment heterogeneity and workload interference. 
Further, our approach proactively establishes multiple client-to-edge connections to provide: (i) fault tolerance by enabling an immediate connection switch to alternate edge nodes upon failure, and (ii) improved performance by switching to a better-performing edge node in the presence of system dynamics.</p><p>The evaluation is conducted using an AR-based edge application in both real-world environments with 9 volunteer edge nodes, and emulation environments with 18 AWS EC2 instances under controlled node churn. The real-world results show that our approach can deliver near-optimal average end-to-end latency in a dynamic environment, and achieve 18%-46% latency reduction compared to resource-aware, locality-based and dedicated-edge-only approaches under high user demand.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. BACKGROUND AND RELATED WORK</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Edge Computing Infrastructure</head><p>To cater to the geo-proximity, strong QoS, and privacy requirements of target users/customers in specific areas, enterprises and organizations tend to deploy their own on-premise hardware on the edge. The logical proximity, defined as low-latency high-bandwidth communication channels between users and edge servers, is usually provided by a LAN or dedicated networking infrastructure. This category of special-purpose deployments can deliver stable latency performance, but is extremely expensive to maintain and lacks scaling capacity.</p><p>Today, major cloud providers are aggressively rolling out their public edge infrastructures <ref type="bibr">[6]</ref>, <ref type="bibr">[7]</ref>, <ref type="bibr">[8]</ref>. Smaller-scale data centers are directly deployed in major metropolitan areas, providing generic VM-based computing capacity and general access to the public. The point-of-presence (PoP) of this category of edge resources is largely limited to metro areas, with their numbers being only one order of magnitude higher than those for a traditional cloud. Our observations on AWS Local Zone (Figure <ref type="figure">1</ref>) also show that its deliverable latency is much higher than the claimed single-digit millisecond level to end users due to the networking overhead within the local ISP network.</p><p>We categorize today's on-premise and public edge infrastructures into a coarse-grained edge resource model, as they are sparsely distributed geographically with hundreds of PoPs worldwide. These edge data centers are smaller-footprint extensions of the cloud relying on the same technology stack, and users at specific locations only have one (sometimes zero) or very few nearby options of edge access points to satisfy their QoS requirements.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Edge Elasticity and User Allocation</head><p>Edge elasticity has been improved by efficient resource allocation and service scheduling <ref type="bibr">[1]</ref>, <ref type="bibr">[2]</ref> using workload prediction, fine-grained provisioning and proactive service deployment as common strategies. Application-specific adaptive streaming <ref type="bibr">[3]</ref>, <ref type="bibr">[4]</ref>, <ref type="bibr">[5]</ref> is another technique to reduce resource contention while satisfying performance requirements by dynamically adjusting request rate and QoS level. However, these works cannot expand the hardware capacity of the edge, which is the fundamental constraint on edge elasticity.</p><p>In mobile edge cloud (MEC) environments with densely distributed edge nodes, studies <ref type="bibr">[13]</ref>, <ref type="bibr">[14]</ref> have converted the elasticity problem to a user allocation problem optimizing user-to-edge assignments for resource utilization and latency performance. However, these works take server-centric views without examining the deliverable latency performance on the user side. A distributed iterative algorithm <ref type="bibr">[15]</ref> is proposed to optimize user allocations by examining user-side QoE; however, environment heterogeneity is not taken into consideration. A resource allocation scheme is applied in these works to strictly avoid workload interference and performance degradation; however, an overloaded but well-connected edge node can still provide better performance than an idle edge node that suffers from networking overhead.</p><p>We are unaware of any edge system that combines both edge-dense volunteer and dedicated nodes and geo-distributed resources simultaneously.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Volunteer Computing</head><p>While volunteer computing has mostly been applied to scientific computation <ref type="bibr">[16]</ref>, <ref type="bibr">[17]</ref>, it has also been proposed to enable low-latency services on the edge <ref type="bibr">[18]</ref>. There have been several works <ref type="bibr">[19]</ref>, <ref type="bibr">[20]</ref>, <ref type="bibr">[21]</ref> bringing conceptual designs of volunteer-based or volunteer-assisted systems for latency-sensitive workloads over emulation environments. However, they lack quantitative analysis of volunteer resource characteristics in real-world environments, including networking/capacity heterogeneity, availability and reliability.</p><p>Multi-provider edge clouds have been proposed in both industry and academia <ref type="bibr">[22]</ref>, <ref type="bibr">[23]</ref>, <ref type="bibr">[24]</ref>, as a similar paradigm related to volunteer resources. This approach is based on a coarse-grained resource model with edge data centers as contributors, in contrast to individual devices in the volunteer computing landscape. Nebula <ref type="bibr">[25]</ref> leverages geo-proximate volunteer workers to optimize MapReduce workloads, considering data locality, resource availability and bandwidth heterogeneity. It targets data-intensive batch jobs where client latency is not a concern.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Performance Prediction on the Edge</head><p>Edge-native applications <ref type="bibr">[26]</ref> have been proposed to take advantage of edge servers, and <ref type="bibr">[3]</ref> uses static profiling to obtain their processing performance before optimizing resource allocation. Proactive profiling, however, introduces significant overhead before edge nodes join the system as workers. Wagner and Gedeon et al. introduce a learning-based assessment framework <ref type="bibr">[27]</ref> to estimate workload performance on heterogeneous resources, reducing profiling overhead by only performing a few runs before the prediction. However, a separate profiling process still exists, with compromised accuracy. Several studies <ref type="bibr">[28]</ref>, <ref type="bibr">[29]</ref> propose methods based on live workloads instead of test workloads to profile applications on heterogeneous hardware. However, these studies focus more on data flow and streaming applications and are limited to data centers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. PROBLEM DEFINITION</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Geo-distributed Heterogeneous Edge-Dense Environment</head><p>We present a fine-grained edge resource model called geo-distributed heterogeneous edge-dense environments that supplements deployed edge infrastructure with volunteer edge computing resources. The main benefit of this model is to substantially scale the availability of edge resources, both in terms of their geographic spread (e.g., in rural areas) and their proximity to users (to reduce network latency). Compared to the hundreds of edge data centers inherited from cloud technologies, the number of volunteer edge nodes can be orders of magnitude greater than cloud PoPs. Potentially millions of volunteer-based commodity machines are densely distributed around users, contributing computing power and providing low-latency edge access, as greener complements to the existing edge infrastructures discussed in Section II-A. This edge resource model has the following characteristics:</p><p>&#8226; Unreliable edge node: Volunteer edge nodes are assumed to be unreliable and unpredictable with high node churn. They can join and leave the system at any time without notification. In addition, edge nodes can run unexpected higher-priority host workloads competing with existing edge services.</p><p>We believe our model is richer and more elastic than emerging edge-dense environments such as MEC models, where edge compute capacity co-locates with densely distributed 5G base stations. In MEC environments, dedicated edge servers, fully managed by telecom companies, tend to be highly reliable and homogeneous. The user-to-edge connectivity is mainly affected by first wireless hop characteristics <ref type="bibr">[30]</ref>, while the user-to-edge connectivity in our proposed model is determined by local ISP infrastructures and unpredictable networking conditions. Our model has more flexibility in deployment as it does not require resource co-location with 5G base stations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Edge Application Model</head><p>We adopt a widely used edge application model consisting of the pre-deployment of application servers (e.g. image processing) to a set of edge nodes. Each user selects an edge node to which their requests are offloaded. For simplicity, we consider a single application server type in this paper, but our model can be extended to support any number of application server types. An application manager manages each application service type in the system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Problem Formulation</head><p>Based on the environment definition above, we formulate a latency optimization problem. We first define the problem in a static manner without the churn of application users and edge nodes, and then discuss its generalization in a dynamic system.</p><p>Consider a geo-distributed heterogeneous edge-dense environment with n users and m edge nodes in a specified area. An Edge Assignment (EA) refers to a users-to-edge match that assigns each user u i (1 &#8804; i &#8804; n) an edge node e j (1 &#8804; j &#8804; m) to offload computation. An EA contains n pairs as shown below:</p><p>EA = {(u 1 , e j1 ), (u 2 , e j2 ), . . . , (u n , e jn )}</p><p>We can also express the EA above from the view of edge nodes. For an edge node e j , we use S j to denote the set of users attached to it, namely S j = {u i1 , u i2 , . . . , u ikj } where 0 &#8804; k j &#8804; n and k 1 + k 2 + . . . + k m = n. Note that each user only offloads computation to one edge node while one edge node can serve multiple users concurrently. Therefore, it is equivalent to express the EA as follows:</p><p>EA = {S 1 , S 2 , . . . , S m }</p><p>Assuming the user u i selects the edge node e j under the given EA, then its performance, namely the end-to-end latency, is expressed as the sum of D_prop j i , D_trans j i , and D_proc(e j , S j ). D_prop is the RTT propagation delay, D_trans stands for the data transfer delay of each request, and D_proc stands for the queuing delay and processing time within the edge node. Specifically, D_proc is determined by the heterogeneous hardware/capacity of node e j and its current workload denoted by S j . The resource contention caused by overloading and capacity limitation can easily become the bottleneck and dominate latency performance, rendering edge proximity ineffective. 
We use the following expression to describe the average end-to-end latency under the given EA:</p><p>L(EA) = (1/n) &#931; i=1..n (D_prop ji i + D_trans ji i + D_proc(e ji , S ji ))</p><p>In a system with n users and m edge nodes, there are in total &#934; = m^n possible EAs, since each user has potentially m options for edge selection in an edge-dense environment. The objective is to find an EA that minimizes the average end-to-end latency of the n users:</p><p>EA* = argmin EA L(EA)</p><p>This is an NP-hard problem without a simple solution. Furthermore, D_prop, D_trans and D_proc are unknown from a central view, which makes global decisions difficult: D_prop and D_trans are only observable from client-centric views, while D_proc is determined by the current workload and the server hardware/platform. It requires a costly profiling process to obtain the processing time under specific configurations, which is impractical to execute on every single edge node in volunteer environments.</p><p>The problem becomes more complicated when both application users and volunteer edge nodes are highly dynamic with unpredictable node churn. Constructing an efficient local search solution with effective heuristics becomes critical to reach the new optimal state faster from the current state in the presence of system dynamics.</p></div>
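To make the formulation concrete, the following sketch (illustrative only: the delay values, the linear contention model and all numbers are our own assumptions, not taken from the paper) enumerates every possible EA for a toy instance and picks the one minimizing average end-to-end latency:

```python
from itertools import product

# Toy instance: 3 users, 2 edge nodes. All values are illustrative (ms).
D_prop = [[5, 12], [9, 6], [7, 7]]   # D_prop[i][j]: RTT propagation, user i -> edge j
D_trans = [[2, 3], [2, 2], [3, 2]]   # per-request data transfer delay

def d_proc(edge, load):
    """Queuing + processing delay: grows with the number of attached users,
    a crude stand-in for the hardware heterogeneity / contention effects."""
    base = [20, 30][edge]            # heterogeneous per-node compute time
    return base * load

def avg_latency(ea, n, m):
    """Average end-to-end latency L(EA) for assignment ea[i] = j_i."""
    load = [sum(1 for j in ea if j == e) for e in range(m)]
    return sum(D_prop[i][ea[i]] + D_trans[i][ea[i]] + d_proc(ea[i], load[ea[i]])
               for i in range(n)) / n

n, m = 3, 2
# Exhaustive search over all Phi = m**n possible EAs (8 here) -- only
# feasible at toy scale, which is why a heuristic is needed in practice.
best = min(product(range(m), repeat=n), key=lambda ea: avg_latency(ea, n, m))
print(best, round(avg_latency(best, n, m), 2))
```

Even at n = 3 and m = 2 the search space already holds m^n = 8 assignments; the exponential growth, combined with D_prop/D_trans being observable only from the clients, is what motivates the distributed selection approach of Section IV.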
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. PROPOSED APPROACH</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Architecture Overview</head><p>Given an edge-dense environment, each user may have a choice of multiple edge nodes. A key question is how to select resources for individual clients so as to meet desired performance goals such as minimizing latency. We present a distributed edge selection approach applied in a geo-distributed volunteer-based edge cloud system <ref type="bibr">[31]</ref>. Figure <ref type="figure">2</ref> shows a simplified version of this edge cloud architecture to emphasize the edge selection process. The Central Manager collects real-time node status/resource utilization information from edge nodes to serve the edge discovery queries discussed below.</p><p>We use a 2-step approach: global edge selection on the manager or server side followed by local edge selection on the user side, to accurately locate the best edge candidate for each user and at the same time reduce profiling overhead. Our approach leverages client-centric views to closely monitor edge environments, predict heterogeneous edge performance, and make local edge selection decisions by using a heuristic that is geared toward optimizing average end-to-end latency. Our approach contains four critical client-centric modules (in grey boxes) as shown on the left side of Figure <ref type="figure">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Edge Discovery and Global Edge Selection</head><p>Edge discovery happens when new users join the system or existing users periodically query environment updates. A manager-side global edge selection process is then triggered to locate nearby edge access points for requesting users. However, manager views alone cannot entirely identify the environment heterogeneity and edge status without end-to-end network probing and real-time performance prediction. We therefore employ a 2-step approach that first generates an approximate candidate edge list on the manager side, followed by a local selection process in client-centric views.</p><p>The candidate edge list is a small subset of edge nodes that are expected to provide low-latency responses for specific users. Global edge selection policies incorporate multiple factors that affect user end-to-end latency, including geo-proximity, node capacity/load, resource utilization, and optionally-provided network affiliation information between users and edge nodes (indicating existing LAN or preferred networking channels).</p><p>We first apply a geo-proximity filter to rule out unqualified nodes, and then prioritize the local candidates based on resource availability, network affiliation and user preferences. Specifically, in the geo-proximity search, we use GeoHash <ref type="bibr">[32]</ref> to identify a wider-range geographical area to include remote nodes that may be useful as a last resort (if all local nodes are performing poorly). Selection filters and sorting policies can be flexibly combined/modified to prioritize available edge nodes for different application requirements. 
Since final decisions are eventually made on the user side, the global edge selection step of our 2-step approach is coarse-grained, with high tolerance to edge selection inaccuracy and mismatch.</p><p>We use the concept TopN to indicate the size of the candidate edge list, referring to the top N edge nodes in the sorted list of the global edge selection results. Multiple choices can deal with both node churn and other system dynamics. This is an important configuration parameter in our approach. A larger TopN value brings higher accuracy and flexibility to the edge selection process by providing more edge options to end users at the local selection step. It also improves fault tolerance by enabling users to proactively establish redundant connections to multiple edge nodes in preparation for potential node failures. However, a larger TopN value can also bring higher probing and synchronization overhead, as discussed in the following sections. In our evaluation, we examine the tradeoffs of different TopN values.</p></div>
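A minimal sketch of the manager-side step described above. The node fields, the GeoHash prefix precision, the fallback rule and the sort keys are our own illustrative assumptions, not the paper's exact implementation:

```python
# Manager-side global edge selection: geo-proximity filter, then
# prioritization, then TopN truncation (all parameters hypothetical).

def global_edge_selection(user, nodes, top_n=3, precision=4):
    """Return the TopN candidate edge list for one requesting user."""
    # 1. Geo-proximity filter: a shared GeoHash prefix approximates a
    #    nearby cell; a shorter prefix widens the area so remote nodes
    #    stay reachable as a last resort.
    area = user["geohash"][:precision]
    local = [n for n in nodes if n["geohash"].startswith(area)]
    if not local:                        # widen the search range
        local = [n for n in nodes if n["geohash"].startswith(area[:2])]
    # 2. Prioritize: network-affiliated nodes first, then lower utilization.
    ranked = sorted(local,
                    key=lambda n: (n["net"] != user["net"], n["util"]))
    return ranked[:top_n]

nodes = [
    {"id": "e1", "geohash": "9q8yyk", "net": "ispA", "util": 0.7},
    {"id": "e2", "geohash": "9q8yyz", "net": "ispB", "util": 0.2},
    {"id": "e3", "geohash": "9q8yym", "net": "ispA", "util": 0.4},
    {"id": "e4", "geohash": "dr5ru7", "net": "ispC", "util": 0.1},  # remote
]
user = {"geohash": "9q8yyv", "net": "ispA"}
print([n["id"] for n in global_edge_selection(user, nodes)])
```

Because final decisions are refined on the client, this step can stay coarse: a mis-ranked candidate in the TopN list is corrected by the local probing step rather than by the manager.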
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Performance Probing</head><p>To accurately address the heterogeneity of real-world edge environments and avoid costly profiling overhead on various edge devices, we use a real-time probing approach to predict edge performance during runtime. Performance probing requests are initiated by application users to collect (1) end-to-end networking metrics to each edge candidate, and (2) "what-if" processing performance on these candidates if they are selected for computation offloading. Furthermore, information related to existing workloads on each edge node is also reported in probing results to optimize global average latency (discussed in Section IV-D). Table <ref type="table">I</ref> describes the APIs exposed by edge nodes to interact with probing users.</p><p>1) Probe networking delay: Networking delay for an offloading request comprises RTT propagation delay (D_prop) and data transfer overhead (D_trans) introduced by constrained bandwidth. While probing D_prop is fairly easy, probing D_trans consumes available bandwidth and competes with existing networking traffic. Therefore, we only conduct D_prop probing to avoid interfering with concurrent workload. Our observations show that D_prop alone is sufficient to evaluate end-to-end connectivity for typical latency-sensitive offloading workloads such as AR-based wearable assistance. The reasons are twofold.</p><p>First, edge resources can only provide low/modest improvements to bandwidth-hungry applications like video streaming <ref type="bibr">[12]</ref>, since edge proximity improves propagation but not necessarily transmission unless multiple single-hop network options are available. We target interactive and compute-intensive workloads where bandwidth is not the bottleneck compared to processing time and propagation delay. 
Second, for AR-based wearable assistance applications, requests carry video frames for image processing and responses are lightweight instructions related to detected objects. User outbound bandwidth usually becomes the data transfer bottleneck, which is determined by network access method/ISP configurations/traffic plans. Edge selection has limited effect on first-hop data transfer performance.</p><p>2) Probe processing delay: A user offloads requests to the selected edge node for real-time processing, and we take the compute time of a single request as the value of D_proc. To avoid time-consuming profiling and to improve the accuracy of performance prediction, we invoke a synthetic test workload to simulate "new-user-join" scenarios. The test workload is based on the same application logic and compute requirements as the real offloading task. Its invocation generates the "what-if" performance upon new user/request arrivals, which reflects specific hardware/platform configurations and resource contention due to existing workload. Taking the AR application as an example, the test workload is image processing for a single synthetic video frame with standard image size.</p><p>To minimize the overhead, "what-if" performance is cached in node memory for responding to probing queries, and we only invoke the test workload to update the "what-if" performance upon node state changes. There are three scenarios for which a test workload is invoked: (1) A new user is attached to an edge node (workload increase). (2) An existing user stops offloading requests or switches to another edge node (workload decrease). (3) The performance monitor on an edge node reports a noticeable change of processing time under the same number of attached users. This can be caused by adaptive request rate (e.g. 
varying FPS for AR applications) or higher-priority host workloads running on volunteer nodes that are out of our control.</p><p>Offloading request rates for different users can dynamically change according to networking conditions, processing speed and user preferences, which leads to varying amounts of workload under the same number of attached users. The performance monitor trigger enables the system to deal with workload dynamics in a fine-grained manner. The more aggressively the test workload is triggered, the more accurately performance is predicted, at the cost of higher overhead.</p><p>When one of the above scenarios occurs, new "what-if" performance is measured and updated in the cache. Upon receiving Process_probe() requests, an edge node only reads the cache to fetch the "what-if" performance instead of invoking a new test workload. Therefore, a large number of probing requests do not necessarily lead to more test workload invocations and higher overhead.</p><p>3) Join selected edge node: We add up the RTT propagation delay from RTT_probe() and the "what-if" processing time from Process_probe() as the predicted performance of edge candidates. Based on the performance prediction and selection policies, each user selects a candidate to start offloading workload. Since probing requests are conducted asynchronously, a selection conflict happens when multiple users simultaneously select the same edge node. Probing predictions are inaccurate in this case since the "what-if" performance only reflects the one-additional-user scenario. Therefore, we use the Join() interface to synchronize edge selection procedures, and users only start offloading workloads after their Join() requests are accepted by the selected edge nodes.</p><p>Join synchronization is implemented using seqNum, which incrementally identifies node state changes. seqNum is updated along with test workload invocation in critical sections. Both share the three triggers discussed in Section IV-C2. 
Algorithm 1 shows the Join() API exposed by edge nodes (critical-section locks are hidden for simplicity), which corresponds to the first trigger type: user join. If a Join() request is rejected, the user repeats the probing process from the edge discovery step; upon acceptance, the user leaves its current edge node (Current.Leave()), takes the accepted candidate as its new current node (Current &#8592; C[0].address), and keeps the remaining candidates as backups (Backups &#8592; C[1:].addresses). Note that the test workload is invoked asynchronously in a separate thread with some delay (line 5), since it should reflect the "what-if" performance after the newly accepted user is officially attached. This delay is set to twice the common user RTT propagation delay to make sure the test workload is invoked after the arrival of the new user's offloading requests.</p><p>Algorithm 2 shows the user-side edge selection procedure with join synchronization. After collecting the probing results (lines 4-10), local edge selection policies are applied to sort the edge candidates. We discuss local selection policies in Section IV-D. The first edge node in the sorted list is the best-performing candidate. If it is the same node as the currently-used node for offloading, no action is required besides updating the backup edge list (used by the failure monitor in Section IV-E). If a better candidate is found, a Join() request is sent to this selected node. Upon join acceptance, the user calls Leave() to notify the previous edge node of leaving, which triggers the test workload invocation and seqNum update.</p></div>
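The probing cache and Join()/Leave() synchronization described in this section might be sketched as follows. This is a simplified single-process stand-in under our own assumptions: the test workload is a toy latency function, and the asynchronous delayed invocation is performed synchronously for clarity:

```python
import threading

class EdgeNode:
    """Hypothetical edge-node sketch: cached "what-if" performance,
    seqNum-based join synchronization, and Join/Leave triggers."""

    def __init__(self):
        self.lock = threading.Lock()
        self.users = set()
        self.seq_num = 0             # incremented on every state change
        self.what_if = self._run_test_workload(extra=1)

    def _run_test_workload(self, extra=0):
        # Stand-in for processing one synthetic frame under current load.
        return 20 * (len(self.users) + extra)

    def _state_changed(self):
        self.seq_num += 1
        # The real system invokes this asynchronously after ~2x RTT;
        # here we refresh the cached "what-if" value immediately.
        self.what_if = self._run_test_workload(extra=1)

    def process_probe(self):
        # Cache read only: probes never trigger a new test workload.
        return self.what_if, self.seq_num, len(self.users)

    def join(self, user, seq_num):
        with self.lock:
            if seq_num != self.seq_num:   # stale probe result -> reject
                return False
            self.users.add(user)
            self._state_changed()
            return True

    def leave(self, user):
        with self.lock:
            self.users.discard(user)
            self._state_changed()

node = EdgeNode()
perf, seq, _ = node.process_probe()
assert node.join("u1", seq)        # accepted: state unchanged since probe
assert not node.join("u2", seq)    # rejected: seqNum advanced after u1 joined
```

A rejected user restarts from the edge discovery step, mirroring the conflict-resolution behavior of Algorithm 1: the "what-if" value it probed only accounted for one additional user, so it must re-probe under the new state.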
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Local Edge Selection</head><p>In this section, we discuss the local edge selection policies applied in SortLocalSelectionPolicy() (Algorithm 2, line 11). We use LO j to represent the probing result, or the Local-view Overhead, of edge candidate j. The edge node with the smallest LO is the Best Local Candidate (BLC), which can deliver the lowest latency performance:</p><p>BLC = argmin j LO j</p><p>However, this selection scheme neglects the interference to existing workloads on each candidate. Assume D_proc current is the processing delay for existing attached users on an edge node, and D_proc probing is the "what-if" processing delay on this edge node; then D_proc probing - D_proc current is the performance degradation that would be experienced by all existing users if one additional user is accepted to join. To optimize the average performance, we should consider both the local and global views to minimize the end-to-end latency for all users. We use GO j to denote the Global Overhead of edge candidate j (n is the number of existing users on candidate j):</p><p>GO j = n &#215; (D_proc probing - D_proc current )</p><p>Note that the information of existing workloads on each candidate is shared with the requesting user at line 8 in Algorithm 2. Therefore, we have the following expression to optimize the global average performance:</p><p>argmin j (LO j + GO j )</p><p>In common scenarios, LO and GO are positively correlated in affecting both individual performance and global average performance, since an edge candidate with multiple existing users tends to have a high LO value (performance degradation caused by overloading) and a high GO value (larger n in the GO equation) at the same time.</p><p>Our framework based on LO and GO can be easily modified to optimize edge applications with specific QoS requirements. Users can first filter out edge candidates whose LO violates QoS requirements and then select the node with the lowest GO to optimize global performance. 
In this case, new users can be rejected from joining the system if (1) no available edge node can satisfy the QoS requirements, or (2) the new join would lead to QoS violations for existing users.</p></div>
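The LO/GO selection with QoS filtering can be sketched as below. The dictionary layout of a candidate record (keys `lo`, `n`, `d_probing`, `d_current`) is an assumed representation of the shared probing results, not the paper's data format.

```python
def go(n_existing, d_probing, d_current):
    # Global Overhead: degradation imposed on all n existing users if one more joins
    return n_existing * (d_probing - d_current)

def pick_candidate(cands, qos_ms=None):
    """Select an edge candidate minimizing LO + GO (sketch).

    cands: list of dicts with keys lo, n, d_probing, d_current.
    qos_ms: optional QoS bound on LO; candidates violating it are filtered out.
    Returns None when no candidate satisfies the QoS (the join is rejected).
    """
    if qos_ms is not None:
        cands = [c for c in cands if c["lo"] <= qos_ms]
    if not cands:
        return None  # reject join: no node satisfies the QoS requirements
    return min(cands, key=lambda c: c["lo"] + go(c["n"], c["d_probing"], c["d_current"]))
```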
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. Failure Monitor</head><p>Static user-to-edge assignment is susceptible to failures, since volunteer edge resources are unreliable, with high node churn. The failure monitor detects connection failures and immediately replaces the failed edge node by looking up the backup edge list, guaranteeing continuous service. The backup edge list comprises the unselected edge nodes from the candidate edge list. It is pre-sorted based on the local selection policies discussed in Section IV-D; therefore, the first backup node used is always the second-best option. The backup edge list is periodically updated by the performance probing step as shown in Algorithm 2, and connections to backup edge nodes are proactively established to make the service downtime during a connection switch negligible.</p><p>Specifically, the probability of users still experiencing node failures in our approach is determined by the following two parameters:</p><p>&#8226; TopN value: TopN is the size of the candidate edge list generated by the global edge selection process. TopN &#8722; 1 is the size of the backup edge list after each update of probing requests. A larger TopN value adds more backup edge nodes maintained on the user side, and therefore makes users more tolerant of simultaneous edge failures. &#8226; Probing period T_probing: T_probing is the time period between two consecutive edge discovery/performance probing requests. It indicates the frequency with which users query the Application Manager for environment changes. The smaller T_probing is, the more frequently the backup edge list gets updated, during which failed edge nodes are replaced with alive ones. Therefore, a smaller T_probing brings higher robustness. As a tradeoff, a higher TopN and a smaller T_probing also bring higher overhead to the system by increasing the number of asynchronous operations (detecting, probing and updating). 
Based on the level of node churn and the reliability of volunteer resources <ref type="bibr">[33]</ref>, TopN and T_probing can be tuned accordingly.</p></div>
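The failover step of the failure monitor can be sketched as follows. The `connections` map of pre-established live connections and the `alive` attribute are assumed structures for illustration; the paper only states that backup connections are proactively maintained and pre-sorted.

```python
def on_failure(failed, backups, connections):
    """Failure-monitor sketch: replace a failed edge node with the first
    alive backup from the pre-sorted backup edge list.

    connections: mapping node -> pre-established connection object with
    an `alive` flag (hypothetical structure). Returns (node, connection),
    or (None, None) when all backups have failed and the user must fall
    back to slow re-discovery.
    """
    for backup in backups:                  # list is pre-sorted: first is second-best
        conn = connections.get(backup)
        if conn is not None and conn.alive:
            return backup, conn             # proactive connection: near-zero downtime
    return None, None
```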
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. EVALUATION</head><p>We evaluate our approach using an AR-based cognitive assistance application, in both real-world and emulated environments, to demonstrate its effectiveness in achieving elastic edge computing with volunteer resources.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Application: AR-based Cognitive Assistance</head><p>AR-based cognitive assistance is a promising category of edge-native applications <ref type="bibr">[26]</ref> that can fully exploit the potential of edge resources. Its core building block, real-time object detection, is highly compute-intensive, with strict latency requirements to deliver a satisfying user experience.</p><p>We integrate the proposed distributed edge selection approach into a cognitive assistance application that helps visually impaired people identify objects. Users continuously send video frames to edge servers at a maximum rate of 20 FPS (which can adaptively decrease based on network and processing performance). All video frames have a standard size of 0.02 MB after encoding, with the same compute requirements. After queuing, decoding and processing of each frame, lightweight cognitive assistance instructions (of negligible size) are returned to users based on the detected objects and application-specific assistance logic. The test workload in our approach is configured to process one pre-loaded synthetic video frame on edge nodes.</p></div>
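The paper states only that the frame rate adapts downward from the 20 FPS cap based on observed performance; the concrete control rule is not given. The sketch below uses a simple additive-increase/multiplicative-decrease rule as one illustrative assumption, with a hypothetical latency target.

```python
def next_fps(current_fps, e2e_latency_ms, target_ms=100, max_fps=20, min_fps=1):
    """Illustrative AIMD frame-rate control (assumed rule, not the paper's).

    Halve the rate when end-to-end latency exceeds the target, otherwise
    recover by one frame per second per adjustment, capped at 20 FPS.
    """
    if e2e_latency_ms > target_ms:
        return max(min_fps, current_fps * 0.5)   # back off under high latency
    return min(max_fps, current_fps + 1)         # slowly recover toward the cap
```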
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Evaluation Baselines</head><p>We use the global average end-to-end latency collected on the user side as the evaluation metric to determine the overall elasticity performance in geo-distributed volunteer environments.</p><p>We contrast the proposed distributed edge selection approach with the following performance baselines:</p><p>&#8226; Geo-proximity: The locality-based edge selection policy is widely applied in coarse-grained edge data center environments. Users are assigned to their geographically closest edge nodes to offload computation. The latency between users and edge nodes is assumed to be proportional to the distance, and resource capacity is not considered to be the bottleneck. </p></div>
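The geo-proximity baseline can be sketched as a nearest-node assignment over great-circle distance. The haversine distance function and the coordinate representation are standard choices assumed here; the point of the sketch is that the baseline ignores capacity and actual network conditions entirely.

```python
import math

def haversine_miles(a, b):
    # great-circle distance between two (lat, lon) points, in miles
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 3958.8 * 2 * math.asin(math.sqrt(h))

def geo_proximity_assign(users, nodes):
    """Baseline sketch: each user offloads to the geographically closest
    edge node, with no regard for capacity or measured network latency."""
    return {u: min(nodes, key=lambda n: haversine_miles(users[u], nodes[n]))
            for u in users}
```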
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Real-world Experiments</head><p>We first show the adaptability and overall effectiveness of our approach in real-world volunteer environments.</p><p>1) Experimental Setup: We recruit 20 participants with heterogeneous access networks. They are all within 10 miles of each other in the Minneapolis-Saint Paul metropolitan area, with 15 participants being users and 5 participants registering their laptops as volunteer edge nodes. We also register 4 additional AWS Local Zone instances into our volunteer platform to illustrate that our approach can also be applied to form a hybrid edge cloud consisting of both volunteer and dedicated resources. We use these 4 instances alone to demonstrate the dedicated-only scenario for comparison.</p><p>Table <ref type="table">II</ref> shows the hardware details of all participants in the system, along with their heterogeneous processing performance for a single application video frame.</p><p>2) Single-user View: Figure <ref type="figure">3</ref> shows the end-to-end latency perceived by a random user when offloading to 3 volunteer nodes and 1 AWS Local Zone instance. We can see that volunteer nodes (V1 and V2) can deliver better performance than the dedicated node (D6) in real-world environments. This is because V1 and V2 have lower network latency to the user compared to D6, thus reducing the overall latency.</p><p>Table <ref type="table">III</ref> shows the pairwise latency performance between 3 random users and the edge nodes, along with the selection results using the proposed distributed edge selection approach. The experiment is conducted separately for the 3 users to avoid interference. We configure TopN = 6 on each user to guarantee that all volunteer edge nodes are selected into the candidate edge list. Therefore, the performance probing step probes all edge nodes in the system to collect probing results. 
Best-performing nodes are accurately selected for the 3 users, addressing the networking and processing heterogeneity.</p><p>Figure <ref type="figure">4</ref> shows the end-to-end latency performance trace for consecutive video frames upon node failure. While a reconnection approach experiences large service downtime to re-discover an alternative edge node upon failure, our approach can immediately switch to a backup edge node, maintaining continuous service.</p><p>3) Edge Elasticity: To explore the elasticity performance of the proposed approach, we incrementally let all 15 application users join the system and evaluate the global average end-to-end latency. Figure <ref type="figure">5</ref> shows the average performance of our approach (Client-centric) vs. the 4 baselines in Section V-B. As we can see in the figure, our approach better balances the load as the number of attached users increases. The locality-based approach selects the closest nodes without examining the actual networking connectivity and processing capacity. The resource-aware approach checks resource utilization; however, the heterogeneous hardware/networking environments lead to unexpected performance. The dedicated-only scenario lacks hardware scaling flexibility under increasing workload, which results in worse-than-cloud performance at #users = 15 due to significant performance degradation caused by overloading.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Emulation Experiments</head><p>We conduct emulation experiments in the AWS cloud to simulate a wider geographical distribution of users and nodes, with flexibly controlled user/node counts, networking performance and, more importantly, node churn in a dynamic environment. We discuss two experiment configurations in the following sections.</p><p>1) Static edge nodes with increasing number of users: In the first emulation experiment, we incrementally add randomly-distributed users into the system one by one, as a way to examine each user's behavior with respect to edge selection and the performance degradation due to resource contention. We keep the number and distribution of edge nodes static for simplicity.</p><p>Experimental setup: We set up 9 volunteer edge nodes (4 &#215; t2.medium, 4 &#215; t2.xlarge, 1 &#215; t2.2xlarge) and 15 application users (15 &#215; t2.micro), simulated to be within 50 miles of each other. We configure the pairwise networking performance (latency/bandwidth) using tc with real-world measurement data. Specifically, the RTT propagation delay ranges from 8 to 55 ms in the corresponding geo-distribution.</p><p>Performance trace: Figure <ref type="figure">6</ref> shows the latency performance trace of each individual user using three edge selection methods. 15 users join the system one after another every 10 seconds, indicated by vertical lines in the figure. 
While the locality-based method (a) would have performed better in a wider geo-distribution, it lacks flexibility under resource contention. A few users in (a) exceed 150 ms latency due to the overload of local nodes, even when an idle and relatively far-away node could deliver better end-to-end latency. (Fig. <ref type="figure">6</ref>: Performance trace for 15 users using different edge selection methods; 10 edge nodes are static with users incrementally joining the system every 10 seconds. Fig. <ref type="figure">7</ref>: Average end-to-end latency perceived by all users using different edge selection approaches.) The resource-aware method (b) performs better than (a) since resource contention is fairly balanced across all edge nodes. However, resource-aware selection cannot identify the network heterogeneity between users and nodes to trade off resource availability against a faster networking channel. Our approach (c) can assign every user a low-latency edge node by combining networking and processing performance probing. We can also see that dynamic load balancing happens due to the proactive multi-node (TopN) connections, where some users switch their selected edge nodes during processing when a better option is found under varying workloads.</p><p>Comparison to optimal selection: We next compare the different edge selection approaches to optimal selection. Figure <ref type="figure">7</ref> shows the average performance after all users have joined the system (after 150 seconds in Figure <ref type="figure">6</ref>). We calculate the optimal edge assignment for this specific configuration based on the application profile of the three types of EC2 instances we use and the emulated network setup. The figure shows that our approach has about 12% higher latency than the optimal.</p><p>2) Static users with high edge node churn: In the second emulation experiment, we apply high node churn to the edge nodes (along with distribution changes) to examine the dynamic load balancing behavior of our approach. 
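For an instance this small, the optimal assignment used as the reference in Figure 7 can be computed offline by exhaustive search. The sketch below assumes a simplified cost model (per-node processing time scaling linearly with the number of sharing users); the paper derives its optimum from the measured application profile instead.

```python
from itertools import product

def optimal_assignment(users, nodes, net_ms, proc_ms):
    """Brute-force optimal user-to-edge assignment for small instances.

    net_ms[(u, n)]: emulated network RTT between user u and node n (ms);
    proc_ms[n]: per-frame processing time of node n, assumed to grow
    linearly with the number of users sharing it (a simplification).
    Returns the assignment minimizing average end-to-end latency.
    """
    best, best_avg = None, float("inf")
    for assign in product(nodes, repeat=len(users)):
        load = {n: assign.count(n) for n in nodes}
        avg = sum(net_ms[(u, n)] + proc_ms[n] * load[n]
                  for u, n in zip(users, assign)) / len(users)
        if avg < best_avg:
            best, best_avg = dict(zip(users, assign)), avg
    return best, best_avg
```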
We configure different TopN values to explore their influence on fault tolerance, system overhead, user-side latency and fairness. The number of users is kept static for simplicity.</p><p>Experimental setup: To model the node churn of volunteer edge nodes, we assume that the number of nodes joining the system in each 30-second period follows a Poisson distribution (k = 4 edge nodes). Arriving nodes are randomly assigned a timestamp (in seconds) within each 30-second period, and the lifetime of edge nodes is modeled using a Weibull distribution (average lifetime = 50 seconds). We randomly select one configuration from multiple runs of this process, which results in a total of 18 edge nodes over a 3-minute timeline, as shown in Figure <ref type="figure">8</ref> (grey stair line). We then randomly match the 18 simulated edge nodes with 18 AWS EC2 instances (8 &#215; t2.medium, 8 &#215; t2.xlarge, 2 &#215; t2.2xlarge), which are configured in the same way as in Section V-D1, with 10 static users (10 &#215; t2.micro).</p><p>Performance trace: Figure <ref type="figure">8</ref> shows the average performance of the 10 static users (TopN = 3) at each timestamp, along with the number of alive nodes over time based on our node churn model. It shows the correlation between average performance and edge resource availability. Whenever new edge nodes join the system (upward steps), the average latency correspondingly decreases within seconds. This implies effective dynamic load balancing, since users can immediately discover and switch to newly joined edge nodes, benefiting from the periodic performance probing approach. 
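The churn model above (Poisson arrivals per 30-second period, Weibull lifetimes with a 50-second mean) can be sketched as follows. The Weibull shape parameter and the Knuth sampler are assumptions for illustration; the paper does not state them.

```python
import math
import random

def _poisson(rng, lam):
    """Knuth's algorithm for sampling a Poisson random variate."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_churn(periods=6, period_s=30, arrival_rate=4,
                   weibull_shape=1.5, mean_life_s=50, seed=42):
    """Node churn model sketch. Returns a list of (join_time, leave_time)
    pairs over a periods * period_s second timeline (3 minutes by default).

    weibull_shape is an assumed parameter; the scale is chosen so the mean
    lifetime equals mean_life_s: mean = scale * Gamma(1 + 1/shape).
    """
    rng = random.Random(seed)
    scale = mean_life_s / math.gamma(1 + 1 / weibull_shape)
    events = []
    for p in range(periods):
        for _ in range(_poisson(rng, arrival_rate)):      # arrivals this period
            t_join = p * period_s + rng.uniform(0, period_s)  # random second in period
            lifetime = rng.weibullvariate(scale, weibull_shape)
            events.append((t_join, t_join + lifetime))
    return events
```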
When edge nodes leave the system (downward steps), the average latency does increase, but there is no service disruption, since a second-best backup edge node can immediately take over the workload without any service downtime for users.</p><p>Effect of TopN: Based on the same node churn model, we run the experiment multiple times with different TopN values to explore their influence on the following factors:</p><p>System overhead: The system overhead of probing is primarily due to the number of test workload runs on the edge nodes. Figure <ref type="figure">9</ref> (a) shows that the number of probing requests increases linearly with the TopN value; however, the increase in the number of test workload invocations is much smaller (Figure <ref type="figure">9</ref> (b)). This is because the test workload is invoked only upon the 3 scenarios corresponding to node state changes, while probing requests merely access the cache to fetch historical data.</p><p>Overall performance: Figure <ref type="figure">9</ref> (c) shows that the overall end-to-end latencies for different TopN values are fairly close, with TopN = 3 slightly better than the others. They have similar performance since they all benefit from the periodic performance probing mechanism. Even TopN = 1 can obtain a fresh edge node every period and compare its performance with the current node to decide whether a better-performing option is available. A higher TopN enhances this benefit by probing more options to locate the best choice. However, there is a point of diminishing returns, as overhead begins to outpace any benefit beyond TopN = 3. In our experiments, TopN = 3 has worked well.</p><p>Fairness: Figure <ref type="figure">9</ref> (d) shows the standard deviation of the average performance. Higher values indicate higher performance variation and lower fairness across users. 
While all 5 TopN values deliver similar latency performance, lower TopN values lead to weaker system fairness due to fewer probing options during the selection process.</p><p>Fault tolerance: Figure <ref type="figure">10</ref> (a) shows the latency difference between reactive connection (re-connect) and proactive connection (our approach) upon node failure. We regard the re-connect situation as a failure since it leads to an unacceptable delay gap for latency-critical applications; in our approach, it only happens when all backup edge nodes fail simultaneously.</p><p>Figure <ref type="figure">10</ref> (b) shows the number of failures experienced by all users under different TopN values. TopN = 1 corresponds to 0 backup edge nodes on the user side, which matches common user-to-edge policies. We can see that TopN = 2 can dramatically reduce the number of failures with just 1 backup edge node under our node churn model. Starting at TopN = 3, the number of failures is reduced to 0.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSION</head><p>In this paper, we gave a detailed discussion of volunteer resource characteristics in the edge computing landscape, and proposed the use of geo-distributed heterogeneous edge-dense environments. We used this new edge resource model to formulate a latency optimization problem. We then presented a distributed edge selection solution that leverages client-centric views to identify environment heterogeneity and optimize the global average end-to-end latency. We validated the effectiveness of the proposed approach in both real-world and emulation environments, and showed that it can deliver near-optimal performance in a dynamic environment.</p></div></body>
		</text>
</TEI>
