<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Dynamically Creating Custom SDN High-Speed Network Paths for Big Data Science Flows</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2017 July</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10039957</idno>
					<idno type="doi">10.1145/3093338.3104155</idno>
					<title level='j'>Practice &amp; Experience in Advanced Research Computing Conference (PEARC 2017)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Sergio Rivera</author><author>Mami Hayashida</author><author>James Griffioen</author><author>Zongming Fei</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Existing campus network infrastructure is not designed to effectively handle the transmission of big data sets. Performance degradation in these networks is often caused by middleboxes -- appliances that enforce campus-wide policies by deeply inspecting all traffic going through the network (including big data transmissions). We are developing a Software-Defined Networking (SDN) solution for our campus network that grants privilege to science flows by dynamically calculating routes that bypass certain middleboxes to avoid the bottlenecks they create. Using the global network information provided by an SDN controller, we are developing graph database approaches to compute custom paths that not only bypass middleboxes to achieve certain requirements (e.g., latency, bandwidth, hop-count) but also insert rules that modify packets hop-by-hop to create the illusion of standard routing/forwarding despite the fact that packets are being rerouted. In some cases, additional functionality needs to be added to the path using network function virtualization (NFV) techniques (e.g., NAT). To ensure that path computations are run on an up-to-date snapshot of the topology, we introduce a versioning mechanism that allows for lazy topology updates that occur only when "important" network changes take place and are requested by big data flows.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">BIG DATA TRANSMISSION ISSUES</head><p>Methods to process, manipulate, and transform large data sets are evolving at a remarkable rate. However, university researchers often encounter frustratingly slow transmission speeds when trying to move these data sets from one device to another (e.g., when sharing them with a collaborator). This performance degradation is often caused by a combination of performance-limiting middleboxes (appliances that enforce policies on all traffic going through the network) and competing non-research traffic (e.g., streaming Netflix) that consumes the university's limited network resources.</p><p>To address this problem, campuses have adopted Science DMZs [1] (Figure <ref type="figure">1</ref>) as the mechanism to avoid performance-limiting middleboxes. In this approach, machines that need high-throughput communications are placed on a separate network (i.e., a Science DMZ) hanging off the router at the junction between the campus network and the Internet. While these privileged machines can send/receive traffic without any performance bottlenecks, they give up the security, protection, and policing offered by middleboxes. We are developing an all-campus Science DMZ solution based on Software-Defined Networking (SDN) <ref type="bibr">[4]</ref> called VIP Lanes <ref type="bibr">[2,</ref><ref type="bibr">6]</ref> that allows researchers to boost transmission speeds for privileged scientific flows while retaining middlebox security and protection for all other flows.</p><p>In VIP Lanes, our control software computes paths that not only bypass middleboxes to meet certain requirements, e.g., the fastest path (latency constraint), the widest path (bandwidth constraint), or the shortest path (hop-count constraint), but also modify packets as needed (e.g., changing MAC addresses or VLAN IDs) to create the illusion of standard routing/forwarding despite the fact that packets are being rerouted. 
In some cases, additional functionality needs to be added to the path using network function virtualization (NFV) techniques such as network address translation (NAT). The resulting packet transformations (i.e., OpenFlow <ref type="bibr">[5]</ref> rules) are then written to the campus SDN routers/switches to reroute pre-approved flows through bottleneck-free paths.</p><p>Rather than implementing the underlying path computation algorithms in imperative programming languages, and possibly introducing bugs, we calculate paths via natural declarative queries in Neo4j (see footnote 1), a production-ready graph database we use to store versioned fine-grained topology information (e.g., link capacities, VLANs, and latency) and VIP Lanes flow-related data. We developed an effective topology versioning mechanism that keeps the topology view of the network in Neo4j in sync with the controller's view. In that way, every path request is guaranteed to be feasible, valid, and dynamically calculated using the most up-to-date snapshot of the network before it is translated into low-level OpenFlow rules installed at every switch along the computed route.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">GRAPH DB PATH CALCULATION</head><p>Figure <ref type="figure">2</ref> illustrates the workflow and events involved while requesting a high-throughput VIP Lanes path for a science data transfer. A request can be specified via (1) a web interface or (2) a wrapper library. In the former, users have to manually provide information about the communication that is about to happen, including details like source/destination IP addresses, destination port, and flow duration. In the latter, all the information about the VIP Lanes is determined by a wrapper library that is dynamically loaded along with existing (legacy) applications; the VIP Lanes information is computed when the socket is created and the connection is being established, triggering a request to the path computation service to set up the VIP Lanes. In either case, the request needs to be approved by our permission system (described in <ref type="bibr">[2]</ref> and depicted by the Auth N/Z Service in Figure <ref type="figure">2</ref>).</p><p>After being granted permission, the path computation service attempts to find a bottleneck-free path with the provided information. If the resulting path is feasible, a versioned install request is sent to the controller and further transformed into OpenFlow rules. Otherwise, an error is reported back to the user/application.</p><p>While the path computation could be written in a conventional imperative programming language by calling existing graph libraries, computing the custom paths needed by VIP Lanes would require tailoring the path computation algorithms included in these libraries to handle networks made up of heterogeneous elements, which is an error-prone and time-consuming task for a network programmer/operator. Instead, we opted to leverage the built-in capabilities of the Neo4j graph database to perform the path computation and topology data maintenance within the database. 
The main reasons for our decision include: (1) the Cypher graph DB language, the declarative query language for Neo4j<note place="foot" n="1">The Neo4j&#174; Graph Database can be found at <ref type="url">https://neo4j.com</ref>.</note>, which simplifies the maintenance of topology data and provides an intuitive syntax to construct our own constrained queries, including path computations; (2) a direct mapping from a network topology (devices, links) into the same representation in the database using nodes and edges, plus the ability to manipulate sets of labels assigned to the stored elements, which allows us to represent more complex network abstractions like active flows, IP Pools, topology snapshots, and virtual network functions (e.g., NAT); (3) the ability to store heterogeneous collections of data as properties of elements in the network such as DPIDs for switches, MAC addresses for hosts, and bandwidth capacity for links; and (4) an intuitive GUI (Figure <ref type="figure">3</ref>) that allows network operators to view current (and past) topology information, and to ask simple questions that are otherwise tedious to implement in imperative programming languages (e.g., "what active flows go through switch X and avoid middleboxes of type T?").</p><p>Currently, VIP Lanes is capable of calculating three types of paths: the fastest, the widest, and the shortest. The fastest path query chooses the route with the smallest sum of latencies over all links on the path; the widest path query chooses the route whose minimum-bandwidth (bottleneck) link has the greatest bandwidth; the shortest path query simply chooses the path with the fewest hops. While Neo4j provides a built-in function for the shortest path, we wrote queries for the fastest and widest paths with relatively simple and straightforward phrases using Cypher. By default, all three types of paths are middlebox-free. 
However, it is possible to compute paths that avoid only certain types of middleboxes, e.g., to ensure big data traffic goes through a non-bottleneck monitoring appliance. Relevant middleboxes and non-SDN devices present in the network (red nodes in Figure <ref type="figure">3</ref>) are identified in the database primarily through JSON-encoded configuration files that contain descriptions of the interfaces present at every middlebox (e.g., MAC and IP addresses, or neighbors). Typically, some of these middleboxes are discovered by the controller as hosts; consequently, the path computation service we developed uses the information <fw type="header">Dynamically Creating Custom SDN High-Speed Paths for Big Data Science Flows PEARC17, July 09-13, 2017, New Orleans, LA, USA</fw></p><p>When a path query is run in Neo4j, it returns not only the nodes and edges along the computed path, but also a selection of label and property data for each node and edge. This information is vital to the successful construction of custom paths since it describes how each OpenFlow element must behave in the selected path; thus, it allows us to realize a certain degree of NFV in our SDN network. The control software we developed parses this relevant data and maps it into OpenFlow rules, which are then installed at every switch along the path by the Aruba VAN SDN controller <ref type="bibr">[3]</ref>. These rules ultimately dictate the behavior of every individual switch for every approved big data science flow. Consequently, it is not unusual to have "multi-function" switches (like the sdn node in Figure <ref type="figure">3</ref>) that operate differently based on the location of the hosts in the computed path. For instance, for on-campus transmissions (e.g., "a transfer from the Computer Science department to the Physics department") a switch could behave as an L2/L3 switch that rewrites MAC addresses or VLAN tags in the header of every packet in a flow. 
Additionally, that same switch could simultaneously serve as a stateless Network Address Translator (NAT) for big data transfers going off-campus (e.g., "sending data to a national lab"). In this particular case, the flexible nature of graph databases allows us to store not only the de facto NAT table but also the set of public IP addresses (and their availability), which we use to assign addresses and produce OpenFlow rules that rewrite the source and destination IP addresses of packet headers for outbound (i.e., from the campus) and inbound (i.e., to the campus) big data flows going through the high-speed SDN network.</p></div>
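<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the path types described in this section, the following Python sketch computes a fastest (minimum total latency) and a widest (maximum bottleneck bandwidth) path on a small, made-up topology while skipping middlebox nodes. It is not the Cypher query used in VIP Lanes: it swaps in a plain Dijkstra-style search, and the topology, the node names, and the functions build_graph, fastest_path, and widest_path are all hypothetical.</p>
<ab><![CDATA[
```python
import heapq

# Hypothetical toy topology: (node_a, node_b, latency_ms, bandwidth_gbps).
LINKS = [
    ("hostA", "sw1", 1, 100),
    ("sw1", "fw", 1, 1),      # firewall detour: low latency but narrow
    ("fw", "sw3", 1, 1),
    ("sw1", "sw2", 2, 40),
    ("sw2", "sw3", 2, 40),
    ("sw1", "sw3", 3, 10),
    ("sw3", "hostB", 1, 100),
]
MIDDLEBOXES = {"fw"}  # assumption: middleboxes are known by node name


def build_graph(links, avoid=frozenset()):
    """Undirected adjacency map that leaves out every node in `avoid`."""
    graph = {}
    for a, b, latency, bandwidth in links:
        if a in avoid or b in avoid:
            continue
        graph.setdefault(a, []).append((b, latency, bandwidth))
        graph.setdefault(b, []).append((a, latency, bandwidth))
    return graph


def fastest_path(graph, src, dst):
    """Dijkstra: minimise the sum of link latencies along the path."""
    queue = [(0, src, [src])]
    done = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nxt, latency, _bandwidth in graph.get(node, []):
            if nxt not in done:
                heapq.heappush(queue, (cost + latency, nxt, path + [nxt]))
    return None  # no feasible path


def widest_path(graph, src, dst):
    """Dijkstra variant: maximise the minimum link bandwidth on the path."""
    # Widths are stored negated so that heapq pops the widest candidate first.
    queue = [(-float("inf"), src, [src])]
    done = set()
    while queue:
        neg_width, node, path = heapq.heappop(queue)
        if node == dst:
            return -neg_width, path
        if node in done:
            continue
        done.add(node)
        for nxt, _latency, bandwidth in graph.get(node, []):
            if nxt not in done:
                width = min(-neg_width, bandwidth)
                heapq.heappush(queue, (-width, nxt, path + [nxt]))
    return None  # no feasible path


graph = build_graph(LINKS, avoid=MIDDLEBOXES)
print(fastest_path(graph, "hostA", "hostB"))
print(widest_path(graph, "hostA", "hostB"))
```
]]></ab>
<p>On this toy topology, the fastest middlebox-free route uses the direct sw1-sw3 link (total latency 5), whereas the widest route detours through sw2 (bottleneck bandwidth 40). If the firewall is not excluded, the fastest route passes through it, which is the situation the middlebox-free default is meant to prevent.</p></div>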
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">TOPOLOGY VERSIONING</head><p>For a big data science flow to obtain high throughput, the path computation must have accurate topology data. The challenge, of course, is to determine how often to update the topology data without compromising efficiency. In theory, the topology stored in the database, T_db, should always match the actual topology T_c (known by the controller) at any given time. In practice, however, keeping the two continuously in sync would add unnecessary overhead: if no science flows are requested for a period of time, it is wasteful to continuously update T_db. The opposite approach is to check whether T_db = T_c before each path query and, if the condition is not met, update T_db before computing the path. While this eliminates unnecessary topology updates, it adds a user-noticeable delay to the path calculation process as the topology grows. To tackle this problem, we implemented a topology versioning mechanism. Figure <ref type="figure">4</ref> illustrates how the mechanism operates when events take place at different levels of the architecture.</p><p>When the controller boots up or a new version of the topology module is deployed (light-gray box), T_c is defined to be the topology learned by the controller. We also associate a random 64-bit number v_c with T_c, indicating the version of the topology stored in T_c, and define a flag T_c_req that indicates whether the topology has been requested by the path computation service. 
Later, when a topology event (i.e., a change in topology) is detected by the controller (yellow elements), the version number v_c is increased by 1 if and only if (1) the event is not associated with a host joining or leaving the topology, and (2) the path computation service has requested a newer version of the topology. Host events are excluded because they happen continually: the controller assumes a host has left when no packets have been seen from it for some amount of time, but adds the host back to the topology as soon as a packet from it is seen. The second condition avoids the situation where T_db is constantly being updated even though the path computation service does not currently need the latest version.</p></div>
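<div xmlns="http://www.tei-c.org/ns/1.0"><p>The version-bump rule above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the controller module itself: the class name TopologyVersion, the event names host_join and host_leave, and the wrap-around at 64 bits are hypothetical choices, and the sketch deliberately leaves open who sets and clears the request flag.</p>
<ab><![CDATA[
```python
import secrets


class TopologyVersion:
    """Hypothetical controller-side state: version v_c plus the request flag."""

    def __init__(self):
        self.v_c = secrets.randbits(64)  # random 64-bit starting version
        self.requested = False           # plays the role of the T_c_req flag

    def on_topology_event(self, kind):
        """Bump v_c only for non-host events while a newer topology is wanted."""
        is_host_event = kind in ("host_join", "host_leave")
        if not is_host_event and self.requested:
            # Assumption: the counter wraps around at 64 bits.
            self.v_c = (self.v_c + 1) % 2**64
        return self.v_c
```
]]></ab>
<p>With the flag cleared, a link failure leaves v_c untouched; once the flag is set, the same event increases v_c by one, while host join/leave events never change it.</p></div>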
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 4: Topology versioning mechanism flow chart</head><p>When the database is initialized, T_db is set to the current version of T_c, and the database version number is set to match the controller's version number (i.e., v_db = v_c). Later, when a user requests a high-speed VIP Lanes path via the path computation service (white elements), the service calculates the VIP Lanes path using T_db. It then sends the computed path along with v_db to the controller to install the SDN path. When the controller receives the request, it first checks whether v_db is equal to v_c (i.e., the topologies are in sync). If so, OpenFlow rules are generated with the corresponding actions (e.g., MAC, IP, or VLAN rewrites) at every hop in the path, and a success message is returned to the path computation service. Otherwise, the controller notes that the path computation service needs a new version of the topology (by setting T_c_req = True) and rejects the current path installation request. As a result, new topology events will cause v_c to be increased (yellow elements), and the response message is built to include the most recent values of v_c and T_c. Once the response gets back to the path computation service, the current T_db and v_db are archived as an old version in the Neo4j database, and a new snapshot is added with the T_c and v_c values provided in the response. After this update, the process starts over and the path calculation is done on a more recent topology snapshot.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">CONCLUSION</head><p>There is a growing need for high-speed big data transfers both east/west across campus and north/south to the cloud. 
Optimizing data transfer across an SDN-enabled campus using VIP Lanes requires an intelligent path computation service that understands the details of the campus network topology, including the location of performance-limiting middleboxes that should be avoided. In this paper, we introduced a novel path computation service that leverages the features of the Neo4j graph database not only to efficiently calculate constrained middlebox-free paths such as the shortest, the fastest, and the widest paths, but also to determine custom behaviors of individual switches on a per-hop and per-flow basis. We described how our system can implement network function virtualization capabilities such as NAT via OpenFlow rules, and introduced a topology versioning mechanism that guarantees the paths (and the generated OpenFlow rules) are calculated with an accurate snapshot of the current network topology.</p></div>
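<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers who want to experiment with the version handshake of Section 3 (Figure 4), the following Python sketch models both sides in memory. It is a simplified illustration rather than the VIP Lanes implementation: the classes Controller and PathService, the function usable_links, and the retry limit are hypothetical, no OpenFlow rules or Neo4j queries are involved, and we assume that handing a topology snapshot to the path computation service (at database initialization or on a rejected install) is what sets the request flag, since the text does not state the flag's initial value.</p>
<ab><![CDATA[
```python
import secrets


class Controller:
    """Toy controller holding T_c, v_c and the T_c_req flag."""

    def __init__(self, links):
        self.links = dict(links)             # T_c: link -> "up" / "down"
        self.version = secrets.randbits(64)  # v_c
        self.requested = False               # T_c_req

    def snapshot(self):
        # Assumption: handing out the topology marks it as requested.
        self.requested = True
        return dict(self.links), self.version

    def topology_event(self, link, state, host_event=False):
        self.links[link] = state
        if not host_event and self.requested:
            self.version = (self.version + 1) % 2**64

    def install(self, path, v_db):
        if v_db == self.version:
            return True, None          # in sync: rules would be installed here
        return False, self.snapshot()  # stale: reject and return T_c, v_c


class PathService:
    """Toy path computation service holding T_db, v_db and old snapshots."""

    def __init__(self, controller):
        self.controller = controller
        self.links, self.version = controller.snapshot()  # T_db, v_db
        self.archive = []                                  # archived versions

    def request_path(self, compute, max_attempts=3):
        for _ in range(max_attempts):
            path = compute(self.links)
            ok, fresh = self.controller.install(path, self.version)
            if ok:
                return path
            # Archive the stale snapshot, adopt the fresh one, and retry.
            self.archive.append((self.version, self.links))
            self.links, self.version = fresh
        return None


def usable_links(links):
    """Stand-in for a real path query: list the links that are up."""
    return sorted(link for link, state in links.items() if state == "up")


ctrl = Controller({("sw1", "sw2"): "up"})
svc = PathService(ctrl)
print(svc.request_path(usable_links))       # versions match, accepted at once
ctrl.topology_event(("sw1", "sw2"), "down")
ctrl.topology_event(("sw1", "sw3"), "up")
print(svc.request_path(usable_links))       # first try rejected, retry accepted
```
]]></ab>
<p>In this sketch, the second request is first computed on the stale snapshot and rejected; the service then archives the old snapshot, adopts the one returned by the controller, and recomputes, so the installed path only uses the link that is actually up.</p></div>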
		</body>
		</text>
</TEI>
