<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>A Practical Approach for Dynamic Taint Tracking with Control-flow Relationships</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>04/30/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10316631</idno>
					<idno type="doi">10.1145/3485464</idno>
					<title level='j'>ACM Transactions on Software Engineering and Methodology</title>
<idno>1049-331X</idno>
<biblScope unit="volume">31</biblScope>
<biblScope unit="issue">2</biblScope>					

					<author>Katherine Hough</author><author>Jonathan Bell</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Dynamic taint tracking, a technique that traces relationships between values as a program executes, has been used to support a variety of software engineering tasks. Some taint tracking systems only consider data flows and ignore control flows. As a result, relationships between some values are not reflected by the analysis. Many applications of taint tracking either benefit from or rely on these relationships being traced, but past works have found that tracking control flows resulted in over-tainting, dramatically reducing the precision of the taint tracking system. In this article, we introduce              Conflux              , alternative semantics for propagating taint tags along control flows.              Conflux              aims to reduce over-tainting by decreasing the scope of control flows and providing a heuristic for reducing loop-related over-tainting. We created a Java implementation of              Conflux              and performed a case study exploring the effect of              Conflux              on a concrete application of taint tracking, automated debugging. In addition to this case study, we evaluated              Conflux              ’s accuracy using a novel benchmark consisting of popular, real-world programs. We compared              Conflux              against existing taint propagation policies, including a state-of-the-art approach for reducing control-flow-related over-tainting, finding that              Conflux              had the highest F1 score on 43 out of the 48 total tests.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Taint tracking is a technique for monitoring the ow of information through a system. Traditionally, it has been used in privacy analyses to prevent condential data from leaking into a program's public outputs and in security analyses to detect the ow of untrusted values into sensitive program locations <ref type="bibr">[60,</ref><ref type="bibr">62]</ref>. In recent years, it has also been applied to other software development tasks, for instance, assisting automated input generation systems (fuzzers) <ref type="bibr">[59]</ref>, helping to identify poorlydesigned software tests <ref type="bibr">[36]</ref>, providing debugging guidance <ref type="bibr">[7,</ref><ref type="bibr">19]</ref>, and creating performance models for congurable systems <ref type="bibr">[69]</ref>.</p><p>Dynamic taint tracking associates labels (also referred to as taint tags) with program data and propagates these labels through the system during the execution of a program. The set of rules dening how taint tags propagate when an operation executes forms the tainting tracking system's propagation policy. This policy eectively describes what it means for information to "ow" through a program. Most taint tracking systems focus on tracing the ow of data through assignment, arithmetic, and logical operations which directly pass information from their operands to their result. This direct passage of information is referred to as an explicit or data ow <ref type="bibr">[60]</ref>. In a data ow, the value of the ow's target, the operation's result, is derived from the value of ow's source, the operands of the operation. For instance, in line three of Listing 1 there is a data ow from x and y to z. Thus, the labels of x and y should propagate to z.</p><p>1 int x = taint(4, X ); 2 int y = taint(8, Y ); 3 int z = x + y; 4 int q = 0; 5 if (y == 8) { 6 q = 1; 7 } Listing 1. A basic taint tracking example.</p><p>However, tracking only these explicit ows can provide an incomplete picture of the ow of information through the system. For example, one might expect q to be tainted after the code in Listing 1 executes, since it is clear that q's value reveals y's value. This indirect passage of information between values can occur as a result of conditional branches, array operations, and pointer dereferencing and is referred to as an implicit ow <ref type="bibr">[45]</ref>. Unlike a data ow, the value of an implicit ow's target is not related to the value of its source through some computation. Instead, the value of the implicit ow's source is used to "select" the value of the implicit ow's target. For example, if a tainted index is used to access an element of an array, then the retrieved element's value is not directly derived from the value of the tainted index. However, the retrieved element is selected from the collection of elements in the array as a result of the value of the tainted index. The lack of direct computational relationship between the target of an implicit ow and its source can mean that little to no information is passed along the ow. For example, consider a situation where a tainted index is used to access an element of an array containing the same item at every position. If the tainted index's value is guaranteed to be within the bounds of the array, then the value of the accessed element is not inuenced by the value of the tainted index. In this case, propagating labels from the tainted index to the accessed element falsely conveys a relationship between the two values. This is typically referred to as over-tainting.</p><p>Implicit ows resulting from conditional branches are specically referred to as control ows since information is passed via the control structure of the program. The source of a control ow is a tainted branch condition that guards the execution of an assignment statement. The branch condition's value impacts whether the assignment statement executes and therefore selects whether a new value is assigned to the statement's destination storage location. Like other implicit ows, not propagating along control ows can result in critical data relationships being lost, which is referred to as under-tainting. For instance, when looking up the value associated with a tainted key in an associative-array data structure, there is likely a control ow, but not a data ow between the key and the value. Thus, many existing taint tracking systems support the propagation of taint tags through control ows <ref type="bibr">[10]</ref><ref type="bibr">[11]</ref><ref type="bibr">[12]</ref><ref type="bibr">18]</ref>.</p><p>The standard semantics for propagating control ows propagates the taint tag of a branch's predicate to every value written by an assignment statement whose execution is controlled by that branch. However, prior works have found that using the standard control ow propagation semantics resulted in severe over-tainting making it impractical for their applications <ref type="bibr">[7,</ref><ref type="bibr">8,</ref><ref type="bibr">19,</ref><ref type="bibr">66]</ref>.</p><p>For example, in their tool for debugging conguration errors, Attariyan and Flinn <ref type="bibr">[7]</ref> reported that "a strict denition of causal dependencies [control ows] led to our tool outputting almost all conguration values as the root cause of the problem." Clause and Orso <ref type="bibr">[19]</ref> discovered that using control ow tracking in Penumbra, a tool for identifying inputs that are relevant to a failure, resulted in larger failure-relevant input sets and in the case of one application resulted in almost all of the program's roughly 15 million inputs being marked as failure-relevant. We performed a case study of Penumbra (detailed in Section 6.5) and observed a similar result: propagating along control ows resulted in impractically large failure-relevant input sets. We also found that this over-tainting could be reduced by using alternative control ow propagation semantics. However, due to the over-tainting that occurs when using the standard control ow propagation semantics, existing software engineering tools that use taint analysis typically ignore control ows, favoring precision over recall. This over-tainting is not caused by a bug in the taint tracking system, but by a mismatch between the standard control ow propagation semantics and the expectations of downstream analyses. In particular, the standard control ow propagation semantics tend to overreport relationships between values. Downstream analyses expect information ows to be indicative of strong, causal relationships between values. If a single, specic condition results in a particular value being assigned to a particular location, then there is a strong relationship between that condition and that assignment. However, if that same assignment can be triggered by many dierent conditions, then the relationship between those conditions and that assignment is weaker.</p><p>Prior work has considered renements to the standard control ow propagation policy to address control-ow-related over-tainting. For instance, Bao et al. <ref type="bibr">[8]</ref> proposed a renement to control ow tracking that only considered control ows resulting from strict equality checks rather than control ows resulting from all comparison operators. Kang et al. <ref type="bibr">[41]</ref> used symbolic execution to identify and propagate along control ow paths that can only be reached by a single input value. Approaches like these, which reduce over-tainting by considering only a subset of control ows, cannot fully address under-tainting without also causing over-tainting. That is, even if it were possible to determine and propagate along the optimal, minimal subset of control ows necessary to prevent under-tainting, over-tainting could still occur.</p><p>What constitutes over-tainting is ill-dened; the types of relationships that need to be tracked varies between applications. Generally, within the context of an application of taint tracking, if a label assigned to a piece of data conveys a relationship between that data and the source of the label that is not useful in that application, then that data is said to be over-tainted. For example, a privacy analysis expects data labeled as condential to contain enough information from the original private source that if that data were leaked publicly, it would violate some expectation of secrecy or condentiality. Although this is still ill-dened, it has likely motivated prior work on reducing control-ow-related over-tainting to consider the root cause of the over-tainting to be the amount of information that is transferred across a control ow. However, not all low-information ows result in over-tainting. Consider the data ow from x to y in the statement y = x % 2. This ow transfers very little information about the value of x to y. Half of all possible values for x map to 0 and the other half to 1; so it is clearly not a one-to-one mapping. Regardless, the value of y is what it is because of the value of x, and taint tracking tools are generally expected to report such a ow. Propagating along these sorts of low information data ows does not seem to cause the same over-tainting issues as propagating along control ows. It is our position that control-ow-related over-tainting stems from a mismatch between the nature of control and data ows.</p><p>In particular, taint tags propagated at runtime along data ows only contain information about what has actually happened during a particular execution. Code that has not executed does not impact data ows; they are determined by what has happened and not what could have happened. Dynamic taint tracking only provides insights into observed executions; unlike a static taint analysis it cannot prove things. This is often presented as a disadvantage of dynamic taint tracking over static taint tracking. However, many software engineering tools rely upon this behavior. For example, OraclePolish <ref type="bibr">[36]</ref> used dynamic tainting tracking to evaluate the quality of a test suite and was therefore only interested in code that was actually executed by the test suite. Additionally, ConfAid <ref type="bibr">[7]</ref>, a system for identifying the root cause of conguration errors, used dynamic taint tracking because the system sought to identify the cause of the specic failure that actually occurred.</p><p>In contrast to data ows, control ows contain information about execution paths that did not happen; they are impacted by code that did not execute. A control ow is produced by a conditional branch splitting the ow of control into two or more paths. Some statements execute on only some and not all of those paths. These statements are therefore considered to be dependent on the branch's outcome. Thus, control ows are inherently concerned with execution paths that were not taken.</p><p>Recent applications of dynamic tainting tracking that need or benet from precise, ne-grained tainting tracking such as OraclePolish <ref type="bibr">[36]</ref>, VUzzer <ref type="bibr">[59]</ref>, ConfAid <ref type="bibr">[7]</ref>, and Rivulet <ref type="bibr">[35]</ref> underscore the potential benets of bridging the gap between the existing semantics for data and control ow tracking. To that end, this paper makes the following contributions:</p><p>&#8226; Alternative control ow scope semantics for reducing the amount of over-tainting &#8226; A heuristic that considers both dynamic and static information to reduce control-ow-related over-tainting &#8226; A benchmark for evaluating a control ow propagation policies' ability to precisely capture control ows in real-world Java programs</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">BACKGROUND AND MOTIVATION</head><p>Prior works on taint tracking, information ow control, slicing, and other related topics have used a variety of terms to describe the same or similar concepts to ones discussed in this paper. Thus, for the sake of clarity, the terminology used in this work is dened below:</p><p>Data ow. A data ow (also known as an explicit ow) occurs due to an assignment, arithmetic, or logical operation that directly passes information from its operands to its result <ref type="bibr">[12,</ref><ref type="bibr">18,</ref><ref type="bibr">27,</ref><ref type="bibr">45,</ref><ref type="bibr">60]</ref>. In a data ow, the value of the ow's target (the operation's result) is derived from the value of ow's source (the operands of the operation). Implicit ow. An implicit ow is the indirect passage of information between values typically as a result of conditional branches, array operations, or pointer dereferencing <ref type="bibr">[45]</ref>. In an implicit ow, the value of the ow's source is used to "select" the value of the ow's target. This denition of implicit ows is broader than the one used by Chandra and Franz <ref type="bibr">[12]</ref>, Sabelfeld and Myers <ref type="bibr">[60]</ref>, and Enck et al. <ref type="bibr">[27]</ref> which includes only the indirect passage of information as a result of control structures. Control ow. A control ow is an implicit ow resulting from a conditional branch <ref type="bibr">[18,</ref><ref type="bibr">27]</ref>. Scope of inuence of a branch. The scope of inuence of the execution of a conditional branching statement is the set of statements that execute after that execution of the conditional branching statement but before the next execution of the rst statement in the immediate post-dominator of the basic block containing the branching statement. This denition is equivalent to the range of inuence of a branch used by Weiser <ref type="bibr">[71]</ref> and Denning and Denning <ref type="bibr">[26]</ref>.</p><p>Control ow scope. The scope of a control ow is the dynamic set of instruction executions during which any taint tags associated with the source of the ow propagate to written values.</p><p>Standard control ow scope. We use the term standard control ow scope to refer to the typical denition for the scope of a control ow which is dened with respect to the postdominance relation. In particular, the standard scope of a control ow introduced by some conditional branch is dened as the set of instructions that execute after the ow of control splits at the branch but before the ow of control rejoins at the immediate post-dominator of the basic block containing that branch <ref type="bibr">[26]</ref>.</p><p>Over-tainting. Over-tainting is when a taint tag assigned to a value falsely conveys a relationship between that value and the source of the taint tag. While conceptually, overtainting can be caused by an imprecision in the underlying program analysis, this paper focuses on over-tainting caused by a mismatch between a taint tracking system's propagation rules and the expectations of analyses built on top of that taint tracking system. This type of over-tainting behavior was described by Clause and Orso <ref type="bibr">[19]</ref>, Staicu et al. <ref type="bibr">[66]</ref>, and Attariyan and Flinn <ref type="bibr">[7]</ref>.</p><p>Under-tainting. Under-tainting is when a value has not been assigned a particular taint tag falsely conveying a lack of relationship between the value and the source of the taint tag. Propagation policy. A tainting tracking system's propagation policy is the set of rules dening how taint tags should propagate when an operation executes.</p><p>The typical approach to control ow tracking considers there to be a control ow from the predicate of a conditional branch to any values written within the control ow's "scope". Many dynamic taint analysis systems track these scopes using a stack <ref type="bibr">[10]</ref><ref type="bibr">[11]</ref><ref type="bibr">[12]</ref><ref type="bibr">18]</ref>. The taint tag of a branch's predicate is pushed onto this stack at the start of a control ow's scope and popped at the end of its scope. Traditionally, this scope is dened with respect to the post-dominance relation. Specically, the standard denition used for the scope of a control ow introduced by some conditional branch is dened as the set of instructions that execute after the ow of control splits at that branch but before the ow of control rejoins at the immediate post-dominator of the basic block containing that branch <ref type="bibr">[26]</ref>.</p><p>Bao et al. <ref type="bibr">[8]</ref> and Kang et al. <ref type="bibr">[41]</ref> propose propagating taint tags along a subset of control ows based on the "syntax of [the] comparison expression" using the standard, post-dominator-based denition for control ows' scopes. Additionally, Kang et al. <ref type="bibr">[41]</ref> attempt to identify "culprit" ows, control ows along which taint tags need be propagated in order to avoid under-tainting. However, even if propagation occurs only along the optimal, minimal subset of control ows necessary to prevent under-tainting (i.e., culprit ows), the standard control ow scope denition can cause over-tainting. Consider the code in Figure <ref type="figure">1a</ref>. If taint tags are not propagated along control ows, under-tainting can occur because the relationship between a plus sign in the input and a space in the output is missed. The minimal subset of control ows needed to correct this under-tainting contains only the ow introduced by the branch from the switch statement's case on line 8. Figure <ref type="figure">1d</ref> shows the labels expected to propagate to the output produced by spaceDecode when presented with an input array with each of its characters tainted with its position in the array (as shown in Figure <ref type="figure">1c</ref>). However, the control ow graph for spaceDecode (depicted in Figure <ref type="figure">1b</ref>) shows that the immediate post-dominator of the basic block that contains switch(input[i]) is the exit node. Thus, once the branch associated with the case on line 8 is traversed, the label for the predicate of that branch will be pushed onto the taint stack and impact all subsequent instructions until the method is exited. This causes the output of spaceDecode to be over-tainted as described in Figure <ref type="figure">1d</ref>. (b) Control flow graph for the spaceDecode method (Figure <ref type="figure">1a</ref>). (c) Sample tainted input for the spaceDecode method (Figure <ref type="figure">1a</ref>). Each input character is tainted with its position in the input array (e.g., the first input character, "H", is tainted with the tag "0" ). (d) Expected tainted output from the spaceDecode method (Figure <ref type="figure">1a</ref>) when presented with the sample input from Figure <ref type="figure">1c</ref> compared with the actual tainted output when propagating along the minimal subset of control flows necessary to prevent under-tainting using the standard scope definition. Fig. <ref type="figure">1</ref>. An example of a program in which using the standard control flow propagation semantics results in over-tainting. The method spaceDecode in Figure <ref type="figure">1a</ref> takes a sequence of characters that are not spaces. When a plus sign is encountered, a space is added to the output. Every other input character is copied to the output. When spaceDecode is executed with the tainted input displayed in Figure <ref type="figure">1c</ref>, the taint tag of every input plus sign ("+") is expected to flow to the output with the produced space character. Additionally, the taint tag of every other input character is expected to flow to the output with the character. However, as shown in Figure <ref type="figure">1d</ref>, the standard control flow propagation semantics over-taint the output.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Output</head><p>This over-tainting can be xed by reducing the scope of the control ow to include only the basic block that contains the instructions on lines 9 and 10. However, even if control ow propagation occurs only along the minimal subset of control ows necessary to prevent under-tainting with the minimal, basic-block level scopes necessary to prevent under-tainting for each control ow, over-tainting can still occur. Consider the code in Figure <ref type="figure">2a</ref>. If taint tags are not propagated along control ows, under-tainting can occur because the relationship between a percent sign in the input and a decoded character in the output would be missed. The minimal subset of control ows needed to correct this under-tainting contains only the branch on line 5. As shown in the control ow graph for percentDecode depicted in Figure <ref type="figure">2b</ref>, the minimal scope for that control ow includes only the basic block that contains the instructions on lines 6 and 7. Figure <ref type="figure">2d</ref> shows the labels expected to propagate to the output produced by percentDecode when presented with an input array with each of its characters tainted with its position in the array (as shown in Figure <ref type="figure">2c</ref>). When a percent sign is encountered in the input and the branch on line 5 evaluates to true, the label for that input is pushed onto the taint stack. That label then correctly propagates to the element of the result array that is assigned a value on line 6. But, it also incorrectly propagates to the variable size and the looping variable i when their values are incremented on lines 6 and 7. The end of the control ow's scope is then hit and its label is popped from the taint stack. On subsequent iterations of the loop, when i is used to access elements of the input array, i's taint tag propagates to the accessed element. Additionally, when size is used to select where in the output array to store a value, size's taint tag propagates to the stored value. This causes the output of percentDecode to be over-tainted as described in Figure <ref type="figure">2d</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">APPROACH</head><p>Our approach, C, aims to precisely propagate taint tags through information-preserving transformations in programs. C leverages novel heuristics to identify control ows that are likely to correspond to information-preserving transformations and ignore taint tag propagation from others. Like Bao et al. <ref type="bibr">[8]</ref>'s approach, C considers only a subset of control ows for propagation based on the comparative operator of the control ow's branch. Specically, C includes only control ows introduced by equality checks. However, even when considering only control ows introduced by equality checks, signicant over-tainting can still occur. As discussed in Section 2, the standard, post-dominator based denition for the scope of a control ow is overly conservative. Thus, C does not use the standard scope denition and instead introduces the notion of a "binding" scope which includes a subset of the basic blocks contained in the control ows' standard scope. However, this alone is not sucient to produce the expected tainted output in Figure <ref type="figure">2a</ref>, since the loop index i will still become tainted with the taint tag of input[i] on line 5. To address this and similar over-tainting, C introduces a dynamic heuristic, "loop-relative stability", which reasons about the strength of the relationship between a conditional branch and an assignment statement by considering the impact of executing loops on program semantics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Binding Scope</head><p>Overall, C aims to identify conditional branch executions that are not strictly information preserving and avoid propagating along the resulting control ows. To achieve this, C uses an alternative control ow scope denition that distinguishes between statements that can execute only as a result of a single, specic condition and statements that can execute under multiple conditions. The traditional denition for control ows' scopes is highly conservative; it considers every basic block between a branch in the ow of control and where that branch rejoins to be within the scope of the branch. Unlike traditional control ow scopes, which are dened with respect to nodes in the control ow graph, "binding" scopes are dened with respect to edges. In particular, binding scopes are dened with respect to the edges which are traversed when a conditional branching statement is "taken" or "not taken" (or for switch statements the edges associated with the cases of the switch). The binding scope of a branch edge includes instructions (c) Sample tainted input for the percentDecode method (Figure <ref type="figure">2a</ref>). Each input character is tainted with its position in the input array (e.g., the first input character, "%", is tainted with the tag "0" ).</p><p>Output Value</p><p>Expected tainted output from the percentDecode method (Figure <ref type="figure">2a</ref>) when presented with the sample input from Figure <ref type="figure">2c</ref> compared with the actual tainted output when propagating along the minimal subset of control flows with the minimal, basic-block level scopes necessary to prevent under-tainting.</p><p>Fig. <ref type="figure">2</ref>. An example of a program in which using the standard control flow propagation semantics results in over-tainting. The method percentDecode in Figure <ref type="figure">2a</ref> takes a sequence of characters and percent-encoded octets. Each input character that is not part of a percent-encoded octet is copied to the output. Each input percent-encoded octet is decoded into a character and that character is copied to the output. When percentDecode is executed with the tainted input displayed in Figure <ref type="figure">2c</ref>, the taint tag of an input character that is not part of a percent-encoded octet is expected to flow to the output with the character. Additionally, the union of the taint tags of an input percent-encoded octet is expected to flow to the output with the character decoded from the octet. However, as shown in Figure <ref type="figure">2d</ref>, the standard control flow propagation semantics over-taint the output.</p><p>which only execute if the edge is traversed, i.e., an instruction is included if every path from the distinguished entry node of the control ow graph to that instruction contains the branch edge. Since C only considers branch edges corresponding to equality checks (e.g., the branch taken side of an equality check or the branch not taken side of an inequality check), the execution of an instruction within the scope of a branch edge occurs only if some value was equal or "bound" to another value. Binding scopes can be calculated for the branch edges in a method by constructing its control ow graph, &#8999; = [+ , &#8674;], with designated entry and exit nodes denoted by E 4=CA ~2 + and E 4G8C 2 + respectively. Each node in + other than E 4=CA ~and E 4G8C represents a basic block and consists of a sequence of instructions. The successors of a node D 2 + is dened as BD22 (D) = {E | (D, E) 2 &#8674;}. A node E 2 + is said to be a branch node if it has more than one successor. The last instruction of a branch node is its conditional branch instruction. We refer to the set of outgoing edges of a branch node as its branch edges. Each branch edge is associated with a set of conditions under which the edge is traversed. For example, an if statement will have two branch edges: one edge corresponding to the branch being taken with a singleton condition set corresponding to the statement's predicate evaluating to true and one edge corresponding to the branch not being taken with a singleton condition set corresponding to the statement's predicate evaluating to false. Whereas, a switch statement's branch edges' condition sets will partition the cases of the switch statement.</p><p>An instruction 8 is within the binding scope of a branch edge 4 if 8 can only execute if 4 has been traversed. By denition <ref type="bibr">[2]</ref>, all of the instructions within a basic block execute if and only if the rst instruction in the basic block executes. Therefore, we can dene binding scopes with respect to basic blocks instead of individual instructions. Thus, the binding scope of a branch edge 4 is the set of all basic blocks E such that all paths in &#8999; from E 4=CA ~to E contain 4.</p><p>To calculate the binding scope of branch edges we rst construct a modied control ow graph, &#8999; 0 , from &#8999; by replacing each branch edge (D, E) 2 &#8674; with a new node 1 D,E and a pair of edges (D, 1 D,E ) and (1 D,E , E). Given this construction, the binding scope of a branch edge 4 is the set of all E 2 + such that 1 4 dominates E in &#8999; 0 . Proof of the correctness of this calculation is as follows: P 1. Given a branch edge 4 2 &#8674; and 1 4 , its node replacement in &#8999; 0 , 1 4 dominates a node E 2 &#8999; 0 if and only if all paths in &#8999; from E 4=CA ~to E contain 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>P.</head><p>Suppose that 1 4 dominates E in &#8999; 0 . Assume that there exists a path, % = [E 4=CA ~, E 0 , . . . , E = , E], in &#8999; that does not contain 4. Since 1 4 dominates E in &#8999; 0 , every path in &#8999; 0 from E 4=CA ~to E must go through 1 4 . Given that 1 4 8 + and therefore 1 4 8 %, % must contain at least one edge, (E 8 , E 8+1 ), that is not in &#8674; 0 such that all paths from E 8 to E 8+1 in &#8999; 0 contain 1 4 . Furthermore, (E 8 , E 8+1 ) 2 &#8674; and (E 8 , E 8+1 ) 8 &#8674; 0 implies that (E 8 , E 8+1 ) was a branch edge that was replaced during the construction of &#8999; 0 . Thus, 1 E 8 ,E 8+1 2 + 0 , (E 8 , 1 E 8 ,E 8+1 ) 2 &#8674; 0 , and (1 E 8 ,E 8+1 , E 8+1 ) 2 &#8674; 0 . As a result, there is a path [E 8 , 1 E 8 ,E 8+1 , E 8+1 ] in &#8999; 0 . This path must contain 1 4 , therefore 1 E 8 ,E 8+1 = 1 4 . Therefore, given the construction process of &#8999; 0 , (E 8 , E 8+1 ) = 4. This contradicts the assumption % does not contain 4. Thus, 1 4 dominates E in &#8999; 0 implies that all paths in &#8999; from E 4=CA ~to E contain 4.</p><p>Now suppose that all paths in &#8999; from E 4=CA ~to E contain a branch edge 4. Assume that 1 4 does not dominate E in &#8999; 0 because there exists some path, % = [E 4=CA ~, E 0 , . . . , E = , E] in &#8999; 0 that does not contain 1 4 . Given a sequential pair of nodes E 8 and E 8+1 , in P, the edge (E 8 , E 8+1 ) is either present in &#8674; or it was added when a branch edge in &#8674; was replaced. If (E 8 , E 8+1 ) 8 &#8674;, then either E 8 8 + or E 8+1 8 + This is a consequence of the construction procedure for &#8999; 0 , since each edge added to &#8999; 0 is between an element of the original set of nodes and a node not in the original set of nodes. Furthermore, as a result of this, every edge in &#8999; 0 uses at least one node that is an element of + . Since % begins at a node in + , if E 8 8 + , there must be a node, E 8 1 2 + immediately before E 8 in %.</p><p>Since % ends at a node in + , if E 8+1 8 + , there must be a node, E 8+2 2 + immediately after E 8+1 in %. Thus, % can be broken into a sequence of sub-paths either of the form [E 8 , E 8+1 ] where (E 8 , E 8+1 ) 2 &#8674; or the form [E 8 , E 8+1 , E 8+2 ] where E 8 2 + , E 8+1 8 + , and E 8+2 2 + . For each such sub-path, there is an edge in &#8674; that does not equal 4 that connects the start of the sub-path directly to the end. Proof is as follows for each of two cases:</p><p>] where E 8 2 + , E 8+1 8 + , and E 8+2 2 + The edges (E 8 , E 8+1 ) and (E 8+1 , E 8+2 ) must have been added to &#8674; 0 when the node E 8+1 was added to + 0 as a replacement for the edge (E 8 , E 8+2 ). Since 1 4 is not in % and E 8+1 is in %, E 8+1 &lt; 1 4 . Therefore, (E 8 , E 8+2 ) 2 &#8674; and (E 8 , E 8+2 ) &lt; 4.</p><p>These edges form a path in &#8999; from E 4=CA ~to E not containing 4 contradicting the assumption that all paths in &#8999; from E 4=CA ~to E contain the branch edge 4. Thus, if all paths in &#8999; from E 4=CA ~to E contain a branch edge 4, 1 4 dominates E in &#8999; 0 . &#8676;</p><p>The binding scope of a branch edge is "scope-like"; all nodes within the scope lie on paths between the edge and its dominance frontier. Thus, the taint tag of a branch's predicate need only be pushed onto a taint stack once at the start of the branch edge's scope and can be safely popped at the ends of its scope. Another desirable property of binding scopes is that the set of basic blocks within a branch edge's binding scope is a subset of the basic blocks within its branch's standard (post-dominator based) scope. The immediate post-dominator of a branch node is reachable from all of its branch edges and therefore either part of or beyond the dominance frontier of its node replacement in the modied graph.</p><p>We can now apply the binding scope denition to the spaceDecode method in Figure <ref type="figure">1a</ref> using its control ow graph shown in Figure <ref type="figure">1b</ref>. There are two branch edges corresponding to equality checks: the one associated with case + and the one associated with case . The case + 's edge's binding scope contains only the basic block with the instruction result[i] =</p><p>. The case 's edge's binding scope contains only the basic block with the instruction throw new RuntimeException(). In this case, using binding scopes allows the relationship between a plus sign in the input and a space in the output to be reected without introducing over-tainting.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Loop-Relative Stability Heuristic</head><p>When propagating along control ows, taint tags often accumulate on program data during the execution of a loop leading to an "explosion" of taint tags. For example, regardless of whether taint tags are applied in standard control ow scopes or only in binding scopes, during the execution of the loop in the percentDecode method (Figure <ref type="figure">2a</ref>), taint tags build up on the looping variable, i, resulting in progressively larger label sets for each successive output. C mitigates this accumulation of taint tags by making special considerations when determining whether to propagate taint tags between the branch of a control ow and a statement within the control ow's scope. This process is guided by the novel "loop-relative stability" heuristic.</p><p>The underlying idea for loop-relative stability is that loops introduce alternative paths to statements within the scope of a control ow. For instance, within a single call to the percentDecode method (Figure <ref type="figure">2a</ref>), the instruction on line 7, i += 2, can execute even if the branch on line 5 is not taken on a particular iteration of the loop. That is, on subsequent iterations of the loop, the branch on line 5 could evaluate to true, causing the statement i += 2 to execute, thereby producing the same eect that would have happened had the branch been taken on the earlier iteration. This weakens the relationship between the value of i and the element of the input array that caused the branch on line 5 to be taken. However, when a value is stored to result[size++] on line 6, the storage location to which that value is stored is dierent on each iteration of loop. Like before, on subsequent iterations of the loop, the branch on line 5 could evaluate to true causing the statement on line 6 to execute. However, unlike the statement on line 7 which updates the same local variable i on dierent iterations, the statement on line 6 updates a dierent element in the result array on dierent iterations. This produces a stronger relationship between the value of input[i] on line 5 and the value written on line 6 to result[size++] than the relationship between the value of input[i] on line 5 and the value written on line 7 to i.</p><p>The loop-relative stability heuristic aims to identify these cases in which a loop introduces multiple conditions under which the same location could be assigned the same value. In order for this to occur, there must be a conditional branching statement that is contained within a loop and an assignment statement within the scope of that branch's control ow. If the values used by the branch change on dierent iterations of the loop but the values used by the assignment statement stay the same, then on each iteration of the loop a dierent condition could cause the same eect. Since the conditional branching statement might occur in a dierent method than the loop and assignment statement, the loop-relative stability heuristic is dened in terms of the dynamic execution of statements. To capture these ideas, we dene an execution of a program statement as being "stable" relative to an executing loop if the values used by that statement are the same on every iteration of that loop. The loop-relative stability heuristic determines whether to propagate along a particular control ow based on the stabilities of statement executions. More specically, let 1 be an execution of conditional branching statement and 0 be an execution of an assignment statement that is within the scope of 1's control ow. The loop-relative stability heuristic propagates along the control ow from 1 to 0 only if 1 is stable relative to every loop to which 0 is relatively stable.</p><p>The loop-relative stability heuristic considers only "natural" loops as dened by <ref type="bibr">[2]</ref>. In particular, a natural loop is a single-point-of-entry cycle in the control ow graph dened with respect to a back edge, i.e., an edge whose target dominates its source. The natural loop of some back edge (D, E) consists of all nodes G such that E dominates G and there exists a path from G to D not containing E.</p><p>The node E is said to be the header of the natural loop dened by the back edge (D, E). Any two natural loops with the same header are combined and treated as a single loop. This denition of loops ensures that any two loops are either disjoint or nested, i.e., one loop is fully contained within the other. The use of arbitrary GOTO statements may in some cases produce control ow graphs that contain cycles that do not correspond to natural loops. However, most structured programming languages do not allow programmers to produce control ow graphs that contain cycles that do not correspond to natural loops. Thus, we feel that it is appropriate for the loop-relative stability heuristic to only consider natural loops.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1">Instability Levels.</head><p>Conceptually, the loop-relative stability heuristic could be expressed in terms of "stability sets"; a stability set is the set of loops with respect to which a statement execution is relatively stable. Let ( 1 be the stability set of an execution of conditional branching statement 1 and ( 0 be the stability set of an execution of an assignment statement within the scope of 1's control ow. The loop-relative stability heuristic propagates along the control ow from 1 to 0 only if ( 1 is a superset of ( 0 . However, instead of tracking these stability sets, it is possible to express the same concept using a single number, an "instability level". An instability level is a numeric value between zero and the number of loops containing the statement currently executing. This number represents the "depth" of the innermost loop relative to which a statement is not stable. The loop-relative stability heuristic propagates along the control ow from an execution of a conditional branching statement 1 to an execution of an assignment statement 0 within the scope of 1's control ow only if 1's instability level is less than or equal to 0's instability level. An explanation of why this simplication is possible is provided below.</p><p>Given a statement execution contained within two nested loops, if the values used by the statement change over the duration of the inner loop then they must also change over the duration of the outer loop, since the inner loop is fully contained within the outer loop. Thus, the stability set of a statement execution is dened by the innermost loop relative to which it is not stable. If an execution of a program statement occurs outside of a loop, that execution must be stable relative to that loop, since it is not possible for the values used by the statement to change over the duration of the loop. As a result, when calculating the loop-relative stability for the statement currently executing, only the set of loops containing that statement need to be considered in the calculation. However, this does not by itself guarantee that all comparisons for the loop-relative stability heuristic made at runtime only need to consider the set of loops containing the statement currently executing. The stability set of the execution of a conditional branching statement needs to be used before the execution of any assignment statement within the scope of the branch's control ow. This means that the stability set of an execution of a conditional branching statement may be considered by the loop-relative stability heuristic after the innermost loop relative to which the execution was not stable was exited. However, once the innermost loop relative to which an execution of a conditional branching statement is not stable is exited all subsequent statements will execute outside of that loop and therefore be stable relative to it. Since these executions are stable relative to a loop that the branch execution was not, the loop-relative stability heuristic will prevent the branch's predicate from propagating to these statements as a result of the control ow. Thus, C stops all propagation along the control ow from the execution of a conditional branching statement as soon as the innermost loop relative to which that branch execution is not stable is exited. As a result, all comparisons for the loop-relative stability heuristic made at runtime only need to consider the set of loops containing the statement currently executing. Given this and the fact that the stability set of a statement execution can be dened by the innermost loop relative to which it is not stable, loop-relative stabilities can be specied at runtime as a single number, an instability level, between zero and the number of loops containing the statement currently executing.</p><p>These instability levels are calculated at runtime as a function of both static information, the stability "classier" of a statement (Section 3.2.2), and dynamic information, the context of the statement's execution (Section 3.2.3). A purely dynamic approach could only consider instructions that have already executed. This is problematic because it's possible for the predicate of a conditional branching statement to be the same on all but the last iteration of a loop. By the time that last iteration occurs the loop-relative stability heuristic may have already been applied causing C to incorrectly propagate along a control ow from that branch. Furthermore, the loop-relative stability is concerned with the presence of alternative conditions that could have produced the same outcome. It does not matter whether those conditions were met on a particular execution. For example, consider the any method on line 3 of Listing 2. The for-loop on lines 4-8 introduces multiple conditions under which the value of x is set to true, specically if any of the elements of the array z is true. If any is passed an array new boolean[]{false, false, false, true}, then there happened to be one condition, z <ref type="bibr">[3]</ref> == true that was satised and caused x to be assigned the value true. However, there were still possible alternative conditions under which that assignment could have occurred. Therefore, propagation should not occur from the branch on line 5 to the assignment statement on line 6 according to the loop-relative stability heuristic. To handle this, C uses static information to reason about possible executions and assign stability classiers to program elements. C also relies upon dynamic information about calling contexts. Consider the any_ method on line 15 of Listing 2. The any_ method is functionally equivalent to the any method on line 3 of Listing 2, but this functionality is split across two methods, any_ and setX. Because any_ is functionally equivalent to any, taint tag propagation for the two methods should ideally be the same, meaning that propagation should not occur during the execution of the assignment statement on line 12 of the setX method. However, setX is also called by the method checkFirst on line 25 of Listing 2. The conditional branching statement on line 24 of checkFirst does not necessarily occur within a loop. Therefore, propagation may need to occur during the execution of the assignment statement on line 12 of the setX method when called from the checkFirst method. Thus, it is not possible to determine whether propagation should occur during the execution of the assignment statement on line 12 without considering the dynamic execution context of the call to setX. Thus, C constructs and passes between methods information about execution contexts at runtime. These execution contexts are used to calculate instability levels.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2">Stability</head><p>Classifiers. C statically assigns stability classiers to program elements; these classiers encode information about possible program executions. There are three types of stability classiers: stable, dependent, and unstable. The stable classier type indicates that all executions of a program element are stable relative to all loops regardless of the dynamic execution context. We use S to denote a stable-type classier. The dependent classier type indicates that the instability level of an execution of a program element depends on the instability level of one or more arguments passed to the method call that contains the execution. We refer to the set of parameters corresponding to these arguments as the dependency set of the classier. We use Dh3i to denote a dependent-type classier with a dependency set 3. In addition to dependencies on arguments, a dependent-type classier can also indicate a dependency on the execution context's return value location. We use U to denote a dependency on the execution context's return value location. The unstable classier type indicates that the instability level of an execution of a program element depends on the loops that contain the method call that contains the execution. In this case, the element is not stable with respect to all loops containing the method call and possibly additional loops within the method. We use Uh;i to denote an unstable-type classier for a program element that is not stable relative to the elements of ;, a set of loops within the method.</p><p>At runtime, C determines instability levels based on stability classiers and execution contexts. The instability level of a stable-type classier is always zero. The instability level of a dependent-type classier is dynamically calculated as the maximum instability level of the arguments corresponding to the parameters in the program element's dependency set. The instability level of an unstable-type classier, Uh;i, is dynamically calculated as the number of loops containing the method call plus |; |. We describe this more precisely in Section 3.2.4 below.</p><p>To calculate stability classiers for a method, C rst converts the method into an intermediate representation in static single assignment (SSA) form. By converting the method into SSA form, C can easily identify reaching denitions since there will be exactly one definition reaching each use. Next, C creates a control ow graph for the method and uses this graph to identify which natural loops within the method contain each instruction. We use loops <ref type="bibr">(8)</ref> to denote the set of natural loops containing an instruction 8. Finally, C calculates the stability classier of each conditional branching statement, assignment statement, non-void return statement, method call receiver, method call argument, and method call return value location. The stability classiers of conditional branching statements, assignment statements, and non-void return statements (which C treats like interprocedural assignment statements) are directly used to determine whether to propagate along a control ow in accordance with the loop-relative stability heuristic. In order to construct an execution context for a method call, C uses the stability classier of the method call's receiver, arguments, and return value location. We discuss this in detail in Section 3.2.3.</p><p>For the sake of computing stability classiers, we dene a function merge(2 1 , 2 2 ) in Algorithm 1 for combining two classiers, 2 1 and 2 2 to produce a classier 2 3 such that for any possible execution context the instability level of 2 3 will be greater than or equal to the the instability level for 2 1 and 2 2 . Using this merge function, C can then calculate stability classiers for value expressions (program entities that are evaluated to produce a value) and storage locations (program entities that represent a place for storing a value, i.e., the left-hand-side of an assignment statement) as described in Algorithms 2 and 3, respectively.</p><p>Algorithm 1 Function for combining two stability classiers 2 1 and 2 2 . end if 32: end function</p><p>The valueClassifier function (Algorithm 2) is also used to calculate the stability classier of each conditional branching statement by applying the function to the predicate expression of the branching statement. To calculate the stability classier of a non-void return statement, C rst calculates the stability classier of the expression returned by the statement and then combines this classier with a dependent-type classier with a single dependency on the execution context's return value location, More formally, the stability classier of a non-void return statement is merge(Dh{U }i, c v ), where 2 E is the stability classier of the expression returned by the statement. The stability classier of a method call return value location is determined by applying the locationClassifier (Algorithm 3) function to the storage location assigned the value of the result of the method call. If the return value of a method call is unused, then the stability classier of the return value location for that method call is S.</p><p>Algorithm 3 Function for calculating the stability classier of a storage location G that appears in an instruction 8.</p><p>1: function C(G, 8)</p><p>5 loops(8)</p><p>if G is directly used in an invoke expression on the right-hand-side of an assignment statement then else if G is an array element of the form 0[1] then 25:</p><p>2 0 C(0, 8)</p><p>26:</p><p>end if</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>29:</head><p>return S 30: end function C uses both the valueClassifier and locationClassifier functions to calculate the stability classier of an assignment statement. However, C makes a special consideration when calculating the stability classiers of assignment statements in order to address situations in which the value to be stored by an assignment was produced from an expression containing a term matching the current value at the destination storage location of the assignment statement. We refer to this as an "update" assignment and exclude any update terms, terms that match the current value at the storage location, from the stability calculation for the value. Thus, before C can determine the stability classier of an assignment statement, it must identify any portions of the right-hand-side of the assignment statement that correspond to excluded update terms. If the destination storage location of the assignment statement is a local variable, any uses of that denition of that local variable that reach the assignment statement are considered to be update terms. If the destination storage location is an array, C considers any values that are loaded from the same array at the same position to be candidate update terms. These values may represent update terms, but it is also possible that the value at that storage location was written between the instruction that loaded the candidate update term and the assignment statement. C ignores the potential for concurrent writes to that storage location and accepts a candidate as being a true update term if there does not exist an execution path containing an array store or a method call between the instruction that loaded the candidate and the assignment statement. In practice, checking all possible paths between the two instructions may be impractical. Thus, for small control ow graphs (those containing less than 10 basic blocks), C checks all possible paths between the two instructions. For larger control ow graphs, C assumes that such a path exists if the two instructions are not in the same basic block. If the destination storage location of the assignment statement is a eld, C considers any values loaded from the same eld and, for instance elds, from the same instance to be candidate update terms. Once again, C checks all paths between the instruction that loads the candidate update term and the assignment statement. However, in this case, the paths are checked for method calls and eld stores.</p><p>Once C has identied any portions of the right-hand-side of the assignment statement that are considered to be excluded update terms, it calculates the stability classier of the assignment statement based on the remaining portions of the right-hand-side of the assignment statement and the left-hand-side of the assignment statement (i.e., its destination storage location). Algorithm 4 shows how C performs this calculation.</p><p>Algorithm 4 Function for calculating the stability classier of an assignment statement that appears in an instruction 8 and assigns a value expression E to a storage location G.</p><p>, set containing the portions of E that are not excluded update terms for F 2 , do 5:</p><p>end for 8:</p><p>return 2 9: end function 3.2.3 Execution Contexts. The loop-relative stability of a statement can vary between executions depending upon the dynamic execution context in which the call to the method that contains the statement was made. For example, if a call is made to a method from within a loop, then statements within the method may vary with respect to that loop. Additionally, whether a statement within the callee method is stable with respect to a loop may depend on the arguments passed to that method. As a result, the loop-relative stability of the execution of a statement cannot be fully determined through purely static analysis. Instead, it must be calculated at runtime within a specic execution context. To address this, at runtime C records information about execution contexts and passes this information between methods.</p><p>The execution context for a particular method call consists of two components: a depth and a level map. The depth of a method call is the number of loops that contain the call to that method. We use 34?C&#8984;(4) to denote the depth of an execution context 4. The level map of a method call species the instability levels of the arguments passed to the method call, the method call's receiver (if the method is an instance method), and the storage location for the return value of the method call (if the method is non-void). We use level(4, 0) to denote the instability level of an argument, method receiver, or return value location 0 specied by some execution context 4.</p><p>Before a method call is made from some caller method, &lt;, C constructs an execution context for the call, 4 0 . This execution context is created using the calling method's execution context and the stability classiers that were determined for the method call's arguments, receiver, and return value location as described in Section 3.2.2. C denes 34?C&#8984;(4 0 ) to be the sum of 34?C&#8984;(4) and the number of loops within &lt; that contain the method call about to be made. C calculates the instability level of each of the arguments passed to the method call using 4 and the stability classier for the argument; this value is then recorded in 4 0 's level map. If the callee method is an instance method, C calculates and records the instability level of the method call's receiver using 4 and the stability classier for the receiver. If the callee method is a non-void method, C calculates and records the instability level of the method call's return value location using 4 and the stability classier of the return value location. Finally, C passes 4 0 to the callee method.</p><p>At the program entry point, C creates an initial execution context for the initial method call, since this call does not have a caller from which it would receives an execution context. Thus, C uses an initial execution context 4 such that 34?C&#8984;(4) = 0 and level(4, 0) = 0 for all method elements 0. Algorithm 5 Function for calculating an instability level based on a stability classier 2 and an execution context 4.</p><p>1: function L(</p><p>if 2 = S then Computed instability levels are used to construct execution contexts as discussed in Section 3.2.3 and, ultimately, to apply the loop-relative stability heuristic. In particular, let 1 be an execution of a conditional branching statement and 0 be an execution of an assignment statement that is within the scope of 1's control ow. The loop-relative stability heuristic propagates along the control ow from 1 to 0 only if 1's instability level is less than or equal to 0's instability level.</p><p>For example, consider the programs shown in Listings 3 and 4 in Figure <ref type="figure">3</ref>. These programs are nearly identical; they dier only in the names of methods and the value passed to the call to the method set on line 13. However, this slight, semantic dierence is enough to the impact instability levels in a way that produces dierent taint tag propagation.</p><p>Consider two program executions: one starting from the main method in Listing 3 and the other starting from the main method in Listing 4. Both executions proceed as follows. The main method calls an intermediate method, indexOf in one case and contains in the other. This intermediate method contains one natural loop ! (the for-loop on lines 9-15). In both executions, there are two iterations of this loop. On the rst iteration, when i is 0, the predicate of the branch on line 12 evaluates to true causing a call to be made to the set method. The set method assigns the value 0 (b) Fig. <ref type="figure">3</ref>. Listings 3 and 4 depict Java methods with programs elements colored green, yellow, orange, or red to indicate a stability classifier of S, Dh{~}i, Uh&#250;i, or Uh{!}i, respectively. Figure <ref type="figure">3a</ref> shows the stack trace for an execution of the program in Listing 3 just before the instruction on line 4 of the set method in Listing 3 is executed for a program execution starting from the main method in Listing 3. Figure <ref type="figure">3b</ref> shows the stack trace just before the instruction on line 4 of the set method in Listing 4 is executed for a program execution starting from the main method in Listing 4. Both of these stack traces (Figures <ref type="figure">3a</ref> and<ref type="figure">3b</ref>) consist of three frames. For each frame, we show the value (Value) and associated taint tags (Tags) of each local variable within the scope of the frame. We also show the execution context (Execution Context) calculated by C for each frame.</p><p>to a static eld x and then returns. On the second iteration of !, the predicate of the branch on line 12 evaluates to false, and no call is made to the set method. After the loop ! ends, the intermediate method returns. Then, nally, the method main returns. In both executions, when the branch on line 12 of the intermediate method was taken, the value of the operand a[0] of the predicate of that branch was tainted (as depicted in the middle frame of Figures <ref type="figure">3a</ref> and<ref type="figure">3b</ref>). Thus, when the assignment statement on line 4 of set executed, it wrote a value within the scope of a control ow introduced by a tainted predicate. Figure <ref type="figure">3a</ref> shows the stack trace for the moment just before the assignment statement on line 4 of the set method executed for the execution of the program in Listing 3. Similarly, Figure <ref type="figure">3b</ref> shows the stack trace for the same moment for the execution of the program in Listing 4. In order to determine whether to propagate along these control ows, C must know the static stability classiers for the methods called in these two executions and the dynamic execution contexts of the calls made at runtime to these methods.</p><p>We annotated Listings 3 and 4 to indicate the stability classiers of program elements. These stability classiers are statically determined by applying the rules described in Section 3.2.2. The assignment statement x = y on line 4 of both programs has a stability classier of Dh{~}i. The storage location x always refers to the same location regardless of the context in which it executes. However, whether the value represented by the expression y changes over the iterations of some theoretical loop containing a call to set depends on whether the argument y changes over the iterations of that loop. Thus, this assignment statement has a stability classier of Dh{~}i. The assignment statement i = 0 on line 9 of both programs has a stability classier of S because, regardless of the context in which it executes, it always assigns the same value, 0, to the same location, the local variable i. The statement i++ on line 11 of both programs is an augmented assignment equivalent to i = i + 1. The term i in the assignment i = i + 1 is an excluded update term, and the remaining term 1 is constant. Therefore, the statement i++ on line 11 of both programs has a stability classiers of S. The stability classiers of the branch on line 10 of both programs (i &lt; a.length) and the branch on line 12 of both programs (a[i] == h ) are both Uh{!}i because both statements use the value of i which changes over the iterations of !. The argument i passed to the call to set on line 13 of Listing 3 has a stability classier of Uh{!}i because the value of i changes over the iterations of !. By contrast, the argument 0 passed to the call to set on line 13 of Listing 4 has a stability classier of S because it is a literal. The assignment statement on lines 19-22 of both programs contains a storage allocation, thus its stability classier is Uh&#250;i. The argument a passed to the method call on line 23 of both programs has a stability classier Uh&#250;i because the value of the reaching denition of a from lines 19-22 contains a storage allocation.</p><p>In addition to these stability classiers, C must also calculate the dynamic execution contexts of method calls according to the rules described in Section 3.2.3. Let 4 &lt;08= , 4 8=34G$ 5 , and 4 B4C denote the execution contexts for the calls made to main, indexOf, and set, respectively, in the program execution for Listing 3. Let 4 &lt;08= 0 , 4 2&gt;=C08=B , and 4 B4C 0 denote the execution contexts for the calls made to main, contains, and set, respectively, in the program execution for Listing 4.</p><p>Since main is the program entry point, 4 &lt;08= and 4 &lt;08= 0 are initial execution contexts. Thus, depth(4 &lt;08= ) and depth(4 &lt;08= 0 ) are 0, and level(4 &lt;08= , args) and level(4 &lt;08= 0, args) are also 0. The bottom frame of Figure <ref type="figure">3a</ref> (labeled "main: line 23") depicts 4 &lt;08= , and the bottom frame of Figure <ref type="figure">3b</ref> (labeled "main: line 23") depicts 4 &lt;08= 0 . C constructs 4 8=34G$ 5 using 4 &lt;08= and 4 2&gt;=C08=B using 4 &lt;08= 0 . Since there are no natural loops inside main that contain the call to indexOf, depth(4 8=34G$ 5 ) is depth(4 &lt;08= ) +0 = 0. The argument passed to the call to indexOf has a stability classier of Uh&#250;i, thus level(4 8=34G$ 5 , a) is depth(4 &lt;08= ) +|&#250;| = 0. The middle frame of Figure <ref type="figure">3a</ref> (labeled "indexOf: line 13") depicts 4 8=34G$ 5 . Since the main methods for the two programs are identical and 4 &lt;08= = 4 &lt;08= 0 , 4 2&gt;=C08=B is identical to 4 8=34G$ 5 . The middle frame of Figure <ref type="figure">3b</ref> (labeled "contains: line 13") depicts 4 2&gt;=C08=B .</p><p>Next, C constructs 4 B4C using 4 8=34G$ 5 and 4 B4C 0 using 4 2&gt;=C08=B . Since the loop ! contains the call to set in indexOf, depth(4 B4C ) is depth(4 8=34G$ 5 ) +1 = 1. For the same reason, depth(4 B4C 0 ) is depth(4 2&gt;=C08=B ) +1 = 1 The argument passed to the call to set in indexOf has a stability classier of Uh{!}i. Therefore, level(4 B4C , y) is depth(4 8=34G$ 5 ) +|{!}| = 1. However, the argument passed to the call to set in contains has a stability classier of S. Therefore, level(4 B4C 0, y) is 0. The top frame of Figure <ref type="figure">3a</ref> (labeled "set: line 4") depicts 4 B4C . The top frame of Figure <ref type="figure">3b</ref> (labeled "set: line 4") depicts 4 B4C 0 .</p><p>Finally, these execution contexts and stability classiers can be used to calculate instability levels for the execution of the program in Listing 3. The stability classier of the branch on line 12 of indexOf is Uh{!}i, and the execution of this branch occurred within the runtime execution context 4 8=34G$ 5 . Therefore, the instability level of the execution of the conditional branching statement is depth(4 8=34G$ 5 ) +|{!}| = 1. The stability classier of the assignment statement on line 4 of set is Dh{~}i. The execution of the assignment statement occurred within the runtime execution context 4 B4C . Therefore, the instability level of the execution of the assignment statement is max 0 2{~} level(4 B4C , 0) = 1. Overall, the instability level of the execution of the conditional branching statement was less than or equal to that of the execution of the assignment statement. Thus, following the loop-relative stability heuristic, C would propagate along the control ow between the two statement executions.</p><p>As was done for the execution of the program in Listing 3, the execution contexts and stability classiers can be used to calculate instability levels for the execution of the program in Listing 4. The stability classier of the branch on line 12 of contains is Uh{!}i, and the execution of this branch occurred within the runtime execution context 4 2&gt;=C08=B . Therefore, the instability level of the execution of the conditional branching statement is depth(4 2&gt;=C08=B ) +|{!}| = 1. The stability classier of the assignment statement on line 4 of set is Dh{~}i. The execution of the assignment statement occurred within the runtime execution context 4 B4C 0 . So, the instability level of the execution of the assignment statement is max 0 2{~} level(4 B4C 0, 0) = 0. In this case, the instability level of the execution of the conditional branching statement was greater than that of the execution of the assignment statement. Therefore, unlike in the other execution, C would not propagate along the control ow between the two statement executions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">IMPLEMENTATION</head><p>Although our overall approach is language-agnostic and suitable to many languages, we chose Java as a target language, implementing C as an extension to P [9, 10], a Java taint tracking framework which propagates taint tags by rewriting Java bytecode using the ASM bytecode instrumentation and analysis framework <ref type="bibr">[55]</ref>. P is a state-of-the-art taint tracking tool upon which several software engineering tools have already been built <ref type="bibr">[13,</ref><ref type="bibr">35,</ref><ref type="bibr">38,</ref><ref type="bibr">64,</ref><ref type="bibr">68,</ref><ref type="bibr">69]</ref>. Implementing C as an extension to P allows C to be easily integrated with these tools and any future tools built on top of P. Furthermore, this choice allows C to support any existing features supported by P, such as P's "auto-tainting mode" which allows developers to specify "source" methods at which taint tags are automatically applied and "sink" methods at which taint tags are automatically checked <ref type="bibr">[11]</ref>. P creates a shadow variable for each local variable, object eld, and method argument to store taint information. It propagates control ows by adding a parameter of the type ControlFlowStack to methods' signatures. This allows P to use ControlFlowStack instances to track the scopes of control ows between method boundaries by passing a ControlFlowStack instance from the caller method to callee method as an argument. In a similar fashion, C uses ControlFlowStack instances to pass loop-relative stability information and execution contexts between methods.</p><p>We modied P to support custom control ow propagation policies by allowing users to extend P's default behavior at three phases: annotation, instrumentation, and runtime; we then used these extension points to implement C. During the annotation phase, the system is provided with a list containing a method's instructions and may insert annotations, special notes used to inform the instrumentation process, into this list. In the instrumentation phase, P traverses the instructions of a method forwarding any annotations it encounters to the extending system, informing the extending system when certain structures within the method are encountered, and allowing the extending system to add instructions to the method in response. Phosphor allows a system to modify behavior during the runtime phase by specifying a custom subclass of ControlFlowStack for use at runtime.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Annotation</head><p>Before a method is instrumented, C statically annotates its instructions with control ow information. These annotations mark static features of the method (e.g., the scopes of control ows) as well as properties of those features (e.g., stability classiers). When the method is instrumented, these features and properties are used to determine what code to generate.</p><p>C starts by adding annotations to delineate the binding scopes of control ows. The control ow graph of the method being annotated is created to identify branch edges that introduce control ows. Each branch edge that performs an equality check is marked as introducing a control ow that should be propagated at runtime. A branch edge is considered to perform an equality check if one of the following is true:</p><p>&#8226; it is the edge between a branching instruction conditioned on a non-null equality check and its branch target &#8226; it is the edge between a branching instruction conditioned on a non-null inequality check and the instruction following it &#8226; it is the edge between a switch instruction and the branch target associated with one of its non-default cases and no other case for that switch instruction has the same branch target &#8226; it is either of the edges out of a branching instruction conditioned on a boolean comparison The last condition, which deals with boolean comparisons, is a special case. Since there are only two possible values for a boolean, both edges out of the branch are traversed for only a single value. Thus, both edges correspond to equality checks. The Java Virtual Machine does not provide dedicated boolean instructions and instead uses instructions that operate on integer values <ref type="bibr">[44]</ref>. As a result, C will mark both edges of an integer conditional branch instruction for propagation if it statically determines that the instruction likely performs a boolean comparison.</p><p>Once branch edges have been marked for propagation, C annotates the start of the scope of the control ow associated with each of the edges. Then, it constructs the modied control ow graph described in Section 3.1, and uses Cooper et al. <ref type="bibr">[23]</ref>'s algorithm to calculate the dominance frontier of each replacement node (1 4 ) associated with each marked branch edge ( <ref type="formula">4</ref>). An annotation is added at the beginning of each basic block in the dominance frontier of a replacement node to indicate the end of the scope of the control ow associated with the node.</p><p>Next, C annotates program elements with their the stability classiers. The method being annotated is converted from Java bytecode into a register-based intermediate representation and then placed into SSA form using Cytron et al. <ref type="bibr">[25]</ref>'s algorithm. C maintains a direct correspondence between this intermediate representation and the original Java bytecode so that properties calculated on the intermediate representation can be used to annotate instructions in the original bytecode without having to convert out of the intermediate form. This intermediate representation is used to calculate the stability classiers of program elements in the method being analyzed (as described in Section 3.2.2). C labels each conditional branching statement, assignment statement, non-void return statement, method call receiver, method call argument, and method call return value location with its calculated stability classier.</p><p>Finally, C uses the control ow graph for the method to record loop information. Each method call is annotated with the number of natural loops that contains it within the method being annotated. Additionally, C records the exit points (i.e., the rst instruction of a basic block not contained within a particular loop whose predecessor is contained within that loop) of any loops within the method. These features of the method are used at runtime to calculate execution contexts and instability levels.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Instrumentation and Runtime</head><p>During the instrumentation of the method, the annotations added by C are used to generate method calls to the ControlFlowStack, the structure P uses to store the taint tags of branches and propagate control ows <ref type="bibr">[11]</ref>. P's default behavior adds method calls to push the taint tag associated with each control ow's predicate at the start of its scope and pop any taint tags pushed for each control ow at the end of its scope as described in Section 2. C modies this behavior to support the loop-relative stability heuristic.</p><p>Before making a method call, the ControlFlowStack prepares an execution context for the call. The instability levels of the receiver, arguments, and return value location of the method call are calculated based on their stability classiers and the current method call's execution context. These instability levels are recorded by the ControlFlowStack. ControlFlowStack also calculates the number of loops containing the method call based on the number of loops within the current method containing the method call and the current method call's execution context.</p><p>At the start of the callee method, the ControlFlowStack initializes an execution context based on the values prepared by the caller method. Then, the ControlFlowStack pushes this new execution context onto a stack of execution contexts. Before a method exits, the current method call's execution context is popped from this stack. When the start of a control ow's scope is hit, the ControlFlowStack calculates the instability level of the control ow based on the conditional branching statement's stability classier and the current method call's execution context. Then, the ControlFlowStack records the taint tag of the predicate of the branch with the ow's instability level. At the end of the scope of a control ow, any taint tags recorded for the ow are cleared. As discussed in Section 3.2.1, in order to use instability levels, all control ows from a branch's predicate must be terminated as soon as the innermost loop relative to which that branch is not stable is exited. Thus, at the exit point of a loop any taint tags recorded at the instability level associated with that loop are removed from the ControlFlowStack.</p><p>When a non-void return statement or assignment statement executes, the ControlFlowStack calculates its instability level based on its stability classier and the current method call's execution context. The taint tags of any control ow recorded at an instability level less than or equal the instability level of the statement propagate to the program data being assigned a value by the statement.</p><p>At instrumentation boundaries, such as native code calls, the callee method will not receive a prepared execution context from the caller. In these cases, C assumes that no loops contain the method call. If there are no loops that contain the method call, there are no loops with respect to which the method call's receiver, arguments, and return value location could be non-stable. Thus, C uses an execution context 4 such that 34?C&#8984;(4) = 0 and level(4, 0) = 0 for all method elements 0.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">LIMITATIONS</head><p>Like prior work from Bao et al. <ref type="bibr">[8]</ref> and Kang et al. <ref type="bibr">[41]</ref>, C uses a heuristic approach to mitigate control-ow-related over-tainting. Therefore, malicious programs, those intentionally designed to circumvent analyses, are outside the scope of this work. Some applications of dynamic taint tracking, such as condentially enforcement, may be interested in analyzing malicious programs. However, many applications of dynamic taint tracking primarily consider non-malicious programs <ref type="bibr">[7,</ref><ref type="bibr">19,</ref><ref type="bibr">33,</ref><ref type="bibr">35,</ref><ref type="bibr">36]</ref>.</p><p>Additionally, since C relies on a heuristic rather than a sound or complete ow analysis it is dicult to determine how appropriate it is for a particular application. Furthermore, the heuristic we use relies on conservative assumptions about values loaded from arrays and elds, and it only takes into account direct uses of a local variable when determining its storage location stability. The loop-relative stability heuristic does not consider cycles introduced via recursion which could potentially impact its applicability to languages that favor recursion over loops. Nonetheless, in our evaluation of real-world Java programs, we found our approach to be quite eective. C does not address all sources of control-ow-related over-tainting. For example, if an element conditionally added to a resizable array triggers an array resize, then all of the elements copied from the original array to the new, larger array will be tainted with the label associated with the branch that triggered the element to be added to the resizable array. However, mature Java code typically uses System.arraycopy to copy elements from the original array to the new, larger array. System.arraycopy is a native method, and therefore not instrumented by P. P uses predened propagation logic for many native methods (regardless of the propagation policy). If this were not the case, there would likely be over-tainting in System.arraycopy. However, code performing this sort of array resizing is often contained within a single method. Therefore, it could be identied statically and handled specially by a taint tracking system.</p><p>Finally, C does not aim to address over-tainting from sources other than control-ow propagation, e.g., bit-packing and caches. This is an interesting topic for future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">EVALUATION</head><p>We performed an evaluation of C to answer the following research questions: RQ1: How accurately does C propagate taint tags? RQ2: How does C's performance vary with input sizes? RQ3: How does C impact a concrete application of taint tracking? RQ1 and RQ2 examine the utility of C. For these questions we report metrics, like F1 score, which quantify the accuracy of C by comparing taint tags propagated by C against a ground truth. F1 score is calculated as TP TP+0.5&#8676;(FP+FN ) where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. However, it is dicult to know for an arbitrary, real-world program which taint tags should propagate to which values and, therefore, challenging to specify a ground truth.</p><p>Past works have explored automated approaches for determining ground truth taint tag sets. For example, Bao et al. <ref type="bibr">[8]</ref> and Jee <ref type="bibr">[37]</ref> used automated approaches to determine ground truth taint tag sets in their evaluations of taint tracking systems. These automated approaches run the same program multiple times with dierent inputs and compare the outputs. Jee <ref type="bibr">[37]</ref> interpreted dierences in the outputs for dierent input values as indicating that taint tags from the input should have propagated to the output. Bao et al. <ref type="bibr">[8]</ref> assume that if dierent input values lead to the same output, the input's tag should not propagate to the output. However, it is infeasible to exhaustively explore any non-trivial input space. Thus, these approaches can only consider a sample of possible input values and their ecacy is tied to the quality of that sampling. This is problematic because automatically generating diverse samples from an input space is a challenging problem in its own right.</p><p>Furthermore, in ordered output, like text, dierent inputs may cause outputs to shift positions without impacting their actual values. This makes it challenging to determine which outputs were impacted by a particular change to the input. For example, consider a simple program that receives an HTTP request containing a "message" query parameter and returns an HTML document that contains that parameter. The inputs used in the execution in Figure <ref type="figure">4a</ref> and Figure <ref type="figure">4b</ref> dier only in a single character. However, the outputs dier in every character after "hello". An automated approach may misinterpret this change and expect the taint tag of the changed input character to propagate to every output character following "hello".</p><p>Due to these issues, we chose to not use an automated approach for determining expected taint tag sets and, instead, manually determined expected taint tag sets. Manually determining expected taint tag sets for arbitrary, real-world programs is error-prone and subjective. However, it is possible to leverage known properties of a program to determine a ground truth. This approach was used by McCamant and Ernst <ref type="bibr">[45]</ref> to evaluate Flowcheck, their system for measuring the amount of information that ows from secret program inputs to public outputs. One of the experiments conducted by McCamant and Ernst <ref type="bibr">[45]</ref> used bzip2, a lossless compression tool, as an evaluation subject. For their purposes, bzip2 was an appealing subject because the expected ow size for a valid input could easily be determined due to the nature of the tool. Specically, when presented with a compressible, secret input, all of bzip2's output (except a small amount of input-independent output like headers) will leak secret information because bzip2's compression algorithm is lossless.</p><p>Like McCamant and Ernst <ref type="bibr">[45]</ref>, we chose to leverage program properties to construct a ground truth for programs included in our benchmark for evaluating RQ1 and RQ2. Unfortunately, this limited the types of programs that could be included in the benchmark to those that met certain criteria. These criteria are discussed in detail in Section 6.2. However, the subjects used to evaluate RQ3 were not impacted by this limitation because we do not report quantitative metrics for RQ3 and, therefore, do not require a known ground truth.</p><p>RQ3 explores the practical impacts of C on an application of taint tracking. For this purpose, we chose to examine Clause and Orso <ref type="bibr">[19]</ref>'s approach for identifying failure-relevant inputs. The primary assessment originally performed by Clause and Orso <ref type="bibr">[19]</ref> was qualitative. Therefore, we decided to also provide a qualitative assessment in our case study instead of a quantitative one. This choice allowed us to be less constrained in our selection of subject applications for the case study since we no longer needed to determine a ground truth.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Experimental Setup</head><p>We evaluated C in comparison to three other propagation policies: , , and SCD, an implementation of the control ow semantics presented by Bao et al. <ref type="bibr">[8]</ref>. Each of these policies was implemented using P. All four of the policies propagate taint tags along data ows in a similar fashion to what was described by Bell and Kaiser <ref type="bibr">[10]</ref>. Additionally, all of the policies propagate taint tags from an index used to access an element of an array to the element accessed; this applies to both read and write accesses. Only the and policies propagate taint tags through INSTANCEOF operations as that instruction does not match the semantics targeted by C and SCD. C, , and SCD all propagate taint tags through pointer dereferencing. Each of the policies uses dierent semantics for propagating control ows. does not propagate along control ows. propagates along all control ows using the standard scoping semantics described in Section 2. SCD uses the standard scoping semantics, but only propagates along edges corresponding to equality checks. These edges are determined using the same rules described in Section 4.1.</p><p>All of our experiments were conducted on OpenJDK version 1.8.0_222. Tests for each propagation policy were run in a JVM that was instrumented according to the policy. Before a test started any taint tags on values in the JVM were cleared to ensure that taint tags from one test could not impact the results of another test.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Benchmark</head><p>The benchmark we created to answer RQ1 and RQ2 consists of methods for encoding and decoding text drawn from the OpenJDK Java Class Library (version 1.8.0_222) <ref type="bibr">[54]</ref> and seven dierent real-world Java projects:</p><p>&#8226; Apache Commons Text (version 1.8) <ref type="bibr">[4]</ref> &#8226; Apache Commons Codec (version 1.14) <ref type="bibr">[3]</ref> &#8226; Bouncy Castle Provider (version 1.46) <ref type="bibr">[43]</ref> &#8226; Guava (version 28.2-jre) <ref type="bibr">[29]</ref> &#8226; jsoup (version 1.11.3) <ref type="bibr">[39]</ref> &#8226; Spring Web (version 5.2.5) <ref type="bibr">[57]</ref> &#8226; Tomcat Embed Core (version 9.0.19) <ref type="bibr">[5]</ref> To the best of our knowledge, DroidBench <ref type="bibr">[6]</ref> is the only existing Java taint tracking benchmark that contains tests that consider control ows. However, DroidBench only contains four tests involving control ows and was designed for evaluating static analyzers. Thus, we choose not to use it in our evaluation.</p><p>The programs featured in our benchmark for encoding and decoding text are all information preserving. Each of these programs transforms one representation of a sequence of abstract entities into another representation of the same sequence of abstract entities. An abstract entity is an atomic unit of information. For example, when escaping text for inclusion in HTML, the character &lt;, the character entity reference &amp;lt;, and the numeric character reference &amp;#60; all represent the same abstract entity. The information-preserving nature of the programs included in our benchmark provides a clear ground truth: the expected set of taint tags for a program output contains the taint tags of the program inputs that represent the same abstract entity that the output represents. For example, a program might perform percent encoding on the string :@ resulting in the output string %3A%40. The rst percent-encoded octet, %3A, represents the same abstract entity represented by the input character :. The second percent-encoded octet, %40, represents the same abstract entity represented by the input character @. Thus, the expected label set for each of the characters in the rst octet would contain the unique label assigned to :, and the expected label set for each of the characters in the second octet would contain the unique label assigned to @. Each method selected for the benchmark we created had to meet several criteria. First, the method had to be deterministic. For valid input, the output of the method had to represent the same sequence of abstract entities as the input. A taint tracking policy that does not propagate control ows should not produce false positives when tracking taint tags through the method. Likewise, a taint tracking policy that propagates along every control ow using the standard semantics described in Section 2 should not produce false negatives when tracking taint tags through the method. This limits the scope of the benchmark to situations in which observed under-tainting or over-tainting is likely related to control ow propagation, as opposed to other sources of imprecision such as caches and bit-packing.</p><p>Every test in the benchmark uses one of the selected methods and follows the same general format. The test starts by creating an input representing a sequence of abstract entities appropriate for the method. Each character (or byte in the case of hex encoding) in the input is assigned a unique taint label. The test then transforms the tainted input using the target test method. The taint label set that is propagated to each character (or byte in the case of hex decoding) of the method's output is compared to the set of labels expected for that element. Labels in both the propagated and expected sets are counted as true positives. Labels in the expected set, but not the propagated set are counted as false negatives. Labels in the propagated set, but not the expected set are counted as false positives.</p><p>In some cases, a single method is used in multiple tests (e.g., the method PercentCodec.encode() from Apache Commons Codec is used in tests in the groups unicode-percent-decode and reservedpercent-decode). In these instances, we choose to separate dierent transformations performed by a single method into multiple tests so that their results could be more directly compared to a dierent implementation of the same transformation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">RQ1: Accuracy</head><p>In order to compare the accuracy of , , SCD, and C, we applied each of the propagation policies to the tests in our benchmarks. For each test we created an input sequence of 8 abstract entities (RQ2 considers dierent length inputs). We recorded the number of false negatives, false positives, and true positives that were reported by each propagation policy on each test.</p><p>Table <ref type="table">1</ref> presents our ndings for the dierent propagation polices on the benchmarks. Overall, C had the highest F1 score on a majority of the tests, 43 out of the 48 total tests. had the highest F1 score on 23 tests. SCD had the highest F1 score on 21 tests. And nally, had the highest F1 score on only 3 tests. Due to the selection criteria for the methods used in the benchmark, reports no false positives and reports no false negatives. In some of the tests, fails to propagate any tags to the output because there are no data ows between the input and the output. For example, in the html-escape test using jsoup's Entities class, input values ow into a switch statement which selects a constant string to append to the output; all of the information that ows between the input and the output is transferred through the switch statement. In other tests, does report some or all of the possible true positives because data ows are able to fully or partially capture the relationship between the input and the output. For instance, for tests in the reserved-percent-encode group, reports two true positives for every false negative. This occurs because tests in this group take a sequence of URI reserved characters and encode them using percent-encoded octets (e.g., the character @ would end up encoded as %40). There is a data ow between the value of the input character and the two hex digits in the octet, but there is only a control ow between the input character and the percent sign. Thus, correctly tracks the relationship between an input character and the two hex digits of the output octet. However, it misses the relationship between an input character and the percent sign of the output octet. By contrast, never missed a relationship between an input and an output, but reported a relatively large number of false positives on all but three tests. In many cases, marked all of the inputs as being related to all of the outputs.</p><p>SCD reported relatively few false negatives, and 102 out of the 132 of these false negatives occurred in tests from the unicode-percent-encode group. Tests in the unicode-percent-encode group take a sequence of non-ASCII characters and transform them into UTF-8, percent-encoded octets. Typical implementations of this transformation determine whether an input character needs to be encoded either by checking if it falls into some value range, is outside some value range, or is not present in some set of values that do not need to be encoded. These checks are generally not equality checks, so SCD does not propagate along the branches associated with them resulting in under-tainting. C also under-taints in these tests for the same reason. C only reported false positives in two tests, as opposed to the 28 tests in which SCD reported false positives and the 45 tests in which reported false positives. Both of these two tests have the same problematic ow. A simplied version of this ow is shown in Table <ref type="table">1</ref>. Comparison of , , SCD, and C on our benchmark. Each row reports results for a single test. Tests are grouped by the type of transformation they perform. For each of the propagation policies, we report the number false negatives (FN), the number of false positives (FP), the number of true positives (TP), and the F1 score (F1) recorded for each test. The highest F1 score or scores for each test are colored purple. Listing 5. C marks both the branch on line 6 and the statement on line 12 as unstable with respect to all loops that contain them. Thus, C propagates taint tags from the predicate of the branch on line 6 to the assigned value on line 12, c, resulting in over-tainting. In this case, the loop header (c == % ) is the source of the ow, rather than the ow occurring within the body of the loop. Had the loop header instead been written as input[inputPosition] == % , then the load would have been considered as outside of the binding scope of the loop header, and C would not have over-tainted.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.4">RQ2: Accuracy Versus Input Size</head><p>Taint tags often accumulate on program data over loop iterations leading to progressively more over-tainting on each iteration. Many common applications of taint tracking (e.g., fuzzing guidance) tend to use relatively large inputs which often trigger a large number of loop iterations. In these cases, it may be impractical to use the standard semantics for control ow propagation due to the "explosion" of taint tags resulting from their accumulation in loops. To evaluate whether C could be used to address this issue, we applied the propagation policies (, , SCD, and C) to our benchmark with inputs of various lengths <ref type="bibr">(8,</ref><ref type="bibr">16,</ref><ref type="bibr">32,</ref><ref type="bibr">64,</ref><ref type="bibr">128,</ref><ref type="bibr">256,</ref><ref type="bibr">512</ref>, and 1024 abstract entities). We then recorded the F1 score for each policy on each test for each of the lengths and produced a series of plots. We examined these plots to determine how the F1 score for each policy changed as the size of the input scaled. Figure <ref type="figure">5</ref> shows four of these plots that represent common cases seen across many of the plots.</p><p>In all of the tests the F1 score for was constant as the size of the input increased. The behavior of the F1 scores for the other three policies either stayed constant, decreased to some non-zero value, or decreased to zero as the size of the input increased. We divided tests into categories based on how the F1 score reported for the dierent policies changed as the size of the input increased.</p><p>There were 3 tests where the F1 scores for all of the policies remained constant as the input size increased. In 17 tests, the F1 scores for , SCD, and C remained constant, but the F1 scores for decreased to zero. A plot for one of these tests is depicted in Figure <ref type="figure">5a</ref>. In 6 tests, the F1 scores for and C remained constant, the F1 scores for decreased to zero, and the F1 scores for SCD decreased to some non-zero value. A plot for one of these tests is depicted in Figure <ref type="figure">5b</ref>. In 20 of the tests, the F1 scores for and C remained constant, but the F1 scores for and SCD decreased to zero. A plot for one of these tests is depicted in Figure <ref type="figure">5c</ref>. There were only 2 tests in which the F1 scores for remained constant, and the F1 scores for , SCD, and C decreased to zero. A plot for one of these tests is depicted in Figure <ref type="figure">5d</ref>.</p><p>Overall, C's F1 score stayed constant in all but 2 tests, similar to . By contrast, the F1 score of and SCD typically degraded as scaled. this C's control ow tracking behaved more similarly to data ow tracking than the control ow tracking performed by and SCD.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RQ3: Impact on a Concrete Application of Taint Tracking</head><p>To examine the eect of C on a practical application of taint tracking, we implemented a prototype of Clause and Orso <ref type="bibr">[19]</ref>'s approach for identifying which failure-inducing inputs (i.e., inputs that produce a failure) are failure-relevant (i.e., useful for analyzing the failure). We used this prototype to perform a case study exploring the impact of C on Clause and Orso <ref type="bibr">[19]</ref>'s approach. We conducted an experiment similar to the one originally performed by Clause and Orso <ref type="bibr">[19]</ref>. In this experiment, we provide a qualitative assessment of the failure-inducing inputs that were marked as relevant by the dierent propagation policies, , , SCD, and C, on ve failures from popular, open-source Java projects. Our assessment is limited to the impact of the propagation policies on the values marked as failure-relevant; it does not examine the usefulness of Clause and Orso <ref type="bibr">[19]</ref>'s approach in general since that was explored in the original work.</p><p>Table <ref type="table">2</ref> details the ve failures that we included in our case study. Each of the these failures was chosen from an issue reported in a project's issue tracker that described a system failure. Only issues in which the system accepted a human-interpretable input or inputs were considered since it would otherwise be dicult to apply Clause and Orso <ref type="bibr">[19]</ref>'s approach. For the sake of SCD C simplicity, we only selected issues in which the described failure could be consistently reproduced. Furthermore, we only selected issues that had already been xed in a single, associated commit in order to facilitate analysis of the failure. A similar criterion was used by Just et al. <ref type="bibr">[40]</ref> in the selection of bugs for Defects4J, a widely-used database of real Java defects. We used these xing commits along with the issues led in the issue trackers to construct appropriate failure-inducing inputs for each failure.</p><p>Additionally, for each of the failures, we identied developer comments made in either the xing commit or the issue that discussed the conditions that produced the failure. These comments reect the developers' understanding of the failure and the conditions under which the failure manifests. However, mapping these conditions to specic portions of the input is subjective, and the developers' understanding of the failure may be awed. Thus, these developer-identied failure conditions cannot be used as a ground truth for the failure-relevant portions of an input. For each of the failures, we also created a simplied, failure-inducing input using a combination of random trials of reduced inputs and Zeller and Hildebrandt <ref type="bibr">[73]</ref>'s ddmin algorithm. The ddmin algorithm produces a failure-inducing input that is guaranteed to be "1-minimal", i.e., removing any single element would cause the input to no longer induce the failure; however, the simplied input is not guaranteed to be minimal <ref type="bibr">[73]</ref>. Even though a simplied, input can be useful in understanding a failure, it does not necessarily correspond to the the failure-relevant portions of the original, failure-inducing input. For example, even a minimized, failure-inducing input can contain portions of the input that are necessary to pass a validation step, but not necessarily useful for investigating the failure. While neither developer-identied failure conditions nor simplied failure-inducing inputs can be used as a ground truth for the failure-relevant portions of an input, they both provide valuable insights into the nature of a failure. Therefore, we used these insights to guide our qualitative assessment of the failure-inducing inputs that were marked as relevant by the dierent propagation policies.</p><p>Table <ref type="table">2</ref>. Evaluation subjects used in the RQ3 case study. For each subject, we list the name and version of the project (Project), the issue in which the failure was reported (Issue), and the commit in which the bug that produced the failure was corrected (Fix).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Project Issue Fix</head><p>Checkstyle (version 8.37)</p><p>[16] #8934 <ref type="bibr">[15]</ref> 70c7ae0 [14] Google Closure Compiler (version v20140814) <ref type="bibr">[22]</ref> #652 <ref type="bibr">[21]</ref> aac5d11 <ref type="bibr">[20]</ref> Mozilla Rhino (version 1.7.11) <ref type="bibr">[48]</ref> #539 <ref type="bibr">[47]</ref> 0c0bb39 <ref type="bibr">[46]</ref> OpenRene (version 3.4-SNAPSHOT)</p><p>[53] #2584 <ref type="bibr">[52]</ref>  Our prototype implementation applies a unique taint tag to each character input presented to the applications. These taint tags are propagated in accordance with a particular propagation policy as described in Section 6.1. In addition to this policy-based propagation, our prototype employs special propagation logic for exceptions. If a Java exception is thrown by the execution of an instruction (as detailed in the Java Virtual Machine Specication <ref type="bibr">[44]</ref>), then our prototype propagates the taint tags of the operands of that instruction to the exception. Any input values associated with taint tags that propagate to the exception that produces the studied system failure are marked as failure-relevant.</p><p>For each of the bugs that we analyzed, we display the entire program input with annotations that specify which portions of the input were marked as failure-relevant by each policy. These visualizations demonstrate the impact of propagation policies on a concrete software engineering application of taint tracking. Across all ve examples, marks only a single character of input as failure-relevant, demonstrating the need to propagate taint tags along control ows in this application. Meanwhile, marks almost every input character as failure-relevant, underscoring the consequences of control-ow-related over-tainting. We discuss in detail the input values marked as failure-relevant by each policy for each of the studied failures below. Listing 10. The simplified, failure-inducing SQL input for the H2 failure reported in issue #2550 [31] created as described in Section 6.5. Whitespace characters have been added to improve the readability of the input.</p><p>Listing 11. The simplified, failure-inducing Java input for the Checkstyle failure reported in issue #8634 <ref type="bibr">[15]</ref> created as described in Section 6.5. Whitespace characters have been added to improve the readability of the input.</p><p>Checkstyle. Checkstyle is a static analysis tool that nds and reports violations of coding standards in Java code <ref type="bibr">[16]</ref>. We studied the failure reported in Checkstyle issue #8934 <ref type="bibr">[15]</ref>. A Checkstyle developer described this failure by saying, "FinalLocalVariable throws a NPE on Switch expression in assignment" <ref type="bibr">[15]</ref>. However, in the xing commit for the failure, a dierent Checkstyle developer noted that, "assigning to [the] switch is not the problem . . . wrapping [the] switch inside a function foo() makes the problem disappear" <ref type="bibr">[14]</ref>. These developer comments indicate that the failure reported in Checkstyle issue #8934 occurs when the "FinalLocalVariable" rule is applied to a Java source code class containing a switch expression that is not contained within a method-level or block-level scope <ref type="bibr">[14,</ref><ref type="bibr">15]</ref>.</p><p>The failure-inducing input that we used to reproduce the failure reported in Checkstyle issue #8934 was a Java source code class based on the Java source code class included in issue #8934 <ref type="bibr">[15]</ref>. Figure <ref type="figure">6</ref> shows the input Java source code class in its entirety along with the portions of the input identied as failure-relevant by each propagation policy. Listing 11 depicts the simplied, failureinducing input produced for the failure-inducing input in Figure <ref type="figure">6</ref>. This simplied, failure-inducing input consists of a Java class declaration containing a eld declaration that initializes the eld to be of the input that are not likely to be helpful to developers trying to debug the failure. C reports fewer of these regions than SCD.</p><p>Google Closure Compiler. The Google Closure Compiler is a tool for compiling and optimizing JavaScript code <ref type="bibr">[22]</ref>. The failure we selected from the Google Closure Compiler was reported in issue #652 <ref type="bibr">[21]</ref>. In the x for this failure, a Closure developer noted that Closure should "report a parse error if there is a throw followed by a semicolon or newline" <ref type="bibr">[20]</ref>. This developer continued by noting that according to grammar for JavaScript, "'throw;' or 'throw n expr;' are illegal" <ref type="bibr">[20]</ref>. These developer comments indicate that the failure manifests when Closure compiles code containing a malformed throw statement where the throw keyword is followed by a semicolon or newline and not an expression.</p><p>We reproduced the failure reported in issue #652 using JavaScript source code based on the JavaScript source code provided in the original issue. Figure <ref type="figure">7a</ref> shows the input JavaScript source code in its entirety along with the portions of the input identied as failure-relevant by each propagation policy. Listing 6 depicts the simplied, failure-inducing input produced for the failureinducing input in Figure <ref type="figure">7a</ref>. The simplied, failure-inducing input consists of just the keyword throw. Both the developer-identied failure conditions and the simplied, failure-inducing input indicate that the failure described in issue #652 is triggered by the malformed throw statement on line 9 of Figure <ref type="figure">7a</ref>.</p><p>As in the Checkstyle failure, did not mark any portions of the input as failure-relevant.</p><p>, SCD, and C all marked the malformed throw statement as failure-relevant. However, also marked almost every other input character as failure-relevant. SCD and C report some additional regions of the input that may obfuscate the cause of the failure. Once again, C reports fewer of these regions than SCD. Mozilla Rhino. Mozilla Rhino is an implementation of JavaScript written in Java <ref type="bibr">[48]</ref>. Rhino includes a compiler for translating JavaScript source code into Java class les. Issue #539 <ref type="bibr">[47]</ref> describes the failure that we examined. In the x for this failure, a Rhino developer noted that the failure "cropped up when generators were used in a function that had a try..catch..nally block and a yield after the nally" <ref type="bibr">[46]</ref>. This comment indicates that the failure manifests when Rhino tries to compile JavaScript source code that contains generator function legacy generator function in a expression is present after finally block.</p><p>reproduced the failure in issue #539 using JavaScript source code based on a test case that was added in the commit that xed the failure. Figure <ref type="figure">7c</ref> shows the input JavaScript source code in its entirety along with the portions of the input identied as failure-relevant by each propagation policy. Listing 9 depicts the simplied, failure-inducing input produced for the failure-inducing input in Figure <ref type="figure">7c</ref>. The simplied, failure-inducing input contains a legacy generator function with a yield expression following a try-finally statement. Both the developer-identied failure conditions and the simplied, failure-inducing input indicate that the failure described in issue #539 is related to the finally on line 6 and the yield on line 11 of Figure <ref type="figure">7c</ref>.</p><p>As shown in Figure <ref type="figure">6</ref>, did not mark any portions of the input as failure-relevant and marked almost every input character as failure-relevant. Unexpectedly, SCD and C often marked only part of a keyword as failure relevant, for example, both policies marked only the "i" in finally as failure relevant. This was due to the structure of the code that Rhino uses for lexing, converting characters sequences into tokens, JavaScript inputs <ref type="bibr">[48]</ref>. Part of this code is displayed in Listing 12. In the case of the keyword finally, there is not control ow from all of the characters of the input string "nally" to the token produced from it; there is only a ow from the character "i".</p><p>OpenRene. OpenRene is a tool for manipulating, managing, and visualizing data <ref type="bibr">[53]</ref>. We studied the OpenRene failure reported in issue #2584 <ref type="bibr">[52]</ref>. A developer for OpenRene described H2 Database Engine. H2 is a Relational Database Management System (RDBMS) implemented in Java that supports a subset of Structured Query Language (SQL) <ref type="bibr">[32]</ref>. H2 issue #2550 <ref type="bibr">[31]</ref> details the failure that we selected. In the xing commit for this failure, an H2 developer described the failure as a "NullPointerException with MERGE containing unknown column in AND condition of <ref type="bibr">WHEN" [30]</ref>. This comment indicates that the failure occurs when H2 tries to execute a MERGE statement containing a WHEN NOT MATCHED clause with an AND expression that refers to an unknown column <ref type="bibr">[30,</ref><ref type="bibr">31]</ref>.</p><p>We reproduced the failure reported in issue #2550 using the SQL statements provided in issue #2550 <ref type="bibr">[31]</ref>. Figure <ref type="figure">7e</ref> shows the input SQL statements entirety along with the portions of the input identied as failure-relevant by each propagation policy. Listing 10 depicts the simplied, failure-inducing input produced for the failure-inducing input in Figure <ref type="figure">7e</ref>. The simplied, failureinducing input contains a CREATE TABLE statement and a MERGE statement. Both the developeridentied failure conditions and the simplied, failure-inducing input indicate that the command MERGE, the clause WHEN NOT MATCHED, the keyword AND, and the unknown reference, "s.b", that appears on line 8 of Figure <ref type="figure">7e</ref> are relevant to the failure.</p><p>As depicted in Figure <ref type="figure">7e</ref>, did not mark any portions of the input as failure-relevant. Conversely, reported the entire input as failure-relevant. SCD and C only marked portions of the MERGE statement as failure-relevant. SCD marked almost the entire MERGE statement as failure-relevant. C only marked small portions of the MERGE statement as relevant; it is unclear whether these smaller portions better illuminate the cause of the failure.</p><p>Summary. When considering the input values marked as failure-relevant by each propagation policy, it is clear that and are unlikely to be useful to a developer attempting to debug a failure. This underscores the need for alternative taint tag propagation semantics. The portions of each input marked as failure-relevant by SCD and C appear to be more useful for analyzing the failures. SCD tended to mark more characters as failure-relevant than C and, in some cases, marked most of the input as failure-relevant.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.6">Threats to Validity</head><p>One threat to the validity of our experiments stems from our selection of evaluation subjects for the benchmark. Our benchmark tests a limited number of methods from only a handful of projects. As discussed in Section 6, it is challenging to determine which taint tags should propagate to which values for an arbitrary, real-world program. Thus, only methods that met certain criteria (detailed in Section 6.2) could be included in the benchmark. As a result, the benchmark is not necessarily representative of all Java programs. However, we selected these methods based on a search for popular Java libraries.</p><p>Additionally, the ground truth expected label sets we used for the benchmark may not be appropriate for every application of taint tracking. For example, in some applications it could be desirable for propagated labels to reect looser or stronger relationships than those reected in our ground truth. Nonetheless, we feel that our ground truth selection follows best practices of state-of-the-art taint tracking evaluations <ref type="bibr">[45,</ref><ref type="bibr">56]</ref>.</p><p>Unlike the benchmark, the case study evaluation that we performed did not require a manually specied ground truth. However, the case study evaluation was limited to a single application of taint tracking and examined a limited number of subjects and failures. Therefore, it may not generalize to other applications of taint tracking or other subjects.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">RELATED WORK</head><p>Control ow tracking approaches. Several existing taint tracking systems do not oer support for control ow tracking <ref type="bibr">[17,</ref><ref type="bibr">50,</ref><ref type="bibr">58]</ref>. However, some systems support the standard semantics discussed in Section 2 <ref type="bibr">[11,</ref><ref type="bibr">12,</ref><ref type="bibr">18]</ref>. Several of these systems attempt to address control-ow-related over-tainting. Bao et al. <ref type="bibr">[8]</ref> propose propagating control ows only along branches that correspond to equivalence checks. Kang et al. <ref type="bibr">[41]</ref> put forward a similar approach, but instead target branches related to information-preserving transformations. These branches are determined by analyzing execution traces to nd control ow paths that can only be reached by a single input value. Attariyan and Flinn <ref type="bibr">[7]</ref> mitigate over-tainting from control ows by reducing the weight of a taint tag when it is propagated via a control ow. Cox et al. <ref type="bibr">[24]</ref> address control-ow-related over-tainting in their approach for preventing Android applications from leaking users' passwords by quantifying the amount of information revealed by control ows. This quantity is then used to decide whether to propagate taint tags along a control ow. Unlike these approaches, C uses an alternative denition for control ow scopes and propagates to a subset of statements within control ows' scopes.</p><p>Applications of control ow tracking. Both static and dynamic information ow analyses have been applied to a variety of applications in which control ows could signicantly impact results. Sabelfeld and Myers <ref type="bibr">[60]</ref> explore approaches to security-type systems and semanticsbased security models for enforcing information-ow condentiality policies. They focus on a noninterference policy for condentiality which is highly conservative and must therefore take implicit ows, such as control ows, into account. McCamant and Ernst <ref type="bibr">[45]</ref> propose measuring the maximum ow of secret information with a network ow capacity model instead of using a taint tracking approach to condentiality enforcement. This quantitative approach may support dierent techniques for addressing control ows than traditional taint tracking. Halfond et al. <ref type="bibr">[33]</ref> provide an automated technique for detecting and preventing SQL injection attacks using "positive-tainting". Positive-tainting tracks the ow of trusted values opposed to the more common approach of "negative" tainting which tracks untrusted values. However, control ows can result in malicious values being built from trusted ones; this issue is not addressed by Halfond et al. <ref type="bibr">[33]</ref>. Clause and Orso <ref type="bibr">[19]</ref> present Penumbra, a tool for identifying the subset of a failure-inducing inputs that are failure-relevant in order to assist with program debugging. They found that for some programs propagating taint tags along control ows resulted in a prohibitively large number of program inputs being marks as failure-relevant. Huo and Clause <ref type="bibr">[36]</ref> use dynamic taint analysis to identify test cases that check too much of the program state making them dicult to maintain and test cases that check too little of the program state reducing their ability to detect bugs. Their approach requires both data and control ows to be tracked. Various existing fuzzing tools leverage taint tracking to generate "interesting" inputs capable of nding bugs deep in a program's execution <ref type="bibr">[28,</ref><ref type="bibr">59,</ref><ref type="bibr">70]</ref>. To the best of our knowledge, none of these tools propagate taint tags along control ows. However, we believe that these tools could likely benet from applying C's control ow propagation semantics.</p><p>Evaluating taint tracking systems. An assortment of techniques have been used to evaluate the accuracy of taint tracking systems. Jee <ref type="bibr">[37]</ref>'s tool, TaintMark, looks at system outputs when given dierent input values to determine if taint tags should propagate from the inputs to the outputs. Dierences in the outputs for dierent input values are interpreted as meaning that taint tags from the input should have propagated to the output. By contrast, ReproDroid, Pauck et al. <ref type="bibr">[56]</ref>'s framework for comparing Android taint analysis tools, requires the ground truth for test cases to be manually classied. Pauck et al. <ref type="bibr">[56]</ref> note that this manual determination of the ground truth is necessary because, "tools that could potentially be used to derive the ground truth are at the same time the tools we want to evaluate." Inspired by Pauck et al. <ref type="bibr">[56]</ref>, our evaluation does not use an automatically-determined ground truth. Other evaluations have also used application specic techniques <ref type="bibr">[7,</ref><ref type="bibr">35,</ref><ref type="bibr">36,</ref><ref type="bibr">59]</ref>.</p><p>Various studies have considered the impact of propagating implicit and control ows on taint tracking results. Staicu et al. <ref type="bibr">[66]</ref> investigate the prevalence of implicit ows and the criticality of detecting implicit ows when using dynamic taint tracking to enforce security and privacy policies in JavaScript applications. They conclude that it is sucient to consider only data ows in order to detect security-related source-to-sink ows, but in order to discover privacy-related source-to-sink ows, implicit ows also needed to be considered. King et al. <ref type="bibr">[42]</ref> examine explicit and implicit ows that are reported by a security-typed language enforcing noninterference. They found that implicit ows caused the majority of true positives reported, but also caused a large number of the false positives. Additionally, they found that the vast majority of exception-induced ows due to unchecked exceptions were false alarms that could not occur at runtime. Exception-induced information ows are beyond the scope of this work. Clause et al. <ref type="bibr">[18]</ref> explore the relationship between dierent taint propagation approaches and the amount of memory tainted nding that propagating control ows resulted in signicantly more tainted memory. However, they did not evaluate the accuracy of dierent propagation policies.</p><p>Dynamic slicing. One related technique to taint tracking is dynamic slicing <ref type="bibr">[67,</ref><ref type="bibr">71]</ref>. Dynamic slicing computes the subset of program statements that aected values at a particular program point for a particular program execution or executions, referred to as a "slice". Like taint tracking, slicing aims to reason about relationships in programs. However, slicing relates values to statements whereas, taint tracking relates values to other values.</p><p>Early work on dynamic slicing examined imprecision related to inaccurate dynamic dependence calculations. These errors were caused by analyses that did not distinguish between dierent executions of the same instruction. Dynamic taint tracking systems typically track control ows using the stack-based approach described in Section 2. This stack-based approach accurately calculates dynamic dependences and is not subject to the inaccuracy of early dynamic slicers <ref type="bibr">[72]</ref>. Agrawal and Horgan <ref type="bibr">[1]</ref> introduces the notion of the "Dynamic Dependence Graph" (DDG) which contains a node for each instruction execution and edges between instruction executions that are dynamically dependent. They propose a technique for calculating precise dynamic slices by using DDGs and more ecient technique that uses a compacted version of the DDG. Zhang et al. <ref type="bibr">[76]</ref> improve upon the work of Agrawal and Horgan <ref type="bibr">[1]</ref> by using novel data structures that reduce the computational cost of constructing the DDG and leveraging an on-demand construction of DDGs to reduce memory usage. Zhang and Gupta <ref type="bibr">[75]</ref> explore the space and time performance benets of leveraging a novel, highly compact representation of the DDG. Unlike works on precise dynamic slicing, inaccurate dynamic dependence computations are not the source of the control-ow-related imprecision addressed in this work. Instead, C aims to avoid propagating along control ows arising from dynamic dependences that are not likely to be information preserving despite being genuine dynamic dependences.</p><p>Although not identical to the control-ow-related over-tainting problem that C aims to address, prior work on slicing has proposed techniques for reducing the size of slices to better support automated debugging and program understanding tasks. Zhang et al. <ref type="bibr">[74]</ref> use a heuristic approach based on the correctness of outputs computed using a statement to identify and remove statements that are unlikely to be related to a fault from computed slices. In contrast to Zhang et al. <ref type="bibr">[74]</ref>'s approach, C's heuristic is suitable for applications other than fault analysis and does not require an oracle for determining the correctness of outputs. A related technique, thin slicing, was proposed by Sridharan et al. <ref type="bibr">[65]</ref>. Thin slicing only includes statements which are part of some sequence of assignments that compute and copy a value to a target location in the slice for the target location <ref type="bibr">[65]</ref>. Whereas C uses binding scopes and the loop-relative stability heuristic to identify control ow relationships that are unlikely to be information preserving, thin slicing simply excludes all control dependences in its slice construction. An interesting topic for future work might be to apply C's notion of binding scopes and loop-relative stability to slicing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">CONCLUSION</head><p>Techniques that require high-precision, dynamic taint tracking are highly prevalent in software engineering research <ref type="bibr">[7,</ref><ref type="bibr">27,</ref><ref type="bibr">28,</ref><ref type="bibr">35,</ref><ref type="bibr">36,</ref><ref type="bibr">49,</ref><ref type="bibr">59,</ref><ref type="bibr">61,</ref><ref type="bibr">63,</ref><ref type="bibr">70]</ref>. Many of these techniques would greatly benet from accurately propagated control ow relationships. However, the standard control ow propagation semantics are mismatched with the standard data ow semantics. This mismatch often results in severe over-tainting making the standard control ow propagation semantics impractical for most applications. Prior approaches to mitigate this over-tainting fail to address many of its fundamental sources. C, our alternative control ow propagation semantics, decreases the scope of control ows and leverages a novel heuristic, loop-relative stability, to determine whether a control ow's taint tags should propagate to a particular statement. We compared C to three other control ow propagation policies on a benchmark containing 48 tests consisting of programs for encoding and decoding text. C had the highest F1 score on 43 out of the 48 total tests when using test inputs of a xed size. Additionally, when the size of test inputs scaled, C's F1 score remained constant on all but 2 of the 48 tests indicating that C helped to mitigate taint explosions associated with large inputs. We also examined the impact of C of a concrete application of taint tracking, automated debugging. C and the experiments described in this paper are publicly available under the BSD 3-Clause License <ref type="bibr">[34]</ref>.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: January 2021.</p></note>
		</body>
		</text>
</TEI>
