<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Principled Data-Driven Decision Support for Cyber-Forensic Investigations</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date when="2023-06-27">June 27, 2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10429966</idno>
					<idno type="doi">10.1609/aaai.v37i4.25628</idno>
					<title level='j'>Proceedings of the AAAI Conference on Artificial Intelligence</title>
<idno type="issn">2159-5399</idno>
<biblScope unit="volume">37</biblScope>
<biblScope unit="issue">4</biblScope>					

					<author>Soodeh Atefi</author><author>Sakshyam Panda</author><author>Emmanouil Panaousis</author><author>Aron Laszka</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[In the wake of a cybersecurity incident, it is crucial to promptly discover how the threat actors breached security in order to assess the impact of the incident and to develop and deploy countermeasures that can protect against further attacks. To this end, defenders can launch a cyber-forensic investigation, which discovers the techniques that the threat actors used in the incident. A fundamental challenge in such an investigation is prioritizing the investigation of particular techniques since the investigation of each technique requires time and effort, but forensic analysts cannot know which ones were actually used before investigating them. To ensure prompt discovery, it is imperative to provide decision support that can help forensic analysts with this prioritization. A recent study demonstrated that data-driven decision support, based on a dataset of prior incidents, can provide state-of-the-art prioritization. However, this data-driven approach, called DISCLOSE, is based on a heuristic that utilizes only a subset of the available information and does not approximate optimal decisions. To improve upon this heuristic, we introduce a principled approach for data-driven decision support for cyber-forensic investigations. We formulate the decision-support problem using a Markov decision process, whose states represent the states of a forensic investigation. To solve the decision problem, we propose a Monte Carlo tree search based method, which relies on a k-NN regression over prior incidents to estimate state-transition probabilities. We evaluate our proposed approach on multiple versions of the MITRE ATT&CK dataset, which is a knowledge base of adversarial techniques and tactics based on real-world cyber incidents, and demonstrate that our approach outperforms DISCLOSE in terms of techniques discovered per effort spent.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Cybersecurity is an ever-growing concern for businesses and individuals alike. While it is not always possible to prevent cybersecurity incidents, one should always strive to mitigate them promptly and to learn from them as much as possible. Thus, it is imperative in the aftermath of an incident to promptly discover the techniques that the threat actors used to breach security. By discovering how threat actors operate, we can learn to develop more effective cyber defences and to detect, sufficiently early, cyber-attacks that bypass these defences. To this end, victims of cyber incidents can launch cyber-forensic investigations.</p><p>Since we do not know in advance which adversarial techniques the threat actors used, we have to prioritize the investigation of these techniques under uncertainty, not knowing whether the investigation of a technique will waste precious time or reveal crucial information. In prior work, <ref type="bibr">Nisioti et al. (2021a)</ref> demonstrated that one can effectively prioritize the investigation of techniques based on datasets of prior incidents, exploiting the fact that most threat actors follow common tactics. However, this existing data-driven approach is based on a heuristic that does not approximate optimal decisions, regardless of the size of the dataset or the available computational power.</p><p>To address this limitation, we propose a principled data-driven approach for prioritization. We introduce a novel model of cyber-forensic investigations based on Markov decision processes, which enables us to formally define the prioritization problem. Note that this problem is inherently challenging for multiple reasons. First, datasets of prior incidents are limited in size, as many businesses are reluctant to share detailed information about their security. Second, although threat actors follow common tactics, it is inherently difficult to predict which particular techniques were used in a particular incident.</p><p>In light of these challenges, we propose a computational approach for decision support that combines a non-parametric machine-learning model, based on k-nearest neighbors, with a Monte Carlo tree search. By working directly with the data instead of training a parametric model, we get as much "mileage" out of the limited data as possible.</p><p>The remainder of this paper is organized as follows. In Section 2, we model cyber-forensic investigations as Markov decision processes and formulate the problem of cyber-forensic decision support. In Section 3, we describe our computational approach, which is based on k-NN regression and Monte Carlo tree search. In Section 4, we evaluate our approach on datasets of real-world incidents, demonstrating that it outperforms the state-of-the-art approach. In Section 5, we give a brief overview of related work. Finally, in Section 6, we provide concluding remarks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Model and Problem Formulation</head><p>To formally define the problem of providing optimal decision support for cyber forensics, we first introduce fundamental assumptions and notations for the key elements of cyber-forensic investigations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Adversarial Techniques and Cyber Forensics</head><p>Adversarial Techniques We consider a forensic investigation whose objective is to discover, as soon as possible, what techniques the threat actors used in the incident that is under investigation. The motivation for discovering how the threat actors breached security is to learn from the incident and to prevent or mitigate future breaches. We let A denote the set of all techniques that threat actors may have used in an incident (e.g., DLL search order hijacking, spearphishing attachment, drive-by compromise). In practice, the set A can be taken from a well-known knowledge base, such as MITRE ATT&amp;CK <ref type="bibr">(Barnum 2012)</ref>, which provides a comprehensive list of common adversarial techniques.</p><p>Incident Next, we let I Y ⊆ A denote the set of techniques that were used by threat actors in incident I; and for ease of presentation, we let I N = A \ I Y denote the set of techniques that were not used in incident I. A key aspect of cyber-forensic investigations is that I Y is not known by the forensic experts at the beginning of the investigation. Since an investigation is typically launched when a breach is detected, forensic experts might know some elements of I Y (e.g., techniques that triggered an intrusion detection system, leading to the detection of the breach); however, discovering set I Y is the very objective of the investigation. To capture this uncertainty that forensic experts face, we model the set of techniques I Y as a random variable, whose realization is unknown at the beginning of the investigation. The distribution of this random variable represents the threat actors' tactics, that is, how likely they are to use certain subsets of techniques in a cyber attack.</p><p>Prior Incidents In practice, we can estimate the distribution of I Y from prior incidents, which have already been investigated. Note that for this estimation, we can use a public repository of prior incidents, which were perpetrated by other threat actors against other targets (e.g., the MITRE Cyber Threat Intelligence Repository from MITRE Corporation). While there are differences between how different threat actors operate, most threat actors do follow common tactics. Therefore, subsets of techniques that were frequently used in prior incidents are likely to have been used in the incident that we are currently investigating. We let I denote the set of prior incidents; and we assume that for each prior incident Î ∈ I, we know the set Î Y ⊆ A and that Î Y was drawn from the same distribution as I Y .</p><p>Cyber-forensic Investigation During the investigation of incident I, forensic experts discover the realization of I Y step by step. In each step, the experts choose and investigate a technique a ∈ A that they have not investigated yet, and they discover whether or not the threat actors used technique a. Note that we assume this discovery to be perfect (i.e., if a technique is investigated, experts correctly learn whether it was used or not); our model and computational approach could be generalized in a straightforward manner to account for false negatives and positives in discovery, but this is not the focus of our work.</p></div>
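<div xmlns="http://www.tei-c.org/ns/1.0"><p>For concreteness, the following Python sketch shows one possible way to represent the elements introduced above (the technique universe A, an incident's used and unused sets, and the investigation state); all identifiers are illustrative assumptions rather than the paper's actual implementation.</p><p><code><![CDATA[
# Illustrative sketch: incidents and investigation states as Python sets.
from dataclasses import dataclass, field

# The universe A of adversarial techniques (here: a few ATT&CK technique IDs).
A = frozenset({"T1189", "T1566.001", "T1574.001", "T1003", "T1041"})

@dataclass(frozen=True)
class Incident:
    used: frozenset              # I_Y: techniques used by the threat actors

    @property
    def unused(self):            # I_N = A \ I_Y
        return A - self.used

@dataclass
class State:
    Y: set = field(default_factory=set)   # investigated and found used
    N: set = field(default_factory=set)   # investigated and found unused

def matches(state: State, prior: Incident) -> bool:
    """A prior incident "matches" the state if it agrees with every
    technique investigated so far."""
    return state.Y <= prior.used and state.N <= prior.unused
]]></code></p></div>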
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Costs and Benefits of Forensic Discovery</head><p>To investigate whether a particular technique was used by the threat actors, forensic experts have to spend time, effort, resources, etc. We let constant C a denote the cost of investigating technique a &#8712; A, which represents the time, effort, resources, etc. spent. In practice, we can estimate these costs based on domain experts' knowledge <ref type="bibr">(Nisioti et al. 2021a</ref>). On the other hand, discovering that a technique was used by the threat actors yields benefit since it provides information that we can use to prevent or mitigate future breaches. We let constant B a denote the benefit obtained from discovering that technique a &#8712; A was used in the incident. In practice, we can estimate these benefit values based on the impacts of using various techniques, which we can take from well-known knowledge bases (e.g., MITRE ATT&amp;CK <ref type="bibr">(Barnum 2012)</ref>). Some experts may inherently prefer investigating certain actions over others, e.g., because certain actions provide crucial information on how the adversary breached the system. All such preferences can be captured by the benefit values.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Markov Decision Process</head><p>Based on the assumptions and notations introduced in the preceding subsection, we now model the cyber-forensic investigation of an incident as a Markov decision process (MDP) <ref type="bibr">(Puterman 2014)</ref>, whose steps correspond to the step-by-step investigation of the adversarial techniques.</p><p>Modeling the cyber-forensic investigation as an MDP provides the foundation for formulating our decision-support problem in the next subsection. To define an MDP, we have to specify the state space of the process, the set of actions that can be taken in each state, the transition probabilities between subsequent states, and the immediate reward for taking a particular action in a particular state. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>State Space</head><p>We capture the state of the investigation after step t as a pair ⟨Y t , N t ⟩, where Y t ⊆ A is the set of techniques that have been investigated and found to have been used by the threat actors, and N t ⊆ A is the set of techniques that have been investigated and found not to have been used. In state ⟨Y t , N t ⟩, the experts may investigate any technique a ∈ A \ (Y t ∪ N t ).</p><p>Transition Probabilities If the experts investigate technique a in state ⟨Y t , N t ⟩, the process transitions to state ⟨Y t ∪ {a}, N t ⟩ if the threat actors used technique a, and to state ⟨Y t , N t ∪ {a}⟩ otherwise: <formula notation="tex" n="1">\Pr\left[\langle Y_{t+1}, N_{t+1}\rangle = \langle Y_t \cup \{a\}, N_t\rangle\right] = \Pr\left[a \in I_Y \,\middle|\, Y_t \subseteq I_Y \wedge N_t \subseteq I_N\right]</formula> and <formula notation="tex" n="2">\Pr\left[\langle Y_{t+1}, N_{t+1}\rangle = \langle Y_t, N_t \cup \{a\}\rangle\right] = 1 - \Pr\left[a \in I_Y \,\middle|\, Y_t \subseteq I_Y \wedge N_t \subseteq I_N\right]</formula></p><p>For ease of notation, we will let Pr[a | Y t , N t ] denote the first probability (i.e., the probability that the threat actors used technique a given that the state is ⟨Y t , N t ⟩). In practice, we have to estimate these conditional probabilities based on the set of prior incidents I.</p><p>Rewards We formulate rewards to capture the benefits of the cyber-forensic investigation. The experts obtain a benefit B a from investigating technique a only if technique a was used in the incident. Hence, if the process transitions from state ⟨Y t , N t ⟩ to state ⟨Y t+1 , N t+1 ⟩ = ⟨Y t ∪ {a}, N t ⟩, then we let the reward for step t be B a ; if the process transitions to state ⟨Y t+1 , N t+1 ⟩ = ⟨Y t , N t ∪ {a}⟩, then we let the reward for step t be 0.</p></div>
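<div xmlns="http://www.tei-c.org/ns/1.0"><p>To illustrate the dynamics, the sketch below simulates a single MDP step under Equations (1) and (2), assuming the State type from the earlier sketch and a hypothetical estimator p_used(a, Y, N) standing in for Pr[a | Y t , N t ]; B maps techniques to benefit values.</p><p><code><![CDATA[
# Hedged sketch of one MDP step; State is the type from the earlier sketch,
# p_used is a hypothetical estimator for Pr[a | Y_t, N_t], B maps a -> B_a.
import random

def step(state, a, p_used, B):
    """Investigate technique a: transition per Equations (1)-(2) and
    return (next_state, reward) per the reward definition above."""
    p = p_used(a, state.Y, state.N)
    if random.random() < p:                       # technique was used
        next_state = State(Y=state.Y | {a}, N=set(state.N))
        reward = B[a]                             # benefit B_a
    else:                                         # technique was not used
        next_state = State(Y=set(state.Y), N=state.N | {a})
        reward = 0.0
    return next_state, reward
]]></code></p></div>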
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Cyber-forensic Decision-Support Problem</head><p>We represent the decision-support system as a policy π, which maps a state ⟨Y t , N t ⟩ to a recommended action a t ∈ A \ (Y t ∪ N t ) in each time step t. Our goal is to provide a policy that maximizes the expected rewards obtained during the forensic investigation. Following <ref type="bibr">Nisioti et al. (2021a)</ref>, we formulate this objective with a budget limit G on the total cost: <formula notation="tex" n="3">\max_{\pi} \; \mathbb{E}\left[ \sum_{t=0}^{T_{\mathrm{limit}}} R_t \right]</formula> where R t denotes the reward obtained in step t, <formula notation="tex">T_{\mathrm{limit}} = \max\left\{ T \,\middle|\, \sum_{t=0}^{T} C_{a_t} \le G \right\}</formula> (i.e., T limit is the last step before the investigation budget G is exhausted), and C a t is the cost of investigating the action that is chosen in time step t. Note that we focus on this objective because it is practical and enables a fair comparison with DISCLOSE <ref type="bibr">(Nisioti et al. 2021a)</ref>; in Equation (<ref type="formula">8</ref>), we will provide a more conventional formulation with temporal discounting.</p><p>In practice, the budget may be flexible, in which case our goal is to provide a policy that attains a good cost-benefit tradeoff. We will quantify this tradeoff in our experiments using the area under the cost-benefit curve (i.e., AUC for the expected benefit as a function of the budget limit).</p></div>
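<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a minimal illustration of the budgeted objective in Equation (3), the following hypothetical helper (names are ours) truncates a simulated investigation once the cumulative cost would exceed the budget G and sums the rewards collected up to T limit.</p><p><code><![CDATA[
def truncated_return(actions, rewards, costs, G):
    """Sum rewards over steps 0..T_limit, where the cumulative cost of the
    chosen actions must stay within the budget G (Equation (3))."""
    total_cost = total_reward = 0.0
    for a, r in zip(actions, rewards):
        if total_cost + costs[a] > G:    # this step would exhaust the budget
            break
        total_cost += costs[a]
        total_reward += r
    return total_reward
]]></code></p></div>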
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Computational Approach</head><p>To solve the decision-support problem based on the objective in Equation (<ref type="formula">3</ref>), we propose a computational approach based on Monte Carlo tree search (MCTS) and k-nearest neighbor (k-NN) algorithms. Specifically, we implement the policy π as an MCTS algorithm (Section 3.2), relying on k-NN for estimating transition probabilities (Section 3.1).</p><p>Note that in recent years, deep reinforcement learning (DRL) algorithms have garnered significant attention from researchers for their applications in cybersecurity decision support (e.g., <ref type="bibr">Ganesan et al. 2016</ref>; <ref type="bibr">Kurt et al. 2018</ref>). While we could apply DRL algorithms to our problem in principle, they are ill-suited to it for multiple reasons. First, DRL algorithms tend to be sample inefficient (i.e., they require a large number of training experiences to learn a performant policy), which poses a significant challenge since it is difficult to collect large datasets of prior incidents with sufficient details. Second, the limited size of the dataset enables us to make decisions based directly on the dataset, instead of having to train a parametric machine-learning model. By working directly with the dataset, decisions can take advantage of all the information in the dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Probability Estimation</head><p>In a nutshell, Monte Carlo tree search finds which action to take in a given state by randomly sampling sequences of actions, simulating how they play out starting from the given state, and choosing the action that leads to the highest rewards on average. We can simulate the cyber-forensic investigation process from any given state using the state-transition probabilities (Equations (<ref type="formula">1</ref>) and (<ref type="formula">2</ref>)).</p><p>Empirical Probabilities However, since we do not know the underlying probability distribution in practice, we have to estimate the probabilities based on the set of prior incidents I. We can estimate the state-transition probability Pr[a | Y t , N t ] as the conditional empirical probability: <formula notation="tex" n="5">\Pr[a \mid Y_t, N_t] \approx \frac{\left|\left\{\hat{I} \in \mathcal{I}_{\langle Y_t, N_t\rangle} \,\middle|\, a \in \hat{I}_Y\right\}\right|}{\left|\mathcal{I}_{\langle Y_t, N_t\rangle}\right|}</formula> where <formula notation="tex">\mathcal{I}_{\langle Y_t, N_t\rangle} = \left\{\hat{I} \in \mathcal{I} \,\middle|\, Y_t \subseteq \hat{I}_Y \wedge N_t \subseteq \hat{I}_N\right\}.</formula> In other words, I ⟨Yt,Nt⟩ is the set of prior incidents which "match" the current state of the investigation, that is, the set of prior incidents in which threat actors did use all of the techniques from Y t but did not use any of the techniques from N t . Equation (<ref type="formula">5</ref>) estimates the probability Pr[a | Y t , N t ] as the ratio of incidents from I ⟨Yt,Nt⟩ in which threat actors used technique a.</p><p>k-nearest Neighbors The weakness of this estimation is that its accuracy depends on the cardinality of the set I ⟨Yt,Nt⟩ , and as the sets Y t and N t grow with each step of the cyber-forensic investigation, the set I ⟨Yt,Nt⟩ shrinks. In fact, due to the limited number of prior incidents, the number of prior incidents that "match" the current state of the investigation may reach zero, which precludes us from calculating the estimates at all. To address this issue, we extend the set I ⟨Yt,Nt⟩ to include prior incidents that do not exactly "match" the current state of the investigation but are sufficiently similar. We measure similarity between the current state ⟨Y t , N t ⟩ and a prior incident Î ∈ I using a Hamming distance over the techniques Y t ∪ N t that have already been investigated: <formula notation="tex" n="6">d(\langle Y_t, N_t\rangle, \hat{I}) = \left|Y_t \setminus \hat{I}_Y\right| + \left|N_t \setminus \hat{I}_N\right|.</formula> The first term of the right-hand side is the number of techniques that were used in incident I but not in prior incident Î, while the second term is the number of techniques that were used in prior incident Î but not in incident I. Equivalently, we could formulate the following similarity metric: s(⟨Y t , N t ⟩, Î) = |Y t ∩ Î Y | + |N t ∩ Î N |, which measures similarity by counting techniques where the current state and the prior incident are the same; in contrast, our formulation measures dissimilarity (specifically, Hamming distance) by counting techniques where the current state and the prior incident differ. These measures are practically equivalent since d(⟨Y t , N t ⟩, Î) + s(⟨Y t , N t ⟩, Î) = |Y t ∪ N t |, which is a constant in any given state.</p><p>Finally, we replace the set of "matching" prior incidents I ⟨Yt,Nt⟩ in Equation (<ref type="formula">5</ref>) with the set of k prior incidents that are closest with respect to metric d (breaking ties arbitrarily). Notice that this estimation of the probability Pr[a | Y t , N t ] is actually a k-nearest neighbor regression with prior incidents I as the dataset, d as the distance metric, and a binary value representing whether technique a was used in an incident as the output feature. While the value of k could be a constant hyper-parameter, we found that our approach performs better if k varies throughout the investigation. Hence, we let k = β1 + β2 · t, where β1 and β2 are hyper-parameters, which we can find experimentally.</p></div>
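<div xmlns="http://www.tei-c.org/ns/1.0"><p>The probability estimation described above can be sketched in a few lines of Python; the function and argument names are our own, and prior incidents are assumed to be represented by their sets of used techniques.</p><p><code><![CDATA[
# Sketch of the k-NN probability estimate (Section 3.1); beta1, beta2 are the
# hyper-parameters for k = beta1 + beta2 * t. All names are our assumptions.
def distance(Y, N, prior_used):
    """Hamming distance d over the investigated techniques (Equation (6)):
    |Y \ I_Y| + |N \ I_N|, where N \ I_N equals N intersected with I_Y."""
    return len(Y - prior_used) + len(N & prior_used)

def knn_estimate(a, Y, N, t, prior_incidents, beta1, beta2):
    """Estimate Pr[a | Y_t, N_t] from the k nearest prior incidents."""
    k = round(beta1 + beta2 * t)
    nearest = sorted(prior_incidents,
                     key=lambda used: distance(Y, N, used))[:k]
    return sum(a in used for used in nearest) / max(len(nearest), 1)
]]></code></p></div>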
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm 1 Exploration Decision</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Monte Carlo Tree Search</head><p>Since Monte Carlo tree search randomly samples sequences of actions, its coverage of the state space becomes sparser and sparser as it looks further into the future. Therefore, when choosing between actions, it should assign more importance to the near future than to the uncertain far future. We can express this consideration by reformulating the objective as maximizing the expected discounted sum of rewards: <formula notation="tex" n="8">\max_{\pi} \; \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} \cdot \frac{R_t}{C_{a_t}} \right]</formula> where γ ∈ (0, 1) is a temporal discount factor: rewards obtained at step t are discounted by a factor γ^t.</p><p>Note that we also replace the reward B a t with the benefit-to-cost ratio B a t /C a t . The rationale behind this is to incentivize the tree search to consider cost C a regardless of how far the investigation is from reaching a budget limit G; otherwise, the tree search would focus only on the immediate benefit B a . Note that we found the ratio B a t /C a t formulation to work best in practice (e.g., better than the more intuitive semi-MDP formulation with temporal discount <formula notation="tex">\gamma^{\sum_{\tau=0}^{t} C_{a_\tau}}</formula> and reward B a t ).</p><p>In each step of the investigation, we run a Monte Carlo tree search (Algorithm 2), starting from the current state ⟨Y t , N t ⟩, which outputs an action a t that is estimated to result in the maximum expected discounted sum of rewards. To estimate the expected rewards for each action, the algorithm performs a number of iterations. In each iteration, it first generates a sequence of actions and states, starting from the current state, which is a random sample of how the investigation might play out if certain actions are selected. Then, in the back-propagation phase of the iteration, it updates its estimates of the expected rewards based on the experience of the sampled sequence. While our algorithm follows the common principles of MCTS, it is tailored to our problem and takes advantage of the specific rules of our MDP.</p><p>Variables and Initialization Here, we describe the variables that are maintained by the algorithm throughout the iterations and how they are initialized; we will describe later how they are updated.</p><p>Variable n[Y, N, a] is the number of times that the algorithm has tried action a in state ⟨Y, N⟩ (initialized to 0 at the beginning of the search). Note that this and all other initializations can be lazy, i.e., we can store these variables in a dictionary that is initially empty, and we can assign an initial value to a variable when we use it for the first time. For ease of presentation, the pseudocode (Algorithm 2, MCTS for Forensic Decision Support) initializes all variables explicitly at the beginning.</p><p>Variable R[Y, N, a] is an estimate of the expected discounted sum of rewards if we started from state ⟨Y, N⟩, took action a first, and then followed an optimal policy.</p><p>Finally, variable R[Y, N] is an estimate of the expected discounted sum of rewards if we started from state ⟨Y, N⟩ and then followed an optimal policy. While these variables could be initialized to 0, we can improve the performance of the search by initializing them with an estimate: <formula notation="tex">R[Y, N] \leftarrow \left( \frac{1}{|A \setminus (Y \cup N)|} \sum_{i=0}^{|A \setminus (Y \cup N)| - 1} \gamma^{i} \right) \cdot \sum_{a \in A \setminus (Y \cup N)} \Pr[a \mid Y, N] \cdot \frac{B_a}{C_a}</formula> Starting from state ⟨Y, N⟩, we can investigate techniques A \ (Y ∪ N); however, we do not know the optimal order in which to investigate them. Our estimate calculates expected rewards assuming a random order: each technique a ∈ A \ (Y ∪ N) is equally likely to be investigated first (discount factor 1), second (factor γ), third (factor γ^2), ..., and last (factor γ^(|A \ (Y ∪ N)| − 1)). Our estimate can be calculated very quickly, especially since the probability estimates Pr[a | Y t , N t ] need to be calculated anyway for the back-propagation phase. Note that for terminal states (i.e., when A \ (Y ∪ N) = ∅), this formula correctly calculates the expected rewards to be 0, following the notational convention that summation over an empty set yields 0.</p><p>When selecting an action in a state ⟨Y, N⟩, the search maximizes a score with two terms: the first term, R[Y, N, a], gives preference to actions that should be selected because, according to our current estimates, they lead to high expected rewards; while the second term gives preference to actions that have been explored less (i.e., tried fewer times) than other actions in the given state. The exploration factor M is a constant hyper-parameter that balances exploration and exploitation, which we can find experimentally. Note that we add 1 to the divisor of the exploration term to avoid division by zero, since n[Y, N, a] is initialized to 0; this is just for ease of exposition, as the addition of 1 is negligible when we run a large number of iterations. After selecting an action a i , we update the number of times n[Y i , N i , a i ] that action a i has been tried, and we simulate the random transition to state ⟨Y i+1 , N i+1 ⟩. However, instead of basing the transition on the actual probability Pr[a i | Y i , N i ], we select one of the two possible transitions with the same probability (i.e., 0.5 probability that a i was used, and 0.5 probability that it was not). We use uniform probabilities for two reasons. First, we factor in the actual probability Pr[a i | Y i , N i ] during back-propagation; hence, we obtain unbiased estimates regardless of the selection probabilities. Second, uniform probabilities lead to a more balanced and thorough exploration of the transitions; we found that with the actual probabilities, transitions that have high impact but low probability are not explored enough.</p></div>
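<div xmlns="http://www.tei-c.org/ns/1.0"><p>The initialization estimate admits a direct implementation; the following sketch (with our own naming, and p_used again standing in for the probability estimator) computes the random-order expected discounted reward described above.</p><p><code><![CDATA[
# Sketch of the optimistic initialization of R[Y, N]: expected discounted
# reward under a uniformly random investigation order (our reconstruction).
def init_estimate(Y, N, p_used, B, C, gamma, A):
    remaining = A - Y - N
    m = len(remaining)
    if m == 0:
        return 0.0                       # terminal state: no reward left
    # average discount over positions 0..m-1: (1/m) * sum_i gamma^i
    avg_discount = sum(gamma ** i for i in range(m)) / m
    return avg_discount * sum(p_used(a, Y, N) * B[a] / C[a]
                              for a in remaining)
]]></code></p></div>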
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Selection and Expansion</head><p>We stop generating the sequence of states and actions when we reach either a terminal state (Y i ∪ N i = A) or the depth limit (i ≥ t + D, where D is a hyper-parameter).</p><p>Myopic Pruning To improve search performance, we prune the search tree during selection by exploring only those actions that are most attractive in terms of the expected immediate reward. Specifically, in state ⟨Y, N⟩, Algorithm 1 explores only the F actions that have the highest expected immediate reward Pr[a | Y, N] · B a /C a , where F is a hyper-parameter. While myopically focusing on actions with the highest expected immediate reward may occasionally disregard optimal actions, it typically provides better results by letting the search thoroughly explore the most promising branches of the tree.</p><p>Backpropagation During the back-propagation phase of each iteration, we loop over the sequence of states and actions backwards, and we update our estimates based on the experience of this sequence. For each action a i , we update the estimate R[Y i , N i , a i ] using the following formula: <formula notation="tex">R[Y_i, N_i, a_i] \leftarrow \Pr[a_i \mid Y_i, N_i] \cdot \left( \frac{B_{a_i}}{C_{a_i}} + \gamma \cdot R[Y_i \cup \{a_i\}, N_i] \right) + \left(1 - \Pr[a_i \mid Y_i, N_i]\right) \cdot \gamma \cdot R[Y_i, N_i \cup \{a_i\}]</formula> Then, we update the estimate R[Y i , N i ] as <formula notation="tex">R[Y_i, N_i] \leftarrow \max_{a \in A \setminus (Y_i \cup N_i)} R[Y_i, N_i, a].</formula> This formula calculates a best estimate since in each state ⟨Y i , N i ⟩, we should take the optimal action a that maximizes the expected discounted sum of rewards (based on our best estimates R[Y i , N i , a]).</p></div>
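<div xmlns="http://www.tei-c.org/ns/1.0"><p>Putting the pieces together, the following condensed sketch implements one search iteration in the spirit of Algorithm 2: myopic pruning, selection with a simplified UCB-style exploration bonus (the exact bonus in Algorithm 1 is not reproduced here), uniform transition sampling, and probability-weighted back-propagation. All names, defaults, and the bonus form are our assumptions; initialization of R via the estimate above is omitted for brevity.</p><p><code><![CDATA[
# Condensed sketch of one MCTS iteration; R, Rsa, n are dictionaries,
# State/p_used/B/C as in the earlier sketches, M/D/F as in the text.
import math
import random

def key(s):
    return (frozenset(s.Y), frozenset(s.N))

def mcts_iteration(state, R, Rsa, n, p_used, B, C, gamma, A, M, D, F):
    path, s = [], state
    for _ in range(D):                            # depth limit D
        remaining = A - s.Y - s.N
        if not remaining:                         # terminal state reached
            break
        # myopic pruning: keep only the F most promising actions
        candidates = sorted(remaining,
                            key=lambda a: p_used(a, s.Y, s.N) * B[a] / C[a],
                            reverse=True)[:F]
        # current estimate plus exploration bonus (1 + n avoids division by 0)
        a = max(candidates,
                key=lambda a: Rsa.get((key(s), a), 0.0)
                + M / math.sqrt(1 + n.get((key(s), a), 0)))
        n[(key(s), a)] = n.get((key(s), a), 0) + 1
        path.append((s, a))
        # uniform transition sampling: 0.5 used / 0.5 not used
        if random.random() < 0.5:
            s = State(Y=s.Y | {a}, N=set(s.N))
        else:
            s = State(Y=set(s.Y), N=s.N | {a})
    # back-propagation: re-weight the two outcomes by the actual probability
    for s, a in reversed(path):
        p = p_used(a, s.Y, s.N)
        yes = State(Y=s.Y | {a}, N=set(s.N))
        no = State(Y=set(s.Y), N=s.N | {a})
        Rsa[(key(s), a)] = (p * (B[a] / C[a] + gamma * R.get(key(yes), 0.0))
                            + (1 - p) * gamma * R.get(key(no), 0.0))
        R[key(s)] = max(Rsa.get((key(s), b), 0.0) for b in (A - s.Y - s.N))
]]></code></p></div>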
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Numerical Evaluation</head><p>We evaluate our proposed approach numerically on public datasets of real-world cyber incidents. For each cyber incident, we simulate how our approach would have prioritized the investigation, and plot the benefit obtained through discovery as a function of the effort spent. We compare our approach to two baselines, DISCLOSE and a na&#239;ve policy.</p><p>Our implementation and datasets are publicly available. 1 </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Experimental Setup</head><p>Dataset We use the MITRE ATT&amp;CK Enterprise repository <ref type="bibr">(Barnum 2012)</ref>, which is a public repository of adversarial tactics, techniques, and procedures (TTPs), referencing real-world cyber incidents in which some of these techniques were used. Since its original publication, both the ATT&amp;CK framework and its CTI dataset have been regularly updated. We use three versions of the repository in our evaluation: v6.3 (2019), which is the version that <ref type="bibr">Nisioti et al. (2021a)</ref> used to evaluate DISCLOSE; v10.1 (2021); and v11.3 (2022), which is the latest version at the time of writing. Evaluating our approach on multiple versions demonstrates that it can be applied to newer versions without any changes (other than standard hyper-parameter optimization).</p><p>For a fair comparison, we use the same 31 techniques A and the same benefit B and cost C values as <ref type="bibr">Nisioti et al. (2021a)</ref>. Since the categorization of techniques changed slightly between versions, we had to map some techniques to equivalent ones for later versions. This leaves us with 29 techniques for versions v10.1 and v11.3. The benefit and cost values are based on the Common Vulnerability Scoring System and interviews with cyber-forensic experts.</p><p>For each technique, we collected cyber incidents in which the technique was used via the external references field (replicating <ref type="bibr">Nisioti et al. (2021a)</ref>). Our v6.3, v10.1, and v11.3 datasets contain 331, 670, and 716 cyber incidents, respectively. Version v11.3 includes 425 incidents that are new compared to v6.3, and 73 incidents that are new compared to v10.1. In these datasets, every incident used at least 2 techniques. In dataset v11.3, incidents used 4.1 techniques on average (min. 2; max. 17). One example of an incident that used many techniques is the Frankenstein campaign <ref type="bibr">(Biasini 2019)</ref>, which employed Phishing, OS Credential Dumping, Exfiltration Over C2 Channel, and other techniques.</p><p>Baselines We compare our approach to two baselines, DISCLOSE (implemented as specified in <ref type="bibr">Nisioti et al. (2021a)</ref>) and a naïve baseline, which we call the static policy. At every step, the static policy selects the technique (from the set of techniques that have not been investigated yet) that is most frequent across all prior incidents, instead of considering conditional probabilities. This naïve baseline represents investigation without decision support (i.e., decisions that ignore the state).</p><p>Simulation Setup and Metrics Since the datasets are relatively small, we use leave-one-out cross-validation: when evaluating a policy on an incident, we treat all other incidents in our dataset as prior incidents I. We always simulate the investigation starting with a single randomly chosen technique from I Y as the singleton set Y 0 (and letting N 0 = ∅). Same as <ref type="bibr">Nisioti et al. (2021a)</ref>, we terminate an investigation when the cumulative effort cost (i.e., Σ a∈Y t ∪N t C a ) reaches 45 or 65 to assess which techniques could have been promptly discovered by the approach. We also conducted experiments with two more budget limits (70 and 100) and without a budget limit to provide a more comprehensive comparison (we include results in Appendix A.1). 
Further, to quantify promptness, we measure the area under the benefit-effort curve (AUCBE): we plot the benefit obtained through discovery as a function of the cumulative effort cost for the three approaches (e.g., see Figure <ref type="figure">1</ref>). Higher AUCBE values are better. Solid lines are averages over all incidents, dotted lines are 25% quantiles, and dashed lines are 75% quantiles.</p><p>Hyperparameter Optimization While the baseline does not have any hyper-parameters, our proposed approach has a number of them. First, we optimized the hyper-parameters for the k-NN probability estimation (β1, β2) using a grid search, maximizing the average AUCBE over all incidents. During this search, we restricted the MCTS to one-action depth (D = 1, in which case the other parameters of the MCTS are irrelevant), providing us with good hyper-parameters for probability estimation. Then, we optimized the hyper-parameters for the MCTS using the Hyperopt Python library, again maximizing AUCBE. Note that we optimized the hyper-parameters for each dataset separately. We provide results from the hyper-parameter search in Appendix A.2.</p></div>
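<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers who wish to reproduce the dataset construction described above, the following rough sketch shows one plausible way to group technique IDs by citing report in the public enterprise-attack STIX bundle; the paper's exact extraction procedure may differ, and all field handling here is an assumption.</p><p><code><![CDATA[
# Rough sketch (our assumption of the extraction): group ATT&CK technique IDs
# by the external reference (report) that cites them in the STIX bundle.
import json
from collections import defaultdict

def incidents_from_bundle(path="enterprise-attack.json"):
    with open(path) as f:
        bundle = json.load(f)
    incidents = defaultdict(set)
    for obj in bundle.get("objects", []):
        if obj.get("type") != "attack-pattern":
            continue
        refs = obj.get("external_references", [])
        # the ATT&CK entry itself carries the technique ID (e.g., "T1566")
        tech_id = next((r.get("external_id") for r in refs
                        if r.get("source_name") == "mitre-attack"), None)
        if tech_id is None:
            continue
        # every other cited reference is treated as a report/incident
        for ref in refs:
            if ref.get("source_name") != "mitre-attack" and ref.get("url"):
                incidents[ref["source_name"]].add(tech_id)
    return incidents
]]></code></p></div>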
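<div xmlns="http://www.tei-c.org/ns/1.0"><p>The AUCBE metric itself is straightforward to compute; a minimal sketch (with our own naming) integrates the cumulative benefit-versus-effort step curve up to the budget limit.</p><p><code><![CDATA[
def aucbe(effort, benefit, budget):
    """Area under the benefit-effort step curve up to the budget limit.
    effort[i], benefit[i]: cumulative cost and benefit after step i."""
    area = prev_e = prev_b = 0.0
    for e, b in zip(effort, benefit):
        e = min(e, budget)
        area += (e - prev_e) * prev_b    # benefit level holds until next step
        prev_e, prev_b = e, b
        if e >= budget:
            break
    area += (budget - prev_e) * prev_b   # extend the last level to the budget
    return area
]]></code></p></div>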
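<div xmlns="http://www.tei-c.org/ns/1.0"><p>Finally, the grid search over (β1, β2) can be sketched as follows, assuming a hypothetical evaluate_policy helper that runs the leave-one-out simulation with the myopic (D = 1) policy and returns the average AUCBE; the parameter ranges follow Appendix A.2.</p><p><code><![CDATA[
# Sketch of the (beta1, beta2) grid search; evaluate_policy is hypothetical.
import itertools

def grid_search(evaluate_policy, incidents, budget):
    best, best_score = None, float("-inf")
    for b1, b2 in itertools.product(range(1, 131),            # beta1: 1..130
                                    [x / 10 for x in range(61)]):  # beta2: 0..6
        score = evaluate_policy(incidents, budget, beta1=b1, beta2=b2)
        if score > best_score:
            best, best_score = (b1, b2), score
    return best, best_score
]]></code></p></div>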
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Numerical Results</head><p>Running Time For a single decision, the MCTS algorithm takes less than 7 seconds on average using a single core of a 2.4GHz Intel Core i9 CPU, and less than a second using multiple cores. Compared to the amount of time that forensic analysts need to investigate the selected technique (at the very least minutes), these running times are negligible.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Benefits of Prioritization</head><p>Figures <ref type="figure">1</ref> and <ref type="figure">2</ref> show the cumulative benefits obtained during the cyber-forensic investigations (i.e., Σ a∈Y t B a ) as functions of the cumulative effort costs (i.e., Σ a∈Y t ∪N t C a ) using MCTS, DISCLOSE, and the static approach on the v6.3 dataset. We conducted the experiments with budgets 45 and 65. We provide results for other versions of the dataset and different budgets in Appendix A.1. Each curve is based on all incidents in the dataset. For the v6.3 dataset, our approach outperforms the baselines in both scenarios (45 and 65). The average AUCBE values of MCTS, DISCLOSE, and the static policy for budget limit 45 are 3,503, 3,411, and 3,175, respectively. The average AUCBE values of our approach, DISCLOSE, and the static policy for budget limit 65 are 6,288, 6,108, and 5,826, respectively. For the v11.3 dataset, our approach outperforms the baselines in both scenarios as well. The average AUCBE values of MCTS, DISCLOSE, and the static policy for budget limit 45 are 4,061, 3,982, and 3,865, and for budget limit 65 are 7,072, 6,975, and 6,816, respectively. The differences between the policies are modest, which demonstrates that while our principled approach provides superior prioritization, the decision-support problem is very challenging due to the inherent uncertainty of cyber incidents.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Related Work</head><p>Several prior research efforts have focused on enhancing the efficacy of forensic investigations by decreasing the time or resources required for an investigation without impacting its quality and objectivity <ref type="bibr">(Hossain et al. 2017, 2018; Hossain, Sheikhi, and Sekar 2020; Hassan et al. 2020; Satvat, Gjomemo, and Venkatakrishnan 2021)</ref>. Our approach is primarily motivated by DISCLOSE <ref type="bibr">(Nisioti et al. 2021a)</ref>, a data-driven decision-support framework, which creates a TTP space using the MITRE ATT&amp;CK dataset, computes conditional probabilistic relations and proximity values between TTPs, and proposes actions based on these relations. DISCLOSE can assist an expert in each step of the investigation by considering benefits, costs, and the available budget. We extend this work by introducing a formal model of the cyber-forensic investigation process, modeling it as a Markov decision process. Further, we propose a computational approach which, at each step of the investigation, considers all the techniques that have been investigated, instead of considering only the very last technique, as DISCLOSE does. <ref type="bibr">Horsman et al. (2014)</ref> present CBR-FT, a case-based reasoning technique for device triage. Similar to our approach, CBR-FT uses a knowledge base of past cases to calculate probable subsequent actions based on system file paths, which is then used to predict the triage process for the current case under investigation. Instead of system file paths, we use TTPs, which allows for more flexible analysis and reasoning about adversarial behavior. Attack graphs were incorporated into forensic investigation by <ref type="bibr">Lui et al. (2012)</ref> to guide the decision process. The notion of anti-forensic steps, parallel to other adversarial actions, was included in the attack graphs to represent that the adversary may hide relevant forensic evidence to dissuade the defender. Likewise, <ref type="bibr">Nassif and Hruschka (2012)</ref> propose the use of clustering techniques to support forensic investigation. Using a similar approach, <ref type="bibr">Barrere et al. (2017)</ref> proposed an algorithm using condensed attack graphs, which transform the structure of the original attack graph to allow for more effective exploration. Algorithms have also been proposed to support the forensic investigation of online social networks <ref type="bibr">(Arshad et al. 2020)</ref> and for unsupervised prediction in proactive forensic investigation of insider threats <ref type="bibr">(Wei, Chow, and Yiu 2021)</ref>. Similar to our work, these algorithms aim to increase the efficacy of forensic investigation by decreasing the investigation time; however, they approach the problem differently, by decreasing the time required for crucial tasks. <ref type="bibr">Quick and Choo (2014)</ref> highlight the effects of large amounts of digital-forensic data on modern forensic investigations. <ref type="bibr">Saeed et al. (2020)</ref> and <ref type="bibr">Nisioti et al. (2021b)</ref> present game-theoretic approaches that model interactions between an investigator and an adversary who uses anti-forensic methods. By finding the Nash equilibrium of the game, the authors find the optimal trade-off strategy for each player. Even though our work does not explicitly model anti-forensic techniques and their analysis, our approach is developed considering the potential existence of anti-forensic techniques for an incident under investigation.</p><p>Defenders face a similar prioritization problem with sensitive intrusion-detection systems, which may raise a large number of false alarms <ref type="bibr">(Tong et al. 2020; Yan et al. 2018, 2019; Laszka et al. 2017)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Discussion and Conclusions</head><p>To address the limitations of DISCLOSE, we introduced a principled approach by modeling cyber-forensic investigations as Markov decision processes and proposing a computational approach based on MCTS and k-NN regression. A key advantage of our proposed approach is that it works directly with the data (i.e., at every step and every iteration, probabilities are estimated based on the dataset of all prior incidents), instead of training a parametric machine-learning model. By relying on non-parametric machine-learning models, we aim to get as much "mileage" out of the data as possible, which is both enabled and necessitated by the limited amount of public data about cyber incidents. Another key advantage of our proposed approach is that the tree search approximates best estimates (and hence optimal decisions) based on the dataset as the number of iterations increases.</p><p>While these advantages enabled our approach to outperform baselines, including DISCLOSE, we found the prioritization problem to be very challenging. The primary reason for this is the inherent difficulty of predicting how threat actors work. In fact, DISCLOSE itself is not too far from the naïve baseline policy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A Appendix</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.1 Evaluation with Different Budgets</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2 Hyper-Parameter Search</head><p>For each budget limit and each dataset, we performed hyper-parameter optimization (for β1 and β2) using the myopic approach (D = 1). For each dataset and budget limit, there is a heatmap (see Figures 16 to 30). Each cell of a heatmap shows the AUCBE of the average percentage of benefit attained up to the budget limit. The β1 variable ranges from 1 to 130, and β2 ranges from 0 to 6 in 0.1 intervals. For better visualization, the heatmaps are cropped to y-axis (β1) ranges of 1 to 61, 1 to 101, and 1 to 121 for datasets v6.3, v10.1, and v11.3, respectively. For all heatmaps, the y-axis shows β1 and the x-axis shows β2.</p></div></body>
		</text>
</TEI>
