<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>An FDR-oriented approach to multiple sequential fault detection and isolation</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>10/01/2017</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10054666</idno>
					<idno type="doi">10.1109/ALLERTON.2017.8262795</idno>
					<title level='j'>Proceedings of the 55th Annual Allerton Conference on Communication, Control and Computing</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Jie Chen</author><author>Wenyi Zhang</author><author>H. Vincent Poor</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[The problem of sequential fault detection and isolation in multiple data streams is considered. In this work, it is assumed that many independent parallel data streams, each of which has a (possibly infinite) change point, are sequentially observed with a maximum sampling constraint. The pre-change data follow a known distribution, and the post-change data follow one of J possible distributions. A sequential procedure is proposed to detect the changes for all data streams, and to isolate the types of changes upon their detection. The sequential procedure is shown to control the false discovery rate. An asymptotic upper bound on the average detection delay over the parallel data streams is also derived. A simulation study is presented to illustrate the proposed procedure and to corroborate the analysis.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. Introduction</head><p>The problem of sequential change detection is concerned with the situation in which there is an abrupt change at some time in a system that is monitored in real time. After the change, there are J &#8805; 2 distinct post-change distributions, only one of which is correct, and the goal is to detect the change and identify the correct one as soon as possible after the change occurs, subject to a false alarm and a false isolation constraint. Sequential change detection and isolation has many applications, including fault diagnosis in complex dynamical systems and industrial processes, environment surveillance and monitoring, and radar and sonar signal processing <ref type="bibr">[1]</ref>, <ref type="bibr">[2]</ref>, <ref type="bibr">[3]</ref>. The general sequential change detection and diagnosis problem was first considered in <ref type="bibr">[4]</ref>, and subsequently studied in various works <ref type="bibr">[5]</ref>, <ref type="bibr">[6]</ref>, <ref type="bibr">[7]</ref>.</p><p>The rapid development of sensing technology allows for the generation of large-scale real-time streaming data <ref type="bibr">[8]</ref>. Such data may involve multiple data streams subject to change points, which is the situation considered in this paper. More precisely, suppose that at each time t = 1, 2, . . ., we have observations of K parallel data streams, X (k)  t , k = 1, 2, ..., K. There is a change point t (k) &#8805; 1 for each of the streams, such that, before the change point, the stream is in a normal condition but after t (k) it is in an abnormal condition, and it is possible to have t (k) = &#8734; (i.e., there is no change in the stream). These K change points are unknown and possibly different, and the abnormal states are not known exactly. This setting motivates the study of change detection and isolation in multiple data streams.</p><p>Compared with its single-stream counterpart, multi-stream sequential change detection and isolation has been less well studied. To characterize the control of errors in multiple hypothesis testing, there are various error metrics, such as the false discovery rate (FDR) and the familywise error rate (FWER) <ref type="bibr">[8]</ref>, <ref type="bibr">[9]</ref>, to evaluate the false-alarm performance. The FDR is the expected value of the proportion of falsely rejected hypotheses among all rejected hypotheses, while the FWER is the probability of rejecting any true null hypotheses. So FDR and FWER represent different methods of quantifying the frequency of false alarms (i.e., Type I errors) among multiple data streams. Procedures based on the control of FDR are more powerful in identifying alternative hypotheses than are FWER-based procedures <ref type="bibr">[8]</ref>, at the cost of increased error rates.</p><p>In this paper, we adapt the definition of FDR to handle both false alarm events and false isolation events. We propose a sequential procedure that achieves FDR control for the Bayesian multiple change detection and isolation problem that is similar to the Bayesian multiple change detection approach of <ref type="bibr">[10]</ref>. We impose a constraint on the maximum number of observed samples. With the advantage of FDR control, we can detect and isolate more changes subject to an error constraint than an FWER-based control procedure does. For the k-th data stream, the procedure updates a statistic S (k)  n after the n-th sampling. Then these statistics across the data streams are processed by a variant of Benjamini and Hochberg's procedure <ref type="bibr">[9]</ref> to yield detection and isolation results. To characterize the statistical properties of our proposed procedure, we establish a theoretical guarantee for the FDR and derive an asymptotic upper bound of the average detection delay (ADD). We further compare procedures that control the FDR and the FWER via simulation, and illustrate the advantage of the FDR control procedure in terms of smaller ADD.</p><p>The rest of this paper is organized as follows. Section II describes the Bayesian multiple change detection and isolation problem and the corresponding definition of FDR. Section III describes the detection and isolation procedure. Section IV develops the performance guarantee of the proposed pro-978-1-5386-3266-6/17/$31.00 &#169;2017 IEEE 629</p><p>Fifty-Fifth Annual Allerton Conference Allerton House, UIUC, Illinois, USA October 3-6, 2017</p><p>cedure. Section V presents the simulation and numerical results. Finally, Section VI concludes this paper and discusses possible extension of this work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. Model and Formulation</head><p>Consider a situation in which there are K &#8805; 2 data streams:</p><p>Denote [K] = {1, . . . , K}. For each data stream k &#8712; [K], there is a change point t (k) &#8805; 1, which is assumed to be a random variable with a known prior distribution &#960; m = P{t (k) = m}. If t (k) = &#8734;, there does not exist a change point in the kth data stream. For a measurable space (&#8486;, F ), consisting of a sample space &#8486; and a &#963;-field F of events, we consider a family</p><p>of probability measures on (&#8486;, F ) that describes both the prior distribution of t (k) and the distribution of {X (k)  n |n = 1, 2, ...}, such that, under</p><p>are independent and identically distributed (i.i.d.) with a distribution H 0 , and</p><p>For simplicity, we use P (k)  &#960;, j to denote P (k) &#960;, j (k) . We assume a prior distribution p j on the change-type j. We further assume mutual independence among the K data streams, and that H 0 and H j have densities h 0 and h j , j = 1, ..., J, with respect to some measure P on (&#8486;, F ). Now, we consider imposing a constraint on the maximum sample size of each observed data stream; that is, we may observe at most N samples in each data stream, where N may be a very large but finite integer. Then our goal is to define a (multi-)detection-isolation rule</p><p>which means that a change is detected at time T (k) and determined to be of type &#309;(k) for the kth data stream, such that the FDR is no greater than some given value &#945; &#8712; (0, 1). Note that T (k) = N means that no change has been detected. We define the FDR as</p><p>where E &#960;,p {&#8226;} denotes expectation under P (k) &#960;, j k &#8712; [K], j &#8712; [J] and change-type distribution p, " R&#8744;1" denotes the maximum between R and 1, R is the number of change points declared, i.e., the size of the subset of [K] such that T (k) &lt; N, and V is the number of false alarm or false isolation events, i.e. the size of k) . For example, suppose we have ten data streams, six of which possess finite change points, and we have seven stopping times with T (k) &lt; N, among which six of them have T (k) &#8805; t (k) , but one of these six results is not isolated correctly, i.e. &#309;(k) j (k) . As a result, V/{ R &#8744; 1} should be 2/7.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. Detection-Isolation Procedure</head><p>For each data stream k &#8712; [K], we define a statistic according to <ref type="bibr">[5,</ref><ref type="bibr">Thm. 6</ref>]</p><p>in which m &#945; is a window size and</p><p>Let</p><p>As in <ref type="bibr">[5]</ref>, we assume that m &#945; satisfies</p><p>Given a prescribed parameter &#945; &#8712; (0, 1), we construct, for each k &#8712; [K], threshold values</p><p>such that when S (k)</p><p>To construct the thresholds U (k) s to satisfy (6), according to [5, Thm. 7], we have</p><p>Here T (k) , &#309;(k) is attained by the procedure DI-FDR described in Algorithm 1. Then we have</p><p>where the first inequality follows from an argument similar to the proof of <ref type="bibr">[5,</ref><ref type="bibr">Thm. 7]</ref>, and the second inequality is from the condition we set on the distribution of change points in Theorem 1 in Section IV-A and the condition we set on the window size (4). Then we set</p><p>Thus we can get the thresholds U (k) s as we fix the window size m &#945; and U (k)  s &#8764; | log(&#945;/K)| as K &#8594; &#8734;. Let I q denote the indices of active data streams (i.e., data streams that have not been stopped) at the beginning of the qth stage of sampling, and let n q denote the accumulated sample size of active data streams at the end of the qth stage of sampling. Clearly I 1 = [K] and n 0 = 0. Algorithm 1 describes the qth stage of sampling (q = 1, 2, ...).</p><p>Algorithm 1 DI-FDR 1) Sample the active data streams {X (k)  n } k&#8712;I q ,n&gt;n q-1 , and update the statistics {S (k)  n } k&#8712;I q ,n&gt;n q-1 using (3). 2) Sort the statistics in ascending order as S (i(n,l)) n , where i(n, l) denotes the index of the lth ordered active detection statistic at sample size n.</p><p>3) Continue until n equals</p><p>(11) a) If n q &lt; N, then stop sampling and declare change points at n q for the following data streams:</p><p>, where</p><p>Declare the corresponding change types as</p><p>respectively. Update I q+1 to include the indices of the remaining active data streams and proceed to the (q + 1)th stage; if |I q+1 | = 0, then stop. b) Otherwise, n q = N so declare all the remaining active data streams to have no change points, and stop.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. Properties of DI-FDR</head><p>In this section we analyze statistical properties of the proposed parallel detection-isolation procedure DI-FDR in Algorithm 1. We consider two performance metrics: (i) the FDR; and (ii) the ADD.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. False Discovery Rate</head><p>The following result, Theorem 1, states that DI-FDR controls FDR. Theorem 1. Let I * = min 1&#8804; j&#8804;J min 0&#8804;g&#8804;J,g j I jg &gt; 0 and let the change-point distribution be such that log</p><p>as K &#8594; &#8734;. Then for a given 0 &lt; &#945; &lt; 1, as K &#8594; &#8734;, the procedure DI-FDR satisfies</p><p>So if we choose N to satisfy (K log K) N &#8594; 0 as K &#8594; &#8734;, we have</p><p>Proof. The proof is similar to that of [10, Thm. 1]. Here we provide only the difference between the proofs. We use the following result about the expected delay in the fault detection-isolation problem from <ref type="bibr">[5]</ref>.</p><p>&#8764; (1 + o( <ref type="formula">1</ref>))| log &#945;| min 0&#8804;g&#8804;J,g j I jg as &#945; &#8594; 0. ( <ref type="formula">16</ref>)</p><p>Hence, we have</p><p>where I * = min 1&#8804; j&#8804;J min 0&#8804;g&#8804;J,g j I jg , and o(1) tends to 0 as K &#8594; &#8734; in our situation. Then we can take this result into the proof of <ref type="bibr">[10,</ref><ref type="bibr">Thm. 1]</ref> to establish Theorem 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Average Detection Delay</head><p>We are interested in the expected number of additional observations after the change point required for detection and isolation. We consider the following ADD function as the average of ADDs for those data streams detected:</p><p>According to (17), we have the following result providing an asymptotic upper bound on the ADD.</p><p>Theorem 3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ADD(FDR</head><p>The proof is similar to that of <ref type="bibr">[10,</ref><ref type="bibr">Thm. 3</ref>].</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. Numerical Experiments</head><p>In this section we perform a Monte Carlo experiment to illustrate our procedure proposed in Section III, and compare the performance of the proposed procedure with another procedure, which will be introduced shortly below.</p><p>Before presenting the setup of the experiment, we first introduce FWER control procedures. The FWER is defined as the probability of rejecting any true hypothesis in multiple hypothesis testing problems. In our problem, this means the probability of false alarm or false isolation on any data streams. To achieve an FWER no greater than &#945; within K data streams, Bonferroni's procedure <ref type="bibr">[8]</ref> requires the error probability in <ref type="bibr">(6)</ref> on each data stream to be no greater than &#945;/K.</p><p>We describe the qth stage of this FWER-control procedure in Algorithm 2.</p><p>Algorithm 2 DI-FWER 1) Sample the active data streams {X (k)  n } k&#8712;I q ,n&gt;n q-1 , and update the statistics {S (k)  n } k&#8712;I q ,n&gt;n q-1 using (3). 2) Continue until n equals</p><p>(20) a) If n q &lt; N, then stop sampling and declare change points at n q for those data streams that satisfy (20). Declare the corresponding change types as</p><p>respectively. Update I q+1 to include the indices of the remaining active data streams and proceed to the (q + 1)th stage; if |I q+1 | = 0, then stop. b) Otherwise, n q = N so declare all the remaining active data streams to have no change points, and stop.</p><p>To make our comparison fair, we set the same upper bound for FDR and FWER as &#945; = 0.1. As for the prior distributions of the change points, we set the probability of infinite change points as &#960; &#8734; , and assume that the finite change points each follows a (conditional) geometric distribution with p = 0.05 (that is, the mean change point on each stream is 20). We choose the distribution of the post-change type as a uniform distribution on {1, ..., J}. The maximum sample size is set as 2000. We consider i.i.d N(&#181;, 1) data streams with pre/postchange distributions:</p><p>We consider four configurations for comparison as follows:  Tables I-IV show the detection and isolation results with different configurations. DI-FDR and DI-FWER satisfy the upper bound &#945;, notably much less than the prescribed value. We can see that a decrease in {I jg |1 &#8804; j &#8804; J, 0 &#8804; g &#8804; J, g = j} will increase the FDR/FWER and ADD, which is consistent with intuition. From Table <ref type="table">III</ref>, we see that an increase in the probability of infinite change points can increase the FDR/FWER significantly. Because of the assumption on the distribution of change points in (14), a larger probability of infinite change points can worsen the FDR control. From Table <ref type="table">IV</ref>, we note that an increase in the number of post-change types J decreases the FDR/FWER. Because the threshold values increase with J, and according to <ref type="bibr">(7)</ref>, the error probability is the worst case probability, we can have a smaller error probability in practice. This means that when J is large, we may set smaller threshold values in order to control the error at the same level with a better detection delay.</p><p>As for ADD, the performance of the FDR control pro-   cedure is always better than that of the FWER control procedures. With an increase in the number of data streams K, the ADD is not visibly affected by K in DI-FDR, but tends to grow logarithmically with K in DI-FWER (see Fig. <ref type="figure">1</ref>). In this simulation, when J changes from 2 to 3, the minimum Kullback-Leibler divergence I * does not change, but the ADD still get worse. Recall that the threshold values increase with J, and for different change types the expected detection delay is different. The ADD is an average expected detection delay over different change types, so it will get worse.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. Conclusion</head><p>In this paper, we have considered a multiple change detection-isolation problem under a maximum sampling constraint. We have applied the FDR metric to evaluate the performance of the detection-isolation procedure, with a modification of the definition of FDR, and we have proposed a sequential procedure that controls the FDR. For statistical properties, we have established an upper bound on the FDR, and derived an asymptotic upper bound on the ADD. Moreover, we have shown that the proposed procedure is superior to procedures controlling the FWER in terms of ADD. Although our problem has been developed within a Bayesian setting, we have not made use of the prior distribution of changes in the detection-isolation procedure. So there may be other Bayesian statistics for change detection and isolation with improved statistical properties. A further analysis along that direction is work in progress.</p></div></body>
		</text>
</TEI>
