<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>A False Discovery Rate Oriented Approach to Parallel Sequential Change Detection Problems</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>01/01/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10194694</idno>
					<idno type="doi">10.1109/TSP.2020.2977466</idno>
					<title level='j'>IEEE Transactions on Signal Processing</title>
<idno>1053-587X</idno>
<biblScope unit="volume">68</biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Jie Chen</author><author>Wenyi Zhang</author><author>H. Vincent Poor</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[The problem of sequentially detecting changes in parallel data streams is formulated and investigated. Each data stream may have its own change point at which the underlying probability distribution of its data changes, and the decision maker needs to declare, sequentially, which data streams have passed their change points. With a large number of parallel data streams, the error metric is the false discovery rate (FDR), which is the expected ratio of the number of falsely declared data streams to the total number of declared data streams. A data stream is falsely declared if the detected change point is ahead of its actual change point. Decision procedures that are guaranteed to control the FDR level are developed, and it is also shown that the average decision delays (ADDs) of these decision procedures do not grow with the number of data streams. Numerical simulations and case studies are conducted to corroborate the analytical results, and to illustrate the utility of the decision procedures.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>A S A fundamental breakthrough beyond the classical fixed- sample-size (FSS) binary hypothesis testing, sequential hypothesis testing, in which the decision maker is allowed to sequentially observe the data and make his/her decision at a randomly selected time, has been extensively studied since the seminal work by Wald <ref type="bibr">[3]</ref>. For independent and identically distributed (i.i.d.) data, compared with the FSS Neyman-Pearson test <ref type="bibr">[4]</ref>, the sequential probability ratio test (SPRT) is superior in terms of the average sample size required for achieving the same error performance <ref type="bibr">[5]</ref>.</p><p>An important topic in the theory of sequential analysis is change detection. Suppose that the decision maker is monitoring a stochastic system, in which at some unknown time there is an abrupt change of the underlying probability distribution of the system state. Change detection concerns about methods for sequentially detecting the occurrence of the change point, and the goal is to minimize the delay between the actual change point and the declared time, subject to certain constraint on the risk of false alarms <ref type="bibr">[6]</ref>, <ref type="bibr">[7]</ref>. In the Bayesian formulation, a prior probability distribution of the change point is imposed and exploited when designing the decision procedure; see, e.g., <ref type="bibr">[8]</ref>. In the non-Bayesian (or minimax) formulation, there is no prior knowledge about the change point and the design objective is to optimize the performance under the worst-case scenario; see, e.g., <ref type="bibr">[9]</ref> where a decision procedure based on the cumulative sum (CUSUM) statistic <ref type="bibr">[10]</ref> was shown to be optimal in an asymptotic sense, and <ref type="bibr">[11]</ref> where the CUSUM decision procedure was further shown to be exactly optimal.</p><p>In basic formulations of change detection problems, the decision maker usually only needs to treat a single data stream. But in many modern applications, the rapid development of sensing technology allows for the generation of large-scale real-time streaming data <ref type="bibr">[12]</ref>, corresponding to a large number of parallel data streams. Detecting the change points in such parallel data streams is the situation considered in our work here.</p><p>With multiple data streams, a formulation different from ours is that an unknown subset of the data streams have a common change point. This setting is mainly motivated by surveillance applications, in which a system is monitored by multiple sensors and at an unknown time a disruption leads to a change in the observations of a subset of deployed sensors; see, e.g., <ref type="bibr">[13]</ref>- <ref type="bibr">[19]</ref>. In contrast, our formulation considers the scenario where each data stream has its own change point, and furthermore the data streams as well as their change points are mutually independent. This setting can be suitable for systems whose different components are essentially independent.</p><p>The formulation that a decision maker sequentially detects independent change points for parallel independent data streams may seem uninteresting at the first glance, since one may wonder that this simply boils down to a number of separate single-stream change detection procedures. Such a formulation, however, becomes interesting when the number of data streams becomes large, and the choice of error metric then turns out to be a key factor.</p><p>Let us start with the formulation of making K hypothesis tests given K observations, each corresponding to one of the tests. For any given decision procedure, Table <ref type="table">I</ref> categorizes the outcomes of the K tests. A direct definition of the error metric is the familywise error rate (FWER) <ref type="bibr">[20]</ref>, <ref type="bibr">[21]</ref>, which is the probability of rejecting any null hypothesis, i.e., Pr{V &#8805; 1}. A TABLE I OUTCOMES OF K PARALLEL HYPOTHESIS TESTS simple decision procedure that guarantees the FWER no greater than &#945; is Bonferroni's procedure, which essentially performs K separate hypothesis tests, each of which guarantees the probability of rejecting a null (i.e., Type I error probability) no greater than &#945;/K. This requirement is very stringent for detecting weak signals when K is large, and decision procedures that control the FWER generally have very low detection power. This motivates the proposal of an alternative error metric called the false discovery rate (FDR) <ref type="bibr">[22]</ref>, <ref type="bibr">[23]</ref>, which is the expected proportion of falsely rejected hypotheses to all rejected hypotheses, i.e., FDR = E[ V R&#8744;1 ]. Compared with FWER-oriented decision procedures, decision procedures that control the FDR are more powerful in identifying alternative hypotheses <ref type="bibr">[12]</ref>, at the cost of increased error rates. The most widely known FDR-oriented decision procedure is the Benjamini-Hochberg (BH) procedure <ref type="bibr">[23]</ref>. These two error criteria have been widely and successfully used in applications with large-scale data streams or when many comparisons are needed, notably biological and medical signal processing, including high throughput gene and protein expression data, brain imaging, and clinical trials; see <ref type="bibr">[12]</ref> for a variety of examples. There have been a few studies about the application of FDR and FWER in sequential testing of multiple hypotheses; see, e.g., <ref type="bibr">[24]</ref> and <ref type="bibr">[25]</ref>. Therein the focus is mainly about extending the metrics of FDR and FWER to sequential multiple hypotheses testing problems, rather than change detection. Some techniques in <ref type="bibr">[24]</ref> have inspired our current work on sequential change detection of multiple data streams. As for the application to sequential change detection over multiple data streams, <ref type="bibr">[26]</ref> considered a setting with multiple changes between two known distributions and extended the setting to multiple data streams. The work established an FDR control, but did not analyze average detection delay within multiple data streams.</p><p>Returning to our problem of sequential change detection over parallel data streams, a rejection corresponds to declaring a change point for a data stream, and a false rejection, i.e., a rejection of a null, corresponds to declaring a change point before the actual change point of the data stream. Thus the FWER is the probability of making an early change decision for at least one of the data streams. Similar to the multiple hypothesis testing scenario in the previous paragraph, to guarantee a prescribed level of the FWER, the change detection for each data stream is highly strict in declaring a change, in turn leading to excessively large decision delays when the number of data streams is large. To remedy this, we propose to use the FDR as the error metric, which quantifies the expected ratio between the number of false rejections and the total number of rejections. As will be shown in this work, FDR-oriented decision procedures substantially reduce the average decision delay (ADD) compared with FWER-oriented decision procedures. In fact, for FDR-oriented decision procedures, the ADD does not grow with the number of data streams, in sharp contrast with FWER-oriented decision procedures for which the ADD grows logarithmically with the number of data streams.</p><p>The main contributions of this work are as follows. 1) We formulate the problem of change detection for parallel data streams, extend the FDR error metric to this problem, and develop the corresponding decision procedure, which is motivated by the classical BH procedure. 2) For the developed FDR-oriented decision procedure, we establish a theoretical guarantee for the FDR. Furthermore, we obtain the asymptotic behavior of the ADD as the number of data streams grows large, which does not grow with the number of data streams; in contrast, the ADD of FWER-oriented decision procedures grows logarithmically with the number of data streams. 3) To cope with the scenario where the post-change statistics are under multiple possible hypotheses, we extend the problem formulation of change detection to change detection and isolation, for parallel data streams, and develop the corresponding decision procedures and performance guarantees. 4) We test the developed decision procedures for simulated data sets as well as an application related to cognitive radios, to corroborate the analytical results, and to illustrate the utility of the decision procedures. In our earlier work <ref type="bibr">[27]</ref>, we considered the non-Bayesian minimax problem formulation where no prior knowledge about change points is assumed, and developed a decision procedure which does not take into account truncation of data. In the current work, we consider the Bayesian problem formulation, and take into account the effect of truncation when analyzing the performance.</p><p>The remaining part of this paper is organized as follows. Section II describes the system model, formulates the multiple change detection problem, and introduces the key definitions. Section III presents the decision procedures that aim at controlling the FDR and the FWER, respectively. Section IV establishes the FDR guarantee and the asymptotic behavior of the ADD. Section V extends the problem formulation, decision procedures, as well as their analysis to the multiple change detection and isolation problem. Section VI presents the simulation results, and Section VII presents a case study where we apply the developed decision procedures to the multichannel dynamic spectrum access problem for cognitive radios. Finally Section VIII concludes this paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. PROBLEM SETUP</head><p>Consider K &#8805; 2 parallel data streams:</p><p>(1) Denote the data at time epoch n by X [K]  n = (X (1)  n , X (2)  n , . . . , X (K ) n ). For the k-th data stream, there exists a time epoch t (k) &#8805; 1 called its change point. In particular, we allow t (k) = &#8734;; that is, the underlying statistics of a data stream may never change, and thus a data stream may not have any (finite) change point.</p><p>For a measurable space ( , F), consisting of a sample space and a &#963; -field F of events, consider a family</p><p>of probability measures to describe the prior distribution of t (k)  and the distribution of</p><p>1 , X (k) 2 , . . . , X (k) t (k) -1 are i.i.d. with a (pre-change) distribution h 0 , and X (k)  t (k) , X (k) t (k) +1 , . . . are i.i.d. with another (post-change) distribution h 1 and independent of</p><p>In Section V we will address detection and isolation problems where there are multiple possible postchange distributions. We assume mutual independence among the K data streams, and that h 0 and h 1 are probability densities with respect to some measure P on ( , F).</p><p>Define the Kullback-Leibler (KL) divergence as</p><p>the mean change point as</p><p>which is independent of the index k since we have assumed that all streams have the same change point prior distribution (cf. Remark 4 in Section IV), and a parameter d as</p><p>in which for prior distributions with exponential tails d &gt; 0, and for prior distributions with heavy tails d = 0. Suppose that a statistician sequentially observes the K parallel data streams and makes decision regarding whether changes have occurred for these data streams, up to some deadline N. To be concrete, the statistician keeps observing X [K]  n starting from n = 1, sequentially, until at a certain time epoch T 1 , declaring that a change has occurred for a certain data stream indexed by D 1 . Subsequently, the statistician keeps observing X [K]\{D 1 } n starting from n = T 1 + 1, with the D 1 -th data stream excluded in the observation, until at a certain time epoch T 2 , declaring that a change has occurred for a certain data stream indexed by D 2 ... Such a procedure is executed until either changes have been declared for all the K data streams, or the time epoch n has reached the deadline N and the statistician declares no change for the remain data streams.</p><p>We can formalize the above decision procedure as follows.</p><p>Definition: A decision procedure is a multiple stopping rule R consisting of a sequence of stopping times and decisions, as</p><p>where</p><p>Regarding the stopping times and decisions,</p><p>n 1 , and, in a recursive way, for general k &#8712;</p><p>We can also define the stopping time for every data stream. Definition: For k &#8712; [K], denote by T (k) the stopping time for the k-th data stream, so that</p><p>We then define the FDR and the FWER of a decision procedure R as follows.</p><p>Definition: Denote by R the number of change points declared by R, i.e., the number of elements in {T (k) , k &#8712; [K]} satisfying T (k) &lt; N, and by V the number of falsely declared change points, i.e., the number of elements in</p><p>where a &#8744; b represents the maximum between a and b, and E &#960; represents the expectation with respect to</p><p>Besides the FDR and the FWER, we are also interested in the ADD of a decision procedure. This is the average number of additional samples before declaring a change point after that change point, normalized by the number of data streams with a finite change point. We denote by K &#8734; the number of data streams without a finite change point, which is a binomial random variable Bin(K, &#960; &#8734; ), with probability mass function</p><p>Formally, we have the following definition.</p><p>Definition: For a decision procedure, the ADD is defined as</p><p>where ADD k &#8734; is the conditional ADD upon K &#8734; = k &#8734; , which can be written as</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. DECISION PROCEDURES</head><p>For the k-th data stream, k &#8712; [K], we define a sequence of statistics</p><p>where</p><p>n is the likelihood ratio between the hypotheses that the change occurs at t (k) &#8804; n and that the change occurs at t</p><p>which can be applied for practical implementation. First we describe a decision procedure called MD-FDR that aims at controlling the FDR. Fix a parameter &#945; &#8712; (0, 1), and set an array of thresholds as</p><p>For convenience of description, we divide the operation of a decision procedure into multiple stages. At the beginning, the time epoch n = 1, the decision procedure starts with the first For MD-FDR, set thresholds as <ref type="bibr">(12)</ref>; for MD-Hochberg, set thresholds as <ref type="bibr">(15)</ref>. 1) Initially set q = 1, I q = [K], n = 1, and n 0 = 0.</p><p>2) Sample the active data streams {X (k)  n } k&#8712;I q ,n&gt;n q-1 , and update the statistics {G (k)  n } k&#8712;I q ,n&gt;n q-1 using (11). 3) Sort the statistics in ascending order as {G (i(n,l )) n }, where i(n, l ) denotes the index of the l-th ordered statistic at time epoch n. 4) Repeat 2)-3) with n increased by 1 each time, until n equals</p><p>a) If n q &lt; N, stop the q-th stage and declare change points at n q for the following data streams:</p><p>Update I q+1 to exclude the indices of these declared data streams. If |I q+1 | = 0, then stop; otherwise, set q = q + 1, and go to 2) to begin the next stage. b) Otherwise, n q = N, declare that all the active data streams in I q have no change points, and stop.</p><p>stage, and whenever at least a stopping time is reached and an associated decision for a data stream is declared, the current stage ends and a new stage begins from the next time epoch. We use I q to denote the indices of active data streams (i.e., the data streams that have not been stopped) at the beginning of the q-th stage, and use n q to denote the time epoch at the end of the q-th stage. Initially we set I 1 = [K] and n 0 = 0. We describe the decision procedure MD-FDR in Algorithm 1, which is also illustrated in Fig. <ref type="figure">1</ref>.</p><p>For comparison, we also consider two decision procedures that control the FWER. Fix &#945; &#8712; (0, 1). The decision procedure based on Bonferroni's procedure, denoted as MD-Bonferroni, has a single threshold Q = K/&#945; -1, and is described in Algorithm 2. The decision procedure based on Hochberg's procedure <ref type="bibr">[29]</ref>, denoted as MD-Hochberg, is similar with MD-FDR, but has a different array of thresholds as</p><p>The detailed description of MD-Hochberg is in Algorithm 1.</p><p>Algorithm 2: MD-Bonferroni. 1) Initially set q = 1, I q = [K], n = 1, and n 0 = 0.</p><p>2) Sample the active data streams {X (k) n } k&#8712;I q ,n&gt;n q-1 , and update the statistics {G (k)  n } k&#8712;I q ,n&gt;n q-1 using (11). 3) Repeat 2) with n increased by 1 each time, until n equals</p><p>(16) a) If n q &lt; N, stop the q-th stage and declare change points at n q for data streams that satisfy <ref type="bibr">(16)</ref>. Update I q+1 to exclude the indices of these declared data streams. If |I q+1 | = 0, then stop; otherwise, set q = q + 1, and go to 2) to begin the next stage. b) Otherwise, n q = N, declare that all the active data streams in I q have no change points, and stop.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. ANALYSIS OF DECISION PROCEDURES</head><p>In this section, we analyze the statistical properties of the proposed decision procedure MD-FDR. On one hand, we show that MD-FDR controls the FDR. On the other hand, we compare the ADDs achieved by MD-FDR and by FWER-oriented decision procedures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. False Discovery Rate</head><p>The following result, Theorem 1, states that the proposed decision procedure MD-FDR controls FDR under our problem formulation.</p><p>Theorem 1: Suppose that 0 &lt; I &lt; &#8734;, and t &lt; &#8734;. For any given 0 &lt; &#945; &lt; 1, the decision procedure MD-FDR in Section III satisfies</p><p>where o(1) &#8594; 0 as K &#8594; &#8734;. Proof: We prove Theorem 1 in two parts. In the first part, we consider a decision procedure similar to MD-FDR but without the deadline N, for which we prove that such a decision procedure controls the FDR at the level &#945;. In the second part, we impose the deadline N and estimate the extra cost upon the FDR.</p><p>Part 1:</p><p>As we remove the deadline N in MD-FDR (i.e., setting N = &#8734;), we adjust the definition of the FDR as</p><p>where R is the number of change points declared, i.e., the number of elements of [K] such that T (k) &lt; &#8734;, and V is the number of falsely declared change points, i.e. the number of elements of [K] such that T (k) &lt; t (k) . Let K 1 be the number of data streams satisfying</p><p>For these events, we have</p><p>In order to prove <ref type="bibr">(21)</ref>, consider for each data stream, say, the kth one, a stopping time with respect to the observations</p><p>. . as follows:</p><p>The following result is standard and can be found in several references, e.g., <ref type="bibr">[28,</ref><ref type="bibr">Sec. 4</ref>]. Lemma 2: For any A &gt; 0, P &#960; (T (k) &lt; t (k) ) &#8804; 1/(A + 1). Hence, by setting A = Q s = K/(s&#945;) -1 when defining T (k) , we have</p><p>thus proving <ref type="bibr">(21)</ref>.</p><p>and u data streams correctly stopped}</p><p>Note that the sets {V &#969; v,u } in the union V v,u are mutually disjoint. We claim that</p><p>The following proof of ( <ref type="formula">24</ref>) is modified from an argument in <ref type="bibr">[24]</ref>.</p><p>For any outcome in V &#969; v,u , the k-th data stream is stopped early, at some stage q. By ( <ref type="formula">14</ref>) in the decision procedure MD-FDR, k = i(n q , l ) for some l &#8805; l q , so</p><p>Since Kl q + 1 is the number of stopped data streams until stage q, this value is no greater than the total number of stopped data streams in</p><p>by noting that the thresholds {Q s } are decreasing with s. This thus shows that V &#969; v,u &#8838; W k,v+u for any k &#8712; &#969;. With (24) established, we now follow the argument of <ref type="bibr">[30]</ref>, with a few modifications suitable for our problem formulation:</p><p>Here, ( <ref type="formula">27</ref>) is because that the sets {V &#969; v,u } in the union V v,u are mutually disjoint, ( <ref type="formula">28</ref>) is due to <ref type="bibr">(24)</ref>, and ( <ref type="formula">29</ref>) is due to the definition of v .</p><p>Using <ref type="bibr">(29)</ref>, we have that conditioned upon K 1 , the corresponding (conditional) FDR &#8734; is</p><p>Define U v,u,k to be the event that, conditioned upon the kth data stream being stopped early, some other v -1 data streams are stopped early and some u data streams correctly stopped.</p><p>For any k, U 1,k , . . .,U K,k partition the sample space. Then starting from (30), we have</p><p>Since the data streams are mutually independent, the event of the kth data stream being stopped early and the event of U s,k are independent, and we have</p><p>where ( <ref type="formula">32</ref>) is due to <ref type="bibr">(23)</ref>. Because the bound (33) holds for any</p><p>Now we turn to the decision procedure MD-FDR with a finite N and analyze the relationship between FDR &#8734; in <ref type="bibr">(19)</ref> and FDR in <ref type="bibr">(6)</ref>. With N imposed, the number of change points declared, R, is clearly no greater than R. Denote R = R + , where is the size of {T (k) |N &#8804; T (k) &lt; &#8734;}, which can be further divided into two parts. The first part has a size of</p><p>}|, which is the reduction of the number of false alarm events, and the second part has a size</p><p>where P &#960;/&#8734; (&#8226;) means that this is the conditional probability P &#960; (&#8226;|t (k) &lt; &#8734;), and the last inequality is due to that 2 satisfies</p><p>Then we choose N to control P &#960;/&#8734; ( 2 &#8805; 1). Let us write</p><p>Following the argument in <ref type="bibr">[28,</ref><ref type="bibr">Thm. 4</ref>], when 0 &lt; I &lt; &#8734; holds, for every k &#8712; [K], we have</p><p>(36) where o(1) &#8594; 0 as K &#8594; &#8734;. Applying Markov's inequality, we have</p><p>Back to <ref type="bibr">(35)</ref> and ( <ref type="formula">34</ref>), we have</p><p>This completes the proof of Theorem 1.</p><p>In our problem formulation, we explicitly incorporate the deadline N and establish the FDR control with the regularity conditions on the prior distribution of change point and the KL divergence between the pre and post change distributions. If we remove the deadline, the FDR is controlled without requiring such conditions, as already proved in the first part of Theorem 1. Also we consider the simple case where all the parallel data streams are mutually independent. Following the approach in <ref type="bibr">[30]</ref>, it is possible to provide some degree of FDR control even when the data streams are correlated, e.g., satisfying the so-called positive regression dependence on a subset. A systematic investigation along that direction is an interesting topic for future research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Average Decision Delay</head><p>For the three decision procedures considered in Section III, we have the following result about the asymptotic behaviors of their ADDs.</p><p>Theorem 3: For the three decision procedures such that the FDR (for MD-FDR) and the FWER (for MD-Bonferroni and MD-Hochberg) are controlled by &#945; &gt; 0, if &#960; &#8734; &gt; 0 holds, as K grows without bound, and N grows with K, we have:</p><p>Proof: First, consider MD-FDR which controls the FDR. According to <ref type="bibr">[28,</ref><ref type="bibr">Thm. 4]</ref>, if the kth data stream is stopped by threshold Q s = K/(s&#945;) -1, we have</p><p>Authorized licensed use limited to: Princeton University. Downloaded on September 27,2020 at 00:03:29 UTC from IEEE Xplore. Restrictions apply.</p><p>where o(1) &#8594; 0 as K &#8594; &#8734;. To have the FDR controlled by &#945;, as in the proof of Theorem 1, we make N as a function of K scale to satisfy (K log K ) N &#8594; 0 as K &#8594; &#8734; and thus (37) goes to 0. Then, we have</p><p>According to the definition of the ADD, we can derive its lower bound as follows:</p><p>where the inequality in the second line is due to (42) and that for a fixed k &#8734; , Q K-k &#8734; is the smallest threshold.</p><p>On the other hand, we can derive its upper bound as follows:</p><p>where the inequality in the second line is due to (42) and that the worst case of the thresholds occurs when the thresholds are ascending and different, and the equality in the last line is due to Stirling's approximation. Combining the lower and the upper bounds we have ADD = O(1) which does not grow with K.</p><p>Then, consider MD-Bonferroni and MD-Hochberg which control the FWER. We have for MD-Bonferroni,</p><p>where the equality in the second line is due to that MD-Bonferroni uses a single threshold Q = K/&#945;; and for MD-Hochberg,</p><p>where the inequality in the second line is similar to that in (44) and due to the thresholds in MD-Hochberg.</p><p>From Theorem 3, we see that the ADD of MD-FDR does not increase with the number of data streams, and thus in this sense MD-FDR is a scalable decision procedure suitable for large scale problems. In contrast, the ADDs of MD-Bonferroni and MD-Hochberg scale at a logarithmic speed with the number of data streams. Such a discrepancy is due to that for procedures controlling the FDR, we allow a more relaxed decision criterion compared with procedures controlling the FWER.</p><p>Remark 4: Our assumption that all the data streams have the same prior distribution of their change points is mainly for convenience of analysis, but it can be relaxed. When different data streams have their own prior distributions, i.e. &#960; m = &#960; (k)  m , our results about FDR control and ADD asymptotic behavior can also be established, following the proofs of Theorems 1 and 3, and the general conclusion in Theorem 3 that under FDR control the ADD does not grow without bound with K still holds true. Similarly, the assumption that all the data streams have the same pre/post-change distributions can be relaxed as well.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. EXTENSION TO DETECTION AND ISOLATION PROBLEMS</head><p>In this section, we generalize the problem formulation of change detection to change detection and isolation for parallel data streams. The change detection and isolation problem is about situations where after the change point, there are multiple distinct post-change hypotheses, only one of which is true, and the goal is to detect the change as soon as possible and identify the correct post-change distribution after the change occurs, subject to certain constraints on the false alarms and the false isolations. It is of importance for many applications, including fault diagnosis in dynamical systems and industrial processes, environment surveillance and monitoring, and target identification in radar and sonar signal processing; see, e.g., <ref type="bibr">[7]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Model and Problem Formulation</head><p>Consider a family {P (k)  &#960;, j , k &#8712; [K], j &#8712; [J]} of probability measures on ( , F) that describes both the prior distribution of t (k) and the distribution of</p><p>. with a (pre-change) distribution h 0 , and X (k)  t (k) , X (k) t (k) +1 , . . . are i.i.d. with another (postchange) distribution h j (k) , j (k) &#8712; [J] and further independent of</p><p>For simplicity, we write P (k) &#960;, j for P (k) &#960;, j (k) . We also impose a prior probability distribution p( j) on the change type j, and that h 0 and h j are probability densities with respect to some measure P on ( , F).</p><p>Now the decision maker needs to decide, for each data stream, both its change point and its change type (i.e., the post-change distribution). So we extend the definition of a decision procedure in Section II as follows.</p><p>Definition: A decision procedure for detection and isolation is a multiple stopping rule R consisting of a sequence of stopping times and decisions, as</p><p>where We also define the stopping time and the decided change type for every data stream.</p><p>Definition: For k &#8712; [k], denote by T (k) the stopping time for the k-th data stream, the same as that in Section II, and by &#308;(k) the decided change type for the k-th data stream.</p><p>We then extend the definition of the FDR to incorporate the effect of deciding change types incorrectly.</p><p>Definition: Denote by R is the number of change points declared by R, i.e., the number of elements in {T (k) , k &#8712; [K]} satisfying T (k) &lt; N, and by V the number of falsely declared change points or falsely classified change types, i.e., the number of elements in</p><p>where E &#960;,p represents the expectation with respect to</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Decision Procedure</head><p>For single-stream Bayesian change detection and isolation problems, thus far there has not been an optimal Bayesian solution satisfying the constraints on false alarm and false isolation probabilities for general prior distributions of the change point, except an asymptotically optimal Bayesian solution for the geometric prior in <ref type="bibr">[31]</ref>. Most non-Bayesian solutions are proposed to satisfy constraints on average run length type of false alarm criteria, which are not suitable in the FDR-control problem formulation with multiple data streams; see, e.g. <ref type="bibr">[32]</ref>- <ref type="bibr">[34]</ref>. To construct a decision procedure controlling the FDR, we consider a non-Bayesian approach in <ref type="bibr">[35]</ref>, which aims at having a control of the weighted sum of false alarm and false isolation probabilities, and possesses asymptotic optimality with some conditions. For each data stream k &#8712; [K], we define a sequence of statistics according to <ref type="bibr">[35,</ref><ref type="bibr">Thm. 6]</ref>,</p><p>in which m &#945; is a window size we choose and</p><p>Let</p><p>In the sequel, we assume that</p><p>Furthermore, we assume that the distribution of the change point satisfies log</p><p>as K &#8594; &#8734;.</p><p>In the following, we fix a parameter &#945; &#8712; (0, 1). According to <ref type="bibr">[35]</ref>, we choose m &#945; to satisfy</p><p>We set an array of threshold values</p><p>according to</p><p>which means that for threshold value Q s , the weighted sum of the false alarm and false isolation probabilities does not exceed s K &#945; as shown in (59). We present the decision procedure, called MDI-FDR, in Algorithm 3.</p><p>For the stopping times</p><p>where the subscript t (k) in P t (k) , j and P t (k) indicates that they are probability measures under a given change point t (k) .</p><p>To see that the thresholds {Q s } satisfy (56), note that according to <ref type="bibr">[35,</ref><ref type="bibr">Thm. 7]</ref>, we have</p><p>where (T (k) , &#308;(k) ) is obtained by the decision procedure MDI-FDR. Then we have</p><p>Algorithm 3: MDI-FDR. 1) Initially set q = 1, I q = [K], n = 1, and n 0 = 0.</p><p>2) Sample the active data streams {X (k) n } k&#8712;I q ,n&gt;n q-1 , and update the statistics {S (k)  n } k&#8712;I q ,n&gt;n q-1 using (50). 3) Sort the statistics in ascending order as {S (i(n,l )) n }, where i(n, l ) denotes the index of the l-th ordered statistic at time epoch n. 4) Repeat 2)-3) with n increased by 1 each time, until n equals</p><p>a) If n q &lt; N, stop the q-th stage and declare change points at n q for the following data streams:</p><p>, where</p><p>Meanwhile, declare change types as</p><p>for these declared data streams above. Update I q+1 to exclude the indices of these declared data streams. If |I q+1 | = 0, then stop; otherwise, set q = q + 1, and go to 2) to begin the next stage. b) Otherwise, n q = N, declare that all the active data streams in I q have no change points, and stop.</p><p>where the first inequality follows from an argument similar to the proof of <ref type="bibr">[35,</ref><ref type="bibr">Thm. 7]</ref>, and the second inequality is from (52) and (53).</p><p>The following result provides a bound on the FDR of MDI-FDR.</p><p>Theorem 5: For a given 0 &lt; &#945; &lt; 1, the decision procedure MDI-FDR satisfies</p><p>where</p><p>Proof: The proof is similar to that of Theorem 1, and thus we only indicate the difference between the proofs. Instead of (20), we define W k,s as k) , or &#308;(k) = j (k) , &#8707;n &#8805; t (k) . (65) So according to (56), we have P &#960;,p (W k,s ) &#8804; s&#945;/K, where the subscript p represents the prior distribution of the change type.</p><p>We can then follow the proof of FDR &#8734; in Theorem 1. Upon imposing N, we utilize the following result about the expected delay in detection and isolation problems:</p><p>Lemma 6: <ref type="bibr">[35,</ref><ref type="bibr">Thm. 7</ref>] If (T (k) , &#308;(k) ) satisfies (56), then for every 1 &#8804; j &#8804; J,</p><p>Hence, we have for every k &#8712; [K],</p><p>where o(1) &#8594; 0 as K &#8594; &#8734; in our situation. The expectation E t (k) , j is taken with respect to P t (k) , j . Then we use (67) to replace <ref type="bibr">(36)</ref> the proof of Theorem 1, and the remaining part readily follows.</p><p>We also study the ADD performance of MDI-FDR, which is defined in the same way as that in Section II. Applying (67), we have the following result regarding the ADD.</p><p>Theorem 7: For the decision procedure MDI-FDR controlling the FDR by &#945;, as K grows without bound, we have:</p><p>The proof is similar to that of Theorem 3 and is thus omitted here.</p><p>We can see that the statistics in MDI-FDR do not exploit the knowledge of the distribution of change types. So we also propose a heuristic statistic with a Bayesian flavor as:</p><p>where G (k) n ( j, 0) is a statistic similar to <ref type="bibr">(10)</ref> as:</p><p>where</p><p>We use the thresholds {Q s } in <ref type="bibr">(12)</ref>. This thus leads to a heuristic decision procedure called MDI-Heuristic in Algorithm 4. Unfortunately we have not established FDR control for MDI-Heuristic, but our numerical experiments in Section VI-B suggest that this heuristic decision procedure performs well. A more thorough theoretic analysis of MDI-Heuristic is left for future research. 1) Initially set q = 1, I q = [K], n = 1, and n 0 = 0.</p><p>2) Sample the active data streams {X (k) n } k&#8712;I q ,n&gt;n q-1 , and update the statistics {S (k)  n } k&#8712;I q ,n&gt;n q-1 using (69). 3) Sort the statistics in ascending order as {S (i(n,l )) n }, where i(n, l ) denotes the index of the l-th ordered statistic at time epoch n. 4) Repeat 2)-3) with n increased by 1 each time, until n equals</p><p>a) If n q &lt; N, stop the q-th stage and declare change points at n q for the following data streams:</p><p>, where</p><p>Meanwhile, declare change types as</p><p>for these declared data streams above. Update I q+1 to exclude the indices of these declared data streams. If |I q+1 | = 0, then stop; otherwise set q = q + 1, and go to 2) to begin the next stage. b) Otherwise, n q = N, declare that all the active data streams in I q have no change points, and stop.</p><p>Finally, we also consider decision procedures that control the FWER, which is defined as Pr{V &#8805; 1}. Here we provide a decision procedure based on Bonferroni's procedure, denoted as MDI-Bonferroni, as described in Algorithm 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. SIMULATION RESULTS</head><p>In this section, we present the numerical results of Monte Carlo simulations designed to compare the proposed decision procedures controlling the FDR and the FWER. In order to make a fair comparison, we set the upper bounds of the FDR and the FWER the same, as &#945; = 0.1. The deadline N is 2000.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Multiple Change Detection</head><p>Regarding the prior probability distribution of change points, we set the probability of that there is no finite change point as &#960; &#8734; = 0.2, and set the distribution of finite change points as a (conditional) geometric distribution with parameter p = 0.1.  GAUSSIAN DATA STREAMS Algorithm 5: MDI-Bonferroni. 1) Initially set q = 1, I q = [K], n = 1, and n 0 = 0.</p><p>2) Sample the active data streams {X (k) n } k&#8712;I q ,n&gt;n q-1 , and update the statistics {S (k)  n } k&#8712;I q ,n&gt;n q-1 using (50). 3) Repeat 2) with n increased by 1 each time, until n equals</p><p>(74) a) If n q &lt; N, stop the q-th stage and declare change points at n q for data streams that satisfy (74). Meanwhile, declare change types as</p><p>for these declared data streams above. Update I q+1 to exclude the indices of these declared data streams. If |I q+1 | = 0, then stop; otherwise, set q = q + 1, and go to 2) to begin the next stage. b) Otherwise, n q = N, declare that all the active data streams in I q have no change points, and stop.</p><p>streams, with pre/post-change probability distributions:</p><p>We see that all the decision procedures well satisfy the upper bound &#945;. Fig. <ref type="figure">2</ref> compares the ADD performance of the decision procedures. We see that the ADD is not visibly affected by the number of data streams K for MD-FDR, but tends to grow logarithmically with K for MD-Bonferroni and MD-Hochberg. This observation confirms the analytical result in Theorem 3, which indicates that FDR control decision procedures tend to be more scalable compared with FWER control decision procedures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Multiple Change Detection and Isolation</head><p>We again consider the prior probability distribution of change points as a mixture of a singleton at infinity with probability  &#960; &#8734; and a geometric distribution with parameter p = 0.05. We further set the post-change type as a uniform distribution on {1, . . . , J}, and consider K i.i.d. data streams with pre/postchange probability distributions:</p><p>We run simulations for four configurations as follows:</p><p>Tables III through VI display the results, wherein we include R, the average number of change points declared.</p><p>Comparing Tables III and IV, we see that as the gap between &#956; 1 and &#956; 2 decreases, the FDR/FWER and the ADD increase, reflecting that the task of isolation of post-change type becomes challenging. Comparing Tables III and V, we see that a larger probability of infinite change point considerably worsens the FDR/FWER, whereas still controlled by the prescribed upper bound. From Table VI, we see that an increase in the number of post-change types decreases the FDR/FWER, but at the cost of increasing the ADD. This is because in the decision procedure the thresholds increase with J and the error probability is with respect to the worst case. This suggests that when J is large, it may be possible to set smaller thresholds to control the error rates at a similar level but with reduced decision delays. </p><p>n , where Z (k) n is the faded received primary signal at the k-th cognitive user. We assume a time-invariant channel gain h (k) of the k-the frequency channel, and</p><p>n to be zero-mean circularly symmetric complex Gaussian with variance P (k) . The primary user is initially inactive, and at an unknown time slot t (k) it becomes active for the k-th spectrum channel. Thus there is a change in the distribution of data samples from CN(0, &#963; 2 ) to CN(0, &#963; 2 + P (k) ) at the unknown time slot t (k) for the k-th data stream. Here, the post-change distributions are different for each data stream, which is reasonable because the channel gains cannot be the same among all spectrum channels. As for the distribution of change points, in wireless communications, the arrival time of the primary user's traffic can be simply assumed to follow a geometric distribution with parameter p. In practice, because the communication signal is a complex number, we usually take the absolute square of the received signal, i.e., X (k) n 2 as the input of the corresponding detection procedures. The change in the distribution of X (k) n 2 should be from an exponential distribution Exp(&#963; 2 ) to another one Exp(&#963; 2 + P (k) ). Thus our proposed decision procedure MD-FDR can be suitable for this problem, even though the post-change distributions are different for each data stream, which is mentioned in Section IV. It is natural to interpret the FDR as the severity of the "wasted" spectrum (i.e., those unused channels mistakenly detected as occupied by primary users), and the ADD as the amount of mean interference duration induced by the cognitive users. We can apply the decision procedure MD-FDR to this multichannel dynamic spectrum sensing problem to detect the retransmitting of primary users as quickly as possible to give primary users more protection while controlling the waste of spectrum resources.</p><p>We present simulation results in Fig. <ref type="figure">5</ref> to compare the performance of the proposed decision procedures controlling the FDR and the FWER with different numbers of spectrum channels. The upper bounds of the FDR and the FWER are set to be the same, as &#945; = 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, respectively. The parameter of the prior distribution of change points is set as p = 0.05. As for the variances of pre/post-change probability distributions, we set &#963; 2 = 2 and P (k) is drawn from <ref type="bibr">[1,</ref><ref type="bibr">2]</ref> uniformly for different K. We can see that the ADD decreases with the increase of FDR/FWER for all decision procedures, and the FDR control decision procedures actually have a smaller mean inference duration compared with the FWER control decision procedures. The ADD of FWER control decision procedures has a larger increase for large K compared with that of FDR control decision procedures. This is due to the fact that, for large K the FWER metric will lead to a more stringent constraint on the false alarm rate and then make the ADD increase, which is consistent with the results in Section IV-B and Section VI-A. The ADD of MD-FDR varies only slightly with K because there are more different post-change probability distributions with the increase of K and the ADD is associated with different KL divergences.</p><p>Note that we only consider the simplest situation in which the parameters of pre/post-change distributions are known. In practice, because of the unknown primary signal statistics and channel gains, it is difficult to make the parameters of the postchange distributions precise. Thus, we may consider that P (k) belongs to a certain set of possible values, which is similar to the model in Section V.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VIII. CONCLUSION</head><p>To cope with the emerging situation of analyzing largescale real-time streaming data, in this work, we have examined the problem of sequentially detecting changes in parallel data streams, and brought the error metric of FDR into the problem formulation. Our proposed FDR-oriented decision procedure is a sequential variant of the classical BH procedure, where the pvalue is replaced with a sequential detection statistic. Thanks to the characteristics of FDR-oriented decision procedures, when it comes to change detection problems, the average decision delay is significantly reduced compared with conventional FWERoriented decision procedures. Our theoretical findings are also corroborated by numerical experiments and case studies related to cognitive radios.</p><p>Note that we have not established the optimality of our proposed decision procedure for the problem of sequentially detecting changes in multiple data streams. It appears to be difficult to establish a lower bound for the ADD of any decision rule in the class of rules with a given FDR upper bound. This is an important yet unsolved issue for future research.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Authorized licensed use limited to: Princeton University. Downloaded on September 27,2020 at 00:03:29 UTC from IEEE Xplore. Restrictions apply.</p></note>
		</body>
		</text>
</TEI>
