<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Modeling bivariate geyser eruption system with covariate-adjusted recurrent event process</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>04/06/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10250788</idno>
					<idno type="doi">10.1080/02664763.2021.1910937</idno>
					<title level='j'>Journal of Applied Statistics</title>
<idno>0266-4763</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Zhongnan Jin</author><author>Lu Lu</author><author>Khaled Bedair</author><author>Yili Hong</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Geyser eruption is one of the most popular signature attractions at the Yellowstone National Park. The interdependence of geyser eruptions and impacts of covariates are of interest to researchers in geyser studies. In this paper, we propose a parametric covariate-adjusted recurrent event model for estimating the eruption gap time. We describe a general bivariate recurrent event process, where a bivariate lognormal distribution and a Gumbel copula with different marginal distributions are used to model an interdependent dual-type event system. The maximum likelihood approach is used to estimate model parameters. The proposed method is applied to analyzing the Yellowstone geyser eruption data for a bivariate geyser system and offers a deeper understanding of the event occurrence mechanism of individual events as well as the system as a whole. A comprehensive simulation study is conducted to evaluate the performance of the proposed method.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction 1.Background</head><p>Geyser eruption is one of the signature attractions at the Yellowstone National Park, which is home to two-thirds of the worlds' geysers. Tourists around the world crave to witness this fascinating natural phenomenon. Many researchers are interested in studying geyser eruptions and the underlying mechanisms. <ref type="bibr">Fournier (1969)</ref> built a physical model to describe the time interval between eruptions for the Old Faithful geyser, which is one of the most famous geysers in the Yellowstone National Park. <ref type="bibr">Rinehart (1972)</ref> showed that the Old Faithful Geyser activities are affected by earth tidal forces, barometric pressure, and tectonic stresses. However, geyser eruptions have not been studied by statistical methods. This paper develops a statistical model for analyzing geyser eruption data from a bivariate geyser system. This work will benefit the geyser study community for understanding and effectively modeling geyser eruption activities.</p><p>Typical geyser eruption is a repeating process and hence can be modeled with a recurrent process for events repeatedly occurring over time. Recurrent processes have had broad applications in diverse areas. For example, they have been widely used for studying vehicle failures in warranty studies <ref type="bibr">(Lawless, 1995)</ref>, relapse biomarkers in cancer research <ref type="bibr">(Schaubel and Cai, 2004)</ref>, and sports injury analysis <ref type="bibr">(Ullah et al., 2014)</ref>. Typically the time interval between two consecutive events, which is also referred to as the gap time, is studied to model the event frequency in a recurrent process. The proportional intensity models <ref type="bibr">(Cox, 1972, and</ref><ref type="bibr">Andersen and</ref><ref type="bibr">Gill, 1982)</ref> are popular for modeling event occurrences of a single type of event. In more sophisticated studies, there are multiple types of recurrent events observed in a single system. The occurrence of any type of event will result in a system event. In addition, in a multi-type recurrent event process, the gap time for different event types could be correlated. For example, the occurrence of one type of event could cause other types of events to occur more frequently. In this case, a multivariate recurrent process should be considered to model the interdependence of multiple event types in the same system.</p><p>In many event analyses, covariates are found to be useful for modeling the event occurrence time and frequency. Many recurrent processes are affected by process conditions. For example, some mechanical failures could occur at a higher frequency under a higher temperature, humidity, or pressure. Incorporating covariates into the recurrent process models could improve the model performance and provide a more precise estimation of the event time and frequency. Models of this type are referred to as the covariate-adjusted recurrent event process models.</p><p>In this paper, we focus on modeling and analysis of geyser eruptions for a two geyser system in the Yellowstone National Park. In a multi-type recurrent event system, the system events can result from either type of events, and hence any consecutive events could be associated with the same or different event types. In order to describe this bivariate recurrent event process, we need to not only understand the marginal behavior of each type of event, but also understand the interdependence between the two types of events. We consider a bivariate distribution for the gap times between successive events for a bivariate event system.</p><p>To improve the estimation, we leverage the covariate information on the eruption duration and develop a covariate-adjusted bivariate recurrent event process model for estimating the eruption gap time of a two geyser system formed by West Triplet Geyser and Grotto Geyser in the Yellowstone National Park.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">Related Literature</head><p>Recurrent event processes are extensively studied in the areas of reliability, public health, and medical studies. The nonparametric estimation of gap time distribution based on multivariate failure time data was introduced in <ref type="bibr">Schaubel and Cai (2004)</ref>. <ref type="bibr">Dauxois and Sencey (2009)</ref> considered the risks of two nosocomial infections for patients admitted to hospitals. <ref type="bibr">Bouaziz et al. (2013)</ref> provided a nonparametric method to estimate the intensity function of a recurrent process. Other than hazard functions and intensity functions, survival status in time is often of interest as well. <ref type="bibr">Huang and Liu (2007)</ref> investigated the disease free survival rate in a recurrent heart failure study. <ref type="bibr">Zeng and Lin (2009)</ref> and <ref type="bibr">Garre et al. (2008)</ref> focused on studying the terminal events in recurrent systems. <ref type="bibr">Meyer and Romeo (2015)</ref> presented Bayesian analysis of recurrent event using copulas. An earlier review regarding recurrent events can be found in <ref type="bibr">Lawless (1995)</ref> and a review on modeling of repairable systems can be found in <ref type="bibr">Lindqvist (2006)</ref>. Classical books on recurrent event data analysis include <ref type="bibr">Daley and Vere-Jones (2003)</ref>, <ref type="bibr">Cook and Lawless (2007)</ref>, and <ref type="bibr">Duchateau and Janssen (2008)</ref>.</p><p>In medical research, <ref type="bibr">Liu and Huang (2009)</ref> presented repeated measurements of biomarker to determine the HIV survival status in a recurrent event system. <ref type="bibr">Sun et al. (2006)</ref> applied covariate-adjusted additive hazard model for the data, which is involving recurrent gap times. <ref type="bibr">Prasad and Rao (2002)</ref> used a proportional hazard function with covariate adjustment in a repairable system. In another application of recurrent event data, <ref type="bibr">Huzurbazar and Williams (2010)</ref> incorporated covariates in a flowgraph model. <ref type="bibr">Yang et al. (2013)</ref> introduced multivariate lognormal assumption on event gap times of different event types. <ref type="bibr">Yang et al. (2017)</ref> considered a parametric model for the multi-type event recurrent event data without covariates and developed copula function on gap times for the recurrent process in a car body manufacturing process.</p><p>Existing research does not consider correlated renewal process with covariate adjustments.</p><p>Motivated by the geyser data, we propose the CARP model to study the eruption gap time for a two-geyser system. Application-wise, geyser eruption is rarely studied by statistical models. For the geyser eruption study, the modeling and analysis presented in this paper are new to geyser research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3">Overview</head><p>The rest of this paper is organized as follows. Section 2 provides more details on the geyser eruption data from the Yellowstone National Park. Section 3 discusses the model formulation for the multi-type recurrent event system. Section 4 describes the maximum likelihood approach for estimating the model parameters. A simulation study is described in Section 5 to evaluate the proposed method with model comparisons made under different parameter settings. The modeling and analysis of the Yellowstone geyser data are detailed in Section 6.</p><p>Section 7 contains some concluding remarks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Geyser Data</head><p>We use the publicly available Yellowstone geyser eruption data, which were collected in 2008 by the Geyser Observation and Study Association (GOSA). By using underground sensors, water levels were measured continuously, and occurrences of geyser eruptions were detected automatically. For each geyser, the GOSA data include the starting time and duration of each eruption. We choose to analyze the data from the West Triplet Geyser and the Grotto Geyser during the study period between June 2008 and November 2008. This particular dataset and study period were chosen to ensure that completely uninterrupted recurrent event data are available over a relatively long time span to allow for the dependence modeling for the two geysers.</p><p>We merge their eruption records in a temporal order as illustrated in Table <ref type="table">1</ref>. The data include the date and time when an eruption occurred, the eruption duration, and which geyser had the eruption. During the study period, the West Triplet Geyser erupted more frequently than the Grotto Geyser. Also, the eruptions of the West Triplet Geyser also lasted longer than those of the Grotto Geyser on average. More specifically, the average eruption gap time for the West Triplet Geyser is 6.8 hours with a standard deviation of 2.8 hours, while the Grotto Geyser had an average eruption gap time of 9.3 hours with a standard deviation of 8.6 hours. For the eruption duration, the Grotto Geyser eruptions lasted generally longer than the West Triplet Geyser, with the average duration of the Grotto Geyser eruptions being 3.5 hours with a standard deviation of 5.2 hours and the eruption duration of the West Triplet Geyser averaged at 0.7 hours with a 0.5 hours standard deviation.</p><p>Figures <ref type="figure">1a</ref> and<ref type="figure">1b</ref> display the side-by-side boxplots of the time between eruptions and the duration of eruptions, respectively, for the West Triplet and Grotto geysers. We can see compared with the West Triplet Geyser, the Grotto Geyser has a much larger variation of the eruption gap time with a number of extremely long gaps between eruptions and also more variation in the eruption frequency. On the other hand, the West Triplet Geyser has considerably shorter and less variable eruption durations than the Grotto Geyser. In addition, Figure <ref type="figure">2</ref> shows the plots of the time between eruptions versus the duration time of the previous eruption for both geysers. We can observe a moderate correlation between these two variables for the West Triplet Geyser and a strong correlation for the Grotto geyser.</p><p>Hence, we decided to utilize the duration of the previous eruption for both geysers to help model the eruption gap time.</p><p>Figure <ref type="figure">3</ref> illustrates the data obtained for the two-geyser system. In this figure, W kj denotes the gap time of the kth eruption (i.e., the time interval between the (k 1)th and kth eruptions) for geyser j where j = 1 for the West Triplet Geyser and j = 2 for the Grotto Geyser. The covariate x i denotes the eruption duration for the ith eruption in the bivariate geyser system, where i = 1, &#215; &#215; &#215;, n, n = n 1 + n 2 is the total number of eruptions for both geysers, and n j denotes the number of eruptions for the jth geyser. In particular, n 1 = 580, n 2 = 421, and n = 1001 for the dataset we analyzed. Note here the time between eruptions are labeled separately for each individual geyser. To model the bivariate geyser system, we will introduce new notation for jointly describing the event time, event type and covariate information in Section 3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Data Setup and Model</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Data Setup</head><p>Suppose that in a bivariate recurrent process with n total events, the systematic event time is described by the variable T i for i = 1, &#215; &#215; &#215;, n. We use T 0 to denote the starting time or the system installation time, and T n denotes the last event time in the system. In a bivariate system, there are two types of events, and hence we use an indicator variable &#916; i }1, 2| to    represent the type of event. For an event that occurs at time t i , the covariate vector on the event duration is denoted by X i , which is a vector of the previous duration for both types of events. Therefore, each event can be represented by }T i , &#916; i , X i | , where i = 1, &#215; &#215; &#215;, n. We use }t i , &#948; i , x i | to denote the observations of the ith event which occurs at time t i and it is from event type &#948; i with covariates measured as x i . Note in an observed recurrent process with n total events, the last event is observed at time t n , where t n &#8805; &#954; with &#954; being the pre-defined study termination time.</p><p>This sequence of bivariate events can be expressed as a counting process }N (t) : t &#8764; 0| , where N (t) denotes the cumulative number of events at time t regardless of the event type.</p><p>Similarly, we define the counting process for each individual event type j as }N j (t) : t &#8764; 0| , where j = 1 or 2. In addition, we denote the event history of the system up to a time point s &#8805; &#954; by X s = }N (t) : t &#8805; s| . Similarly, the covariate history can be expressed as { s = }x t : t &#8805; s| . For further discussion, we use H s = }X s , { s | to denote the history including both event and covariate information.</p><p>For a bivariate recurrent process system, event time variable T i defined above satisfies</p><p>Similarly, for events of type j, the event time variables are defined as T lj , where l = 1, . . . , n j and n j is the number of events from type j. In a bivariate recurrent system, we have the relation n 1 + n 2 = n. The time variable T lj is also presented in temporal order. For the events of type j, j = 1 or 2, we have</p><p>Based on the ordered event times, the event gap time for the jth type as the interval between two consecutive events can be calculated as W lj = T l+1,j T lj , where l = 0, 1, . . . , n j 1. Now consider the joint bivariate recurrent system. For the ith event in the system, the two-dimensional event gap time variable is defined as</p><p>where l ij = N j (t i ) is the cumulative number of events for jth event type by time t i . For example, at the system installation time where t = 0, the event gap time vector is W 0 = (W 01 , W 02 ) . If the first event occurs at time t 1 and is from event type 1, then the event gap time vector is denoted as W 1 = (W 11 , W 02 ) . In this case, l 11 = 1 since one event of type 1 has occurred as of time t 1 , while l 12 = 0 as no event of type 2 has occurred by that time point.</p><p>To link the event gap time W i with the system event time T i , we introduce an age variable at time t which is defined as the time between the time point t and the time of the latest event of each event type. In a bivariate recurrent process, the age variable is a vector of two components denoted as</p><p>where A j (t) = t T N j (t), j for j = 1 and 2. For an observed event at time t i , the observed age vector is denoted as</p><p>, where a j (t) = t t N j (t), j . For example, at the system installation time at t = 0, the age vector is a 0 = (0, 0) . If the first event occurs from type 1 at time t 1 , then a 1 = (0, t 1 ) . The age variable then connects between the system event time and the event gap time W i through the relation W i &#8764; A i . Here, the vector comparison is defined to be an element-wise comparison. In other words, W i &#8764; A i suggests W l ij ,j &#8764; A j (T i ) for both j = 1 and 2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Model</head><p>In this section, we introduce the proposed covariate-adjusted recurrent process (CARP).</p><p>After an event occurred at time t i (or after the system installation at time t 0 = 0), the distribution of the event gap time is given by</p><p>(1)</p><p>is the joint cumulative distribution function (cdf) of the event gap time conditioned on the age and covariates. At any event time, the information we have about the event type that</p><p>has not yet occurred is captured through its age and the conditional probability that is conditioned on the event eruption time is greater than or equal to the age since the last eruption. At each event time, the age is set to be zero for the occurred event type. The event gap time variable W i is adjusted by covariates x i which is further discussed in Section 3.3 in more detail.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Dependence Modeling and Covariate Adjustment</head><p>Dependence between events from different types in a bivariate system is modeled by implementing distributional assumptions on variable W i = (W i1 , W i2 ) . In this paper, we use a bivariate lognormal distribution and a copula function to model the random vector W i , where in both models, covariates x i are used for adjustment. We refer to the CARP models under these assumptions as the CARP-MLN and CARP-copula models, respectively. Here MLN is short for multivariate lognormal.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The CARP-MLN Model</head><p>For the multivariate lognormal distribution,</p><p>where the location parameter in the bivariate lognormal distribution is expressed as a linear form of covariates x i . That is,</p><p>In the linear expression above, &#956; 0 is a vector of baseline location parameters and B is a 2 &#8804; 2 coefficient matrix. In the bivariate lognormal assumption, we use a covariance matrix &#931; to capture the event dependence between events from two event types. Specifically, the diagonal elements in &#931; represent marginal variances while the off diagonal elements stand for the covariances. When using the bivariate lognormal distribution, the covariance matrix is defined as &#931; = CC to ensure &#931; to be positive definite, where</p><p>The correlation &#961; is determined by &#963; 2 and &#951; as</p><p>Note the above covariance matrix offers great flexibility to model different correlation relationships of varied size and direction. The sign of the &#951; value determines if the two types of events have a positive or negative correlation. In a bivariate lognormal distribution, the marginal distribution of each dimension also follows a lognormal distribution. Therefore, when the observed marginal distributions do not seem to follow the lognormal distributions, or their dependency cannot be characterized by the covariance matrix &#931;, the assumed bivariate lognormal distribution is not appropriate. In this case, the alternative strategy is to define W 1 and W 2 by separate distributions and combine them through a more flexible copula function. This is referred to as the CARP-copula model, which will be introduced in the next section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The CARP-copula Model</head><p>Let F 1 and F 2 denote the marginal cdfs of W 1 and W 2 . There always exists a copula function C such that the joint cdf of the two dimensional variable (W 1 , W 2 ) &#8242; can be written as</p><p>where for any unitary uniform variable U j , j = 1, 2, the bivariate copula is defined as</p><p>This is also known as the Sklar's Theorem. By using a copula function, we have the flexibility to choose marginal distributions separately for each event type. For instance, a Gamma distribution and a Weibull distribution can be used as marginal distributions for the two types of events, respectively. With selected F 1 and F 2 , one can combine the marginal distributions by using different copula functions.</p><p>Note the copula approach allows us to model the marginal distributions separately and then combine them through an appropriate copula function for modeling the dependence structure. For the geyser system, we choose to use parametric distributions for modeling the marginal distributions. However, the method can be easily generalized to using nonparametric methods for modeling the marginal distributions and hence offers great flexibility to be adapted for broad applications.</p><p>In the literature, a variety of copula functions has been introduced to capture different dependence patterns among the marginal distributions. A Gaussian copula uses a multivariate normal distribution of transformed marginal distributions based on the inverse cdf of the standard normal distribution. The Archimedean copulas are an associative class of copula functions that model the multivariate dependence through a single parameter. The power variance function copulas including Clayton, Gumbel and Inverse Gaussian are among the most popular ones that are flexible for modeling various dependence structures (e.g., <ref type="bibr">Romeo et al., 2018)</ref>. In this paper, we use the Gumbel copula function which is popular for modeling stronger dependence in the positive tail. We refer to the CARP model with the use of Gumbel copula as CARP-copula model for the rest of the paper. In fact, the CARP-MLN model is a special case of the CARP-copula model, where in CARP-MLN, the Gaussian copula is applied and lognormal marginal distributions are selected.</p><p>The CARP-MLN model characterizes the dependence among event types through the covariance matrix &#931;, while the CARP-copula model quantifies dependence using the copula parameter. Particularly, in the Gumbel copula model, the parameter is denoted as &#945;. As a result, in the CARP-MLN and CARP-copula models, we have different parameters to characterize event dependence. In order to compare dependence from different models, we introduce the Kendall's tau.</p><p>In the CARP-copula model, the Kendall's tau is expressed as &#964; = 1 1/&#945;, where &#945; is the copula coefficient in the Gumbel copula. In the CARP-MLN model, the Kendall's tau is calculated as &#964; = (2/&#960;) arcsin(&#951;/ &#963; 2 2 + &#951; 2 ), where &#951; and &#963; 2 can be found in (4). Similar to the CARP-MLN model, for the CARP-copula model, we also use a linear form of the covariates x i to represent location parameters in marginal distributions as in (3). In the CARP-MLN model, we use a two dimensional vector &#956;(</p><p>the location parameter of the lognormal distribution. While in the CARP-copula, &#956; 1 (x i ) and &#956; 2 (x i ) stand for the location parameters for the first and second marginal distributions, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Properties of CARP</head><p>For event gap time variable W i , we define the survival function (sf), cdf and hazard function as follows. We denote the joint sf of W i as,</p><p>The joint cdf is</p><p>and the corresponding joint probability density function (pdf) is denoted by f W (v). According to (2), the joint pdf of the event gap time variable given all historical events W i H t i is given by</p><p>where c j = t l ij ,j + W j , j = 1, 2. The denominator in (7) takes age condition into account, while the numerator builds the relationship among the event gap time W j , the event time t l ij ,j and the age a j (c j ). Covariates x i are used to adjust the gap time W i as discussed in Section 3.3.</p><p>For further discussions, let H t -be the event history up to time t. Note that for event type j, T N j (t -),j gives the most recent event time by time t, and the age variable prior to time t is denoted as A - j (t) = t T N j (t -),j , which calculates the cumulative running time upon time t since the last event. We use a -(t) = [a - 1 (t), a - 2 (t)] to denote the age vector for the two event processes prior to time t. Hence, prior to an event time t i , the age vector is denoted as a - i = a -(t i ). For example, at the initial time 0, the vector is a - 0 = (0, 0) . If the event type is &#948; 1 = 1 at t 1 , then a - 1 = (t 1 , t 1 ) . Note that a 1 updates the age to be zero for the corresponding event type at an event time, while a - 1 does not. This notation is used to derive the likelihood in Section 4, is also used to define the hazard function below.</p><p>In the literature, the sub-intensity function (i.e., cause-specific event intensity function) is often used to characterize an event process. In particular, the sub-intensity function for event type j is defined as</p><p>where T is the event time, and &#916; is the event type. The sub-cumulative intensity function is H j (t) = t 0 h j (s)ds. The hazard function and cumulative hazard function for the system are calculated as the sum of corresponding functions for the two event types,</p><p>The sub-intensity function in ( <ref type="formula">8</ref>) is calculated as</p><p>and v = ( v 1 , v 2 ) is a vector with two components. More detail about the calculation of D j ( v) under different models will be discussed in Section 4.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Parameter Estimation</head><p>The maximum likelihood (ML) approach is used to estimate model parameters. Parameters in the model include parameters in the joint distribution function, the copula function and the linear covariate transformation function. Given all the event history H &#964; , the likelihood function is constructed as follows:</p><p>where</p><p>Let &#952; denote the vector of all the parameters in the model. The estimated parameters &#952; are asymptotically normally distributed based on the large sample ML theory <ref type="bibr">(Casella and Berger, 2002)</ref>. The calculation of the likelihood L i for the proposed CARP models is shown below.</p><p>For any observed recurrent process with n total events, the likelihood contribution for i = 1, &#215; &#215; &#215;, n in (11) is given by</p><p>In ( <ref type="formula">12</ref>), the quantity D &#948; i (a - i ), which is introduced in (9), is the partial derivative of the bivariate sf S(a - i ). The covariates x i are used to adjust the distribution of W i . Likelihood calculations so far are the same for both the CARP-MLN and CARP-copula models. However, we need different ways to calculate D j (a - i ) for the two CARP models based on how the survival functions are calculated, which are detailed below.</p><p>For the CARP-MLN model, the sf is calculated in a closed form as discussed in ( <ref type="formula">6</ref>), and the partial derivative with regard to the jth event type can be written as</p><p>where f j [a - j (t i )] is the jth marginal density function from a bivariate lognormal distribution. In ( <ref type="formula">13</ref>), the conditional probability can be calculated from the conditional normal distribu-tion with a logarithm transformation. Calculation details can be found in Appendix A.1.</p><p>For the CARP-copula model, the likelihood function is calculated based on the relationship between the bivariate sf S(a - i ) and the cdf F (a - i ). In bivariate cases, the sf and cdf have the following relationship:</p><p>where the joint cdf can be calculated by the copula as in ( <ref type="formula">5</ref>). The calculated likelihood will vary for different choices of the marginal distribution and the copula function. In our case, we chose the Gumbel copula as described in Section 3.3. The ML estimates &#952; are obtained by maximizing the likelihood function in (10).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Simulation Study</head><p>This section describes the simulation study we conducted for evaluating the proposed method.</p><p>We simulated bivariate recurrent system data based on different models and parameter values.</p><p>The performance of the proposed method was then evaluated based on the simulated data.</p><p>Different parameter values were considered in the simulation for both the CARP-MLN and the CARP-copula models to understand the impact of the model parameters. We evaluated the model goodness of fit by calculating the average AIC and the mean squared error (MSE) of the estimated model parameters. The goal is to demonstrate the performance of the proposed method and the improvement by using the covariate adjustment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Simulation Setting</head><p>We used both the CARP-MLN and CARP-copula models to simulate the data. For each generated data set, both models were fitted to the data and results were evaluated. The model used to generate data is referred to as the true model, while the models used to fit the data are referred to as the fitted models. We explored changing the sample sizes (n), the Kendall's tau value, the scale parameters (&#963; j , j = 1, 2), and the linear transformation matrix (B) to understand their impacts on the analysis. First, to understand the impact of sample size, we varied the sample size at n =200, 500, 1000 and 2000 when generating the data using each model. For the CARP-copula model, the lognormal marginal distributions were used for both event types along with the Gumbel copula function. The location and scale parameters for lognormal marginal distributions in the true model were specified at (&#956; 1 = 1, &#963; 1 = 0.25) and (&#956; 2 = 1.5, &#963; 2 = 0.25). For the linear transformation matrix B, we used the 2 &#215; 2 matrix below</p><p>We chose to set the Gumbel copula parameter &#945; at 1.5. When generating the data based on the CARP-MLN model, we used the same location and scale parameters for the marginal lognormal distributions, and adjusted the correlation parameter &#951; to obtain the same Kendall's tau as in the CARP-copula model.</p><p>Second, to evaluate the effect of the Gumbel copula coefficient &#945;, we varied its value at 1, 1.12, 1.5 and 2.22 by changing the Kendall's tau parameter in the true models, while keeping all other parameters the same with the sample size fixed at n = 1000. Considering that the CARP-MLN and CARP-copula models quantify the Kendall's tau differently, we used &#951; at 0, 0.0443, 0.1445 and 0.299 for the CARP-MLN model, so that the Kendall's tau from both</p><p>CARPs are varied at 0, 0.11, 0.33 and 0.55.</p><p>We also explored the impact of the scale parameters by letting &#963; 1 and &#963; 2 vary at 0.35, 0.3, 0.25 to 0.20 for the marginal lognormal distributions of the CARP-Copula model while keeping sample size at n = 1000 and the Kendall's tau was kept at 0.33. Similarly for the CARP-MLN model, covariance matrices were adjusted to align with the marginal distributions used in the CARP-Copula model.</p><p>Lastly, the effect of covariate adjustment was studied by using different types of B. True models used both non-zero and zero B matrices. In the non-zero B case, we used the coefficient matrix in ( <ref type="formula">14</ref>) to generate data. In the zero B case, B = 0 was used. The Kendall's tau was set to be 0.33, while other parameters are the same as ones in the sample size case. A sample size of 1000 is used across true models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Simulation Results</head><p>The simulation results under different true models and fitted models are summarized in this section. For each true model, the model performance was summarized over 1000 simulated data sets. The simulation size was chosen to ensure a reliable analysis result while balancing the computing time needed to evaluate a broad number of scenarios. Further increasing the simulation size would not result in a meaningful change in the evaluated summary statistics and the drawn conclusions. Under each fitted model, the average AIC was computed. In order to evaluate the performance of parameter estimates, we calculated the MSE of the location parameter &#181;, scale parameter &#963; and the linear transformation parameter B for Table <ref type="table">2</ref> shows the average AIC of the fitted models summarized over the 1000 simulated data sets generated using both the CARP-copula and CARP-MLN models at different sample sizes. Both the CARP-MLN and CARP-copula models were applied to each simulated data set. We can see at each fixed sample size, the fitted CARP model that matches the true model used for simulating the data generally outperforms the other model by having a smaller average AIC value, except for the smallest size case at n = 200 where the difference in the average AIC values are extremely small. The advantage of using the true model becomes    Table <ref type="table">3</ref> shows the average AIC values of the fitted models when the Kendall's tau for capturing the dependence between two event types varies at the levels of 0, 0.11, 0.33 and 0.55. When the Kendall's tau is not zero which suggests some level of dependence between two marginal variables, using a fitted model that matches the true model for generating the data will result in a smaller average AIC value with a bigger improvement achieved when stronger dependence exists between the two types of events. When the Kendall's tau is zero indicating the marginal distributions are independent, the average AIC values are generally similar regardless which model is used to fit the data.</p><p>Table <ref type="table">4</ref> shows the comparison based on using different scale parameters in the true underlying model for both the CARP-copula and CARP-MLN models. Again, across all evaluated scale parameter values, we observe consistently smaller average AIC values when the fitted model matches the true model used for data generation.</p><p>Lastly, Table <ref type="table">5</ref> compares the results of using different coefficient matrix B. We simulated data with both non-zero and zero B using both the CARP-MLN and CARP-copula models.</p><p>For each simulated data, we fitted both the CARP-MLN and CARP-copula models with non-zero and zero B. Similar patterns can be observed. When the true model was used as the fitted model, it results in the smallest AIC value. When non-zero B was used to generate the data, the use of covariate adjustment led to substantial improvement in the average AIC of the fitted model compared to using zero B. On the other hand, when the data were generated with zero B, using the non-zero B fitted models produced similar AIC values as the zero B models. Therefore, the covariate adjustment is generally recommended due to its potential to substantially improving the model performance by leveraging the additional covariate information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Analysis of the Geyser Data</head><p>In this section, we present the analysis for the bivariate geyser system in the Yellowstone National Park using the proposed CARP models. Two adjacent geysers including the West Triplet and the Grotto Geyser are considered for our analysis. Geyser eruptions are highly related to underground water levels, which can be affected by a nearby geyser eruption. Also, it is believed that the eruption duration could affect the gap time until the next eruption. In particular, the longer the current eruption lasts, the longer it will take for the next eruption to occur. This is because a longer eruption usually indicates more water consumption during the eruption and hence a longer water gathering time is expected to reach the next eruption.</p><p>We use both the CARP-MLN and the CARP-copula models with lognormal marginal dis-  where the correlation estimation is calculated as &#951;/ &#8730; &#963; 2 2 + &#951; 2 = 0.05. This indicates a small negative correlation between the event gap times of the West Triplet and Grotto Geysers. In other words, a longer eruption gap time for West Triplet Geyser could be associated with a shorter time interval for the next eruption of the Grotto Geyser. The Kendall's tau provides  We can see a better agreement between the estimated and the observed cumulative intensity functions based on the CARP-copula model compared with the CARP-MLN model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Concluding Remarks</head><p>This paper introduces two CARP models, CARP-MLN and CARP-Copula, for modeling a bivariate recurrent process. With the covariate adjustment, the CARP models provide improvements over traditional models for capturing the interdependence of the bivariate event system. When the underlying data are close to a bivariate lognormal distribution, both models work similarly well. However, when the real data are inconsistent with a bivariate lognormal distribution, the CARP-copula model is recommended to improve the model performance.  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A.2 Confidence Interval for Kendall's tau</head><p>For lognormal cases, the Kendall's tau estimator can be expressed as</p><p>where the asymptotic distribution is known from the ML estimator. Using the Delta method,</p><p>where</p><p>For the Gumbel copula case, &#964; = 1 1 &#945; .</p></div></body>
		</text>
</TEI>
