<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Semi-parametric Learning of Structured Temporal Point Processes</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10225417</idno>
					<idno type="doi"></idno>
					<title level='j'>Journal of machine learning research</title>
<idno>1533-7928</idno>
<biblScope unit="volume">21</biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Ganggang Xu</author><author>Ming Wang</author><author>Jiangze Bian</author><author>Hui Huang</author><author>Timothy Burch</author><author>Sandro Andrade</author><author>Jingfei Zhang</author><author>Yongtao Guan</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[We propose a general framework of using a multi-level log-Gaussian Cox process to model repeatedly observed point processes with complex structures; such type of data has become increasingly available in various areas including medical research, social sciences, economics, and finance due to technological advances. A novel nonparametric approach is developed to efficiently and consistently estimate the covariance functions of the latent Gaussian processes at all levels. To predict the functional principal component scores, we propose a consistent estimation procedure by maximizing the conditional likelihood of super-positions of point processes. We further extend our procedure to the bivariate point process case in which potential correlations between the processes can be assessed. Asymptotic properties of the proposed estimators are investigated, and the effectiveness of our procedures is illustrated through a simulation study and an application to a stock trading dataset.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>With the recent advancement of data collection technologies, large-scale event time data recorded with high temporal resolutions have become increasingly available through various trading, social media, and retail platforms. Analyzing such data is important in understanding user behaviors. One major challenge with such data is that they often have large sample sizes and complex structures. As an example, we analyze a dataset that draws from stock trading transactions (recorded at the second level) by more than 300,000 Chinese trading accounts over approximately three years. Figure <ref type="figure">1</ref> shows the estimated overall trading intensity within a transaction day and the ratios of the estimated trading intensities at the account and day levels relative to the overall trading intensity, based on 1,000 sample accounts over 672 days. As shown in the figure, the trading intensities vary considerably within each day, between accounts, and across days. Our main interest is to develop a methodology to investigate the impacts of the different sources of variation for event time data that are repeatedly observed over time and from multiple accounts, as is the case in our example data. To that end, we view the event times from each account on a given day as a realization from a so-called log-Gaussian Cox process (LGCP, <ref type="bibr">Møller et al., 1998;</ref><ref type="bibr">Simpson et al., 2016)</ref>. A log-Gaussian Cox process is a Poisson process whose latent intensity function has a logarithm given by a Gaussian process. We decompose the log-latent intensity as the sum of three independent Gaussian processes, reflecting variations at the account, day, and account-day levels, respectively. 
To maintain modeling flexibility, we approximate these latent Gaussian processes nonparametrically using the Karhunen-Loève expansion <ref type="bibr">(Watanabe, 1965)</ref>, in which each process is characterized as a linear combination of orthogonal eigenfunctions (as illustrated in ( <ref type="formula">6</ref>)). This approach does not require parametric assumptions such as specific forms of the mean and the covariance function of the latent Gaussian process that are commonly used in existing literature (see <ref type="bibr">Yu et al., 2009;</ref><ref type="bibr">Simpson et al., 2016, etc.)</ref>. Instead, we impose some less restrictive moment and smoothness conditions on the latent Gaussian processes, which are outlined in detail in Section 4. In this sense, the proposed approach offers greater flexibility in modeling complex event time data.</p><p>Although the use of the Karhunen-Loève expansion <ref type="bibr">(Watanabe, 1965)</ref> in the current setting may appear straightforward, there are nontrivial challenges. In particular, one challenge is how to nonparametrically estimate the covariance function for each latent Gaussian process, based on which the eigenfunctions and associated principal component scores can be estimated. Instead of trying to estimate the covariance functions directly, we first propose four nonparametric estimators to estimate different aspects of the covariance structure, and then define the estimator for each covariance function as a ratio involving a subset of these four estimators. To predict the principal component scores, we develop a conditional likelihood estimation approach. Both procedures are novel compared to their counterparts in the existing literature (see Section 2 for details). Furthermore, we allow correlation among the random principal component scores over time. In contrast, most existing literature assumes the principal component scores to be independent. 
Although allowing this correlation creates complexity in studying the theoretical properties of the proposed estimators, this complexity is accommodated in our framework (see Section 4 for details).</p><p>Our proposed modeling framework is extremely flexible and can be readily extended to model event time data with even more complex structures. In our example data, each event is either a buy or sell transaction. Therefore, we model the observed event times from each account on a given day as a realization from a bivariate point process instead of a univariate point process. To understand the correlation between the buying and selling activities, we extend our proposed modeling framework to a bivariate log-Gaussian Cox process model (see Section 3 for details). In addition to the transaction types, the data contain information on the brokerage branch to which each account belongs. A brokerage branch is a geographic office at which an individual establishes a stock trading account, and its location is typically close to the account holder's home address. It may be interesting to study whether the trading activities from accounts in the same branch are similar, possibly due to branch-related factors such as local income levels, economic activity, news, social interaction, etc. To answer this question, we can introduce an additional Gaussian process when decomposing the log-latent intensity function to reflect the branch-level effect. In Appendix C, we illustrate how our proposed modeling framework can be easily generalized to this new setting.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Literature review</head><p>There has been a growing interest in using point process models to analyze event time data in the machine learning community (see <ref type="bibr">Zhao et al., 2015;</ref><ref type="bibr">Karimi et al., 2016;</ref><ref type="bibr">Farajtabar et al., 2016, 2017;</ref><ref type="bibr">Hosseini et al., 2017;</ref><ref type="bibr">Xiao et al., 2017;</ref><ref type="bibr">Xu et al., 2017;</ref><ref type="bibr">Upadhyay et al., 2018;</ref><ref type="bibr">Zarezade et al., 2018, etc.)</ref>. Most existing work is built upon parametric point process models such as the Hawkes process <ref type="bibr">(Hawkes and Oakes, 1974)</ref>. Several multivariate Hawkes process models have been proposed to jointly model sequences of observed events in continuous time (see <ref type="bibr">Farajtabar et al., 2016, 2017;</ref><ref type="bibr">Zarezade et al., 2018;</ref><ref type="bibr">Achab et al., 2018, etc.)</ref>. In principle, for our example data, the multivariate Hawkes process could be used to directly model potential interactions between the trading activities of different accounts. However, a direct application of the multivariate Hawkes process to our example data is difficult for at least two reasons. First, the multivariate Hawkes process depends on a set of background rate functions that characterize the overall rate of events for different accounts and days, which can be rather complex and heterogeneous, as suggested by Figure <ref type="figure">1</ref>. It can be challenging to estimate such a large number of background rate functions simultaneously. Second, it is unclear how one should define the triggering kernel across different accounts and days. The triggering kernel is typically a function of the distance between a given time point and a past event time. 
Because the stock market does not operate on a continuous-time schedule, it is not straightforward to define the distance between two time points from different days and possibly also from different weeks.</p><p>From a different perspective, our example data are replicated time series of point processes, in that the trading times of each account form a series of point processes and the different accounts constitute the replicates. Such data are different from those typically considered for Hawkes processes, and consequently require a different modeling approach. Nevertheless, for the stock trading data under consideration, one can still fit a univariate Hawkes process model to the combined temporal point process from each account (see Section 2.2 for more details). Our simulation study and the real data analysis demonstrate that such a univariate Hawkes process may fail to capture important aspects of the observed multi-level temporal point process data.</p><p>In contrast to the multivariate Hawkes process, the proposed method aims at modeling the correlations between trading activities from different accounts by identifying the contributing latent factors. For example, under our model (1), trading activities from different accounts on the same transaction day are correlated due to a common underlying day effect. The latent intensities used in our proposed modeling framework are random functions. Our approach, therefore, connects the literature on log-Gaussian Cox processes and multi-level functional data analysis, where the term "multi-level" refers to the decomposition of the log-latent intensity functions at the account, day, and account-day levels. 
Even though modeling repeatedly observed random functions is well studied in the functional data analysis literature (e.g., <ref type="bibr">Yao et al., 2005;</ref><ref type="bibr">Hall et al., 2006;</ref><ref type="bibr">Di et al., 2009;</ref><ref type="bibr">Yu et al., 2009;</ref><ref type="bibr">Li et al., 2010, 2013;</ref><ref type="bibr">Chen and Müller, 2012;</ref><ref type="bibr">Chen et al., 2017)</ref>, existing techniques in classical functional data analysis do not apply to our setting because the latent intensity functions are not observed.</p><p>A few recent works consider functional data analysis for point processes (e.g., <ref type="bibr">Bouzas et al., 2006;</ref><ref type="bibr">Illian et al., 2006;</ref><ref type="bibr">Wu et al., 2013;</ref><ref type="bibr">Gervini, 2016;</ref><ref type="bibr">Xu et al., 2020)</ref>. The modeling techniques used in these works, which range from the functional principal component analysis of stochastic density processes <ref type="bibr">(Wu et al., 2013)</ref> to the independent component analysis of latent intensity functions <ref type="bibr">(Gervini, 2016)</ref>, require independent replicates of the event times. However, as seen from Figure <ref type="figure">1</ref>, the event times from the same account or the same day may be correlated due to shared common trading patterns at the account or day levels. It is unclear whether the existing approaches can be generalized to this more complex setting. 
In contrast, by using multi-level log-Gaussian Cox processes, we can decompose the log-latent intensity functions of the point processes to meaningfully account for different sources of variation and to study these variations separately.</p><p>Lastly, we note that although there is much work on multivariate point processes (e.g., <ref type="bibr">Van Lieshout and Baddeley, 1999;</ref><ref type="bibr">Baddeley et al., 2000;</ref><ref type="bibr">Lemonnier and Vayatis, 2014;</ref><ref type="bibr">Jalilian et al., 2015;</ref><ref type="bibr">Bacry and Muzy, 2016;</ref><ref type="bibr">Farajtabar et al., 2016, 2017;</ref><ref type="bibr">Achab et al., 2018)</ref>, such work is sparse in functional settings with multiple levels. Our proposal, therefore, fills an existing gap in the literature on multivariate point processes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">Organization of the paper</head><p>The remainder of the article is organized as follows. Section 2 introduces the proposed procedure for univariate log-Gaussian Cox processes, which includes the model formulation, estimation, and principal component score prediction. Section 3 extends the procedure to bivariate log-Gaussian Cox processes. Section 4 studies the theoretical properties of the proposed estimators. Specifically, we allow autocorrelations within the random principal component scores in our theoretical framework. Section 5 demonstrates the efficacy of the proposed methods through a simulation study. Section 6 applies the proposed method to the stock trading dataset. All proofs are contained in the Supplementary Material. Throughout the rest of the article, we refer to our proposed approach as the multi-level functional principal component analysis (MFPCA).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">MFPCA for Univariate log-Gaussian Cox processes</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Model formulation</head><p>To illustrate our model formulation, we describe our model specifications in the context of the stock trading data introduced in Section 1. However, the proposed framework is general and can be used for other applications. One such example is datasets that collect time-stamps of user/consumer activities (e.g., liking or sharing a social media post, making a purchase, posting a review of a service product, etc.) for a large group of users over a certain period. Moreover, a post can be either an original post created by the user or a repost of another user's post, just like each trading event can be either a buying or selling event.</p><p>Our proposed method can be applied to investigate the sources of variations in the activity of social media users, and such analyses can provide useful insights into user/consumer activity patterns, which in turn can be used as features in downstream business strategy development such as advertisement placement and content recommendations.</p><p>Suppose that we observe daily trading times from n accounts during m days. The trading times observed from account i on day j can be written as N ij = {T ij,l , l = 1, . . . , #{N ij }}, where 0 ≤ T ij,1 &lt; T ij,2 &lt; · · · &lt; T ij,#{N ij } ≤ T for some given T &gt; 0, and #{N ij } is a nonnegative integer-valued random variable, i = 1, . . . , n, j = 1, . . . , m. We assume that each N ij is generated from an inhomogeneous Poisson process conditional on a latent intensity function λ ij (t), t ∈ [0, T ], where</p><p>λ ij (t) = λ 0 (t) exp{X i (t) + Y j (t) + Z ij (t)}. (1)</p><p>In (1), λ 0 (t) is a deterministic baseline function representing the average daily trading pattern, and X i (t), Y j (t) and Z ij (t) are Gaussian processes characterizing deviations from the baseline for the ith account, the jth day, and the residual deviation, respectively. 
Model (1) assumes that stock trading activities are affected by both account and day related factors.</p><p>The latent process Z ij (t) reflects that different individuals may adjust their trading behaviors differently in reaction to market conditions or news on a given day. We assume that at any given time t, X i (t), Y j (t) and Z ij (t) are independent normal random variables with mean 0 and bounded variances σ 2 X (t), σ 2 Y (t) and σ 2 Z (t). We further assume that X i (t) and X i′ (t) are independent for i ≠ i′, and that Z ij and Z i′j′ are independent for (i, j) ≠ (i′, j′). However, Y j (t) and Y j′ (t) can be correlated even if j ≠ j′; this enables us to account for potential temporal dependence over consecutive transaction days.</p><p>The motivation for model (1) can be better understood in the context of our example stock trading data. An investor's trading activity on a given day is jointly impacted by various factors, which include personal characteristics (e.g., socioeconomic status) and the overall stock market condition that is affected by day-related events such as macroeconomic conditions, market-wide news, government policy interventions, etc. An important goal of our modeling approach is to extract account-related contributions to trading activity while controlling for the impacts of day-related factors. Such extraction can help a researcher investigate potential differences in trader behavior. Using model (1), we can accomplish this goal by studying the differences among the fitted X i (·)'s for i = 1, · · · , n. On the other hand, exploiting the variations at the day level could help a researcher understand how evolving macroeconomic conditions, government interventions, etc., affect an investor's trading behavior. This can be accomplished by studying how the Y j (·)'s evolve over time as j increases from 1 to m. Section 6 contains more detail on these issues.</p></div>
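To make the generative structure of model (1) concrete, the following sketch simulates multi-level event times on a grid: one latent curve per account, one per day, and one per account-day pair, with events drawn from the resulting Poisson intensity. The sine/cosine stand-in for the latent Gaussian processes and all constants are illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)
T, G = 1.0, 500                       # trading window [0, T] on a grid of G cells
t = np.linspace(0.0, T, G, endpoint=False)
dt = T / G

def smooth_gp(sd):
    # Stand-in for a smooth mean-zero Gaussian process: a random
    # combination of two fixed basis functions (hypothetical choice).
    a, b = rng.normal(0.0, sd, size=2)
    return a * np.sin(2 * np.pi * t) + b * np.cos(2 * np.pi * t)

n, m = 4, 3                           # accounts and days (tiny for illustration)
log_lam0 = np.log(50.0) * np.ones(G)  # baseline: roughly 50 events per account-day
X = [smooth_gp(0.4) for _ in range(n)]            # account-level processes X_i
Y = [smooth_gp(0.4) for _ in range(m)]            # day-level processes Y_j

events = {}
for i in range(n):
    for j in range(m):
        Z = smooth_gp(0.2)                        # account-day residual Z_ij
        lam = np.exp(log_lam0 + X[i] + Y[j] + Z)  # latent intensity of model (1)
        counts = rng.poisson(lam * dt)            # Poisson counts per grid cell
        # place each event uniformly within its grid cell, then sort
        events[i, j] = np.sort(np.repeat(t, counts) +
                               rng.uniform(0.0, dt, counts.sum()))
```

Because all account-day pairs sharing an index i reuse the same curve X[i] (and likewise for Y[j]), the simulated processes exhibit exactly the within-account and within-day correlation the model is designed to capture.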
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Benchmark models</head><p>To the best of our knowledge, there is no existing method that directly models multi-level temporal point process data as described in Section 2.1. For the stock trading data, one simple strategy is to create a single-level process for each account i by combining N ij , j = 1, · · · , m, sequentially, as illustrated in Figure <ref type="figure">2</ref>. As a result, the combined process, denoted as N c i , becomes a temporal point process defined on the extended time interval [0, mT ], and can be modeled using existing methods. The first model we consider is the Hawkes process. Denote N c i (B) as the number of events in B ⊂ R from account i, and define the corresponding intensity function as</p><p>where H i,t is the event time history up to time t for account i, i = 1, · · · , n. The intensity function of a Hawkes process takes the form</p><p>where µ c i (·) &gt; 0 is the background intensity and ω i (·) is the transfer function, for i = 1, · · · , n. The transfer functions are typically assumed to be nonnegative, suggesting that past events increase the chance of occurrence for future events, commonly known as the "self-exciting" property. For the stock trading data, it is reasonable to assume that the background intensities are the same for a given account on different days, and they can be approximated by cyclic cubic B-splines as follows:</p><p>where the B k (·)'s are cyclic B-spline basis functions defined on [0, 1], and ⌊t⌋ is the largest integer less than or equal to t. 
For the transfer function, we use the popular exponential triggering kernel</p><p>All parameters can be estimated using standard maximum likelihood estimation <ref type="bibr">(Rizoiu et al., 2017)</ref>.</p><p>The second benchmark model we consider is the log-Gaussian Cox process with a parametric covariance function (P-LGCP). Specifically, we assume that for each N c i , there exists a latent Gaussian process X c i (t) with mean 0 and an exponential covariance function</p><p>where the background intensities λ c i (·)'s are also approximated by cyclic B-spline basis functions as suggested in (3). All parameters in model ( <ref type="formula">4</ref>) can be obtained by the two-step estimation procedure proposed in <ref type="bibr">Waagepetersen and Guan (2009)</ref>.</p><p>A closer look at Figure <ref type="figure">2</ref> suggests some potential drawbacks of models ( <ref type="formula">2</ref>) and (4). For each N c i , the latent process X i (·) appears repeatedly for m days, and hence its impacts can be incorporated into the nonparametric periodic background intensity function (3) for both models (2) and (4). Therefore, variations at the account level can still be captured to a certain degree by modeling the N c i 's separately using the benchmark models. However, such a strategy is not suitable for evaluating the impacts of the Y j (·)'s. Specifically, for the Hawkes process model (2), the self-exciting property may not completely characterize the clustering patterns because the event occurrences are also triggered by the day-related external factors Y j (·)'s, which impact all accounts on any given day j. The same issue exists with the P-LGCP model (4). In our simulation study and real data analysis, both benchmark models fail to capture the variations arising from the Y j (·)'s. 
Our simulation study further suggests that they also yield worse predictions of the intensities λ ij (·)'s when the observed point process data are generated from the multi-level model (1). See Sections 5.2 and 6.4 for more details.</p></div>
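For concreteness, the self-exciting mechanism of the Hawkes benchmark (2) with the exponential triggering kernel can be simulated by Ogata's thinning algorithm. The sketch below assumes a constant background rate in place of the cyclic B-spline background of (3), and the parameter values are arbitrary.

```python
import numpy as np

def intensity(s, events, mu, alpha, beta):
    # Conditional intensity mu + sum over past events t_k < s of
    # alpha * exp(-beta * (s - t_k)).
    past = events[events < s]
    return mu + alpha * np.exp(-beta * (s - past)).sum()

def simulate_hawkes(mu, alpha, beta, T, rng):
    # Ogata's thinning: between candidate points the intensity only decays,
    # so the current intensity (plus the jump alpha of a just-accepted
    # event) is a valid dominating rate for the next proposal.
    events = np.array([])
    s = 0.0
    while True:
        lam_bar = intensity(s, events, mu, alpha, beta) + alpha
        s += rng.exponential(1.0 / lam_bar)
        if s > T:
            return events
        if rng.uniform() * lam_bar <= intensity(s, events, mu, alpha, beta):
            events = np.append(events, s)

rng = np.random.default_rng(1)
# alpha < beta keeps the branching ratio alpha/beta below 1 (stability)
ev = simulate_hawkes(mu=5.0, alpha=0.8, beta=2.0, T=10.0, rng=rng)
```

Note what this benchmark cannot do: the clustering it produces is driven entirely by past events of the same account, with no shared day-level factor across accounts, which is exactly the limitation discussed above.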
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Multi-level functional principal component analysis (MFPCA)</head><p>In this subsection, we describe the proposed estimation framework for model (1). Following the formula of the moment generating function of a Gaussian random variable, the marginal intensity function can be calculated as</p><p>ρ(t) = E[λ ij (t)] = λ 0 (t) exp{[σ 2 X (t) + σ 2 Y (t) + σ 2 Z (t)]/2}, (5)</p><p>where the expectation is taken jointly over X i (t), Y j (t) and Z ij (t). Let R X (s, t) = Cov[X i (s), X i (t)] be the covariance function of X i (t), and define R Y (s, t) and R Z (s, t) analogously. Suppose that R X (s, t) has the spectral decomposition R X (s, t) = Σ ∞ k=1 η X k φ X k (s)φ X k (t), and similarly for R Y (s, t) and R Z (s, t), where the η X k 's, η Y k 's and η Z k 's are the eigenvalues and the φ X k 's, φ Y k 's and φ Z k 's are the corresponding orthogonal eigenfunctions. Using the Karhunen-Loève expansion <ref type="bibr">(Watanabe, 1965)</ref>, we can express the random functions X i (t), Y j (t) and Z ij (t), i = 1, . . . , n, j = 1, . . . , m, as</p><p>X i (t) = Σ ∞ k=1 ξ X ik φ X k (t), Y j (t) = Σ ∞ k=1 ξ Y jk φ Y k (t), Z ij (t) = Σ ∞ k=1 ξ Z ijk φ Z k (t), (6)</p><p>where ξ X ik , ξ Y jk , and ξ Z ijk are normal random variables with mean 0 and variances η X k , η Y k and η Z k . The expressions in (6) involve infinitely many parameters and are infeasible to estimate in practice. Instead, a common approach is to approximate (6) by using the leading principal components, i.e., X i (t) ≈ Σ p X k=1 ξ X ik φ X k (t), Y j (t) ≈ Σ p Y k=1 ξ Y jk φ Y k (t) and Z ij (t) ≈ Σ p Z k=1 ξ Z ijk φ Z k (t), (7)</p><p>where p X , p Y and p Z are sufficiently large to include the dominant modes of variation, and the notation '≈' indicates the potential truncation error from using only finite p X , p Y and p Z . We write ξ X i = (ξ X i1 , . . . , ξ X ip X )⊤, ξ Y j = (ξ Y j1 , . . . , ξ Y jp Y )⊤ and ξ Z ij = (ξ Z ij1 , . . . , ξ Z ijp Z )⊤, i = 1, . . . , n, j = 1, . . . , m, and refer to them as the functional principal component scores.</p><p>In our model, we assume that the functional principal component scores at the account, day and account-day levels are independent of each other, i.e., ξ X i , ξ Y j and ξ Z ij are uncorrelated. We assume that ξ X ik and ξ X i′k′ are independent if (i, k) ≠ (i′, k′), that ξ Y jk and ξ Y j′k′ are independent if k ≠ k′, and that ξ Z ijk and ξ Z i′j′k′ are independent if (i, j, k) ≠ (i′, j′, k′). 
However, ξ Y jk and ξ Y j′k can be correlated, i.e., the principal component scores at the day level can be autocorrelated. For example, the ξ Y jk 's may follow an autoregressive correlation structure, i.e., ξ Y jk = Σ q Y l=1 θ lk ξ Y (j−l)k + ε jk , for some positive integer q Y , where the ε jk 's are independent random errors.</p></div>
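A short numerical sketch of the truncated expansion and the autoregressive day-level scores described above: the Fourier eigenfunctions, eigenvalues, and AR(1) coefficient below are hypothetical, and the innovation variance is scaled so that the stationary variance of each score equals its eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p_Y, G = 300, 2, 100          # days, truncation level, grid size
t = np.linspace(0.0, 1.0, G)
# Hypothetical orthonormal eigenfunctions on [0, 1].
phi = np.stack([np.sqrt(2) * np.sin((k + 1) * np.pi * t) for k in range(p_Y)])
eta = np.array([1.0, 0.25])      # eigenvalues = stationary variances of the scores
theta = 0.5                      # AR(1) coefficient, shared across components

xi = np.zeros((m, p_Y))
xi[0] = rng.normal(0.0, np.sqrt(eta))
for j in range(1, m):
    # xi_{jk} = theta * xi_{(j-1)k} + eps_{jk}; innovations scaled so that
    # Var(xi_{jk}) = eta_k in the stationary regime.
    xi[j] = theta * xi[j - 1] + rng.normal(0.0, np.sqrt(eta * (1 - theta ** 2)))

Y = xi @ phi                     # truncated Karhunen-Loeve reconstruction, (m, G)
```

Each row of `Y` is one day-level curve Y j (·); consecutive rows are correlated only through the autocorrelation of the scores, which is the structure the theory in Section 4 accommodates.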
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Covariance estimation</head><p>To estimate the eigenvalues and eigenfunctions at the three levels, we first need to estimate the covariance functions R X (s, t), R Y (s, t) and R Z (s, t). Existing methods (see, e.g., <ref type="bibr">Di et al., 2009)</ref> cannot be used to estimate these functions since the intensity functions are unobserved (i.e., latent functions).</p><p>The marginal second-order intensity ρ ij,i′j′,2 (s, t) = E[λ ij (s)λ i′j′ (t)] can be derived in closed form for i, i′ = 1, . . . , n, j, j′ = 1, . . . , m. To ease presentation, we first focus on the case where Y j (s) and Y j′ (t) are independent if j ≠ j′. We point out that our method is valid even when this assumption does not hold; as a matter of fact, our theoretical investigation is carried out without this independence assumption. Under this assumption, it follows from (5) and the moment generating function of a Gaussian random variable that ρ ij,i′j′,2 (s, t) takes one of four forms depending on whether the account and day indices coincide: A(s, t) = D(s, t) exp[R X (s, t) + R Y (s, t) + R Z (s, t)] when i = i′ and j = j′, B(s, t) = D(s, t) exp[R X (s, t)] when i = i′ and j ≠ j′, C(s, t) = D(s, t) exp[R Y (s, t)] when i ≠ i′ and j = j′, and D(s, t) ≡ ρ(s)ρ(t) when i ≠ i′ and j ≠ j′. Note that A(s, t), B(s, t) and C(s, t) contain information about the correlation at the account-day, account and day levels, respectively. The covariance functions R X (s, t), R Y (s, t) and R Z (s, t) can be easily calculated once we obtain A(s, t), B(s, t), C(s, t) and D(s, t); for example, R X (s, t) = log[B(s, t)/D(s, t)]. Next, we develop estimators for these functions, which are later used to derive estimators for the covariance functions.</p><p>We first consider the estimation of ρ ij,i′j′,2 (s, t). By Campbell's theorem (e.g., <ref type="bibr">Møller and Waagepetersen, 2003)</ref>, for any measurable function f(s, t), the expectation of the sum of f over pairs of event times from N ij and N i′j′ equals the double integral of f weighted by ρ ij,i′j′,2 (s, t), where the expectation is over the point processes N ij and N i′j′ . Let K(·) be a kernel function and K h (t) = h −1 K(t/h). We can then estimate ρ ij,i′j′,2 (s, t) under the four different scenarios in (9), in which the integral term serves as an edge correction. To understand the construction of these estimators, consider b A h (s, t) as an example. 
By defining b A h (s, t) as the kernel-weighted average over pairs of event times in (9), we obtain the following: if the bivariate function A(·, ·) is smooth and h is sufficiently small such that A(u, v) ≈ A(s, t) for any (u, v) in the small neighborhood around (s, t) defined by the kernel weights, then b A h (s, t) is an approximately unbiased estimator of A(s, t), for s, t ∈ [0, T ]. As suggested by Theorem 1 in Section 4.1, one may need to use different bandwidths h x , h y and h z for estimating R X (s, t), R Y (s, t) and R Z (s, t) when n and m are of different magnitudes.</p><p>From the estimated covariance functions, the eigenvalues and eigenfunctions can be estimated by solving the eigen-equations in (11), using methods proposed in <ref type="bibr">Yao et al. (2005)</ref>, where the estimated eigenfunctions are subject to orthonormality constraints; the eigenvalues and eigenfunctions at the Y and Z levels can be estimated similarly. The validity of solving (11) for eigenvalues and eigenfunctions is ensured by the uniform convergence of the covariance function estimator b R W,h (s, t) to R W (s, t) for W = X, Y, or Z under mild conditions, as given by Theorem 1 in Section 4.1. We refer the reader to the proof of Theorem 2 in <ref type="bibr">Yao et al. (2005)</ref> for more technical details.</p><p>Remark 1 The computational cost of the estimators given in (10) is of order O(nm) or, more precisely, of the order of the total number of events in all N ij 's. We use b A h (s, t) as an example to illustrate this point. Note that the double summation in b A h (s, t) can be rewritten in terms of three single summations over the events in each N ij . The computational cost to calculate each of these single summations is of the order of the number of events in N ij . Thus, the computational cost to obtain b A h (s, t) is of the order of the total number of events in all N ij 's, which is of order O(nm). The same is true for the other three terms. Therefore, the overall computational complexity of (10) is of order O(nm), which is confirmed by our simulation studies in Section 5.1. See Figure <ref type="figure">5</ref> for more details.</p></div>
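On a grid, solving (11) amounts to a quadrature-weighted matrix eigenproblem. The sketch below substitutes a known rank-one covariance R(s, t) = φ(s)φ(t), with φ(t) = √2 sin(πt) and eigenvalue 1, for the kernel estimator, so that the recovered eigenvalue and eigenfunction can be checked directly; the grid size and quadrature rule are illustrative choices.

```python
import numpy as np

G, T = 400, 1.0
t = np.linspace(0.0, T, G)
dt = t[1] - t[0]
phi_true = np.sqrt(2.0) * np.sin(np.pi * t)      # L2-normalized on [0, 1]
R = np.outer(phi_true, phi_true)                 # toy covariance, eigenvalue 1

# Discretize the integral equation: (R dt) v = eta v approximates
# the eigen-equation  integral of R(s, t) phi(t) dt = eta phi(s);
# rescaling by sqrt(dt) makes the eigenvectors integrate-square to 1.
w, v = np.linalg.eigh(R * dt)
order = np.argsort(w)[::-1]                      # largest eigenvalue first
eta_hat = w[order]
phi_hat = v[:, order].T / np.sqrt(dt)
phi_hat[0] *= np.sign(phi_hat[0] @ phi_true)     # resolve the sign ambiguity
```

The same routine applied to the kernel-smoothed surfaces b R X,h , b R Y,h and b R Z,h (with level-specific bandwidths) would yield the estimated eigenpairs used throughout the rest of the procedure.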
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5">Predicting functional principal component scores</head><p>Having estimated the eigenvalues and eigenfunctions, we next predict the functional principal component scores. Markov chain Monte Carlo based approaches have been proposed for predicting principal component scores <ref type="bibr">(Di et al., 2009)</ref>. However, such approaches are not applicable in our setting because our model cannot be formulated as a linear mixed model (see <ref type="bibr">Di et al., 2009)</ref>. In this section, we develop a new procedure based on the conditional likelihood. We focus on predicting the principal component scores at the account and day levels, i.e., the ξ X i 's and ξ Y j 's; an approach to estimate the ξ Z ij 's is given in Appendix B. To predict the functional principal component scores for X i (t), define the aggregated processes N X i and N X −i . In the stock trading data, N X i and N X −i are the aggregated trading times over m days for account i and for all accounts excluding account i, respectively. Conditional on the latent function X i (·), the marginal intensity functions of N X i and N X −i can be written in closed form, where λ ij (t) is as defined in (1), the first expectation is taken over the Y j (t)'s and Z ij (t)'s, and the second expectation is taken further over the X i′ (t)'s with i′ ≠ i. Using the approximation of X i (t) in (7) with the estimated eigenfunctions, and conditional on the presence of an event at t, the probability for this event to be from account i can be computed, where the b φ X k (t)'s are obtained from solving (11) and b σ 2 X (t) = b R X,h (t, t) is obtained from (10). We then predict ξ X i by maximizing the resulting conditional likelihood.</p><p>Similarly, the functional principal component scores for Y j (t) can be obtained by maximizing the analogous conditional likelihood, where N Y j and N Y −j are the aggregated trading times over the n accounts on day j and for all days excluding day j, respectively.</p></div>
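The conditional-likelihood prediction can be viewed as a logistic-type problem: conditional on an event at time t, the chance that it came from account i depends on the account's scores through the truncated expansion. The eigenfunctions, eigenvalues, event times, and the particular form p i (t) = w(t)/[w(t) + n − 1] with w(t) = exp{Σ k ξ k φ k (t) − σ² X (t)/2} used below are all illustrative assumptions, sketching the maximization (here by plain gradient descent) rather than reproducing the paper's exact expressions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 2                                  # number of accounts, truncation level

def phi(t):
    # Hypothetical estimated eigenfunctions on [0, 1]; returns shape (len(t), p).
    return np.stack([np.sqrt(2) * np.sin(np.pi * t),
                     np.sqrt(2) * np.cos(np.pi * t)], axis=1)

eta = np.array([1.0, 0.3])                    # hypothetical eigenvalues
def sigma2(t):                                # variance sum_k eta_k phi_k(t)^2
    return (phi(t) ** 2) @ eta

ev_own = rng.uniform(0, 1, 60)                # synthetic events from account i
ev_oth = rng.uniform(0, 1, 600)               # synthetic events from other accounts

def nll(xi):
    # Negative conditional log-likelihood: each event at t belongs to account i
    # with probability w(t)/(w(t) + n - 1), w(t) = exp(phi(t) @ xi - sigma2(t)/2).
    lw_own = phi(ev_own) @ xi - sigma2(ev_own) / 2
    lw_oth = phi(ev_oth) @ xi - sigma2(ev_oth) / 2
    return (np.log1p((n - 1) * np.exp(-lw_own)).sum()
            + (np.log((n - 1) + np.exp(lw_oth)) - np.log(n - 1)).sum())

xi = np.zeros(p)                              # predict scores by gradient descent
for _ in range(500):
    p_own = 1.0 / (1.0 + (n - 1) * np.exp(-(phi(ev_own) @ xi - sigma2(ev_own) / 2)))
    p_oth = 1.0 / (1.0 + (n - 1) * np.exp(-(phi(ev_oth) @ xi - sigma2(ev_oth) / 2)))
    xi -= 0.01 * (-(1 - p_own) @ phi(ev_own) + p_oth @ phi(ev_oth))
```

Because the synthetic account contributes a larger share of events than the 1/n prior implies, the fitted scores shift to raise that account's modeled intensity, mirroring how the conditional likelihood attributes events.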
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.6">Goodness-of-fit measure</head><p>In this section, we propose a goodness-of-fit measure for model (1). To demonstrate the main idea, we consider the aggregated trading times over m days for the different accounts, denoted as N X 1 , · · · , N X n , as defined in Section 2.5. Let 0 = r 0 &lt; r 1 &lt; · · · &lt; r L = T be a sequence of time points that form a partition of the interval [0, T ], and denote by #{N X i }(r l−1 , r l ) the total number of events from N X i that fall within the sub-interval (r l−1 , r l ], l = 1, · · · , L. Focusing on the sub-interval (r l−1 , r l ], the proportion of observed events in ∪ n i=1 N X i (r l−1 , r l ) that are from the ith account, i.e., from N X i (r l−1 , r l ), can be estimated by the corresponding sample proportion, b p i (r l−1 , r l ). When the number of days m is large, b p i (r l−1 , r l ) under model (1) is a good estimator for the model-implied proportion p i (r l−1 , r l ), where the ρ X i [t|X i (·)]'s are as defined in (12). Note that the ρ X i [t|X i (·)]'s are linked to the marginal intensity ρ(t) through (5) and (12), i = 1, . . . , n, where ρ(t) is defined in (5). Using the estimated model (1) with a bandwidth h, we can then approximate the p i (r l−1 , r l )'s by model-based proportions e p i (r l−1 , r l ; h). Note that we have deliberately emphasized the dependence of e p i (r l−1 , r l ; h) on the bandwidth h. The goodness-of-fit of model (1) at the X level within the sub-interval (r l−1 , r l ] can be measured by the discrepancy between the two discrete probability distributions {b p i (r l−1 , r l )} n i=1 and {e p i (r l−1 , r l ; h)} n i=1 . We propose the following goodness-of-fit measure based on the Kullback-Leibler divergence: D X (r l ; h) = Σ n i=1 b p i (r l−1 , r l ) log[b p i (r l−1 , r l )/e p i (r l−1 , r l ; h)], for l = 1, · · · , L. In practice, the function D X (r; h) is a step function on [0, T ] that can be used as a graphical tool to check whether model (1) with the bandwidth h fits the data adequately well at the X level. 
A goodness-of-fit measure at the Y level, denoted as D Y (r; h), can be defined in the same way using the data aggregated over the n accounts. Finally, our numerical studies suggest that the choice of the partition 0 = r 0 &lt; r 1 &lt; · · · &lt; r L = T is of limited importance, and we recommend using equal quantiles of the pooled event times ∪ n i=1 N X i with L = 20. Such a choice ensures that 5% of the observed event times fall within each sub-interval (r l-1 , r l ] for l = 1, · · · , L.</p></div>
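The recipe above translates into a short computation. The sketch below builds the quantile partition, forms the empirical proportions, and returns the Kullback-Leibler step function; `model_probs` stands in for the model-based proportions p̃ i (r l-1 , r l ; h), which would come from the fitted model (a hypothetical plug-in here, as are the function names).

```python
import numpy as np

def gof_kl(event_times_by_account, model_probs, L=20):
    """KL-based goodness-of-fit step function D_X(r; h) (sketch).
    Partition [0, T] at equal quantiles of the pooled event times
    (L = 20 puts 5% of events in each sub-interval), compute the
    empirical proportion of events in each sub-interval coming from
    each account, and compare it with model_probs[l] (one probability
    vector over accounts per sub-interval)."""
    pooled = np.sort(np.concatenate(event_times_by_account))
    edges = np.quantile(pooled, np.linspace(0.0, 1.0, L + 1))
    edges[0], edges[-1] = -np.inf, np.inf      # guard r_0 = 0 and r_L = T
    D = np.zeros(L)
    for l in range(L):
        counts = np.array([np.sum((t > edges[l]) & (t <= edges[l + 1]))
                           for t in event_times_by_account], float)
        p_hat = counts / counts.sum()          # empirical proportions
        p_mod = np.asarray(model_probs[l], float)
        mask = p_hat > 0                       # 0 * log(0 / q) = 0 convention
        D[l] = np.sum(p_hat[mask] * np.log(p_hat[mask] / p_mod[mask]))
    return D
```

A large value of `D[l]` flags a sub-interval where the fitted model misallocates events across accounts; a perfectly matched model gives zero.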
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.7">Bandwidth selection</head><p>Next, we describe a procedure to select the bandwidth in (9) based on the goodness-of-fit measure proposed in Section 2.6. As suggested by Theorem 1, the optimal bandwidths at the X, Y, and Z levels might be different, and they depend on n and m. As such, our proposed method chooses different bandwidths for R X (·, ·), R Y (·, ·) and R Z (·, ·), respectively. We first consider the bandwidth selection for R X (·, ·). Randomly partition the observed days into K disjoint subsets S 1 , . . . , S K .</p><p>The main idea is to use the data in S -k = ∪ k′≠k S k′ to fit model (1) and then measure its goodness-of-fit to the hold-out data in S k using the approach proposed in Section 2.6. The cross-validation score can then be defined as</p><p>where P̂ X,S k l and P̃ X,S k l,h are as defined in (18), using the data included in S k and S -k , respectively.</p><p>The optimal bandwidth ĥx is then chosen by minimizing CV X (h). Following the same procedure, we can define the cross-validation score CV Y (h) using the data aggregated over all accounts, N Y 1 , · · · , N Y m , as defined in Section 2.5, and choose the optimal ĥy accordingly. The bandwidth selection for R Z (·, ·) cannot be carried out in the same way since we cannot predict the ξ Z ij 's consistently. Instead, we simply choose the ĥz for R Z (·, ·) as the one that provides good fits at both the X and Y levels by minimizing the summed cross-validation score</p><p>Our simulation studies suggest that such a choice yields satisfactory performance.</p></div>
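The K-fold scheme above can be sketched generically. Here `fit_model` and `gof_score` are hypothetical stand-ins for fitting model (1) with bandwidth h on the training folds and evaluating the Section 2.6 goodness-of-fit on the held-out fold; the helper name is also an assumption.

```python
import numpy as np

def select_bandwidth(days, bandwidths, fit_model, gof_score, K=5, rng=None):
    """K-fold cross-validation over a bandwidth grid (sketch).
    fit_model(train_days, h) -> fitted object;
    gof_score(model, test_days) -> scalar goodness-of-fit score.
    Returns the bandwidth minimizing the averaged CV score, and the scores."""
    rng = rng or np.random.default_rng(0)
    folds = np.array_split(rng.permutation(len(days)), K)
    cv = []
    for h in bandwidths:
        score = 0.0
        for k in range(K):
            test = [days[i] for i in folds[k]]
            # S_{-k}: all folds except the k-th one
            train = [days[i] for k2 in range(K) if k2 != k for i in folds[k2]]
            score += gof_score(fit_model(train, h), test)
        cv.append(score / K)
    return bandwidths[int(np.argmin(cv))], np.array(cv)
```

The summed score for ĥz is obtained the same way by passing a `gof_score` that adds the X-level and Y-level criteria.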
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">MFPCA for Bivariate log-Gaussian Cox processes</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Model formulation</head><p>In the stock trading data, we observe two types of events from each account, i.e., buy and sell trades. Therefore, we can view the observed event times as realizations from bivariate point processes. By modeling the underlying bivariate point processes, we may gain insight into the temporal covariation of accounts' buying and selling activities.</p><p>Let N 1,ij and N 2,ij denote the buying and selling times observed from account i on day j, i = 1, . . . , n and j = 1, . . . , m. As in Section 2, we assume that N r,ij is generated from an inhomogeneous Poisson process conditional on a latent intensity function λ r,ij (t), where</p><p>Similar to the univariate case, λ r,0 (t) is a fixed baseline function representing the average daily pattern, X r,i (t) and Y r,j (t) are random functions reflecting deviations from the baseline at the account and day levels, respectively, and Z r,ij (t) is the residual deviation, r = 1, 2.</p><p>We assume that at any given t, X r,i (t), Y r,j (t) and Z r,ij (t) are independent Gaussian random variables with mean 0 and variances σ 2 r,X (t), σ 2 r,Y (t) and σ 2 r,Z (t), r = 1, 2. Similar to the development in the univariate case, we can write X r,i (t), Y r,j (t) and Z r,ij (t) using the Karhunen-Loève expansion as:</p><p>where p (r) X , p (r) Y and p (r) Z are sufficiently large positive integers, the ξ X r,ik , ξ Y r,jk and ξ Z r,ijk are normal random variables with mean 0 and variances η X r,k , η Y r,k and η Z r,k , and the φ X r,k (t), φ Y r,k (t) and φ Z r,k (t) are orthogonal eigenfunctions. Define ξ X r,i = (ξ X r,i1 , . . . , ξ X r,ip (r) X ) T , r = 1, 2. At each r, we assume the same correlation structure within and between ξ X r,i , ξ Y r,j and ξ Z r,ij as in the univariate case. 
To introduce dependence between the two point processes, we further assume that (ξ X 1,i , ξ X 2,i ), (ξ Y 1,j , ξ Y 2,j ) and (ξ Z 1,ij , ξ Z 2,ij ) each follow a multivariate Gaussian distribution with covariance matrices Σ X , Σ Y and Σ Z , respectively. Under such assumptions, we can write</p><p>where W = X, Y, Z and r = 1, 2.</p></div>
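For intuition, the dependence structure can be sketched by drawing paired buy/sell score vectors. The sketch below uses a deliberate simplification of the full Σ W : only same-direction buy and sell scores are correlated, with a per-direction correlation `rho[k]`; all names and the parameterization are assumptions for illustration.

```python
import numpy as np

def simulate_bivariate_scores(omega1, omega2, rho, n, rng=None):
    """Draw n paired score vectors (xi_1, xi_2) at one level (sketch).
    omega1, omega2: marginal score variances for the buy (r = 1) and
    sell (r = 2) processes; rho[k]: assumed correlation between the
    k-th buy score and the k-th sell score (a special case of a full
    cross-covariance matrix)."""
    rng = rng or np.random.default_rng(0)
    p = len(omega1)
    xi1 = np.empty((n, p))
    xi2 = np.empty((n, p))
    for k in range(p):
        c = rho[k] * np.sqrt(omega1[k] * omega2[k])   # cross-covariance
        cov = np.array([[omega1[k], c], [c, omega2[k]]])
        draws = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        xi1[:, k], xi2[:, k] = draws[:, 0], draws[:, 1]
    return xi1, xi2
```

Stacking the per-direction 2x2 blocks recovers a valid joint Gaussian for (ξ 1,i , ξ 2,i ) whenever each |rho[k]| &lt; 1.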
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Parameter estimation</head><p>The eigenvalues and eigenfunctions (η X r,k , φ X r,k (t))'s, (η Y r,k , φ Y r,k (t))'s and (η Z r,k , φ Z r,k (t))'s can be estimated using the approach detailed in Section 2.4; the principal component scores can be predicted using the approach described in Section 2.5. The primary goal of this section is the estimation of the cross-covariance matrices Σ X 12 , Σ Y 12 and Σ Z 12 .</p><p>We first describe the cross-covariance estimation at the X level. To that end, first define</p><p>This implies that</p><p>Because φ X 1,k (s) and φ X 2,k′ (t) can both be estimated, an estimate of σ X 12,kk′ can be obtained if Q X (s, t) can be estimated. This can be achieved by generalizing the approach used to estimate the covariance function R X (s, t) in Section 2.4. The cross-covariances at the Y and Z levels can be estimated similarly; the detailed formulas are given in Appendix A.</p></div>
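Once Q X (s, t) has been estimated on a grid, recovering the score cross-covariances is a projection onto the estimated eigenfunctions, i.e. a double integral of the form σ 12,kk′ = ∫∫ φ 1,k (s) Q X (s, t) φ 2,k′ (t) ds dt. A minimal quadrature sketch, assuming a uniform grid (function name hypothetical):

```python
import numpy as np

def cross_score_cov(Q, phi1, phi2, grid):
    """Project a cross-covariance surface onto eigenfunctions (sketch):
    returns the p1 x p2 matrix with entries
    int int phi1_k(s) Q(s, t) phi2_k'(t) ds dt,
    computed by composite-trapezoid quadrature on a uniform grid.
    Q: (G, G) surface values; phi1, phi2: (G, p) eigenfunction values."""
    h = grid[1] - grid[0]                      # uniform spacing assumed
    w = np.full(len(grid), h)                  # trapezoid weights
    w[0] = w[-1] = h / 2.0
    return phi1.T @ (w[:, None] * Q * w[None, :]) @ phi2
```

When Q is exactly a rank-limited expansion in orthonormal eigenfunctions, the projection returns the expansion coefficients, which gives a convenient correctness check.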
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Asymptotic Properties</head><p>In this section, we investigate the theoretical properties of our proposed estimators when both the number of user accounts n and the number of days m increase. We show that the covariance and cross-covariance function estimators suggested in ( <ref type="formula">10</ref>) and ( <ref type="formula">26</ref>) converge uniformly in probability to their theoretical counterparts as n, m → ∞. Consequently, all eigenfunctions and eigenvalues at different levels, as well as the correlations among the functional principal component scores of buying and selling activities, can be consistently estimated. We show that with sufficiently large m and n, the functional principal component scores for each account, i.e., ξ X i , and each day, i.e., ξ Y j , can be consistently estimated. Asymptotic convergence rates are derived for all consistent estimators. All proofs are collected in the Supplementary Material.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Consistency of covariance function estimators</head><p>We start our investigation with the asymptotic properties of the covariance function estimators in (10). Before stating our theorem, we first introduce some regularity conditions.</p><p>[C1] There exist constants c 0 &gt; 0 and C 0 &lt; ∞ such that</p><p>≤ C 0 for any s, t ∈ [0, T ] and i, i′ = 1, . . . , n; j, j′ = 1, . . . , m.</p><p>[C2] There exists a constant</p><p>[C3] Assume that (a) h x → 0, nh 4 x → ∞ and nh 6 x &lt; ∞; (b) h y → 0, mh 4 y → ∞ and mh 6 y &lt; ∞; and (c) </p><p>[C5] Assume that there exists a constant C M such that, with probability tending to 1,</p><p>for any s 1 , s 2 , t 1 , t 2 ∈ [0, T ], and W = X or Y .</p><p>Condition C1 imposes some constraints on the random daily intensity functions λ ij (·)'s. Specifically, condition C1(a) requires that, on average, there should be only a finite number of transactions for any user on a given day. Condition C1(b) is satisfied if the baseline intensity λ 0 (·) is bounded from above and below, since we have assumed that sup s∈[0,T ] σ 2 W (s) &lt; ∞ for W = X, Y, and Z. From a practical point of view, this means that the average user trading frequency in a stock market should be finite and bounded away from 0 at any time of day. Condition C1(c) imposes some mild smoothness conditions on λ 0 (t) and the covariance functions R W (s, t), W = X, Y , and Z, so that they have bounded absolute partial derivatives. Condition C2 essentially assumes that there are only short-term longitudinal correlations among trading intensities in the stock market. Conditions C3-C4 are technical conditions on the choice of bandwidths and kernel functions that are analogous to those in, for example, <ref type="bibr">Yao et al. (2005)</ref>. 
Condition C5 imposes some mild restrictions on the smoothness of the sample paths of the latent stochastic processes X i (·)'s and Y j (·)'s, and closely resembles the commonly employed stochastic equicontinuity assumption in the literature <ref type="bibr">(Newey, 1991)</ref>. The following theorem states that, under the above conditions, the covariance estimators in (10) converge uniformly in probability to the true covariance functions.</p><p>Theorem 1 Under conditions C1-C5, we have that, as n, m → ∞,</p><p>The proof is given in the Supplementary Material. Theorem 1 states that the convergence rates of R̂ X,hx (s, t) and R̂ Y,hy (s, t) are determined by the number of replicates at the X level (i.e., n) and the Y level (i.e., m), respectively. In contrast, the convergence rate of R̂ Z,hz (s, t) is determined by n and m jointly. These observations are confirmed by our simulation studies in Section 5.1.</p><p>The proof of Theorem 1 utilizes technical devices similar to those in <ref type="bibr">Yao et al. (2005)</ref>, but several new challenges need to be addressed. Firstly, unlike in <ref type="bibr">Yao et al. (2005)</ref>, the intensity functions are latent functions that are not directly observable. Therefore, we need to study the properties of a series of carefully designed random sums given in (9), instead of simply smoothing observed functional data. As a result, new conditions on the observed point processes need to be imposed, as stated in condition C1. Secondly, Theorem 1 investigates multi-level latent functions, while <ref type="bibr">Yao et al. (2005)</ref> only studies single-level functional data. To the best of our knowledge, there have been few rigorous theoretical investigations of the asymptotic properties of multi-level functional data analysis despite its popularity. Therefore, our work makes a useful contribution to the literature in this respect. Finally, the functional data studied in <ref type="bibr">Yao et al. 
(2005)</ref> are independent random trajectories, while the observed point processes in our work are not. For example, the N ij 's for different i's are dependent due to the common component Y j in their latent intensities defined in (1). In addition, we also allow dependence among the latent processes Y j (t)'s in Theorem 1, as specified in condition C2. This dependence further complicates our analysis.</p><p>The following corollary is a direct consequence of Theorem 1.</p><p>Corollary 1 Under conditions C1-C5, we have that, as n, m → ∞,</p><p>The proof of Corollary 1 follows readily from the proof of Theorem 2 in <ref type="bibr">Yao et al. (2005)</ref> and is thus omitted.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Prediction accuracy of functional principal component scores</head><p>Theorem 1 and Corollary 1 ensure the accuracy of the estimated covariance functions and their corresponding eigenfunctions. It is therefore reasonable to expect the conditional likelihoods ( <ref type="formula">13</ref>) and ( <ref type="formula">14</ref>) to yield consistent estimators of ξ X i and ξ Y j , which characterize the trading patterns at the account and day levels, respectively. Denote by ‖ · ‖ the Euclidean norm. Our theoretical investigation relies on the following conditions.</p><p>[S1] E</p><p>[S5] The number of functional principal components, p W , satisfies (a) (n −1 +m −1 )log(m+</p><p>Conditions S1-S5 are technical conditions on the latent Gaussian processes. Condition S1 imposes mild moment conditions on the latent Gaussian processes. Condition S2 holds trivially if all eigenfunctions at the X and Y levels are uniformly bounded over the domain [0, 1]. Condition S3 ensures that the Hessian matrices of the conditional likelihoods ( <ref type="formula">13</ref>) and ( <ref type="formula">14</ref>) are strictly positive definite with probability approaching one. Condition S4 is only needed for Y j (·), so that only a "small" number of the Y j′ (·)'s are strongly correlated with Y j (·), analogous to condition C2. Condition S5 requires that the eigenvalues of the covariance functions decay sufficiently fast as p W grows, and that p W not be too large compared to the sample sizes n and m. These additional conditions are needed because the latent random functions are not observed in our framework. Moreover, all components in ξ X i or ξ Y j need to be estimated simultaneously. This is different from classical functional data analysis, where the principal component scores can be estimated separately; see, e.g., <ref type="bibr">Yao et al. 
(2005)</ref>; <ref type="bibr">Chen and M&#252;ller (2012)</ref>.</p><p>Theorem 2 Let ξ̂ X i and ξ̂ Y j be maximizers of the conditional likelihoods (13) and (14), respectively. Under conditions C1-C5 and S1-S5, and assuming that all eigenvalues of the covariance functions R X (s, t) and R Y (s, t) are of multiplicity 1, we have that, as p X , p Y , n, m → ∞,</p><p>The proof is given in the Supplementary Material.</p><p>Theorem 2 suggests that the convergence rates of ξ̂ X i and ξ̂ Y j are controlled by (i) the estimation errors of the covariance functions and their eigenfunctions; and (ii) the truncation errors of the Karhunen-Loève expansion in (7), which are reflected in the terms e p X and e p Y . Note that the lengths of the vectors ξ X i and ξ Y j increase as p X and p Y grow, and hence the overall convergence rate in the Euclidean norm ‖ · ‖ will be slower by a factor of p</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Consistency of cross-covariance function estimators</head><p>To study the convergence rates of the cross-covariance function estimators given in ( <ref type="formula">26</ref>), we need some additional assumptions on the marginal cross second-order intensity functions and the cross-covariances among the day-level latent processes.</p><p>[C1*] There exist constants c 0 &gt; 0 and</p><p>[C3*] Assume that there exists a constant C M such that, with probability tending to 1,</p><p>Theorem 3 Under conditions C1*-C3* and C3-C4, we have that, as n, m → ∞,</p><p>The proof of Theorem 3 is almost identical to that of Theorem 1 and is thus omitted. The only difference is that one has to impose the three additional assumptions C1*-C3* to regulate the strength of the dependence between the two types of point processes and the smoothness of the latent processes.</p><p>The following corollary is an immediate consequence of Theorem 3.</p><p>Corollary 2 Under conditions C1*-C3* and C3-C4, if we further assume that all eigenvalues of the covariance functions R r,W (s, t), W = X, Y, or Z, are of multiplicity 1, we have that, as n, m → ∞,</p><p>where the convergence is entry-wise for the matrix.</p><p>The proof is a straightforward application of Slutsky's theorem combining the results of Corollary 1 and Theorem 3, and is thus omitted.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Simulation</head><p>In this section, we demonstrate the effectiveness of the proposed approach through simulation studies. For the univariate log-Gaussian Cox process, the data are simulated from the univariate point process model presented in Section 2.1 using the following intensity model:</p><p>We simulate the principal component scores ξ X ik , ξ Y jk and ξ Z ijk as independent normal random variables with means 0 and variances equal to the eigenvalues; the eigenvalues and eigenfunctions (involving terms such as sin(4πt)) are chosen to mimic those in our real data. Throughout this section, the Epanechnikov kernel function is used in all simulation studies, and summary statistics are calculated based on B = 500 simulation runs.</p></div>
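The data-generating mechanism above can be sketched with Lewis-Shedler thinning for the conditional inhomogeneous Poisson processes. This simplified version keeps only the account (X) and day (Y) levels, omitting the residual Z level for brevity; the function names and the vectorized eigenfunction callables `phiX`/`phiY` are assumptions for illustration.

```python
import numpy as np

def thin_poisson(intensity, T, lam_max, rng):
    """Lewis-Shedler thinning: propose homogeneous events at rate lam_max,
    keep each with probability intensity(t) / lam_max."""
    cand = np.sort(rng.uniform(0.0, T, rng.poisson(lam_max * T)))
    return cand[rng.uniform(0.0, 1.0, len(cand)) < intensity(cand) / lam_max]

def simulate_two_level(n, m, lam0, phiX, phiY, omX, omY, T=1.0, seed=0):
    """Sketch of the two-level design: lambda_ij(t) = lam0(t) exp{X_i(t) + Y_j(t)}
    with X_i(t) = sum_k xi_ik phiX_k(t), xi_ik ~ N(0, omX_k), similarly Y_j.
    phiX(t) / phiY(t) must return a (K, len(t)) array of eigenfunction values."""
    rng = np.random.default_rng(seed)
    xiX = rng.normal(0.0, np.sqrt(omX), (n, len(omX)))   # account-level scores
    xiY = rng.normal(0.0, np.sqrt(omY), (m, len(omY)))   # day-level scores
    grid = np.linspace(0.0, T, 200)
    events = {}
    for i in range(n):
        for j in range(m):
            lam = lambda t, i=i, j=j: lam0(t) * np.exp(
                xiX[i] @ phiX(t) + xiY[j] @ phiY(t))
            lam_max = 1.05 * float(lam(grid).max())      # thinning envelope
            events[(i, j)] = thin_poisson(lam, T, lam_max, rng)
    return events
```

The grid-based envelope is adequate when the latent intensity is smooth; a finer grid (or an analytic bound) would be needed for very spiky eigenfunctions.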
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Estimation of the univariate two-level model</head><p>We first evaluate the accuracy of the proposed MFPCA procedure in estimating the eigenvalues and eigenfunctions for univariate point processes. In this subsection, we fix p X = p Y = p Z = 2 and allow n, m to vary. The bandwidths ĥx , ĥy , and ĥz are selected by the data-driven procedure proposed in Section 2.7.</p><p>Figure <ref type="figure">3</ref> shows the estimation accuracy of the eigenfunctions based on the mean absolute deviation (MAD) defined as</p><p>where the φ̂ W k (t) (b) 's are the estimated eigenfunctions in the bth simulation run. Figure <ref type="figure">3</ref> (a) and Figure <ref type="figure">3</ref> (d) show that the estimation accuracy of the φ X k (t)'s improves as n increases, but not necessarily as m increases. Next, Figure <ref type="figure">4</ref> illustrates the estimation accuracy of the eigenvalues using the relative root mean squared error (RMSE) defined as</p><p>The results are similar to our observations in Figure <ref type="figure">3</ref>, and thus we omit further discussion. Figure <ref type="figure">5</ref> displays the correlations between the predicted principal component scores obtained from ( <ref type="formula">13</ref>)-( <ref type="formula">14</ref>) and the true ones under various settings. It is seen that at both the X and Y levels, the correlation coefficients are generally rather high in all scenarios, suggesting excellent prediction performance. We also observe that, compared to the first PC scores, the second PC scores are generally more difficult to predict. Finally, Figure <ref type="figure">5</ref> (c) and (f) illustrate the averaged CPU time (in seconds) for estimating the covariance functions R W (s, t), W = X, Y, Z, with a fixed bandwidth. All simulation runs were carried out in the software R on a cluster of 100 Linux machines with a total of 100 CPU cores, each running at approximately 2 GFLOPS. 
As discussed in Remark 1 in Section 2.4, the computational cost of the proposed estimation method indeed grows linearly in each of m and n, being of the order O(nm).</p></div>
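When computing MAD-type criteria as in Section 5.1, an eigenfunction is identified only up to sign, so each estimate should be sign-aligned with the truth before averaging. A small sketch with a hypothetical helper name:

```python
import numpy as np

def mad_eigenfunction(phi_hat_runs, phi_true):
    """Mean absolute deviation of estimated eigenfunctions across B
    simulation runs (sketch). Each estimate is first aligned with the
    truth: flip its sign if that reduces the L2 distance, since an
    eigenfunction is only identified up to sign.
    phi_hat_runs: (B, G) estimates on a grid; phi_true: (G,)."""
    truth = np.asarray(phi_true, float)
    devs = []
    for est in np.asarray(phi_hat_runs, float):
        if np.sum((est + truth) ** 2) < np.sum((est - truth) ** 2):
            est = -est                    # the flipped sign matches better
        devs.append(np.mean(np.abs(est - truth)))
    return float(np.mean(devs))
```

Without the alignment step, a perfectly estimated eigenfunction returned with the opposite sign would inflate the MAD artificially.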
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Comparisons with benchmark models</head><p>In this subsection, we investigate the benefits of using the two-level model over the benchmark models suggested in Section 2.2, namely, the Hawkes process model and the parametric log-Gaussian Cox process model (P-LGCP). In addition, we consider a single-level model assuming that the point process N ij has a latent intensity λ ij (t) of the form</p><p>We simulate the data from model ( <ref type="formula">23</ref>) with a fixed</p><p>Under this setting, the two-level model reduces to the single-level model when q = 0, and the difference between the two models becomes larger as q grows. All other settings remain the same as those in Section 5.1. To fit the single-level model, we apply the method proposed in <ref type="bibr">Wu et al. (2013)</ref> to the aggregated point processes to predict the random intensity function in <ref type="bibr">(24)</ref>. For comparisons between the various fitted models, we consider three metrics:</p><p>(1) the overall mean absolute deviation: OMAD =</p><p>(2) the overall goodness-of-fit at the X level: Fit X = ∫ 1 0 D X (r; h)dr with D X (r; h) defined in (18);</p><p>(3) the overall goodness-of-fit at the Y level: Fit Y , defined analogously to Fit X .</p><p>For the Hawkes and P-LGCP models, the estimated background intensities μ̂ i (·)'s as suggested in (3) are used as the λ̂ ij (·)'s in the definition of OMAD. The summary statistics based on B = 500 simulation runs are presented in Figure <ref type="figure">6</ref>. When computing Fit X and Fit Y for the Hawkes and P-LGCP models, the model-based probabilities defined in ( <ref type="formula">16</ref>) do not have closed forms as given in (17). Therefore, for each simulation run, we numerically approximate the model-based probabilities by the average of the empirical probabilities (15) computed from 100 replicate datasets simulated from the fitted models. 
Figure <ref type="figure">6</ref> (a) shows that as q increases, the estimation accuracy of the λ ij 's under the single-level model and the two benchmark models becomes much worse than that under the two-level model. However, the single-level model can still capture the variations at the X level, as indicated by Figure <ref type="figure">6</ref> (b), where the averaged Fit X for both the single-level and two-level models is well within the 95% quantile bands of Fit X (Truth).</p><p>In contrast, while the Hawkes and P-LGCP models can capture the general trend of the variations at the X level, their performance is significantly worse than that of the other two models. One possible explanation is that the parametric assumptions imposed by the benchmark models may not be suitable for the underlying data, as discussed in Section 2.2. Since the single-level model ( <ref type="formula">24</ref>) completely ignores the variations at the Y level, it essentially assumes p̃ j (r l-1 , r l ; h y ) = 1/m for j = 1, · · · , m in the definition of Fit Y , which provides a rather poor goodness-of-fit at the Y level as q grows. The Fit Y 's computed using data simulated from the fitted Hawkes and P-LGCP models show almost the same trend as q increases in Figure <ref type="figure">6</ref>, suggesting that both models fail to capture the variations at the Y level. In summary, when data are generated from the two-level model ( <ref type="formula">23</ref>) but fitted with the single-level model ( <ref type="formula">24</ref>) or the benchmark models given in Section 2.2, variations at the X level can be captured to a certain degree, while variations at the Y level may be poorly characterized. This leads to rather inaccurate estimates of the underlying latent intensity functions λ ij (t)'s. Here, Fit X (Truth) is computed by replacing p̃ i (r l-1 , r l ; h) in ( <ref type="formula">18</ref>) with the theoretical probability (16).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Estimation of the bivariate log-Gaussian Cox process</head><p>Next, we evaluate the estimation performance for the bivariate point process model presented in Section 3.1. The same eigenvalues and eigenfunctions are used as in the univariate model ( <ref type="formula">23</ref>). The principal component scores are generated from the covariance structure in ( <ref type="formula">22</ref>), with </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Application to Stock Trading Data</head><p>The example data to which we apply our methodology contain stock market buy and sell transactions from 331,599 investor accounts at a leading brokerage house in mainland China during the period from January 4th, 2007 to September 30th, 2009. These accounts are from 98 branches distributed over 25 provinces in the central and eastern coastal areas of mainland China.</p><p>As background on the Chinese stock market, during 2006-2017 its market capitalization grew more than five-fold to become the second-largest in the world (Carpenter and Whitelaw, 2017). Among the world's ten largest stock exchanges, two operate in mainland China (the Shanghai Stock Exchange and the Shenzhen Stock Exchange)<ref type="foot">foot_0</ref> . The Chinese market began to boom in November 2006 <ref type="bibr">(Andrade et al., 2013)</ref>, just before the start of our sample period. The Shanghai Stock Exchange (SSE) Composite Index doubled from around 2,000 points to more than 4,000 points in less than six months. Many observers characterized the market as experiencing a bubble, and in an attempt to cool down the market, on May 30th, 2007, the Chinese government tripled the transaction tax on trades from 0.1% to 0.3%.<ref type="foot">foot_1</ref> Although the market experienced several major sell-offs following the tax increase, the SSE market index recovered and kept increasing until reaching a historic high of 6,124 points on October 16th, 2007. Afterward, the market index plunged, and in April 2008 the transaction tax was reduced to its prior level of 0.1%. The market continued to decline, with the SSE index in mid-September 2008 down approximately 70% from its October 2007 high. To stimulate buying, on September 18th, 2008, the Chinese government announced the abolition of the tax on buying (maintaining it only for sales), among other measures to bolster the market. 
The market gradually recovered, and the SSE Composite Index closed at 2,779 points on September 30th, 2009.</p><p>The period we study contains 672 trading days, and on average, each account in our dataset has 150 buy or sell transactions during this period. Each investor account has a randomly assigned identification number, and there is insufficient information for us to discern actual investor identity. There is a large variation in the trading frequencies across accounts, with the total number of transactions ranging from 0 to 874,263. In our analysis, we have removed accounts with fewer than 30 or more than 2,000 total transactions, which results in retaining approximately 47% of the accounts in the broader dataset, or equivalently, a total of 157,203 accounts. We do not consider accounts with fewer than 30 transactions because it would be difficult to estimate the account-level effect for such accounts. Accounts with more than 2,000 trades are likely institutions that submit trades on behalf of other investors, and they account for only around 0.2% of the total data. Our results are not materially affected by changing the minimum cutoff to 10 or the maximum cutoff to 1,000.</p><p>We next discuss the application of our proposed model to the stock trading data, interpreting what is learned where appropriate. In subsection 6.5, we discuss potential further applications for interested finance researchers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Eigenfunctions</head><p>We first apply our proposed MFPCA approach for univariate point processes to study the variations in an investor's buying and selling activities separately. The intensity functions of the processes underlying the buying and selling activities are modeled using the two-level model (20) with p X = p Y = p Z = 5, with the bandwidths for the Epanechnikov kernel at the different levels selected by the data-driven method proposed in Section 2.7. Figure <ref type="figure">7</ref> shows the first three eigenfunctions at the account (X), day (Y ), and account-day (Z) levels for both the buying and selling processes. The first three eigenfunctions at the X, Y and Z levels explain about 96.30%, 93.44%, and 67.37% of the variation at the respective levels for the buying process, and 96.62%, 93.25%, and 67.44% of the variation for the selling process.</p><p>In Figure <ref type="figure">7</ref>, we can see that the eigenfunctions for the buying and selling processes are very similar. In what follows, we therefore focus on interpreting the eigenfunctions for the buying process alone. Note that the first eigenfunctions at all three levels are flat. This suggests that the overall buying frequency is the most important factor in explaining the variability at each level. At the account level, the second eigenfunction decreases throughout the day except for a small jump at the beginning of the third trading hour. This characterizes trading accounts whose buying activity steadily increases (or decreases) throughout the day. The third eigenfunction at the account level is approximately flat at zero until an abrupt jump around mid-day; the eigenfunction then decreases from mid-day to the end of the day. This characterizes trading accounts that are more active in the afternoon and whose afternoon trading activities follow an increasing or decreasing pattern. 
At the day level, the second eigenfunction shows a roughly decreasing trend in the morning. This characterizes days that have increasing or decreasing morning trading activity.</p><p>At the account-day level, the second eigenfunction is relatively steady in the morning and afternoon with a sharp drop around noon. This characterizes contrasting morning versus afternoon trading behaviors. The third eigenfunctions at both the day and the account-day levels rise until mid-day and then decrease. This characterizes trading behavior that is concentrated around mid-day.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Principal component scores</head><p>Figure <ref type="figure">8</ref> shows scatter plots of the principal component scores in the first three directions for the buying process at the account level. We can see from these plots that, compared to accounts with lower scores in the first direction, those with higher scores tend to have smaller variation in the second and third directions. Note that a larger principal component score in the first direction indicates a higher overall buying frequency. Thus, accounts with higher buying frequency tend to buy more evenly throughout the day, rather than following the specific patterns described by the second and third directions. Richards and Willows (2018) find that only a small proportion of investors with certain common characteristics trade frequently, which may partially explain the homogeneity of the intra-day trading patterns we observe. However, there has been little research in the finance literature on cross-sectional differences in time-of-day trading preferences between investors with higher trading frequencies and those with lower trading frequencies, as suggested by Figure <ref type="figure">8</ref>. We comment further on this finding below.</p><p>Figure <ref type="figure">9</ref> shows the principal component scores in the first two directions for the buying process at the day level. Because the first two eigenfunctions characterize the overall trading frequency and the morning trading activities at the day level (see Figure <ref type="figure">7</ref>), a larger score in the first direction indicates higher trading frequency, whereas a larger positive (negative) score in the second direction indicates increasing (decreasing) morning trading activity during that day. From the first plot, we can see that the trading frequency increases from the start of our observation window until May 30th, 2007, when the transaction tax is tripled. 
At that point, trading frequency begins to decline, showing that the tax increase was successful in dampening trading activity, as the government desired. This finding is also consistent with the findings of the transaction tax literature <ref type="bibr">(Umlauf, 1993;</ref><ref type="bibr">Mannaro et al., 2008)</ref>.</p><p>Despite noticeable variability, trading frequency continues to decline until September 19th, 2008, the first trading day after the buy-side transaction tax is eliminated. Thereafter, again consistent with the implications of prior research on transaction taxes, trading frequency begins a new trend of increasing intensity. In the second plot, we observe a period of overall decreasing morning trading activity following the tripling of the transaction tax on May 30th, 2007. The largest scores occur around September 18th, 2008, when the buy-side transaction tax is abolished, coupled with low scores in the first direction. This implies low overall trading activity that is concentrated early in the trading day. On September 19th, almost all stocks reach the maximum allowable 10% daily increase shortly after the opening of the market, and there is very little trading activity after that.<ref type="foot">foot_2</ref> This is consistent with the values of the scores in the first two directions. The scores in the second direction are generally small in the other periods.</p><p>Figure 8: Scatter plots of buying scores in different directions at the account level, based on 10,000 randomly selected accounts. From left to right: scores of directions 1 vs 2, 1 vs 3, and 2 vs 3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Correlations between buying and selling activities</head><p>To study the temporal covariation of an investor's buying and selling activity, we apply our proposed MFPCA approach for bivariate point processes. Table <ref type="table">2</ref> shows the correlation matrices of scores from the buying and selling processes, together with the associated standard error estimates obtained by bootstrapping the predicted scores at the account and day levels. The bootstrap procedure is admittedly an ad hoc approach but can nevertheless provide some measure of uncertainty for these estimates. It is not used at the account-day level, however, because of the difficulty of estimating the scores reliably at that level (some accounts have too few transactions). Based on the results, we observe strong positive correlations between scores in the same direction and at the same level, i.e., large diagonal elements in each of the three correlation matrices. These correlations can be interpreted easily given the interpretations of the associated eigenfunctions. For example, the positive correlation between the first buying and selling directions at the account level (0.9643) suggests that accounts with higher buying frequency also have higher selling frequency. This high correlation indicates that most trades are security-selection rather than asset-class-allocation trades, i.e., investors are typically rebalancing an existing stock portfolio rather than increasing or decreasing its overall size. 
Because it is very unlikely that the typical Chinese individual investor (as opposed to institutional investor) has access to superior information about the stocks they trade, this finding is consistent with the finance literature's documentation that individual investors, to their detriment in terms of performance and trading costs, are overconfident and trade too frequently <ref type="bibr">(Odean, 1999;</ref><ref type="bibr">Barber and Odean, 2000, 2001;</ref><ref type="bibr">Barber et al., 2008)</ref>. The off-diagonal elements of the correlation matrices can also offer insights into the correlation between buying and selling activity. For example, at the account level, the negative correlation between the second buying and the third selling directions (-0.1418) implies that accounts with decreasing buying frequencies within a transaction day are also more likely to sell earlier in the day than later. At the day level, the positive correlation between the second buying and the third selling directions (0.3285) implies that transaction days with decreasing buying activity in the morning tend to have higher selling activity around the middle of the trading day.</p><p>Table <ref type="table">2</ref>: Correlation matrices between the scores (&#958;_B, &#958;_S) of the buying and selling processes at the account (X), day (Y), and account-day (Z) levels. Each matrix includes the first three directions; bootstrapped standard errors are shown in parentheses (not available at the account-day level).</p><p>Account level (X): 0.9643 (0.1062), -0.0005 (0.0128), -0.0244 (0.0182); 0.0108 (0.0127), 0.3982 (0.1157), -0.1418 (0.0140); -0.0180 (0.0130), -0.0727 (0.0335), 0.7690 (0.0560). Day level (Y): 0.7873 (0.0134), 0.1053 (0.0929), -0.0167 (0.0767); 0.0884 (0.0835), 0.4585 (0.0896), 0.3285 (0.1152); -0.0784 (0.0425), -0.2930 (0.0928), 0.7600 (0.0234). Account-day level (Z): 0.8110, 0.1913, -0.0287; -0.1636, 0.5227, -0.2671; -0.0421, 0.2415, 0.5936.</p></div>
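The bootstrap used for the standard errors in Table 2 can be sketched as follows. The score matrices here are synthetic stand-ins (the `latent`/`buy`/`sell` construction is hypothetical); in the application the rows would be the predicted buying and selling scores for each account, resampled jointly so the pairing is preserved:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical predicted scores for n accounts in the first three
# directions; a shared latent term induces positive diagonal correlation.
n = 5_000
latent = rng.normal(size=(n, 3))
buy = latent + 0.5 * rng.normal(size=(n, 3))
sell = latent + 0.5 * rng.normal(size=(n, 3))

def cross_corr(b, s):
    """3x3 matrix of correlations between buying and selling directions."""
    b = (b - b.mean(0)) / b.std(0)
    s = (s - s.mean(0)) / s.std(0)
    return b.T @ s / len(b)

est = cross_corr(buy, sell)

# Nonparametric bootstrap over accounts: resample rows with replacement,
# keeping each account's buy/sell pair together, then recompute.
B = 200
reps = np.empty((B, 3, 3))
for b in range(B):
    idx = rng.integers(0, n, n)
    reps[b] = cross_corr(buy[idx], sell[idx])
se = reps.std(axis=0)

print(np.round(est, 4))
print(np.round(se, 4))
```

As noted in the text, this ignores the uncertainty from predicting the scores themselves, which is why the paper treats it as an ad hoc uncertainty measure.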
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.4">Goodness of fit</head><p>Finally, we check the goodness-of-fit of the two-level model (1) to the stock trading data.</p><p>The results for the buying and selling processes are rather similar, so we only present diagnostic plots for the buying process. Figure <ref type="figure">10</ref> (a) and (c) illustrate the goodness-of-fit measures computed on a random subset of 10,000 accounts applying the definition (18) at the account (X) and day (Y) levels with different numbers of principal components. At both the account and day levels, we observe that the model estimated using only one principal component is not able to capture all of the variation across different accounts or days. As the number of principal components increases to 5, however, the model fits become significantly better at both levels.</p><p>As a comparison, we compute the same goodness-of-fit measures for the Hawkes and P-LGCP models suggested in Section 2.2, with the model-based probabilities numerically evaluated by averaging the empirical probabilities (15) over 500 replicate datasets simulated from the fitted models. In Figure <ref type="figure">10</ref>(a), we can see that at the account level, the Hawkes process model performs much worse than all LGCP models. The P-LGCP model performs similarly to the MFPCA model with p_X = 3 principal components, although its goodness-of-fit at the boundaries is relatively worse. Overall, the MFPCA model with 5 principal components shows uniformly superior performance and achieves rather small KL-divergence, indicating that the model captures account-level variation sufficiently well. At the day level, Figure <ref type="figure">10</ref>(d) suggests that both benchmark models perform similarly, yielding model fits that are significantly worse than those of the MFPCA model. This is expected for the reasons discussed in Section 2.2. 
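The simulation-based evaluation just described can be sketched generically. The exact definitions (15) and (18) are given earlier in the paper; the sketch below is only a hedged illustration of the recipe — compare empirical count probabilities against their average over datasets simulated from a fitted model via a KL divergence — with a Poisson log-normal generator as a hypothetical stand-in for the fitted model's simulator:

```python
import numpy as np

rng = np.random.default_rng(2)

def count_probs(counts, k_max):
    """Empirical probabilities P(N = k) for k = 0..k_max (tail pooled at k_max)."""
    clipped = np.minimum(counts, k_max)
    return np.bincount(clipped, minlength=k_max + 1) / len(counts)

# Hypothetical observed daily event counts; a fitted model would supply
# its own simulator in place of this Poisson log-normal draw.
obs = rng.poisson(lam=np.exp(rng.normal(1.0, 0.5, size=2_000)))
p_emp = count_probs(obs, k_max=20)

# Model-based probabilities: average the empirical probabilities over
# replicate datasets simulated from the fitted model (500 in the paper).
reps = [count_probs(rng.poisson(lam=np.exp(rng.normal(1.0, 0.5, size=2_000))), 20)
        for _ in range(500)]
p_mod = np.mean(reps, axis=0)

# KL divergence between the empirical and model-based count
# distributions, restricted to cells where both are positive.
mask = (p_emp > 0) & (p_mod > 0)
kl = np.sum(p_emp[mask] * np.log(p_emp[mask] / p_mod[mask]))
print(f"KL divergence: {kl:.4f}")
```

Here the simulator matches the data-generating process, so the divergence is small; a misspecified model (e.g., a plain Poisson with the same mean) would inflate it.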
The KL-divergence values for the MFPCA model with 5 principal components are also rather small, especially when compared to those at the account level, which indicates an adequate fit at the day level.</p><p>Our findings are broadly consistent with what finance researchers have found about trading behavior. There are, however, unresolved questions in the finance literature that could benefit from the methodology we propose.</p><p>There is a large literature in behavioral finance and the social sciences concerning interpretations of trading behaviors. One topic of prior research is how trading activity differs across investor characteristics such as age, gender, or geographic location (e.g., <ref type="bibr">Odean, 1999;</ref><ref type="bibr">Barber and Odean, 2001;</ref><ref type="bibr">Samanez-Larkin et al., 2010;</ref><ref type="bibr">Henninger et al., 2010, etc.</ref>). Another is how information that induces trading disseminates across a large population (e.g., <ref type="bibr">Feng and Seasholes, 2004)</ref>. However, most existing work is based on aggregated trading activity (e.g., the total number of trades in a day), and thus misses potentially important insights from intraday trading patterns. Furthermore, when studying the impacts of account-related characteristics on trading behavior, trading statistics aggregated across time may fail to account for the confounding effects of day-level factors. As data that include intraday trading information become more available, the proposed MFPCA framework provides a powerful tool with which to decompose the variation of such trading data in meaningful ways and to summarize high-frequency trading activity using multiple principal component scores. 
The PC scores provide a multi-dimensional view of intraday trading activity, which in turn allows researchers to more thoroughly investigate and describe the factors that determine trading decisions.</p><p>A second line of research that our proposed method can benefit is work that attempts to exploit abnormal patterns in trading activity to devise successful trading strategies (e.g., <ref type="bibr">Andrade et al., 2008)</ref>. Standard outlier-detection tools can be applied to the account-level principal component scores produced by our methodology to identify investors with abnormal intraday trading activity. This may lead to improvements in trading models by identifying more predictive trading signals. At the day level, it may also be interesting to monitor the daily trading status of the stock market to predict the formation of asset price bubbles (e.g., <ref type="bibr">Andrade et al., 2013)</ref>, which may better enable policymakers to implement countermeasures. Such monitoring could be accomplished by applying change-point detection tools (e.g., <ref type="bibr">Polunchenko and Tartakovsky, 2012</ref>) to the multivariate time series of principal component scores at the day level to detect abnormal changes in the stock market, which might indicate bubble formation in its early stages.</p><p>In view of ( <ref type="formula">28</ref> </p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>See the World Federation of Exchanges (https://www.world-exchanges.org).</p></note>
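The monitoring idea above could be prototyped with a classical offline CUSUM statistic on a single day-level score series. Everything below is synthetic: the series and the level shift at day 300 are hypothetical stand-ins for score behavior around a market event, and an actual monitor would use a sequential, multivariate procedure such as those in the change-point literature cited above:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical day-level first-direction scores with a mean shift at day 300.
n, tau = 600, 300
scores = rng.normal(0.0, 1.0, n)
scores[tau:] += 1.5

# Offline CUSUM: the change-point estimate is the index maximizing the
# absolute cumulative sum of deviations from the overall mean.
grand = scores.mean()
cusum = np.abs(np.cumsum(scores - grand))
tau_hat = int(np.argmax(cusum))
print("estimated change point:", tau_hat)
```

With a 1.5-standard-deviation shift, the estimate typically lands within a few days of the true change, which is the kind of localization one would want for flagging regime changes such as the 2007 tax events.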
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>See Andrade et al. for evidence supporting a stock market bubble during this time-period, as well as further discussion of the tripling in the transaction tax.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2"><p>During the time period we study, the Chinese markets impose a maximum 10% increase or decrease in a stock's price. Trading in the stock closes for the rest of the day if that limit is reached, which does happen on occasion.</p></note>
		</body>
		</text>
</TEI>
