<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Point process modeling of drug overdoses with heterogeneous and missing data</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>03/01/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10276842</idno>
					<idno type="doi">10.1214/20-AOAS1384</idno>
					<title level='j'>The Annals of Applied Statistics</title>
<idno>1932-6157</idno>
<biblScope unit="volume">15</biblScope>
<biblScope unit="issue">1</biblScope>					

					<author>Xueying Liu</author><author>Jeremy Carter</author><author>Brad Ray</author><author>George Mohler</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Opioid overdose rates have increased in the United States over the past decade and reflect a major public health crisis. Modeling and prediction of drug and opioid hotspots, where a high percentage of events fall in a small percentage of space-time, could help better focus limited social and health services. In this work we present a spatialtemporal point process model for drug overdose clustering. The data input into the model comes from two heterogeneous sources: 1) high volume emergency medical calls for service (EMS) records containing location and time, but no information on the type of non-fatal overdose and 2) fatal overdose toxicology reports from the coroner containing location and high-dimensional information from the toxicology screen on the drugs present at the time of death. We first use non-negative matrix factorization to cluster toxicology reports into drug overdose categories and we then develop an EM algorithm for integrating the two heterogeneous data sets, where the mark corresponding to overdose category is inferred for the EMS data and the high volume EMS data is used to more accurately predict drug overdose death hotspots. We apply the algorithm to drug overdose data from Indianapolis, showing that the point process defined on the integrated data out-performs point processes that use only coroner data (AUC improvement .81 to .85). We also investigate the extent to which overdoses are contagious, as a function of the type of overdose, while controlling for exogenous fluctuations in the background rate that might also contribute to clustering. We find that drug and opioid overdose deaths exhibit significant excitation, with branching ratio ranging from .72 to .98.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>1. Introduction. Over 500,000 drug overdose deaths have occurred in the United States since 2000 and over 70,000 of these deaths occurred in 2017 <ref type="bibr">[29]</ref>. Opioids are a leading cause in these deaths and these trends are characterized by three distinct time periods <ref type="bibr">[3]</ref>. In the 1990s overdose deaths were driven by prescription opioid-related deaths <ref type="bibr">[4]</ref>, whereas reduced availability of prescriptions led to an increase of heroin-related deaths beginning in the 2010s <ref type="bibr">[4,</ref><ref type="bibr">27,</ref><ref type="bibr">31]</ref>. Illicit fentanyl, a synthetic opioid 50 to 100 times more potent than morphine <ref type="bibr">[6]</ref>, has become a major cause of opioid-related deaths since around 2013. It is estimated that in 2016 around half of opioidrelated deaths contained fentanyl <ref type="bibr">[11]</ref>, and fentanyl mixed into heroin and cocaine is likely contributing to many of these overdose deaths <ref type="bibr">[12,</ref><ref type="bibr">16]</ref>.</p><p>Criminology and public health disciplines have leveraged spatio-temporal event modeling in attempts to predict social harm for e&#8629;ective interventions <ref type="bibr">[20,</ref><ref type="bibr">33,</ref><ref type="bibr">36]</ref>. Fifty percent of crime has been shown to concentrate within just 5 percent of an urban geography <ref type="bibr">[35]</ref>. Geographic concentrations of drugrelated emergency medical calls for service <ref type="bibr">[8]</ref>, drug activity <ref type="bibr">[9]</ref>, and opioid overdose deaths mirror those of crime <ref type="bibr">[2]</ref>. In particular, over half of opioid overdose deaths in Indianapolis occur in less than 5% of the city <ref type="bibr">[2]</ref>.</p><p>Patterns of repeat and near-repeat crime in space and time further suggest that not only does crime concentrate in place but that such events are an artifact of a contagion e&#8629;ect resulting from an initiating criminal event <ref type="bibr">[32]</ref>. Similar observations have also explained the di&#8629;usion of homicide events <ref type="bibr">[37]</ref>. Experiments of predictive policing models using spatio-temporal Hawkes and self-exciting point processes demonstrates that such empirical realities can be harnessed to direct police resources to reduce crime <ref type="bibr">[24]</ref>. Thus, the interdependence and chronological occurrence of event types in crime and public health lend promise to how to best predict other social harm events, such as opioid overdoses.</p><p>In this work we consider the modeling of two datasets of space-time drug and opioid overdose events in Indianapolis. The first dataset consists of emergency medical calls for service (EMS) events. These events are non-fatal overdoses and include a date, time and location, but no information on the cause of the overdose. The second dataset consists of overdose death events (including location) and are accompanied by a toxicology report that screens for substances present or absent in the overdose event. We develop a marked point process model for the heterogeneous dataset that uses non-negative matrix factorization to reduce the dimension of the toxicology reports to several categories. We then use an Expectation-Maximization algorithm to jointly estimate model parameters of a Hawkes process and simultaneously infer the missing overdose category for the nonfatal overdose EMS data.</p><p>We show that the point process defined on the integrated, heterogeneous data out-performs point processes that use only homogeneous coroner data. We also investigate the extent to which overdoses are contagious, as a function of the type of overdose, while controlling for exogenous fluctuations in the background rate that might also contribute to clustering. We find that opioid overdose deaths exhibit significant excitation, with branching ratio ranging from .72 to .98.</p><p>The outline of the paper is as follows. In Section II, we provide an overview of our modeling framework. In Section III, we run several experiments on synthetic data to validate the model and also on Indianapolis drug overdose data to demonstrate model accuracy on the application. We discuss several policy implications and directions for future research in Section IV.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methods.</head><p>2.1. Self-exciting point processes. In this work we consider a self-exciting point process of the form, <ref type="bibr">[22]</ref>:</p><p>where g(x, y, t) is a triggering kernel modeling the extent to which risk following an event increases and spreads in space and time. The background Poisson process modeling spontaneous events is assumed separable in space and time, where u(x, y) models spatial variation in the background rate and &#9003;(t) may reflect temporal variation arising from time of day, weather, seasonality, etc. The point process may be viewed as a branching process (or superposition of Poisson processes), where the background Poisson process with intensity &#181; 0 &#9003;(t)u(x, y) yields the first generation and then each event (x i , y i , t i ) triggers a new generation according to the Poisson process g(x x i , y y i , t t i ).</p><p>We allow for self-excitation in the model to capture spatio-temporal clustering of overdoses present in the data. For example, a particular supply of heroin may contain an unusually high amount of fentanyl, leading to a cluster of overdoses in a neighborhood where the drug is sold and within a short time period.</p><p>Model 1 can be estimated via an Expectation-Maximization algorithm <ref type="bibr">[34,</ref><ref type="bibr">23]</ref>, leveraging the branching process representation of the model. Let L be a matrix where l ij = 1 if event i is triggered by event j in the branching process and l ii = 1 if event i is a spontaneous event from the background process. Then the complete data log-likelihood is given by,</p><p>Thus estimation decouples into two density estimation problems, one for the background intensity and one for the triggering kernel. Because the complete data is not observed, we introduce a matrix P with entries p ij representing the probability that event i is triggered by event j.</p><p>Given an initial guess P 0 of matrix P , a non-parametric density estimation procedure can be used to estimate u and v from {t k , x k , y k , p kk } N k=1 , providing estimates u 0 , v 0 in the maximization step of the algorithm.</p><p>More specifically, we estimate u and v using leave-one-out kernel density estimation, ( <ref type="formula">5</ref>)</p><p>,</p><p>where We assume the triggering kernel is given by a separable function that is exponential in time (Figure <ref type="figure">1</ref>) with parameter ! and Gaussian in space with parameter <ref type="bibr">[18]</ref>,</p><p>We then obtain an estimate for the parameters using weighted sample averages from the data</p><p>In the estimation step, we estimate the probability that event i is a background event via the formula, ( <ref type="formula">9</ref>)</p><p>and the probability that event i is triggered by event j as, <ref type="bibr">(10)</ref> p ij = g(x i x j , y i y j , t i t j ) (x i , y i , t i ) , <ref type="bibr">[39]</ref>. We then iterate for n = 1, ..., N em between the expectation and maximization steps until a convergence criteria is met:</p><p>1. Estimate u n , v n , and g n using ( <ref type="formula">5</ref>) and (8). 2. Update P n from u n , v n , and g n using ( <ref type="formula">9</ref>) and (10).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2.2.</head><p>Modeling with heterogeneous event data. In this work we assume that we are given two datasets A and B, though our modeling framework extends more generally to three or more. Event dataset A contains low dimensional, unmarked space-time events, whereas dataset B contains spacetime events with high-dimensional marks. In our application, drug overdoses that do not result in death comprise dataset A, whereas those overdoses that do result in death are accompanied by a high-dimensional mark, namely the toxicology screen conducted by the coroner. Event dataset B therefore contains a much smaller number of events compared to A.</p><p>Next we use non-negative matrix factorization (NMF) <ref type="bibr">[14]</ref> to reduce the dimension of the high-dimensional mark of dataet B into an indicator for K groups. Each toxicology report consists of an indicator (presence or absence) for each one of 133 drugs the test screens. These reports then are input into a overdose-drug matrix analogous to a document-term matrix in text analysis using NMF. We then use NMF to factor overdose-drug matrices into the product of two non-negative matrices, one of them representing the relationship between drugs and topic clusters and the other one representing the relationship between topic clusters and specific overdose events in the latent topic space. The second matrix yields the cluster membership of each event (the cluster is the argmax of the column corresponding to each event).</p><p>2.3. Estimation of a marked point process with missing data. Merging dataset A and B, we now have marked event data (x i , y i , t i , k i ) where the mark k i is one of k = 1, ..., K clusters and is unknown for event data coming from A but is known for event data from B.</p><p>Model (1) can be extended by adding in the group labels</p><p>where g k is modelled as follows:</p><p>(12)</p><p>Here we assume each cluster k has its own parameters (! k , &#181; k 0 , k , K k 0 ). We then extend the branching structure matrix P to a set of K matrices, P k , with initial guess P k 0 and entries:</p><p>1 K , if i = j and event i from A 1, if i = j, event i from B and belongs to group k 0, otherwise Then P k can be updated similarly for each cluster k = 1, &#8226; &#8226; &#8226; , K:</p><p>where for each event i from dataset A, we have that</p><p>= 1, and for event i from dataset B we have that p k ij = 0 for all events j where t i t j and k is not the group to which event i belongs.</p><p>The parameters are then estimated using P k :</p><p>and the EM algorithm is iterated to convergence. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results.</head><p>, bg(4), respectively.</p><p>3.1. Synthetic Data. To validate our methodology, we simulate point process data where data set B has K = 4 groups with parameters given by those in Table <ref type="table">1</ref> and<ref type="table">2</ref>. The background rate for each group is heterogeneous in space, with di&#8629;erent rates in each quadrant in the unit square and homogeneous in time. Figure <ref type="figure">2</ref>  True parameters of synthetic data.</p><p>events are simulated: di&#8629;erent background rates are assigned to each of the four di&#8629;erent regions. Table <ref type="table">2</ref> contains the true parameters for each group.</p><p>We then simulate the missing data process by assigning 30% of the data to dataset A (no label) and 70% to B. We find that the EM algorithm detailed above converges within 50 iterations.</p><p>We simulate 50 synthetic datasets and then estimate the true parameters, where the results are displayed in Figure <ref type="figure">3</ref>. In the figure, the histograms of w, K 0 , and &#181; correspond to the estimates from the EM algorithm, where the red reference lines represent the average of the 50 results and the true value of the parameters are in blue. We find that our model is able to accurately recover both the true parameters and the event cluster membership up to the standard errors of the estimators.</p><p>In Table <ref type="table">3</ref>, we display the estimated number of events of each group (along with their actual values) when A has 30% of events as well as when 90% of events are assigned to A (and thus unknown). We find in both experiments that the model is able to recover the cluster sizes accurately.</p><p>In Figure <ref type="figure">4</ref>  <ref type="table">3</ref> Number of events from each group vs estimated number while dataset A is 30% (left) and 90% (right) of all data. dividually against the combined model. We also analyze the di&#8629;erence in performance versus the percentage of events assigned to dataset A. Here we find that the model estimated on both datasets always has higher likelihood than the models estimated only on one dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Emergency Data and Toxicology Report.</head><p>Next we analyze a dataset of drug overdose data from Marion County, Indiana (Indianapolis). The data spans the time period from January 14, 2010 to December 30, 2016. The fatal drug overdose dataset with toxicology reports (dataset B) consists of 969 events and the non-fatal, emergency medical calls for service dataset is 24 times bigger, with 22,049 unlabelled events. We use NMF as described above to cluster the toxicology report data. We use coherence <ref type="bibr">[30]</ref> to select the number of clusters, which we find to be K = 4 for our data (see Figure <ref type="figure">5</ref>). In Table <ref type="table">4</ref> we show the top 24 most frequent drugs and their frequencies present in the fatal overdose dataset and in Table <ref type="table">5</ref> we display the top 5 most frequent drugs found in each NMF group. In Table <ref type="table">5</ref> we find that the first group consists of illicit drugs (6-MAM and heroin), whereas group 2 consists of mostly opioids that can be obtained via a prescription. Group 3 overdoses involve alcohol, whereas group 4 is fentanyl related overdoses. Next we fit the point process model to the fatal and non-fatal overdose data. In Figure <ref type="figure">6</ref> we plot a heatmap of the inferred background events in space, disaggregated by group, along with the temporal trend of background events in Figure <ref type="figure">7</ref>. We find that in time, the frequency of prescription opioid overdoses went down in Indianapolis, whereas illicit opioid overdoses, including the fentanyl group, increased over the same time period. In space, the illict drug hotspots are focused downtown, whereas the prescription opioid hotspots are more spread out in the city.</p><p>In Table <ref type="table">8</ref> we display the estimated point process parameters. We see that for each group self-excitation plays a large role, where the branching ratio ranges from .72 to .98. In Table <ref type="table">6</ref> and 7 we compare the log-likelihood values of the combined heterogeneous point process to baseline models estimated only on EMS or overdose death data. Here we find that including the EMS data improves the AIC values of the model for opioid overdose death, and the overdose death data improves the AIC of the model for EMS events.</p><p>To assess the model with a metric that better mirrors how interventions might work, we run the following experiment. For each day in January 15, 2010 to December 30, 2016, we estimate the point process intensity in each of 50x50 grid cells covering Indianapolis. We then rank the cells by the intensity and assign labels for whether an overdose occurs (1) or does not occur (0) during the next day. We then compute the area under the curve (AUC) of this ranking for the baseline and the proposed method. In practice, a point process model could be used to rank the top hotspots where overdoses are likely to occur and then those areas could be the focus of targeted interventions, such as distribution of naloxone that reverses the e&#8629;ects of an overdose.</p><p>In Table <ref type="table">6</ref>   <ref type="table">7</ref> Di&#8629;erent measurement results on Opioid overdose death data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Discussion.</head><p>Heterogeneous data integration for model improvement promotes several policy and intervention benefits. Research using emergency medical services data has shown that persons who experience repeat nonfatal drug overdoses have a significantly higher mortality rate as compared to individuals without repeat events <ref type="bibr">[26]</ref>. As our results suggest, toxicology data can be leveraged to model overdose di&#8629;usion across space and time, and di&#8629;usion varies across geographies. Taken together, integration of large-scale event data and overdose di&#8629;usion can sharpen policy interventions designed to reduce substance abuse and substance-related deaths. One such policy example is the deployment of nasal naloxone by police and EMS agencies which mitigates overdose e&#8629;ects <ref type="bibr">[5]</ref>. Integration of heterogeneous data sources also help to contextualize and better understand the nuances of how social harms may a&#8629;ect di&#8629;erent populations of people. As our study illustrates, prescription drug overdoses occur at higher rates in areas further from downtown Indianapolis, while illicit drug overdoses are more concentrated around the urban core of the city. These results underscore societal di&#8629;erences of opioid drug use. Consistent with community explanations of crime and social disadvantage <ref type="bibr">[28]</ref>, we observe that illicit drugs, which are more likely to result in mortality, may disproportionately impact minority communities. Current evidence indicates these trends are driven by heroin and synthetic opioid-related deaths as well as growing use of fentanyl-laced cocaine among African Americans <ref type="bibr">[1,</ref><ref type="bibr">10]</ref>. Moreover, these trends persist despite evidence that African Americans are less likely to be prescribed opioids for pain relative to Caucasians <ref type="bibr">[17]</ref>, which has been identified as a primary pathway to illicit opioid use <ref type="bibr">[15]</ref>. Together, current evidence suggests the epidemiology of opioid use, especially illicit opioid use, is not well-defined for racial-ethnic minorities. Heterogeneous data integration is likely the most appropriate path forward to improve our understanding of this issue.</p><p>Our work here is also related to the analysis of free text data that accompanies crime reports <ref type="bibr">[13,</ref><ref type="bibr">25,</ref><ref type="bibr">19]</ref> and other types of incidents, for example railway accidents <ref type="bibr">[7]</ref>. While the majority of point process focused studies of crime and social harm use only location, time, and incident category as input into the model, we believe future research e&#8629;orts on incorporating auxilliary, high-dimensional information into these models may yield improvements in model accuracy and also provide insight into the underlying causal mechanisms in space-time event contagion.</p><p>We do note that disentangling contagion patterns from other types of spatio-temporal clustering is challenging due to seasonal and exogeneous trends <ref type="bibr">[38,</ref><ref type="bibr">21]</ref>. Future work should also focus on investigating the extent to which drug overdose triggering found in the present study can be detected across cities and model specifications.  </p></div></body>
		</text>
</TEI>
