<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks</title></titleStmt>
			<publicationStmt>
				<publisher>Proceedings of Machine Learning Research</publisher>
				<date>08/11/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10514114</idno>
					<idno type="doi"></idno>
					<title level='j'>Proceedings of the 8th Machine Learning for Healthcare Conference</title>
<idno>2640-3498</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Xiaobin Shen</author><author>Jonathan Elmer</author><author>George H Chen</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Patients resuscitated from cardiac arrest who enter a coma are at high risk of death. Forecasting neurological outcomes of these patients (i.e., the task of neurological prognostication) could help with treatment decisions: which patients are likely to awaken from their coma and should be kept on life-sustaining therapies, and which are so ill that they would unlikely benefit from treatment? In this paper, we propose, to the best of our knowledge, the first dynamic framework for neurological prognostication of post-cardiac-arrest comatose patients using EEG data: our framework makes predictions for a patient over time as more EEG data become available, and different training patients’ available EEG time series could vary in length. Predictions themselves are phrased in terms of either time-to-event outcomes (time-to-awakening or time-to-death) or as the patient’s probability of awakening or of dying across multiple time horizons (e.g., within the next 24, 48, or 72 hours). Our framework is based on using any dynamic survival analysis model that supports competing risks in the form of estimating patient-level cumulative incidence functions. We consider three competing risks as to what happens first to a patient: awakening, being withdrawn from life-sustaining therapies (and thus deterministically dying), or dying (by other causes). For some patients, we do not know which of these happened first since they were still in a coma when data collection stopped (i.e., their outcome is censored). Competing risks models readily accommodate such patients. We demonstrate our framework by benchmarking three existing dynamic survival analysis models that support competing risks on a real dataset of 922 post-cardiac-arrest coma patients. Our main experimental findings are that: (1) the classical Fine and Gray model which only uses a patient’s static features and summary statistics from the patient’s latest hour’s worth of EEG data is highly competitive, achieving accuracy scores as high as the recently developed Dynamic-DeepHit model that uses substantially more of the patient’s EEG data; and (2) in an ablation study, we show that our choice of modeling three competing risks results in a model that is at least as accurate while learning more information than simpler models (using two competing risks or a standard survival analysis setup with no competing risks).]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Cardiac arrest is one of the leading causes of death and disability worldwide, resulting in approximately 300,000 to 450,000 deaths annually in the U.S. alone (NIH<ref type="foot">foot_0</ref> , 2022) and a 43% reported rate of suffering from cognitive impairment <ref type="bibr">(Byron-Alhassan et al., 2021)</ref>. In this paper, we specifically focus on cardiac arrest patients who enter a coma and are admitted to the ICU, where they are placed on life-sustaining therapies (e.g., mechanical ventilation, cardiac support devices). Here, forecasting the neurological outcome of patients (i.e., the task of neurological prognostication) is important: if physicians perceive a patient to have poor neurological prognosis, then they may discontinue life-sustaining therapies for the patient, which deterministically ends the patient's life. Withdrawal of life-sustaining therapies accounts for 48% of all nonsurvivors, with 31% occurring in medically unstable patients and 17% in medically stable patients <ref type="bibr">(Matthews et al., 2017)</ref>. This raises the possibility that some patients may have survived if different decisions had been made regarding their care. Neurological prognostication, therefore, is crucial in determining the appropriate treatment plan for each patient.</p><p>In recent years, a promising direction for neurological prognostication has been to take advantage of brain activity measurements using electroencephalography (EEG) <ref type="bibr">(Abend et al., 2010;</ref><ref type="bibr">Glass et al., 2013;</ref><ref type="bibr">Friberg et al., 2015)</ref>. A number of recent studies have demonstrated these EEG signals are predictive of patients' neurological outcomes (e.g., <ref type="bibr">Thenayan et al. 2010;</ref><ref type="bibr">Rittenberger et al. 2012;</ref><ref type="bibr">Rossetti et al. 2012;</ref><ref type="bibr">S&#248;holm et al. 2014;</ref><ref type="bibr">Oh et al. 2015;</ref><ref type="bibr">Elmer et al. 2016b;</ref><ref type="bibr">Westhall et al. 2016;</ref><ref type="bibr">Shekhar et al. 2023)</ref>. In this paper, we use both EEG data recorded over time and some patient characteristics collected upon hospital admission.</p><p>While prognostication is essential, how it has been modeled in existing literature has some major limitations. First, binary classification has been the most common approach for modeling prognostication for post-cardiac arrest coma patients, where the two classes are poor neurological recovery or death (taken to be the "positive" class) and favorable neurological recovery with certain levels of consciousness at hospital discharge (the "negative" class) (e.g., <ref type="bibr">Rossetti et al. 2016;</ref><ref type="bibr">Admiraal et al. 2021;</ref><ref type="bibr">Moseby-Knappe et al. 2020</ref>). The goal is to achieve a high true positive rate (TPR) with a false positive rate (FPR) close to 0 (see, for instance, the overview by <ref type="bibr">Geocadin et al. (2019)</ref>). However, the existence of patients for whom life-sustaining therapies were withdrawn complicates this binary classification setup: including these particular patients in training data is problematic because we do not know what their neurological outcomes would have been if they had been kept on life-sustaining therapies (i.e., we do not know which of the two classes they belong to). Some existing studies simply exclude such patients (e.g., <ref type="bibr">De-Arteaga et al. 2019)</ref>. However, by excluding these patients, we ignore potentially useful information: these patients likely have characteristics that led to physicians withdrawing them from life-sustaining therapies.</p><p>Another major limitation of existing work is ignoring the dynamic nature of both the EEG signals as well as physicians' decision-making and only focusing on using EEG data of a fixed period of time to make a prediction at a single point in time. For example, <ref type="bibr">De-Arteaga et al. (2019)</ref> only use EEG data between hours 34 to 36 after ICU admission to make a single prediction of the chance of favorable recovery for each patient; any patient with missing EEG data between hours 34 to 36 would be excluded, and EEG information outside of this time window would not be used by their prediction model. <ref type="bibr">Meanwhile, Admiraal et al. (2021)</ref> make a single prediction using EEG data 24 hours after cardiac arrest. In practice, physicians make decisions regarding treatment plans at different times after the patients have been admitted to the ICU (e.g., adjusting medications such as the dosage of vasopressors over time, or deciding whether to withdraw life-sustaining therapies), potentially using all information collected of the patient up until present time. To the best of our knowledge, no existing method has been developed for neurological prognostication of these coma patients that is truly dynamic, where the training patients' time series can vary in length, and where we make predictions at any point in time.</p><p>Our main contribution in this paper is to propose a framework for neurological prognostication of post-cardiac-arrest coma patients that addresses both of the above major limitations. In particular, our framework is dynamic and directly models specific outcomes of interest as competing risks in the sense that one outcome happening (e.g., withdrawal from life-sustaining therapies) prevents other outcomes from happening (e.g., awakening, dying of other causes). Moreover, our framework allows for outcomes to be censored, meaning that for some patients, we do not get to see their eventual outcome since they were still in a coma by the time data collection stopped. That they were still alive at the end of data collection could be due to specific patient characteristics that make them more likely to be alive rather than to have been pulled off life-sustaining therapies or to have died of other causes.</p><p>Our framework builds on a class of existing dynamic survival analysis models that support competing risks; we refer to this class of models as dynamic competing risks (DCR) models (we precisely define this class of models in Section 2). An example of such a model is Dynamic-DeepHit <ref type="bibr">(Lee et al., 2019)</ref>. Roughly, a DCR model predicts, for any test patient with measurements up to time t, the probability of the patient experiencing each of the different competing events at any time after time t.</p><p>Even though we learn a DCR model with three competing risks for the neurological prognostication problem, at prediction time, one of these competing risks is often not of primary interest: whether a patient will be withdrawn from life-sustaining therapies. Even if we did predict who would be withdrawn from life-sustaining therapies, we would effectively be predicting how past decisions were made by physicians and not what the true neurological outcomes of these patients would have been. Ideally, predictions of the latter should be what we use to assist with treatment decisions. For this reason, we show how to derive a binary classifier from the DCR model's predicted output that aims to predict whether a patient will awaken or die (of causes aside from withdrawal from life-sustaining therapies) within any user-specified time horizon (Section 3.1). Conceptually, whereas some existing work on neurological prognostication excluded patients who were withdrawn from life-sustaining therapy altogether from their analysis (e.g., <ref type="bibr">De-Arteaga et al. 2019)</ref>, we are including such patients when training a DCR model, and only when using our classifier, we condition on (in a probabilistic sense) the test patient not being withdrawn from life-sustaining therapies in the future. Especially as this classifier is meant to help with decisions such as whether a patient should be withdrawn from life-sustaining therapies, a reasonable assumption is to condition on this event not happening yet. This binary classifier that we derive from the DCR model can classify variable-length input time series using any user-specified time horizon without needing to re-train the DCR model. We further develop a patient-specific heat map visualization for the classifier that is straightforward to interpret (Section 3.2).</p><p>In experiments on real data (cohort selection and other dataset details are in Section 4), we benchmark three DCR models to compare how accurate they are, and we also conduct an ablation study to show why our choice of modeling the competing risks setting with three competing risks is better than using two or one instead (Section 5). Note that the one competing risk setting reduces to a dynamic survival analysis setup without competing risks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Generalizable Insights about Machine Learning in the Context of Healthcare</head><p>For neurological prognostication of post-cardiac-arrest coma patients using EEG data, our paper is, to the best of our knowledge, the first to consider a dynamic problem setup. We believe that we have framed this prognostication problem in a manner that is more useful for clinical decision support compared to how it has been framed in existing literature.</p><p>As our dynamic problem setup and accompanying evaluation metrics have not previously appeared in literature for our specific clinical application, our two main experimental findings are novel: (1) in benchmarking three DCR models, we find that the classical competing risks model by <ref type="bibr">Fine and Gray (1999)</ref> that only uses static patient features and the summary information from the last hour of a patient's EEG data is highly competitive, achieving accuracy scores as high as the recently developed Dynamic-DeepHit model <ref type="bibr">(Lee et al., 2019)</ref> that uses substantially more EEG data; and (2) our ablation study shows that our choice of modeling three competing risks results in a model that is at least as accurate while providing more information than a model that uses two competing risks or a dynamic survival analysis model without competing risks. These findings suggest that researchers working on the same clinical application may want to also consider modeling at least the three competing risks we consider (or even finer-grain versions of some of them, such as accounting for more causes of death), and also trying the classical Fine and Gray model as a baseline.</p><p>From a technical standpoint, our paper does not introduce a new model. Instead, our paper demonstrates how to effectively use any existing DCR model to address a specific clinical problem. The novelty is thus in the application of existing DCR models and also in our proposal of a classifier (derived from a DCR model) and an accompanying heat map visualization for this classifier. The crucial insight of the classifier that we develop is to condition on a particular event-an action taken by a physician that inevitably ends the patient's life-not happening in the future of the patient because the output of the classifier is meant to help with deciding on whether to take this action (where a reasonable assumption is that we have not taken the action as doing so has a permanent consequence). We suspect that this same idea would be relevant in various other clinical problems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head><p>Our paper builds on a specific class of existing dynamic survival analysis models, which we refer to as dynamic competing risks (DCR) models. We review this class of models in Section 2.1, where we also state the dynamic problem setting that our framework uses. We briefly give a concrete example of a DCR model (Dynamic-DeepHit by <ref type="bibr">Lee et al. (2019)</ref>) in Section 2.2. Note that throughout this paper, for any positive integer m, we frequently use the notation [m] := {1, 2, . . . , m}. We typically use uppercase variables to refer to random variables whereas lowercase variables refer to constants, realized values of random variables, or dummy indices (e.g., training data indices).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Dynamic Problem Setup and DCR Models</head><p>Training data We assume that we have a training dataset consisting of n patients. For each training patient i &#8712; [n], we observe a times series with a total of L i time steps, where at each time step, we observe d features. Specifically, we observe the feature vectors</p><p>i &#8712; R d is the i-th patient's feature vector at time step &#8467; &#8712; [L i ] (time steps are sorted chronologically, so time step L i is the last time step observed for training patient i). Moreover, time step &#8467; &#8712; [L i ] happens at a time that is recorded as a real number T (&#8467;) i &#8712; R, meaning that the amount of time that elapses between time steps &#8467; and &#8467; + 1 is</p><p>A common assumption is to set the initial time step's time to be T</p><p>(1) i = 0. Note that for the d features that are tracked over time, it is possible that some always stay the same (e.g., age upon hospital admission). For ease of exposition, we do not introduce additional notation that separates static from time-varying features although separating these two types of features could be done in practice.</p><p>As the above notation suggests, patients' time series can vary in length (e.g., one patient could have EEG data recorded every second for 6 hours, whereas another could have EEG data recorded every second for 12 hours). For the real data we consider later, the time series are regularly sampled (i.e., the amount of time that elapses between consecutive time steps is the same) but in general, DCR models can handle irregularly sampled time series.</p><p>In terms of ground truth information, for each training patient i &#8712; [n], we assume that we observe two quantities:</p><p>&#8226; (Event indicator) We observe which of k different competing events happened first to the i-th patient, or alternatively we could also observe that none of the competing events happened by the time training data collection stopped. This information is stored in the event indicator K i &#8712; {0, 1, 2, . . . , k}. For example, in the neurological prognostication problem, we have k = 3 and the competing events are awakening (K i = 1), dying of causes aside from withdrawal from life-sustaining therapies (K i = 2), and withdrawal from life-sustaining therapies (K i = 3). The special value of K i = 0 means that by the time data collection stopped, none of the competing events happened. &#8226; (Event time) We also observe the time Y i &#8712; R for when the first competing event happened or, if none of them happened, then Y i is the time when data collection stopped for the i-th patient (i.e., the time of "censoring"). Note that at any time step &#8467; &#8712; [L i ], the time until the first competing event or censoring happens is Y i -T</p><p>i . In summary, for training patient i &#8712; [n], we observe an event indicator K i &#8712; {0, 1, . . . , k}, an event time T i &#8712; R, and an input time series of feature vectors</p><p>As shorthand notation for referring to the entire observed time series, we write Z</p><p>We model the training data (Z 1 , K 1 , Y 1 ), . . . , (Z n , K n , Y n ) to be i.i.d. To formally state how each training point is generated, we define a few probability distributions: Q T denotes an underlying probability distribution over variable-length time series, Q E (Z) denotes an underlying conditional probability distribution over nonnegative time durations across all k competing events given a specific time series Z (i.e., a random sample from Q E (Z) yields a vector in R k where the j-th entry of the vector is a random time duration until competing event j &#8712; [k] happens), and Q C (Z) denotes the underlying conditional probability distribution over nonnegative time durations until censoring. Specifically, the i-th training point (Z i , K i , Y i ) is generated as follows:</p><p>1. We sample time series Z i (with L i time steps and last time T (L i ) i using our earlier notation) from Q T . 2. We sample the true nonnegative time durations (&#926; i,1 , &#926; i,2 , . . . , &#926; i,k ) from Q E (Z i ) (so that &#926; i,1 is the time until the 1st competing event happens, &#926; i,2 is the time until the 2nd competing event happens, etc). 3. We sample the true nonnegative time duration &#926; i,0 (time until censoring) from a conditional distribution Q C (Z i ). 4. We set K i := arg min j=0,1,...,k &#926; i,j , and</p><p>Note that the competing events are "exhaustive" in the sense that with probability 1, either one of them happens or censoring happens.</p><p>Prediction target We model a test patient's data using the same distributions that we introduced for training data. However, for the test patient, our goal will never be to predict whether the test patient is censored. In particular, we model a test patient with time series Z, event indicator K (always a value in {1, . . . , k}), and event time Y as follows:</p><p>1. We sample time series Z = (X (1) , . . . , X (L) ), (T (1) , . . . , T (L) ) (using notation similar to that of training data) from Q T . Note that Z has L time steps. 2. We sample the true nonnegative time durations (&#926; 1 , &#926; 2 , . . . , &#926; k ) from Q E (Z). 3. We set K := arg min j=1,...,k &#926; j , and Y := T (L) + &#926; K . Even though time series Z is generated so that it has L time steps, in how we set up the prediction task next, we do not observe all time steps immediately. Instead, we progressively observe more of Z over time, similar to what would happen in a real clinical context. In particular, we state our prediction task to depend on time t &#8712; R and use the random variable Z (&#8804;t) to denote time series Z limited to information up until time t. Specifically, we aim to predict the so-called cumulative incidence function (CIF) of event j &#8712; [k], which is the probability of event j happening within time duration &#8710; &#8805; 0 starting from time t &#8712; R, given a time series observed up until time t. Formally, we write the CIF as</p><p>where t is the time that we are making a prediction at, and z is any specific realization of random variable Z (again, the superscript " (&#8804;t) " restricts time to be up until t).</p><p>Note that we have intentionally stated the CIF in the dynamic setting with variablelength time series, where we can make predictions at different points in time. The classical version of the CIF <ref type="bibr">(Gray, 1988;</ref><ref type="bibr">Fine and Gray, 1999)</ref> is stated in the "static" setting without time series and can be viewed as a special case of the dynamic setup we have described, where all time series sampled from Q T have exactly one time step, the recorded time of this first time step is always just taken to be 0, and we only ever evaluate equation (1) at t = 0. Separately, if the number of competing risks is equal to k = 1, then the entire setup we have described would instead be for dynamic survival analysis without competing risks. In fact, one could show that the static setting with one competing risk simply reduces to the classical right-censored survival analysis setup (e.g., see the random censoring setup described in Section 3.2 of the textbook by <ref type="bibr">Kalbfleisch and Prentice (2002)</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dynamic competing risks (DCR) models</head><p>The class of DCR models that our framework for neurological prognostication builds on is any model that can predict CIFs as given in equation (1). For example, Dynamic-DeepHit <ref type="bibr">(Lee et al., 2019)</ref> and SurvLatent ODE <ref type="bibr">(Moon et al., 2022)</ref> are DCR models. Note that any classical competing risks model (e.g., Fine and Gray 1999) that does not actually handle variable-length time series could be made into a DCR model in a simple manner: simply only use the last time step's feature vector to predict. Some existing dynamic survival analysis models (such as DDRSA (Venkata and Bhattacharyya, 2022)) that were not originally developed to support competing events could be modified to estimate CIFs as well. To give a sense of how a DCR model works, we review Dynamic-DeepHit next. Note that we specifically review a DCR model that directly models variable-length time series without manual feature engineering or resorting to, for instance, only ever using the last time step of an input time series.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Example of a DCR Model: Dynamic-DeepHit</head><p>We provide an overview of Dynamic-DeepHit <ref type="bibr">(Lee et al., 2019)</ref>, deferring details to the original paper.<ref type="foot">foot_1</ref> Importantly, Dynamic-DeepHit discretizes possible values for time duration &#8710; in equation ( <ref type="formula">1</ref></p><p>We assume that &#8710; 1 &gt; 0 and that &#8710; m is an upper bound on possible durations encountered across all events j &#8712; [k] (i.e., all k events happen within time duration &#8710; m ). In what follows, we regularly use the variables u, v &#8712; [m] to denote indices of the discretized values of &#8710;. Dynamic-DeepHit estimates a probability mass function variant of the CIF for event j &#8712; [k] given by</p><p>where we define &#8710; 0 := 0 to handle the case when we plug in u = 1. Before explaining why the above function behaves like a probability mass function, we point out that one can readily verify that the CIF for event j from equation (1) satisfies the equality</p><p>Thus, so long as we can estimate O j (&#8710; u | z, t) in equation ( <ref type="formula">2</ref>), then we obtain an estimate of the CIF in equation ( <ref type="formula">1</ref>) albeit only along a discrete time grid &#8710; &#8712; {&#8710; 1 , . . . , &#8710; m }.</p><p>As for why O j (&#8710; u | z, t) behaves like a probability mass function, note that</p><p>where we have used equation ( <ref type="formula">3</ref>) and the assumption that &#8710; m is chosen as an upper bound on possible durations across all k competing events. With this motivation, Dynamic-DeepHit models O j (&#8710; u | z, t) in equation ( <ref type="formula">2</ref>) using a neural network. For the i-th training time series Z</p><p>where O i,j,u is shown on the right side of Figure <ref type="figure">1</ref>; we collect all the O i,j,u values specific to the i-th patient in the vector O i &#8712; R k&#8226;m . We compute O i from Z i as follows:</p><p>1. We first feed the input time series Z i into a user-specified RNN (with p output features per time step, where the choice of p is up to the user), where we slightly transform what the input looks like per time step. Specifically at time step &#8467; &#8712; [L i ], the input to the RNN is taken to be (X</p><p>i ), i.e., we provide both a feature vector and a time duration to get to the next time step. However, the last time step's RNN input is taken to be (X</p><p>This first step is shown on the left side of Figure <ref type="figure">1</ref>. 2. The next step is to summarize the variable-length time series (H</p><p>and f attention is a user-specified feed-forward neural network, such as a multilayer perceptron (MLP), that outputs a single real number. This second step is shown in the middle of Figure <ref type="figure">1</ref>. 3. We use vector C i as an input to k different MLPs (one per competing risk) that each outputs m numbers, and the overall output is passed through a softmax layer to produce the final output O i (the softmax enforces the constraint in equation ( <ref type="formula">4</ref>)).</p><p>Note that this last step corresponds to the original DeepHit model <ref type="bibr">(Lee et al., 2018)</ref> that is meant for handling input data that are fixed-length feature vectors rather than variable-length time series. This third step is shown on the right side of Figure <ref type="figure">1</ref>. There is one last neural network component that is not shown in Figure <ref type="figure">1</ref> as it is not used to compute the output vector O i : Dynamic-DeepHit also requires that at time step &#8467; &#8712; [L i ], the RNN on the left side of Figure <ref type="figure">1</ref> can output an estimate X (&#8467;+1) i of the next time step's feature vector X (&#8467;+1) i</p><p>. There are different ways to achieve this. For example, at time step</p><p>(along with the time duration to get to the next time step) into a user-specified MLP f next-time-step with d output features to produce the estimate X (&#8467;+1) i . Alternatively, for the RNN in Figure <ref type="figure">1</ref>, we could choose it to be a type of RNN that already distinguishes between hidden state vectors and output state vectors (e.g., LSTMs <ref type="bibr">(Hochreiter and Schmidhuber, 1997)</ref>), in which case we let the hidden state vectors be what we denoted as the H (&#8467;) i variables, and we use the output state vectors to predict the next steps' feature vectors (the output state vector would need to consist of d entries).</p><p>Training The final loss used to train a Dynamic-DeepHit model is the sum of three terms, two of which make up the original DeepHit loss (a negative log likelihood term and a ranking loss term), and the last term asks that each next feature vector estimate X (&#8467;+1) i is close to X (&#8467;+1) i (using, for instance, squared Euclidean distance).</p><p>Prediction At test time, we could feed any observed test time series (of arbitrarily nonzero length) as input to the neural network in Figure <ref type="figure">1</ref> to produce an estimate of the probability &lt; l a t e x i t s h a 1 _ b a s e 6 4 = " 4 3 s t</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RNN Temporal attention</head><p>&lt; l a t e x i t s h a 1 _ b a s e 6 4 = " K R K + p r P D 6 j R 8 5 8 K z h n s 2 / S 9 e N R M = " &gt;</p><p>MLP for event 2 MLP for event &lt; l a t e x i t s h a 1 _ b a s e 6 4 = " R d 0 K F e i U D D f r Z / S 4 T E t v N Y w T x Z I = " &gt; A A A B 6 H i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B a h p 5 K I X 8 e C F 4 8 t 2 F p o Q 9 l s J + 3 a z S b s b o Q S + g u 8 e F D E q z / J m / / G b Z u D t j 4 Y e L w 3 w 8 y 8 I B F c G 9 f 9 d g p r 6 x u b W 8 X t 0 s 7 u 3   Each MLP has &lt; l a t e x i t s h a 1 _ b a s e 6 4 = " m a + w D R + U u N x 6 n m p c I 6 w h T j I 5 N h M = " &gt; A A A B 6 H i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B a h p 5 K I X 8 e C F 4 8 t 2 F p o Q 9 l s J + 3 a 3 S T s b o Q S + g u 8 e F D E q z / J m / / G b Z u D t j 4 Y e L w 3 w 8 y 8 I B F c G 9 f 9 d g p r 6 x u b W 8 X t 0 s 7 u 3</p><p>&lt; l a t e x i t s h a 1 _ b a s e 6 4 = " m a + w D R + U u N x 6 n m p c I 6 w h T j I 5 N h M = " &gt; A A A B 6 H i c b V B N S 8 N A E J 3 U r 1 q / q h 6 9 L B a h p 5 K I X 8 e C F 4 8 t 2 F p o Q 9 l s J + 3 a 3 S T s b o Q S + g u 8 e F D E q z / J m / / G b Z u D t j 4 Y e L w 3 w 8 y 8 I B F c G 9 f 9 d g p r 6 x u b W 8 X t 0 s 7 u 3    mass function of the CIF in equation ( <ref type="formula">2</ref>) that we can then use to estimate the CIF with using equation ( <ref type="formula">3</ref>). We could trivially accommodate the setting where we see more of a test time series over time since the neural network accepts a variable-length input time series.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Framework for Neurological Prognostication</head><p>Any DCR model could be applied to the problem of neurological prognostication for postcardiac-arrest coma patients. As stated in Section 2, we can take the number of competing risks to be k = 3 corresponding to awakening (K = 1), dying (not of withdrawal from life-sustaining therapies) (K = 2), or withdrawal from life-sustaining therapies (and thus dying as a result) (K = 3). A key goal of our framework is to help clinicians interpret the information contained in the CIFs (equation ( <ref type="formula">1</ref>)) predicted by a DCR model. Using the three competing risks stated above, we derive a binary probabilistic classifier from an already trained DCR model (Section 3.1). The resulting classifier can then be used to produce a patient-specific prediction heat map visualization that aims to be straightforward for a clinician to interpret (Section 3.2). Separately, standard binary classification evaluation metrics could be used for the derived classifier, which supplement survival analysis evaluation metrics that already exist for DCR models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">A Derived Binary Classifier</head><p>As discussed in Section 1, predicting whether a patient will be withdrawn from life-sustaining therapies is often not of primary interest since this is a human-made decision that deterministically ends a patient's life, and we do not actually know for sure whether the patient would have instead awakened or died of other causes in the ICU. For any test patient's time series up to time t, we now derive a binary probabilistic classifier that conditions on the event that the test patient is never withdrawn from life-sustaining therapies (we refer to this event as the "non-withdrawal event"). As we aim to develop a decision support tool to help physicians decide on whether to withdraw life-sustaining therapies, a reasonable assumption is that the patients are kept on these therapies, especially since withdrawal of these therapies has a permanent effect (in Section 6, we comment on whether this conditioning makes sense).</p><p>After conditioning on the non-withdrawal event, we then compute the probabilities of the remaining two competing events happening within a time duration &#8710; &gt; 0. This conditional probability can be computed as follows, reusing notation from Section 2:</p><p>awaken or death (not by withdrawal from life-sustaining therapies)</p><p>.</p><p>(5)</p><p>Thus, if we have trained a DCR model that has an estimate F j (&#8710; | z, t) for each CIF F j (&#8710; | z, t), then we can directly plug in these CIF estimates into the right-hand side above to yield the estimated conditional probability</p><p>We could similarly estimate the probability for death (not by withdrawal from life-sustaining therapies) by</p><p>We thus have a binary probabilistic classifier between the two classes "awaken" and "death (not by withdrawal from life-sustaining therapies)" defined for a specific time t and time duration &#8710;: if the ratio P death (not withdrawal) (&#8710;|z,t)</p><p>is below a threshold value of 1, then we predict "awaken". The threshold of 1 could of course be tuned (e.g., to achieve some desired tradeoff between TPR and FPR on some validation set). Note that it only makes sense to plug in times t that are at least the earliest time encountered in time series z.</p><p>Importantly, the binary classifier we just described was derived using estimated CIFs. Existing DCR models like Dynamic-DeepHit estimate CIFs in a manner that would include patients of all three competing risks as well as those who were censored. In particular, we do not have to, for example, exclude patients who are censored or who were withdrawn from life-sustaining therapies from the analysis. Moreover, this classifier can be constructed for any time t after the earliest time in the observed test time series Z = z and for any choice of user-specified time duration &#8710; &gt; 0, without any re-training of the underlying DCR model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Technical remark</head><p>The estimated probabilities P awaken (&#8710; | z, t) in equation ( <ref type="formula">6</ref>) and P death (not withdrawal) (&#8710; | z, t) in equation ( <ref type="formula">7</ref>) do not, in general, sum to 1. This is intentional. Again, each probability is derived based on equation ( <ref type="formula">5</ref>), where the final denominator is the probability that a patient with time series z up to time t experiences an eventual outcome that is either K = 1 or K = 2. From an interpretation standpoint, an appealing aspect of how P awaken (&#8710; | z, t) (and similarly P death (not withdrawal) (&#8710; | z, t)) is defined is as follows. For a fixed time t, consider two time durations &#8710; and &#8710; &#8242; , where &#8710; &lt; &#8710; &#8242; (for example, &#8710; = 24 hours and &#8710; &#8242; = 48 hours). Then intuitively it makes sense that the probability of someone awakening within time duration &#8710; &#8242; should be larger than the probability of someone awakening within the time duration &#8710;, since &#8710; &#8242; &gt; &#8710;. In other words, we would like P awaken (&#8710; | z, t) &#8804; P awaken (&#8710; &#8242; | z, t). This property indeed holds for how we have defined equations (6) (and a similar result holds for P death (not withdrawal) ). This property would not be guaranteed to hold if instead we had changed the denominators of equations ( <ref type="formula">6</ref>) and ( <ref type="formula">7</ref>) to F 1 (&#8710; | z, t) + F 2 (&#8710; | z, t), which would ensure that the probabilities sum to 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Patient-Specific Heat Map Visualization</head><p>We propose a heat map visualization specific to any patient that shows how the predicted probability of awakening ( P awaken (&#8710; | z, t) in equation ( <ref type="formula">6</ref>)) changes for the patient as we observe more of the patient's time series, as shown in the fourth column plot of each row in Figure <ref type="figure">2</ref>. The experimental setup that led to this figure is explained in more detail in Section 5. For now, focusing on any one of the fourth-column heat maps of Figure <ref type="figure">2</ref>, we have the horizontal axis correspond to the prediction time t while the vertical axis is the time duration &#8710;. We specifically choose the &#8710;'s to be equivalent of the next 24 to 72 hours, which is of interest according to clinician feedback we have received. Using Patient 1 in Figure <ref type="figure">2</ref> as an example, we observe a drastic decrease in the probability of awakening starting at t = 9 across all &#8710; values evaluated.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Cohort</head><p>We examine proprietary hospital data collected in a single medical center from 2010 to 2019 (the specific medical center has been blinded for reviewing purposes). The dataset includes patients who suffered sudden cardiac arrest, were successfully resuscitated, and survived to hospital admission at the medical center. EEG is initiated and monitored for these patients continuously as a routine standard of care for several days after cardiac arrest. As stated in the previous sections, we focus on three outcomes: awakening from the coma, dying (not from withdrawal of life-sustaining therapies, such as from brain death or rearrest) and withdrawal from life-sustaining therapies (due to perceived poor neurological prognosis, leading to death). We remark that there is a fourth outcome that is possible but that we exclude from analysis: patients could be withdrawn from life-sustaining therapies for non-neurological reasons such as a do-not-resuscitate order. As our focus is on neurological prognostication, we ignore this outcome in which a decision is made disregarding neurological prognosis. Note that in this study, we only care about the first occurrence of the event that ceases the coma status of a patient, i.e. if a patient awakened at some point and then still died shortly afterward, we would only consider awakening as the outcome of the patient.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Dataset characteristics</head><p>After excluding patients whose cause of death is withdrawal for non-neurological reasons, we have a dataset consisting of 922 patients. For the 922 patients, summary statistics of their characteristics and the time-to-event are shown in an appendix (Table <ref type="table">A</ref>.1). Out of the 922 patients, 271 (29.4%) of them awakened at some point, 189 (20.5%) of them died (not from withdrawal from life-sustaining therapies), 432 (49.6%) of them died from withdrawal of life-sustaining therapies due to poor perceived neurological prognosis, and 30 (3.3%) of them were still in a coma when data collection stopped. single value (i.e., we first downsample the data so that each time step corresponds to one minute). ( <ref type="formula">2</ref>) We then further downsample the minute-resolution EEG signals so that each time step corresponds to an hour, and for each of the 12 EEG features, we take 6 summary statistics (minimum, maximum, mean, 25% percentile, median, and 75% percentile) as the final features to be used (i.e., for each time step corresponding to 1 hour, we end up with a total of 12 &#215; 6 = 72 summary features). We remark that the data after downsampling still captures most of the variation from the raw data and does not have much impact on prediction accuracy. Downsampling is mainly to reduce computation time.</p><p>Note that in how we curated the data, we only use at most 12 hours of EEG data per patient (so that if a patient has more than 12 hours worth of EEG data, we ignore the EEG data after 12 hours). This is a limitation of the dataset curation process that could be changed in future work. From a modeling perspective, DCR models could in principal use arbitrarily long EEG time series, subject to hardware memory constraints in practice.</p><p>Static features Upon hospital admission, a number of features are collected for the patient that we treat as static features, such as demographic information (e.g. age, gender), characteristics of the patient's initial cardiac arrest collapse (e.g., arrest location, initial arrest rhythm, category of the cardiac arrest), initial coma status, and medical history. A full list of the 43 static features we use can be found in Appendix A.</p><p>In summary, accounting for both the time-varying EEG features and the static features, we use a total of d = 72 + 43 = 115 final features per time step when we train a DCR model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experiments</head><p>In this section, we run experiments on the dataset described in the previous section using three DCR models: a classical competing risks model by <ref type="bibr">Fine and Gray (1999)</ref> using only the last observed time step of each input time series (details are in Appendix B), Dynamic-DeepHit <ref type="bibr">(Lee et al., 2019)</ref> and DDRSA <ref type="bibr">(Venkata and Bhattacharyya, 2022)</ref>. Also, note that the original DDRSA model by <ref type="bibr">Venkata and Bhattacharyya (2022)</ref> does not support competing risks and we modify its network structure to accommodate competing risks (details are in Appendix C). For simplicity, we refer to the competing-risk-adapted version of DDRSA as DDRSA despite the modification we make. We specifically aim to:</p><p>&#8226; (Section 5.1) examine how accurate these models using the standard survival analysis metric of concordance index (abbreviated c-index; this is a value between 0 and 1 where higher is better) <ref type="bibr">(Harrell et al., 1982)</ref> as well as AUROC of binary classifiers derived using our approach in Section 3.1, &#8226; (Section 5.2) provide examples of patient-specific visualizations (time series of two summary EEG signals, estimated CIFs, and the heat map visualization we described in Section 3.2) &#8226; (Section 5.3) show that using a setup with fewer than the three competing risks we consider result in models that are not as good.</p><p>Experimental setup We repeat the following basic experiment five times with different random training/validation/test splits of the data. For each experimental repeat, we randomly select 80% of the 922 patients to be in the training set and the remaining 20% in the testing set. For Dynamic-DeepHit and DDRSA, within the training set, a random Binary classification accuracy metric By using the binary classifier derived from a DCR model as described in Section 3.1, we can use binary classification evaluation metrics such as the area under the ROC curve (AUROC). After restricting the test set cohort to patients that we either observe awakening or death (not from withdrawal from life-sustaining therapies), we can compute the AUROC scores shown in Table <ref type="table">2</ref>. Here, note that the Fine summary EEG signals for four example patients, the estimated CIFs, and the heat map visualization we described in Section 3.2; each row/panel in the figure corresponds to one patient. Each entry in the heatmap is the estimated conditional probability of awakening occurring at different times (t = 1, 2, . . . , 12) for the different durations <ref type="bibr">(&#8710; = 24, 48, 72)</ref>. For the four patients shown:</p><p>&#8226; (Figure <ref type="figure">2</ref>(a)) Patient 1 died at hour 17 (not from withdrawal of life-sustaining therapies).</p><p>We observe a drastic change in the patient's EEG signals before and after t = 6, which is reflected in the two sets of estimated CIFs at t = 6 and t = 12 with a higher estimated CIF of the "awakening" event given the first 6 hours of EEG data but a lower "awakening" CIF curve after the first 12 hours of EEG. briefly describe some common patterns considered "good" or "bad" in the caption of Figure <ref type="figure">2</ref>). In the heat map visualization, the predicted probability of awakening is high at all times. &#8226; (Figure <ref type="figure">2(d)</ref>) Patient 4 died (not from withdrawal of life-sustaining therapies) at hour 71, and we did observe very poor EEG signals over the entire 12-hour period. In the heat map, the predicted probability of awakening is low at all entries. Note that the estimated CIF for withdrawal from life-sustaining therapies could be viewed as the model's prediction of how likely a physician (at least according to historical data) would make a decision to withdraw said therapies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Ablation Study</head><p>We conduct an ablation study to show why including the three competing risks in how we framed the problem is better than had we used fewer competing risks. In particular, we repeat the same experiments as above but with only two competing risks (awakening and dying not by withdrawal from life-sustaining therapies), and a single event (awakening). For the purpose of this ablation study, we only focus on the Fine and Gray model and Dynamic-DeepHit as they achieved noticeably higher accuracy than DDRSA in our earlier experiments.</p><p>In the case when we only consider two competing risks, a patient with the outcome label of withdrawal from life-sustaining therapies would be viewed as being censored (along with those who stayed in a coma). Similarly, when we only model a single competing risk, patients with all other outcomes are viewed as being censored. The resulting c-indices are shown for the two-competing-risk case in Table <ref type="table">3</ref> and for the one competing risk case in Table <ref type="table">4</ref>.</p><p>By comparing Tables <ref type="table">1</ref> and<ref type="table">3</ref>, we see that c-indices for death (not by withdrawal of life-sustaining therapies) stay at a similar level for the Fine and Gray model considering the standard deviation interval when we model only two competing risks, while those of Dynamic-DeepHit are clearly higher when we model all three competing risks. While we do not observe a gain in the c-indices for the event of awakening when we move from the single risk setting to those of two or three competing risks, by incorporating more events, we are able to capture more information without decreasing the model's accuracy. For example, if we only trained the single competing risk model and if the patient is not likely to awaken from the coma, the model would not help us distinguish between any of the other possible outcomes of the patient. We also derive the binary classifier under two competing risks. The resulting AUROC scores are shown in Table <ref type="table">5</ref> and ROC plots at t = 6, 12, &#8710; = 24 in Figure <ref type="figure">4</ref>. By comparing Tables <ref type="table">2</ref> and<ref type="table">5</ref>, average AUROCs drop for both the Fine and Gray model and Dynamic-DeepHit when we change from three events to two events, showing that there is a benefit to including the event of withdrawal from life-sustaining therapies. By comparing Figures <ref type="figure">3</ref> and<ref type="figure">4</ref>, focusing on the low FPR regime, we observe the TPRs for Dynamic-DeepHit stay at a similar level under the two event setting while those for the Fine and Gray model become very unstable, again, suggesting the benefit of including the event of withdrawal. The ROC curves under the two event setting for a few other t and &#8710; values can be found in Appendix G. non-withdrawal event vs after conditioning on the event, it is unclear whether the probability of awakening should increase or decrease.</p><p>Fundamentally, two technical hurdles make reasoning about the non-withdrawal event difficult. First, we do not know what would have happened to patients withdrawn from life-sustaining therapies had they been kept on these therapies instead. Second, we do not know how "good" past decisions on withdrawing life-sustaining therapies are. As an extreme example, suppose that 100% of patients who die from non-withdrawal causes are perfectly identified by physicians in advance that they would have no chance of awakening so they are pulled off life-sustaining therapies, and 100% of patients who would awaken are also perfectly identified by physicians so that they are kept on life-sustaining therapies. In this case, if we condition on the non-withdrawal event, then we would always just get that the conditional probability of awakening is 100%, so our binary classifier in Section 3.1 would not be useful. However, an alternative that would still be useful is to compute the probabilities of awakening and of dying (not of withdrawal from life-sustaining therapies) without conditioning on the non-withdrawal event. These probabilities would still be conditional probabilities since we would condition on the test patient's time series. However, we no longer condition on the non-withdrawal event. We now sketch an approach for computing these probabilities, which in turn leads to a binary classifier different from the one we derived in Section 3.1.</p><p>To begin with, let's consider the probability of awakening within time duration &#8710; for a patient with features observed until time t. We write this probability as P awaken (&#8710; | z, t) = P(earliest competing event that happens excluding withdrawal from life-sustaining therapies is awakening, Y &#8804; t + &#8710;|Z (&#8804;t) = z (&#8804;t) , Y &gt; t) = P(earliest competing event that happens is awakening, Y &#8804; t + &#8710;|Z (&#8804;t) = z (&#8804;t) , Y &gt; t) + P(earliest competing event that happens is withdrawal from life-sustaining therapies followed by awakening, Y &#8804; t + &#8710;|Z (&#8804;t) = z (&#8804;t) , Y &gt; t).</p><p>On the right-hand side above, the first term is just the CIF for awakening (i.e., F 1 (&#8710; | z, t) using equation ( <ref type="formula">1</ref>) and the same notation as in Section 3.1). As for the second term, we now make a major assumption that P(earliest competing event that happens is withdrawal from life-sustaining therapies followed by awakening, Y &#8804; t + &#8710;|Z (&#8804;t) = z (&#8804;t) , Y &gt; t)</p><p>where &#945; &#8712; [0, 1] is a constant that does not depend on anything else, and as a reminder F 3 is the CIF for withdrawal from life-sustaining therapies. The above assumption says that among patients whose earliest competing event is being withdrawn from life-sustaining therapies, each of them has probability &#945; (independent of everything else) of having their second earliest competing event be awakening (so that had withdrawal from life-sustaining therapies not been an option, they would have awaken). In practice, we have no way of estimating &#945; but we can try different values for it. Putting together the pieces, we have</p><p>experience unfavorable outcomes after awakening, and it could be important to try to predict when this happens. One simple approach would be to split up the "awaken" event into different types of "awaken" events, although perhaps a better modeling framework would be to consider multiple critical events that could happen, one after another, rather than constraining the setting to only be on first hitting times.</p><p>As for the specific clinical problem we have focused on, there were a number of limitations related to the dataset we curated. First, the EEG data we have is limited to the first 12 hours after ICU admission. If we were to have more complete EEG data, we would potentially be able to provide more accurate prognoses. Second, for ease of computation, we downsample the EEG time series data. From some preliminary analyses, we found that this did not impact the resulting prediction accuracy much. However, more thorough experiments are needed to better understand the impact of our current downsampling procedure on information loss. We remark that our patient heat map visualization can help us also find when our DCR-derived classifier is actually wrong but it is very confident in its wrong answer (we provide some examples of this in Appendix F). In some such cases, from physician feedback, we have learned that the prognostication task even for physicians could be difficult given only EEG data and the static features we use. In particular, by collecting and using additional patient features (e.g., vitals, whether the patient experienced another cardiac arrest), more accurate predictions should be possible.</p><p>Ultimately, although we believe that our paper takes a step toward more realistically framing the problem of neurological prognostication of post-cardiac-arrest coma patients compared to existing literature, our work has not yet led to a "deployment-ready" solution. Moreover, at this point, it is unclear to what extent the patient-specific heat map visualization we proposed actually helps clinical decision support. By continuing to account for physician feedback, we hope that we can produce a decision support system that is practically useful. User studies with clinicians would be required to assess the impact of such a system on clinical decision making. </p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>https://www.nhlbi.nih.gov/health/cardiac-arrest</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>Note that<ref type="bibr">Lee et al. (2019)</ref> explicitly keep track of a separate vector per time step indicating which of the d features are missing. Instead of introducing notation for such a "missingness" boolean vector, we can augment our original feature vector to include such missingness indicator variables.</p></note>
		</body>
		</text>
</TEI>
