The primary goal of survival analysis is to predict the occurrence of a particular event of interest at future time points. Incomplete observations caused by time limitations or loss of data traces, known as censoring, bring unique challenges in this domain and differentiate survival analysis from standard regression methods. Popular survival analysis methods, such as the Cox proportional hazards model and parametric survival regression, rely on strict assumptions that are unrealistic in most real-world applications. To overcome the weaknesses of these two types of methods, in this paper we reformulate survival analysis as a multi-task learning problem and propose a new multi-task learning formulation that predicts the survival time by estimating the survival status at each time interval during the study duration. We propose an indicator matrix that enables the multi-task learning algorithm to handle censored instances, and we incorporate important characteristics of survival problems, such as the non-negative non-increasing list structure, into our model through max-heap projection. We employ the L2,1-norm penalty, which enables the model to learn a shared representation across related tasks and hence select important features and alleviate over-fitting in …
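The ingredients named in this abstract (per-interval survival status, an indicator for censored entries, the non-negative non-increasing structure, and the L2,1-norm) can be illustrated with a minimal Python sketch. The function names and the encoding convention below are assumptions made for illustration, not the authors' implementation; the max-heap projection step is not shown.

```python
import math

def encode_instance(time, event, n_intervals):
    """Encode one subject as (status, known) lists over intervals 1..n_intervals.

    Assumed convention (not taken from the abstract): with an observed event
    at interval `time`, status is 1 before `time` and 0 from `time` onward;
    with censoring at `time`, status is 1 up to `time` and unknown afterwards.
    """
    status, known = [], []
    for j in range(1, n_intervals + 1):
        if event:
            status.append(1 if j < time else 0)
            known.append(1)
        else:
            # After the censoring time the 0 is only a placeholder;
            # known = 0 marks the entry as unobserved.
            status.append(1 if j <= time else 0)
            known.append(1 if j <= time else 0)
    return status, known

def is_non_negative_non_increasing(values):
    """Check the structure a survival-status sequence should satisfy."""
    return all(v >= 0 for v in values) and all(
        a >= b for a, b in zip(values, values[1:])
    )

def l21_norm(matrix):
    """L2,1-norm: sum of the Euclidean norms of the rows of a weight matrix."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)
```

Under these assumptions, a loss would be evaluated only on entries where the indicator is 1, so a censored subject contributes no penalty for the intervals after its censoring time. Because the L2,1-norm sums row norms, penalizing it tends to push entire rows (features) to zero across all tasks at once.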
Tensor-based Temporal Multi-Task Survival Analysis
Survival analysis aims to predict the time to an event of interest, along with its probability, on longitudinal data. It is commonly used to make predictions for a single specific event of interest at a given time point. However, many applications need to predict the occurrence of multiple events simultaneously and dynamically. An intuitive solution is to apply a regular survival analysis method independently to each task at each time point. This often leads to a suboptimal solution, however, because the underlying dependencies between tasks are ignored, which motivates us to analyze these tasks jointly and select common features shared across all tasks. In this paper, we formulate a temporal Multi-Task learning framework (MTMT) using tensor representation. More specifically, given a survival dataset and a sequence of monitored time points, we model each task at each time point as a regular survival analysis problem and optimize them simultaneously. We evaluate the MTMT model on two real-world datasets and show that it outperforms several state-of-the-art models. We also provide the list of important features selected to demonstrate the interpretability of our model.
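One possible reading of "each task at each time point as a regular survival analysis problem" is sketched below in Python. It builds a nested list indexed by instance, task, and monitored time point, censoring each record administratively at the monitored time. The helper names `censor_at` and `build_tensor` and the censoring rule are assumptions for illustration and may differ from the authors' construction.

```python
def censor_at(time, event, tau):
    """Administratively censor one (time, event) record at monitored time tau.

    If the record ends at or before tau it is kept unchanged; otherwise the
    subject is treated as still event-free (censored) at tau.
    """
    if time <= tau:
        return (time, event)
    return (tau, False)

def build_tensor(records, time_points):
    """Build tensor[i][k][t] = (observed_time, event_flag).

    records[i][k] is the (time, event) pair for instance i and task k;
    time_points is the sequence of monitored time points.
    """
    return [
        [[censor_at(time, event, tau) for tau in time_points]
         for (time, event) in tasks]
        for tasks in records
    ]

# Two instances, two tasks (events), two monitored time points.
records = [[(4, True), (9, False)], [(7, True), (2, True)]]
tensor = build_tensor(records, time_points=[3, 6])
```

Each slice `tensor[i][k][t]` is then an ordinary right-censored survival record, which is what allows all task and time-point combinations to be optimized jointly rather than independently.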
- Publication Date:
- NSF-PAR ID: 10143376
- Journal Name: IEEE Transactions on Knowledge and Data Engineering
- Page Range or eLocation-ID: 1 to 1
- ISSN: 1041-4347
- Sponsoring Org: National Science Foundation
More Like this
-
A key assumption in multi-task learning is that at inference time the multi-task model has access only to a given data point, not to that data point's labels from other tasks. This presents an opportunity to extend multi-task learning to use a data point's labels from auxiliary tasks and thereby improve performance on the new task. Here we introduce a novel relational multi-task learning setting in which we leverage data point labels from auxiliary tasks to make more accurate predictions on the new task. We develop MetaLink, whose key innovation is a knowledge graph that connects data points and tasks and thus allows us to leverage labels from auxiliary tasks. The knowledge graph consists of two types of nodes: (1) data nodes, whose node features are data embeddings computed by the neural network, and (2) task nodes, whose node features are the last layer's weights for each task. The edges in this knowledge graph capture data-task relationships, and the edge label captures the label of a data point on a particular task. Under MetaLink, we reformulate the new task as a link label prediction problem between a data node and a task node. The MetaLink …
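The graph structure described in this abstract can be pictured with a small Python data structure. This is only a sketch: the class name `KnowledgeGraph`, its fields, and its methods are hypothetical, and the model that would predict the missing edge labels is not shown.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    # data id -> embedding (stand-in for the network's data embedding)
    data_nodes: dict = field(default_factory=dict)
    # task id -> weights (stand-in for the last layer's weights for that task)
    task_nodes: dict = field(default_factory=dict)
    # (data id, task id) -> label of that data point on that task
    edges: dict = field(default_factory=dict)

    def add_label(self, data_id, task_id, label):
        """Record a known label as a labeled data-task edge."""
        self.edges[(data_id, task_id)] = label

    def known_labels(self, data_id):
        """Labels of this data point on tasks where an edge already exists."""
        return {t: y for (d, t), y in self.edges.items() if d == data_id}

    def unlabeled_tasks(self, data_id):
        """Tasks with no edge to this data point: link label prediction targets."""
        return [t for t in self.task_nodes if (data_id, t) not in self.edges]

graph = KnowledgeGraph()
graph.data_nodes["x1"] = [0.1, 0.2]
graph.task_nodes["aux"] = [1.0, 0.0]
graph.task_nodes["new"] = [0.0, 1.0]
graph.add_label("x1", "aux", 1)
```

In this toy graph, predicting the label of `x1` on the task `new` corresponds to predicting the label of the missing edge `("x1", "new")`, with the existing `("x1", "aux")` edge available as auxiliary information.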
-
Urban dispersal events occur when an unexpectedly large number of people leave an area in a relatively short period of time. It is beneficial for city authorities, such as law enforcement and city management, to have advance knowledge of such events, as it can help them mitigate safety risks and handle challenges such as managing traffic. Predicting dispersal events is also beneficial to taxi drivers and ride-sharing services, as it helps them respond to unexpected demand and gain a competitive advantage. Large urban datasets, such as detailed trip records and point of interest (POI) data, make such predictions achievable. The related literature has mainly focused on taxi demand prediction: the pattern of demand was assumed to be repetitive, and the proposed methods aimed at capturing those patterns. However, dispersal events are, by definition, violations of those patterns and are, understandably, missed by the methods in the literature. We proposed a different approach in our prior work [32]. We showed that dispersal events can be predicted by learning the complex patterns of arrival and other features that precede them in time. We proposed a survival analysis formulation of this problem and proposed …
-
Obeid, Iyad Selesnick (Ed.) Electroencephalography (EEG) is a popular clinical monitoring tool used for diagnosing brain-related disorders such as epilepsy [1]. As monitoring EEGs in a critical-care setting is an expensive and tedious task, there is great interest in developing real-time EEG monitoring tools to improve patient care quality and efficiency [2]. However, clinicians require automatic seizure detection tools that provide decisions with at least 75% sensitivity and less than 1 false alarm (FA) per 24 hours [3]. Some commercial tools have recently claimed to reach such performance levels, including the Olympic Brainz Monitor [4] and Persyst 14 [5]. In this abstract, we describe our efforts to transform a high-performance offline seizure detection system [3] into a low-latency real-time or online seizure detection system. An overview of the system is shown in Figure 1. The main difference between an online and an offline system is that an online system should always be causal and have minimum latency, which is often defined by domain experts. The offline system, shown in Figure 2, uses two phases of deep learning models with postprocessing [3]. The channel-based long short-term memory (LSTM) model (Phase 1 or P1) processes linear frequency cepstral coefficients (LFCC) [6] features from each EEG …
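The requirement that an online system be causal can be illustrated with a generic smoothing example in Python. This is not the postprocessing used in the system described above; the two functions are hypothetical and only contrast a window that looks ahead with one that does not.

```python
def centered_average(x, half_width):
    """Offline smoothing: output i averages x[i-half_width .. i+half_width],
    so it depends on samples that arrive after time i."""
    out = []
    for i in range(len(x)):
        lo = max(0, i - half_width)
        hi = min(len(x), i + half_width + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def causal_average(x, width):
    """Online smoothing: output i averages x[i-width+1 .. i] only,
    so it can be computed as soon as sample i arrives."""
    out = []
    for i in range(len(x)):
        lo = max(0, i - width + 1)
        out.append(sum(x[lo:i + 1]) / (i + 1 - lo))
    return out

signal = [0, 0, 0, 3, 0, 0]
offline = centered_average(signal, half_width=1)
online = causal_average(signal, width=3)
```

With this input, the centered version already reacts at index 2, one sample before the spike at index 3, which is only possible with access to future data. The causal version reacts at index 3 at the earliest.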
-
Obeid, I.; Selesnik, I.; Picone, J. (Ed.) The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data [1]. This heterogeneous cluster uses innovative scheduling technology, Slurm [2], that manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer-grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus [2]. We use TensorFlow [3] as the core machine learning library for our deep learning systems, and routinely employ multiple GPUs to accelerate the training process. Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment: performance metrics such as error rates should be identical, and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) The same job run on the same processor should produce the same results each time it is run. (2) A job run on a CPU and GPU should produce …
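Two of the ideas in this abstract, identical results for repeated runs and floating-point results that only "match closely", can be shown with a short standard-library Python sketch. It does not use TensorFlow or Slurm, and the helper `seeded_draws` is a hypothetical name introduced for illustration.

```python
import math
import random

def seeded_draws(seed, n=3):
    """Draw n pseudo-random numbers from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Same seed, same code, same platform: the draws are identical.
run_a = seeded_draws(7)
run_b = seeded_draws(7)

# Floating-point addition is not associative, so summing the same numbers
# in a different order can change the last bits of the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
same_bits = left_to_right == right_to_left
close_enough = math.isclose(left_to_right, right_to_left)
```

Here `same_bits` is false while `close_enough` is true. Differences in summation order are one common reason why results computed on different hardware or in parallel may agree closely without being bit-identical.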