Abstract: We propose a constrained maximum partial likelihood estimator for dimension reduction in integrative (e.g., pan-cancer) survival analysis with high-dimensional predictors. We assume that for each population in the study, the hazard function follows a distinct Cox proportional hazards model. To borrow information across populations, we assume that each of the hazard functions depends only on a small number of linear combinations of the predictors (i.e., “factors”). We estimate these linear combinations using an algorithm based on “distance-to-set” penalties. This allows us to impose both low-rankness and sparsity on the regression coefficient matrix estimator. We derive asymptotic results showing that our estimator is more efficient than fitting a separate proportional hazards model for each population. Numerical experiments suggest that our method outperforms competitors under various data generating models. We use our method to perform a pan-cancer survival analysis relating protein expression to survival across 18 distinct cancer types. Our approach identifies six linear combinations, depending on only 20 proteins, which explain survival across the cancer types. Finally, to validate our fitted model, we show that our estimated factors can lead to better prediction than competitors on four external datasets.
Survival dynamical systems: individual-level survival analysis from population-level epidemic models
In this paper, we show that solutions to ordinary differential equations describing the large-population limits of Markovian stochastic epidemic models can be interpreted as survival or cumulative hazard functions when analysing data on individuals sampled from the population. We refer to the individual-level survival and hazard functions derived from population-level equations as a survival dynamical system (SDS). To illustrate how population-level dynamics imply probability laws for individual-level infection and recovery times that can be used for statistical inference, we show numerical examples based on synthetic data. In these examples, we show that an SDS analysis compares favourably with a complete-data maximum-likelihood analysis. Finally, we use the SDS approach to analyse data from a 2009 influenza A(H1N1) outbreak at Washington State University.
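As a rough illustration of the idea, the sketch below assumes a standard SIR model written in proportions, with s(0) = 1 and a small initial infected fraction rho. Under that assumption, the susceptible curve s(t) can be read as the survival function of an initially susceptible individual's infection time, beta * i(t) as the corresponding hazard, and -log s(t) as the cumulative hazard. The model choice, the parameter values, and all function names here are illustrative assumptions, not taken from the paper.

```python
import math


def sir_rhs(s, i, beta, gamma):
    """Right-hand side of the SIR equations in proportions."""
    return -beta * s * i, beta * s * i - gamma * i


def solve_sir(beta, gamma, rho, t_max, dt=0.01):
    """Integrate the SIR equations with classical RK4, starting at s=1, i=rho."""
    n_steps = int(round(t_max / dt))
    t, s, i = 0.0, 1.0, rho
    times, s_path, i_path = [t], [s], [i]
    for _ in range(n_steps):
        k1s, k1i = sir_rhs(s, i, beta, gamma)
        k2s, k2i = sir_rhs(s + 0.5 * dt * k1s, i + 0.5 * dt * k1i, beta, gamma)
        k3s, k3i = sir_rhs(s + 0.5 * dt * k2s, i + 0.5 * dt * k2i, beta, gamma)
        k4s, k4i = sir_rhs(s + dt * k3s, i + dt * k3i, beta, gamma)
        s += dt * (k1s + 2 * k2s + 2 * k3s + k4s) / 6
        i += dt * (k1i + 2 * k2i + 2 * k3i + k4i) / 6
        t += dt
        times.append(t)
        s_path.append(s)
        i_path.append(i)
    return times, s_path, i_path


def survival_and_hazard(beta, gamma, rho, t_max, dt=0.01):
    """Read the ODE solution as individual-level survival quantities."""
    times, s_path, i_path = solve_sir(beta, gamma, rho, t_max, dt)
    survival = list(s_path)                      # S(t) = s(t), because s(0) = 1
    hazard = [beta * i for i in i_path]          # h(t) = beta * i(t)
    cum_hazard = [-math.log(s) for s in s_path]  # H(t) = -log S(t)
    return times, survival, hazard, cum_hazard


# Hypothetical parameter values, chosen only for illustration.
times, survival, hazard, cum_hazard = survival_and_hazard(
    beta=2.0, gamma=1.0, rho=0.01, t_max=10.0
)
```

Because ds/dt = -beta * s * i, the cumulative hazard -log s(t) equals the integral of beta * i(t), which is what lets a population-level curve act as a probability law for a sampled individual.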
- Award ID(s):
- 1853587
- PAR ID:
- 10183970
- Date Published:
- Journal Name:
- Interface Focus
- Volume:
- 10
- Issue:
- 1
- ISSN:
- 2042-8898
- Page Range / eLocation ID:
- 20190048
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We propose a survival analysis approach for discovering and characterizing user behavior and risks for lending protocols in decentralized finance (DeFi). We demonstrate how to gather and prepare DeFi transaction data for survival analysis. We illustrate our approach using transactions in Aave, one of the largest lending protocols. We develop a DeFi survival analysis pipeline that first prepares transaction data for survival analysis through the selection of different index events (or transactions) and associated outcome events. Then we apply survival analysis statistical and visualization methods modified for competing risks when appropriate, such as Kaplan–Meier survival curves, cumulative incidence functions, Cox hazard regression, and Fine-Gray models for sub-distribution hazards to gain insights into usage patterns and risks within the protocol. We show how, by varying the index and outcome events as well as covariates, we can use DeFi survival analysis to answer questions like “How does loan size affect the repayment schedule of the loan?”; “How does loan size affect the likelihood that an account gets liquidated?”; “How does user behavior vary between Aave markets?”; “How has user behavior in Aave varied from quarter to quarter?” The proposed DeFi survival analysis can easily be generalized to other DeFi lending protocols. By defining appropriate index and outcome events, DeFi survival analysis can be applied to any cryptocurrency protocol with transactions.
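To make the index-event and outcome-event framing concrete, here is a minimal sketch of a Kaplan–Meier estimator in plain Python. It assumes each record is a duration from an index event (for example, a borrow) to either an outcome event (for example, a repayment) or censoring. The function name and the toy data are hypothetical, and the sketch does not handle competing risks, which the abstract treats with cumulative incidence functions and Fine-Gray models.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate at each distinct event time.

    durations: time from index event to outcome or censoring.
    observed:  True if the outcome occurred, False if censored.
    Returns a list of (time, survival_probability) pairs.
    """
    data = sorted(zip(durations, observed))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    idx = 0
    while idx < len(data):
        t = data[idx][0]
        events = 0
        leaving = 0
        # Group all records that share the same time t.
        while idx < len(data) and data[idx][0] == t:
            if data[idx][1]:
                events += 1
            leaving += 1
            idx += 1
        if events > 0:
            surv *= 1.0 - events / n_at_risk
            curve.append((t, surv))
        # Both events and censored records leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Hypothetical toy data: days from borrow to repayment (True) or censoring (False).
days = [5, 8, 8, 12, 20]
repaid = [True, True, False, True, False]
km_curve = kaplan_meier(days, repaid)
```

On this toy input the estimated curve steps down at days 5, 8, and 12 to roughly 0.8, 0.6, and 0.3. Changing which transactions count as the index and outcome events changes only how `days` and `repaid` are built, not the estimator.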
-
We present a new method for analysing stochastic epidemic models under minimal assumptions. The method, dubbed dynamic survival analysis (DSA), is based on a simple yet powerful observation, namely that population-level mean-field trajectories described by a system of partial differential equations may also approximate individual-level times of infection and recovery. This idea gives rise to a certain non-Markovian agent-based model and provides an agent-level likelihood function for a random sample of infection and/or recovery times. Extensive numerical analyses on both synthetic and real epidemic data from foot-and-mouth disease in the UK (2001) and COVID-19 in India (2020) show good accuracy and confirm the method’s versatility in likelihood-based parameter estimation. The accompanying software package gives prospective users a practical tool for modelling, analysing and interpreting epidemic data with the help of the DSA approach.
-
We propose a method to quantify uncertainty around individual survival distribution estimates using right-censored data, compatible with any survival model. Unlike classical confidence intervals, the survival bands produced by this method offer predictive rather than population-level inference, making them useful for personalized risk screening. For example, in a low-risk screening scenario, they can be applied to flag patients whose survival band at 12 months lies entirely above 50%, while ensuring that at least half of flagged individuals will survive past that time on average. Our approach builds on recent advances in conformal inference and integrates ideas from inverse probability of censoring weighting and multiple testing with false discovery rate control. We provide asymptotic guarantees and show promising performance in finite samples with both simulated and real data.
-
Dynamical Survival Analysis (DSA) is a framework for modeling epidemics based on mean-field dynamics applied to individual (agent) level histories of infection and recovery. Recently, DSA has been shown to be an effective tool for analyzing complex non-Markovian epidemic processes that are otherwise difficult to handle using standard methods. One advantage of DSA is that it represents typical epidemic data in a simple, although not explicit, form that involves solutions of certain differential equations. In this work, we describe how a complex non-Markovian DSA model may be applied to a specific data set with the help of appropriate numerical and statistical schemes. The ideas are illustrated with a data example from the COVID-19 epidemic in Ohio.

