In this article, we propose an omnibus test for comparing two survival functions under non-proportional hazards. The test statistic is based on a product-limit estimate of the restricted distance correlation, which is closely related to the distance between survival curves. Strong consistency is established under mild regularity conditions. Our simulation studies show that the new test has satisfactory power under proportional hazards and under various non-proportional hazards settings, including delayed treatment effect, diminishing effect, and crossing survival curves; it can therefore be a competitive alternative to existing omnibus tests such as the Kolmogorov-Smirnov test, the Cramér-von Mises test, the two-stage test, and the maxCombo test based on weighted log-rank statistics. Two extensions of the new test, to one-sided alternatives and to a Gaussian kernel, are also discussed.
Fasano-Franceschini Test: an Implementation of a 2-Dimensional Kolmogorov-Smirnov test in R
The univariate Kolmogorov-Smirnov (KS) test is a non-parametric statistical test designed to assess whether two samples come from the same underlying distribution. The versatility of the KS test has made it a cornerstone of statistical analysis across the scientific disciplines. However, the test proposed by Kolmogorov and Smirnov does not naturally extend to multidimensional distributions. Here, we present the fasano.franceschini.test package, an R implementation of the 2-D KS two-sample test as defined by Fasano and Franceschini (1987), and provide multiple use cases across the scientific disciplines. The fasano.franceschini.test package provides three improvements over the current 2-D KS test on the Comprehensive R Archive Network (CRAN): (i) the Fasano and Franceschini test has been shown to run in O(n^2), versus the Peacock implementation, which runs in O(n^3); (ii) the package implements a procedure for handling ties in the data; and (iii) the package implements a parallelized permutation procedure for improved significance testing. Ultimately, the fasano.franceschini.test package presents a robust statistical test for analyzing random samples defined in two dimensions.
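To make the quadrant-counting idea behind the 2-D test concrete, here is a minimal Python sketch of a Fasano-Franceschini-style statistic with a serial permutation p-value. It is an illustration under stated assumptions, not the package's R implementation: the function names (`quadrant_fractions`, `ff_statistic`, `permutation_p_value`) are hypothetical, ties are simply ignored through strict inequalities rather than handled by the package's procedure, and the permutation loop is not parallelized.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def quadrant_fractions(origin: Point, sample: Sequence[Point]) -> List[float]:
    """Fraction of `sample` lying strictly inside each quadrant around `origin`.

    Points sharing a coordinate with the origin fall in no quadrant
    (assumption: ties are ignored in this sketch).
    """
    ox, oy = origin
    counts = [0, 0, 0, 0]
    for x, y in sample:
        if x > ox and y > oy:
            counts[0] += 1
        elif x < ox and y > oy:
            counts[1] += 1
        elif x < ox and y < oy:
            counts[2] += 1
        elif x > ox and y < oy:
            counts[3] += 1
    n = len(sample)
    return [c / n for c in counts]


def max_quadrant_gap(origins: Sequence[Point],
                     s1: Sequence[Point],
                     s2: Sequence[Point]) -> float:
    """Largest absolute difference in quadrant fractions over all origins."""
    best = 0.0
    for origin in origins:
        f1 = quadrant_fractions(origin, s1)
        f2 = quadrant_fractions(origin, s2)
        best = max(best, max(abs(a - b) for a, b in zip(f1, f2)))
    return best


def ff_statistic(s1: Sequence[Point], s2: Sequence[Point]) -> float:
    """Average of the gaps found with origins drawn from each sample.

    Only observed points serve as origins, which gives the O(n^2) cost.
    """
    d1 = max_quadrant_gap(s1, s1, s2)
    d2 = max_quadrant_gap(s2, s1, s2)
    return 0.5 * (d1 + d2)


def permutation_p_value(s1: Sequence[Point], s2: Sequence[Point],
                        n_perm: int = 200, seed: int = 0) -> float:
    """Permutation p-value: reshuffle pooled points and recompute the statistic."""
    rng = random.Random(seed)
    observed = ff_statistic(s1, s2)
    pooled = list(s1) + list(s2)
    n1 = len(s1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ff_statistic(pooled[:n1], pooled[n1:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A call such as `permutation_p_value(sample_a, sample_b)` on two lists of `(x, y)` tuples returns a value in (0, 1]. Each evaluation of the statistic visits every origin against every point, so the permutation loop dominates the running time, which is presumably why a parallelized procedure is attractive.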
- Award ID(s):
- 1764421
- PAR ID:
- 10336398
- Date Published:
- Journal Name:
- ArXivorg
- ISSN:
- 2331-8422
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- Functional dependency can lead to discoveries of new mechanisms not possible via symmetric association. Most asymmetric methods for causal direction inference are not driven by the function-versus-independence question. A recent exact functional test (EFT) was designed to detect functionally dependent patterns model-free with an exact null distribution. However, the EFT lacked a theoretical justification, had not been compared with other asymmetric methods, and was slow in practice. Here, we prove the functional optimality of the EFT statistic, demonstrate its advantage in functional inference accuracy over five other methods, and develop a branch-and-bound algorithm with dynamic and quadratic programming that runs orders of magnitude faster than the previous implementation. Our results make it practical to answer the exact functional dependency question arising from discovery-driven artificial intelligence applications. Software that implements the EFT is freely available in the R package 'FunChisq' (≥2.5.0) at https://cran.r-project.org/package=FunChisq
- We study active learning methods for single index models of the form $$F({\bm x}) = f(\langle {\bm w}, {\bm x}\rangle)$$, where $$f:\mathbb{R} \to \mathbb{R}$$ and $${\bm x}, {\bm w} \in \mathbb{R}^d$$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning, such as surrogate modeling for partial differential equations (PDEs). Such applications require sample-efficient active learning methods that are robust to adversarial noise, i.e., that work even in the challenging agnostic learning setting. We provide two main results on agnostic active learning of single index models. First, when $$f$$ is known and Lipschitz, we show that $$\tilde{O}(d)$$ samples collected via statistical leverage score sampling are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent $$O(d^{2})$$ bound of \cite{gajjar2023active}. Second, we show that $$\tilde{O}(d)$$ samples suffice even in the more difficult setting when $$f$$ is unknown. Our results leverage tools from high dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions.
- Two-sample tests are widely used in hydrologic and climate studies to investigate whether two samples of a variable of interest could be considered drawn from different populations. Despite this, information on the power (i.e., the probability of correctly rejecting the null hypothesis) of these tests applied to hydroclimatic variables is limited. Here, this need is addressed by considering four popular two-sample tests applied to daily and extreme precipitation and to annual peak flow series. The chosen tests assess differences in location (Student's t and Wilcoxon) and distribution (Kolmogorov–Smirnov and likelihood-ratio). The power was quantified through Monte Carlo simulations relying on pairs of realistic, equal-size samples of the three variables, generated with a procedure based on suitable parametric distributions and copulas. After showing that differences in sample skewness are monotonically related to differences in spread, power surfaces were built as a function of the relative changes in location and spread of the samples and used to interpret three case studies comparing samples of observed precipitation and discharge series in the U.S. It was found that (1) Student's t-test applied to the log-transformed samples has the same power as the Wilcoxon test; (2) location (distribution) tests perform better than distribution (location) tests for small (moderate-to-large) differences in spread and skewness; (3) the power is relatively lower (higher) if the differences in location and spread or skewness have concordant (discordant) sign; and (4) the power increases with the sample size but could be quite low for tests applied to extreme precipitation and discharge records, which are commonly short. This work provides useful recommendations for selecting and interpreting two-sample tests in a broad range of hydroclimatic applications.
- As a complement to data deduplication, delta compression further reduces the data volume by compressing non-duplicate data chunks relative to their similar chunks (base chunks). However, existing post-deduplication delta compression approaches for backup storage either suffer from the low similarity between many detected chunks or miss some potential similar chunks, or suffer from low (backup and restore) throughput due to extra I/Os for reading base chunks or add additional service-disruptive operations to backup systems. In this paper, we propose LoopDelta to address the above-mentioned problems by an enhanced embedding delta compression scheme in deduplication in a non-intrusive way. The enhanced delta compression scheme combines four key techniques: (1) dual-locality-based similarity tracking to detect potential similar chunks by exploiting both logical and physical locality, (2) locality-aware prefetching to prefetch base chunks to avoid extra I/Os for reading base chunks on the write path, (3) cache-aware filter to avoid extra I/Os for base chunks on the read path, and (4) inversed delta compression to perform delta compression for data chunks that are otherwise forbidden to serve as base chunks by rewriting techniques designed to improve restore performance. Experimental results indicate that LoopDelta increases the compression ratio by 1.24 to 10.97 times on top of deduplication, without notably affecting the backup throughput, and it improves the restore performance by 1.2 to 3.57 times.

