The optimal receiver operating characteristic (ROC) curve, giving
the maximum probability of detection as a function of the
probability of false alarm, is a key information-theoretic indicator
of the difficulty of a binary hypothesis testing problem (BHT). It
is well known that the optimal ROC curve for a given BHT,
corresponding to the likelihood ratio test, is theoretically
determined by the probability distribution of the observed data
under each of the two hypotheses. In some cases, these two
distributions may be unknown or computationally intractable, but
independent samples of the likelihood ratio can be observed. This
raises the problem of estimating the optimal ROC for a BHT from such
samples. The maximum likelihood estimator of the optimal ROC curve
is derived, and it is shown to converge to the true optimal ROC
curve in the Lévy metric as the number of observations tends to
infinity. A classical empirical estimator, based on estimating the
two types of error probabilities from two separate sets of samples,
is also considered. The maximum likelihood estimator is observed in
simulation experiments to be considerably more accurate than the
empirical estimator, especially when the number of samples obtained
under one of the two hypotheses is small. The area under the
maximum likelihood estimator is derived; it is a consistent
estimator of the true area under the optimal ROC curve.
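
To make the classical empirical estimator mentioned above concrete, here is a minimal sketch in Python. It is not the maximum likelihood estimator derived in the paper. It assumes likelihood-ratio samples are available as two arrays, one per hypothesis, and that H1 is declared when the likelihood ratio is at least a threshold. The function names `empirical_roc` and `empirical_auc` are hypothetical, chosen only for illustration.

```python
import numpy as np

def empirical_roc(lr_h0, lr_h1):
    """Empirical ROC points from likelihood-ratio samples under H0 and H1.

    Assumption: H1 is declared when the likelihood ratio is >= threshold.
    """
    lr_h0 = np.asarray(lr_h0, dtype=float)
    lr_h1 = np.asarray(lr_h1, dtype=float)
    # Candidate thresholds: every observed value, swept from high to low.
    thresholds = np.unique(np.concatenate([lr_h0, lr_h1]))[::-1]
    # Fraction of H0 samples at or above each threshold (false alarm estimate).
    p_fa = np.array([np.mean(lr_h0 >= t) for t in thresholds])
    # Fraction of H1 samples at or above each threshold (detection estimate).
    p_d = np.array([np.mean(lr_h1 >= t) for t in thresholds])
    # Prepend (0, 0) so the curve starts at the origin; it ends at (1, 1).
    return np.concatenate([[0.0], p_fa]), np.concatenate([[0.0], p_d])

def empirical_auc(p_fa, p_d):
    """Area under the piecewise-linear curve through the points (trapezoid rule)."""
    return float(np.sum(np.diff(p_fa) * (p_d[1:] + p_d[:-1]) / 2.0))

# Toy likelihood-ratio samples (made up for illustration only).
p_fa, p_d = empirical_roc([0.2, 0.5, 0.9], [0.7, 1.5, 3.0])
auc = empirical_auc(p_fa, p_d)
```

Because the two error probabilities are estimated from separate sample sets, the resulting curve need not be concave, and it can be coarse when one of the two sets is small, which is the regime in which the abstract reports the largest gap between the two estimators.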
Likelihood landscape and maximum likelihood estimation for the discrete orbit recovery model
More Like this
Abstract

Motivation: Cells in an organism share a common evolutionary history, called a cell lineage tree. A cell lineage tree can be inferred from single cell genotypes at genomic variation sites. Inferring a cell lineage tree from noisy single cell data is a challenging computational problem. Most existing methods for cell lineage tree inference assume uniform uncertainty in genotypes, whereas real single cell data usually has non-uniform uncertainty in individual genotypes. Moreover, existing methods are often sampling based and can be very slow for large data.

Results: In this article, we propose a new method called ScisTree, which infers a cell lineage tree and calls genotypes from noisy single cell genotype data. Unlike most existing approaches, ScisTree works with the probabilities of individual genotypes (which can be computed by existing single cell genotype callers). ScisTree assumes the infinite sites model. Given uncertain genotypes with individualized probabilities, ScisTree implements a fast heuristic for inferring the cell lineage tree and calling the genotypes that admit a so-called perfect phylogeny and maximize the likelihood of the genotypes. Through simulation, we show that ScisTree performs well on the accuracy of inferred trees and is much more efficient than existing methods. The efficiency of ScisTree enables new applications, including imputation of so-called doublets.

Availability and implementation: The program ScisTree is available for download at: https://github.com/yufengwudcs/ScisTree.

Supplementary information: Supplementary data are available at Bioinformatics online.
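
The two quantities the ScisTree abstract refers to, the likelihood of a set of called genotypes and the perfect phylogeny condition, can be illustrated with a short Python sketch. This is not ScisTree's heuristic or its code. It assumes binary genotypes (0 for wild type, 1 for mutant), a matrix `prob_zero` giving the probability that each cell/site genotype is 0, independence across entries, and an all-zero root. Both function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def genotype_log_likelihood(prob_zero, genotypes):
    """Log-likelihood of a called binary genotype matrix (cells x sites).

    Assumption: prob_zero[i, j] is the probability that cell i has
    genotype 0 at site j, and entries are independent.
    """
    prob_zero = np.asarray(prob_zero, dtype=float)
    genotypes = np.asarray(genotypes, dtype=int)
    # Pick P(0) where the call is 0 and P(1) = 1 - P(0) where the call is 1.
    per_entry = np.where(genotypes == 0, prob_zero, 1.0 - prob_zero)
    return float(np.sum(np.log(per_entry)))

def admits_perfect_phylogeny(genotypes):
    """Check the rooted perfect phylogeny condition for a binary matrix.

    With an all-zero root, a perfect phylogeny exists exactly when no pair
    of sites shows all three row patterns (1,0), (0,1) and (1,1).
    """
    g = np.asarray(genotypes, dtype=bool)
    for a, b in combinations(range(g.shape[1]), 2):
        col_a, col_b = g[:, a], g[:, b]
        has_10 = np.any(col_a & ~col_b)
        has_01 = np.any(~col_a & col_b)
        has_11 = np.any(col_a & col_b)
        if has_10 and has_01 and has_11:
            return False
    return True

# Toy example (made-up numbers): three cells, two sites.
probs = [[0.1, 0.9], [0.2, 0.3], [0.95, 0.9]]
calls = [[1, 0], [1, 1], [0, 0]]
score = genotype_log_likelihood(probs, calls)
ok = admits_perfect_phylogeny(calls)
```

A search over genotype calls would aim for a high `score` among matrices for which `ok` is true; how ScisTree performs that search is not described in the abstract and is not reproduced here.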