Summary Structural learning of Gaussian graphical models in the presence of latent variables has long been a challenging problem. Chandrasekaran et al. (2012) proposed a convex program for estimating a sparse graph plus a low-rank term that adjusts for latent variables; however, this approach poses challenges from both computational and statistical perspectives. We propose an alternative, simple solution: apply a hard-thresholding operator to existing graph selection methods. Conceptually simple and computationally attractive, the approach of thresholding the graphical lasso is shown to be graph selection consistent in the presence of latent variables under a simpler minimum edge strength condition and at an improved statistical rate. The results are extended to estimators for thresholded neighbourhood selection and constrained $$\ell_{1}$$-minimization for inverse matrix estimation as well. We show that our simple thresholded graph estimators yield stronger empirical results than existing methods for the latent variable graphical model problem, and we apply them to a neuroscience case study on estimating functional neural connections.
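The thresholding step named in this summary can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the matrix `theta_hat` is made-up data standing in for a precision-matrix estimate (for example, graphical lasso output), and the threshold `tau` and the helper names `hard_threshold` and `edges` are hypothetical.

```python
def hard_threshold(precision, tau):
    """Zero every off-diagonal entry whose magnitude is at most tau.

    The diagonal is left untouched, since only off-diagonal entries
    encode edges of the graph.
    """
    p = len(precision)
    return [
        [
            precision[i][j] if i == j or abs(precision[i][j]) > tau else 0.0
            for j in range(p)
        ]
        for i in range(p)
    ]


def edges(matrix):
    """Return the set of index pairs (i, j), i < j, with a non-zero entry."""
    p = len(matrix)
    return {(i, j) for i in range(p) for j in range(i + 1, p) if matrix[i][j] != 0.0}


# Made-up 3x3 precision estimate (assumption, for illustration only):
# one strong partial correlation and two weak ones.
theta_hat = [
    [2.0, -0.8, 0.05],
    [-0.8, 2.0, 0.03],
    [0.05, 0.03, 1.5],
]

thresholded = hard_threshold(theta_hat, tau=0.1)
selected = edges(thresholded)
print(selected)
```

In this toy input the small entries play the role of spurious edges, and only the strong entry between variables 0 and 1 survives. How `tau` should be chosen is not stated in the summary.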
This content will become publicly available on August 27, 2026
Mitigating Eddington and Malmquist Biases in Latent-Inclination Inference of the Tully-Fisher Relation
The Tully-Fisher relation is a vital distance indicator, but its precise inference is challenged by selection bias, statistical bias, and uncertain inclination corrections. This study presents a Bayesian framework that simultaneously addresses these issues. To eliminate the need for individual inclination corrections, inclination is treated as a latent variable with a known probability distribution. To correct for the distance-dependent Malmquist bias arising from sample selection, the model incorporates Gaussian scatter in the dependent variable, the distribution of the independent variable, and the observational selection function into the data likelihood. To mitigate the statistical bias, termed the "general Eddington bias", caused by Gaussian scatter and the non-uniform distribution of the independent variable, two methods are introduced: (1) analytical bias corrections applied to the dependent variable before likelihood computation, and (2) a dual-scatter model that accounts for Gaussian scatter in the independent variable within the likelihood function. The effectiveness of these methods is demonstrated using simulated datasets. By rigorously addressing selection and statistical biases in a latent-variable regression analysis, this work provides a robust approach for unbiased distance estimates from standardizable candles, which is critical for improving the accuracy of Hubble constant determinations.
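The selection effect described above can be illustrated with a small simulation. This is a sketch of the bias itself, not of the paper's Bayesian correction. All parameter values are assumptions chosen for illustration: a Gaussian distribution of absolute magnitudes, sources uniform in volume, a sharp apparent-magnitude limit, and the hypothetical function name `simulate_malmquist`.

```python
import math
import random
import statistics


def simulate_malmquist(n=200_000, m0=-20.0, sigma=0.5, m_lim=15.0,
                       d_max_pc=3.0e8, seed=0):
    """Return (mean selected M minus true mean, number selected).

    Sources have absolute magnitudes M ~ N(m0, sigma) and are placed
    uniformly in volume out to d_max_pc parsecs. Only sources with
    apparent magnitude m <= m_lim enter the sample.
    """
    rng = random.Random(seed)
    selected = []
    for _ in range(n):
        abs_mag = rng.gauss(m0, sigma)
        # Uniform in volume: the cumulative distance distribution scales as d^3.
        # 1 - random() lies in (0, 1], which avoids log10(0) below.
        d_pc = d_max_pc * (1.0 - rng.random()) ** (1.0 / 3.0)
        app_mag = abs_mag + 5.0 * math.log10(d_pc / 10.0)  # distance modulus
        if app_mag <= m_lim:
            selected.append(abs_mag)
    return statistics.fmean(selected) - m0, len(selected)


bias, n_selected = simulate_malmquist()
print(f"selected {n_selected} sources, mean magnitude offset {bias:+.3f} mag")
```

The selected sample is brighter on average than the parent population (a negative offset), because intrinsically bright sources remain above the flux limit out to larger volumes. For this idealized setup the classical expectation is an offset of roughly -1.38 sigma^2 mag. A likelihood that ignores the selection function would absorb this offset into the fitted relation.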
- Award ID(s): 2103251
- PAR ID: 10630958
- Publisher / Repository: IOP Publishing
- Date Published:
- Journal Name: Astrophysical Journal
- ISSN: 1538-4357
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- ABSTRACT Recent cosmological analyses measuring distances of type Ia supernovae (SNe Ia) and baryon acoustic oscillations (BAO) have all given similar hints at time-evolving dark energy. To examine whether underestimated SN Ia systematics might be driving these results, Efstathiou (2025) compared overlapping SN events between Pantheon+ and DES-SN5YR (20 per cent of SNe are in common), and reported evidence for a $$\sim$$0.04 mag offset between the low- and high-redshift distance measurements of this subsample of events. If this offset is arbitrarily subtracted from the entire DES-SN5YR sample, the preference for evolving dark energy is reduced. In this paper, we show that this offset is mostly due to different corrections for Malmquist bias between the two samples; therefore, an object-to-object comparison can be misleading. Malmquist bias corrections differ between the two analyses for several reasons. First, DES-SN5YR used an improved model of SN Ia luminosity scatter compared to Pantheon+ but the associated scatter-model uncertainties are included in the error budget. Secondly, improvements in host mass estimates in DES-SN5YR also affected SN standardized magnitudes and their bias corrections. Thirdly, and most importantly, the selection functions of the two compilations are significantly different, hence the inferred Malmquist bias corrections. Even if the original scatter model and host properties from Pantheon+ are used instead, the evidence for evolving dark energy from CMB, DESI BAO Year 1 and DES-SN5YR is only reduced from 3.9$$\sigma$$ to 3.3$$\sigma$$, consistent with the error budget. Finally, in this investigation, we identify an underestimated systematic uncertainty related to host galaxy property uncertainties, which could increase the final DES-SN5YR error budget by 3 per cent. In conclusion, we confirm the validity of the published DES-SN5YR results.
- This paper studies the unsupervised cross-domain translation problem by proposing a generative framework, in which the probability distribution of each domain is represented by a generative cooperative network that consists of an energy-based model and a latent variable model. Using a generative cooperative network enables maximum likelihood learning of the domain model by MCMC teaching, where the energy-based model seeks to fit the data distribution of the domain and distills its knowledge to the latent variable model via MCMC. Specifically, in the MCMC teaching process, the latent variable model parameterized by an encoder-decoder maps examples from the source domain to the target domain, while the energy-based model further refines the mapped results by Langevin revision such that the revised results match the examples in the target domain in terms of the statistical properties, which are defined by the learned energy function. To build a correspondence between two unpaired domains, the proposed framework simultaneously learns a pair of cooperative networks with cycle consistency, accounting for a two-way translation between the two domains, by alternating MCMC teaching. Experiments show that the proposed framework is useful for unsupervised image-to-image translation and unpaired image sequence translation.
- This software repository provides the Python functions and a Jupyter notebook that implement the latent-variable bias-mitigating inference methods for the Tully-Fisher Relation. The methods are described in Fu (2025), titled "Mitigating Malmquist and Eddington Biases in Latent-Inclination Regression of the Tully-Fisher Relation". Repository DOI: https://doi.org/10.5281/zenodo.16378199
- ABSTRACT Valid statistical inference is notoriously challenging when the sample is subject to nonresponse bias. We approach this difficult problem by employing multiple candidate models for the propensity score (PS) function combined with empirical likelihood. By incorporating multiple working PS models into the internal bias calibration constraint in the empirical likelihood, the selection bias can be safely eliminated as long as the working PS models contain the true model and their expectations are equal to the true missing rate. The bias calibration constraint for the multiple PS models is called the multiple bias calibration. The study delves into the asymptotic properties of the proposed method and provides a comparative analysis through limited simulation studies against existing methods. To illustrate practical implementation, we present a real data analysis on body fat percentage using the National Health and Nutrition Examination Survey dataset.
