

This content will become publicly available on November 4, 2026

Title: Bayesian Functional Data Analysis in Astronomy
Cosmic demographics (the statistical study of populations of astrophysical objects) has long relied on tools from multivariate statistics for analyzing data comprising fixed-length vectors of object properties, as might be compiled in a tabular astronomical catalog (say, with sky coordinates and brightness measurements in a fixed number of spectral passbands). But with the emergence of automated digital sky surveys, ca. 2000, astronomers began producing large collections of data with more complex structures: light curves (brightness time series) and spectra (brightness vs. wavelength). These comprise what statisticians call functional data: measurements of populations of functions. Upcoming automated sky surveys will soon provide astronomers with a flood of functional data. New methods are needed to accurately and optimally analyze large ensembles of light curves and spectra, accumulating information both along individual measured functions and across a population of such functions. Functional data analysis (FDA) provides tools for statistical modeling of functional data. Astronomical data presents several challenges for FDA methodology, e.g., sparse, irregular, and asynchronous sampling, and heteroscedastic measurement error. Bayesian FDA uses hierarchical Bayesian models for function populations and is well suited to addressing these challenges. We provide an overview of astronomical functional data and some key Bayesian FDA modeling approaches, including functional mixed effects models and stochastic process models. We briefly describe a Bayesian FDA framework combining FDA and machine learning methods to build low-dimensional parametric models for galaxy spectra.
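As a rough illustration of the stochastic process approach mentioned in the abstract, the sketch below conditions a zero-mean Gaussian process on a single irregularly sampled light curve with a different error bar on each point. It is not the authors' model: the squared-exponential kernel, the fixed hyperparameters `amp` and `scale`, the simulated data, and the function names are all assumptions made for the example. A full Bayesian FDA treatment would place priors on the hyperparameters and share them across a population of curves.

```python
import numpy as np

def sq_exp_kernel(t1, t2, amp, scale):
    """Squared-exponential covariance between two sets of times (assumed kernel)."""
    dt = t1[:, None] - t2[None, :]
    return amp**2 * np.exp(-0.5 * (dt / scale) ** 2)

def gp_posterior(t_obs, y_obs, sigma, t_new, amp=1.0, scale=2.0):
    """Posterior mean and standard deviation of a zero-mean GP at times t_new.

    sigma holds one noise standard deviation per observation, so the noise
    covariance is diag(sigma**2) rather than a single constant
    (heteroscedastic measurement error).
    """
    K = sq_exp_kernel(t_obs, t_obs, amp, scale) + np.diag(sigma**2)
    K_s = sq_exp_kernel(t_new, t_obs, amp, scale)
    L = np.linalg.cholesky(K)
    # Solve K alpha = y via the Cholesky factor instead of inverting K.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    # Prior variance at each new time is amp**2 for this kernel.
    var = amp**2 - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Simulated light curve: irregular sampling times and per-point error bars.
rng = np.random.default_rng(0)
t_obs = np.sort(rng.uniform(0.0, 20.0, size=15))
sigma = rng.uniform(0.05, 0.3, size=t_obs.size)
y_obs = np.sin(2 * np.pi * t_obs / 7.0) + sigma * rng.normal(size=t_obs.size)

t_new = np.linspace(0.0, 20.0, 50)
mean, std = gp_posterior(t_obs, y_obs, sigma, t_new)
```

Because the observation times enter only through the kernel, nothing in the calculation requires a regular grid, and each curve in a population can have its own sampling pattern.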
Award ID(s):
1814840 2206339 2210790
PAR ID:
10647619
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
MDPI - Physical Sciences Forum
Date Published:
Volume:
12
Issue:
1
Page Range / eLocation ID:
12
Subject(s) / Keyword(s):
astrostatistics time series spectroscopy Bayesian data analysis functional data analysis hierarchical Bayesian modeling Gaussian processes dimension reduction manifold learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Research in astronomy is undergoing a major paradigm shift, transformed by the advent of large, automated sky surveys into a data-rich field where multi-TB to PB-sized spatio-temporal data sets are commonplace. For example, the Legacy Survey of Space and Time (LSST) is about to begin delivering observations of >10^10 objects, including a database with >4 x 10^13 rows of time series data. This volume presents a challenge: how should a domain scientist with little experience in data management or distributed computing access data and perform analyses at PB scale? We present a possible solution to this problem built on (adapted) industry-standard tools and made accessible through web gateways. We have i) developed Astronomy eXtensions for Spark (AXS), a series of astronomy-specific modifications to Apache Spark allowing astronomers to tap into its computational scalability; ii) deployed datasets in AXS-queriable format in Amazon S3, leveraging its I/O scalability; iii) developed a deployment of Spark on Kubernetes with auto-scaling configurations requiring no end-user interaction; and iv) provided a web-accessible Jupyter notebook front end via JupyterHub, including a rich library of pre-installed common astronomical software (accessible at http://hub.dirac.institute). We use this system to enable the analysis of data from the Zwicky Transient Facility, presently the closest precursor survey to the LSST, and discuss initial results. To our knowledge, this is a first application of cloud-based scalable analytics to astronomical datasets approaching LSST scale. The code is available at https://github.com/astronomy-commons.
  2. Functional data contains two components: shape (or amplitude) and phase. This paper focuses on a branch of functional data analysis (FDA), namely Shape-Based FDA, that isolates and focuses on shapes of functions. Specifically, this paper focuses on Scalar-on-Shape (ScoSh) regression models that incorporate the shapes of predictor functions and discard their phases. This aspect sets ScoSh models apart from the traditional Scalar-on-Function (ScoF) regression models that incorporate full predictor functions. ScoSh is motivated by object data analysis, e.g., for neuro-anatomical objects, where object morphologies are relevant and their parameterizations are arbitrary. ScoSh also differs from methods that arbitrarily pre-register data and use the registered data in subsequent analysis. In contrast, ScoSh models perform registration during regression, using the (non-parametric) Fisher-Rao inner product and nonlinear index functions to capture complex predictor-response relationships. This formulation results in novel concepts of regression phase and regression mean of functions. Regression phases are time-warpings of predictor functions that optimize prediction errors, and regression means are optimal regression coefficients. We demonstrate practical applications of the ScoSh model using extensive simulated and real-data examples, including predicting COVID outcomes when daily rate curves are predictors.
  3. A reexamination of period-finding algorithms is prompted by new large-area astronomical sky surveys that can identify billions of individual sources having a thousand or more observations per source. This large increase in data necessitates fast and efficient period detection algorithms. In this paper, we provide an initial description of an algorithm that is being used for the detection of periodic behavior in a sample of 1.5 billion objects using light curves generated from Zwicky Transient Facility (ZTF) data. We call this algorithm “Fast Periodicity Weighting” (FPW), derived using a Gaussian Process formalism. Periodic sources in ZTF show a wide variety of waveforms, some quite complex, including eclipsing objects, sinusoidally varying objects also exhibiting eclipses, objects with cyclotron emission at various phases, and accreting objects with complex waveforms. A major advantage of the FPW algorithm is that it is sensitive to a broad range of waveforms. We describe the FPW algorithm and its application to ZTF, and provide efficient code for both CPU and GPU.
  4. Wide-field astronomical surveys are often affected by the presence of undesirable reflections (often known as “ghosting artifacts” or “ghosts”) and scattered-light artifacts. The identification and mitigation of these artifacts is important for rigorous astronomical analyses of faint and low-surface-brightness systems. In this work, we use images from the Dark Energy Survey (DES) to train, validate, and test a deep neural network (Mask R-CNN) to detect and localize ghosts and scattered-light artifacts. We find that the ability of the Mask R-CNN model to identify affected regions is superior to that of conventional algorithms that model the physical processes that lead to such artifacts, thus providing a powerful technique for the automated detection of ghosting and scattered-light artifacts in current and near-future surveys.
  5. Low-luminosity active galactic nuclei (LLAGN) probe accretion physics in the low-Eddington regime and can provide additional clues about galaxy evolution. AGN variability is ubiquitous and thus provides a reliable tool for finding AGN. We analyze the All-Sky Automated Survey for SuperNovae light curves of 1218 galaxies with g < 14 mag and Sloan Digital Sky Survey spectra in search of AGN. We find 37 objects that are both variable and have AGN-like structure functions, which is about 3% of the sample. The majority of the variability-selected AGN are LLAGN with Eddington ratios ranging from 10^-4 to 10^-2. We thus estimate the fraction of LLAGN in the population of galaxies as 2% down to a median Eddington ratio of 2 × 10^-3. Combining the BPT line ratio AGN diagnostics and the broad-line AGN, up to ∼60% of the AGN candidates are confirmed spectroscopically. The BPT diagnostics also classified 10%–30% of the candidates as star-forming galaxies rather than AGN.