Abstract High-fidelity simulators that connect theoretical models with observations are indispensable tools in many sciences. If the likelihood is known, inference can proceed using standard techniques. However, when the likelihood is intractable or unknown, a simulator coupled with machine learning makes it possible to infer the parameters of a theoretical model directly from real and simulated observations. We introduce an extension of the recently proposed likelihood-free frequentist inference (LF2I) approach that makes it possible to construct confidence sets with the p-value function and to use the same function to check the coverage explicitly at any given parameter point. Like LF2I, this extension yields provably valid confidence sets in parameter inference problems for which a high-fidelity simulator is available. The utility of our algorithm is illustrated by applying it to three pedagogically interesting examples: the first is from cosmology, the second from high-energy physics and astronomy, both with tractable likelihoods, while the third, with an intractable likelihood, is from epidemiology. Code to reproduce all of our results is available at https://github.com/AliAlkadhim/ALFFI.
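The sketch below makes the Neyman-inversion idea concrete under stated assumptions. It uses a hypothetical toy simulator (ten draws from N(θ, 1)), the statistic |x̄ − θ|, and brute-force Monte Carlo p-values at each grid point. The extension described above estimates the p-value function with machine learning, so that it is amortized over θ. This sketch replaces that estimator with plain simulation. It therefore shows only the logic, not the authors' implementation or the ALFFI code: the confidence set is {θ : p(D; θ) > α}, and the same p-value function is reused to check coverage at a single θ.

```python
import numpy as np

rng = np.random.default_rng(0)
N_OBS = 10  # observations per data set in this hypothetical toy problem


def simulate(theta, size):
    """Toy simulator (an assumption): `size` data sets, each N_OBS draws from N(theta, 1)."""
    return rng.normal(theta, 1.0, size=(size, N_OBS))


def statistic(data, theta):
    """Test statistic |sample mean - theta|; larger values mean less compatibility with theta."""
    return np.abs(data.mean(axis=-1) - theta)


def p_value(data, theta, n_sim=2000):
    """Monte Carlo p-value P_theta(lambda(D') >= lambda(D)) for each row of `data`."""
    null_stats = statistic(simulate(theta, n_sim), theta)        # shape (n_sim,)
    obs_stats = statistic(np.atleast_2d(data), theta)            # shape (n_data,)
    return np.mean(null_stats[None, :] >= obs_stats[:, None], axis=1)


def confidence_set(data, theta_grid, alpha=0.1):
    """Neyman inversion: keep every grid point whose p-value exceeds alpha."""
    pvals = np.array([p_value(data, t)[0] for t in theta_grid])
    return theta_grid[pvals > alpha]


def coverage_at(theta0, alpha=0.1, n_trials=1000):
    """Coverage check: fraction of data sets simulated at theta0 with p(D; theta0) > alpha."""
    return np.mean(p_value(simulate(theta0, n_trials), theta0) > alpha)


theta_grid = np.linspace(-1.0, 3.0, 81)
observed = simulate(1.0, 1)[0]            # one "observed" data set, generated at theta = 1
cs = confidence_set(observed, theta_grid)
print(f"90% confidence set ~ [{cs.min():.2f}, {cs.max():.2f}]")
print(f"estimated coverage at theta = 1: {coverage_at(1.0):.3f}")   # should be close to 0.90
```

Because |x̄ − θ| is pivotal in this toy model, the estimated coverage should sit near the nominal 90%. With an intractable likelihood and a learned p-value function, repeating the coverage check at many θ values is one way to spot regions where the nominal level is not achieved.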
Towards instance-wise calibration: local amortized diagnostics and reshaping of conditional densities (LADaR)
Abstract Key science questions, such as galaxy distance estimation and weather forecasting, often require knowing the full predictive distribution of a target variable Y given complex inputs X. Despite recent advances in machine learning and physics-based models, it remains challenging to assess whether an initial model is calibrated for all x and, when needed, to reshape the densities of y toward ‘instance-wise’ calibration. This paper introduces the local amortized diagnostics and reshaping of conditional densities (LADaR) framework and proposes a new computationally efficient algorithm (Cal-PIT) that produces interpretable local diagnostics and provides a mechanism for adjusting conditional density estimates (CDEs). Cal-PIT learns a single interpretable local probability–probability map from calibration data. This map identifies where and how the initial model is miscalibrated across feature space, and it can be used to morph CDEs so that they are well calibrated. We illustrate the LADaR framework on synthetic examples, including probabilistic forecasting from image sequences, akin to predicting storm wind speed from satellite imagery. Our main science application involves estimating the probability density functions of galaxy distances given photometric data. There, Cal-PIT achieves better instance-wise calibration than all 11 other literature methods in a benchmark data challenge, demonstrating its utility for next-generation cosmological analyses. Code is available as a Python package at https://github.com/lee-group-cmu/Cal-PIT.
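The following is a minimal sketch of the reshaping step, and it rests on explicit assumptions. As we read the abstract, the local P–P map plays the role of r(γ; x) = P(PIT ≤ γ | x), where PIT = F̂(Y | x) is the probability integral transform under the initial model. A reshaped CDF is then F̃(y | x) = r(F̂(y | x); x). If r were known exactly, F̃ would coincide with the true conditional CDF, so reshaped PIT values would be uniform at every x. Cal-PIT learns r with a single regression model over (x, γ). In this sketch, that model is replaced by simple x-binning with empirical CDFs. The example is a hypothetical 1-D case in which the initial model N(0, 1) ignores that the true spread grows with x.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
N_BINS = 10          # hypothetical: x-bins stand in for Cal-PIT's learned regression over x
_erf = np.vectorize(math.erf, otypes=[float])


def normal_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z) / math.sqrt(2.0)))


def sample(n):
    """Hypothetical data: x ~ U(0, 1), Y | x ~ N(0, (0.5 + 1.5 x)^2)."""
    x = rng.uniform(0.0, 1.0, n)
    return x, rng.normal(0.0, 0.5 + 1.5 * x)


def initial_cdf(y, x):
    """Deliberately miscalibrated initial CDE: N(0, 1) regardless of x."""
    return normal_cdf(y)


def bin_index(x):
    return np.minimum((np.asarray(x) * N_BINS).astype(int), N_BINS - 1)


def fit_local_pp_map(x_cal, y_cal):
    """Per x-bin, keep the sorted calibration PIT values; their ECDF estimates r(gamma; x)."""
    pit, b = initial_cdf(y_cal, x_cal), bin_index(x_cal)
    return [np.sort(pit[b == k]) for k in range(N_BINS)]


def recalibrated_cdf(y, x, pp_map):
    """Reshaped CDF: F_new(y | x) = r_hat(F_hat(y | x); x)."""
    pit, b = initial_cdf(y, x), bin_index(x)
    out = np.empty_like(pit)
    for k, sorted_pit in enumerate(pp_map):
        mask = b == k
        # Empirical CDF of this bin's calibration PITs, evaluated at the new PIT values.
        out[mask] = np.searchsorted(sorted_pit, pit[mask], side="right") / sorted_pit.size
    return out


x_cal, y_cal = sample(20000)
x_test, y_test = sample(20000)            # held-out data for checking the result
pp_map = fit_local_pp_map(x_cal, y_cal)
high_x = x_test > 0.8                     # region where N(0, 1) is too narrow
for name, pit in [("initial", initial_cdf(y_test, x_test)),
                  ("recalibrated", recalibrated_cdf(y_test, x_test, pp_map))]:
    print(name, "P(PIT <= 0.1 | x > 0.8) =", round(float(np.mean(pit[high_x] <= 0.1)), 3))
```

On held-out data, the fraction of PIT values below 0.1 for x > 0.8 should drop from roughly a quarter (the initial model is too narrow there) to about 0.1 after reshaping. Binning is the crudest possible estimator of r. The single amortized regression described in the abstract is meant to avoid choosing bins in high-dimensional feature spaces.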
- Award ID(s):
- 2009251
- PAR ID:
- 10650966
- Publisher / Repository:
- IOP Publishing
- Date Published:
- Journal Name:
- Machine Learning: Science and Technology
- Volume:
- 6
- Issue:
- 4
- ISSN:
- 2632-2153
- Format(s):
- Medium: X Size: Article No. 045058
- Size(s):
- Article No. 045058
- Sponsoring Org:
- National Science Foundation
More like this
- Abstract Reliable studies of the long-term dynamics of planetary systems require numerical integrators that are accurate and fast. The challenge is often formidable because the chaotic nature of many systems requires relative numerical error bounds at or close to machine precision (∼10⁻¹⁶, double-precision arithmetic); otherwise, numerical chaos may dominate over physical chaos. Currently, the speed/accuracy demands are usually met only by symplectic integrators. For example, the most up-to-date long-term astronomical solutions for the solar system in the past (widely used in, e.g., astrochronology and high-precision geological dating) have been obtained using symplectic integrators. However, the source codes of these integrators are unavailable. Here I present the symplectic integrator orbitN (lean version 1.0), with the primary goal of generating accurate and reproducible long-term orbital solutions for near-Keplerian planetary systems (here the solar system) with a dominant mass M₀. Among other features, orbitN-1.0 includes M₀’s quadrupole moment, a lunar contribution, and post-Newtonian corrections (1PN) due to M₀ (fast symplectic implementation). To reduce numerical round-off errors, Kahan compensated summation was implemented. I use orbitN to provide insight into the effect of various processes on the long-term chaos in the solar system. Notably, 1PN corrections have the opposite effect on chaoticity/stability on a 100 Myr versus a Gyr timescale. For the current application, orbitN is about as fast as, or faster than (by a factor of 1.15–2.6), comparable integrators, depending on hardware. The orbitN source code (C) is available at http://github.com/rezeebe/orbitN.
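The compensated-summation idea mentioned above can be illustrated with the textbook Kahan algorithm. This is a generic Python sketch, not orbitN's C code, and it does not show how orbitN applies the technique inside its integration steps. The failure case it targets is adding many terms that are far smaller than the running total. Each tiny term falls below half a unit in the last place of the total, so plain accumulation rounds it away.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation. An explicit loop is used because the
    built-in sum() in Python 3.12+ already compensates float addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan compensated summation: carry the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0                  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # correct the next term by the previously lost error
        t = total + y                   # low-order bits of y may be lost here ...
        compensation = (t - total) - y  # ... and are recovered (with opposite sign) here
        total = t
    return total


values = [1.0] + [1e-16] * 100_000      # each tiny term is below half an ulp of 1.0
print(naive_sum(values))                # prints 1.0: every tiny term is rounded away
print(kahan_sum(values))                # close to 1 + 1e-11
print(math.fsum(values))                # correctly rounded reference from the standard library
```

The compensation variable costs a few extra floating-point operations per addition. That is usually negligible next to force evaluations in an N-body step.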
- Abstract Ground-based high-resolution cross-correlation spectroscopy (HRCCS; R ≳ 15,000) is a powerful complement to space-based studies of exoplanet atmospheres. By resolving individual spectral lines, HRCCS can precisely measure chemical abundance ratios, directly constrain atmospheric dynamics, and robustly probe multidimensional physics. However, HRCCS data sets can be difficult to interpret because of their subtleties, e.g., the lack of exoplanetary spectra visible by eye and the statistically complex process of telluric removal. In this work, we seek to clarify the uncertainty budget of HRCCS with a forward-modeling approach. We present an HRCCS observation simulator, scope (https://github.com/arjunsavel/scope), that incorporates spectral contributions from the exoplanet, star, tellurics, and instrument. This tool allows us to control the underlying data set, enabling controlled experimentation with complex HRCCS methods. Simulating a fiducial hot Jupiter data set (WASP-77Ab emission with IGRINS), we first confirm via multiple tests that the commonly used principal component analysis does not bias the planetary signal when few components are used. Furthermore, we demonstrate that mildly varying tellurics and moderate wavelength solution errors induce only mild decreases in HRCCS detection significance. However, limiting-case, strongly varying tellurics can bias the retrieved velocities and gas abundances. Additionally, in the low signal-to-noise ratio limit, constraints on gas abundances become highly non-Gaussian. Our investigation of the uncertainties and potential biases inherent in HRCCS data analysis enables greater confidence in scientific results from this maturing method.
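The toy sketch below illustrates the two HRCCS steps the abstract refers to. The first is PCA-based removal of the time-varying stellar/telluric component. The second is cross-correlation of the residuals with a planet template. The data are a hypothetical log-flux matrix (exposures × pixels) in which the static spectrum scales with airmass and the planet's lines shift by a known number of pixels per exposure. The sketch does not use scope or its API. It only illustrates why removing a small number of components (here k = 1) can leave a moving planet signal largely intact.

```python
import numpy as np

rng = np.random.default_rng(2)
N_EXP, N_PIX = 40, 400
pix = np.arange(N_PIX, dtype=float)


def gaussian_lines(x, centers, depth, width):
    """Sum of Gaussian absorption lines (negative values) evaluated at positions x."""
    return -depth * np.exp(-0.5 * ((x[..., None] - centers) / width) ** 2).sum(axis=-1)


# Hypothetical log-flux model: a static (stellar + telluric) spectrum whose strength varies
# with airmass, plus a weak planet spectrum shifted by a different amount in each exposure.
static = gaussian_lines(pix, np.array([80.0, 200.0, 310.0]), 0.7, 3.0)
airmass = np.linspace(1.0, 1.3, N_EXP)
planet_centers = np.linspace(60.0, 340.0, 8)
planet_shift = np.linspace(-15.0, 15.0, N_EXP)     # pixels; stands in for the planet's RV curve
planet = gaussian_lines(pix[None, :] - planet_shift[:, None], planet_centers, 0.01, 1.5)
data = airmass[:, None] * static[None, :] + planet + rng.normal(0.0, 0.01, (N_EXP, N_PIX))


def remove_principal_components(matrix, k):
    """Subtract each pixel's mean over time, then remove the top-k principal components (SVD)."""
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return centered - (u[:, :k] * s[:k]) @ vt[:k]


def cross_correlate(residuals, trial_offsets):
    """For each trial offset, sum residuals x template shifted by (planet shift + offset)."""
    ccf = []
    for dv in trial_offsets:
        shifted = gaussian_lines(pix[None, :] - (planet_shift[:, None] + dv), planet_centers, 1.0, 1.5)
        ccf.append(np.sum(residuals * shifted))
    return np.array(ccf)


residuals = remove_principal_components(data, k=1)
offsets = np.arange(-10, 11)
ccf = cross_correlate(residuals, offsets)
print("CCF peaks at offset", offsets[np.argmax(ccf)])   # near 0 if the planet survives PCA
```

In this construction, the static component is exactly rank one after mean subtraction, so one component suffices. With real data, raising k well beyond the number of genuine systematic components can start to absorb planet signal as well. That trade-off is the one the abstract's PCA tests address.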
- Abstract While space-borne optical and near-infrared facilities have succeeded in delivering a precise and spatially resolved picture of our Universe, their small survey area is known to underrepresent the true diversity of galaxy populations. Ground-based surveys have reached comparable depths but at lower spatial resolution, resulting in source confusion that hampers accurate photometric extraction. Source confusion, once limited to the infrared regime, has now begun to challenge ground-based ultradeep surveys, affecting detection and photometry alike. Failing to address these challenges will mean forfeiting a representative view into the distant Universe. We introduce The Farmer: an automated, reproducible profile-fitting photometry package that pairs a library of smooth parametric models from The Tractor with a decision tree that determines the best-fit model in concert with neighboring sources. Photometry is then measured by fitting these models to the other bands, leaving the brightness free to vary. The resulting photometric measurements are naturally total, so no aperture corrections are required. Supporting diagnostics (e.g., χ²) enable measurement validation. As model fitting is relatively time-intensive, The Farmer is built with high-performance computing routines. We benchmark The Farmer on a set of realistic COSMOS-like images and find accurate photometry, number counts, and galaxy shapes. The Farmer is already being used to produce catalogs for several large-area deep extragalactic surveys. In these surveys it has been shown to tackle some of the most challenging optical and near-infrared data available, and it promises to extend to other ultradeep surveys expected in the near future. The Farmer is available to download from GitHub (https://github.com/astroweaver/the_farmer) and Zenodo (https://doi.org/10.5281/zenodo.8205817).
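Consider fixed, unit-flux model shapes for a source and its neighbor. With the shapes held fixed and only the brightnesses free, the fluxes follow from one weighted linear least-squares solve, and the reduced χ² serves as a fit diagnostic. This shows why fitted model photometry is naturally total and handles blends. It is a minimal sketch that assumes Gaussian profiles and uniform pixel noise as hypothetical stand-ins. It does not use The Tractor's or The Farmer's API, and it skips the decision tree that selects among model types.

```python
import numpy as np


def gaussian_profile(shape, x0, y0, sigma):
    """Unit-flux circular Gaussian on a pixel grid (a stand-in for a fixed fitted model)."""
    yy, xx = np.indices(shape, dtype=float)
    prof = np.exp(-0.5 * ((xx - x0) ** 2 + (yy - y0) ** 2) / sigma ** 2)
    return prof / prof.sum()


def forced_fluxes(image, sigma_map, profiles):
    """Fit the fluxes of all fixed-shape sources simultaneously by weighted least squares."""
    w = 1.0 / sigma_map.ravel()
    design = np.stack([p.ravel() for p in profiles], axis=1) * w[:, None]
    fluxes, *_ = np.linalg.lstsq(design, image.ravel() * w, rcond=None)
    model = sum(f * p for f, p in zip(fluxes, profiles))
    chi2 = np.sum(((image - model) / sigma_map) ** 2)
    dof = image.size - len(profiles)
    return fluxes, chi2 / dof


rng = np.random.default_rng(3)
shape = (41, 41)
# Two blended neighbours with hypothetical positions, sizes, and fluxes.
profiles = [gaussian_profile(shape, 18.0, 20.0, 2.5), gaussian_profile(shape, 24.0, 21.0, 3.0)]
true_fluxes = np.array([500.0, 300.0])
noise = np.full(shape, 0.5)
image = true_fluxes[0] * profiles[0] + true_fluxes[1] * profiles[1] + rng.normal(0.0, 0.5, shape)
fluxes, chi2_red = forced_fluxes(image, noise, profiles)
print("fitted fluxes:", np.round(fluxes, 1), " reduced chi2:", round(float(chi2_red), 3))
```

Each profile is normalized to unit total flux, so the fitted coefficient is directly the source's total flux and no aperture correction is needed. In this toy setup, a reduced χ² close to 1 indicates that the model and the noise estimate are consistent.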
- Accurate prediction of citywide crowd activity levels (CALs), i.e., the numbers of participants in citywide crowd activities under different venue categories at given times and locations, is essential for city management, personal service applications, and entrepreneurs' commercial strategic planning. Existing studies have not thoroughly taken into account the complex spatial and temporal interactions among different categories of CALs and their extreme occurrences, which lowers the adaptivity and accuracy of their models. To address these concerns, we propose IE-CALP, a novel spatio-temporal Interactive attention-based and Extreme-aware model for Crowd Activity Level Prediction. The tasks of IE-CALP consist of (a) forecasting the spatial distributions of various CALs across city regions (spatial CALs), and (b) predicting the number of participants per CAL category (categorical CALs). To achieve this, we design a novel spatial CAL-POI (point-of-interest) interaction-attentive learning component in IE-CALP. This component models the spatial interactions across different CAL categories, as well as those among urban regions and CALs. In addition, IE-CALP incorporates the multi-level trends of CALs (e.g., daily and weekly levels of temporal granularity) through a multi-level temporal feature learning component. Furthermore, to enhance the model's adaptivity to extreme CALs (e.g., during extreme urban events or weather conditions), we draw on extreme value theory and model the impacts of historical CALs on the occurrence of extreme CALs. Extensive experiments on a total of 738,715 CAL records and 246,660 POIs in New York City (NYC), Los Angeles (LA), and Tokyo validate the accuracy, adaptivity, and effectiveness of IE-CALP’s interaction-attentive and extreme-aware CAL predictions.
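The abstract does not spell out IE-CALP's extreme-value formulation. The following is therefore only a generic peaks-over-threshold sketch of the underlying idea. Observations above a high quantile u are treated as extremes, and a generalized Pareto distribution (GPD) is fitted to the exceedances. The fit then gives tail probabilities P(X > u + y) ≈ ζ_u (1 + ξ y / σ)^(−1/ξ), where ζ_u is the exceedance rate. The moment-based fit, the 95% threshold, and the synthetic exponential activity series are all hypothetical choices, not the paper's model.

```python
import numpy as np


def fit_gpd_moments(exceedances):
    """Method-of-moments generalized Pareto fit (shape xi, scale sigma); valid for xi < 1/2."""
    m = exceedances.mean()
    ratio = m * m / exceedances.var(ddof=1)        # equals 1 - 2 xi for a GPD
    return 0.5 * (1.0 - ratio), 0.5 * m * (1.0 + ratio)


def tail_probability(level, threshold, rate, xi, sigma):
    """P(X > level) for level >= threshold under the peaks-over-threshold model."""
    y = level - threshold
    if abs(xi) < 1e-9:                             # exponential limit of the GPD
        return rate * np.exp(-y / sigma)
    base = np.maximum(1.0 + xi * y / sigma, 0.0)   # zero beyond the upper endpoint when xi < 0
    return rate * base ** (-1.0 / xi)


rng = np.random.default_rng(4)
activity = rng.exponential(scale=50.0, size=200_000)  # hypothetical activity-level series
threshold = np.quantile(activity, 0.95)               # "extreme" = top 5% of observations
exceed = activity[activity > threshold] - threshold
rate = exceed.size / activity.size
xi, sigma = fit_gpd_moments(exceed)
level = np.quantile(activity, 0.99)
print(f"xi = {xi:.3f}, sigma = {sigma:.1f}")
print(f"model P(X > q99) = {tail_probability(level, threshold, rate, xi, sigma):.4f}")  # empirical ~0.01
```

For exponential data, the exceedances are themselves exponential. The fitted shape ξ should therefore be near 0 and the scale near 50. A clearly positive ξ would indicate a heavier tail, i.e., more frequent extreme activity levels than an exponential model predicts.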