Abstract We introduce an end-to-end computational framework that allows for hyperparameter optimization using the DeepHyper library, accelerated model training, and interpretable AI inference. The framework is based on state-of-the-art AI models including CGCNN, PhysNet, SchNet, MPNN, MPNN-transformer, and TorchMD-NET. We employ these AI models along with the benchmark QM9, hMOF, and MD17 datasets to showcase how the models can predict user-specified material properties within modern computing environments. We demonstrate transferable applications in the modeling of small molecules, inorganic crystals and nanoporous metal organic frameworks with a unified, standalone framework. We have deployed and tested this framework in the ThetaGPU supercomputer at the Argonne Leadership Computing Facility, and in the Delta supercomputer at the National Center for Supercomputing Applications to provide researchers with modern tools to conduct accelerated AI-driven discovery in leadership-class computing environments. We release these digital assets as open source scientific software in GitLab, and ready-to-use Jupyter notebooks in Google Colab.
An Interpretable Machine-learning Framework for Modeling High-resolution Spectroscopic Data*
Abstract Comparison of échelle spectra to synthetic models has become a computational statistics challenge, with over 10,000 individual spectral lines affecting a typical cool star échelle spectrum. Telluric artifacts, imperfect line lists, inexact continuum placement, and inflexible models frustrate the scientific promise of these information-rich data sets. Here we debut an interpretable machine-learning framework blasé that addresses these and other challenges. The semiempirical approach can be viewed as “transfer learning”: first pretraining models on noise-free precomputed synthetic spectral models, then learning the corrections to line depths and widths from whole-spectrum fitting to an observed spectrum. The auto-differentiable model employs back-propagation, the fundamental algorithm empowering modern deep learning and neural networks. Here, however, the 40,000+ parameters symbolize physically interpretable line profile properties such as amplitude, width, location, and shape, plus radial velocity and rotational broadening. This hybrid data-/model-driven framework allows joint modeling of stellar and telluric lines simultaneously, a potentially transformative step forward for mitigating the deleterious telluric contamination in the near-infrared. The blasé approach acts as both a deconvolution tool and semiempirical model. The general-purpose scaffolding may be extensible to many scientific applications, including precision radial velocities, Doppler imaging, chemical abundances for Galactic archeology, line veiling, magnetic fields, and remote sensing. Its sparse-matrix architecture and GPU acceleration make blasé fast. The open-source PyTorch-based code blase includes tutorials, Application Programming Interface documentation, and more. We show how the tool fits into the existing Python spectroscopy ecosystem, demonstrate a range of astrophysical applications, and discuss limitations and future extensions.
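The two-stage idea in the abstract (start from line parameters taken from a synthetic template, then refine them by gradient-based whole-spectrum fitting) can be illustrated with a small toy sketch. This is not the blasé API: every function name, the Gaussian line shape, the grid, and all numeric values below are hypothetical, and the gradients are derived by hand so the example needs only the Python standard library, whereas the abstract describes an auto-differentiable PyTorch model trained with back-propagation.

```python
import math

def line_profiles(wl, centers, log_amps, log_widths):
    """Per-pixel Gaussian profile values g[i][j] for pixel i and line j."""
    amps = [math.exp(v) for v in log_amps]
    widths = [math.exp(v) for v in log_widths]
    return [[a * math.exp(-0.5 * ((x - c) / w) ** 2)
             for c, a, w in zip(centers, amps, widths)] for x in wl]

def model_flux(wl, centers, log_amps, log_widths):
    """Continuum-normalized flux: 1 minus the summed absorption lines."""
    return [1.0 - sum(row)
            for row in line_profiles(wl, centers, log_amps, log_widths)]

def loss_and_grads(wl, obs, centers, log_amps, log_widths):
    """Mean squared residual and its gradients w.r.t. log-amplitude and log-width."""
    n, m = len(wl), len(centers)
    widths = [math.exp(v) for v in log_widths]
    profiles = line_profiles(wl, centers, log_amps, log_widths)
    loss, g_amp, g_wid = 0.0, [0.0] * m, [0.0] * m
    for x, o, row in zip(wl, obs, profiles):
        resid = (1.0 - sum(row)) - o
        loss += resid * resid / n
        for j in range(m):
            z2 = ((x - centers[j]) / widths[j]) ** 2
            # d(flux)/d(log a) = -g and d(flux)/d(log w) = -g * z^2
            g_amp[j] += 2.0 * resid * (-row[j]) / n
            g_wid[j] += 2.0 * resid * (-row[j] * z2) / n
    return loss, g_amp, g_wid

def refine(wl, obs, centers, log_amps, log_widths, lr=5.0, steps=1500):
    """Plain gradient descent starting from the template ("pretrained") values."""
    la, lw = list(log_amps), list(log_widths)
    for _ in range(steps):
        _, ga, gw = loss_and_grads(wl, obs, centers, la, lw)
        la = [v - lr * g for v, g in zip(la, ga)]
        lw = [v - lr * g for v, g in zip(lw, gw)]
    return la, lw

# Toy data (all values hypothetical): two lines on an arbitrary wavelength grid.
wl = [10.0 * i / 199 for i in range(200)]
centers = [3.0, 7.0]
template = ([math.log(0.30), math.log(0.50)], [math.log(0.20), math.log(0.30)])
truth = ([math.log(0.40), math.log(0.35)], [math.log(0.25), math.log(0.25)])
obs = model_flux(wl, centers, *truth)  # noise-free stand-in for an observed spectrum

loss_before, _, _ = loss_and_grads(wl, obs, centers, *template)
fit_amps, fit_widths = refine(wl, obs, centers, *template)
loss_after, _, _ = loss_and_grads(wl, obs, centers, fit_amps, fit_widths)
print(f"loss: {loss_before:.3e} -> {loss_after:.3e}")
print("fitted amplitudes:", [round(math.exp(v), 3) for v in fit_amps])
```

Fitting in log-amplitude and log-width keeps both quantities positive without explicit constraints, which is one common choice for physically interpretable parameters. A model on the scale the abstract describes (40,000+ parameters, sparse matrices, GPU acceleration) would rely on automatic differentiation instead of hand-written gradients.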
- Award ID(s): 1910969
- PAR ID: 10387356
- Publisher / Repository: DOI PREFIX: 10.3847
- Date Published:
- Journal Name: The Astrophysical Journal
- Volume: 941
- Issue: 2
- ISSN: 0004-637X
- Format(s): Medium: X Size: Article No. 200
- Size(s): Article No. 200
- Sponsoring Org: National Science Foundation
More Like this
- Abstract The third data release (DR3) of Gaia has provided a fivefold increase in the number of radial velocity measurements of stars, as well as a stark improvement in parallax and proper motion measurements. To help with studies that seek to test models and interpret Gaia DR3, we present nine Gaia synthetic surveys, based on three solar positions in three Milky Way-mass galaxies of the Latte suite of the Fire-2 cosmological simulations. These synthetic surveys match the selection function, radial velocity measurements, and photometry of Gaia DR3, adapting the code base Ananke, previously used to match the Gaia DR2 release by Sanderson et al. The synthetic surveys are publicly available and can be found at http://ananke.hub.yt/. Similarly to the previous release of Ananke, these surveys are based on cosmological simulations and thus are able to model nonequilibrium dynamical effects, making them a useful tool in testing and interpreting Gaia DR3.
- Abstract Ground-based high-resolution cross-correlation spectroscopy (HRCCS; R ≳ 15,000) is a powerful complement to space-based studies of exoplanet atmospheres. By resolving individual spectral lines, HRCCS can precisely measure chemical abundance ratios, directly constrain atmospheric dynamics, and robustly probe multidimensional physics. But the subtleties of HRCCS data sets (e.g., the lack of exoplanetary spectra visible by eye and the statistically complex process of telluric removal) can make interpreting them difficult. In this work, we seek to clarify the uncertainty budget of HRCCS with a forward-modeling approach. We present an HRCCS observation simulator, scope (https://github.com/arjunsavel/scope), that incorporates spectral contributions from the exoplanet, star, tellurics, and instrument. This tool allows us to control the underlying data set, enabling controlled experimentation with complex HRCCS methods. Simulating a fiducial hot Jupiter data set (WASP-77Ab emission with IGRINS), we first confirm via multiple tests that the commonly used principal component analysis does not bias the planetary signal when few components are used. Furthermore, we demonstrate that mildly varying tellurics and moderate wavelength solution errors induce only mild decreases in HRCCS detection significance. However, limiting-case, strongly varying tellurics can bias the retrieved velocities and gas abundances. Additionally, in the low signal-to-noise ratio limit, constraints on gas abundances become highly non-Gaussian. Our investigation of the uncertainties and potential biases inherent in HRCCS data analysis enables greater confidence in scientific results from this maturing method.
- Abstract The radius of maximum wind Rmax, an important parameter in tropical cyclone (TC) ocean surface wind structure, is currently resolved by only a few sensors so that, in most cases, it is estimated subjectively or via crude statistical models. Recently, a semiempirical model relying on an outer wind radius, intensity, and latitude was fit to best-track data. In this study we revise this semiempirical model and discuss its physical basis. While intensity and latitude are taken from best-track data, Rmax observations from high-resolution (3 km) spaceborne synthetic aperture radar (SAR) and wind radii from an intercalibrated dataset of medium-resolution radiometers and scatterometers are considered to revise the model coefficients. The new version of the model is then applied to the period 2010–20 and yields Rmax reanalyses and trends that are more accurate than best-track data. SAR measurements corroborate that fundamental conservation principles constrain the radial wind structure on average, endorsing the physical basis of the model. Observations highlight that departures from the average conservation situation are mainly explained by wind profile shape variations, confirming the model’s physical basis, which further shows that radial inflow, boundary layer depth, and drag coefficient also play roles. Physical understanding will benefit from improved observations of the near-core region from accumulated SAR observations and future missions. In the meantime, the revised model offers an efficient tool to provide guidance on Rmax when a radiometer or scatterometer observation is available, for either operations or reanalysis purposes.
- Abstract Multiwavelength observations are now the norm for studying blazars’ various states of activity, classifying them, and determining the possible underlying physical processes driving their emission. Broadband emission models became unavoidable tools for testing emission scenarios and setting the values of physical quantities such as the magnetic field strength, Doppler factor, or shape of the particle distribution of the emission zone(s). We announce here the first public release of a new tool, Bjet_MCMC, that can automatically fit the broadband spectral energy distributions (SEDs) of blazars. The complete code is available on GitHub and allows for testing leptonic synchrotron self-Compton models with or without external inverse-Compton processes from the thermal environment of supermassive black holes (accretion disk and broad-line region). The code is designed to be user-friendly and computationally efficient. It contains a core written in C++ and a fully parallelized SED fitting method. The original multi-SSC zone model of Bjet is also available on GitHub but is not included in the Markov Chain Monte Carlo fitting process at the moment. We present the features, performance, and results of Bjet_MCMC, as well as user advice.
