

Title: The Astronomy Commons Platform: A Deployable Cloud-based Analysis Platform for Astronomy
Abstract We present a scalable, cloud-based science platform solution designed to enable next-to-the-data analyses of terabyte-scale astronomical tabular data sets. The platform is built on Amazon Web Services (over Kubernetes and S3 abstraction layers), uses Apache Spark and the Astronomy eXtensions for Spark (AXS) for parallel data analysis and manipulation, and provides the familiar web-accessible JupyterHub front end for user access. We outline the architecture of the analysis platform, provide implementation details and the rationale for (and against) technology choices, verify scalability through strong and weak scaling tests, and demonstrate usability through an example science analysis of data from the Zwicky Transient Facility’s catalog of more than 1 billion light curves. Furthermore, we show how this system enables an end user to iteratively build analyses (in Python) that transparently scale processing with no need for end-user interaction. The system is designed to be deployable by astronomers with moderate cloud engineering knowledge or, ideally, by IT groups. Over the past 3 yr, it has been used to build science platforms for the DiRAC Institute, the ZTF partnership, the LSST Solar System Science Collaboration, and the LSST Interdisciplinary Network for Collaboration and Computing, as well as for numerous short-term events (with over 100 simultaneous users). The deployment scripts, source code, and cost calculators are accessible in a live demo instance at http://hub.astronomycommons.org/.
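The abstract reports that scalability was verified through strong and weak scaling tests. As a reminder of what those two tests measure, the sketch below uses the standard textbook definitions: in a strong scaling test the total workload is fixed and ideal runtime on n workers is t_1 / n, while in a weak scaling test the workload grows with n and ideal runtime stays at t_1. The function names and the timings in the example are hypothetical illustrations, not the paper's code or measured results.

```python
"""Standard strong- and weak-scaling metrics (textbook definitions)."""


def speedup(t_1: float, t_n: float) -> float:
    """Ratio of the single-worker runtime to the n-worker runtime."""
    return t_1 / t_n


def strong_scaling_efficiency(t_1: float, t_n: float, n: int) -> float:
    """Fixed total workload: ideal runtime on n workers is t_1 / n, giving 1.0."""
    return t_1 / (n * t_n)


def weak_scaling_efficiency(t_1: float, t_n: float) -> float:
    """Workload grows in proportion to n: ideal runtime stays t_1, giving 1.0."""
    return t_1 / t_n


if __name__ == "__main__":
    # Hypothetical runtimes in seconds, keyed by number of workers.
    strong_runs = {1: 1200.0, 2: 620.0, 4: 330.0, 8: 180.0}
    t_1 = strong_runs[1]
    for n, t_n in strong_runs.items():
        print(
            f"n={n}: speedup={speedup(t_1, t_n):.2f}, "
            f"efficiency={strong_scaling_efficiency(t_1, t_n, n):.2f}"
        )
```

An efficiency close to 1.0 indicates near-ideal scaling; values that fall off as n grows usually point to coordination, shuffle, or I/O overheads.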
Award ID(s):
2003196 1739419
PAR ID:
10374480
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
DOI PREFIX: 10.3847
Date Published:
Journal Name:
The Astronomical Journal
Volume:
164
Issue:
2
ISSN:
0004-6256
Format(s):
Medium: X Size: Article No. 68
Size(s):
Article No. 68
Sponsoring Org:
National Science Foundation
More Like this
  1. Research in astronomy is undergoing a major paradigm shift, transformed by the advent of large, automated sky surveys into a data-rich field where multi-TB to PB-sized spatio-temporal data sets are commonplace. For example, the Legacy Survey of Space and Time (LSST) is about to begin delivering observations of >10^10 objects, including a database with >4 × 10^13 rows of time-series data. This volume presents a challenge: how should a domain scientist with little experience in data management or distributed computing access data and perform analyses at PB scale? We present a possible solution to this problem built on (adapted) industry-standard tools and made accessible through web gateways. We have i) developed Astronomy eXtensions for Spark (AXS), a series of astronomy-specific modifications to Apache Spark that allow astronomers to tap into its computational scalability; ii) deployed data sets in an AXS-queryable format in Amazon S3, leveraging its I/O scalability; iii) developed a deployment of Spark on Kubernetes with auto-scaling configurations requiring no end-user interaction; and iv) provided a web-accessible Jupyter notebook front end via JupyterHub, including a rich library of pre-installed common astronomical software (accessible at http://hub.dirac.institute). We use this system to enable the analysis of data from the Zwicky Transient Facility, presently the closest precursor survey to the LSST, and discuss initial results. To our knowledge, this is a first application of cloud-based scalable analytics to astronomical data sets approaching LSST scale. The code is available at https://github.com/astronomy-commons.
  2. Abstract In this work, we present classification results on early supernova light curves from SCONE, a photometric classifier that uses convolutional neural networks to categorize supernovae (SNe) by type using light-curve data. SCONE is able to identify SN types from light curves at any stage, from the night of initial alert to the end of their lifetimes. Simulated LSST SN light curves were truncated at 0, 5, 15, 25, and 50 days after the trigger date and used to train Gaussian processes in wavelength and time space to produce wavelength–time heatmaps. SCONE uses these heatmaps to perform six-way classification between SN types Ia, II, Ibc, Ia-91bg, Iax, and SLSN-I. SCONE is able to perform classification with or without redshift, but we show that incorporating redshift information improves performance at each epoch. SCONE achieved 75% overall accuracy at the date of trigger (60% without redshift) and 89% accuracy 50 days after trigger (82% without redshift). SCONE was also tested on bright subsets of SNe (r < 20 mag) and produced 91% accuracy at the date of trigger (83% without redshift) and 95% five days after trigger (94.7% without redshift). SCONE is the first application of convolutional neural networks to the early-time photometric transient classification problem. All of the data processing and model code developed for this paper can be found in the SCONE software package, located at github.com/helenqu/scone (Qu 2021).
  3. Abstract Accelerating the design and development of new advanced materials is one of the priorities in modern materials science. These efforts are critically dependent on the development of comprehensive materials cyberinfrastructures that enable efficient data storage, management, sharing, and collaboration, as well as the integration of computational tools that help establish processing–structure–property relationships. In this contribution, we present the implementation of such computational tools in a cloud-based platform called BisQue (Kvilekval et al., Bioinformatics 26(4):554, 2010). We first describe the current state of BisQue as an open-source platform for multidisciplinary research in the cloud and its potential for 3D materials science. We then demonstrate how new computational tools, primarily aimed at processing–structure–property relationships, can be implemented in the system. Specifically, in this work, we develop a module for BisQue that enables microstructure-sensitive predictions of the effective yield strength of two-phase materials. To this end, we present an implementation of a computationally efficient data-driven model in the BisQue platform. The new module is made available online (web address: https://bisque.ece.ucsb.edu/module_service/Composite_Strength/) and can be used from a web browser without any special software and with minimal computational requirements on the user end. The capabilities of the module for rapid property screening are demonstrated in case studies with two different methodologies based on data sets containing 3D microstructure information from (i) synthetic generation and (ii) sampling of large 3D volumes obtained in experiments.
  4. Abstract Reliable studies of the long-term dynamics of planetary systems require numerical integrators that are accurate and fast. The challenge is often formidable because the chaotic nature of many systems requires relative numerical error bounds at or close to machine precision (∼10^−16, double-precision arithmetic); otherwise, numerical chaos may dominate over physical chaos. Currently, the speed/accuracy demands are usually only met by symplectic integrators. For example, the most up-to-date long-term astronomical solutions for the solar system in the past (widely used in, e.g., astrochronology and high-precision geological dating) have been obtained using symplectic integrators. However, the source codes of these integrators are unavailable. Here I present the symplectic integrator orbitN (lean version 1.0) with the primary goal of generating accurate and reproducible long-term orbital solutions for near-Keplerian planetary systems (here the solar system) with a dominant mass M0. Among other features, orbitN-1.0 includes M0’s quadrupole moment, a lunar contribution, and post-Newtonian corrections (1PN) due to M0 (fast symplectic implementation). To reduce numerical round-off errors, Kahan compensated summation was implemented. I use orbitN to provide insight into the effect of various processes on the long-term chaos in the solar system. Notably, 1PN corrections have the opposite effect on chaoticity/stability on a 100 Myr versus Gyr timescale. For the current application, orbitN is about as fast as or faster (by a factor of 1.15–2.6) than comparable integrators, depending on hardware. The orbitN source code (C) is available at http://github.com/rezeebe/orbitN.
  5. Abstract The availability of and easy access to large-scale experimental and computational materials data have accelerated the development of algorithms and models for materials property prediction, structure prediction, and generative design of materials. However, the lack of user-friendly materials informatics web servers has severely constrained the wide adoption of such tools in the daily practice of materials screening, tinkering, and design space exploration by materials scientists. Herein we first survey current materials informatics web apps and then propose and develop MaterialsAtlas.org, a web-based materials informatics toolbox for materials discovery, which includes a variety of routinely needed tools for exploratory materials discovery, including composition and structure validity checks (e.g., charge neutrality, electronegativity balance, dynamic stability, Pauling rules), materials property prediction (e.g., band gap, elastic moduli, hardness, and thermal conductivity), search for hypothetical materials, and utility tools. These user-friendly tools can be freely accessed at http://www.materialsatlas.org. We argue that such materials informatics apps should be widely developed by the community to speed up materials discovery processes.
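Item 2 describes truncating simulated light curves at 0, 5, 15, 25, and 50 days after the trigger date to mimic early-time classification. A minimal sketch of that truncation step follows, assuming each observation is a hypothetical `(mjd, band, flux)` tuple and that points before the trigger are kept; it is not taken from the SCONE package, and the Gaussian-process heatmap step that follows in SCONE is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical observation layout: (MJD of observation, filter band, flux).
Observation = Tuple[float, str, float]

# Truncation epochs (days after trigger) listed in the SCONE abstract.
TRUNCATION_DAYS: Tuple[int, ...] = (0, 5, 15, 25, 50)


def truncate_light_curve(
    obs: List[Observation], trigger_mjd: float, days_after: float
) -> List[Observation]:
    """Keep every observation up to trigger_mjd + days_after (inclusive).

    Assumption: pre-trigger points are retained, since they would already
    be available at the time of classification.
    """
    cutoff = trigger_mjd + days_after
    return [o for o in obs if o[0] <= cutoff]


def truncated_versions(
    obs: List[Observation], trigger_mjd: float
) -> Dict[int, List[Observation]]:
    """Build one truncated copy of the light curve per truncation epoch."""
    return {d: truncate_light_curve(obs, trigger_mjd, d) for d in TRUNCATION_DAYS}


if __name__ == "__main__":
    lc = [
        (58995.0, "r", 0.2),
        (59000.0, "g", 1.0),
        (59003.0, "r", 2.0),
        (59010.0, "g", 3.0),
        (59060.0, "r", 0.5),
    ]
    for d, points in truncated_versions(lc, trigger_mjd=59000.0).items():
        print(d, len(points))
```

Each truncated copy would then be passed to whatever featurization the classifier expects, so a single simulated event yields one training example per epoch.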
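Item 4 notes that orbitN uses Kahan compensated summation to reduce round-off error. The sketch below shows the standard Kahan algorithm in Python for illustration only; orbitN itself is written in C, and this is not its source. The compensation term captures the low-order bits lost when a small value is added to a large running total and feeds them back into the next addition.

```python
from typing import Iterable


def naive_sum(values: Iterable[float]) -> float:
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values: Iterable[float]) -> float:
    """Kahan compensated summation."""
    total = 0.0
    compensation = 0.0  # running estimate of the lost low-order part
    for x in values:
        y = x - compensation      # correct the next term by the previous error
        t = total + y             # low-order digits of y may be lost here
        compensation = (t - total) - y  # algebraically zero; captures the loss
        total = t
    return total


if __name__ == "__main__":
    # 1.0 followed by a million terms each smaller than half an ulp of 1.0.
    values = [1.0] + [1e-16] * 10**6
    print(naive_sum(values))  # every tiny term is rounded away
    print(kahan_sum(values))  # close to 1.0 + 1e-10
```

A hand-written loop is used for the naive baseline because Python 3.12 and later apply compensated summation inside the built-in `sum()` for floats, which would hide the effect.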
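Item 5 lists charge neutrality among the composition validity checks offered by MaterialsAtlas.org. A minimal sketch of one way such a check can work follows: enumerate candidate oxidation states per element and test whether any combination sums to zero charge. The oxidation-state table is a deliberately tiny, hypothetical stand-in, each element is assigned a single oxidation state (so mixed-valence compounds such as Fe3O4 are not recognized), and this is not MaterialsAtlas's actual implementation.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical, illustration-only table of candidate oxidation states.
CANDIDATE_OXIDATION_STATES: Dict[str, Tuple[int, ...]] = {
    "Fe": (2, 3),
    "O": (-2,),
    "Na": (1,),
    "Cl": (-1,),
    "Ti": (2, 3, 4),
    "Sr": (2,),
}


def charge_neutral_assignments(composition: Dict[str, int]) -> List[Dict[str, int]]:
    """Return every single-state-per-element assignment with zero net charge.

    composition maps element symbol -> number of atoms in the formula unit.
    Raises KeyError for elements missing from the candidate table.
    """
    elements = list(composition)
    choices = [CANDIDATE_OXIDATION_STATES[el] for el in elements]
    solutions = []
    for states in product(*choices):
        net_charge = sum(composition[el] * s for el, s in zip(elements, states))
        if net_charge == 0:
            solutions.append(dict(zip(elements, states)))
    return solutions


def is_charge_neutral(composition: Dict[str, int]) -> bool:
    """True if at least one neutral oxidation-state assignment exists."""
    return bool(charge_neutral_assignments(composition))


if __name__ == "__main__":
    print(charge_neutral_assignments({"Fe": 2, "O": 3}))
    print(charge_neutral_assignments({"Sr": 1, "Ti": 1, "O": 3}))
    print(is_charge_neutral({"Na": 1, "Cl": 2}))
```

Exhaustive enumeration is fine for small formulas; a production tool would draw on a curated oxidation-state database and allow mixed valence.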