skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A Scalable Cloud-Based Analysis Platform for Survey Astronomy
Research in astronomy is undergoing a major paradigm shift, transformed by the advent of large, automated, sky-surveys into a data-rich field where multi-TB to PB-sized spatio-temporal data sets are commonplace. For example the Legacy Survey of Space and Time; LSST) is about to begin delivering observations of >10^10 objects, including a database with >4 x 10^13 rows of time series data. This volume presents a challenge: how should a domain-scientist with little experience in data management or distributed computing access data and perform analyses at PB-scale? We present a possible solution to this problem built on (adapted) industry standard tools and made accessible through web gateways. We have i) developed Astronomy eXtensions for Spark, AXS, a series of astronomy-specific modifications to Apache Spark allowing astronomers to tap into its computational scalability ii) deployed datasets in AXS-queriable format in Amazon S3, leveraging its I/O scalability, iii) developed a deployment of Spark on Kubernetes with auto-scaling configurations requiring no end-user interaction, and iv) provided a Jupyter notebook, web-accessible, front-end via JupyterHub including a rich library of pre-installed common astronomical software (accessible at http://hub.dirac.institute). We use this system to enable the analysis of data from the Zwicky Transient Facility, presently the closest precursor survey to the LSST, and discuss initial results. To our knowledge, this is a first application of cloud-based scalable analytics to astronomical datasets approaching LSST-scale. The code is available at https://github.com/astronomy-commons.  more » « less
Award ID(s):
2003196
PAR ID:
10282752
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Gateways 2020
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We present a scalable, cloud-based science platform solution designed to enable next-to-the-data analyses of terabyte-scale astronomical tabular data sets. The presented platform is built on Amazon Web Services (over Kubernetes and S3 abstraction layers), utilizes Apache Spark and the Astronomy eXtensions for Spark for parallel data analysis and manipulation, and provides the familiar JupyterHub web-accessible front end for user access. We outline the architecture of the analysis platform, provide implementation details and rationale for (and against) technology choices, verify scalability through strong and weak scaling tests, and demonstrate usability through an example science analysis of data from the Zwicky Transient Facility’s 1Bn+ light-curve catalog. Furthermore, we show how this system enables an end user to iteratively build analyses (in Python) that transparently scale processing with no need for end-user interaction. The system is designed to be deployable by astronomers with moderate cloud engineering knowledge, or (ideally) IT groups. Over the past 3 yr, it has been utilized to build science platforms for the DiRAC Institute, the ZTF partnership, the LSST Solar System Science Collaboration, and the LSST Interdisciplinary Network for Collaboration and Computing, as well as for numerous short-term events (with over 100 simultaneous users). In a live demo instance, the deployment scripts, source code, and cost calculators are accessible.44http://hub.astronomycommons.org/ 
    more » « less
  2. null (Ed.)
    The Legacy Survey of Space and Time, operated by the Vera C. Rubin Observatory, is a 10-year astronomical survey due to start operations in 2022 that will image half the sky every three nights. LSST will produce ~20TB of raw data per night which will be calibrated and analyzed in almost real-time. Given the volume of LSST data, the traditional subset-download-process paradigm of data reprocessing faces significant challenges. We describe here, the first steps towards a gateway for astronomical science that would enable astronomers to analyze images and catalogs at scale. In this first step, we focus on executing the Rubin LSST Science Pipelines, a collection of image and catalog processing algorithms, on Amazon Web Services (AWS). We describe our initial impressions of the performance, scalability, and cost of deploying such a system in the cloud. 
    more » « less
  3. Abstract Next-generation surveys like the Legacy Survey of Space and Time (LSST) on the Vera C. Rubin Observatory (Rubin) will generate orders of magnitude more discoveries of transients and variable stars than previous surveys. To prepare for this data deluge, we developed the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC), a competition that aimed to catalyze the development of robust classifiers under LSST-like conditions of a nonrepresentative training set for a large photometric test set of imbalanced classes. Over 1000 teams participated in PLAsTiCC, which was hosted in the Kaggle data science competition platform between 2018 September 28 and 2018 December 17, ultimately identifying three winners in 2019 February. Participants produced classifiers employing a diverse set of machine-learning techniques including hybrid combinations and ensemble averages of a range of approaches, among them boosted decision trees, neural networks, and multilayer perceptrons. The strong performance of the top three classifiers on Type Ia supernovae and kilonovae represent a major improvement over the current state of the art within astronomy. This paper summarizes the most promising methods and evaluates their results in detail, highlighting future directions both for classifier development and simulation needs for a next-generation PLAsTiCC data set. 
    more » « less
  4. Abstract Using published simulations of the 10 yr Legacy Survey of Space and Time (LSST), we forecast its ability to determine the masses of individual main-belt asteroids (MBAs) through precise astrometry of any pairs of the ≈1.2 million known MBAs undergoing close gravitational encounters during the survey. The uncertaintyσIon the impulse applied to a tracer asteroid by its deflector is derived from the Fisher matrix of the tracer’s astrometric data, including an azimuthal acceleration from the Yarkovsky effect as a free parameter for each tracer. If only LSST observations are available, the mean forecastedσIis 7 × 10−6m s−1for MBAs at apparent magnitudemV < 19.5, degrading ≈10× formV = 23. The forecasted median uncertainty on the mass of a deflector MBA is ≈3 × 10−14M, with a wide range of variation depending on the (random) configurations of the closest encounters. All of these figures improve ≈ fivefold if strong pre-LSST astrometry is available. Use of LSST data will increase the number of MBAs with masses measured to 20% accuracy or better from the present ≈70 to between 200 and 550, depending on quality of pre-LSST data, if all MBAs have mass-to-reflected-light ratios (M/Ls) at the low end of the currently measured bodies. If high-end M/Ls are more common, 580–1100 bodies will attain a signal-to-noise ratio > 5, including nearly all bodies with absolute magnitudeH < 10 and some as faint asH = 13. We forecast that ≈20 new Jupiter Trojans could also have masses measured at <20% accuracy. Tables of the measurable deflector MBAs and their tracers are provided. 
    more » « less
  5. Abstract Light echoes (LEs) are the reflections of astrophysical transients off of interstellar dust. They are fascinating astronomical phenomena that enable studies of the scattering dust as well as of the original transients. LEs, however, are rare and extremely difficult to detect as they appear as faint, diffuse, time-evolving features. The detection of LEs still largely relies on human inspection of images, a method unfeasible in the era of large synoptic surveys. The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) will generate an unprecedented amount of astronomical imaging data at high spatial resolution, exquisite image quality, and over tens of thousands of square degrees of sky: an ideal survey for LEs. However, the Rubin data processing pipelines are optimized for the detection of point sources and will entirely miss LEs. Over the past several years, artificial intelligence (AI) object-detection frameworks have achieved and surpassed real-time, human-level performance. In this work, we leverage a data set from the Asteroid Terrestrial-impact Last Alert System telescope to test a popular AI object-detection framework, You Only Look Once, or YOLO, developed by the computer-vision community, to demonstrate the potential of AI for the detection of LEs in astronomical images. We find that an AI framework can reach human-level performance even with a size- and quality-limited data set. We explore and highlight challenges, including class imbalance and label incompleteness, and road map the work required to build an end-to-end pipeline for the automated detection and study of LEs in high-throughput astronomical surveys. 
    more » « less