skip to main content


Search for: All records

Award ID contains: 1739419

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    We present a scalable, cloud-based science platform solution designed to enable next-to-the-data analyses of terabyte-scale astronomical tabular data sets. The presented platform is built on Amazon Web Services (over Kubernetes and S3 abstraction layers), utilizes Apache Spark and the Astronomy eXtensions for Spark for parallel data analysis and manipulation, and provides the familiar JupyterHub web-accessible front end for user access. We outline the architecture of the analysis platform, provide implementation details and rationale for (and against) technology choices, verify scalability through strong and weak scaling tests, and demonstrate usability through an example science analysis of data from the Zwicky Transient Facility’s 1Bn+ light-curve catalog. Furthermore, we show how this system enables an end user to iteratively build analyses (in Python) that transparently scale processing with no need for end-user interaction. The system is designed to be deployable by astronomers with moderate cloud engineering knowledge, or (ideally) IT groups. Over the past 3 yr, it has been utilized to build science platforms for the DiRAC Institute, the ZTF partnership, the LSST Solar System Science Collaboration, and the LSST Interdisciplinary Network for Collaboration and Computing, as well as for numerous short-term events (with over 100 simultaneous users). In a live demo instance, the deployment scripts, source code, and cost calculators are accessible.4

    http://hub.astronomycommons.org/

     
    more » « less
  2. Abstract Trans-Neptunian objects provide a window into the history of the solar system, but they can be challenging to observe due to their distance from the Sun and relatively low brightness. Here we report the detection of 75 moving objects that we could not link to any other known objects, the faintest of which has a VR magnitude of 25.02 ± 0.93 using the Kernel-Based Moving Object Detection (KBMOD) platform. We recover an additional 24 sources with previously known orbits. We place constraints on the barycentric distance, inclination, and longitude of ascending node of these objects. The unidentified objects have a median barycentric distance of 41.28 au, placing them in the outer solar system. The observed inclination and magnitude distribution of all detected objects is consistent with previously published KBO distributions. We describe extensions to KBMOD, including a robust percentile-based lightcurve filter, an in-line graphics-processing unit filter, new coadded stamp generation, and a convolutional neural network stamp filter, which allow KBMOD to take advantage of difference images. These enhancements mark a significant improvement in the readiness of KBMOD for deployment on future big data surveys such as LSST. 
    more » « less
  3. We design, implement, and evaluate DeepEverest, a system for the efficient execution of interpretation by example queries over the activation values of a deep neural network. DeepEverest consists of an efficient indexing technique and a query execution algorithm with various optimizations. We prove that the proposed query execution algorithm is instance optimal. Experiments with our prototype show that DeepEverest, using less than 20% of the storage of full materialization, significantly accelerates individual queries by up to 63X and consistently outperforms other methods on multi-query workloads that simulate DNN interpretation processes. 
    more » « less
  4. null (Ed.)
    The Legacy Survey of Space and Time, operated by the Vera C. Rubin Observatory, is a 10-year astronomical survey due to start operations in 2022 that will image half the sky every three nights. LSST will produce ~20TB of raw data per night which will be calibrated and analyzed in almost real-time. Given the volume of LSST data, the traditional subset-download-process paradigm of data reprocessing faces significant challenges. We describe here, the first steps towards a gateway for astronomical science that would enable astronomers to analyze images and catalogs at scale. In this first step, we focus on executing the Rubin LSST Science Pipelines, a collection of image and catalog processing algorithms, on Amazon Web Services (AWS). We describe our initial impressions of the performance, scalability, and cost of deploying such a system in the cloud. 
    more » « less
  5. Deep learning (DL) models have achieved paradigm-changing performance in many fields with high dimensional data, such as images, audio, and text. However, the black-box nature of deep neural networks is not only a barrier to adoption in applications such as medical diagnosis, where interpretability is essential, but it also impedes diagnosis of under performing models. The task of diagnosing or explaining DL models requires the computation of additional artifacts, such as activation values and gradients. These artifacts are large in volume, and their computation, storage, and querying raise significant data management challenges. In this paper, we develop a novel data sampling technique that produces approximate but accurate results for these model debugging queries. Our sampling technique utilizes the lower dimension representation learned by the DL model and focuses on model decision boundaries for the data in this lower dimensional space. 
    more » « less