Title: How the Scientific Python ecosystem helps answer fundamental questions of the Universe
The ATLAS experiment at CERN explores vast amounts of physics data to answer the most fundamental questions of the Universe. The prevalence of Python in scientific computing motivated ATLAS to adopt it for its data analysis workflows while enhancing users’ experience. This paper will describe to a broad audience how a large scientific collaboration leverages the power of the Scientific Python ecosystem to tackle domain-specific challenges and advance our understanding of the Cosmos. Through a simplified example of the renowned Higgs boson discovery, attendees will gain insights into the utilization of Python libraries to discriminate a signal in immersive noise, through tasks such as data cleaning, feature engineering, statistical interpretation and visualization at scale.
Award ID(s):
2209034
PAR ID:
10639024
Publisher / Repository:
proceedings.scipy.org
Page Range / eLocation ID:
280 to 290
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Szumlak, T; Rachwał, B; Dziurda, A; Schulz, M; vom Bruch, D; Ellis, K; Hageboeck, S (Ed.)
    The ATLAS experiment is currently developing columnar analysis frameworks which leverage the Python data science ecosystem. We describe the construction and operation of the infrastructure necessary to support demonstrations of these frameworks, with a focus on those from IRIS-HEP. One such demonstrator aims to process the compact ATLAS data format PHYSLITE at rates exceeding 200 Gbps. Various access configurations and setups on different sites are explored, including direct access to a dCache storage system via Xrootd, the use of ServiceX, and the use of multiple XCache servers equipped with NVMe storage devices. Integral to this study was the analysis of network traffic and bottlenecks, worker node scheduling and disk configurations, and the performance of an S3 object store. The system’s overall performance was measured as the number of processing cores scaled to over 2,000 and the volume of data accessed in an interactive session approached 200 TB. The presentation will delve into the operational details and findings related to the physical infrastructure that underpins these demonstrators. 
  2. Szumlak, T; Rachwał, B; Dziurda, A; Schulz, M; vom Bruch, D; Ellis, K; Hageboeck, S (Ed.)
    The IRIS-HEP software institute, as a contributor to the broader HEP Python ecosystem, is developing scalable analysis infrastructure and software tools to address the upcoming HL-LHC computing challenges with new approaches and paradigms, driven by our vision of what HL-LHC analysis will require. The institute uses a “Grand Challenge” format, constructing a series of increasingly large, complex, and realistic exercises to show the vision of HL-LHC analysis. Recently, the focus has been demonstrating the IRIS-HEP analysis infrastructure at scale and evaluating technology readiness for production. As a part of the Analysis Grand Challenge activities, the institute executed a “200 Gbps Challenge”, aiming to show sustained data rates into the event processing of multiple analysis pipelines. The challenge integrated teams internal and external to the institute, including operations and facilities, analysis software tools, innovative data delivery and management services, and scalable analysis infrastructure. The challenge showcases the prototypes — including software, services, and facilities — built to process around 200 TB of data in both the CMS NanoAOD and ATLAS PHYSLITE data formats with test pipelines. The teams were able to sustain the 200 Gbps target across multiple pipelines. The pipelines focusing on event rate were able to process at over 30 MHz. These target rates are demanding; the activity revealed considerations for future testing at this scale and changes necessary for physicists to work at this scale in the future. The 200 Gbps Challenge has established a baseline on today’s facilities, setting the stage for the next exercise at twice the scale. 
  3. The FASER experiment is a new small and inexpensive experiment that is being placed 480 meters downstream of the ATLAS experiment at the CERN LHC. The experiment will shed light on currently unexplored phenomena, having the potential to make a revolutionary discovery. FASER is designed to capture decays of exotic particles, produced in the very forward region, out of the ATLAS detector acceptance. This talk will present the physics prospects, the detector design, and the construction progress of FASER. The experiment has been successfully installed and will take data during the LHC Run-3. 
  4. The FASER experiment is a new small and inexpensive experiment that is placed 480 meters downstream of the ATLAS experiment at the CERN LHC. FASER is designed to capture decays of new long-lived particles, produced outside of the ATLAS detector acceptance. These rare particles can decay in the FASER detector together with about 500–1000 Hz of other particles originating from the ATLAS interaction point. A very high efficiency trigger and data acquisition system is required to ensure that the physics events of interest will be recorded. This paper describes the trigger and data acquisition system of the FASER experiment and presents performance results of the system acquired during initial commissioning.
  5. The NASA Planetary Data System (PDS) hosts millions of images of planets, moons, and other bodies collected throughout many missions. The ever-expanding nature of data and user engagement demands an interpretable content classification system to support scientific discovery and individual curiosity. In this paper, we leverage a prototype-based architecture to enable users to understand and validate the evidence used by a classifier trained on images from the Mars Science Laboratory (MSL) Curiosity rover mission. In addition to providing explanations, we investigate the diversity and correctness of evidence used by the content-based classifier. The work presented in this paper will be deployed on the PDS Image Atlas, replacing its non-interpretable counterpart. 