Title: An array-oriented Python interface for FastJet
Abstract: Analysis of HEP data is an iterative process in which the results of one step often inform the next. In an exploratory analysis, it is common to perform one computation on a collection of events, then view the results (often with histograms) to decide what to try next. Awkward Array is a Scikit-HEP Python package that enables data analysis with array-at-a-time operations to implement cuts as slices, combinatorics as composable functions, etc. However, most C++ HEP libraries, such as FastJet, have an imperative, one-particle-at-a-time interface, which would be inefficient in Python and goes against the grain of the array-at-a-time logic of scientific Python. Therefore, we developed fastjet, a pip-installable Python package that provides FastJet C++ binaries, the classic (particle-at-a-time) Python interface, and a new array-oriented interface for use with Awkward Array. The new interface streamlines interoperability with scientific Python software beyond HEP, such as machine learning. In one case, adopting this library along with other array-oriented tools accelerated HEP analysis code by a factor of 20. It was designed to be easily integrated with libraries in the Scikit-HEP ecosystem, including Uproot (file I/O), hist (histogramming), Vector (Lorentz vectors), and Coffea (high-level glue). We discuss the design of the fastjet Python library, integrating the classic interface with the array-oriented interface and with the Vector library for Lorentz vector operations. The new interface was developed as open source.
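For illustration, a minimal sketch of clustering jets through the array-oriented interface, using names from the fastjet and Awkward Array package documentation (exact signatures may differ between versions):

    import awkward as ak
    import vector
    import fastjet

    vector.register_awkward()  # attach Lorentz-vector behaviors to Awkward records

    # A jagged (event x particle) array of momentum four-vector records.
    particles = ak.Array(
        [
            [
                {"px": 1.2, "py": 3.2, "pz": 5.4, "E": 6.5},
                {"px": 32.2, "py": 64.2, "pz": 543.3, "E": 600.1},
                {"px": 32.5, "py": 63.2, "pz": 543.1, "E": 599.6},
            ],
            [{"px": 0.4, "py": -1.1, "pz": 2.0, "E": 2.4}],
        ],
        with_name="Momentum4D",
    )

    # Cluster all events at once: anti-kt jets with R = 0.8.
    jetdef = fastjet.JetDefinition(fastjet.antikt_algorithm, 0.8)
    cluster = fastjet.ClusterSequence(particles, jetdef)

    jets = cluster.inclusive_jets()        # one list of jets per event
    constituents = cluster.constituents()  # jet constituents, still jagged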
Award ID(s):
1836650
PAR ID:
10430293
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of Physics: Conference Series
Volume:
2438
Issue:
1
ISSN:
1742-6588
Page Range / eLocation ID:
012011
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W. (Ed.)
    Boost.Histogram, a header-only C++14 library that provides multidimensional histograms and profiles, became available in Boost 1.70. It is extensible, fast, and uses modern C++ features. Using template metaprogramming, the most efficient code path for any given configuration is automatically selected. The library includes key features designed for the particle physics community, such as optional under- and overflow bins, weighted increments, reductions, growing axes, thread-safe filling, and memory-efficient counters with high dynamic range. Python bindings for Boost.Histogram are being developed in the Scikit-HEP project to provide a fast, easy-to-install package as a backend for other Python libraries and for advanced users to manipulate histograms. Versatile and efficient histogram filling, effective manipulation, multithreading support, and other features make this a powerful tool. This library has also driven package distribution efforts in Scikit-HEP, allowing binary packages hosted on PyPI to be available for a very wide variety of platforms. Two other libraries fill out the remainder of the Scikit-HEP Python histogramming effort. Aghast is a library designed to provide conversions between different forms of histograms, enabling interaction between histogram libraries, often without an extra copy in memory. This enables a user to make a histogram in one library and then save it in another form, such as saving a Boost.Histogram in ROOT. Hist is a library providing friendly, analyst-targeted syntax and shortcuts for quick manipulations and fast plotting using these two libraries.
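    For illustration, a minimal sketch of filling and inspecting a multidimensional histogram through the Python bindings, using names from the boost-histogram package documentation (exact options and signatures may vary by version):

        import numpy as np
        import boost_histogram as bh

        # A 2D histogram: regular binning in x, variable binning in y,
        # with under- and overflow bins included by default.
        hist_2d = bh.Histogram(
            bh.axis.Regular(50, -3.0, 3.0),
            bh.axis.Variable([0.0, 0.5, 1.0, 2.0, 5.0]),
        )

        # Fill with arrays (one array per axis); filling is array-at-a-time.
        rng = np.random.default_rng(42)
        hist_2d.fill(rng.normal(size=1_000), rng.exponential(size=1_000))

        counts = hist_2d.view()          # bin contents as a NumPy array
        x_edges = hist_2d.axes[0].edges  # bin edges of the first axis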
  2. Biscarat, C.; Campana, S.; Hegner, B.; Roiser, S.; Rovelli, C.I.; Stewart, G.A. (Ed.)
    Array operations are one of the most concise ways of expressing the common filtering and simple aggregation operations that are the hallmark of a particle physics analysis: selection, filtering, basic vector operations, and filling histograms. The High Luminosity run of the Large Hadron Collider (HL-LHC), scheduled to start in 2026, will require physicists to regularly skim datasets that are over a PB in size and to repeatedly run over datasets of hundreds of TB, too big to fit in memory. Declarative programming techniques are a way of separating the intent of the physicist from the mechanics of finding the data and using distributed computing to process it and make histograms. This paper describes a library that implements a declarative distributed framework based on array programming. The prototype library provides a framework in which different sub-systems cooperate, via plug-ins, to produce plots. The prototype has a ServiceX data-delivery sub-system and an Awkward Array sub-system that cooperate to generate the requested data or plots: the ServiceX system runs against ATLAS xAOD data and flat ROOT TTrees, and the Awkward Array sub-system operates on the columnar data produced by ServiceX.
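    As an illustration of the array idiom the paper builds on (this is plain Awkward Array code, not the prototype framework itself; the field names are invented for the example):

        import numpy as np
        import awkward as ak

        # Toy jagged events: a variable number of electrons per event.
        events = ak.Array(
            {
                "electron_pt": [[35.2, 12.1], [], [50.3, 22.8, 8.9]],
                "electron_eta": [[0.3, -1.2], [], [1.9, -0.4, 2.3]],
            }
        )

        # Selection as a slice: keep electrons with pT > 20 GeV and |eta| < 2.0.
        mask = (events.electron_pt > 20.0) & (np.abs(events.electron_eta) < 2.0)
        selected_pt = events.electron_pt[mask]

        # Aggregation: leading selected-electron pT per event (None where nothing survives).
        leading_pt = ak.max(selected_pt, axis=1)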
  3. In the past decade, distributed acoustic sensing (DAS) has enabled many new monitoring applications in diverse fields including hydrocarbon exploration and extraction; induced, local, regional, and global seismology; infrastructure and urban monitoring; and several others. However, to date, the open-source software ecosystem for handling DAS data is relatively immature. Here we introduce DASCore, a Python library for analyzing, visualizing, and managing DAS data. DASCore implements an object-oriented interface for performing common data processing and transformations, reading and writing various DAS file types, creating simple visualizations, and managing file system-based DAS archives. DASCore also integrates with other Python-based tools which enable the processing of massive data sets in cloud environments. DASCore is the foundational package for the broader DAS data analysis ecosystem (DASDAE), and as such its main goal is to facilitate the development of other DAS libraries and applications. 
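    A minimal sketch of the object-oriented patch interface, based on DASCore's quickstart documentation as best recalled; treat the exact function names and signatures below as assumptions that may differ between versions:

        import dascore as dc

        # Load a small example patch (a contiguous block of DAS data) bundled with the library.
        patch = dc.get_example_patch()

        # Band-pass filter along the time dimension (corner frequencies in Hz),
        # then visualize the result as a waterfall plot.
        filtered = patch.pass_filter(time=(1, 50))
        filtered.viz.waterfall(show=True)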
  4. Vanschoren, J. (Ed.)
    As data are generated from more and more disparate sources, multiview data sets, in which each sample has features in several distinct views, have become increasingly common. However, no comprehensive package has existed that enables non-specialists to use multiview methods easily. mvlearn is a Python library that implements the leading multiview machine learning methods. Its simple API closely follows that of scikit-learn for increased ease of use. The package can be installed from the Python Package Index (PyPI) and the conda package manager and is released under the MIT open-source license. The documentation, detailed examples, and all releases are available at https://mvlearn.github.io/.
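    A minimal sketch of the scikit-learn-style API; the MVMDS estimator and its fit_transform signature are recalled from mvlearn's documentation and should be treated as assumptions:

        import numpy as np
        from mvlearn.embed import MVMDS  # multiview multidimensional scaling

        # Two toy "views" of the same 100 samples, with different feature dimensions.
        rng = np.random.default_rng(0)
        view_1 = rng.normal(size=(100, 10))
        view_2 = rng.normal(size=(100, 25))

        # scikit-learn-style estimator: construct, then fit_transform on a list of views.
        embedder = MVMDS(n_components=2)
        embedding = embedder.fit_transform([view_1, view_2])  # shape (100, 2)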
  5. Schneidman-Duhovny, Dina (Ed.)
    Gmxapi provides an integrated, native Python API for both standard and advanced molecular dynamics simulations in GROMACS. The Python interface permits multiple levels of integration with the core GROMACS libraries, and legacy support is provided via an interface that mimics the command-line syntax, so that all GROMACS commands are fully available. Gmxapi has been officially supported since the GROMACS 2019 release and is enabled by default in current versions of the software. Here we describe gmxapi 0.3 and later. Beyond simply wrapping GROMACS library operations, the API permits several advanced operations that are not feasible using the prior command-line interface. First, the API allows custom user plugin code within the molecular dynamics force calculations, so users can execute custom algorithms without modifying the GROMACS source. Second, the Python interface allows tasks to be dynamically defined, so high-level algorithms for molecular dynamics simulation and analysis can be coordinated with loop and conditional operations. Gmxapi makes GROMACS more accessible to custom Python scripting while also providing support for high-level data-flow simulation algorithms that were previously feasible only in external packages. 
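    A minimal sketch of the two levels of access described above (legacy command-line wrapping and the native API), using operation names from the gmxapi documentation; the file names are placeholders and exact signatures may differ by version:

        import gmxapi as gmx

        # Legacy-style access: wrap a GROMACS command-line tool as an API operation.
        grompp = gmx.commandline_operation(
            "gmx",
            arguments=["grompp"],
            input_files={"-f": "run.mdp", "-p": "topol.top", "-c": "conf.gro"},
            output_files={"-o": "run.tpr"},
        )

        # Native API: read the prepared input and declare an MD task;
        # the work executes when results are requested.
        simulation_input = gmx.read_tpr(grompp.output.file["-o"])
        md = gmx.mdrun(simulation_input)
        md.run()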