Title: Recent developments in histogram libraries
Boost.Histogram, a header-only C++14 library that provides multidimensional histograms and profiles, became available in Boost 1.70. It is extensible, fast, and uses modern C++ features. Using template metaprogramming, the most efficient code path for any given configuration is automatically selected. The library includes key features designed for the particle physics community, such as optional under- and overflow bins, weighted increments, reductions, growing axes, thread-safe filling, and memory-efficient counters with high dynamic range. Python bindings for Boost.Histogram are being developed in the Scikit-HEP project to provide a fast, easy-to-install package as a backend for other Python libraries and for advanced users to manipulate histograms. Versatile and efficient histogram filling, effective manipulation, multithreading support, and other features make this a powerful tool. This library has also driven package distribution efforts in Scikit-HEP, allowing binary packages hosted on PyPI to be available for a very wide variety of platforms. Two other libraries fill out the remainder of the Scikit-HEP Python histogramming effort. Aghast is a library designed to provide conversions between different forms of histograms, enabling interaction between histogram libraries, often without an extra copy in memory. This enables a user to make a histogram in one library and then save it in another form, such as saving a Boost.Histogram in ROOT. Hist is a library providing friendly, analyst-targeted syntax and shortcuts for quick manipulations and fast plotting using these two libraries.
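As a rough illustration of two features named in the abstract, under- and overflow bins and weighted increments, the following is a minimal pure-Python sketch. It is not Boost.Histogram's API or implementation: the class `RegularHistogram1D` and its methods are hypothetical names invented for this example, and the sketch does not handle NaN inputs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegularHistogram1D:
    """Hypothetical 1D histogram with equal-width bins plus two flow bins."""

    nbins: int
    lower: float
    upper: float
    # Layout: counts[0] is underflow, counts[1..nbins] are the regular bins,
    # counts[nbins + 1] is overflow.
    counts: List[float] = field(init=False)

    def __post_init__(self) -> None:
        self.counts = [0.0] * (self.nbins + 2)

    def index(self, x: float) -> int:
        """Map a value to its slot; the interval is half-open: [lower, upper)."""
        if x < self.lower:
            return 0
        if x >= self.upper:
            return self.nbins + 1
        width = (self.upper - self.lower) / self.nbins
        # Clamp to guard against floating-point rounding at the upper edge.
        return min(int((x - self.lower) / width), self.nbins - 1) + 1

    def fill(self, x: float, weight: float = 1.0) -> None:
        """Weighted increment: add `weight` instead of 1 to the selected slot."""
        self.counts[self.index(x)] += weight


# Four bins on [0, 1); values outside the range land in the flow bins.
h = RegularHistogram1D(nbins=4, lower=0.0, upper=1.0)
samples = [(-0.5, 1.0), (0.1, 2.0), (0.3, 1.0), (0.99, 0.5), (1.0, 1.0), (7.0, 1.0)]
for value, weight in samples:
    h.fill(value, weight)

print(h.counts)  # [underflow, bin0, bin1, bin2, bin3, overflow]
```

Because out-of-range values are kept in the flow bins rather than dropped, the sum over all slots equals the total weight filled, which is one reason such bins are useful in analysis work.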
Award ID(s):
1836650
PAR ID:
10256978
Editor(s):
Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W.
Journal Name:
EPJ Web of Conferences
Volume:
245
ISSN:
2100-014X
Page Range / eLocation ID:
05014
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: Analysis of HEP data is an iterative process in which the results of one step often inform the next. In an exploratory analysis, it is common to perform one computation on a collection of events, then view the results (often with histograms) to decide what to try next. Awkward Array is a Scikit-HEP Python package that enables data analysis with array-at-a-time operations to implement cuts as slices, combinatorics as composable functions, etc. However, most C++ HEP libraries, such as FastJet, have an imperative, one-particle-at-a-time interface, which would be inefficient in Python and goes against the grain of the array-at-a-time logic of scientific Python. Therefore, we developed fastjet, a pip-installable Python package that provides FastJet C++ binaries, the classic (particle-at-a-time) Python interface, and the new array-oriented interface for use with Awkward Array. The new interface streamlines interoperability with scientific Python software beyond HEP, such as machine learning. In one case, adopting this library along with other array-oriented tools accelerated HEP analysis code by a factor of 20. It was designed to be easily integrated with libraries in the Scikit-HEP ecosystem, including Uproot (file I/O), hist (histogramming), Vector (Lorentz vectors), and Coffea (high-level glue). We discuss the design of the fastjet Python library, integrating the classic interface with the array-oriented interface and with the Vector library for Lorentz vector operations. The new interface was developed as open source.
  2. Vanschoren, J (Ed.)
    As data are generated more and more from multiple disparate sources, multiview data sets, where each sample has features in distinct views, have grown in recent years. However, no comprehensive package exists that enables non-specialists to use these methods easily. mvlearn is a Python library which implements the leading multiview machine learning methods. Its simple API closely follows that of scikit-learn for increased ease-of-use. The package can be installed from Python Package Index (PyPI) and the conda package manager and is released under the MIT open-source license. The documentation, detailed examples, and all releases are available at https://mvlearn.github.io/. 
  3. The Montage image mosaic engine has found wide applicability in astronomy research, has been integrated into processing environments, and is an exemplar application for the development of advanced cyber-infrastructure. It is written in C to provide performance and portability. Linking C/C++ libraries to the Python kernel at run time as binary extensions allows them to run under Python at compiled speeds and enables users to take advantage of all the functionality in Python. We have built Python binary extensions of the 59 ANSI-C modules that make up version 5 of the Montage toolkit. This involved turning the code into a C library, with driver code fully separated to reproduce the calling sequence of the command-line tools, and then adding Python and C linkage code with the Cython library, which acts as a bridge between general C libraries and the Python interface. We will demonstrate how to use these Python binary extensions to perform image processing, including reprojecting and resampling images, rectifying background emission to a common level, creating image mosaics that preserve the calibration and astrometric fidelity of the input images, creating visualizations with an adaptive stretch algorithm, processing HEALPix images, and analyzing and managing image metadata.
  4. Abstract

    Echosounders are high-frequency sonar systems used to sense fish and zooplankton underwater. Their deployment on a variety of ocean observing platforms is generating vast amounts of data at an unprecedented speed from the oceans. Efficient and integrative analysis of these data, whether across different echosounder instruments or in combination with other oceanographic datasets, is crucial for understanding marine ecosystem response to the rapidly changing climate. Here we present Echopype, an open-source Python software library designed to address this need. By standardizing data as labeled, multi-dimensional arrays encoded in the widely embraced netCDF data model following a community convention, Echopype enhances the interoperability of echosounder data, making it easier to explore and use. By leveraging scientific Python libraries optimized for distributed computing, Echopype achieves computational scalability, enabling efficient processing in both local and cloud computing environments. Echopype’s modularized package structure further provides a unified framework for expanding support for additional instrument raw data formats and incorporating new analysis functionalities. We plan to continue developing Echopype by supporting and collaborating with the echosounder user community, and envision that the growth of this package will catalyze the integration of echosounder data into broader regional and global ocean observation strategies.

     
  5. Abstract

    Toytree is a lightweight Python library for programmatically visualizing and manipulating tree‐based data structures. It implements a minimalist design aesthetic and modern plotting architecture suited for interactive coding in IPython/Jupyter.

    Tree drawings are generated in HTML using the toyplot library backend, and display natively in Jupyter notebooks with interactivity features. Tree drawings can be combined with other plotting functions from the toyplot library (e.g. scatterplots, histograms) to create composite figures on a shared coordinate grid, and can be exported to additional formats including PNG, PDF and SVG.

    To parse and store tree data, toytree uses a modified fork of the ete3 TreeNode object, which includes functions for manipulating, annotating and comparing trees. Toytree integrates these functions with a plotting layout to allow node values to be extracted from trees in the correct order to style nodes for plotting. In addition, toytree provides functions for parsing additional tree formats, generating random trees, inferring consensus trees and drawing grids or clouds from multiple trees to visualize discordance.

    The goal of toytree is to provide a simple Python equivalent to commonly used tree manipulation and plotting libraries in R, and in doing so, to promote further development of phylogenetic and other tree‐based methods in Python. Toytree is released under the GPLv3 license. Source code is available on GitHub and documentation is available at https://toytree.readthedocs.io.

     