Title: A user-friendly tool to convert photon counting data to the open-source Photon-HDF5 file format
Photon-HDF5 is an open-source, open file format for storing photon-counting data from single-molecule microscopy experiments, introduced to simplify data exchange and increase the reproducibility of data analysis. Part of the Photon-HDF5 ecosystem is phconvert, an extensible Python library for converting proprietary formats into Photon-HDF5 files. However, using phconvert requires some proficiency with command-line instructions, the Python programming language, and the YAML markup format. This creates a significant barrier for potential users who lack that expertise but want to benefit from releasing their files in an open format. In this work, we present a GUI that lowers this barrier, thus simplifying the use of Photon-HDF5. The tool uses the phconvert library to convert data files saved in proprietary formats to Photon-HDF5 files, without users having to write a single line of code. Because reproducible analyses depend on essential experimental information, such as laser power or sample description, the GUI also includes (currently limited) functionality to associate valid metadata with the converted file, without writing any YAML. Finally, the GUI includes several productivity features, such as whole-directory batch conversion and the ability to re-run a failed batch, converting only the files that could not be converted in the previous run.
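The abstract describes three parts of the GUI's workflow: converting each file, attaching validated metadata, and re-running a batch so that only previously failed files are retried. Below is a minimal Python sketch of that batch logic. It is not the GUI's code and calls no phconvert API. A hypothetical `convert_file(path, metadata)` callable stands in for the real phconvert-based converter. The metadata keys borrow Photon-HDF5 field names (`sample_name`, `excitation_wavelengths`, `excitation_input_powers`), but the flat dictionary layout, the validation rules, and the `failed_conversions.json` log file are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical log file name; not part of phconvert or the GUI described above.
FAILED_LOG = "failed_conversions.json"


def validate_metadata(meta: Dict) -> Dict:
    """Check user-entered metadata before attaching it (assumed rules and keys)."""
    powers = meta.get("excitation_input_powers", [])
    wavelengths = meta.get("excitation_wavelengths", [])
    if len(powers) != len(wavelengths):
        raise ValueError("need one input power per excitation wavelength")
    if any(p < 0 for p in powers):
        raise ValueError("laser powers must be non-negative")
    if not meta.get("sample_name"):
        raise ValueError("sample_name is required")
    return meta


def batch_convert(
    folder: Path,
    pattern: str,
    convert_file: Callable[[Path, Dict], None],
    metadata: Dict,
    rerun_failed: bool = False,
) -> List[Path]:
    """Convert every file in `folder` matching `pattern`; return the failures.

    With rerun_failed=True, only the files recorded as failed by the
    previous run are attempted again.
    """
    validate_metadata(metadata)  # fail fast before touching any file
    log_path = folder / FAILED_LOG
    if rerun_failed and log_path.exists():
        candidates = [folder / name for name in json.loads(log_path.read_text())]
    else:
        candidates = sorted(folder.glob(pattern))

    failed: List[Path] = []
    for path in candidates:
        try:
            convert_file(path, metadata)
        except Exception as exc:  # keep going; record the failure and move on
            print(f"failed: {path.name}: {exc}")
            failed.append(path)

    # Persist failures so a later run can retry just those files.
    log_path.write_text(json.dumps([p.name for p in failed]))
    return failed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        folder = Path(d)
        for name in ("a.dat", "b.dat", "c.dat"):
            (folder / name).write_text("raw photon data")
        meta = {
            "sample_name": "dsDNA",
            "excitation_wavelengths": [532e-9],
            "excitation_input_powers": [100e-6],
        }

        def flaky(path: Path, meta: Dict) -> None:
            # Stand-in converter that cannot read one of the files.
            if path.name == "b.dat":
                raise IOError("unrecognized header")

        print(batch_convert(folder, "*.dat", flaky, meta))
        # Second pass: only b.dat is retried, now with a converter that succeeds.
        print(batch_convert(folder, "*.dat", lambda p, m: None, meta, rerun_failed=True))
```

Recording failures by file name in the input folder keeps the retry step independent of the converter. A real tool might also store each error message so the user can see why a file failed.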
Award ID(s):
1842951
PAR ID:
10350903
Author(s) / Creator(s):
; ; ;
Editor(s):
Gregor, Ingo; Erdmann, Rainer; Koberling, Felix
Date Published:
Journal Name:
Proc. SPIE 11967, Single Molecule Spectroscopy and Superresolution Imaging XV
Page Range / eLocation ID:
8
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Modern scientific workflows couple simulations with AI-powered analytics, frequently exchanging data to accelerate time-to-science and to reduce the complexity of the simulation planes. However, this data exchange is limited in performance and portability because AI frameworks lack support for scientific data formats. We need a cohesive mechanism to integrate complex scientific data formats such as HDF5, PnetCDF, ADIOS2, GNCF, and Silo into popular AI frameworks such as TensorFlow, PyTorch, and Caffe, effectively and at scale. To this end, we designed Stimulus, a data management library for efficiently ingesting scientific data into popular AI frameworks. We use the StimOps functions together with the StimPack abstraction to integrate scientific data formats with any AI framework. Our evaluations show that Stimulus speeds up several large-scale applications with different use cases, namely Cosmic Tagger (consuming an HDF5 dataset in PyTorch), Distributed FFN (consuming an HDF5 dataset in TensorFlow), and CosmoFlow (converting HDF5 into TFRecord and then consuming it in TensorFlow), by 5.3×, 2.9×, and 1.9×, respectively, with ideal I/O scalability up to 768 GPUs on the Summit supercomputer. Through Stimulus, existing popular AI frameworks can be portably extended to support any complex scientific data format and to scale applications efficiently on large supercomputers.
  2. Phylogenetic studies now routinely require manipulating and summarizing thousands of data files. For most of these tasks, currently available software requires considerable computing resources and substantial knowledge of command-line applications. We developed SEGUL, an ultrafast and memory-efficient program that performs common phylogenomic dataset manipulations and calculates statistics summarizing essential data features. Our software is available as standalone command-line interface (CLI) and graphical user interface (GUI) applications, and as a library for Rust, R, and Python, with possible support for other languages. The CLI and library versions run natively on Windows, Linux, and macOS, including Apple ARM Macs. The GUI version extends support to the mobile operating systems iOS, iPadOS, and Android. SEGUL leverages the high performance of the Rust programming language to offer fast execution times and low memory footprints regardless of dataset size and platform. The inclusion of a GUI minimizes bioinformatics barriers to phylogenomics, while SEGUL's efficiency reduces economic barriers by allowing analysis on inexpensive hardware. Our support for mobile operating systems further enables teaching phylogenomics where access to computing power is limited.
  3. This dataset is associated with a manuscript on river plumes and idealized coastal corners with first author Michael M. Whitney. The dataset includes source code, compilation files, and routines to generate input files for the Regional Ocean Modeling System (ROMS) runs used in this study. ROMS output files in NetCDF format are generated by executing the compiled ROMS code with the input files. The dataset also includes MATLAB routines and data files for analyzing model results and generating the figures in the manuscript. The following zip files are included:
     ROMS_v783_Yan_code.zip [ROMS source code branch used in this study]
     coastalcorner_ROMS_compilation.zip [files to compile ROMS source code and run-specific Fortran-90 built code]
     coastalcorner_ROMS_input_generate_MATLAB.zip [ROMS ASCII input file and MATLAB routines to generate ROMS NetCDF input files for runs]
     coastalcorner_MATLAB_output_analysis.zip [MATLAB data files with selected ROMS output fields and custom analysis routines and data files in MATLAB formats used in this study]
     coastalcorner_MATLAB_figures.zip [custom MATLAB routine for manuscript figure generation and MATLAB data files with all data fields included in figures]
     coastalcorner_tif_figures.zip [TIF image files of each figure in manuscript]
  4. This exploratory interpretive case study investigated the collaborative potential of open government data available through data.gov, the US federal open data catalog. Open data is a central aspect of open government collaboration because it fosters exchange and communication between governments and the public. Government organizations that release open data make choices about file formats that substantially affect the potential for collaboration. A file format, such as a document or a spreadsheet, constrains which programs can read the file and which actions a user can perform on it. Overall, we found that data.gov formats had limited collaboration potential, but that the files could be accessed by people with a wide range of skills. The findings are incorporated into suggestions for future iterations of open data policy. The advantages and limitations of using file formats for open data research are considered. The exploratory findings raise questions about future user-centric open data evaluations.
  5. Recorder is a multi-level I/O tracing tool that captures HDF5, MPI-I/O, and POSIX I/O calls. In this paper, we present a new version of Recorder that adds support for most POSIX metadata calls, such as stat, link, and rename. We also introduce a compressed tracing format to reduce trace file size and the runtime overhead of collecting trace data. Moreover, the new version of Recorder adds a set of post-mortem and visualization routines that manage the compressed trace data for users. Our experiments with four HPC applications show that the new compressed trace file format reduces file size by over 2× and post-processing time by 20%.