Today’s large-scale scientific applications running on high-performance computing (HPC) systems generate vast data volumes. Thus, data compression is becoming a critical technique to mitigate the storage burden and data-movement cost. However, existing lossy compressors for scientific data cannot achieve a high compression ratio and throughput simultaneously, hindering their adoption in many applications requiring fast compression, such as in-memory compression. To this end, in this work, we develop a fast and high-ratio error-bounded lossy compressor on GPUs for scientific data (called FZ-GPU). Specifically, we first design a new compression pipeline that consists of fully parallelized quantization, bitshuffle, and our newly designed fast encoding. Then, we propose a series of deep architectural optimizations for each kernel in the pipeline to take full advantage of CUDA architectures. We propose a warp-level optimization to avoid data conflicts for bit-wise operations in bitshuffle, maximize shared memory utilization, and eliminate unnecessary data movements by fusing different compression kernels. Finally, we evaluate FZ-GPU on two NVIDIA GPUs (i.e., A100 and RTX A4000) using six representative scientific datasets from SDRBench. Results on the A100 GPU show that FZ-GPU achieves an average speedup of 4.2× over cuSZ and an average speedup of 37.0× over a multi-threaded CPU implementation of our algorithm under the same error bound. FZ-GPU also achieves an average speedup of 2.3× and an average compression ratio improvement of 2.0× over cuZFP under the same data distortion. 
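As a rough illustration of the pipeline described above, the snippet below implements error-bounded linear quantization followed by a bit-plane shuffle on the CPU with NumPy. This is a minimal sketch of the general idea only, not FZ-GPU's CUDA kernels; the quantization rule, function names, and synthetic data are assumptions made for illustration.

```python
import numpy as np

def quantize(data: np.ndarray, error_bound: float) -> np.ndarray:
    """Uniform scalar quantization keeping |decoded - original| <= error_bound.
    (Illustrative only; the real compressor uses prediction-based quantization.)"""
    return np.round(data / (2 * error_bound)).astype(np.int32)

def dequantize(codes: np.ndarray, error_bound: float) -> np.ndarray:
    return codes.astype(np.float64) * (2 * error_bound)

def bitshuffle(codes: np.ndarray) -> np.ndarray:
    """Group the i-th bit of every code together so that zero-heavy bit planes
    form long runs, which a downstream lossless encoder compresses well."""
    bits = np.unpackbits(codes.view(np.uint8).reshape(-1, codes.itemsize), axis=1)
    return np.packbits(bits.T.copy())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    field = np.cumsum(rng.normal(size=1_000_000))  # smooth synthetic field
    eb = 1e-2
    q = quantize(field, eb)
    shuffled = bitshuffle(q)
    # The error bound holds after the round trip through quantization.
    assert np.max(np.abs(dequantize(q, eb) - field)) <= eb + 1e-12
```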
This content will become publicly available on June 23, 2026.

Visual Scientific Workflow Design using VisFlow 2.0

Abstract
Scientific workflows are pivotal for managing complex computational tasks, including data analysis, processing, simulation, and visualization. However, their design and administration typically demand substantial programming expertise, limiting access for domain scientists. Many such workflow systems also lack real-time execution tracking and streamlined data integration capabilities, hindering efficiency and repeatability in scientific experimentation. In response, we introduce VisFlow 2.0, a next-generation platform derived from the original VisFlow. We compare VisFlow 2.0 to traditional alternatives through a well-studied computational pipeline, highlighting its usability, flexibility, and effectiveness, especially for non-expert users.
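To make concrete what a "computational pipeline" looks like when scripted rather than designed visually, here is a minimal, hypothetical example. It is not VisFlow 2.0 code; the node structure and step names are invented solely to illustrate the kind of dependency-ordered pipeline a visual workflow tool manages for the user.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class Node:
    """One step in a workflow: a named function plus the nodes it depends on."""
    name: str
    func: Callable[..., Any]
    deps: List[str] = field(default_factory=list)

def run_workflow(nodes: Dict[str, Node]) -> Dict[str, Any]:
    """Execute nodes in dependency order and return every node's output."""
    results: Dict[str, Any] = {}
    def run(name: str) -> Any:
        if name not in results:
            node = nodes[name]
            results[name] = node.func(*(run(d) for d in node.deps))
        return results[name]
    for name in nodes:
        run(name)
    return results

# A toy three-step pipeline: load -> clean -> summarize.
pipeline = {
    "load": Node("load", lambda: [3.0, 4.0, None, 5.0]),
    "clean": Node("clean", lambda xs: [x for x in xs if x is not None], deps=["load"]),
    "summarize": Node("summarize", lambda xs: sum(xs) / len(xs), deps=["clean"]),
}
print(run_workflow(pipeline)["summarize"])  # 4.0
```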
- Award ID(s): 2410668
- PAR ID: 10631948
- Publisher / Repository: ACM
- Date Published:
- ISBN: 9798400714627
- Page Range / eLocation ID: 1 to 6
- Format(s): Medium: X
- Location: Columbus, USA
- Sponsoring Org: National Science Foundation
More Like this
- What new questions could ecophysiologists answer if physio-logging research was fully reproducible? We argue that technical debt (computational hurdles resulting from prioritizing short-term goals over long-term sustainability) stemming from insufficient cyberinfrastructure (field-wide tools, standards, and norms for analyzing and sharing data) trapped physio-logging in a scientific silo. This debt stifles comparative biological analyses and impedes interdisciplinary research. Although physio-loggers (e.g., heart rate monitors and accelerometers) opened new avenues of research, the explosion of complex datasets exceeded ecophysiology’s informatics capacity. Like many other scientific fields facing a deluge of complex data, ecophysiologists now struggle to share their data and tools. Adapting to this new era requires a change in mindset, from “data as a noun” (e.g., traits, counts) to “data as a sentence”, where measurements (nouns) are associated with transformations (verbs), parameters (adverbs), and metadata (adjectives). Computational reproducibility provides a framework for capturing the entire sentence. Though usually framed in terms of scientific integrity, reproducibility offers immediate benefits by promoting collaboration between individuals, groups, and entire fields. Rather than a tax on our productivity that benefits some nebulous greater good, reproducibility can accelerate the pace of discovery by removing obstacles and inviting a greater diversity of perspectives to advance science and society. In this article, we 1) describe the computational challenges facing physio-logging scientists and connect them to the concepts of technical debt and cyberinfrastructure, 2) demonstrate how other scientific fields overcame similar challenges by embracing computational reproducibility, and 3) present a framework to promote computational reproducibility in physio-logging, and bio-logging more generally.
- Machine learning (ML) models are universal function approximators and, if used correctly, can summarize the information content of observational data sets in a functional form for scientific and engineering applications. A benefit of ML over parametric models is that there are no a priori assumptions about particular basis functions, which can potentially limit the phenomena that can be modeled. In this work, we develop ML models on three data sets: the Space Environment Technologies High Accuracy Satellite Drag Model (HASDM) density database, a spatiotemporally matched data set of outputs from the Jacchia-Bowman 2008 Empirical Thermospheric Density Model (JB2008), and an accelerometer-derived density data set from the CHAllenging Minisatellite Payload (CHAMP). These ML models are compared to the Naval Research Laboratory Mass Spectrometer and Incoherent Scatter radar (NRLMSIS 2.0) model to study the presence of post-storm cooling in the middle thermosphere. We find that both NRLMSIS 2.0 and JB2008-ML do not account for post-storm cooling and consequently perform poorly in periods following strong geomagnetic storms (e.g., the 2003 Halloween storms). Conversely, HASDM-ML and CHAMP-ML do show evidence of post-storm cooling, indicating that this phenomenon is present in the original data sets. Results show that density reductions of up to 40% can occur 1–3 days post-storm, depending on the location and strength of the storm.
- During the COVID-19 pandemic, many students lost opportunities to explore science in labs due to school closures. Remote labs provide a possible solution to mitigate this loss. However, most remote labs to date are based on a somewhat centralized model in which experts design and conduct certain types of experiments in well-equipped facilities, with a few options for manipulation provided to remote users. In this paper, we propose a distributed framework, dubbed remote labs 2.0, that offers the flexibility needed to build an open platform to support educators in creating, operating, and sharing their own remote labs. Similar to the transformation of the Web from 1.0 to 2.0, remote labs 2.0 can greatly enrich experimental science on the Internet by allowing users to choose and contribute their own subjects and topics. As a reference implementation, we developed a platform branded as Telelab. In collaboration with a high school chemistry teacher, we conducted remote chemical reaction experiments on the Telelab platform with two online classes. Pre/post-test results showed that these high school students attained significant gains (t(26)=8.76, p<0.00001) in evidence-based reasoning abilities. Student surveys revealed three key affordances of Telelab: live experiments, scientific instruments, and social interactions. All 31 respondents were engaged by one or more of these affordances. Students' behaviors were characterized by analyzing their interaction data logged by the platform. These findings suggest that appropriate applications of remote labs 2.0 in distance education can, to some extent, reproduce critical effects of their local counterparts on promoting science learning.
- Continuous integration (CI) is a well-established technique in commercial and open-source software projects, although it is not routinely used in scientific publishing. In the scientific software context, CI can serve two functions to increase the reproducibility of scientific results: providing an established platform for testing the reproducibility of these results, and demonstrating to other scientists how the code and data generate the published results. We explore scientific software testing and CI strategies using two articles published in the areas of applied mathematics and computational physics. We discuss lessons learned from reproducing these articles and examine their existing tests. We introduce the notion of a scientific test as one that produces computational results from a published article. We then consider full result reproduction within a CI environment. If authors find their work too time- or resource-intensive to easily adapt to a CI context, we recommend including results from reduced versions of their work (e.g., run at lower resolution, with shorter time scales, or with smaller data sets) alongside their primary results within their article. While these smaller versions may be less interesting scientifically, they can serve to verify that published code and data are working properly. We demonstrate such reduction tests on the two articles studied.
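The physio-logging item in the list above recasts "data as a sentence": measurements tied to the transformations, parameters, and metadata that produced them. The sketch below shows one way such a sentence could be captured in code; the class, field names, and sensor details are hypothetical, not drawn from any physio-logging standard.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ProcessedMeasurement:
    """A 'sentence' about the data: the measurement (noun) plus the
    transformation (verb), its parameters (adverbs), and metadata (adjectives)."""
    values: List[float]            # the measurements themselves
    transformation: str            # e.g. "moving_average"
    parameters: Dict[str, float]   # e.g. {"window_samples": 3}
    metadata: Dict[str, str]       # e.g. units, sensor model, deployment ID

def moving_average(raw: List[float], window: int) -> ProcessedMeasurement:
    """Apply a transformation and record everything needed to reproduce it."""
    smoothed = [sum(raw[max(0, i - window + 1): i + 1]) /
                len(raw[max(0, i - window + 1): i + 1]) for i in range(len(raw))]
    return ProcessedMeasurement(
        values=smoothed,
        transformation="moving_average",
        parameters={"window_samples": window},
        metadata={"units": "beats_per_minute", "sensor": "hypothetical_hr_logger"},
    )

heart_rate = [62.0, 64.0, 80.0, 78.0, 66.0]
record = moving_average(heart_rate, window=3)
print(record.transformation, record.parameters, record.values[:3])
```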
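The thermospheric-density item above rests on the contrast between ML models, which assume no particular basis functions, and parametric models, which do. The toy sketch below illustrates that contrast on synthetic data; the "post-storm dip", the quadratic baseline, and the random forest are invented for illustration and are unrelated to HASDM, JB2008, CHAMP, or NRLMSIS 2.0.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Synthetic relationship with a sharp localized dip, the kind of feature
# a fixed low-order basis struggles to capture.
t = rng.uniform(0.0, 10.0, size=2000)        # invented driver variable
density = 1.0 - 0.4 * np.exp(-((t - 2.0) ** 2)) + rng.normal(0.0, 0.02, t.size)

# Parametric baseline: quadratic fit (an a priori choice of basis functions).
coeffs = np.polyfit(t, density, deg=2)
param_pred = np.polyval(coeffs, t)

# Nonparametric ML model: no basis functions chosen in advance.
ml = RandomForestRegressor(n_estimators=200, random_state=0)
ml.fit(t.reshape(-1, 1), density)
ml_pred = ml.predict(t.reshape(-1, 1))

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

# Evaluated in-sample purely for brevity; the flexible model tracks the dip.
print("quadratic RMSE:    ", rmse(density, param_pred))
print("random-forest RMSE:", rmse(density, ml_pred))
```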
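The remote-labs item above reports a pre/post-test gain of t(26)=8.76, p<0.00001. For readers unfamiliar with that statistic, the sketch below shows how such a paired t-test is computed; the scores are fabricated placeholders, not the study's data (df = 26 simply implies 27 paired scores).

```python
import numpy as np
from scipy import stats

# Hypothetical pre/post reasoning scores for 27 students (df = n - 1 = 26).
rng = np.random.default_rng(7)
pre = rng.normal(10.0, 2.0, size=27)
post = pre + rng.normal(3.0, 1.5, size=27)   # simulated learning gain

# Paired (dependent-samples) t-test on the per-student differences.
t_stat, p_value = stats.ttest_rel(post, pre)
print(f"t({len(pre) - 1}) = {t_stat:.2f}, p = {p_value:.2g}")
```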
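Finally, the continuous-integration item above proposes "scientific tests" that re-run a reduced version of a published computation inside CI and check it against archived results. A minimal pytest-style sketch of that idea follows; the decay simulation, the reference value, and the tolerances are hypothetical stand-ins for an article's actual code and data.

```python
import numpy as np

def simulate_decay(n_steps: int, dt: float, k: float = 0.3) -> float:
    """Stand-in for a published computation: forward-Euler integration of
    dy/dt = -k*y from y(0) = 1 up to T = n_steps * dt."""
    y = 1.0
    for _ in range(n_steps):
        y += -k * y * dt
    return y

# Reference value archived alongside the (hypothetical) article, produced by a
# reduced run cheap enough for a CI job: 0.997**1000 ~ 0.0496.
REFERENCE_REDUCED_RESULT = 0.0496

def test_reduced_reproduction():
    """Scientific test: the reduced run must reproduce the archived value
    (tolerance kept deliberately loose for this illustration)."""
    result = simulate_decay(n_steps=1000, dt=0.01)
    assert np.isclose(result, REFERENCE_REDUCED_RESULT, rtol=1e-2)

def test_refinement_consistency():
    """Sanity check: halving dt should not change the answer much."""
    coarse = simulate_decay(n_steps=1000, dt=0.01)
    fine = simulate_decay(n_steps=2000, dt=0.005)
    assert abs(coarse - fine) < 1e-3
```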