This content will become publicly available on July 18, 2026

Title: Data Driven CI System Design and Procurement with Open XDMoD
The ability to apply data-driven design principles to customize new CI investment to best serve the intended community, as well as to provide fact-based justification for its need, is critical given the important role CI plays in research and economic development and its high cost. Here we describe a data-driven approach to CI system design based on workload analyses obtained using the popular open-source CI management tool Open XDMoD, and how it was leveraged in a procurement to provide end-users with an additional 5.6 million CPU hours annually, with subsequent procurements following similar design goals. In addition to system design, we demonstrate Open XDMoD's utility in providing fact-based justification for the CI procurement through usage metrics of existing CI resources.
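
As a loose illustration of the workload analysis that drives this design approach, the hedged sketch below aggregates job accounting records into annual CPU hours and system utilization, the kind of metrics Open XDMoD reports. The column names, sample records, and cluster size are illustrative assumptions, not Open XDMoD's actual schema or API.

```python
# Hypothetical sketch: computing procurement-justification metrics
# (annual CPU hours, utilization) from job accounting records, in the
# spirit of an Open XDMoD workload analysis. Column names and sample
# data are illustrative assumptions, not Open XDMoD's actual schema.
import pandas as pd

# Each record: job end time, cores used, and wall time in hours.
jobs = pd.DataFrame({
    "end_time": pd.to_datetime(["2023-02-01", "2023-06-15", "2023-11-20"]),
    "cores": [128, 256, 64],
    "wall_hours": [12.0, 48.0, 6.0],
})

jobs["cpu_hours"] = jobs["cores"] * jobs["wall_hours"]

# Annual delivered CPU hours, the headline metric cited in the abstract.
annual = jobs.groupby(jobs["end_time"].dt.year)["cpu_hours"].sum()
print(annual)

# Utilization of the existing system: delivered CPU hours divided by
# theoretical capacity (total cores * hours in the year).
TOTAL_CORES = 4096          # assumed size of the existing cluster
capacity = TOTAL_CORES * 365 * 24
print((annual / capacity).rename("utilization"))
```
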
Award ID(s):
2137603
PAR ID:
10625656
Author(s) / Creator(s):
; ;
Publisher / Repository:
ACM
Date Published:
ISBN:
9798400713989
Page Range / eLocation ID:
1 to 8
Format(s):
Medium: X
Location:
Columbus Ohio USA
Sponsoring Org:
National Science Foundation
More Like this
  1. With the increase in data-driven analytics, the demand for high-performance computing resources has risen. Many high-performance computing centers provide cyberinfrastructure (CI) for academic research, but access barriers keep these resources from reaching a broad range of users: users who are new to the data analytics field are not yet equipped to take advantage of the tools CI offers. In this paper, we propose a framework that lowers these barriers for users who lack the training to exploit the capabilities of CI. The framework applies the divide-and-conquer (DC) paradigm to data-intensive computing tasks and consists of three major components: a user interface (UI), a parallel scripts generator (PSG), and the underlying cyberinfrastructure (CI). Its goal is to provide a user-friendly method for parallelizing data-intensive computing tasks with minimal user intervention, with usability, scalability, and reproducibility as key design goals. Users can focus on their problem and leave the parallelization details to the framework.
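
The divide-and-conquer workflow this abstract outlines can be pictured with a short sketch: split a data-intensive task into independent chunks, process them in parallel, and merge the partial results. The worker function, chunking scheme, and use of multiprocessing below are illustrative assumptions, not the paper's actual PSG-generated scripts.

```python
# Minimal divide-and-conquer sketch for a data-intensive task, in the
# spirit of the framework's DC paradigm.
from multiprocessing import Pool

def process_chunk(chunk):
    # The unit of parallel work: here, a toy per-chunk aggregation.
    return sum(x * x for x in chunk)

def divide(data, n_chunks):
    # Split the dataset into roughly equal, independent chunks.
    size = max(1, len(data) // n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]

def conquer(partials):
    # Merge the partial results into the final answer.
    return sum(partials)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Parallelization is handled for the user, as the framework intends.
    with Pool(processes=4) as pool:
        partials = pool.map(process_chunk, divide(data, 4))
    print(conquer(partials))
```
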
  2. In recent decades, the design of budget-feasible mechanisms for a wide range of procurement auction settings has received significant attention in the Artificial Intelligence (AI) community. These settings have practical applications in domains such as federated learning, crowdsensing, edge computing, and resource allocation. In a basic procurement auction, a buyer with a limited budget must procure items (e.g., goods or services) from strategic sellers, who hold private information about their items' true costs and have incentives to misrepresent them. The primary goal of budget-feasible mechanisms is to elicit the true costs from sellers and select items to procure so as to maximize the buyer's valuation function while ensuring that the total payment to the sellers does not exceed the budget. In this survey, we provide a comprehensive overview of key procurement auction settings and results on budget-feasible mechanisms, and we outline several promising future research directions.
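
To make the setting concrete, here is a hedged sketch of one classic flavor of budget-feasible mechanism: a greedy allocation rule for a buyer with an additive valuation, in the spirit of proportional-share mechanisms. The values, costs, and budget are illustrative assumptions, and the truthful threshold payments are omitted; the survey covers many settings this toy ignores.

```python
# Toy greedy budget-feasible procurement for an additive buyer
# valuation, loosely following the proportional-share idea: admit
# sellers in order of cost-per-value while each seller's reported cost
# stays within her proportional share of the budget. Truthful threshold
# payments (which keep total payment within the budget) are omitted.

def greedy_budget_feasible(sellers, budget):
    """sellers: list of (value, reported_cost). Returns winning indices."""
    order = sorted(range(len(sellers)),
                   key=lambda i: sellers[i][1] / sellers[i][0])
    chosen, total_value = [], 0.0
    for i in order:
        v, c = sellers[i]
        # Proportional-share rule: seller i's cost must not exceed her
        # share of the budget among all sellers selected so far plus i.
        if c <= v * budget / (total_value + v):
            chosen.append(i)
            total_value += v
        else:
            break
    return chosen

sellers = [(10, 2.0), (8, 3.0), (5, 4.0), (3, 5.0)]  # (value, cost)
print(greedy_budget_feasible(sellers, budget=6.0))
```
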
  3. Large scientific facilities are unique and complex infrastructures that have become fundamental instruments for enabling high-quality, world-leading research on scientific problems at unprecedented scales. Cyberinfrastructure (CI) is an essential component of these facilities, providing the user community with access to data, data products, and services with the potential to transform data into knowledge. However, evolving the CI available at large facilities in a timely way is challenging and can leave science communities' requirements unsatisfied. Furthermore, integrating CI across multiple facilities as part of a scientific workflow is hard, resulting in data silos. In this paper, we explore how science gateways can provide improved user experiences and services that may not be offered at large facility datacenters. Using a science gateway supported by the Science Gateway Community Institute, which provides subscription-based delivery of streamed data and data products from the NSF Ocean Observatories Initiative (OOI), we propose a system that enables streaming-based capabilities and workflows using data from large facilities, such as the OOI, in a scalable manner. We leverage data infrastructure building blocks, such as the Virtual Data Collaboratory, which provides data and computing capabilities in the continuum, to efficiently and collaboratively integrate multiple data-centric CIs, build data-driven workflows, and connect large facility data sources with NSF-funded CI, such as XSEDE. We also introduce architectural solutions for running these workflows using dynamically provisioned federated CI.
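
As a loose illustration of the streaming-based workflows described above, the sketch below wires a subscribed data feed to a small windowed computation. The queue-based feed, window size, and rolling statistic are all assumptions standing in for the gateway's actual OOI subscription service.

```python
# Illustrative sketch of a subscription-style streaming workflow: a
# producer stands in for a subscribed instrument feed, and a consumer
# computes a rolling statistic over a fixed window. The feed, window,
# and statistic are assumptions, not the gateway's actual OOI service.
import queue
import random
import threading
from collections import deque

feed = queue.Queue()

def producer(n=30):
    # Stand-in for streamed observations delivered by a subscription.
    for _ in range(n):
        feed.put(20.0 + random.gauss(0, 0.5))   # e.g., sea temperature
    feed.put(None)                              # end-of-stream marker

def consumer(window_size=10):
    window = deque(maxlen=window_size)
    while (obs := feed.get()) is not None:
        window.append(obs)
        if len(window) == window_size:
            print(f"rolling mean: {sum(window) / window_size:.3f}")

threading.Thread(target=producer, daemon=True).start()
consumer()
```
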
  4. This work investigates the potential of using aggregate controllable loads and energy storage systems from multiple heterogeneous feeders to jointly optimize a utility's energy procurement cost in the real-time market and its revenue from ancillary service markets. Toward this end, we formulate an optimization problem that co-optimizes the real-time and energy reserve markets based on real-time and ancillary service market prices, along with available solar power, storage, and demand data from each feeder within a single distribution network. The optimization, which includes all network system constraints, provides real/reactive power and energy storage set-points for each feeder as well as a schedule for the aggregate system's participation in the two markets. We evaluate the performance of our algorithm using several trace-driven simulations based on a real-world circuit of a New Jersey utility. The results demonstrate that active participation through controllable loads and storage significantly reduces the utility's net costs, i.e., real-time energy procurement costs minus ancillary market revenues.
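
A stripped-down version of this kind of co-optimization can be written as a small convex program. The sketch below (using cvxpy) co-optimizes energy purchases and reserve offers for a single battery over a price horizon; the prices, battery limits, and linear reserve-revenue term are illustrative assumptions, and the paper's network constraints across heterogeneous feeders are omitted.

```python
# Toy co-optimization of real-time energy purchases and ancillary
# (reserve) offers for one storage unit over T hourly slots. All
# parameters are illustrative; energy actually deployed for reserve
# calls is not modeled.
import cvxpy as cp
import numpy as np

T = 24
rt_price = 30 + 10 * np.sin(np.linspace(0, 2 * np.pi, T))  # $/MWh
as_price = np.full(T, 5.0)                                 # $/MW-h
demand = np.full(T, 2.0)                                   # MWh per hour

charge = cp.Variable(T, nonneg=True)     # MWh charged into storage
discharge = cp.Variable(T, nonneg=True)  # MWh discharged
reserve = cp.Variable(T, nonneg=True)    # MW of reserve offered
soc = cp.Variable(T + 1, nonneg=True)    # state of charge, MWh

constraints = [
    soc[0] == 5.0,                 # initial stored energy (MWh)
    soc <= 10.0,                   # capacity limit
    charge <= 2.0,                 # charge power limit (1-hour steps)
    discharge + reserve <= 2.0,    # discharge power shared with reserve
    reserve <= soc[:-1],           # reserve must be backed by energy
    soc[1:] == soc[:-1] + charge - discharge,  # storage dynamics
]
# Grid purchase covers demand plus charging, net of discharging.
purchase = demand + charge - discharge
constraints.append(purchase >= 0)

# Net cost: real-time procurement minus ancillary market revenue.
prob = cp.Problem(cp.Minimize(rt_price @ purchase - as_price @ reserve),
                  constraints)
prob.solve()
print(f"net cost: ${prob.value:.2f}")
```
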
  5. The electricity bill constitutes a significant portion of operational costs for large-scale data centers. Equipping data centers with on-site storage can reduce the electricity bill by shaping energy procurement from deregulated electricity markets with real-time price fluctuations. This work focuses on designing energy procurement and storage management strategies that minimize the electricity bill of storage-assisted data centers. Designing such strategies is challenging because the net energy demand of the data center and the electricity market prices are not known in advance, and the underlying problem is coupled over time through the evolution of the storage level. Using the competitive ratio as the performance measure, we propose an online algorithm that determines the energy procurement and storage management strategies via a threshold-based policy. Our algorithm achieves the optimal competitive ratio as a function of the price fluctuation ratio. We validate the algorithm using data traces from electricity markets and data-center energy demands. The results show that our algorithm achieves close to the offline optimal performance and outperforms existing alternatives.
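
The threshold-based policy at the heart of such online algorithms is easy to sketch: buy and store energy when the spot price falls below one threshold, and serve demand from storage when it rises above another. The fixed thresholds, price trace, and storage parameters below are illustrative assumptions, not the paper's optimal, competitively analyzed choices.

```python
# Hedged sketch of a threshold-based online procurement policy for a
# storage-assisted data center: charge when the spot price is low,
# discharge when it is high. The paper derives thresholds that achieve
# the optimal competitive ratio; this toy uses fixed ones.
def threshold_policy(prices, demand, cap=10.0, rate=2.0,
                     buy_below=25.0, sell_above=40.0):
    soc, total_cost = 0.0, 0.0
    for p, d in zip(prices, demand):
        grid = d                               # energy bought this slot
        if p <= buy_below and soc < cap:
            store = min(rate, cap - soc)       # stockpile cheap energy
            soc += store
            grid += store
        elif p >= sell_above and soc > 0:
            used = min(rate, soc, d)           # serve demand from storage
            soc -= used
            grid -= used
        total_cost += p * grid
    return total_cost

prices = [20, 22, 45, 50, 18, 48]   # $/MWh spot prices
demand = [2.0] * len(prices)        # MWh per slot
print(f"total cost: ${threshold_policy(prices, demand):.2f}")
```
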