Search for: All records

Award ID contains: 1305375


  1. As in-memory data analytics become increasingly important in a wide range of domains, the ability to develop large-scale and sustainable platforms faces significant challenges related to storage latency and memory size constraints. These challenges can be addressed by adopting new and effective formulations and novel architectures such as software-defined infrastructure. This paper investigates the key issue of data persistence for in-memory processing systems by evaluating persistence methods across different storage and memory devices for Apache Spark, both natively and through Alluxio. It also proposes, and evaluates via simulation, a Spark execution model that uses disaggregated off-rack memory and non-volatile memory, targeting next-generation software-defined infrastructure. Experimental results provide a better understanding of the behaviors and requirements for improving data persistence in current in-memory systems, and supply data points that inform the requirements and design choices for next-generation software-defined infrastructure. The findings indicate that in-memory processing systems can benefit from ongoing software-defined infrastructure implementations; however, current frameworks need to be enhanced appropriately to run efficiently at scale. A minimal sketch of these persistence options follows below.
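
     To make the persistence options concrete, here is a minimal PySpark sketch of the kinds of methods evaluated above. The input path and Alluxio endpoint are hypothetical placeholders, and writing to Alluxio assumes its Spark client is on the classpath; this is an illustrative sketch, not the paper's benchmark harness.

     # Minimal sketch of Spark persistence options; paths are hypothetical.
     from pyspark import StorageLevel
     from pyspark.sql import SparkSession

     spark = SparkSession.builder.appName("persistence-study").getOrCreate()
     rdd = spark.sparkContext.textFile("hdfs:///data/input")  # hypothetical input

     # In-memory only: fastest while partitions fit; evicted data is recomputed.
     rdd.persist(StorageLevel.MEMORY_ONLY)
     print(rdd.count())
     rdd.unpersist()

     # Memory with spill to disk: trades recomputation cost for storage latency.
     rdd.persist(StorageLevel.MEMORY_AND_DISK)
     print(rdd.count())
     rdd.unpersist()

     # External in-memory tier (Alluxio): data outlives Spark executor restarts.
     rdd.saveAsTextFile("alluxio://alluxio-master:19998/persisted/input")

     spark.stop()
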
  2. Computational demand has brought major changes to Advanced Cyber-Infrastructure (ACI) architectures. It is now possible to run scientific simulations faster and obtain more accurate results. However, power and energy have become critical concerns, and the current roadmap toward the next generation of ACI includes the power budget as one of its main constraints. Current research efforts have studied power and performance tradeoffs and how to balance them (e.g., using Dynamic Voltage and Frequency Scaling (DVFS) and power capping to meet power constraints, which can impact performance). However, applications may not tolerate degradation in performance, and other tradeoffs need to be explored to meet power budgets (e.g., involving the application in making energy-performance-quality tradeoff decisions). This paper proposes using the properties of Adaptive Mesh Refinement (AMR) algorithms (e.g., dynamically adjusting the resolution of a simulation in combination with power capping techniques) to schedule or redistribute the power budget. It specifically explores the opportunities to realize such an approach using checkpointing as a proof-of-concept use case, and provides a characterization of a representative set of applications that use AMR methods, including a Low-Mach-Number Combustion (LMC) application. It also explores the potential of power capping to understand power-quality tradeoffs via simulation; a sketch of the capping mechanism appears below.
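
     As an illustration of the power capping mechanism discussed above, this hedged sketch sets a socket-level power limit through the Linux powercap (Intel RAPL) sysfs interface and measures the resulting power draw. The sysfs paths require RAPL support and root privileges, and the 60 W cap is an illustrative assumption, not a value from the paper.

     import time

     # Linux powercap sysfs node for RAPL package domain 0 (requires root).
     RAPL = "/sys/class/powercap/intel-rapl:0"

     def set_power_cap(watts: float) -> None:
         # The limit is expressed in microwatts.
         with open(f"{RAPL}/constraint_0_power_limit_uw", "w") as f:
             f.write(str(int(watts * 1_000_000)))

     def read_energy_uj() -> int:
         # Cumulative energy counter in microjoules.
         with open(f"{RAPL}/energy_uj") as f:
             return int(f.read())

     set_power_cap(60.0)  # illustrative 60 W budget for this socket
     e0, t0 = read_energy_uj(), time.time()
     time.sleep(1.0)
     e1, t1 = read_energy_uj(), time.time()
     print(f"observed power: {(e1 - e0) / 1e6 / (t1 - t0):.1f} W")
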
  3. As data analytics applications become increasingly important in a wide range of domains, the ability to develop large-scale and sustainable platforms and software infrastructure to support them has significant potential to drive research and innovation in both science and business. This paper characterizes performance and power-related behavior trends and tradeoffs of the two predominant frameworks for Big Data analytics (i.e., Apache Hadoop and Spark) across a range of representative applications. It also evaluates system design knobs, such as storage and network technologies and power capping techniques. Experimental results from empirical executions provide meaningful data points for exploring, through simulation, the potential of software-defined infrastructure for Big Data processing systems. The results give a better understanding of the design space for building multi-criteria, application-centric models (sketched below) and show significant advantages of software-defined infrastructure in terms of execution time, energy, and cost. They motivate further research on in-memory processing formulations for systems with deeper memory hierarchies and software-defined infrastructure.
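
     The multi-criteria comparison described above can be sketched as follows: given a measured execution time and average power for a run, derive energy and monetary cost so that configurations can be ranked. The configurations, rates, and measurements below are illustrative assumptions, not results from the paper.

     from dataclasses import dataclass

     @dataclass
     class Run:
         config: str         # e.g. framework + storage + network + power cap
         runtime_s: float    # measured wall-clock execution time
         avg_power_w: float  # measured average power draw

         @property
         def energy_kwh(self) -> float:
             # joules = watts * seconds; 3.6e6 J per kWh
             return self.runtime_s * self.avg_power_w / 3.6e6

         def cost_usd(self, node_hr: float = 0.50, kwh: float = 0.12) -> float:
             # Assumed compute (per node-hour) and electricity rates.
             return self.runtime_s / 3600 * node_hr + self.energy_kwh * kwh

     runs = [Run("Hadoop + HDD + 1GbE", 1800, 250),   # illustrative numbers
             Run("Spark + SSD + 10GbE", 600, 300)]
     for r in sorted(runs, key=lambda r: r.cost_usd()):
         print(f"{r.config}: {r.energy_kwh * 1000:.0f} Wh, ${r.cost_usd():.3f}")
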
  4. Scientific simulation workflows executing on very large-scale computing systems are essential modalities for scientific investigation. The increasing scale and resolution of these simulations provide new opportunities for accurately modeling complex natural and engineered phenomena. However, this growing complexity necessitates managing, transporting, and processing unprecedented amounts of data, and as a result researchers are increasingly exploring data-staging and in-situ workflows to reduce data movement and data-related overheads. As these workflows become more dynamic in their structures and behaviors, data staging and in-situ solutions must evolve to support new requirements. In this paper, we explore how the service-oriented concept can be applied to extreme-scale in-situ workflows. Specifically, we explore persistent data staging as a service and present the design and implementation of DataSpaces as a Service, a service-oriented data staging framework; a conceptual sketch of the staging pattern follows below. We use a dynamically coupled fusion simulation workflow to illustrate the capabilities of this framework and to evaluate its performance and scalability.
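
     The staging-as-a-service pattern can be illustrated with a toy sketch (a conceptual stand-in, not the actual DataSpaces API): a persistent staging service holds versioned, named data objects so that a producer simulation and a dynamically coupled consumer exchange data without touching the parallel file system.

     import numpy as np

     class StagingService:
         # Toy stand-in for a staging server: maps (name, version) -> array.
         def __init__(self):
             self._store = {}

         def put(self, name: str, version: int, data: np.ndarray) -> None:
             self._store[(name, version)] = data.copy()

         def get(self, name: str, version: int) -> np.ndarray:
             # A real service would block until this version is written.
             return self._store[(name, version)]

     staging = StagingService()

     # Producer (e.g., a fusion simulation) writes a field each timestep...
     for step in range(3):
         field = np.random.rand(4, 4)  # stand-in for simulation output
         staging.put("temperature", step, field)

     # ...while a coupled analysis component reads it in situ.
     print(staging.get("temperature", 2).mean())
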