

Search for: All records

Award ID contains: 1464317


  1. As in-memory data analytics become increasingly important in a wide range of domains, the ability to develop large-scale and sustainable platforms faces significant challenges related to storage latency and memory size constraints. These challenges can be addressed by adopting new and effective formulations and novel architectures such as software-defined infrastructure. This paper investigates the key issue of data persistency for in-memory processing systems by evaluating persistence methods using different storage and memory devices for Apache Spark and the use of Alluxio. It also proposes and evaluates via simulation a Spark execution model for using disaggregated off-rack memory and non-volatile memory targeting next-generation software-defined infrastructure. Experimental results provide a better understanding of the behaviors and requirements for improving data persistence in current in-memory systems, and provide data points for understanding the requirements and design choices of next-generation software-defined infrastructure. The findings indicate that in-memory processing systems can benefit from ongoing software-defined infrastructure implementations; however, current frameworks need to be enhanced appropriately to run efficiently at scale.
  2. Computational demand has brought major changes to Advanced Cyber-Infrastructure (ACI) architectures. It is now possible to run scientific simulations faster and obtain more accurate results. However, power and energy have become critical concerns, and the current roadmap toward the new generation of ACI includes the power budget as one of the main constraints. Current research efforts have studied power and performance tradeoffs and how to balance them (e.g., using Dynamic Voltage and Frequency Scaling (DVFS) and power capping to meet power constraints, which can impact performance). However, applications may not tolerate degradation in performance, and other tradeoffs need to be explored to meet power budgets (e.g., involving the application in making energy-performance-quality tradeoff decisions). This paper proposes using the properties of algorithms based on Adaptive Mesh Refinement (AMR) (e.g., dynamically adjusting the resolution of a simulation in combination with power capping techniques) to schedule or redistribute the power budget. It specifically explores the opportunities to realize such an approach using checkpointing as a proof-of-concept use case and provides a characterization of a representative set of applications that use AMR methods, including a Low-Mach-Number Combustion (LMC) application. It also explores the potential of utilizing power capping to understand power-quality tradeoffs via simulation.
  3. Large-scale observatories are shared-use resources that provide open access to data from geographically distributed sensors and instruments. This data has the potential to accelerate scientific discovery. However, seamlessly integrating the data into scientific workflows remains a challenge. In this paper, we summarize our ongoing work in supporting data-driven and data-intensive workflows and outline our vision for how these observatories can improve large-scale science. Specifically, we present programming abstractions and runtime management services to enable the automatic integration of data in scientific workflows. Further, we show how approximation techniques can be used to address network and processing variations by studying constraint limitations and their associated latencies. We use the Ocean Observatories Initiative (OOI) as a driving use case for this work.
  4. The emergence of the Internet of Things (IoT) is contributing to the growth of data- and energy-hungry applications. As connected devices do not yet offer enough capabilities to sustain these applications, users offload computation to the cloud. To avoid network bottlenecks and reduce the costs associated with data movement, edge cloud solutions have started being deployed, thus improving the Quality of Service (QoS). In this paper, we advocate for leveraging on-site renewable energy production in the different edge cloud nodes to green IoT systems while offering improved QoS compared to core cloud solutions. We propose an analytic model to decide whether to offload computation from the objects to the edge or to the core cloud, depending on the renewable energy availability and the desired application QoS. This model is validated on our application use case, which deals with video stream analysis from vehicle cameras.
  5. As data analytics applications become increasingly important in a wide range of domains, the ability to develop large-scale and sustainable platforms and software infrastructure to support these applications has significant potential to drive research and innovation in both science and business domains. This paper characterizes performance and power-related behavior trends and tradeoffs of the two predominant frameworks for Big Data analytics (i.e., Apache Hadoop and Spark) for a range of representative applications. It also evaluates system design knobs, such as storage and network technologies and power capping techniques. Experimental results from empirical executions provide meaningful data points for exploring the potential of software-defined infrastructure for Big Data processing systems through simulation. The results provide a better understanding of the design space for building multi-criteria application-centric models and show significant advantages of software-defined infrastructure in terms of execution time, energy, and cost. They also motivate further research focused on in-memory processing formulations for systems with deeper memory hierarchies and software-defined infrastructure.
  6. Enterprise and Cloud environments are rapidly evolving with the use of lightweight virtualization mechanisms such as containers. Containerization allows users to deploy applications in any environment faster and more efficiently than using virtual machines. However, most of the work in this area has focused on Linux-based containerization such as Docker and LXC, and other mature solutions such as FreeBSD Jails have not been adopted in production environments. In this work, we explore the use of FreeBSD virtualization and provide a comparative study with respect to Linux containerization using Apache Spark. Preliminary results show that, while Linux containers provide better performance, FreeBSD solutions provide more stable and consistent results.
  7. A Distributed Denial of Service (DDoS) attack is an attempt to make an online service, a network, or even an entire organization unavailable by saturating it with traffic from multiple sources. DDoS attacks are among the most common and most devastating threats that network defenders have to watch out for, and they are becoming bigger, more frequent, and more sophisticated. Volumetric attacks are the most common type of DDoS attack. A DDoS attack is considered volumetric, or high-rate, when it generates a large number of packets or a high volume of traffic within a short period of time. High-rate attacks are well known and have received much attention in the past decade; however, although several detection and mitigation strategies have been designed and implemented, high-rate attacks still halt the normal operation of information technology infrastructures across the Internet when the protection mechanisms cannot cope with the aggregated capacity that the perpetrators have put together. With this in mind, the present paper proposes and tests a distributed and collaborative architecture for online high-rate DDoS attack detection and mitigation based on an in-memory distributed graph data structure and unsupervised machine learning algorithms that leverage real-time streaming data and analytics. We have successfully tested our proposed mechanism using a real-world DDoS attack dataset at its original rate in order to reproduce the conditions of an actual large-scale attack.
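As an illustration of the kind of offloading decision described in item 4, the following is a minimal sketch and not the authors' analytic model, which the abstract does not detail. Every name, field, and number here is hypothetical: it assumes each candidate site (edge or core) has a known round-trip latency, a known amount of renewable power currently available, and a known power draw for the task. The rule it applies is one plausible reading of "depending on the renewable energy availability and the desired application QoS": discard sites that miss the latency bound, then prefer the site where the largest share of the task's power would come from renewable sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    """A candidate execution site. All fields are illustrative assumptions."""
    name: str
    latency_ms: float   # assumed round-trip latency from the object to the site
    renewable_w: float  # assumed renewable power currently available at the site
    demand_w: float     # assumed power the offloaded task would draw at the site


def green_fraction(site: Site) -> float:
    """Share of the task's power demand that renewable production could cover."""
    if site.demand_w <= 0:
        return 1.0
    return min(1.0, site.renewable_w / site.demand_w)


def choose_site(edge: Site, core: Site, max_latency_ms: float) -> Optional[Site]:
    """Pick the greenest site that still meets the latency (QoS) bound.

    Returns None when neither site satisfies the bound.
    Ties on green fraction are broken in favor of lower latency.
    """
    candidates = [s for s in (edge, core) if s.latency_ms <= max_latency_ms]
    if not candidates:
        return None
    return max(candidates, key=lambda s: (green_fraction(s), -s.latency_ms))


if __name__ == "__main__":
    edge = Site("edge", latency_ms=20.0, renewable_w=0.0, demand_w=100.0)
    core = Site("core", latency_ms=120.0, renewable_w=80.0, demand_w=80.0)
    for bound in (50.0, 200.0):
        chosen = choose_site(edge, core, bound)
        print(bound, chosen.name if chosen else None)
```

With these made-up numbers, a tight latency bound forces the task onto the edge even though no renewable power is available there, while a relaxed bound lets the greener core site win. A real model would also need to account for the energy cost of moving the data, which this sketch ignores.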