Title: ReFlex4ARM: Supporting 100GbE Flash Storage Disaggregation on ARM SoC
Abstract—Flash disaggregation allows flash storage to be shared across the data center, improving resource utilization and reducing the total cost of ownership (TCO). Previous work on flash disaggregation relied on costly server processors, leaving significant headroom for optimizing TCO. In this work, we develop a new flash disaggregation system based on a cost-effective and power-efficient ARM-based Smart NIC. This work introduces our architecture and provides a comprehensive evaluation, in which it outperforms previous work in TCO by 2.57x.
Award ID(s):
1841545
NSF-PAR ID:
10136133
Journal Name:
OCP Future Technology Symposium
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. After over a decade of researcher anticipation for the arrival of persistent memory (PMem), the first shipments of 3D XPoint-based Intel Optane Memory in 2019 were quickly followed by its cancellation in 2022. Was this another case of an idea quickly fading from future to past tense, relegating work in this area to the graveyard of failed technologies? The recently introduced Compute Express Link (CXL) may offer a path forward, with its persistent memory profile offering a universal PMem attachment point. Yet new technologies for memory-speed persistence seem years off, and may never become competitive with evolving DRAM and flash speeds. Without persistent memory itself, is future PMem research doomed? We offer two arguments for why reports of the death of PMem research are greatly exaggerated. First, the bulk of persistent-memory research has not in fact addressed memory persistence, but rather in-memory crash consistency, which was never an issue in prior systems where CPUs could not observe post-crash memory states. CXL memory pooling allows multiple hosts to share a single memory, all in different failure domains, raising crash-consistency issues even with volatile memory. Second, we believe CXL necessitates a “disaggregation” of PMem research. Most work to date assumed a single technology and set of features, i.e., speed, byte addressability, and CPU load/store access. With an open interface allowing new topologies and diverse PMem technologies, we argue for the need to examine these features individually and in combination. While one form of PMem may have been canceled, we argue that the research problems it raised not only remain relevant but have expanded in a CXL-based future.
  2. Providing itemized energy consumption in a utility bill is becoming a priority, and perhaps a business practice in the near term. In recent times, a multitude of systems, such as smart plugs and smart circuit breakers, have been developed for non-intrusive load monitoring (NILM). They are integrated either with smart meters or at the plug level to footprint appliance-level energy consumption patterns in an entire home environment. While deploying the existing technologies in a single home is feasible, scaling these technological advancements across thousands of homes in a region has not yet been realized. This is primarily due to the cost, deployment complexity, and intrusive nature associated with these types of real deployment. Motivated by these shortcomings, in this paper we investigate the first step toward scalable disaggregation by proposing a disaggregation mechanism that works on a large dataset to accurately deconstruct the cumulative signals. We propose an iterative noise-separation-based approach to perform energy disaggregation using sparse-coding-based methodologies that work at the single ingress point of a home, i.e., at the meter level. We performed a ranked iterative signal removal methodology that effectively isolates each appliance's individual signal waveform as noise on aggregate energy datasets with moderate granularity (1 min). We performed experiments on a real dataset and obtained approximately 94% energy disaggregation accuracy, i.e., disaggregated appliance-wise signal estimation accuracy.
  3. With the acceleration of ICT technologies and the Internet of Things (IoT) paradigm, smart residential environments, also known as smart homes, are becoming increasingly common. These environments have significant potential for the development of intelligent energy management systems, and have therefore attracted significant attention from both academia and industry. An enabling building block for these systems is the ability to obtain energy consumption at the appliance level. This information is usually inferred from electric signal data (e.g., current) collected by a smart meter or a smart outlet, a problem known as appliance recognition. Several previous approaches for appliance recognition have proposed load disaggregation techniques for smart meter data. However, these approaches are often very inaccurate for low-consumption and multi-state appliances. Recently, Machine Learning (ML) techniques have been proposed for appliance recognition. These approaches are mainly based on passive ML, thus requiring pre-labeled data to be trained. This makes such approaches unable to rapidly adapt to the constantly changing availability and heterogeneity of appliances on the market. In a home setting scenario, it is natural to consider the involvement of users in the labeling process, as appliances’ electric signatures are collected. This type of learning falls into the category of Stream-based Active Learning (SAL). SAL has been mainly investigated assuming the presence of an expert, always available and willing to label the collected samples. Nevertheless, a home user may lack such availability, and in general present a more erratic and user-dependent behavior. In this paper, we develop a SAL algorithm, called K-Active-Neighbors (KAN), for the problem of household appliance recognition. Unlike previous approaches, KAN jointly learns the user behavior and the appliance signatures.
KAN dynamically adjusts the querying strategy to increase accuracy by considering the user availability as well as the quality of the collected signatures. Such quality is defined as a combination of informativeness, representativeness, and confidence score of the signature compared to the current knowledge. To test KAN versus state-of-the-art approaches, we use real appliance data collected by a low-cost Arduino-based smart outlet as well as the ECO smart home dataset. Furthermore, we use a real dataset to model user behavior. Results show that KAN is able to achieve high accuracy with minimal data, i.e., signatures of short length and collected at low frequency.
  4. To keep global surface warming below 1.5°C by 2100, the portfolio of cost-effective CDR technologies must expand. To evaluate the potential of macroalgae CDR, we developed a kelp aquaculture bio-techno-economic model in which large quantities of kelp would be farmed at an offshore site, transported to a deep water “sink site”, and then deposited below the sequestration horizon (1,000 m). We estimated the costs and associated emissions of nursery production, permitting, farm construction, ocean cultivation, biomass transport, and Monitoring, Reporting, and Verification (MRV) for a 1,000 acre (405 ha) “baseline” project located in the Gulf of Maine, USA. The baseline kelp CDR model applies current systems of kelp cultivation to deep water (100 m) exposed sites using best available modeling methods. We calculated the levelized unit costs of CO₂eq sequestration (LCOC; $ tCO₂eq⁻¹). Under baseline assumptions, LCOC was $17,048 tCO₂eq⁻¹. Despite annually sequestering 628 tCO₂eq within kelp biomass at the sink site, the project was only able to net 244 C credits (tCO₂eq) each year, a true sequestration “additionality” rate (AR) of 39% (i.e., the ratio of net C credits produced to gross C sequestered within kelp biomass). As a result of optimizing 18 key parameters for which we identified a range within the literature, LCOC fell to $1,257 tCO₂eq⁻¹ and AR increased to 91%, demonstrating that substantial cost reductions could be achieved through process improvement and decarbonization of production supply chains. Kelp CDR may be limited by high production costs and energy intensive operations, as well as MRV uncertainty.
To resolve these challenges, R&D must (1) de-risk farm designs that maximize lease space, (2) automate the seeding and harvest processes, (3) leverage selective breeding to increase yields, (4) assess the cost-benefit of gametophyte nursery culture as both a platform for selective breeding and driver of operating cost reductions, (5) decarbonize equipment supply chains, energy usage, and ocean cultivation by sourcing electricity from renewables and employing low GHG impact materials with long lifespans, and (6) develop low-cost and accurate MRV techniques for ocean-based CDR. 
  5. On large-scale high performance computing (HPC) systems, applications are provisioned with aggregated resources to meet their peak demands for brief periods. This results in resource underutilization because application requirements vary widely during execution. This problem is particularly pronounced for deep learning applications that run on leadership HPC systems with a large pool of burst buffers in the form of flash or non-volatile memory (NVM) devices. In this paper, we examine the I/O patterns of deep neural networks and reveal their critical need to load many small samples randomly for successful training. We have designed a specialized Deep Learning File System (DLFS) that provides a thin set of APIs. In particular, we design the metadata management of DLFS through an in-memory tree-based sample directory and its file services through the user-level SPDK protocol, which can disaggregate the capabilities of NVM Express (NVMe) devices to parallel training tasks. Our experimental results show that DLFS can dramatically improve the throughput of training for deep neural networks on NVMe over Fabric, compared with the kernel-based Ext4 file system. Furthermore, DLFS achieves efficient user-level storage disaggregation with very little CPU utilization.