

Search for: All records

Creators/Authors contains: "Zou, Chen"


  1. Computational storage adds computing to storage devices, offering potential benefits in offload, data reduction, and lower energy. Successful computational SSD architectures should match growing flash bandwidth, which in turn requires high SSD DRAM memory bandwidth. This creates a memory wall scaling problem, resulting from SSDs’ stringent power and cost constraints. A survey of recent computational SSD research shows that many computational storage offloads are suited to stream computing. To exploit this opportunity, we propose a novel general-purpose computational SSD and core architecture, called ASSASIN (Architecture Support for Stream computing to Accelerate computatIoNal Storage). ASSASIN provides a unified set of compute engines between SSD DRAM and the flash array. This eliminates the SSD DRAM bottleneck by enabling direct computing on flash data streams. ASSASIN further employs a crossbar to sustain performance even when flash data layout is uneven and to preserve independence for page layout decisions in the flash translation layer. With stream buffers and scratchpad memories, the ASSASIN core’s memory hierarchy and instruction set extensions provide low-latency access at low power and effectively keep streaming flash data out of the in-SSD cache-DRAM memory hierarchy, thereby solving the memory wall. Evaluation shows that ASSASIN delivers 1.5x to 2.4x speedup for offloaded functions compared to state-of-the-art computational SSD architectures. Further, ASSASIN’s streaming approach yields 2.0x power efficiency and 3.2x area efficiency improvement. These performance benefits at the level of computational SSDs translate to 1.1x to 1.5x end-to-end speedups on data analytics workloads.
  2. Shuffle is an indispensable process in distributed online analytical processing systems, enabling task-level parallelism across multiple nodes. As a data-intensive data reorganization process, shuffle implemented on general-purpose CPUs not only incurs data traffic back and forth between the computing and storage resources, but also pollutes the cache hierarchy with almost zero data reuse. As a result, shuffle can easily become the bottleneck of distributed analysis pipelines. Our PSACS approach attacks these bottlenecks with the rising computational storage paradigm. Shuffle is offloaded to the storage-side PSACS accelerator to avoid polluting the computing node's memory hierarchy and to gain the latency, bandwidth, and energy benefits of near-data computing. Further, the microarchitecture of PSACS exploits data-, subtask-, and task-level parallelism for high performance and a customized scratchpad for fast on-chip random access. PSACS achieves 4.6x to 5.7x shuffle throughput at kernel level and up to 1.3x overall shuffle throughput with only a twentieth of the CPU utilization compared to software baselines. These gains amount to a 23% end-to-end OLAP query speedup on average.
  3. null (Ed.)
  4. null (Ed.)
  5.  
  6. Metal halide perovskite light-emitting diodes (PeLEDs) have advanced rapidly in the last several years, with external quantum efficiencies (EQEs) reaching over 20%, comparable to state-of-the-art organic LEDs and quantum dot LEDs. The photoluminescence quantum yields of perovskite films have also been approaching 100%. Therefore, the next step in improving the EQE of PeLEDs should focus on boosting light extraction. In this Letter, we demonstrate the emitter dipole orientation as a key parameter in determining the outcoupling efficiency of PeLEDs. We find that the CsPbBr3 emitter has a slightly preferred orientation with the horizontal-to-vertical dipole ratio of 0.41:0.59, as compared to 0.33:0.67 in the isotropic case. A theoretical analysis predicts that a purely anisotropic perovskite emitter may result in a maximum EQE of 36%.

     
  7. In this paper, we demonstrate a new method to pattern perovskites using a dry lift-off process. By utilizing parylene-C as a sacrificial layer, patterns with <12 µm features and multi-color patterns can be achieved.