Creators/Authors contains: "Zhong, Lin"

  1. This paper questions a fundamental assumption of modern operating systems (OSes): that an OS must run on the same computer it manages. We show that embedded systems often lack the resources needed for many desirable OS functions. By carefully offloading some OS functions to a more resourceful computer, e.g., the cloud, one not only immediately overcomes the local resource limits but also opens the door to interesting optimizations, because the remote computer becomes an advantageous point of aggregation and coordination. We discuss the challenges of offloading OS functions and their potential solutions. We also share some preliminary results of offloading system initialization logic and dynamic memory management from a microcontroller-based embedded system. 
    Free, publicly-accessible full text available February 28, 2025
  2. A fault-tolerant quantum computer must decode and correct errors faster than they appear to prevent exponential slowdown due to error correction. The Union-Find (UF) decoder is promising, with an average time complexity slightly higher than $O(d^3)$. We report a distributed version of the UF decoder that exploits parallel computing resources for further speedup. Using an FPGA-based implementation, we empirically show that this distributed UF decoder has a sublinear average time complexity with regard to $d$, given $O(d^3)$ parallel computing resources. The decoding time per measurement round decreases as $d$ increases, a first for a quantum error decoder. The implementation employs a scalable architecture called Helios that organizes parallel computing resources into a hybrid tree-grid structure. Using a Xilinx VCU129 FPGA, we successfully implement $d$ up to 21 with an average decoding time of 11.5 ns per measurement round under 0.1% phenomenological noise, and 23.7 ns for $d=17$ under equivalent circuit-level noise. This performance is significantly faster than any existing decoder implementation. Furthermore, we show that Helios can optimize for resource efficiency by decoding $d=51$ on a Xilinx VCU129 FPGA with an average latency of 544 ns per measurement round. 
  3. Stack unwinding is a well-established approach for handling panics in Rust programs. However, its feasibility on resource-constrained embedded systems has been unclear due to the associated overhead and complexity. This paper presents our experience of implementing stack unwinding and panic recovery within a Rust-based soft real-time embedded operating system. We describe several novel optimizations that help achieve adequate performance for a flying drone with a CPU overhead of 2.6% and a storage overhead of 26.0% to recover from panics in application tasks and interrupt handlers. 
  4. The Minimum-Weight Perfect Matching (MWPM) decoder is widely used in Quantum Error Correction (QEC) decoding. Despite its high accuracy, existing implementations of the MWPM decoder cannot keep up with quantum hardware, e.g., 1 million measurements per second for superconducting qubits. They suffer from a backlog of measurements that grows exponentially and, as a result, cannot realize the power of quantum computation. We design and implement a fast MWPM decoder, called Parity Blossom, which reaches a time complexity almost proportional to the number of defect measurements. We further design and implement a parallel version of Parity Blossom called Fusion Blossom. Given a practical circuit-level noise of 0.1%, Fusion Blossom can decode a million measurement rounds per second up to a code distance of 33. Fusion Blossom also supports a stream decoding mode that reaches a 0.7 ms decoding latency at code distance 21 regardless of the number of measurement rounds. 
  5. A fault-tolerant quantum computer must decode and correct errors faster than they appear. The faster errors can be corrected, the more time the computer can spend on useful work. The Union-Find (UF) decoder is promising, with an average time complexity slightly higher than $O(d^3)$. We report a distributed version of the UF decoder that exploits parallel computing resources for further speedup. Using an FPGA-based implementation, we empirically show that this distributed UF decoder has a sublinear average time complexity with regard to $d$, given $O(d^3)$ parallel computing resources. The decoding time per measurement round decreases as $d$ increases, a first for a quantum error decoder. The implementation employs a scalable architecture called Helios that organizes parallel computing resources into a hybrid tree-grid structure. We are able to implement $d$ up to 21 with a Xilinx VCU129 FPGA, for which the average decoding time is 11.5 ns per measurement round under phenomenological noise of 0.1%, significantly faster than any existing decoder implementation. Since the decoding time per measurement round of Helios decreases with $d$, Helios can decode a surface code of arbitrarily large $d$ without a growing backlog. 
  6. Microcontrollers are the heart of embedded systems. Due to cost and power constraints, they do not have memory management units (MMUs) or even memory protection units (MPUs). As a result, embedded software faces two related challenges both concerned with the stack. First, in a multi-tasking environment, physical memory used by the stack is usually statically allocated per task. Second, a stack overflow is difficult to detect for lower-end microcontrollers without an MPU. In this work, we argue that segmented stacks, a notion investigated and subsequently dismissed for systems with virtual memory, can solve both challenges for embedded software. We show that many problems with segmented stacks vanish on embedded systems and present novel solutions to the rest. Importantly, we show that segmented stacks, combined with Rust, can guarantee memory safety without MMU or MPU. Moreover, segmented stacks allow memory to be dynamically allocated to per-task stacks and can improve memory efficiency when combined with proper scheduling. 
  7. Massive multiple-input multiple-output (mMIMO) technology uses a very large number of antennas at base stations to significantly increase the efficient use of the wireless spectrum. Thus, mMIMO is considered an essential part of 5G and beyond. However, developing a scalable and reliable mMIMO system is an extremely challenging task, significantly hampering the ability of the research community to study next-generation networks. This "research bottleneck" motivated us to develop a deployable experimental mMIMO platform to enable research across many areas. We also envision that this platform could unleash novel collaborations between communications, computing, and machine learning researchers to completely rethink next-generation networks. 
  8. Abstract

    Triboelectric nanogenerators offer an environmentally friendly approach to harvesting energy from mechanical excitations. This capability has made them widely sought‐after as an efficient, renewable, and sustainable energy source, with the potential to decrease reliance on traditional fossil fuels. However, developing triboelectric nanogenerators with a specific output remains a challenge, mainly due to the uncertainties associated with their complex designs for real‐life applications. Artificial intelligence‐enabled inverse design is a powerful tool to realize performance‐oriented triboelectric nanogenerators. This is an emerging scientific direction that can address the concerns about the design and optimization of triboelectric nanogenerators, leading to next‐generation nanogenerator systems. This perspective paper aims at reviewing the principal analysis of triboelectricity, summarizing the current challenges of designing and optimizing triboelectric nanogenerators, and highlighting the physics‐informed inverse design strategies to develop triboelectric nanogenerators. Strategic inverse design is particularly discussed in the contexts of expanding the four‐mode analytical models by physics‐informed artificial intelligence, discovering new conductive and dielectric materials, and optimizing contact interfaces. Various potential development levels of artificial intelligence‐enhanced triboelectric nanogenerators are delineated. Finally, the potential of physics‐informed artificial intelligence inverse design to propel triboelectric nanogenerators from prototypes to multifunctional intelligent systems for real‐life applications is discussed.

     