skip to main content


Title: Membrane-based optomechanical accelerometry
Optomechanical accelerometers promise quantum-limited readout, high bandwidth, self-calibration, and radiation-pressure stabilization. We present a simple, scalable platform that enables these benefits with sub-µg sensitivity and 10 kHz bandwidth, based on a pair of vertically integrated SiN membranes.  more » « less
Award ID(s):
1945832
PAR ID:
10291515
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Conference on Lasers and Electro-Optics
Page Range / eLocation ID:
JTu2I.3
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Spatiotemporal variation in cellular bandwidth availability is well-known and could affect a mobile user's quality of experience (QoE), especially while using bandwidth intensive streaming applications such as movies, podcasts, and music videos during commute. If such variations are made available to a streaming service in advance it could perhaps plan better to avoid sub-optimal performance while the user travels through regions of low bandwidth availability. The intuition is that such future knowledge could be used to buffer additional content in regions of higher bandwidth availability to tide over the deficits in regions of low bandwidth availability. Foresight is a service designed to provide this future knowledge for client apps running on a mobile device. It comprises three components: (a) a crowd-sourced bandwidth estimate reporting facility, (b) an on-cloud bandwidth service that records the spatiotemporal variations in bandwidth and serves queries for bandwidth availability from mobile users, and (c) an on-device bandwidth manager that caters to the bandwidth requirements from client apps by providing them with bandwidth allocation schedules. Foresight is implemented in the Android framework. As a proof of concept for using this service, we have modified an open-source video player---Exoplayer---to use the results of Foresight in its video buffer management. Our performance evaluation shows Foresight's scalability. We also showcase the opportunity that Foresight offers to ExoPlayer to enhance video quality of experience (QoE) despite spatiotemporal bandwidth variations for metrics such as overall higher bitrate of playback, reduction in number of bitrate switches, and reduction in the number of stalls during video playback. 
    more » « less
  2. null (Ed.)
    Contemporary GPUs support multiple kernels to run concurrently on the same streaming multiprocessors (SMs). Recent studies have demonstrated that such concurrent kernel execution (CKE) improves both resource utilization and computational throughput. Most of the prior works focus on partitioning the GPU resources at the cooperative thread array (CTA) level or the warp scheduler level to improve CKE. However, significant performance slowdown and unfairness are observed when latency-sensitive kernels co-run with bandwidth-intensive ones. The reason is that bandwidth over-subscription from bandwidth-intensive kernels leads to much aggravated memory access latency, which is highly detrimental to latency-sensitive kernels. Even among bandwidth-intensive kernels, more intensive kernels may unfairly consume much higher bandwidth than less-intensive ones. In this article, we first make a case that such problems cannot be sufficiently solved by managing CTA combinations alone and reveal the fundamental reasons. Then, we propose a coordinated approach for CTA combination and bandwidth partitioning. Our approach dynamically detects co-running kernels as latency sensitive or bandwidth intensive. As both the DRAM bandwidth and L2-to-L1 Network-on-Chip (NoC) bandwidth can be the critical resource, our approach partitions both bandwidth resources coordinately along with selecting proper CTA combinations. The key objective is to allocate more CTA resources for latency-sensitive kernels and more NoC/DRAM bandwidth resources to NoC-/DRAM-intensive kernels. We achieve it using a variation of dominant resource fairness (DRF). Compared with two state-of-the-art CKE optimization schemes, SMK [52] and WS [55], our approach improves the average harmonic speedup by 78% and 39%, respectively. Even compared to the best possible CTA combinations, which are obtained from an exhaustive search among all possible CTA combinations, our approach improves the harmonic speedup by up to 51% and 11% on average. 
    more » « less
  3. null (Ed.)
    As FPGA-based accelerators become ubiquitous and more powerful, the demand for integration with High-Performance Memory (HPM) grows. Although HPMs offer a much greater bandwidth than standard DDR4 DRAM, they introduce new design challenges such as increased latency and higher bandwidth mismatch between memory and FPGA cores. This paper presents a scalable architecture for convolutional neural network accelerators conceived specifically to address these challenges and make full use of the memory's high bandwidth. The accelerator, which was designed using high-level synthesis, is highly configurable. The intrinsic parallelism of its architecture allows near-perfect scaling up to saturating the available memory bandwidth. 
    more » « less
  4. Network architects are frequently presented with a tradeoff: either (a) introduce a new or improved control-/management plane application that boosts overall performance, or (b) use the bandwidth it would have occupied to deliver user traffic. In this paper, we present OrbWeaver, a framework that can exploit unused network bandwidth for in-network coordination. Using real hardware, we demonstrate that OrbWeaver can harvest this bandwidth (1) with little-to-no impact on the bandwidth/latency of user packets and (2) while providing guarantees on the interarrival time of the injected traffic. Through an exploration of three example use cases, we show that this opportunistic coordination abstraction is sufficient to approximate recently proposed systems without any of their associated bandwidth overheads. 
    more » « less
  5. Memory Hard Functions (MHFs) have been proposed as an answer to the growing inequality between the computational speed of general purpose CPUs and ASICs. MHFs have seen widespread applications including password hashing, key stretching and proofs of work. Several metrics have been proposed to quantify the memory hardness of a function. Cumulative memory complexity (CMC) quantifies the cost to acquire/build the hardware to evaluate the function repeatedly at a given rate. By contrast, bandwidth hardness quantifies the energy costs of evaluating this function. Ideally, a good MHF would be both bandwidth hard and have high CMC. While the CMC of leading MHF candidates is well understood, little is known about the bandwidth hardness of many prominent MHF candidates. Our contributions are as follows: First, we provide the first reduction proving that, in the parallel random oracle model (pROM), the bandwidth hardness of a data-independent MHF (iMHF) is described by the red-blue pebbling cost of the directed acyclic graph associated with that iMHF. Second, we show that the goals of designing an MHF with high CMC/bandwidth hardness are well aligned. Any function (data-independent or not) with high CMC also has relatively high bandwidth costs. Third, we prove that in the pROM the prominent iMHF candidates such as Argon2i, aATSample and DRSample are maximally bandwidth hard. Fourth, we prove the first unconditional tight lower bound on the bandwidth hardness of a prominent data-dependent MHF called Scrypt in the pROM. Finally, we show the problem of finding the minimum cost red–blue pebbling of a directed acyclic graph is NP-hard. 
    more » « less