NSF PAR Search | NSF Public Access Repository

Zoomie: A Software-like Debugging Tool for FPGAs

https://doi.org/10.1145/3620666.3651356

Wei, Tianrui; Laeufer, Kevin; Lim, Katie; Zhao, Jerry; Sen, Koushik; Balkind, Jonathan; Asanovic, Krste (April 2024, Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3)

FPGA prototyping has long been an indispensable technique in pre-silicon verification as well as enabling early-stage software development. FPGAs themselves have also gained popularity as hardware accelerators deployed in datacenters. However, FPGA development brings a plethora of problems. These issues constitute a high barrier towards mass adoption of agile development surrounding FPGA-based projects. To address these problems, we have built Zoomie for fast incremental compilation, reusing verification infrastructure, and a software-inspired approach towards open-source emulation. We show that Zoomie achieves 18× speedup over the vendor toolchain in incremental compilation time for million-gate designs. At the same time, Zoomie also provides a software-like debugging experience with breakpoints, stepping the design, and forcing values in a running design.
Full Text Available
AuRORA: Virtualized Accelerator Orchestration for Multi-Tenant Workloads

https://doi.org/10.1145/3613424.3614280

Kim, Seah; Zhao, Jerry; Asanovic, Krste; Nikolic, Borivoje; Shao, Yakun Sophia (October 2023, IEEE/ACM International Symposium on Microarchitecture)
Constellation: An Open-Source SoC-Capable NoC Generator

https://doi.org/10.1109/NoCArc57472.2022.9911299

Zhao, Jerry; Agrawal, Animesh; Nikolic, Borivoje; Asanovic, Krste (October 2022, 2022 15th IEEE/ACM International Workshop on Network on Chip Architectures (NoCArc))

Full Text Available
CDPU: Co-designing Compression and Decompression Processing Units for Hyperscale Systems

https://doi.org/10.1145/3579371.3589074

Karandikar, Sagar; Udipi, Aniruddha N.; Choi, Junsun; Whangbo, Joonho; Zhao, Jerry; Kanev, Svilen; Lim, Edwin; Alakuijala, Jyrki; Madduri, Vrishab; Shao, Yakun Sophia; et al (June 2023, ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture)

Full Text Available
Profiling Hyperscale Big Data Processing

https://doi.org/10.1145/3579371.3589082

Gonzalez, Abraham; Kolli, Aasheesh; Khan, Samira; Liu, Sihang; Dadu, Vidushi; Karandikar, Sagar; Chang, Jichuan; Asanovic, Krste; Ranganathan, Parthasarathy (January 2023, Proceedings of the 50th Annual International Symposium on Computer Architecture)

Full Text Available
A Hardware Accelerator for Protocol Buffers

https://doi.org/10.1145/3466752.3480051

Karandikar, Sagar; Leary, Chris; Kennelly, Chris; Zhao, Jerry; Parimi, Dinesh; Nikolic, Borivoje; Asanovic, Krste; Ranganathan, Parthasarathy (October 2021, 54th Annual IEEE/ACM International Symposium on Microarchitecture)

Full Text Available
Vertically Integrated Computing Labs Using Open-Source Hardware Generators and Cloud-Hosted FPGAs

https://doi.org/10.1109/ISCAS51556.2021.9401515

Amid, Alon; Ou, Albert; Asanovic, Krste; Shao, Yakun Sophia; Nikolic, Borivoje (May 2021, 2021 IEEE International Symposium on Circuits and Systems (ISCAS))
The design of computing systems has changed dramatically over the past decade, but most courses in advanced computer architecture remain unchanged. Computer architecture education lies at the intersection between computer science and electrical engineering, with practical exercises in classes based on appropriate levels of abstraction in the computing system design stack. Hardware-centric lab exercises often require broad infrastructure resources and tend to navigate around tedious practical implementation concepts, while software-centric exercises leave a gap between modeling and system implementation implications that students later need to overcome in professional settings. Vertical integration trends in domain-specific compute systems, as well as software-hardware co-design, are often covered in classroom lectures, but are not reflected in laboratory exercises due to complex tooling and simulation infrastructure. We describe our experiences with a joint hardware-software approach to exploring computer architecture concepts in class exercises, by using open-source processor hardware implementations, generator-based hardware design methodologies, and cloud-hosted FPGAs. This approach further enables scaling course enrollment, remote learning and a cross-class collaborative lab ecosystem, creating a connecting thread between computer science and electrical engineering experience-based curricula.
Full Text Available
COBRA: A Framework for Evaluating Compositions of Hardware Branch Predictors

https://doi.org/10.1109/ISPASS51385.2021.00053

Zhao, Jerry; Gonzalez, Abraham; Amid, Alon; Karandikar, Sagar; Asanovic, Krste (March 2021, 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS))
We present COBRA, a framework which enables a realistic hardware-guided methodology for evaluating compositions of hardware branch predictors. COBRA provides a common interface for developing RTL implementations of predictor subcomponents, as well as a predictor composer that automatically generates hardware predictor pipelines from subcomponents based on a high-level topological model of a desired algorithm. We demonstrate how COBRA aids in the design and evaluation of diverse predictor architectures and how our hardware-centric approach captures concerns in predictor characterization that are not exposed in software-based algorithm development. Using COBRA, we generate three superscalar pipelined branch predictors with diverse architectures, synthesize them to run at 1 GHz on a commercial FinFET process, integrate them with the open-source BOOM out-of-order core, and evaluate their end-to-end performance on workloads over trillions of cycles. The COBRA generator system has been open-sourced as part of the SonicBOOM out-of-order core.
Full Text Available
NeuroVectorizer: end-to-end vectorization with deep reinforcement learning

https://doi.org/10.1145/3368826.3377928

Haj-Ali, Ameer; Ahmed, Nesreen K.; Willke, Ted; Shao, Yakun Sophia; Asanovic, Krste; Stoica, Ion (February 2020, CGO 2020: Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization)
One of the key challenges arising when compilers vectorize loops for today’s SIMD-compatible architectures is to decide if vectorization or interleaving is beneficial. Then, the compiler has to determine the number of instructions to pack together and the interleaving level (stride). Compilers are designed today to use fixed-cost models that are based on heuristics to make vectorization decisions on loops. However, these models are unable to capture the data dependency, the computation graph, or the organization of instructions. Alternatively, software engineers often hand-write the vectorization factors of every loop. This, however, places a huge burden on them, since it requires prior experience and significantly increases the development time. In this work, we explore a novel approach for handling loop vectorization and propose an end-to-end solution using deep reinforcement learning (RL). We conjecture that deep RL can capture different instructions, dependencies, and data structures to enable learning a sophisticated model that can better predict the actual performance cost and determine the optimal vectorization factors. We develop an end-to-end framework, from code to vectorization, that integrates deep RL in the LLVM compiler. Our proposed framework takes benchmark codes as input and extracts the loop codes. These loop codes are then fed to a loop embedding generator that learns an embedding for these loops. Finally, the learned embeddings are used as input to a deep RL agent, which dynamically determines the vectorization factors for all the loops. We further extend our framework to support random search, decision trees, supervised neural networks, and nearest-neighbor search. We evaluate our approaches against the currently used LLVM vectorizer and loop polyhedral optimization techniques. Our experiments show 1.29× to 4.73× performance speedup compared to baseline and only 3% worse than the brute-force search on a wide range of benchmarks.
Full Text Available
RLDRM: Closed Loop Dynamic Cache Allocation with Deep Reinforcement Learning for Network Function Virtualization

https://doi.org/10.1109/NetSoft48620.2020.9165471

Li, Bin; Wang, Yipeng; Wang, Ren; Tai, Charlie; Iyer, Ravi; Zhou, Zhu; Herdrich, Andrew; Zhang, Tong; Haj-Ali, Ameer; Stoica, Ion; et al (June 2020, 2020 6th IEEE International Conference on Network Softwarization (NetSoft))
Network function virtualization (NFV) technology attracts tremendous interest from the telecommunication industry and data center operators, as it allows service providers to assign resources for Virtual Network Functions (VNFs) on demand, achieving better flexibility, programmability, and scalability. To improve server utilization, one popular practice is to deploy best-effort (BE) workloads along with high priority (HP) VNFs when the high priority VNFs' resource usage is detected to be low. The key challenge of this deployment scheme is to dynamically balance the service level objective (SLO) and the total cost of ownership (TCO) to optimize the data center efficiency under inherently fluctuating workloads. With the recent advancement in deep reinforcement learning, we conjecture that it has the potential to solve this challenge by adaptively adjusting resource allocation to reach improved performance and higher server utilization. In this paper, we present a closed-loop automation system, RLDRM, to dynamically adjust Last Level Cache allocation between HP VNFs and BE workloads using deep reinforcement learning. The results demonstrate improved server utilization while maintaining the required SLO for the HP VNFs.
Full Text Available
