skip to main content

Search for: All records

Creators/Authors contains: "Hao, Cong"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. This effort develops the first rich suite of analog and mixed-signal benchmark of various sizes and domains, intended for use with contemporary analog and mixed-signal designs and synthesis tools. Benchmarking enables analog-digital co-design exploration as well as extensive evaluation of analog synthesis tools and the generated analog/mixed-signal circuit or device. The goals of this effort are defining analog computation system benchmarks, developing the required concepts for higher-level analog and mixed-signal tools to utilize these benchmarks, and enabling future automated architectural design space exploration (DSE) to determine the best configurable architecture (e.g., a new FPAA) for a certain family of applications. The benchmarks comprise multiple levels of anacoustic, avision, acommunications, and an analogfiltersystem that must be simultaneously satisfied for a complete system. 
    more » « less
    Free, publicly-accessible full text available March 31, 2025
  2. The design of analog computing systems requires significant human resources and domain expertise due to the lack of automation tools to enable these highly energy-efficient, high-performance computing nodes. This work presents the first automated tool flow from a high-level representation to a reconfigurable physical device. This tool begins with a high-level algorithmic description, utilizing either our custom Python framework or the XCOS GUI, to compile and optimize computations for integration into an Integrated Circuit (IC) design or a Field Programmable Analog Array (FPAA). An energy-efficient embedded speech classifier benchmark illustrates the tool demonstration, automatically generating GDSII layout or FPAA switch list targeting.

    more » « less
    Free, publicly-accessible full text available December 1, 2024
  3. Various hardware accelerators have been developed for energy-efficient and real-time inference of neural networks on edge devices. However, most training is done on high-performance GPUs or servers, and the huge memory and computing costs prevent training neural networks on edge devices. This paper proposes a novel tensor-based training framework, which offers orders-of-magnitude memory reduction in the training process. We propose a novel rank-adaptive tensorized neural network model, and design a hardware-friendly low-precision algorithm to train this model. We present an FPGA accelerator to demonstrate the benefits of this training method on edge devices. Our preliminary FPGA implementation achieves 59× speedup and 123× energy reduction compared to embedded CPU, and 292× memory reduction over a standard full-size training. 
    more » « less
  4. Self-supervised learning of graph neural networks (GNN) is in great need because of the widespread label scarcity issue in real-world graph/network data. Graph contrastive learning (GCL), by training GNNs to maximize the correspondence between the representations of the same graph in its different augmented forms, may yield robust and transferable GNNs even without using labels. However, GNNs trained by traditional GCL often risk capturing redundant graph features and thus may be brittle and provide sub-par performance in downstream tasks. Here, we propose a novel principle, termed adversarial-GCL (\textit{AD-GCL}), which enables GNNs to avoid capturing redundant information during the training by optimizing adversarial graph augmentation strategies used in GCL. We pair AD-GCL with theoretical explanations and design a practical instantiation based on trainable edge-dropping graph augmentation. We experimentally validate AD-GCL by comparing with the state-of-the-art GCL methods and achieve performance gains of up-to~14\% in unsupervised, ~6\% in transfer and~3\% in semi-supervised learning settings overall with 18 different benchmark datasets for the tasks of molecule property regression and classification, and social network classification. 
    more » « less
  5. null (Ed.)