skip to main content

Search for: All records

Creators/Authors contains: "Huang, Heng"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Free, publicly-accessible full text available May 7, 2025
  2. Free, publicly-accessible full text available February 1, 2025
  3. With the increase in the computation intensity of the chip, the mismatch between computation layer shapes and the available computation resource significantly limits the utilization of the chip. Driven by this observation, prior works discuss spatial accelerators or dataflow architecture to maximize the throughput. However, using spatial accelerators could potentially increase the execution latency. In this work, we first systematically investigate two execution models: (1) sequentially (temporally) launch one monolithic accelerator, and (2) spatially launch multiple accelerators. From the observations, we find that there is a latency throughput tradeoff between these two execution models, and combining these two strategies together can give us a more efficient latency throughput Pareto front. To achieve this, we propose spatial sequential architecture (SSR) and SSR design automation framework to explore both strategies together when deploying deep learning inference. We use the 7nm AMD Versal ACAP VCK190 board to implement SSR accelerators for four end-to-end transformer-based deep learning models. SSR achieves average throughput gains of 2.53x, 35.71x, and 14.20x under different batch sizes compared to the 8nm Nvidia GPU A10G, 16nm AMD FPGAs ZCU102, and U250. The average energy efficiency gains are 8.51x, 6.75x, and 21.22x, respectively. Compared with the sequential-only solution and spatial-only solution on VCK190, our spatial-sequential-hybrid solutions achieve higher throughput under the same latency requirement and lower latency under the same throughput requirement. We also use SSR analytical models to demonstrate how to use SSR to optimize solutions on other computing platforms, e.g., 14nm Intel Stratix 10 NX. 
    more » « less
  4. Abstract

    Spatial transcriptomics technologies have shed light on the complexities of tissue structures by accurately mapping spatial microenvironments. Nonetheless, a myriad of methods, especially those utilized in platforms like Visium, often relinquish spatial details owing to intrinsic resolution limitations. In response, we introduce TransformerST, an innovative, unsupervised model anchored in the Transformer architecture, which operates independently of references, thereby ensuring cost-efficiency by circumventing the need for single-cell RNA sequencing. TransformerST not only elevates Visium data from a multicellular level to a single-cell granularity but also showcases adaptability across diverse spatial transcriptomics platforms. By employing a vision transformer-based encoder, it discerns latent image-gene expression co-representations and is further enhanced by spatial correlations, derived from an adaptive graph Transformer module. The sophisticated cross-scale graph network, utilized in super-resolution, significantly boosts the model’s accuracy, unveiling complex structure–functional relationships within histology images. Empirical evaluations validate its adeptness in revealing tissue subtleties at the single-cell scale. Crucially, TransformerST adeptly navigates through image-gene co-representation, maximizing the synergistic utility of gene expression and histology images, thereby emerging as a pioneering tool in spatial transcriptomics. It not only enhances resolution to a single-cell level but also introduces a novel approach that optimally utilizes histology images alongside gene expression, providing a refined lens for investigating spatial transcriptomics.

    more » « less
  5. Free, publicly-accessible full text available March 1, 2025
  6. Free, publicly-accessible full text available October 8, 2024
  7. Remote monitoring and evaluation of pulmonary diseases via telemedicine are important to disease diagnosis and management, but current telemedicine solutions have limited capability of objectively examining the airway's internal physiological conditions that are crucial to pulmonary disease evaluation. Existing solutions based on smartphone sensing are also limited to externally monitoring breath rates, respiratory events, or lung function. In this paper, we present PTEase, a new system design that addresses these limitations and uses commodity smartphones to examine the airway's internal physiological conditions. PTEase uses active acoustic sensing to measure the internal changes of lower airway caliber, and then leverages machine learning to analyze the sensory data for pulmonary disease evaluation. We implemented PTEase as a smartphone app, and verified its measurement error in lab-controlled settings as <10%. Clinical studies further showed that PTEase reaches 75% accuracy on disease prediction and 11%-15% errors in estimating lung function indices. Given that such accuracy is comparable with that in clinical practice using spirometry, PTEase can be reliably used as an assistive telemedicine tool for disease evaluation and monitoring. 
    more » « less