

Creators/Authors contains: "Wang, Jie"


  1. With reduced data reuse and parallelism, recent convolutional neural networks (CNNs) create new challenges for FPGA acceleration. Systolic arrays (SAs) are efficient, scalable architectures for convolutional layers, but without proper optimizations their efficiency drops dramatically for three reasons: 1) the differing dimensions of same-type layers, 2) the different types of convolution layers, especially transposed and dilated convolutions, and 3) the CNN's complex dataflow graph. Furthermore, integrating FPGAs into machine learning frameworks introduces significant overheads. We therefore present FlexCNN, a flexible, composable architecture that delivers high computation efficiency through dynamic tiling, layer fusion, and data layout optimizations. Additionally, we implement a novel versatile SA that processes normal, transposed, and dilated convolutions efficiently. FlexCNN also uses a fully pipelined software-hardware integration that alleviates the software overheads. Moreover, with an automated compilation flow, FlexCNN takes a CNN in the ONNX representation, performs a design space exploration, and generates an FPGA accelerator. The framework is evaluated on three complex CNNs: OpenPose, U-Net, and E-Net. The architecture optimizations achieve a 2.3× performance improvement. Compared to a standard SA, the versatile SA achieves close-to-ideal speedups of up to 15.98× and 13.42× for transposed and dilated convolutions, respectively, with a 6% average area overhead. The pipelined integration leads to a 5× speedup for OpenPose.
    Free, publicly-accessible full text available December 20, 2023
  2. Free, publicly-accessible full text available June 26, 2023
  3. Free, publicly-accessible full text available September 1, 2023
  4. Ghinassi, Massimiliano (Ed.)
    Free, publicly-accessible full text available June 1, 2023
  5. Free, publicly-accessible full text available June 8, 2023

    We measure the enclosed Milky Way mass profile to Galactocentric distances of ∼70 and ∼50 kpc using the smooth, diffuse stellar halo samples of Bird et al. The samples are Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) and Sloan Digital Sky Survey/Sloan Extension for Galactic Understanding and Exploration (SDSS/SEGUE) K giants (KG) and SDSS/SEGUE blue horizontal branch (BHB) stars with accurate metallicities. Full 3D kinematics are available from LAMOST and SDSS/SEGUE distances and radial velocities combined with Gaia DR2 proper motions. Two methods are used to estimate the enclosed mass: the 3D spherical Jeans equation and the Evans et al. tracer mass estimator (TME). We remove substructure with the Xue et al. method, which is based on integrals of motion. We evaluate the uncertainties in our estimates due to random sampling noise, systematic distance errors, the adopted density profile, and non-virialization and non-spherical effects of the halo. The tracer density profile remains a limiting systematic in our mass estimates, although within these limits we find reasonable agreement across the different samples and methods. Out to ∼70 and ∼50 kpc, the Jeans method yields total enclosed masses of $4.3 \pm 0.95\,(\mathrm{random}) \pm 0.6\,(\mathrm{systematic}) \times 10^{11}\,M_\odot$ and $4.1 \pm 1.2\,(\mathrm{random}) \pm 0.6\,(\mathrm{systematic}) \times 10^{11}\,M_\odot$ for the KG and BHB stars, respectively. For the KG and BHB samples, we find dark matter virial masses of $M_{200}=0.55^{+0.15}_{-0.11}\,(\mathrm{random}) \pm 0.083\,(\mathrm{systematic}) \times 10^{12}\,M_\odot$ and $M_{200}=1.00^{+0.67}_{-0.33}\,(\mathrm{random}) \pm 0.15\,(\mathrm{systematic}) \times 10^{12}\,M_\odot$, respectively.

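The Jeans-equation estimate in record 5 can be illustrated with the standard spherical Jeans mass formula, $M(<r) = -\frac{r\,\sigma_r^2}{G}\left(\frac{d\ln\nu}{d\ln r} + \frac{d\ln\sigma_r^2}{d\ln r} + 2\beta\right)$, where ν is the tracer number density, σ_r the radial velocity dispersion, and β the velocity anisotropy. The Python sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes no net radial streaming, and the function names and the example inputs (a power-law tracer density with slope −3.5, a flat σ_r = 100 km/s, and β = 0.5 at 50 kpc) are hypothetical values chosen only to show how the terms combine. It also makes concrete why the tracer density profile is a limiting systematic: its logarithmic slope enters the mass estimate linearly.

```python
import math

# Gravitational constant in kpc (km/s)^2 / M_sun.
G_KPC = 4.30091e-6


def log_slope(f, r, h=1e-4):
    """Central finite-difference estimate of d ln f / d ln r at radius r."""
    upper, lower = r * (1.0 + h), r * (1.0 - h)
    return (math.log(f(upper)) - math.log(f(lower))) / (math.log(upper) - math.log(lower))


def anisotropy(sigma_r, sigma_theta, sigma_phi):
    """Velocity anisotropy beta = 1 - (sigma_theta^2 + sigma_phi^2) / (2 sigma_r^2)."""
    return 1.0 - (sigma_theta**2 + sigma_phi**2) / (2.0 * sigma_r**2)


def jeans_enclosed_mass(r_kpc, sigma_r_kms, dlnnu_dlnr, dlnsig2_dlnr, beta):
    """Spherical Jeans estimate of the mass enclosed within r_kpc, in M_sun."""
    bracket = dlnnu_dlnr + dlnsig2_dlnr + 2.0 * beta
    return -(r_kpc * sigma_r_kms**2 / G_KPC) * bracket


# Hypothetical example: tracer density nu proportional to r^-3.5,
# flat dispersion profile (d ln sigma_r^2 / d ln r = 0), beta = 0.5 at 50 kpc.
def nu(r):
    return r ** -3.5


slope = log_slope(nu, 50.0)
beta = anisotropy(100.0, 100.0 / math.sqrt(2.0), 100.0 / math.sqrt(2.0))
mass = jeans_enclosed_mass(50.0, 100.0, slope, 0.0, beta)
print(f"d ln nu / d ln r = {slope:.3f}, beta = {beta:.3f}, M(<50 kpc) = {mass:.3e} M_sun")
```

For a power-law density, flat dispersion, and constant β the bracket reduces to −(γ − 2β) with γ = 3.5, so a change of 0.5 in the assumed density slope shifts the estimated mass by 20% in this example.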
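For record 1 (FlexCNN), one generic way a dense accelerator loses efficiency on dilated convolutions is that the dilated kernel is expanded into a larger dense kernel with zeros between its taps, so many multiply-accumulates (MACs) are spent on zero weights. The 1-D Python sketch below is a hypothetical illustration of that overhead only, not FlexCNN's versatile SA design; the function names, input length, kernel, and dilation rate are assumptions. It computes the same dilated correlation twice, once over the zero-expanded kernel and once visiting only the real taps, and counts the MACs in each.

```python
def dilate_kernel(kernel, d):
    """Insert (d - 1) zeros between adjacent taps of a 1-D kernel."""
    out = []
    for i, w in enumerate(kernel):
        out.append(w)
        if i < len(kernel) - 1:
            out.extend([0.0] * (d - 1))
    return out


def conv1d_valid(x, kernel):
    """Dense 'valid' 1-D correlation; returns (outputs, MAC count)."""
    k = len(kernel)
    outputs, macs = [], 0
    for i in range(len(x) - k + 1):
        acc = 0.0
        for j in range(k):
            acc += x[i + j] * kernel[j]  # includes the inserted zero taps
            macs += 1
        outputs.append(acc)
    return outputs, macs


def dilated_conv1d_direct(x, kernel, d):
    """Dilated 'valid' 1-D correlation that visits only the real taps."""
    k_eff = (len(kernel) - 1) * d + 1  # receptive field of the dilated kernel
    outputs, macs = [], 0
    for i in range(len(x) - k_eff + 1):
        acc = 0.0
        for j, w in enumerate(kernel):
            acc += x[i + j * d] * w  # stride through the input by the dilation rate
            macs += 1
        outputs.append(acc)
    return outputs, macs


x = [float(v) for v in range(16)]
kernel = [1.0, -2.0, 3.0]
dilation = 4
dense_out, dense_macs = conv1d_valid(x, dilate_kernel(kernel, dilation))
sparse_out, sparse_macs = dilated_conv1d_direct(x, kernel, dilation)
print(dense_out == sparse_out, dense_macs, sparse_macs)
```

With a 3-tap kernel and dilation 4, the expanded kernel has 9 taps, so the dense version performs 72 MACs against 24 for identical outputs; the ratio grows with the dilation rate, which is the kind of waste a dilation-aware array can avoid.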