Search for: All records

Creators/Authors contains: "Wang, Y."

  1. Free, publicly-accessible full text available November 1, 2022
  2. Free, publicly-accessible full text available November 1, 2022
  3. Free, publicly-accessible full text available October 4, 2022
  4. Free, publicly-accessible full text available October 28, 2022
  5. Monte Carlo (MC) methods are widely used in many research areas such as physical simulation, statistical analysis, and machine learning. Applying MC methods requires drawing fast-mixing samples from a given probability distribution. Among existing sampling methods, Hamiltonian Monte Carlo (HMC) uses gradient information during Hamiltonian simulation and can produce fast-mixing samples at the highest efficiency. However, without simulation parameters carefully chosen for the specific problem, HMC generally suffers from simulation locality and wasted computation. The No-U-Turn Sampler (NUTS) was therefore proposed to tune these parameters automatically during simulation, and it is the current state-of-the-art sampling algorithm. NUTS, however, requires frequent gradient calculations for the given distribution and high-volume vector processing, especially for large-scale problems, which makes drawing a large number of samples expensive and motivates hardware acceleration. While some hardware acceleration works have been proposed for traditional Markov Chain Monte Carlo (MCMC) and HMC methods, no existing work targets hardware acceleration of the NUTS algorithm. In this paper, we present the first NUTS accelerator on FPGA while addressing the high complexity of this state-of-the-art algorithm. Our hardware and algorithm co-optimizations include an incremental resampling technique, which leads to a more memory-efficient architecture, and pipeline optimization for multi-chain sampling to maximize throughput. We also explore three levels of parallelism in the NUTS accelerator to further boost performance. Compared with RSTAN, an optimized C++ NUTS package, our NUTS accelerator reaches a maximum speedup of 50.6X and an energy improvement of 189.7X.
    Free, publicly-accessible full text available July 7, 2022
  6. Free, publicly-accessible full text available July 15, 2022
  7. Free, publicly-accessible full text available April 1, 2022
  8. Free, publicly-accessible full text available April 1, 2022
  9. Free, publicly-accessible full text available May 18, 2022
  10. Spin-valley locking in monolayer transition metal dichalcogenides has attracted enormous interest, since it offers potential for valleytronic and optoelectronic applications. Such an exotic electronic state has rarely been seen in bulk materials. Here, we report spin-valley locking in the Dirac semimetal BaMnSb₂. This is revealed by comprehensive studies using first-principles calculations, tight-binding and effective model analyses, and angle-resolved photoemission spectroscopy measurements. Moreover, this material also exhibits a stacked quantum Hall effect (QHE). The spin-valley degeneracy extracted from the QHE is close to 2. This result, together with the Landau level spin splitting, further confirms the spin-valley locking picture. In the extreme quantum limit, we also observed a plateau in the z-axis resistance, suggestive of a two-dimensional chiral surface state present in the quantum Hall state. These findings establish BaMnSb₂ as a rare platform for exploring coupled spin and valley physics in bulk single crystals and accessing 3D interacting topological states.
    Free, publicly-accessible full text available December 1, 2022