

Search for: All records

Creators/Authors contains: "Yu, S."


  1. Free, publicly-accessible full text available July 1, 2024
  2. We present Q-functionals, an alternative architecture for continuous control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms a state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with the typical architecture of off-policy continuous control, where a policy network is trained for the sole purpose of selecting actions from the Q-function. We represent our action-dependent Q-function as a weighted sum of basis functions (Fourier, polynomial, etc.) over the action space, where the weights are state-dependent and output by the Q-functional network. Fast sampling makes practical a variety of techniques that require Monte-Carlo integration over Q-functions, and enables action-selection strategies besides simple value-maximization. We characterize our framework, describe various implementations of Q-functionals, and demonstrate strong performance on a suite of continuous control tasks.
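     A minimal sketch of the idea, assuming a PyTorch setting and a Fourier basis (the class name, layer sizes, and action sampling below are illustrative, not the authors' implementation): the network maps a state to basis weights, and Q-values for a batch of sampled actions are evaluated in parallel.

     ```python
     import itertools
     import torch
     import torch.nn as nn

     class QFunctionalSketch(nn.Module):
         """State -> Fourier-basis weights; Q(s, a) = w(s) . cos(pi * C a)."""
         def __init__(self, state_dim, action_dim, order=2, hidden=128):
             super().__init__()
             coeffs = list(itertools.product(range(order + 1), repeat=action_dim))
             self.register_buffer("C", torch.tensor(coeffs, dtype=torch.float32))  # (n_basis, action_dim)
             self.weights = nn.Sequential(
                 nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, len(coeffs)))

         def forward(self, state, actions):
             # state: (B, state_dim); actions: (B, N, action_dim) scaled to [-1, 1]
             w = self.weights(state)                          # (B, n_basis) state-dependent weights
             phi = torch.cos(torch.pi * actions @ self.C.T)   # (B, N, n_basis) basis features
             return (phi * w.unsqueeze(1)).sum(dim=-1)        # Q-values: (B, N), evaluated in parallel

     q = QFunctionalSketch(state_dim=8, action_dim=2)
     state = torch.randn(4, 8)
     candidates = torch.rand(4, 256, 2) * 2 - 1               # 256 sampled actions per state
     best = candidates[torch.arange(4), q(state, candidates).argmax(dim=1)]  # greedy action by sampling
     ```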
  3. We propose a model-based lifelong reinforcement-learning approach that estimates a hierarchical Bayesian posterior distilling the common structure shared across different tasks. The learned posterior, combined with a sample-based Bayesian exploration procedure, increases the sample efficiency of learning across a family of related tasks. We first derive an analysis of the relationship between the sample complexity and the initialization quality of the posterior in the finite MDP setting. We then scale the approach to continuous-state domains by introducing a Variational Bayesian Lifelong Reinforcement Learning algorithm that can be combined with recent model-based deep RL methods and that exhibits backward transfer. Experimental results on several challenging domains show that our algorithms achieve both better forward and backward transfer performance than state-of-the-art lifelong RL methods.
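     A minimal sketch of the sample-based Bayesian exploration idea in the finite-MDP setting, under assumptions of my own (Dirichlet transition posteriors and known rewards); this is an illustration of the mechanism, not the paper's algorithm:

     ```python
     import numpy as np

     def value_iteration(P, R, gamma=0.95, iters=200):
         # P: (S, A, S) transition probabilities, R: (S, A) expected rewards.
         V = np.zeros(P.shape[0])
         for _ in range(iters):
             V = (R + gamma * P @ V).max(axis=1)
         return (R + gamma * P @ V).argmax(axis=1)            # greedy policy per state

     def sample_policy(shared_prior_counts, task_counts, R):
         # Hierarchical idea: counts distilled from earlier tasks act as the prior,
         # and counts from the current task sharpen it into a per-task posterior.
         post = shared_prior_counts + task_counts             # (S, A, S) Dirichlet parameters
         S, A, _ = post.shape
         P = np.array([[np.random.dirichlet(post[s, a]) for a in range(A)] for s in range(S)])
         return value_iteration(P, R)                         # act greedily w.r.t. the sampled model

     S, A = 5, 2
     policy = sample_policy(np.ones((S, A, S)), np.zeros((S, A, S)), np.random.rand(S, A))
     ```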
  4. Supervised training of optical flow predictors generally yields better accuracy than unsupervised training. However, the improved performance comes at an often high annotation cost. Semi-supervised training trades off accuracy against annotation cost. We use a simple yet effective semi-supervised training method to show that even a small fraction of labels can improve flow accuracy by a significant margin over unsupervised training. In addition, we propose active learning methods based on simple heuristics to further reduce the number of labels required to achieve the same target accuracy. Our experiments on both synthetic and real optical flow datasets show that our semi-supervised networks generally need around 50% of the labels to achieve close to full-label accuracy, and only around 20% with active learning on Sintel. We also analyze and show insights on the factors that may influence active learning performance. Code is available at https://github.com/duke-vision/optical-flow-active-learning-release.
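     A minimal sketch of a semi-supervised flow objective with a label-selection heuristic, using tensor shapes and a photometric criterion of my own choosing (the released code linked above is the authoritative reference):

     ```python
     import torch
     import torch.nn.functional as F

     def warp(img2, flow):
         # Backward-warp the second image to the first frame using the predicted flow (B, 2, H, W).
         B, _, H, W = flow.shape
         ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
         grid = torch.stack((xs, ys)).float().to(flow) + flow                 # absolute pixel coordinates
         grid = torch.stack((2 * grid[:, 0] / (W - 1) - 1,
                             2 * grid[:, 1] / (H - 1) - 1), dim=-1)           # normalize to [-1, 1]
         return F.grid_sample(img2, grid, align_corners=True)

     def semi_supervised_loss(pred_flow, gt_flow, has_label, img1, img2, w_unsup=1.0):
         # Supervised endpoint error where labels exist, photometric loss elsewhere.
         epe = torch.norm(pred_flow - gt_flow, dim=1).mean(dim=(1, 2))        # (B,)
         photo = (img1 - warp(img2, pred_flow)).abs().mean(dim=(1, 2, 3))     # (B,)
         return (has_label * epe + ~has_label * w_unsup * photo).mean()

     def frames_to_label(photo_errors, budget):
         # Active-learning heuristic: request labels for the worst-explained frame pairs.
         return torch.topk(photo_errors, budget).indices

     B, H, W = 2, 64, 96
     loss = semi_supervised_loss(torch.randn(B, 2, H, W), torch.randn(B, 2, H, W),
                                 torch.tensor([True, False]),
                                 torch.rand(B, 3, H, W), torch.rand(B, 3, H, W))
     ```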
  5. Stochastic computing (SC) can lead to area-efficient implementations of logic designs. Existing SC multiplication, however, suffers from a long-standing problem: large multiplication error for small inputs, due to the intrinsic nature of bit-stream-based computing. In this article, we propose a new scaled counting-based SC multiplication approach, called Scaled-CBSC, which mitigates this issue by introducing scaling bits that ensure the bit-'1' density of the stochastic number is sufficiently large. The idea is to convert "small" inputs into "large" inputs, thus improving the accuracy of SC multiplication. Unlike an existing stream-bit-based approach, the new method uses the binary format and does not require stochastic addition, as the SC multiplication always starts from binary numbers. Furthermore, Scaled-CBSC only requires all numbers to be larger than 0.5, rather than an arbitrarily defined threshold, which leads to integer-only scaling terms. The experimental results show that 8-bit Scaled-CBSC multiplication with 3 scaling bits can achieve up to 46.6% and 30.4% improvements in mean error and standard deviation, respectively; reduce the peak relative error from 100% to 1.8%; and improve delay, area, area-delay product, and energy consumption by 12.6%, 51.5%, 57.6%, and 58.4%, respectively, over the state-of-the-art work.
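     A minimal sketch of the scaling idea only (the counting-based hardware multiplier is not reproduced here, and the stream generation below is a generic SC stand-in): small operands are doubled until they reach 0.5, multiplied as bit-streams, and the product is corrected by the power-of-two scale, which is exact.

     ```python
     import numpy as np

     def scale_up(x, max_bits=8):
         # "Scaling bits": double x until it is at least 0.5 (power-of-two scaling only).
         k = 0
         while x < 0.5 and k < max_bits:
             x, k = 2 * x, k + 1
         return x, k

     def scaled_sc_multiply(x, y, n_bits=8):
         xs, kx = scale_up(x)
         ys, ky = scale_up(y)
         N = 2 ** n_bits
         stream_x = np.arange(N) < round(xs * N)           # deterministic unary stream for xs
         stream_y = np.random.rand(N) < ys                 # stochastic stream for ys
         prod = np.logical_and(stream_x, stream_y).mean()  # AND gate ~ multiplication in SC
         return prod / 2 ** (kx + ky)                      # undo the scaling exactly

     print(scaled_sc_multiply(0.03, 0.06), 0.03 * 0.06)    # denser streams -> smaller relative error
     ```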
  6. The IceCube Neutrino Observatory is designed to observe neutrinos interacting deep within the South Pole ice sheet. It consists of 5160 digital optical modules arrayed over a cubic kilometer at depths from 1450 m to 2450 m. At the lower center of the array is the DeepCore subdetector, whose denser configuration lowers the observable energy threshold to about 10 GeV and creates the opportunity to study neutrino oscillations with low-energy atmospheric neutrinos. A precise reconstruction of the neutrino direction is critical for measurements of the oscillation parameters. In this contribution, I discuss a method to reconstruct the zenith angle of 10-GeV-scale events in IceCube using a convolutional neural network and compare the result to that of the current likelihood-based reconstruction algorithm.
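     A purely illustrative sketch (the input encoding, network size, and target parameterization are my assumptions, not those of the analysis): a small CNN regressing the cosine of the zenith angle from a per-DOM summary image.

     ```python
     import torch
     import torch.nn as nn
     import torch.nn.functional as F

     zenith_net = nn.Sequential(                  # toy stand-in for the reconstruction CNN
         nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
         nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
         nn.Linear(64, 1), nn.Tanh())             # predicts cos(zenith) in [-1, 1]

     events = torch.randn(8, 2, 10, 60)           # e.g. charge + first-hit time per DOM position
     cos_zenith = zenith_net(events)
     loss = F.mse_loss(cos_zenith, torch.rand(8, 1) * 2 - 1)  # regress against truth labels
     ```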
  7. We propose MONet, a convolutional neural network that jointly detects motion boundaries and occlusion regions in video, both forward and backward in time. Detection is difficult because optical flow is discontinuous along motion boundaries and undefined in occlusion regions, while many flow estimators assume smoothness and a flow defined everywhere. To reason in the two time directions simultaneously, we direct-warp the estimated maps between the two frames. Since appearance mismatches between frames often signal vicinity to motion boundaries or occlusion regions, we construct a cost block that, for each feature in one frame, records the lowest discrepancy with matching features in a search range. This cost block is two-dimensional and much less expensive than the four-dimensional cost volumes used in flow analysis. Cost-block features are computed by an encoder, and motion boundary and occlusion region estimates are computed by a decoder. We found that arranging decoder layers fine-to-coarse, rather than coarse-to-fine, improves performance. MONet outperforms the prior state of the art for both tasks on the Sintel and FlyingChairsOcc benchmarks without any fine-tuning on them.
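     A minimal sketch of the 2D cost block described above (my own illustration with simplified, wrap-around border handling, not the authors' code): for every location in frame 1 it keeps only the lowest feature discrepancy found within a local search range in frame 2.

     ```python
     import torch

     def cost_block(feat1, feat2, search_range=4):
         # feat1, feat2: (B, C, H, W) feature maps from the two frames.
         costs = []
         for dy in range(-search_range, search_range + 1):
             for dx in range(-search_range, search_range + 1):
                 shifted = torch.roll(feat2, shifts=(dy, dx), dims=(2, 3))  # simplified borders
                 costs.append((feat1 - shifted).abs().mean(dim=1))          # (B, H, W) discrepancy
         # Keep only the minimum over displacements: a 2D map instead of a 4D cost volume.
         return torch.stack(costs, dim=1).min(dim=1).values                 # (B, H, W)

     block = cost_block(torch.randn(2, 64, 48, 64), torch.randn(2, 64, 48, 64))
     ```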
  8. In this paper, we propose a new dynamic reliability technique using an accuracy-reconfigurable stochastic computing (ARSC) framework for deep learning computing. Unlike conventional stochastic computing, which carries out the accuracy versus power/energy trade-off at design time, the new ARSC design can adjust the bit-width of the data at run time. Hence, ARSC can mitigate long-term aging effects by slowing the system clock frequency, while maintaining the inference throughput by reducing the data bit-width at a small cost in accuracy. We show how to implement the recently proposed counter-based SC multiplication and bit-width reduction with a layer-wise quantization scheme for CNNs with dynamic fixed-point data. We validate an ARSC-based five-layer convolutional neural network design for the MNIST dataset using Vivado HLS, with constraints from the Xilinx Zynq-7000 family xc7z045 platform. Experimental results show that the new ARSC DNN can sufficiently compensate for NBTI-induced aging effects over 10 years with marginal classification accuracy loss, while maintaining or even exceeding the pre-aging computing throughput. At the same time, the proposed ARSC computing framework also reduces active power consumption due to the frequency scaling, which can further improve system reliability through reduced temperature.
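     A minimal software sketch of the run-time knob only, under my own formulation of layer-wise dynamic fixed-point quantization (the actual design is an HLS/FPGA implementation): reducing the bit-width on the fly trades a little accuracy for a shorter datapath that permits clock scaling.

     ```python
     import numpy as np

     def dynamic_fixed_point(x, bits):
         # Pick a per-tensor exponent so the largest magnitude fits, then round to a
         # signed fixed-point value with the requested total bit-width.
         exp = int(np.ceil(np.log2(np.max(np.abs(x)) + 1e-12)))
         scale = 2.0 ** (bits - 1 - exp)
         q = np.clip(np.round(x * scale), -2 ** (bits - 1), 2 ** (bits - 1) - 1)
         return q / scale

     layer_out = np.random.randn(1, 128).astype(np.float32)
     for bits in (8, 6, 4):                                   # reconfigure precision at run time
         err = np.abs(layer_out - dynamic_fixed_point(layer_out, bits)).mean()
         print(f"{bits}-bit dynamic fixed point: mean quantization error {err:.4f}")
     ```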
  9. We experimentally demonstrate a 400 Gbit/s optical communication link utilizing wavelength-division multiplexing and mode-division multiplexing for a total of 40 channels. This link utilizes a novel, to the best of our knowledge, 400 GHz frequency comb source based on a chip-scale photonic crystal resonator. Silicon-on-insulator photonic inverse-designed 4 × 4 mode-division multiplexer structures enable a fourfold increase in data capacity. We show less than −10 dBm of optical receiver power for error-free data transmission in 34 out of a total of 40 channels using a PRBS31 pattern.
  10. In this paper, we propose a novel accuracy-reconfigurable stochastic computing (ARSC) framework for dynamic reliability and power management. Unlike existing stochastic computing works, in which the accuracy versus power/energy trade-off is carried out at design time, the new ARSC design can change the accuracy, or bit-width, of the data at run time, so that it can accommodate long-term aging effects by slowing the system clock frequency at the cost of accuracy while maintaining the computing throughput. We validate the ARSC concept on discrete cosine transformation (DCT) and inverse DCT designs for image compression/decompression applications, implemented on the Xilinx Spartan-6 family XC6SLX45 platform. Experimental results show that the new design can easily mitigate long-term aging-induced effects through an accuracy trade-off while maintaining the throughput of the whole computing process using simple frequency scaling. We further show that with one bit of precision loss for the input data, which translates to 3.44 dB of accuracy loss in terms of peak signal-to-noise ratio (PSNR) for images, we can sufficiently compensate for NBTI-induced aging effects over 10 years while maintaining the pre-aging computing throughput of 7.19 frames per second. At the same time, we can save 74% of the power consumption at the cost of 10.67 dB of accuracy loss. The proposed ARSC computing framework also allows much more aggressive frequency scaling, which can lead to order-of-magnitude power savings compared to traditional dynamic voltage and frequency scaling (DVFS) techniques.
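     A minimal software illustration of the accuracy knob (not the FPGA design, and the numbers it prints are not the paper's results): reducing the input bit-width before a DCT/IDCT round trip and measuring the PSNR cost that ARSC trades against clock frequency.

     ```python
     import numpy as np
     from scipy.fftpack import dct, idct

     def dct2(x):
         return dct(dct(x, axis=0, norm="ortho"), axis=1, norm="ortho")

     def idct2(x):
         return idct(idct(x, axis=0, norm="ortho"), axis=1, norm="ortho")

     def round_trip_psnr(img, bits):
         q = np.round(img * (2 ** bits - 1)) / (2 ** bits - 1)   # reduced-precision input
         rec = idct2(dct2(q))                                    # DCT/IDCT round trip
         mse = np.mean((img - rec) ** 2) + 1e-12
         return 10 * np.log10(1.0 / mse)                         # peak value of 1.0

     img = np.random.rand(64, 64)
     for bits in (8, 7, 6):                                      # drop input precision bit by bit
         print(f"{bits}-bit inputs: PSNR {round_trip_psnr(img, bits):.2f} dB")
     ```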