Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging deep reinforcement learning tasks. This teacher-student framework requires a well-trained teacher model, which is computationally expensive to obtain. Moreover, the performance of the student model can be limited by the teacher model if the teacher is not optimal. In light of collaborative learning, we study the feasibility of involving joint intellectual efforts from the diverse perspectives of student models. In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment and extract knowledge from each other to enhance their learning. The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms, since it is unclear whether knowledge distilled from an imperfect and noisy peer learner would be helpful. To address this challenge, we theoretically justify that distilling knowledge from a peer learner will lead to policy improvement and propose a disadvantageous distillation strategy based on the theoretical results. Experiments on several continuous control tasks show that the proposed framework achieves superior performance with a learning-based agent and function approximation without the use of expensive teacher models.
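The abstract does not spell out the disadvantageous distillation strategy. As a rough illustration only, the sketch below assumes one plausible reading: a learner imitates its peer only on states where the peer's value estimate exceeds its own. The function names, the scalar actions, and the squared-error imitation loss are all hypothetical simplifications, not the authors' implementation.

```python
def disadvantage_weights(own_values, peer_values):
    """Return 1.0 for states where the peer's value estimate is higher, else 0.0.

    Assumption: "disadvantageous" means the learner is at a disadvantage
    relative to its peer on that state.
    """
    return [1.0 if vp > vo else 0.0 for vo, vp in zip(own_values, peer_values)]


def distillation_loss(own_actions, peer_actions, weights):
    """Weighted mean squared distance between own and peer actions.

    Only states with non-zero weight contribute; returns 0.0 if none do.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    weighted = sum(
        w * (a - b) ** 2 for a, b, w in zip(own_actions, peer_actions, weights)
    )
    return weighted / total


# Hypothetical value estimates and actions for three sampled states.
weights = disadvantage_weights([1.0, 2.0, 0.5], [1.5, 1.0, 0.5])
loss = distillation_loss([0.0, 1.0, 2.0], [1.0, 5.0, 2.0], weights)
```

In a student-student setup, each learner would compute such a loss against the other and add it to its own reinforcement learning objective; how the two terms are balanced is not specified in the abstract.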
Accelerated dual-averaging primal–dual method for composite convex minimization
- Award ID(s): 1934568
- PAR ID: 10182513
- Date Published:
- Journal Name: Optimization Methods and Software
- Volume: 35
- Issue: 4
- ISSN: 1055-6788
- Page Range / eLocation ID: 741 to 766
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
This research proposes an inkjet-printed, dual-band, dual-sense circularly polarized antenna using CPW feeding on a PET substrate. The antenna, designed and optimized using ANSYS HFSS, operates at 4.01 GHz - 5.05 GHz (22.96%) and 6.23 GHz - 7.58 GHz (19.55%) with a return loss of <−10 dB. In addition, the antenna shows an axial ratio of less than 3 dB at 4.23 GHz - 4.62 GHz (8.81%) and 7.11 GHz - 7.36 GHz (3.45%), with left-hand circular polarization (LHCP) observed in the first band and right-hand circular polarization (RHCP) observed in the second band. The overall dimensions of the antenna are x x , where is the free-space wavelength at the lowest circular polarization frequency. Measurement of the fabricated version shows good agreement with the simulated version. To the best of the authors' knowledge, this proposed design is the first circularly polarized …
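The percentages quoted beside each band appear to be fractional bandwidths, that is, the band width divided by the band's center frequency. The short check below assumes that definition (the abstract does not state it) and uses a hypothetical helper name; under that assumption it reproduces the quoted figures to within rounding.

```python
def fractional_bandwidth(f_low_ghz, f_high_ghz):
    """Fractional bandwidth in percent, relative to the arithmetic center frequency."""
    center = (f_low_ghz + f_high_ghz) / 2
    return 100 * (f_high_ghz - f_low_ghz) / center


# Band edges in GHz, paired with the percentage quoted in the abstract.
bands = [
    (4.01, 5.05, 22.96),  # first impedance band
    (6.23, 7.58, 19.55),  # second impedance band
    (4.23, 4.62, 8.81),   # first axial-ratio band (LHCP)
    (7.11, 7.36, 3.45),   # second axial-ratio band (RHCP)
]

for low, high, quoted in bands:
    print(f"{low}-{high} GHz: computed {fractional_bandwidth(low, high):.2f}%, quoted {quoted}%")
```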
-
Three-dimensional fluorescence microscopy often suffers from anisotropy, where the resolution along the axial direction is lower than that within the lateral imaging plane. We address this issue by presenting Dual-Cycle, a new framework for joint deconvolution and fusion of dual-view fluorescence images. Inspired by the recent Neuroclear method, Dual-Cycle is designed as a cycle-consistent generative network trained in a self-supervised fashion by combining a dual-view generator and a prior-guided degradation model. We validate Dual-Cycle on both synthetic and real data, showing its state-of-the-art performance without any external training data.

