skip to main content

Search for: All records

Creators/Authors contains: "Lin, Sen"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. By learning a sequence of tasks continually, an agent in continual learning (CL) can improve the learning performance of both a new task and `old' tasks by leveraging the forward knowledge transfer and the backward knowledge transfer, respectively. However, most existing CL methods focus on addressing catastrophic forgetting in neural networks by minimizing the modification of the learnt model for old tasks. This inevitably limits the backward knowledge transfer from the new task to the old tasks, because judicious model updates could possibly improve the learning performance of the old tasks as well. To tackle this problem, we first theoretically analyze the conditions under which updating the learnt model of old tasks could be beneficial for CL and also lead to backward knowledge transfer, based on the gradient projection onto the input subspaces of old tasks. Building on the theoretical analysis, we next develop a ContinUal learning method with Backward knowlEdge tRansfer (CUBER), for a fixed capacity neural network without data replay. In particular, CUBER first characterizes the task correlation to identify the positively correlated old tasks in a layer-wise manner, and then selectively modifies the learnt model of the old tasks when learning the new task. Experimental studies showmore »that CUBER can even achieve positive backward knowledge transfer on several existing CL benchmarks for the first time without data replay, where the related baselines still suffer from catastrophic forgetting (negative backward knowledge transfer). The superior performance of CUBER on the backward knowledge transfer also leads to higher accuracy accordingly.« less
    Free, publicly-accessible full text available November 29, 2023
  2. Free, publicly-accessible full text available December 1, 2023
  3. Ceria (CeO 2 ) has recently been found to catalyze the selective hydrogenation of alkynes, which has stimulated much discussion on the catalytic mechanism on various facets of reducible oxides. In this work, H 2 dissociation and acetylene hydrogenation on bare and Ni doped CeO 2 (110) surfaces are investigated using density functional theory (DFT). Similar to that on the CeO 2 (111) surface, our results suggest that catalysis is facilitated by frustrated Lewis pairs (FLPs) formed by oxygen vacancies (O v s) on the oxide surfaces. On bare CeO 2 (110) with a single O v (CeO 2 (110)–O v ), two surface Ce cations with one non-adjacent O anion are shown to form (Ce 3+ –Ce 4+ )/O quasi-FLPs, while for the Ni doped CeO 2 (110) surface with one (Ni–CeO 2 (110)–O v ) or two (Ni–CeO 2 (110)–2O v ) O v s, one Ce and a non-adjacent O counterions are found to form a mono-Ce/O FLP. DFT calculations indicate that Ce/O FLPs facilitate the H 2 dissociation via a heterolytic mechanism, while the resulting surface O–H and Ce–H species catalyze the subsequent acetylene hydrogenation. With CeO 2 (110)–O v and Ni–CeO 2 (110)–2O v ,more »our DFT calculations suggest that the first hydrogenation step is the rate-determining step with a barrier of 0.43 and 0.40 eV, respectively. For Ni–CeO 2 (110)–O v , the reaction is shown to be controlled by the H 2 dissociation with a barrier of 0.41 eV. These barriers are significantly lower than that (about 0.7 eV) on CeO 2 (111), explaining the experimentally observed higher catalytic efficiency of the (110) facet of ceria. The change of the rate-determining step is attributed to the different electronic properties of Ce in the Ce/O FLPs – the Ce f states closer to the Fermi level not only facilitate the heterolytic dissociation of H 2 but also lead to a higher barrier of acetylene hydrogenation.« less
    Free, publicly-accessible full text available May 11, 2023
  4. Catastrophic forgetting is one of the major challenges in continual learning. To address this issue, some existing methods put restrictive constraints on the optimization space of the new task for minimizing the interference to old tasks. However, this may lead to unsatisfactory performance for the new task, especially when the new task is strongly correlated with old tasks. To tackle this challenge, we propose Trust Region Gradient Projection (TRGP) for continual learning to facilitate the forward knowledge transfer based on an efficient characterization of task correlation. Particularly, we introduce a notion of 'trust region' to select the most related old tasks for the new task in a layer-wise and single-shot manner, using the norm of gradient projection onto the subspace spanned by task inputs. Then, a scaled weight projection is proposed to cleverly reuse the frozen weights of the selected old tasks in the trust region through a layer-wise scaling matrix. By jointly optimizing the scaling matrices and the model, where the model is updated along the directions orthogonal to the subspaces of old tasks, TRGP can effectively prompt knowledge transfer without forgetting. Extensive experiments show that our approach achieves significant improvement over related state-of-the-art methods.
    Free, publicly-accessible full text available April 25, 2023
  5. Existing offline reinforcement learning (RL) methods face a few major challenges, particularly the distributional shift between the learned policy and the behavior policy. Offline Meta-RL is emerging as a promising approach to address these challenges, aiming to learn an informative meta-policy from a collection of tasks. Nevertheless, as shown in our empirical studies, offline Meta-RL could be outperformed by offline single-task RL methods on tasks with good quality of datasets, indicating that a right balance has to be delicately calibrated between "exploring" the out-of-distribution state-actions by following the meta-policy and "exploiting" the offline dataset by staying close to the behavior policy. Motivated by such empirical analysis, we propose model-based offline ta-RL with regularized policy optimization (MerPO), which learns a meta-model for efficient task structure inference and an informative meta-policy for safe exploration of out-of-distribution state-actions. In particular, we devise a new meta-Regularized model-based Actor-Critic (RAC) method for within-task policy optimization, as a key building block of MerPO, using both conservative policy evaluation and regularized policy improvement; and the intrinsic tradeoff therein is achieved via striking the right balance between two regularizers, one based on the behavior policy and the other on the meta-policy. We theoretically show that the learnt policymore »offers guaranteed improvement over both the behavior policy and the meta-policy, thus ensuring the performance improvement on new tasks via offline Meta-RL. Our experiments corroborate the superior performance of MerPO over existing offline Meta-RL methods.« less
    Free, publicly-accessible full text available April 25, 2023
  6. The ensemble method is a promising way to mitigate the overestimation issue in Q-learning, where multiple function approximators are used to estimate the action values. It is known that the estimation bias hinges heavily on the ensemble size (i.e., the number of Q-function approximators used in the target), and that determining the 'right' ensemble size is highly nontrivial, because of the time-varying nature of the function approximation errors during the learning process. To tackle this challenge, we first derive an upper bound and a lower bound on the estimation bias, based on which the ensemble size is adapted to drive the bias to be nearly zero, thereby coping with the impact of the time-varying approximation errors accordingly. Motivated by the theoretic findings, we advocate that the ensemble method can be combined with Model Identification Adaptive Control (MIAC) for effective ensemble size adaptation. Specifically, we devise Adaptive Ensemble Q-learning (AdaEQ), a generalized ensemble method with two key steps: (a) approximation error characterization which serves as the feedback for flexibly controlling the ensemble size, and (b) ensemble size adaptation tailored towards minimizing the estimation bias. Extensive experiments are carried out to show that AdaEQ can improve the learning performance than the existingmore »methods for the MuJoCo benchmark.« less
  7. There has recently been an increasing interest in computationally-efficient learning methods for resource-constrained applications, e.g., pruning, quantization and channel gating. In this work, we advocate a holistic approach to jointly train the backbone network and the channel gating which can speed up subnet selection for a new task at the resource-limited node. In particular, we develop a federated meta-learning algorithm to jointly train good meta-initializations for both the backbone networks and gating modules, by leveraging the model similarity across learning tasks on different nodes. In this way, the learnt meta-gating module effectively captures the important filters of a good meta-backbone network, and a task-specific conditional channel gated network can be quickly adapted from the meta-initializations using data samples of the new task. The convergence of the proposed federated meta-learning algorithm is established under mild conditions. Experimental results corroborate the effectiveness of our method in comparison to related work.