Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Knowledge Distillation (KD) (Hinton et al., 2015) is one of the most effective approaches for deploying large-scale pre-trained language models in low-latency environments by transferring the knowledge contained in the largescale models to smaller student models. Previous KD approaches use the soft labels and intermediate activations generated by the teacher to transfer knowledge to the student model parameters alone. In this paper, we show that having access to non-parametric memory in the form of a knowledge base with the teacher’s soft labels and predictions can further enhance student capacity and improve generalization. To enable the student to retrieve from the knowledge base effectively, we propose a new Retrieval-augmented KD framework with a loss function that aligns the relational knowledge in teacher and student embedding spaces. We show through extensive experiments that our retrieval mechanism can achieve state-of-the-art performance for taskspecific knowledge distillation on the GLUE benchmark (Wang et al., 2018a).more » « less
-
UV absorption is widely used for characterizing proteins structures. The mapping of UV spectra to atomic structure of proteins relies on expensive theoretical simulations, circumventing the heavy computational cost which involves repeated quantum-mechanical simulations of excited-state properties of many fluctuating protein geometries, which has been a long-time challenge. Here we show that a neural network machine-learning technique can predict electronic absorption spectra of N -methylacetamide (NMA), which is a widely used model system for the peptide bond. Using ground-state geometric parameters and charge information as descriptors, we employed a neural network to predict transition energies, ground-state, and transition dipole moments of many molecular-dynamics conformations at different temperatures, in agreement with time-dependent density-functional theory calculations. The neural network simulations are nearly 3,000× faster than comparable quantum calculations. Machine learning should provide a cost-effective tool for simulating optical properties of proteins.more » « less