Search for: All records

Creators/Authors contains: "Klein, D."

  1. Hal Daumé III (Ed.)
    Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.
  2. Free, publicly-accessible full text available January 1, 2023
  3. Free, publicly-accessible full text available August 1, 2022
  4. Abstract: Jet production in lead-lead (PbPb) and proton-proton (pp) collisions at a nucleon-nucleon center-of-mass energy of 5.02 TeV is studied with the CMS detector at the LHC, using PbPb and pp data samples corresponding to integrated luminosities of 404 μb⁻¹ and 27.4 pb⁻¹, respectively. Jets with different areas are reconstructed using the anti-kT algorithm by varying the distance parameter R. The measurements are performed using jets with transverse momenta (pT) greater than 200 GeV and in a pseudorapidity range of |η| < 2. To reveal the medium modification of the jet spectra in PbPb collisions, the properly normalized ratio of spectra from PbPb and pp data is used to extract jet nuclear modification factors as functions of the PbPb collision centrality, pT and, for the first time, as a function of R up to 1.0. For the most central collisions, a strong suppression is observed for high-pT jets reconstructed with all distance parameters, implying that a significant amount of jet energy is scattered to large angles. The dependence of jet suppression on R is expected to be sensitive to both the jet energy loss mechanism and the medium response, and so the data are compared to several modern event generators and analytic calculations. The models considered do not fully reproduce the data.
  5. Abstract: We present the first study of charged-hadron production associated with jets originating from b quarks in proton-proton collisions at a center-of-mass energy of 5.02 TeV. The data sample used in this study was collected with the CMS detector at the CERN LHC and corresponds to an integrated luminosity of 27.4 pb⁻¹. To characterize the jet substructure, the differential jet shapes, defined as the normalized transverse momentum distribution of charged hadrons as a function of angular distance from the jet axis, are measured for b jets. In addition to the jet shapes, the per-jet yields of charged particles associated with b jets are also quantified, again as a function of the angular distance with respect to the jet axis. Extracted jet shape and particle yield distributions for b jets are compared with results for inclusive jets, as well as with the predictions from the PYTHIA and HERWIG++ event generators.