
Title: Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation
Award ID(s):
2019897
PAR ID:
10633902
Author(s) / Creator(s):
Publisher / Repository:
Association for Computational Linguistics
Date Published:
Page Range / eLocation ID:
5063 to 5074
Format(s):
Medium: X
Location:
Online and Punta Cana, Dominican Republic
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: A search for $\mathrm{ZZ}$ and $\mathrm{ZH}$ production in the $\mathrm{b\bar{b}b\bar{b}}$ final state is presented, where H is the standard model (SM) Higgs boson. The search uses an event sample of proton-proton collisions corresponding to an integrated luminosity of $133~\mathrm{fb}^{-1}$ collected at a center-of-mass energy of $13~\mathrm{TeV}$ with the CMS detector at the CERN LHC. The analysis introduces several novel techniques for deriving and validating a multi-dimensional background model based on control samples in data. A multiclass multivariate classifier customized for the $\mathrm{b\bar{b}b\bar{b}}$ final state is developed to derive the background model and extract the signal. The data are found to be consistent, within uncertainties, with the SM predictions. The observed (expected) upper limits at 95% confidence level are found to be 3.8 (3.8) and 5.0 (2.9) times the SM prediction for the $\mathrm{ZZ}$ and $\mathrm{ZH}$ production cross sections, respectively.
  2. As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE, a comparison measure for open-ended text generation, which directly compares the learnt distribution from a text generation model to the distribution of human-written text using divergence frontiers. MAUVE scales up to modern text generation models by computing information divergences in a quantized embedding space. Through an extensive empirical study on three open-ended generation tasks, we find that MAUVE identifies known properties of generated text, scales naturally with model size, and correlates with human judgments, with fewer restrictions than existing distributional evaluation metrics.
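The MAUVE entry above describes comparing the model and human text distributions by computing information divergences in a quantized embedding space and summarizing them with a divergence frontier. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' released implementation (see the official `mauve-text` package): the k-means quantization, the function names, and the scaling constant are illustrative assumptions.

```python
# Hypothetical sketch: quantize pooled embeddings, form two histograms,
# and trace a divergence frontier over mixtures of the two distributions.
import numpy as np
from sklearn.cluster import KMeans

def quantized_histograms(human_emb, model_emb, k=8, seed=0):
    """Cluster the pooled embeddings into k bins and return the two
    empirical distributions over those bins."""
    pooled = np.vstack([human_emb, model_emb])
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(pooled)
    p = np.bincount(labels[: len(human_emb)], minlength=k).astype(float)
    q = np.bincount(labels[len(human_emb):], minlength=k).astype(float)
    return p / p.sum(), q / q.sum()

def kl(a, b, eps=1e-12):
    """KL divergence between two discrete distributions (smoothed)."""
    return float(np.sum(a * np.log((a + eps) / (b + eps))))

def divergence_frontier_score(p, q, num_points=25, scale=5.0):
    """For mixtures r = lam*p + (1-lam)*q, collect softened
    (KL(q||r), KL(p||r)) pairs and summarize the frontier by the
    area under the resulting curve (closer to 1 = more similar)."""
    lams = np.linspace(1e-3, 1 - 1e-3, num_points)
    xs, ys = [], []
    for lam in lams:
        r = lam * p + (1 - lam) * q
        xs.append(np.exp(-scale * kl(q, r)))
        ys.append(np.exp(-scale * kl(p, r)))
    order = np.argsort(xs)
    return float(np.trapz(np.array(ys)[order], np.array(xs)[order]))

# Toy usage with random "embeddings"; real use would pass LM feature vectors.
rng = np.random.default_rng(0)
human = rng.normal(0.0, 1.0, size=(200, 16))
model = rng.normal(0.3, 1.1, size=(200, 16))
p, q = quantized_histograms(human, model)
print("frontier score:", divergence_frontier_score(p, q))
```

The area-under-frontier summary is one plausible scalarization of the frontier; the published metric fixes its own embedding model, quantization size, and scaling constant, so scores from this sketch are not comparable to reported MAUVE values.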