Extract, Denoise and Enforce: Evaluating and Improving Concept Preservation for Text-to-Text Generation
                        
- Award ID(s): 2019897
- PAR ID: 10633902
- Publisher / Repository: Association for Computational Linguistics
- Date Published:
- Page Range / eLocation ID: 5063 to 5074
- Format(s): Medium: X
- Location: Online and Punta Cana, Dominican Republic
- Sponsoring Org: National Science Foundation
More Like this
- Abstract: A search for $\text{Z}\text{Z}$ and $\text{Z}\text{H}$ production in the $\text{b}\bar{\text{b}}\text{b}\bar{\text{b}}$ final state is presented, where H is the standard model (SM) Higgs boson. The search uses an event sample of proton-proton collisions corresponding to an integrated luminosity of 133 $\,\text{fb}^{-1}$ collected at a center-of-mass energy of 13 $\,\text{TeV}$ with the CMS detector at the CERN LHC. The analysis introduces several novel techniques for deriving and validating a multi-dimensional background model based on control samples in data. A multiclass multivariate classifier customized for the $\text{b}\bar{\text{b}}\text{b}\bar{\text{b}}$ final state is developed to derive the background model and extract the signal. The data are found to be consistent, within uncertainties, with the SM predictions. The observed (expected) upper limits at 95% confidence level are found to be 3.8 (3.8) and 5.0 (2.9) times the SM prediction for the $\text{Z}\text{Z}$ and $\text{Z}\text{H}$ production cross sections, respectively.
- Abstract: As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE, a comparison measure for open-ended text generation, which directly compares the learnt distribution from a text generation model to the distribution of human-written text using divergence frontiers. MAUVE scales up to modern text generation models by computing information divergences in a quantized embedding space. Through an extensive empirical study on three open-ended generation tasks, we find that MAUVE identifies known properties of generated text, scales naturally with model size, and correlates with human judgments, with fewer restrictions than existing distributional evaluation metrics.
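The divergence-frontier idea in the MAUVE abstract — quantize embeddings into a discrete space, then compare the two resulting histograms via KL divergences against mixture distributions — can be sketched as follows. This is a minimal illustrative version, not the official `mauve-text` implementation: the k-means quantizer, the number of frontier points, and the function names here are all assumptions for exposition.

```python
import numpy as np

def quantize(embeddings, k, seed=0, n_iter=10):
    # Toy k-means quantizer: maps continuous embeddings to k cluster centers.
    # (Illustrative stand-in for the quantization step described in the abstract.)
    rng = np.random.default_rng(seed)
    centers = embeddings[rng.choice(len(embeddings), k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((embeddings[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = embeddings[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def histogram(embeddings, centers):
    # Discrete distribution of a text sample over the quantized space.
    labels = np.argmin(((embeddings[:, None] - centers) ** 2).sum(-1), axis=1)
    counts = np.bincount(labels, minlength=len(centers)).astype(float)
    return counts / counts.sum()

def kl(p, q):
    # KL(p || q); terms with p == 0 contribute nothing.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def divergence_frontier(p, q, n_points=25):
    # Trace KL divergences of p and q against mixtures r = lam*p + (1-lam)*q.
    # Each point gives one (KL(q||r), KL(p||r)) coordinate on the frontier;
    # MAUVE summarizes such a curve by a single scalar (area under it).
    lams = np.linspace(1e-3, 1 - 1e-3, n_points)
    return [(kl(q, lam * p + (1 - lam) * q),
             kl(p, lam * p + (1 - lam) * q)) for lam in lams]
```

With `p` the model-text histogram and `q` the human-text histogram, identical distributions yield a frontier collapsed at the origin, and divergence between them pushes the frontier outward — the property the scalar summary measures.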
 An official website of the United States government