Title: Contrastive Learning of Sentence Representations
Learning sentence representations that capture rich semantic meaning is crucial for many NLP tasks. Pre-trained language models such as BERT have achieved great success in NLP, but sentence embeddings extracted directly from these models do not perform well without fine-tuning. We propose Contrastive Learning of Sentence Representations (CLSR), a novel approach that applies contrastive learning to learn universal sentence representations on top of pre-trained language models. CLSR utilizes the semantic similarity of two sentences to construct positive instances for contrastive learning. Semantic information captured by the pre-trained models is preserved by extracting sentence embeddings from them with a proper pooling strategy. An encoder followed by a linear projection takes these embeddings as inputs and is trained under a contrastive objective. To evaluate the performance of CLSR, we run experiments on a range of pre-trained language models and their variants on a series of Semantic Contextual Similarity tasks. Results show that CLSR gains significant performance improvements over existing state-of-the-art language models.
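The head described in this abstract lends itself to a short sketch. The snippet below is a minimal, illustrative PyTorch version: pooled sentence embeddings feed an encoder plus linear projection, and positive pairs are trained with an InfoNCE-style contrastive loss. The layer sizes, temperature, and exact loss form are assumptions for illustration, not the paper's reported configuration.

```python
# Minimal sketch, assuming PyTorch: pooled sentence embeddings -> encoder ->
# linear projection, trained with an InfoNCE-style contrastive loss.
# Dimensions and temperature are illustrative, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveHead(nn.Module):
    """Encoder followed by a linear projection, applied to sentence
    embeddings pooled from a pre-trained language model."""
    def __init__(self, dim_in=768, dim_hidden=768, dim_out=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, dim_hidden), nn.ReLU())
        self.project = nn.Linear(dim_hidden, dim_out)

    def forward(self, pooled):                      # pooled: (batch, dim_in)
        return self.project(self.encoder(pooled))

def contrastive_loss(z_a, z_b, temperature=0.05):
    """InfoNCE over a batch of positive pairs (z_a[i], z_b[i]);
    every other sentence in the batch serves as a negative."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature            # (batch, batch)
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, labels)

# Shape-level usage: pooled embeddings for 32 positive sentence pairs.
head = ContrastiveHead()
pooled_a = torch.randn(32, 768)    # pooled embedding of sentence A
pooled_b = torch.randn(32, 768)    # pooled embedding of its positive pair
loss = contrastive_loss(head(pooled_a), head(pooled_b))
loss.backward()
```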
Zhang, Xiao; Dou, Dejing; Wu, Ji (Proceedings of the AAAI Conference on Artificial Intelligence)
External knowledge is often useful for natural language understanding tasks. We introduce a contextual text representation model called Conceptual-Contextual (CC) embeddings, which incorporates structured knowledge into text representations. Unlike entity embedding methods, our approach encodes a knowledge graph into a context model. CC embeddings can be easily reused for a wide range of tasks in a similar fashion to pre-trained language models. Our model effectively encodes the huge UMLS database by leveraging semantic generalizability. Experiments on electronic health records (EHRs) and medical text processing benchmarks showed our model gives a major boost to the performance of supervised medical NLP tasks.
Meng, Yu; Huang, Jiaxin; Zhang, Yu; Han, Jiawei (KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 14-18, 2021)
Recent years have witnessed the enormous success of text representation learning in a wide range of text mining tasks. Earlier word embedding learning approaches represent words as fixed low-dimensional vectors to capture their semantics. The word embeddings so learned are used as the input features of task-specific models. Recently, pre-trained language models (PLMs), which learn universal language representations via pre-training Transformer-based neural models on large-scale text corpora, have revolutionized the natural language processing (NLP) field. Such pre-trained representations encode generic linguistic features that can be transferred to almost any text-related application. PLMs outperform previous task-specific models in many applications as they only need to be fine-tuned on the target corpus instead of being trained from scratch.
In this tutorial, we introduce recent advances in pre-trained text embeddings and language models, as well as their applications to a wide range of text mining tasks. Specifically, we first overview a set of recently developed self-supervised and weakly-supervised text embedding methods and pre-trained language models that serve as the fundamentals for downstream tasks. We then present several new methods based on pre-trained text embeddings and language models for various text mining applications such as topic discovery and text classification. We focus on methods that are weakly-supervised, domain-independent, language-agnostic, effective, and scalable for mining and discovering structured knowledge from large-scale text corpora. Finally, we demonstrate with real-world datasets how pre-trained text representations help mitigate the human annotation burden and facilitate automatic, accurate, and efficient text analyses.
Huang, James; Yao, Wenlin; Song, Kaiqiang; Zhang, Hongming; Chen, Muhao; Yu, Dong (Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing)
Traditional sentence embedding models encode sentences into vector representations to capture useful properties such as the semantic similarity between sentences. However, in addition to similarity, sentence semantics can also be interpreted via compositional operations such as sentence fusion or difference. It is unclear whether the compositional semantics of sentences can be directly reflected as compositional operations in the embedding space. To more effectively bridge the continuous embedding and discrete text spaces, we explore the plausibility of incorporating various compositional properties into the sentence embedding space that allows us to interpret embedding transformations as compositional sentence operations. We propose InterSent, an end-to-end framework for learning interpretable sentence embeddings that supports compositional sentence operations in the embedding space. Our method optimizes operator networks and a bottleneck encoder-decoder model to produce meaningful and interpretable sentence embeddings. Experimental results demonstrate that our method significantly improves the interpretability of sentence embeddings on four textual generation tasks over existing approaches while maintaining strong performance on traditional semantic similarity tasks.
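As a rough illustration of the "compositional operations in the embedding space" idea above, the sketch below shows an operator network that maps two sentence embeddings to a single embedding intended to decode into their fusion. The architecture and dimensions are illustrative assumptions, not the InterSent implementation.

```python
# Minimal sketch, assuming PyTorch, of a fusion "operator network".
# Illustrative only; not the InterSent architecture or training recipe.
import torch
import torch.nn as nn

class FusionOperator(nn.Module):
    """Maps embeddings of two sentences to an embedding that is meant to
    decode into their fused sentence."""
    def __init__(self, dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, emb_a, emb_b):
        return self.net(torch.cat([emb_a, emb_b], dim=-1))

# In the full framework, a bottleneck encoder produces emb_a and emb_b and a
# decoder is trained to generate the fused sentence from the operator output;
# only the operator's input/output shapes are shown here.
op = FusionOperator()
emb_a, emb_b = torch.randn(8, 768), torch.randn(8, 768)
fused = op(emb_a, emb_b)           # (8, 768)
```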
Yu, Wenhao; Zhu, Chenguang; Fang, Yuwei; Yu, Donghan; Wang, Shuohang; Xu, Yichong; Zeng, Michael; Jiang, Meng (Findings of the Association for Computational Linguistics: ACL 2022)
Pre-trained language models (PLMs) aim to learn universal language representations by conducting self-supervised training tasks on large-scale corpora. Since PLMs capture word semantics in different contexts, the quality of word representations highly depends on word frequency, which usually follows a heavy-tailed distribution in the pre-training corpus. Therefore, the embeddings of rare words on the tail are usually poorly optimized. In this work, we focus on enhancing language model pre-training by leveraging definitions of rare words in dictionaries (e.g., Wiktionary). To incorporate a rare word definition as part of the input, we fetch its definition from the dictionary and append it to the end of the input text sequence. In addition to training with the masked language modeling objective, we propose two novel self-supervised pre-training tasks on word- and sentence-level alignment between the input text sequence and rare word definitions to enhance language model representations with dictionary knowledge. We evaluate the proposed Dict-BERT model on the language understanding benchmark GLUE and eight specialized domain benchmark datasets. Extensive experiments demonstrate that Dict-BERT can significantly improve the understanding of rare words and boost model performance on various NLP downstream tasks.
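The input construction described in this abstract (appending dictionary definitions of rare words to the input sequence) can be sketched as follows. The toy dictionary, rare-word test, and segment handling are simplifying assumptions for illustration, not the authors' exact preprocessing.

```python
# Minimal sketch: fetch definitions of rare words from a dictionary and
# append them to the end of the input text sequence before tokenization.
# The toy dictionary and rare-word detection are illustrative assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical dictionary mapping rare words to short definitions.
dictionary = {"nephrolithiasis": "the formation of kidney stones"}

def build_input(text, max_defs=2):
    """Append definitions of any rare words found in `text`."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    rare = [w for w in words if w in dictionary][:max_defs]
    defs = " ".join(f"{w}: {dictionary[w]}" for w in rare)
    # Segment A holds the original text; segment B holds the definitions.
    return tokenizer(text, defs if defs else None,
                     truncation=True, return_tensors="pt")

batch = build_input("The patient was diagnosed with nephrolithiasis.")
# Decodes to the original sentence followed by [SEP] and the appended definition.
print(tokenizer.decode(batch["input_ids"][0]))
```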
Zhao, Jieyu; Mukherjee, Subhabrata; Hosseini, Saghar; Chang, Kai-Wei; Hassan Awadallah, Ahmed (Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics)
Multilingual representations embed words from many languages into a single semantic space such that words with similar meanings are close to each other regardless of the language. These embeddings have been widely used in various settings, such as cross-lingual transfer, where a natural language processing (NLP) model trained on one language is deployed to another language. While the cross-lingual transfer techniques are powerful, they carry gender bias from the source to target languages. In this paper, we study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations from both the intrinsic and extrinsic perspectives. Experimental results show that the magnitude of bias in the multilingual representations changes differently when we align the embeddings to different target spaces and that the alignment direction can also have an influence on the bias in transfer learning. We further provide recommendations for using the multilingual word representations for downstream tasks.
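As a generic illustration of the intrinsic perspective on bias quantification mentioned above, the sketch below measures how much closer occupation words sit to a male anchor word than to a female one in a shared embedding space. This is a common style of intrinsic probe, assumed here for illustration; it is not the metric proposed in the paper, and the word vectors are stand-ins.

```python
# Generic intrinsic bias probe, assuming NumPy and a dict of pre-loaded
# multilingual word vectors; illustrative only, not the paper's metrics.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def gender_direction_bias(vectors, occupations, male="he", female="she"):
    """Average gap in similarity of occupation words to a male vs. a female
    anchor word; values far from 0 indicate a gender skew."""
    gaps = [cosine(vectors[w], vectors[male]) - cosine(vectors[w], vectors[female])
            for w in occupations if w in vectors]
    return sum(gaps) / len(gaps)

# Toy usage with random vectors standing in for multilingual embeddings.
rng = np.random.default_rng(0)
vocab = ["he", "she", "doctor", "nurse", "engineer"]
vectors = {w: rng.normal(size=300) for w in vocab}
print(gender_direction_bias(vectors, ["doctor", "nurse", "engineer"]))
```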
Qiu, Hefei; Ding, Wei; Chen, Ping. "Contrastive Learning of Sentence Representations." Proceedings of the 18th International Conference on Natural Language Processing. Retrieved from https://par.nsf.gov/biblio/10352930.