In this paper, we investigate whether symbolic semantic representations, extracted from deep semantic parsers, can help to reason over the states of involved entities in a procedural text. We consider a deep semantic parser (TRIPS) and semantic role labeling as two sources of semantic parsing knowledge. First, we propose PROPOLIS, a symbolic parsing-based procedural reasoning framework. Second, we integrate semantic parsing information into state-of-the-art neural models for procedural reasoning. Our experiments indicate that explicitly incorporating such semantic knowledge improves procedural understanding. This paper presents new metrics for evaluating procedural reasoning tasks that clarify the challenges and identify differences among neural, symbolic, and integrated models.
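To make the symbolic side of such a pipeline concrete, below is a minimal sketch of rule-based entity-state tracking over SRL-style predicate-argument frames; the predicate lists, state labels, and frame format are illustrative assumptions, not the actual PROPOLIS rules.

```python
# Minimal illustrative sketch: track entity states from SRL-style frames.
# Predicate lists and state labels are assumptions, not the PROPOLIS rules.

CREATE_VERBS = {"form", "produce", "create"}
DESTROY_VERBS = {"consume", "destroy", "evaporate"}
MOVE_VERBS = {"move", "flow", "travel"}

def track_states(frames, entities):
    """frames: list of dicts like {"verb": ..., "ARG1": ..., "ARGM-LOC": ...}."""
    states = {entity: "unknown" for entity in entities}
    history = []
    for frame in frames:
        verb, theme = frame.get("verb"), frame.get("ARG1")
        if theme in states:
            if verb in CREATE_VERBS:
                states[theme] = "exists"
            elif verb in DESTROY_VERBS:
                states[theme] = "destroyed"
            elif verb in MOVE_VERBS:
                states[theme] = frame.get("ARGM-LOC", "unknown location")
        history.append(dict(states))  # snapshot after each step
    return history

# Example: "Water evaporates and forms clouds."
frames = [
    {"verb": "evaporate", "ARG1": "water"},
    {"verb": "form", "ARG1": "clouds"},
]
print(track_states(frames, ["water", "clouds"]))
```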
SemTexIB: Semantic Text Communication with Information Bottleneck: Integrating Rate and Semantic Similarity into Training Objectives
Recent major developments in semantic communication systems stem from the integration of deep learning (DL) techniques. Following the discovery of capacity-achieving codes, the primary motivation for adopting the semantic approach, which retrieves meaning without requiring an exact reconstruction, is its potential to further conserve resources such as bandwidth and power. In this paper, we propose a novel semantic communication framework for textual data over additive white Gaussian noise (AWGN) channels via DL. Our framework leverages the information bottleneck (IB) principle to balance minimizing bit transmission under wireless channel rate constraints with maximizing semantic information retention. Unlike previous works, we integrate the bilingual evaluation understudy (BLEU) sentence similarity score into the training objective to enhance model performance. In particular, inspired by knowledge distillation, we utilize large language models (LLMs) during training to transfer their knowledge of text semantics into our model. Using the IB principle, we train a neural semantic encoder at the transmitter and a neural semantic decoder at the receiver that incorporates into its objective function the rate constraint together with the BLEU score and the knowledge encoded in the soft probabilities produced by the LLM. Through extensive experiments, our proposed framework demonstrates a notable improvement of up to 45% in text semantic similarity compared to state-of-the-art benchmarks operating at the same channel capacity, significantly outperforming traditional communication systems. Moreover, it exhibits robustness to variations in signal-to-noise ratio (SNR) and achieves significant gains across both low and medium SNR regimes.
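As a rough illustration of how these ingredients might be combined into one objective, the PyTorch-style sketch below assembles a reconstruction term, an IB rate penalty, an LLM distillation term, and a BLEU-driven reweighting. The exact formulation in the paper is not reproduced here; the weights, the rate estimate, and the differentiable BLEU surrogate are all assumptions.

```python
import torch.nn.functional as F

def semtexib_style_loss(logits, targets, rate_bits, teacher_log_probs,
                        bleu_score, beta=0.1, alpha=0.5, gamma=0.5, temp=2.0):
    """Hypothetical composite objective: token reconstruction, an IB-style
    rate penalty, KL distillation against LLM soft probabilities, and a
    BLEU-driven reweighting of the reconstruction term (a sketch only)."""
    vocab = logits.size(-1)
    # semantic reconstruction of the transmitted sentence (cross-entropy)
    recon = F.cross_entropy(logits.view(-1, vocab), targets.view(-1))
    # IB rate term: penalize the bits the semantic encoder spends per sentence
    rate = rate_bits.mean()
    # knowledge distillation: match the decoder's distribution to the LLM teacher
    kd = F.kl_div(F.log_softmax(logits / temp, dim=-1),
                  teacher_log_probs, reduction="batchmean", log_target=True)
    # BLEU reweighting: low-similarity sentences contribute a larger loss
    # (a simple differentiable surrogate, since BLEU itself is not differentiable)
    bleu_term = (1.0 - bleu_score) * recon
    return recon + beta * rate + alpha * kd + gamma * bleu_term
```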
- Award ID(s): 2003002
- PAR ID: 10654886
- Publisher / Repository: IEEE Globecom
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
A true interpreting agent not only understands sign language and translates to text, but also understands text and translates to signs. Much of the AI work in sign language translation to date has focused mainly on translating from signs to text. Towards the latter goal, we propose a text-to-sign translation model, SignNet, which exploits the notion of similarity (and dissimilarity) of visual signs in translating. The module presented here is only one part of a dual-learning, two-task process involving text-to-sign (T2S) as well as sign-to-text (S2T). We currently implement SignNet as a single-channel architecture so that the output of the T2S task can be fed into S2T in a continuous dual-learning framework. By single channel, we refer to a single modality, the body pose joints. In this work, we present SignNet, a T2S task using a novel metric embedding learning process, to preserve the distances between sign embeddings relative to their dissimilarity. We also describe how to choose positive and negative examples of signs for similarity testing. From our analysis, we observe that the metric embedding learning-based model performs significantly better than the other models with traditional losses, when evaluated using BLEU scores. In the task of gloss to pose, SignNet performed as well as its state-of-the-art (SoTA) counterparts, and it outperformed them in the task of text to pose, showing noteworthy enhancements in BLEU-1 through BLEU-4 scores (BLEU-1: 31 → 39, ≈26% improvement; BLEU-4: 10.43 → 11.84, ≈14% improvement) when tested on the popular RWTH PHOENIX-Weather-2014T benchmark dataset.
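A minimal sketch of the metric-embedding idea described in this abstract, pulling embeddings of similar signs together and pushing dissimilar ones apart by a margin, could look like the following; the pose encoder and the positive/negative sampling strategy are assumptions, not SignNet's specifics.

```python
import torch
import torch.nn.functional as F

def sign_triplet_loss(anchor, positive, negative, margin=0.2):
    """Illustrative metric-embedding objective: keep a similar sign closer to
    the anchor than a dissimilar sign by at least `margin` (a sketch only)."""
    d_pos = F.pairwise_distance(anchor, positive)   # distance to a similar sign
    d_neg = F.pairwise_distance(anchor, negative)   # distance to a dissimilar sign
    return F.relu(d_pos - d_neg + margin).mean()

# Usage with random pose embeddings standing in for encoder outputs.
emb = lambda: torch.randn(8, 256)
print(sign_triplet_loss(emb(), emb(), emb()))
```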
-
Language understanding involves processing text with both the grammatical and common-sense contexts of the text fragments. The text “I went to the grocery store and brought home a car” requires both the grammatical context (syntactic) and common-sense context (semantic) to capture the oddity in the sentence. Contextualized text representations learned by Language Models (LMs) are expected to capture a variety of syntactic and semantic contexts from large amounts of training data corpora. Recent work such as ERNIE has shown that infusing the knowledge contexts, where they are available in LMs, results in significant performance gains on General Language Understanding (GLUE) benchmark tasks. However, to our knowledge, no knowledge-aware model has attempted to infuse knowledge through top-down semantics-driven syntactic processing (e.g., common-sense to grammatical) and directly operated on the attention mechanism that LMs leverage to learn the data context. We propose a learning framework, Top-Down Language Representation (TDLR), to infuse common-sense semantics into LMs. In our implementation, we build on BERT for its rich syntactic knowledge and use the knowledge graphs ConceptNet and WordNet to infuse semantic knowledge.
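One common way to operate directly on the attention mechanism, and a plausible reading of the idea above, is to add a knowledge-derived bias to the attention scores before the softmax. The sketch below illustrates that general pattern only; the bias construction and where it enters BERT are assumptions, not the TDLR design.

```python
import torch.nn.functional as F

def knowledge_biased_attention(q, k, v, kg_bias, scale=None):
    """Sketch of scaled dot-product attention with an additive bias derived
    from ConceptNet/WordNet relatedness between token pairs (assumed form)."""
    scale = scale or q.size(-1) ** 0.5
    scores = q @ k.transpose(-2, -1) / scale   # standard attention scores
    scores = scores + kg_bias                  # semantics-driven per-pair bias
    return F.softmax(scores, dim=-1) @ v
```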
-
Semantic typing aims at classifying tokens or spans of interest in a textual context into semantic categories such as relations, entity types, and event types. The inferred labels of semantic categories meaningfully interpret how machines understand components of text. In this paper, we present UniST, a unified framework for semantic typing that captures label semantics by projecting both inputs and labels into a joint semantic embedding space. To formulate different lexical and relational semantic typing tasks as a unified task, we incorporate task descriptions to be jointly encoded with the input, allowing UniST to be adapted to different tasks without introducing task-specific model components. UniST optimizes a margin ranking loss such that the semantic relatedness of the input and labels is reflected in their embedding similarity. Our experiments demonstrate that UniST achieves strong performance across three semantic typing tasks: entity typing, relation classification, and event typing. Meanwhile, UniST effectively transfers semantic knowledge of labels and substantially improves generalizability on inferring rarely seen and unseen types. In addition, multiple semantic typing tasks can be jointly trained within the unified framework, leading to a single compact multi-tasking model that performs comparably to dedicated single-task models, while offering even better transferability.
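The margin ranking objective described here can be sketched as follows, assuming the input (jointly encoded with its task description) and the candidate labels have already been embedded; the encoders, the negative sampling, and the margin value are assumptions.

```python
import torch.nn.functional as F

def typing_margin_loss(input_emb, pos_label_emb, neg_label_emb, margin=0.1):
    """Sketch of a margin ranking loss: the input should be more similar to
    the embedding of its true type label than to a negative label."""
    sim_pos = F.cosine_similarity(input_emb, pos_label_emb, dim=-1)
    sim_neg = F.cosine_similarity(input_emb, neg_label_emb, dim=-1)
    return F.relu(margin - (sim_pos - sim_neg)).mean()
```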
-
Semantic oppositeness is the natural counterpart of the popular natural language processing concept of semantic similarity. Much like how semantic similarity is a measure of the degree to which two concepts are similar, semantic oppositeness yields the degree to which two concepts would oppose each other. This complementary nature has resulted in most applications and studies incorrectly assuming semantic oppositeness to be the inverse of semantic similarity. In other trivializations, “semantic oppositeness” is used interchangeably with “antonymy”, which is as inaccurate as replacing semantic similarity with simple synonymy. These erroneous assumptions and over-simplifications exist mainly due either to a lack of information or to the computational complexity of calculating semantic oppositeness. The objective of this research is to prove that it is possible to extend the idea of word vector embedding to incorporate semantic oppositeness, so that an effective mapping of semantic oppositeness can be obtained in a given vector space. In the experiments we present in this paper, we show that our proposed method achieves a training accuracy of 97.91% and a test accuracy of 97.82%, proving the applicability of this method even in potentially highly sensitive applications and dispelling doubts of over-fitting. Further, this work also introduces a novel, unanchored vector embedding method and a novel, inductive transfer learning process.
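As a toy illustration of scoring oppositeness on top of pretrained word vectors (not the authors' unanchored embedding or transfer learning method), a small learned scorer might look like this; the architecture, dimensions, and training signal are assumptions.

```python
import torch
import torch.nn as nn

class OppositenessScorer(nn.Module):
    """Toy model mapping a pair of pretrained word vectors to a score in [0, 1],
    where 1 indicates strongly opposing concepts (illustrative assumption)."""
    def __init__(self, dim=300, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, vec_a, vec_b):
        # Concatenate the two word vectors and predict an oppositeness score.
        return self.net(torch.cat([vec_a, vec_b], dim=-1)).squeeze(-1)
```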