Contrastive learning learns input representations by pulling similar data together and pushing dissimilar data apart, in combination with data augmentation and pretext task construction. It enhances large-model learning because it can exploit large amounts of unlabeled data. It has been successfully applied to large language models, pre-trained image models, and multimodal models. In addition, contrastive learning learns representations by modeling the explainable structure of the latent space, which has broad applications in scientific discovery and interpretable Artificial Intelligence (AI). The primary focus of this thesis is to explore contrastive learning from a data construction perspective in real-world problems, in order to close the gap between the principle of contrastive learning and its application. Challenges such as sampling bias and data quality strongly affect the representations learned by contrastive learning. This thesis analyzes the data construction challenges and limitations in 1) negative sampling for knowledge graph embedding (KGE), 2) high-quality preference data labeling for Large Language Model (LLM) alignment, 3) data augmentation in nonlinear dynamic system modeling, and 4) data properties in the functions of messenger RNA (mRNA) sequences. To address challenge 1), a hardness- and structure-based objective function is proposed that accounts for sampling bias in hard negative sampling. For challenge 2), the similarity of response embeddings is used to evaluate the quality of preference pairs, mitigating the labeling errors humans make when they face an ambiguous response pair. Challenge 3) is addressed by jointly considering the physical system and contrastive learning: a data augmentation strategy that partitions the full sequence is used to learn the transition matrix in the latent linear space. Challenge 4) is common in the biological domain because of the high cost of lab experiments. A pre-trained model benefits supervised learning on limited datasets by learning general features from domain knowledge. A contrastive learning-based teacher-student framework is proposed for mRNA sequence learning by contrasting the unmasked sequence with the hard-masked sequence. With careful data construction or data sampling, contrastive learning becomes more effective at solving real-world tasks. For KGE, the novel contrastive loss function learns the boundary between negative and positive samples to improve link prediction in the knowledge graph. For LLM alignment, at the same labeling cost, selecting dissimilar responses improves on vanilla direct preference optimization (DPO) alignment. Data augmentation with a contrastive loss plays a crucial role in learning a more accurate dynamic system, which is explained by the learned continuous eigenfunctions. With the teacher-student framework and the hard-masked strategy, the pre-trained model achieves state-of-the-art results when fine-tuned on limited downstream task data. Overall, this thesis provides a broad data-driven contrastive learning methodology to enhance representation learning in different domains. The methodology consists of an improved objective function in the face of data bias, better data selection that reduces labeling error, and proper data augmentation for a particular application domain. This methodology improves learning results compared to traditional methods.
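As a point of reference for the principle described above, the following is a minimal sketch of a standard InfoNCE-style contrastive loss for a single anchor. It is not the hardness- and structure-based objective proposed in the thesis, whose form is not given in this abstract. The function names, the use of cosine similarity, and the temperature value are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (illustrative, assumed form).

    The loss is low when the anchor is more similar to the positive
    than to any of the negatives.
    """
    # Index 0 holds the positive logit, the rest are negative logits.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]

    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))

    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - logits[0]


# Hypothetical 2-D embeddings, used only to exercise the function.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
print(aligned < misaligned)
```

In this toy setting the loss is smaller when the positive lies close to the anchor, which is the "pull together, push apart" behavior the abstract refers to. How negatives are sampled is exactly the kind of data construction choice the thesis studies.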
This content will become publicly available on July 16, 2026
REAL: Response Embedding-based Alignment for LLMs
Aligning large language models (LLMs) to human preferences is a crucial step in building helpful and safe AI tools, which usually involves training on supervised datasets. Popular algorithms such as Direct Preference Optimization (DPO) rely on pairs of AI-generated responses ranked according to human annotation. The response pair annotation process can introduce human bias. Building a correct preference dataset is the costly part of the alignment pipeline. To improve annotation efficiency and quality in LLM alignment, we propose REAL: Response Embedding-based Alignment for LLMs, a strategy for constructing a high-quality training dataset that focuses on acquiring the less ambiguous preference pairs for labeling out of a set of response candidates. Our selection process is based on the similarity of response embeddings, computed independently of prompts, which keeps the selection process in an off-policy setting and avoids adaptively measuring the similarity during training. Experimental results on the real-world dataset SHP2 and synthetic HH-RLHF benchmarks indicate that choosing dissimilar response pairs enhances the direct alignment of LLMs while reducing inherited labeling errors. The model aligned with dissimilar response pairs obtained a better margin and win rate on the dialogue task. Our findings suggest that focusing on distinct pairs can reduce label error and improve LLM alignment efficiency, saving up to 65% of annotators’ work. The code of the work can be found at https://github.com/honggen-zhang/REAL-Alignment.
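The selection idea described above can be sketched as follows: given embeddings of several candidate responses to one prompt, pick the pair that is least similar and send only that pair to annotators. This is a minimal sketch under stated assumptions, not the REAL implementation. The abstract does not specify the embedding model, the similarity measure, or the exact selection rule, so cosine similarity, the "minimum similarity" rule, and the name `most_dissimilar_pair` are all hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_dissimilar_pair(embeddings):
    """Return ((i, j), similarity) for the least similar pair of candidates.

    `embeddings` holds one vector per candidate response. The prompt is not
    used, matching the prompt-independent selection described in the abstract.
    """
    best_pair = None
    best_similarity = math.inf
    for i, j in combinations(range(len(embeddings)), 2):
        similarity = cosine_similarity(embeddings[i], embeddings[j])
        if similarity < best_similarity:
            best_pair = (i, j)
            best_similarity = similarity
    return best_pair, best_similarity


# Hypothetical 2-D response embeddings for three candidate responses.
candidates = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]
pair, similarity = most_dissimilar_pair(candidates)
print(pair)
```

Because the embeddings are computed once from the responses alone, the selected pairs can be fixed before training starts, which is what makes an off-policy setting possible in this sketch.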
- Award ID(s): 2244574
- PAR ID: 10615950
- Publisher / Repository: arXiv
- Date Published:
- Subject(s) / Keyword(s): LLM, directed preference optimization, contrastive learning
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Training emotion recognition models has relied heavily on human-annotated data, which presents diversity, quality, and cost challenges. In this paper, we explore the potential of Large Language Models (LLMs), specifically GPT-4, in automating or assisting emotion annotation. We compare GPT-4 with supervised models and/or humans in three aspects: agreement with human annotations, alignment with human perception, and impact on model training. We find that common metrics that use aggregated human annotations as ground truth can underestimate GPT-4's performance, and our human evaluation experiment reveals a consistent preference for GPT-4 annotations over human annotations across multiple datasets and evaluators. Further, we investigate the impact of using GPT-4 as an annotation filtering process to improve model training. Together, our findings highlight the great potential of LLMs in emotion annotation tasks and underscore the need for refined evaluation methodologies.
-
Instruction tuning is a supervised fine-tuning approach that significantly improves the ability of large language models (LLMs) to follow human instructions. For programming tasks, most models are finetuned with costly human-annotated instruction-response pairs or those generated by large, proprietary LLMs, which may not be permitted. We propose SelfCodeAlign, the first fully transparent and permissive pipeline for self-aligning code LLMs without extensive human annotations or distillation. SelfCodeAlign employs the same base model for inference throughout the data generation process. It first extracts diverse coding concepts from high-quality seed snippets to generate new tasks. It then samples multiple responses per task, pairs each with test cases, and validates them in a sandbox environment. Finally, passing examples are selected for instruction tuning. In our primary experiments, we use SelfCodeAlign with CodeQwen1.5-7B to generate a dataset of 74k instruction-response pairs. Finetuning on this dataset leads to a model that achieves a 67.1 pass@1 on HumanEval+, surpassing CodeLlama-70B-Instruct despite being ten times smaller. Across all benchmarks, this finetuned model consistently outperforms the original version trained with OctoPack, the previous state-of-the-art method for instruction tuning without human annotations or distillation. Additionally, we show that SelfCodeAlign is effective across LLMs of various sizes, from 3B to 33B, and that the base models can benefit more from alignment with their own data distribution. We further validate each component’s effectiveness in our pipeline, showing that SelfCodeAlign outperforms both direct distillation from GPT-4o and leading GPT-3.5-based distillation methods, such as OSS-Instruct and Evol-Instruct. SelfCodeAlign has also led to the creation of StarCoder2-Instruct, the first fully transparent, permissively licensed, and self-aligned code LLM that achieves state-of-the-art coding performance. 
Overall, SelfCodeAlign shows for the first time that a strong instruction-tuned code LLM can result from self-alignment rather than distillation.
-
Large Vision-Language Models (LVLMs) have made substantial progress by integrating pre-trained large language models (LLMs) and vision models through instruction tuning. Despite these advancements, LVLMs often exhibit the hallucination phenomenon, where generated text responses appear linguistically plausible but contradict the input image, indicating a misalignment between image and text pairs. This misalignment arises because the model tends to prioritize textual information over visual input, even when both the language model and visual representations are of high quality. Existing methods leverage additional models or human annotations to curate preference data and enhance modality alignment through preference optimization. These approaches are resource-intensive and may not effectively reflect the target LVLM's preferences, making the curated preferences easily distinguishable. Our work addresses these challenges by proposing the Calibrated Self-Rewarding (CSR) approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning. In the reward modeling, we employ a step-wise strategy and incorporate visual constraints into the self-rewarding process to place greater emphasis on visual input. Empirical results demonstrate that CSR significantly enhances performance and reduces hallucinations across twelve benchmarks and tasks, achieving substantial improvements over existing methods by 7.62%. Our empirical results are further supported by rigorous theoretical analysis, under mild assumptions, verifying the effectiveness of introducing visual constraints into the self-rewarding paradigm. Additionally, CSR shows compatibility with different vision-language models and the ability to incrementally improve performance through iterative fine-tuning.
-
Large language models (LLMs) are trained on a deluge of text data with limited quality control. As a result, LLMs can exhibit unintended or even harmful behaviours, such as leaking information, fake news or hate speech. Countermeasures, commonly referred to as preference alignment, include fine-tuning the pretrained LLMs with carefully crafted text examples of desired behaviour. Even then, empirical evidence shows that preference-aligned LLMs can be enticed to harmful behaviour. This so-called jailbreaking of LLMs is typically achieved by adversarially modifying the input prompt to the LLM. Our paper provides theoretical insights into the phenomenon of preference alignment and jailbreaking from a statistical perspective. Under our framework, we first show that pretrained LLMs will mimic harmful behaviour if present in the training corpus. Under that same framework, we then introduce a statistical notion of alignment, and lower-bound the jailbreaking probability, showing that it is unpreventable under reasonable assumptions. Based on our insights, we propose an alteration to the currently prevalent alignment strategy RLHF. Specifically, we introduce a simple modification to the RLHF objective, which we call E-RLHF, that aims to increase the likelihood of safe responses. E-RLHF brings no additional training cost, and is compatible with other methods. Empirically, we demonstrate that E-RLHF outperforms RLHF on all alignment problems put forward by the AdvBench and HarmBench projects without sacrificing model performance as measured by the MT-Bench project.
