skip to main content


Search for: All records

Creators/Authors contains: "Xiao, Chaowei"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Vigilance refers to an individual’s ability to maintain attention over time. Vigilance decrement is particularly concerning in clinical environments where shift work and long working hours are common. This study identifies significant factors and indicators for predicting and monitoring individuals’ vigilance decrement. We enrolled 11 participants and measured their vigilance levels by recording their reaction times while completing the Psychomotor Vigilance Test. Additionally, we measured participants’ physiological responses and collected their sleep deprivation data, demographic information, and self-reported anxiety levels. Using repeated-measures correlation analysis, we found that decreased vigilance levels, indicated by longer reaction times, were associated with higher electrodermal activity ( p < .01), lower skin temperature ( p < .001), shorter fixation durations ( p < .05), and increased saccade frequency ( p < .05). Moreover, sleep deprivation significantly affected vigilance decrement ( p < .001). Our findings provide the potential to develop a predictive model of vigilance decrements using physiological signals collected from non-intrusive devices, as an alternative to current behavior-based methods.

     
    more » « less
  2. Free, publicly-accessible full text available January 1, 2025
  3. Free, publicly-accessible full text available January 1, 2025
  4. Free, publicly-accessible full text available January 1, 2025
  5. Free, publicly-accessible full text available January 1, 2025
  6. Free, publicly-accessible full text available January 1, 2025
  7. Instruction tuning is an effective technique to align large language models (LLMs) with human intent. In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally changes the model's behavior. For example, an adversary can achieve content injection by injecting training examples that mention target content and eliciting such behavior from downstream models. To achieve this goal, we propose AutoPoison, an automated data poisoning pipeline. It naturally and coherently incorporates versatile attack goals into poisoned data with the help of an oracle LLM. We showcase two example attacks: content injection and over-refusal attacks, each aiming to induce a specific exploitable behavior. We quantify and benchmark the strength and the stealthiness of our data poisoning scheme. Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data while maintaining a high level of stealthiness in the poisoned examples. We hope our work sheds light on how data quality affects the behavior of instruction-tuned models and raises awareness of the importance of data quality for responsible deployments of LLMs. 
    more » « less
  8. Recent advances in large language models (LMs) have facilitated their ability to synthesize programming code. However, they have also raised concerns about intellectual property (IP) rights violations. Despite the significance of this issue, it has been relatively less explored. In this paper, we aim to bridge the gap by presenting CODEIPPROMPT, a platform for automatic evaluation of the extent to which code language models may reproduce licensed programs. It comprises two key components: prompts constructed from a licensed code database to elicit LMs to generate IP-violating code, and a measurement tool to evaluate the extent of IP violation of code LMs. We conducted an extensive evaluation of existing open-source code LMs and commercial products, and revealed the prevalence of IP violations in all these models. We further identified that the root cause is the substantial proportion of training corpus subject to restrictive licenses, resulting from both intentional inclusion and inconsistent license practice in the real world. To address this issue, we also explored potential mitigation strategies, including fine-tuning and dynamic token filtering. Our study provides a testbed for evaluating the IP violation issues of the existing code generation platforms and stresses the need for a better mitigation strategy. 
    more » « less
  9. Generating new molecules with specified chemical and biological properties via generative models has emerged as a promising direction for drug discovery. However, existing methods require extensive training/fine-tuning with a large dataset, often unavailable in real-world generation tasks. In this work, we propose a new retrieval-based framework for controllable molecule generation. We use a small set of exemplar molecules, i.e., those that (partially) satisfy the design criteria, to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria. We design a retrieval mechanism that retrieves and fuses the exemplar molecules with the input molecule, which is trained by a new self-supervised objective that predicts the nearest neighbor of the input molecule. We also propose an iterative refinement process to dynamically update the generated molecules and retrieval database for better generalization. Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning. On various tasks ranging from simple design criteria to a challenging real-world scenario for designing lead compounds that bind to the SARS-CoV-2 main protease, we demonstrate our approach extrapolates well beyond the retrieval database, and achieves better performance and wider applicability than previous methods. 
    more » « less
  10. Free, publicly-accessible full text available July 29, 2025