NSF PAR Search | NSF Public Access Repository


Natural Language Processing Methods for the Study of Protein–Ligand Interactions

https://doi.org/10.1021/acs.jcim.4c01907

Michels, James; Bandarupalli, Ramya; Ahangar Akbari, Amin; Le, Thai; Xiao, Hong; Li, Jing; Hom, Erik F. Y. (February 2025, Journal of Chemical Information and Modeling)
ALISON: Fast and Effective Stylometric Authorship Obfuscation

https://doi.org/10.1609/aaai.v38i17.29901

Xing, Eric; Venkatraman, Saranya; Le, Thai; Lee, Dongwon (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

Authorship Attribution (AA) and Authorship Obfuscation (AO) are two competing tasks of increasing importance in privacy research. Modern AA leverages an author's consistent writing style to match a text to its author using an AA classifier. AO is the corresponding adversarial task, aiming to modify a text in such a way that its semantics are preserved, yet an AA model cannot correctly infer its authorship. To address privacy concerns raised by state-of-the-art (SOTA) AA methods, new AO methods have been proposed but remain largely impractical to use due to their prohibitively slow training and obfuscation speed, often taking hours. To address this challenge, we propose a practical AO method, ALISON, that (1) dramatically reduces training/obfuscation time, demonstrating more than 10x faster obfuscation than SOTA AO methods, (2) achieves better obfuscation success through attacking three transformer-based AA methods on two benchmark datasets, typically performing 15% better than competing methods, (3) does not require direct signals from a target AA classifier during obfuscation, and (4) utilizes unique stylometric features, allowing sound model interpretation for explainable obfuscation. We also demonstrate that ALISON can effectively prevent four SOTA AA methods from accurately determining the authorship of ChatGPT-generated texts, all while minimally changing the original text semantics. To ensure the reproducibility of our findings, our code and data are available at: https://github.com/EricX003/ALISON.
Full Text Available
A Ship of Theseus: Curious Cases of Paraphrasing in LLM-Generated Texts

https://doi.org/10.18653/v1/2024.acl-long.357

Tripto, Nafis Irtiza; Venkatraman, Saranya; Macko, Dominik; Moro, Robert; Srba, Ivan; Uchendu, Adaku; Le, Thai; Lee, Dongwon (January 2024, Association for Computational Linguistics)

Full Text Available
Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective

https://doi.org/10.1145/3606274.3606276

Uchendu, Adaku; Le, Thai; Lee, Dongwon (July 2023, ACM SIGKDD Explorations Newsletter)

Two interlocking research questions of growing interest and importance in privacy research are Authorship Attribution (AA) and Authorship Obfuscation (AO). Given an artifact, especially a text t in question, an AA solution aims to accurately attribute t to its true author out of many candidate authors while an AO solution aims to modify t to hide its true authorship. Traditionally, the notion of authorship and its accompanying privacy concern is only toward human authors. However, in recent years, due to the explosive advancements in Neural Text Generation (NTG) techniques in NLP, capable of synthesizing human-quality open-ended texts (so-called neural texts), one has to now consider authorships by humans, machines, or their combination. Due to the implications and potential threats of neural texts when used maliciously, it has become critical to understand the limitations of traditional AA/AO solutions and develop novel AA/AO solutions in dealing with neural texts. In this survey, therefore, we make a comprehensive review of recent literature on the attribution and obfuscation of neural text authorship from a Data Mining perspective, and share our view on their limitations and promising research directions.
Full Text Available
Do Language Models Plagiarize?

https://doi.org/10.1145/3543507.3583199

Lee, Jooyoung; Le, Thai; Chen, Jinghui; Lee, Dongwon (April 2023, The ACM Web Conference (WWW))

Full Text Available
UPTON: Preventing Authorship Leakage from Public Text Release via Data Poisoning

https://doi.org/10.18653/v1/2023.findings-emnlp.800

Wang, Ziyao; Le, Thai; Lee, Dongwon (January 2023, Association for Computational Linguistics)

Full Text Available
Socialbots on Fire: Modeling Adversarial Behaviors of Socialbots via Multi-Agent Hierarchical Reinforcement Learning

https://doi.org/10.1145/3485447.3512215

Le, Thai; Tran-Thanh, Long; Lee, Dongwon (April 2022, In Proceedings of the ACM Web Conference 2022)

Full Text Available
CAPS: Comprehensible Abstract Policy Summaries for Explaining Reinforcement Learning Agents

McCalmon, Joe; Le, Thai; Alqahtani, Sarra; Lee, Dongwon (May 2022, In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems)

Full Text Available
Detecting False Claims in Low-Resource Regions: A Case Study of Caribbean Islands

https://doi.org/10.18653/v1/2022.constraint-1.11

Lucas, Jason; Cui, Limeng; Le, Thai; Lee, Dongwon (May 2022, ACL Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation (CONSTRAINT))

Full Text Available
CAPS: Comprehensible Abstract Policy Summaries for Explaining Reinforcement Learning Agents

McCalmon, Joe; Le, Thai; Alqahtani, Sarra; Lee, Dongwon (May 2022, Int'l Conf. on Autonomous Agents and Multiagent Systems (AAMAS))

Full Text Available
