NSF PAR Search | NSF Public Access Repository

Cascading Adversarial Bias from Injection to Distillation in Language Models

Chaudhari, Harsh; Hayes, Jamie; Jagielski, Matthew; Shumailov, Ilia; Nasr, Milad; Oprea, Alina (October 2025, ACM Conference on Computer and Communications Security (CCS))

Free, publicly-accessible full text available October 13, 2026
Model-agnostic clean-label backdoor mitigation in cybersecurity environments

Severi, Giorgio; Boboila, Simona; Holodnak, John; Kratkiewicz, Kendra; Izmailov, Rauf; De_Lucia, Michael; Oprea, Alina (October 2025, IEEE Military Communications Conference)

Free, publicly-accessible full text available October 6, 2026
Adversarial Inception Backdoor Attacks against Reinforcement Learning

Rathbun, Ethan; Oprea, Alina; Amato, Christopher (July 2025, 42nd International Conference on Machine Learning (ICML))

Recent works have demonstrated the vulnerability of Deep Reinforcement Learning (DRL) algorithms to training-time backdoor poisoning attacks. The objectives of these attacks are twofold: induce pre-determined, adversarial behavior in the agent upon observing a fixed trigger during deployment while allowing the agent to solve its intended task during training. Prior attacks assume arbitrary control over the agent's rewards, inducing values far outside the environment's natural constraints. This results in brittle attacks that fail once the proper reward constraints are enforced. Thus, in this work we propose a new class of backdoor attacks against DRL which are the first to achieve state-of-the-art performance under strict reward constraints. These "inception" attacks manipulate the agent's training data by inserting the trigger into prior observations and replacing high-return actions with those of the targeted adversarial behavior. We formally define these attacks and prove they achieve both adversarial objectives against arbitrary Markov Decision Processes (MDPs). Using this framework we devise an online inception attack which achieves a 100% attack success rate on multiple environments under constrained rewards while minimally impacting the agent's task performance.
Free, publicly-accessible full text available July 13, 2026
How to Poison an xApp: Dissecting Backdoor Attacks to Deep Reinforcement Learning in Open Radio Access Networks

https://doi.org/10.1016/j.comnet.2025.111727

Lacava, Andrea; Maxenti, Stefano; Bonati, Leonardo; D’Oro, Salvatore; Oprea, Alina; Melodia, Tommaso; Restuccia, Francesco (September 2025, Computer Networks)

Free, publicly-accessible full text available September 1, 2026
SleeperNets: universal backdoor poisoning attacks against reinforcement learning agents

Rathbun, Ethan; Amato, Christopher; Oprea, Alina (December 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024))

Free, publicly-accessible full text available December 10, 2025
Synthesizing Tight Privacy and Accuracy Bounds via Weighted Model Counting

Oakley, Lisa; Holtzen, Steven; Oprea, Alina (July 2024, IEEE 37th Computer Security Foundations Symposium (CSF))

Programmatically generating tight differential privacy (DP) bounds is a hard problem. Two core challenges are (1) finding expressive, compact, and efficient encodings of the distributions of DP algorithms, and (2) state space explosion stemming from the multiple quantifiers and relational properties of the DP definition. We address the first challenge by developing a method for tight privacy and accuracy bound synthesis using weighted model counting on binary decision diagrams, a state-of-the-art technique from the artificial intelligence and automated reasoning communities for exactly computing probability distributions. We address the second challenge by developing a framework for leveraging inherent symmetries in DP algorithms. Our solution benefits from ongoing research in probabilistic programming languages, allowing us to succinctly and expressively represent different DP algorithms with approachable language syntax that can be used by non-experts. We provide a detailed case study of our solution on the binary randomized response algorithm. We also evaluate an implementation of our solution using the Dice probabilistic programming language for the randomized response and truncated geometric above threshold algorithms. We compare to prior work on exact DP verification using Markov chain probabilistic model checking and the decision procedure DiPC. Very few existing works consider mechanized analysis of accuracy guarantees for DP algorithms. We additionally provide a detailed analysis using our technique for finding tight accuracy bounds for DP algorithms.
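As a rough illustration of what a "tight" privacy bound means for the binary randomized response algorithm named in the abstract, the sketch below enumerates the exact output distribution for both neighboring inputs and takes the worst-case probability ratio. This is plain brute-force enumeration in Python, not the paper's weighted model counting approach and not Dice code; the function names and the `p_truth` parameter (the probability of reporting the true bit, assumed to lie strictly between 0 and 1) are hypothetical and chosen only for this example.

```python
import math
from fractions import Fraction


def rr_distribution(true_bit, p_truth):
    """Exact output distribution of binary randomized response.

    The mechanism reports the true bit with probability p_truth and the
    flipped bit otherwise (hypothetical parametrization for this sketch).
    """
    return {true_bit: p_truth, 1 - true_bit: 1 - p_truth}


def tight_epsilon(p_truth):
    """Smallest epsilon such that the mechanism is epsilon-DP.

    Compares the two neighboring inputs (0 and 1) on every output and
    returns the log of the worst-case probability ratio.
    Assumes 0 < p_truth < 1 so that no ratio divides by zero.
    """
    d0 = rr_distribution(0, p_truth)
    d1 = rr_distribution(1, p_truth)
    worst = max(max(d0[o] / d1[o], d1[o] / d0[o]) for o in (0, 1))
    return math.log(worst)


def accuracy(p_truth):
    """Probability that the reported bit equals the true bit."""
    return rr_distribution(1, p_truth)[1]


p = Fraction(3, 4)
print(tight_epsilon(p), accuracy(p))
```

Exact rational arithmetic keeps the probability ratio free of floating-point error until the final logarithm; for mechanisms with larger output spaces this enumeration grows quickly, which is the state space problem the abstract describes.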
Full Text Available
TMI! Finetuned Models Leak Private Information from their Pretraining Data

https://doi.org/10.56553/popets-2024-0075

Abascal, John; Wu, Stanley; Oprea, Alina; Ullman, Jonathan (July 2024, Proceedings on Privacy Enhancing Technologies)

Transfer learning has become an increasingly popular technique in machine learning as a way to leverage a pretrained model trained for one task to assist with building a finetuned model for a related task. This paradigm has been especially popular for privacy in machine learning, where the pretrained model is considered public, and only the data for finetuning is considered sensitive. However, there are reasons to believe that the data used for pretraining is still sensitive, making it essential to understand how much information the finetuned model leaks about the pretraining data. In this work we propose a new membership-inference threat model where the adversary only has access to the finetuned model and would like to infer the membership of the pretraining data. To realize this threat model, we implement a novel metaclassifier-based attack, TMI, that leverages the influence of memorized pretraining samples on predictions in the downstream task. We evaluate TMI on both vision and natural language tasks across multiple transfer learning settings, including finetuning with differential privacy. Through our evaluation, we find that TMI can successfully infer membership of pretraining examples using query access to the finetuned model.
Full Text Available
Dropout Attacks

https://doi.org/10.1109/SP54263.2024.00026

Yuan, Andrew; Oprea, Alina; Tan, Cheng (May 2024, IEEE)

Full Text Available
