-
The new wave of ‘foundation models’—general-purpose generative AI models for producing text (e.g., ChatGPT) or images (e.g., MidJourney)—represents a dramatic advance in the state of the art for AI. But their use also introduces a range of new risks, which has prompted an ongoing conversation about possible regulatory mechanisms. Here we propose a specific principle that should be incorporated into legislation: that any organization developing a foundation model intended for public use must demonstrate a reliable detection mechanism for the content it generates, as a condition of its public release. The detection mechanism should be made publicly available in a tool that allows users to query, for an arbitrary item of content, whether the item was generated (wholly or partly) by the model. In this paper, we argue that this requirement is technically feasible and would play an important role in reducing certain risks from new AI models in many domains. We also outline a number of options for the tool’s design, and summarize a number of points where further input from policymakers and researchers would be required.
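To make the proposed tool concrete, below is a minimal sketch of the query interface such a detection mechanism could expose, assuming a retrieval-style design in which the provider fingerprints every output at generation time. The `DetectionRegistry` class, the normalization step, and the exact-match lookup are illustrative assumptions rather than the paper's specification.

```python
import hashlib
import re


def _fingerprint(content: str) -> str:
    """Normalize whitespace and case, then hash, so trivial edits
    do not defeat an exact-match lookup."""
    normalized = re.sub(r"\s+", " ", content).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DetectionRegistry:
    """Hypothetical provider-side log of fingerprints of model outputs."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def record(self, generated_content: str) -> None:
        # Called once per model output, at generation time.
        self._seen.add(_fingerprint(generated_content))

    def query(self, content: str) -> bool:
        # Public tool: was this item (after normalization) produced by the model?
        return _fingerprint(content) in self._seen


registry = DetectionRegistry()
registry.record("The quick brown fox jumps over the lazy dog.")
print(registry.query("the quick  brown fox jumps over the lazy dog."))  # True
print(registry.query("An entirely different sentence."))                # False
```

Exact-match retrieval is deliberately simplistic here; it breaks under paraphrase, which is exactly the kind of design question (watermarking, robust matching, partially generated content) the paper flags as needing input from policymakers and researchers.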
-
Sequential learning models situations in which agents predict a ground truth in sequence, using their private, noisy measurements and the predictions of agents who came earlier in the sequence. We study sequential learning in a social network, where agents only see the actions of the previous agents in their own neighborhood. The fraction of agents who predict the ground truth correctly depends heavily on both the network topology and the ordering in which the predictions are made. A natural question is to find an ordering, for a given network, that maximizes the (expected) number of agents who predict the ground truth correctly. In this paper, we show that it is in fact NP-hard to answer this question for a general network, under both the Bayesian learning model and a simple majority rule model. Finally, we show that even approximating the answer is hard.
Free, publicly-accessible full text available May 19, 2026
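As a toy illustration of the simple majority rule model (hypothetical code, not from the paper), the Monte Carlo simulation below estimates the expected fraction of correct agents for a fixed ordering, assuming each private signal independently matches the ground truth with probability p and that an agent's own signal breaks ties.

```python
import random


def simulate_majority_rule(neighbors, ordering, p=0.7, trials=10_000):
    """Estimate the expected fraction of agents predicting the ground
    truth correctly, for a fixed prediction ordering.

    neighbors[i]: agents whose predictions agent i can observe (if they
    predicted earlier); each private signal equals the truth w.p. p."""
    n = len(ordering)
    total = 0.0
    for _ in range(trials):
        truth = random.choice([0, 1])
        prediction = {}
        for i in ordering:
            signal = truth if random.random() < p else 1 - truth
            # Vote over observed neighbor predictions plus the own signal;
            # the own signal also breaks exact ties.
            votes = [prediction[j] for j in neighbors[i] if j in prediction] + [signal]
            ones, zeros = sum(votes), len(votes) - sum(votes)
            prediction[i] = 1 if ones > zeros else (0 if zeros > ones else signal)
        total += sum(v == truth for v in prediction.values()) / n
    return total / trials


# A path 0-1-2-3; each agent sees graph neighbors who predicted earlier.
nbrs = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(simulate_majority_rule(nbrs, [0, 1, 2, 3]))
print(simulate_majority_rule(nbrs, [1, 2, 0, 3]))
```

Comparing the two print calls for different orderings shows why the ordering matters; the paper's result is that finding the best ordering, or even approximating it, is intractable in general.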
-
Static binary analysis is critical to various security tasks such as vulnerability discovery and malware detection. In recent years, binary analysis has faced new challenges as vendors of the Internet of Things (IoT) and Industrial Control Systems (ICS) continue to introduce customized or non-standard binary formats that existing tools cannot readily process. Reverse-engineering each of the new formats is costly as it requires extensive expertise and analysts’ time. In this paper, we investigate the first step to automate the analysis of non-standard binaries, which is to recognize the bytes representing “code” from “data” (i.e., data-code separation). We propose Loadstar, and its key idea is to use the abundant labeled data from standard binaries to train a classifier and adapt it for processing unlabeled non-standard binaries. We use a pseudo-label-based method for domain adaptation and leverage knowledge-inspired rules for pseudo-label correction, which serve as the guardrail for the adaptation process. A key advantage of the system is that it does not require labeling any non-standard binaries. Using three datasets of non-standard PLC binaries, we evaluate Loadstar and show it outperforms existing tools in terms of both accuracy and processing speed. We will share the tool (open source) with the community.
Free, publicly-accessible full text available May 12, 2026
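The following is a generic self-training sketch in the spirit of the described adaptation loop, not Loadstar itself: a classifier trained on labeled standard binaries pseudo-labels the non-standard ones, a rule-based hook corrects implausible pseudo-labels, and the model retrains on both. The features, the confidence threshold, and the placeholder rule are all invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-ins: feature vectors for byte windows; label 1 = code, 0 = data.
X_std, y_std = rng.normal(size=(500, 16)), rng.integers(0, 2, 500)  # labeled, standard binaries
X_new = rng.normal(size=(300, 16))                                   # unlabeled, non-standard binaries


def rule_correction(X, pseudo):
    """Hypothetical guardrail: domain rules veto implausible pseudo-labels,
    e.g., a window flagged 'code' that cannot decode as valid instructions.
    Here a placeholder rule on the first feature stands in for such checks."""
    corrected = pseudo.copy()
    corrected[(X[:, 0] > 2.5) & (pseudo == 1)] = 0
    return corrected


clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_std, y_std)
for _ in range(3):  # a few self-training rounds
    proba = clf.predict_proba(X_new)
    confident = proba.max(axis=1) > 0.8  # keep only confident pseudo-labels
    pseudo = rule_correction(X_new[confident], proba[confident].argmax(axis=1))
    clf.fit(np.vstack([X_std, X_new[confident]]),
            np.concatenate([y_std, pseudo]))
```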
-
This work proposes a class of differentially private mechanisms for linear queries, in particular range queries, that leverages correlated input perturbation to simultaneously achieve unbiasedness, consistency, statistical transparency, and control over utility requirements in terms of accuracy targets expressed either in certain query margins or as implied by the hierarchical database structure. The proposed Cascade Sampling algorithm instantiates the mechanism exactly and efficiently. Our theoretical and empirical analysis demonstrates that we achieve near-optimal utility, effectively compete with other methods, and retain all the favorable statistical properties discussed earlier.
Free, publicly-accessible full text available May 3, 2026
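The code below is not the paper's Cascade Sampling algorithm; it is a generic sketch of how correlated noise on a binary tree of counts can yield consistency by construction, with each node's noise split between its children so that children always sum to their parent. Calibrating the noise scale to a differential privacy budget is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)


def noisy_consistent_tree(counts, sigma=1.0):
    """Perturb a complete-binary-tree histogram so every parent's noisy
    count equals the sum of its children's noisy counts.

    counts: leaf counts, length a power of two. Top-down: draw the root's
    noise once, then split each node's noise between its children by
    drawing child noise and recentering it to sum to the parent's noise.
    (A real DP mechanism must also calibrate sigma to the privacy budget.)"""
    n = len(counts)
    noise = {(0, 0): rng.normal(0.0, sigma)}  # (level, index) -> node noise
    level, width = 0, n
    while width > 1:
        for i in range(n // width):
            parent = noise[(level, i)]
            left, right = rng.normal(0.0, sigma), rng.normal(0.0, sigma)
            shift = (parent - left - right) / 2.0  # recenter to sum to parent
            noise[(level + 1, 2 * i)] = left + shift
            noise[(level + 1, 2 * i + 1)] = right + shift
        level += 1
        width //= 2
    return np.array([counts[i] + noise[(level, i)] for i in range(n)])


leaves = noisy_consistent_tree([5, 3, 8, 2])
print(leaves, leaves.sum())
```

Because every internal node equals the sum of its noisy children, any range query answered from the tree is internally consistent, which is one of the statistical properties the abstract targets.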
-
Large language models (LLMs) have achieved remarkable success in natural language processing (NLP), demonstrating significant capabilities in processing and understanding text data. However, recent studies have identified limitations in LLMs’ ability to manipulate, program, and reason about structured data, especially graphs. We introduce GraphEval36K, the first comprehensive graph dataset, comprising 40 graph coding problems and 36,900 test cases to evaluate the ability of LLMs on graph problem solving. Our dataset is categorized into eight primary and four sub-categories to ensure a thorough evaluation across different types of graphs. We benchmark ten LLMs, finding that private models outperform open-source ones, though the gap is narrowing. We also analyze the performance of LLMs across directed vs. undirected graphs, different kinds of graph concepts, and network models. Furthermore, to improve the usability of our evaluation framework, we propose Structured Symbolic Decomposition (SSD), an instruction-based method designed to enhance LLM performance on complex graph tasks. Results show that SSD improves the average passing rate of GPT-4, GPT-4o, Gemini-Pro and Claude-3-Sonnet by 8.38%, 6.78%, 29.28% and 25.28%, respectively.
Free, publicly-accessible full text available April 29, 2026
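Test-case-driven evaluation of this kind reduces to executing model-generated code against expected outputs. Below is a minimal, hypothetical harness in that spirit; the candidate solution and tests are invented, and a real harness would sandbox the `exec` call rather than trust the generated code.

```python
def run_candidate(candidate_src: str, func_name: str, test_cases) -> float:
    """Exec LLM-generated source in a scratch namespace; report pass rate."""
    namespace: dict = {}
    exec(candidate_src, namespace)  # real harnesses isolate this step
    func = namespace[func_name]
    passed = sum(func(*args) == expected for args, expected in test_cases)
    return passed / len(test_cases)


# A hypothetical model completion for an undirected-connectivity problem.
candidate = """
from collections import deque

def is_connected(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n
"""

tests = [((4, [(0, 1), (1, 2), (2, 3)]), True),
         ((4, [(0, 1), (2, 3)]), False)]
print(run_candidate(candidate, "is_connected", tests))  # 1.0
```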
-
In the context of the rising interest in code language models (code LMs) and vulnerability detection, we study the effectiveness of code LMs for detecting vulnerabilities. Our analysis reveals significant shortcomings in existing vulnerability datasets, including poor data quality, low label accuracy, and high duplication rates, leading to unreliable model performance in realistic vulnerability detection scenarios. Additionally, the evaluation methods used with these datasets are not representative of real-world vulnerability detection. To address these challenges, we introduce PRIMEVUL, a new dataset for training and evaluating code LMs for vulnerability detection. PRIMEVUL incorporates a novel set of data labeling techniques that achieve comparable label accuracy to human-verified benchmarks while significantly expanding the dataset. It also implements a rigorous data de-duplication and chronological data splitting strategy to mitigate data leakage issues, alongside introducing more realistic evaluation metrics and settings. This comprehensive approach aims to provide a more accurate assessment of code LMs’ performance in real-world conditions. Evaluating code LMs on PRIMEVUL reveals that existing benchmarks significantly overestimate the performance of these models. For instance, a state-of-the-art 7B model scored 68.26% F1 on BigVul but only 3.09% F1 on PRIMEVUL. Attempts to improve performance through advanced training techniques and larger models like GPT-3.5 and GPT-4 were unsuccessful, with results akin to random guessing in the most stringent settings. These findings underscore the considerable gap between current capabilities and the practical requirements for deploying code LMs in security roles, highlighting the need for more innovative research in this domain.
Free, publicly-accessible full text available April 27, 2026
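The two hygiene steps the abstract highlights, de-duplication and chronological splitting, might look roughly like the sketch below; the record fields and the comment-stripping normalization are assumptions for illustration, not PRIMEVUL's actual pipeline.

```python
import hashlib
import re


def normalize(code: str) -> str:
    # Strip comments and collapse whitespace so near-duplicates hash identically.
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.S)  # block comments
    code = re.sub(r"//[^\n]*", "", code)               # line comments
    return re.sub(r"\s+", " ", code).strip()


def dedup_and_split(samples, train=0.8, dev=0.1):
    """samples: dicts with 'code', 'commit_date' (ISO string), 'label'.
    Deduplicate on normalized code, then split chronologically so no
    future commit leaks into training."""
    seen, unique = set(), []
    for s in sorted(samples, key=lambda s: s["commit_date"]):
        h = hashlib.sha256(normalize(s["code"]).encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(s)
    a, b = int(len(unique) * train), int(len(unique) * (train + dev))
    return unique[:a], unique[a:b], unique[b:]


train_set, dev_set, test_set = dedup_and_split([
    {"code": "int f() { return 1; } // fix", "commit_date": "2020-01-05", "label": 1},
    {"code": "int  f() { return 1; }",       "commit_date": "2021-07-19", "label": 1},
    {"code": "void g(char *p) { gets(p); }", "commit_date": "2022-03-02", "label": 1},
])
```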
-
As AI-assisted decision making becomes increasingly prevalent, individuals often fail to utilize AI-based decision aids appropriately, especially when AI explanations are absent, potentially because they do not reflect critically on the AI’s decision recommendations. Large language models (LLMs), with their exceptional conversational and analytical capabilities, present great opportunities to enhance AI-assisted decision making in the absence of AI explanations by providing natural-language analysis of the AI’s decision recommendation, e.g., how each feature of a decision-making task might contribute to the AI recommendation. In this paper, via a randomized experiment, we first show that presenting LLM-powered analysis of each task feature, either sequentially or concurrently, does not significantly improve people’s AI-assisted decision performance. To enable decision makers to better leverage LLM-powered analysis, we then propose an algorithmic framework to characterize the effects of LLM-powered analysis on human decisions and dynamically decide which analysis to present. Our evaluation with human subjects shows that this approach effectively improves decision makers’ appropriate reliance on AI in AI-assisted decision making.
Free, publicly-accessible full text available April 26, 2026
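The abstract does not spell out the framework, so the sketch below shows one generic way to "dynamically decide which analysis to present": an epsilon-greedy selector that tracks a running estimate of each analysis type's effect on decision correctness. The class, the analysis names, and the bandit-style update are all assumptions for illustration.

```python
import random
from collections import defaultdict


class AnalysisSelector:
    """Keep a per-analysis running estimate of its effect on decision
    correctness; present the current best, with epsilon-greedy exploration."""

    def __init__(self, analysis_types, epsilon=0.1):
        self.types = list(analysis_types)
        self.epsilon = epsilon
        self.mean = defaultdict(float)  # estimated benefit per analysis type
        self.count = defaultdict(int)

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(self.types)
        return max(self.types, key=lambda t: self.mean[t])

    def update(self, analysis, decision_correct: bool):
        # Incremental mean update from the observed human decision outcome.
        self.count[analysis] += 1
        self.mean[analysis] += (float(decision_correct)
                                - self.mean[analysis]) / self.count[analysis]


selector = AnalysisSelector(["feature_contribution", "counterfactual", "uncertainty"])
shown = selector.choose()
selector.update(shown, decision_correct=True)
```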
-
The problem of maximizing the adoption of a product through viral marketing in social networks has been studied heavily through postulated network models. We present a novel data-driven formulation of the problem. We use Graph Neural Networks (GNNs) to model the adoption of products by utilizing both topological and attribute information. The resulting Dynamic Viral Marketing (DVM) problem seeks to find the minimum budget and minimal set of dynamic topological and attribute changes in order to attain a specified adoption goal. We show that DVM is NP-Hard and is related to the existing influence maximization problem. Motivated by this connection, we develop the idea of Dynamic Gradient Influencing (DGI) that uses gradient ranking to find optimal perturbations and targets low-budget and high-influence non-adopters in discrete steps. We use an efficient strategy for computing node budgets and develop the “Meta-Influence” heuristic for assessing a node’s downstream influence. We evaluate DGI against multiple baselines and demonstrate average gains of 24% in budget and 37% in AUC on real-world attributed networks. Our code is publicly available at https://github.com/saurabhsharma1993/dynamic_viral_marketing.
Free, publicly-accessible full text available April 22, 2026
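As a hedged illustration of gradient ranking (not DGI itself), the PyTorch snippet below differentiates a surrogate adoption objective with respect to the adjacency matrix and ranks candidate edge insertions by gradient magnitude; the two-hop linear surrogate and all tensors are synthetic stand-ins for a trained GNN.

```python
import torch

torch.manual_seed(0)
n, d = 8, 4
A = (torch.rand(n, n) < 0.25).float()
A = ((A + A.T) > 0).float().fill_diagonal_(0)  # symmetric, no self-loops
X = torch.randn(n, d)                          # node attributes
w = torch.randn(d, 1)                          # fixed surrogate weights

A_var = A.clone().requires_grad_(True)
# Two-hop linear propagation as a stand-in for a trained GNN surrogate.
scores = torch.sigmoid(A_var @ (A_var @ X @ w))
adopters = scores.detach().squeeze() > 0.5
objective = scores.squeeze()[~adopters].sum()  # push non-adopters toward adoption
objective.backward()

grad = A_var.grad.clone()
grad[A > 0] = float("-inf")                    # rank only edges we could add
grad.fill_diagonal_(float("-inf"))             # exclude self-loops
top = torch.topk(grad.flatten(), 3).indices
print([(int(i // n), int(i % n)) for i in top])  # most promising insertions
```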
-
Language model approaches have recently been integrated into binary analysis tasks, such as function similarity detection and function signature recovery. These models typically employ a two-stage training process: pre-training via Masked Language Modeling (MLM) on machine code and fine-tuning for specific tasks. While MLM helps to understand binary code structures, it ignores essential code characteristics, including control and data flow, which negatively affect model generalization. Recent work leverages domain-specific features (e.g., control flow graphs and dynamic execution traces) in transformer-based approaches to improve binary code semantic understanding. However, this approach involves complex feature engineering, a cumbersome and time-consuming process that can introduce predictive uncertainty when dealing with stripped or obfuscated code, leading to a performance drop. In this paper, we introduce PROTST, a novel transformer-based methodology for binary code embedding. PROTST employs a hierarchical training process based on a unique tree-like structure, where knowledge progressively flows from fundamental tasks at the root to more specialized tasks at the leaves. This progressive teacher-student paradigm allows the model to build upon previously learned knowledge, resulting in high-quality embeddings that can be effectively leveraged for diverse downstream binary analysis tasks. The effectiveness of PROTST is evaluated in seven binary analysis tasks, and the results show that PROTST yields an average validation score (F1, MRR, and Recall@1) improvement of 14.8% compared to traditional two-stage training and an average improvement of 10.7% compared to multimodal two-stage frameworks.
Free, publicly-accessible full text available March 4, 2026
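The progressive root-to-leaf schedule can be pictured with the skeleton below, an assumption-laden sketch rather than PROTST's implementation: each child task warm-starts from a copy of its parent's trained encoder, so knowledge accumulates along the tree. The task names and the placeholder `train_task` are invented.

```python
import copy
import torch.nn as nn

# Hypothetical task tree: fundamental tasks at the root, specialized at leaves.
TASK_TREE = {"token_recovery": ["control_flow", "data_flow"],
             "control_flow": ["function_similarity"],
             "data_flow": ["signature_recovery"],
             "function_similarity": [], "signature_recovery": []}


def train_task(encoder: nn.Module, task: str) -> nn.Module:
    # Placeholder for task-specific fine-tuning of the shared encoder.
    print(f"fine-tuning encoder on: {task}")
    return encoder


def progressive_train(root: str, base: nn.Module) -> dict:
    """BFS over the task tree; each child starts from a copy of its
    parent's trained encoder, so knowledge flows root -> leaves."""
    trained = {root: train_task(copy.deepcopy(base), root)}
    frontier = [root]
    while frontier:
        parent = frontier.pop(0)
        for child in TASK_TREE[parent]:
            trained[child] = train_task(copy.deepcopy(trained[parent]), child)
            frontier.append(child)
    return trained


encoders = progressive_train("token_recovery",
                             nn.TransformerEncoderLayer(d_model=64, nhead=4))
```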
-
Continual learning (CL) learns a sequence of tasks incrementally. This paper studies the challenging CL setting of class-incremental learning (CIL). CIL has two key challenges: catastrophic forgetting (CF) and inter-task class separation (ICS). Despite numerous proposed methods, these issues remain persistent obstacles. This paper proposes a novel CIL method, called Kernel Linear Discriminant Analysis (KLDA), that can effectively avoid CF and ICS problems. It leverages only the powerful features learned in a foundation model (FM). However, directly using these features proves suboptimal. To address this, KLDA incorporates the Radial Basis Function (RBF) kernel and its Random Fourier Features (RFF) to enhance the feature representations from the FM, leading to improved performance. When a new task arrives, KLDA computes only the mean for each class in the task and updates a shared covariance matrix for all learned classes based on the kernelized features. Classification is performed using Linear Discriminant Analysis. Our empirical evaluation using text and image classification datasets demonstrates that KLDA significantly outperforms baselines. Remarkably, without relying on replay data, KLDA achieves accuracy comparable to joint training of all classes, which is considered the upper bound for CIL performance. The KLDA code is available at https://github.com/salehmomeni/klda.
Free, publicly-accessible full text available February 25, 2026
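The abstract describes the pipeline concretely enough to sketch: map foundation-model features through Random Fourier Features approximating an RBF kernel, keep one mean per class plus a single shared covariance, and classify with the LDA discriminant. The NumPy sketch below is a minimal reading of that description, not the released KLDA code; hyperparameters are arbitrary.

```python
import numpy as np


class KLDASketch:
    """Incremental class means + shared covariance over Random Fourier
    Features approximating an RBF kernel exp(-gamma * ||x - y||^2)."""

    def __init__(self, feat_dim, rff_dim=256, gamma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, np.sqrt(2 * gamma), size=(feat_dim, rff_dim))
        self.b = rng.uniform(0.0, 2 * np.pi, size=rff_dim)
        self.scale = np.sqrt(2.0 / rff_dim)
        self.means = {}                       # class -> mean kernelized feature
        self.cov = np.zeros((rff_dim, rff_dim))
        self.count = 0

    def _phi(self, F):
        return self.scale * np.cos(F @ self.W + self.b)

    def add_class(self, features, label):
        """On a new class: store its mean and fold its samples into the
        single covariance matrix shared by all learned classes."""
        Z = self._phi(features)
        mu = Z.mean(axis=0)
        self.means[label] = mu
        centered = Z - mu
        self.cov = (self.count * self.cov + centered.T @ centered) \
            / (self.count + len(Z))
        self.count += len(Z)

    def predict(self, features):
        Z = self._phi(features)
        P = np.linalg.inv(self.cov + 1e-4 * np.eye(self.cov.shape[0]))
        labels = list(self.means)
        # Standard LDA discriminant: mu^T P z - 0.5 * mu^T P mu per class.
        scores = np.stack([Z @ (P @ self.means[c])
                           - 0.5 * self.means[c] @ P @ self.means[c]
                           for c in labels], axis=1)
        return [labels[i] for i in scores.argmax(axis=1)]


rng = np.random.default_rng(1)
model = KLDASketch(feat_dim=32)
model.add_class(rng.normal(0.0, 1.0, (50, 32)), "cats")
model.add_class(rng.normal(3.0, 1.0, (50, 32)), "dogs")
print(model.predict(rng.normal(3.0, 1.0, (5, 32))))  # mostly "dogs"
```

Note how the update touches only the new class's mean and the shared covariance, which is what lets the method sidestep replay data and catastrophic forgetting.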