


This content will become publicly available on December 1, 2026

Title: On the uniqueness of AntiVirus labels: How many labels do we need to fingerprint an AV?
The biggest drawback of AntiVirus (AV) experiments is label heterogeneity: each AV labels the same samples very differently. Whereas AV labeling issues have been well studied from the sample point of view, they have not been widely studied from the AV perspective, i.e., to what extent label diversity allows AV identification. Thus, we ask: (1) How unique among all the AVs are the labels produced by a given AV? (2) Can we fingerprint AVs based on their assigned labels? and (3) How many labels are required to fingerprint an AV? In this work, we answer these questions via experiments with a dataset of 720,000 AV-assigned labels for Windows malware spread over 15 years (2006-2020). We discovered that: (1) AVs can be fingerprinted by their assigned labels with 100% accuracy in many cases; (2) AVs can be fingerprinted with a confidence score of 99% using only 1% of the dataset; (3) AV fingerprinting rates vary over time, as the label changes caused by AV updates have a key effect on AV recognition, causing some AV models to lose their ability to recognize the labels generated by their own AV over time; and (4) Android AVs can be fingerprinted the same way as Windows AVs, but Linux labels are harder to group. We expect our work to shed light on the label heterogeneity problem, incentivize further developments to mitigate it, and provide future work with data to support design decisions.
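As a rough illustration of what fingerprinting an AV from its labels can mean, the sketch below builds a character n-gram profile per AV and attributes a new label to the AV whose profile it overlaps most. This is not the method used in the paper, which the abstract does not describe; the vendor names, the example labels, and the helper names (`char_ngrams`, `build_profile`, `similarity`, `fingerprint`) are all hypothetical and chosen only to show how distinct naming conventions can make labels attributable.

```python
from collections import Counter


def char_ngrams(label, n=3):
    """Return the character n-grams of a lowercased label."""
    text = label.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_profile(labels, n=3):
    """Aggregate n-gram counts over all labels attributed to one AV."""
    profile = Counter()
    for label in labels:
        profile.update(char_ngrams(label, n))
    return profile


def similarity(label, profile, n=3):
    """Fraction of the label's n-grams that also occur in the profile."""
    grams = char_ngrams(label, n)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in profile) / len(grams)


def fingerprint(label, profiles, n=3):
    """Return the AV name whose profile best matches the label."""
    return max(profiles, key=lambda av: similarity(label, profiles[av], n))


# Hypothetical labels for two made-up vendors (NOT taken from the paper's dataset).
training = {
    "VendorA": ["Trojan.Win32.Agent.abc", "Trojan.Win32.Generic.xyz", "Worm.Win32.AutoRun.q"],
    "VendorB": ["W32/Agent.ABC!tr", "W32/Generic.XYZ!tr", "W32/AutoRun.Q!worm"],
}
profiles = {av: build_profile(labels) for av, labels in training.items()}

# Unseen labels that follow each vendor's (assumed) naming convention.
guess_a = fingerprint("Trojan.Win32.Downloader.k", profiles)
guess_b = fingerprint("W32/Downloader.K!tr", profiles)
print(guess_a, guess_b)
```

A profile built from only a few labels already separates the two toy vendors here because their separators and prefixes differ, which is the kind of regularity that research question (3) in the abstract probes at scale.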
Award ID(s):
2327427
PAR ID:
10611606
Author(s) / Creator(s):
Publisher / Repository:
Springer
Date Published:
Journal Name:
Journal of Computer Virology and Hacking Techniques
Edition / Version:
1.0
Volume:
21
Issue:
1
ISSN:
2263-8733
Subject(s) / Keyword(s):
Antivirus
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Label differential privacy is a relaxation of differential privacy for machine learning scenarios where the labels are the only sensitive information that needs to be protected in the training data. For example, imagine a survey from a participant in a university class about their vaccination status. Some attributes of the students are publicly available but their vaccination status is sensitive information and must remain private. Now if we want to train a model that predicts whether a student has received vaccination using only their public information, we can use label-DP. Recent works on label-DP use different ways of adding noise to the labels in order to obtain label-DP models. In this work, we present novel techniques for training models with label-DP guarantees by leveraging unsupervised learning and semi-supervised learning, enabling us to inject less noise while obtaining the same privacy, therefore achieving a better utility-privacy trade-off. We first introduce a framework that starts with an unsupervised classifier f0 and dataset D with noisy label set Y, reduces the noise in Y using f0, and then trains a new model f using the less noisy dataset. Our noise reduction strategy uses the model f0 to remove the noisy labels that are incorrect with high probability. Then we use semi-supervised learning to train a model using the remaining labels. We instantiate this framework with multiple ways of obtaining the noisy labels and also the base classifier. As an alternative way to reduce the noise, we explore the effect of using unsupervised learning: we only add noise to a majority voting step for associating the learned clusters with a cluster label (as opposed to adding noise to individual labels); the reduced sensitivity enables us to add less noise. Our experiments show that these techniques can significantly outperform the prior works on label-DP.
  2. Driverless or fully automated vehicles (AVs) are expected to fundamentally change how individuals and households travel and how vehicles use roadway infrastructure. The first goal of this study is to develop a modeling framework of activity-constrained household travel in a future multi-modal network with private AVs, shared-use AVs, transit, and intermodal AV-transit travel options. The second goal is to analyze the potential impacts of AVs, including intermodal AV-transit travel, on (a) household-level travel behavior, (b) household travel costs, (c) demand for transport modes, including transit, and (d) vehicle kilometers traveled or VKT. To meet the first goal, we propose and formulate the Household Activity Pattern Problem with AV-enabled Intermodal Trips (HAPP-AV-IT) that incorporates AV deadheading and intermodal AV-transit trips. The modeling framework extends prior HAPP-based formulations that model household-level travel decisions as vehicle (and person) routing and scheduling problems, similar to the pickup and delivery problem with time-windows. To meet the second goal, we apply the HAPP-AV-IT to two case studies and conduct many computational experiments. We use synthetic activity location data for synthetic households and a fictitious medium-size network with a road network, transit network, residential locations, activity locations, and parking locations. The computational results illustrate (a) the critical role that household AV ownership plays in terms of household travel decisions, modal demand, and VKT, (b) that with AVs, deadheading accounts for 30–40% of vehicle operating distances, (c) that around 10% of households in the study region benefit from AV-based intermodal trips, and (d) that those 10% of households see 5% reductions in household travel costs and 25% reductions in VKT on average in the most transit friendly scenario. This last finding suggests that intermodal AV-transit trips may exist in a driverless vehicle future, and therefore, transit agencies and transportation planners should consider how to serve this market. We also propose and test a simple heuristic algorithm that quickly solves HAPP-AV-IT problem instances.
  3. A search engine's ability to retrieve desirable datasets is important for data sharing and reuse. Existing dataset search engines typically rely on matching queries to dataset descriptions. However, a user may not have enough prior knowledge to write a query using terms that match with description text. We propose a novel schema label generation model which generates possible schema labels based on dataset table content. We incorporate the generated schema labels into a mixed ranking model which not only considers the relevance between the query and dataset metadata but also the similarity between the query and generated schema labels. To evaluate our method on real-world datasets, we create a new benchmark specifically for the dataset retrieval task. Experiments show that our approach can effectively improve the precision and NDCG scores of the dataset retrieval task compared with baseline methods. We also test on a collection of Wikipedia tables to show that the features generated from schema labels can improve the unsupervised and supervised web table retrieval task as well. 
  4. Multi-sensor fusion has been widely used by autonomous vehicles (AVs) to integrate the perception results from different sensing modalities including LiDAR, camera and radar. Despite the rapid development of multi-sensor fusion systems in autonomous driving, their vulnerability to malicious attacks has not been well studied. Although some prior works have studied attacks against the perception systems of AVs, they only consider a single sensing modality or a camera-LiDAR fusion system, and so cannot attack a sensor fusion system based on LiDAR, camera, and radar. To fill this research gap, in this paper, we present the first study on the vulnerability of multi-sensor fusion systems that employ LiDAR, camera, and radar. Specifically, we propose a novel attack method that can simultaneously attack all three types of sensing modalities using a single type of adversarial object. The adversarial object can be easily fabricated at low cost, and the proposed attack can be easily performed with high stealthiness and flexibility in practice. Extensive experiments based on a real-world AV testbed show that the proposed attack can continuously hide a target vehicle from the perception system of a victim AV using only two small adversarial objects.
  5. Pilot projects have emerged in cities globally as a way to experiment with the utilization of a suite of smart mobility and emerging transportation technologies. Automated vehicles (AVs) have become central tools for such projects as city governments and industry explore the use and impact of this emerging technology. This paper presents a large-scale assessment of AV pilot projects in U.S. cities to understand how pilot projects are being used to examine the risks and benefits of AVs, how cities integrate these potentially transformative technologies into conventional policy and planning, and how and what they are learning about this technology and its future opportunities and risks. Through interviews with planning practitioners and document analysis, we demonstrate that the approaches cities take for AVs differ significantly, and often lack coherent policy goals. Key findings from this research include: (1) a disconnect between the goals of the pilot projects and a city’s transportation goals; (2) cities generally lack a long-term vision for how AVs fit into future mobility systems and how they might help address transportation goals; (3) an overemphasis on non-transportation benefits of AV pilot projects; (4) AV pilot projects exhibit a lack of policy learning and iteration; and (5) cities are not leveraging pilot projects for public benefits. Overall, urban and transportation planners and decision makers show a clear interest in discovering how AVs can be used to address transportation challenges in their communities, but our research shows that although AV pilot projects purport to do this and have numerous outcomes, they have limited value for informing transportation policy and planning questions around AVs. We also find that AV pilot projects, as presently structured, may constrain planners’ ability to re-think transportation systems within the context of rapid technological change.