

Search for: All records

Award ID contains: 1936968


  1. Web tracking and advertising (WTA) are now performed ubiquitously across the web, continuously compromising users' privacy. Existing defense solutions, such as the widely deployed blocking tools based on filter lists and the alternative machine learning based solutions proposed in prior research, have limitations in terms of accuracy and effectiveness. In this work, we propose WtaGraph, a web tracking and advertising detection framework based on Graph Neural Networks (GNNs). We first construct an attributed homogeneous multi-graph (AHMG) that represents HTTP network traffic, and formulate web tracking and advertising detection as a task of GNN-based edge representation learning and classification in AHMG. We then design four components in WtaGraph so that it can (1) collect HTTP network traffic, DOM, and JavaScript data, (2) construct AHMG and extract corresponding edge and node features, (3) build a GNN model for edge representation learning and WTA detection in the transductive learning setting, and (4) use a pre-trained GNN model for WTA detection in the inductive learning setting. We evaluate WtaGraph on a dataset collected from Alexa Top 10K websites, and show that WtaGraph can effectively detect WTA requests in both transductive and inductive learning settings. Manual verification results indicate that WtaGraph can detect new WTA requests that are missed by filter lists and recognize non-WTA requests that are mistakenly labeled by filter lists. Our ablation analysis, evasion evaluation, and real-time evaluation show that WtaGraph achieves competitive performance with flexible deployment options in practice.
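
    This entry frames WTA detection as edge classification in a request graph. As a rough illustration of that idea only (not WtaGraph's actual architecture: the layer sizes, the SageLayer and EdgeClassifier names, and the toy data below are all invented for this sketch), a minimal PyTorch edge classifier might aggregate neighbor features into node embeddings and score each edge from its two endpoint embeddings plus its own features:

        # Hedged sketch: a generic GNN edge classifier, not WtaGraph itself.
        import torch
        import torch.nn as nn

        class SageLayer(nn.Module):
            """One mean-aggregation message-passing layer (GraphSAGE-style)."""
            def __init__(self, in_dim, out_dim):
                super().__init__()
                self.lin = nn.Linear(2 * in_dim, out_dim)

            def forward(self, x, adj):
                # x: node features (N x d); adj: row-normalized adjacency (N x N)
                neigh = adj @ x                       # mean of neighbor features
                return torch.relu(self.lin(torch.cat([x, neigh], dim=1)))

        class EdgeClassifier(nn.Module):
            def __init__(self, node_dim, edge_dim, hid=64):
                super().__init__()
                self.gnn = SageLayer(node_dim, hid)
                self.mlp = nn.Sequential(nn.Linear(2 * hid + edge_dim, hid),
                                         nn.ReLU(), nn.Linear(hid, 2))

            def forward(self, x, adj, src, dst, edge_feat):
                h = self.gnn(x, adj)
                # Represent each edge (HTTP request) by its two endpoint
                # embeddings plus the request's own features.
                z = torch.cat([h[src], h[dst], edge_feat], dim=1)
                return self.mlp(z)                    # logits: WTA vs. non-WTA

        # Toy usage: 4 domain nodes, 3 request edges, invented dimensions.
        x, adj = torch.randn(4, 16), torch.eye(4)     # eye() as stand-in adjacency
        src, dst = torch.tensor([0, 1, 2]), torch.tensor([3, 3, 0])
        logits = EdgeClassifier(16, 8)(x, adj, src, dst, torch.randn(3, 8))

    In the transductive setting described above, training and inference share one graph; the inductive setting would apply the trained weights to a graph built from new traffic.
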
  2. Crowdsourcing is popular for large-scale data collection and labeling, but a major challenge is detecting low-quality submissions. Recent studies have demonstrated that behavioral features of workers are highly correlated with data quality and can be useful in quality control. However, these studies primarily leveraged coarsely extracted behavioral features, and did not further explore quality control at the fine-grained level, i.e., the annotation unit level. In this paper, we investigate the feasibility and benefits of using fine-grained behavioral features, which are the behavioral features finely extracted from a worker's individual interactions with each single unit in a subtask, for quality control in crowdsourcing. We design and implement a framework named Fine-grained Behavior-based Quality Control (FBQC) that specifically extracts fine-grained behavioral features to provide three quality control mechanisms: (1) quality prediction for objective tasks, (2) suspicious behavior detection for subjective tasks, and (3) unsupervised worker categorization. Using the FBQC framework, we conduct two real-world crowdsourcing experiments and demonstrate that using fine-grained behavioral features is feasible and beneficial in all three quality control mechanisms. Our work provides insights and implications that can help job requesters and crowdsourcing platforms achieve better quality control.
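
    A core idea in this entry is extracting behavioral features per annotation unit rather than per task. Below is a minimal sketch of what such per-unit extraction could look like, assuming a hypothetical event-log schema and an invented four-feature set (dwell time, cursor travel, clicks, keypresses); FBQC's actual features and thresholds are not reproduced here:

        # Hedged sketch: per-unit behavioral features; the event schema and the
        # four features below are invented, not FBQC's actual feature set.
        from dataclasses import dataclass
        from math import hypot

        @dataclass
        class Event:
            t: float          # timestamp (seconds)
            kind: str         # "move", "click", or "key"
            x: float = 0.0
            y: float = 0.0

        def unit_features(events):
            """Features from one worker's interactions with one annotation unit."""
            if not events:
                return {"dwell": 0.0, "mouse_dist": 0.0, "clicks": 0, "keys": 0}
            moves = [e for e in events if e.kind == "move"]
            return {
                "dwell": events[-1].t - events[0].t,   # time spent on the unit
                "mouse_dist": sum(hypot(b.x - a.x, b.y - a.y)
                                  for a, b in zip(moves, moves[1:])),
                "clicks": sum(e.kind == "click" for e in events),
                "keys": sum(e.kind == "key" for e in events),
            }

        # One worker's log keyed by unit id; each unit gets its own vector.
        log = {"unit-1": [Event(0.0, "move", 10, 10), Event(0.4, "move", 80, 40),
                          Event(0.9, "click", 80, 40)]}
        features = {uid: unit_features(evts) for uid, evts in log.items()}
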
  3. Crowdsourcing platforms are powerful tools for academic researchers. Proponents claim that crowdsourcing helps researchers quickly and affordably recruit enough human subjects with diverse backgrounds to generate significant statistical power, while critics raise concerns about unreliable data quality, labor exploitation, and unequal power dynamics between researchers and workers. We examine these concerns along three dimensions: methods, fairness, and politics. We find that researchers offer vastly different compensation rates for crowdsourced tasks, and address potential concerns about data validity by using platform-specific tools and user verification methods. Additionally, workers depend upon crowdsourcing platforms for a significant portion of their income, are motivated more by fear of losing access to work than by specific compensation rates, and are frustrated by a lack of transparency and occasional unfair treatment from job requesters. Finally, we discuss critical computing scholars' proposals to address crowdsourcing's problems, challenges with implementing these resolutions, and potential avenues for future research.
  4. Existing studies have demonstrated that phishing detection based simply on the features of URLs can be very effective with traditional machine learning techniques. In this paper, we explore the deep learning approach and build four RNN (Recurrent Neural Network) models that use only lexical features of URLs for detecting phishing attacks. We collect 1.5 million URLs as the dataset and show that our RNN models can achieve higher than 99% detection accuracy without the need for any expert knowledge to manually identify features. However, it is well known that RNNs and other deep learning techniques remain largely black boxes. Understanding the internals of deep learning models is important and highly desirable for the improvement and proper application of the models. Therefore, in this work, we further develop several unique visualization techniques to interpret in depth how the RNN models work internally to achieve their outstanding phishing detection performance. In particular, we identify and answer six important research questions, showing that our four RNN models (1) are complementary to each other and can be combined into an ensemble model with even better accuracy, (2) can well capture the relevant features that were manually extracted and used in the traditional machine learning approach for phishing detection, and (3) can help identify useful new features to enhance the accuracy of the traditional machine learning approach. Our techniques and experience in this work could help researchers effectively apply deep learning techniques to other real-world security and privacy problems.
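
    To make the "lexical features only" point concrete: a character-level RNN needs nothing but the raw URL string. The sketch below is a generic character-LSTM classifier in PyTorch, not one of the paper's four models; the alphabet, dimensions, and the UrlRNN name are assumptions made for illustration:

        # Hedged sketch: a generic character-level URL classifier.
        import torch
        import torch.nn as nn

        CHARS = "abcdefghijklmnopqrstuvwxyz0123456789:/.-_?=&%~"
        PAD = 0  # index 0 reserved for padding and unknown characters

        def encode(url, max_len=128):
            ids = [CHARS.find(c) + 1 for c in url.lower()[:max_len]]  # unknown -> 0
            return torch.tensor(ids + [PAD] * (max_len - len(ids)))

        class UrlRNN(nn.Module):
            def __init__(self, vocab=len(CHARS) + 1, emb=32, hid=64):
                super().__init__()
                self.emb = nn.Embedding(vocab, emb, padding_idx=PAD)
                self.rnn = nn.LSTM(emb, hid, batch_first=True)
                self.out = nn.Linear(hid, 2)   # logits: phishing vs. benign

            def forward(self, x):
                _, (h, _) = self.rnn(self.emb(x))
                return self.out(h[-1])         # classify from final hidden state

        model = UrlRNN()
        batch = torch.stack([encode("http://paypa1-login.example.com/verify"),
                             encode("https://www.wikipedia.org/")])
        logits = model(batch)   # train with nn.CrossEntropyLoss on labeled URLs
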
  5. Attention check questions have become commonly used in online surveys published on popular crowdsourcing platforms as a key mechanism to filter out inattentive respondents and improve data quality. However, little research has considered the vulnerabilities of this important quality control mechanism, which can allow attackers, including irresponsible and malicious respondents, to answer attention check questions automatically and efficiently achieve their goals. In this paper, we perform the first study to investigate such vulnerabilities, and demonstrate that attackers can leverage deep learning techniques to pass attention check questions automatically. We propose AC-EasyPass, an attack framework with a concrete model that combines a convolutional neural network with weighted feature reconstruction to easily pass attention check questions. We construct the first attention check question dataset, consisting of both original and augmented questions, and demonstrate the effectiveness of AC-EasyPass. We explore two simple defense methods, adding adversarial sentences and adding typos, for survey designers to mitigate the risks posed by AC-EasyPass; however, these methods are fragile due to their technical and usability limitations, underlining the challenging nature of defense. We hope our work will draw sufficient attention from the research community towards developing more robust attention check mechanisms. More broadly, our work intends to prompt the research community to seriously consider the emerging risks that the malicious use of machine learning techniques poses to the quality, validity, and trustworthiness of crowdsourcing and social computing.
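
    As a hedged illustration of the convolutional half of such an attack (the paper's weighted feature reconstruction component is omitted, and the vocabulary size, filter widths, and the TextCNN name are invented), a standard 1-D convolutional classifier over tokenized question text might look like:

        # Hedged sketch: a Kim-style text CNN; not AC-EasyPass's actual model.
        import torch
        import torch.nn as nn

        class TextCNN(nn.Module):
            def __init__(self, vocab=5000, emb=50, n_filters=32,
                         widths=(2, 3, 4), classes=2):
                super().__init__()
                self.emb = nn.Embedding(vocab, emb)
                self.convs = nn.ModuleList(nn.Conv1d(emb, n_filters, w)
                                           for w in widths)
                self.out = nn.Linear(n_filters * len(widths), classes)

            def forward(self, x):                 # x: (batch, seq_len) word ids
                e = self.emb(x).transpose(1, 2)   # (batch, emb, seq_len)
                # Max-pool each filter's responses over the whole question.
                pooled = [torch.relu(c(e)).max(dim=2).values for c in self.convs]
                return self.out(torch.cat(pooled, dim=1))

        # Toy batch: two tokenized questions of length 12.
        x = torch.randint(0, 5000, (2, 12))
        logits = TextCNN()(x)   # e.g., score which option the check expects
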
  6. Web measurement is a powerful approach to studying various tracking practices that may compromise the privacy of millions of users. Researchers have built several measurement frameworks and performed a few studies to measure web tracking in the desktop environment. However, little is known about web tracking in the mobile environment, and no tool is readily available for performing a comparative measurement study on mobile and desktop environments. In this work, we built a framework called WTPatrol that allows us and other researchers to perform web tracking measurement on both mobile and desktop environments. Using WTPatrol, we performed the first comparative measurement study of web tracking on 23,310 websites that have both mobile and desktop versions of their web pages. We conducted an in-depth comparison of the web tracking practices of those websites between mobile and desktop environments from two perspectives: web tracking based on JavaScript APIs and web tracking based on HTTP cookies. Overall, we found that mobile web tracking has unique characteristics, especially due to mobile-specific trackers, and that it has become nearly as prevalent as desktop web tracking. However, the potential impact of mobile web tracking is more severe than that of desktop web tracking, because a user may use a mobile device frequently in different places and be continuously tracked. We further give suggestions to web users, developers, and researchers for defending against web tracking.
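
    One of the two measurement perspectives above is HTTP-cookie-based tracking. Below is a toy heuristic for flagging likely tracking cookies, assuming an invented rule (third-party setter plus a long Max-Age) and a naive domain comparison; it is not WTPatrol's actual methodology:

        # Hedged sketch: a simplistic tracking-cookie heuristic.
        from http.cookies import SimpleCookie
        from urllib.parse import urlparse

        LONG_LIVED_DAYS = 30  # assumed threshold

        def is_third_party(first_party_url, setter_url):
            fp = urlparse(first_party_url).hostname or ""
            st = urlparse(setter_url).hostname or ""
            # Naive registrable-domain check; real studies use the Public Suffix List.
            return fp.split(".")[-2:] != st.split(".")[-2:]

        def likely_tracking_cookie(set_cookie_header, first_party_url, setter_url):
            cookie = SimpleCookie()
            cookie.load(set_cookie_header)
            for morsel in cookie.values():
                max_age = morsel.get("max-age", "")
                long_lived = (max_age.isdigit()
                              and int(max_age) > LONG_LIVED_DAYS * 86400)
                if is_third_party(first_party_url, setter_url) and long_lived:
                    return True
            return False

        print(likely_tracking_cookie("uid=abc123; Max-Age=31536000; Path=/",
                                     "https://news.example.com/",
                                     "https://tracker.example-ads.net/pixel"))
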
  7. The number of smart home IoT (Internet of Things) devices has been growing fast in recent years. Along with the great benefits brought by smart home devices, new threats have appeared. One major threat to smart home users is the compromise of their privacy by traffic analysis (TA) attacks. Researchers have shown that TA attacks can be performed successfully on either plain or encrypted traffic to identify smart home devices and infer user activities. Tunneling traffic is a very strong countermeasure against existing TA attacks. However, in this work, we design a Signature based Tunneled Traffic Analysis (STTA) attack that can be effective even on tunneled traffic. Using a popular smart home traffic dataset, we demonstrate that our attack can achieve 83% accuracy in identifying 14 smart home devices. We further design a simple defense mechanism, based on adding uniform random noise, that effectively protects against our TA attack without introducing excessive overhead. We prove that our defense mechanism achieves approximate differential privacy.
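
    The defense described above adds uniform random noise to traffic features. A minimal sketch of one such mechanism is padding each packet by a uniformly random number of bytes; the MAX_PAD bound is an assumed parameter, and a real tunnel would also carry a length field so the receiver can strip the padding:

        # Hedged sketch: uniform random packet padding, not the paper's exact scheme.
        import random

        MAX_PAD = 512  # max padding bytes per packet (assumed; tune for overhead)

        def pad_packet(payload: bytes) -> bytes:
            pad_len = random.randint(0, MAX_PAD)   # uniform noise
            return payload + bytes(pad_len)        # zero-filled padding

        # Two identical 60-byte packets now show different observed sizes.
        print([len(pad_packet(bytes(n))) for n in (60, 60, 1500)])
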
  8. Creating effective access control policies is a significant challenge for many organizations. Over-privilege increases security risk from compromised credentials, insider threats, and accidental misuse. Under-privilege prevents users from performing their duties. Policies must therefore balance the competing goals of minimizing under-privilege and minimizing over-privilege. The Attribute Based Access Control (ABAC) model has been gaining popularity in recent years because of its advantages in granularity, flexibility, and usability. ABAC allows administrators to create policies based on attributes of users, operations, resources, and the environment. However, in practice, it is often very difficult to create effective ABAC policies that minimize both under-privilege and over-privilege, especially for large and complex systems, because their ABAC privilege spaces are typically gigantic. In this paper, we take a rule mining approach that mines systems' audit logs to automatically generate ABAC policies which minimize both under-privilege and over-privilege. We propose a rule mining algorithm for creating ABAC policies with rules, a policy scoring algorithm for evaluating ABAC policies from the least privilege perspective, and performance optimization methods for dealing with the challenges of large ABAC privilege spaces. Using a large dataset of 4.7 million Amazon Web Services (AWS) audit log events, we demonstrate that our automated approach can effectively generate least-privilege ABAC policies, and can generate policies with less over-privilege and under-privilege than a Role Based Access Control (RBAC) approach. Overall, we hope our work can help promote a wider and faster deployment of the ABAC model, and can help unleash the advantages of ABAC to better protect large and complex computing systems.
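
    To illustrate the flavor of mining policy rules from audit logs (a deliberately simplified sketch, not the paper's algorithm: the support threshold and attribute schema are invented, and no least-privilege scoring is performed), candidate rules can be enumerated as frequent attribute-value combinations over permitted events:

        # Hedged sketch: naive frequent-combination rule mining over audit events.
        from itertools import combinations
        from collections import defaultdict

        # Each permitted event: user attributes, the operation, resource attributes.
        log = [
            {"role": "dev", "team": "web", "op": "read",  "res_type": "repo"},
            {"role": "dev", "team": "web", "op": "write", "res_type": "repo"},
            {"role": "dev", "team": "ml",  "op": "read",  "res_type": "repo"},
        ]

        def mine_rules(events, min_support=2):
            """A rule is any attribute-value combination seen in at least
            min_support permitted events; real miners also score rules
            against under- and over-privilege."""
            counts = defaultdict(int)
            for e in events:
                items = sorted(e.items())
                for r in range(1, len(items) + 1):
                    for combo in combinations(items, r):
                        counts[combo] += 1
            return [dict(c) for c, n in counts.items() if n >= min_support]

        for rule in mine_rules(log):
            print(rule)   # e.g. {'op': 'read', 'res_type': 'repo', 'role': 'dev'}
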