skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Friday, May 16 until 2:00 AM ET on Saturday, May 17 due to maintenance. We apologize for the inconvenience.


Search for: All records

Creators/Authors contains: "Yue, Chuan"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. The European General Data Protection Regulation (GDPR) mandates a data controller (e.g., an app developer) to provide all information specified in Articles (Arts.) 13 and 14 to data subjects (e.g., app users) regarding how their data are being processed and what are their rights. While some studies have started to detect the fulfillment of GDPR requirements in a privacy policy, their exploration only focused on a subset of mandatory GDPR requirements. In this paper, our goal is to explore the state of GDPR-completeness violations in mobile apps' privacy policies. To achieve our goal, we design the PolicyChecker framework by taking a rule and semantic role based approach. PolicyChecker automatically detects completeness violations in privacy policies based not only on all mandatory GDPR requirements but also on all if-applicable GDPR requirements that will become mandatory under specific conditions. Using PolicyChecker, we conduct the first large-scale GDPR-completeness violation study on 205,973 privacy policies of Android apps in the UK Google Play store. PolicyChecker identified 163,068 (79.2%) privacy policies containing data collection statements; therefore, such policies are regulated by GDPR requirements. However, the majority (99.3%) of them failed to achieve the GDPR-completeness with at least one unsatisfied requirement; 98.1% of them had at least one unsatisfied mandatory requirement, while 73.0% of them had at least one unsatisfied if-applicable requirement logic chain. We conjecture that controllers' lack of understanding of some GDPR requirements and their poor practices in composing a privacy policy can be the potential major causes behind the GDPR-completeness violations. We further discuss recommendations for app developers to improve the completeness of their apps' privacy policies to provide a more transparent personal data processing environment to users. 
    more » « less
  2. Mobile and web apps are increasingly relying on the data generated or provided by users such as from their uploaded documents and images. Unfortunately, those apps may raise significant user privacy concerns. Specifically, to train or adapt their models for accurately processing huge amounts of data continuously collected from millions of app users, app or service providers have widely adopted the approach of crowdsourcing for recruiting crowd workers to manually annotate or transcribe the sampled ever-changing user data. However, when users' data are uploaded through apps and then become widely accessible to hundreds of thousands of anonymous crowd workers, many human-in-the-loop related privacy questions arise concerning both the app user community and the crowd worker community. In this paper, we propose to investigate the privacy risks brought by this significant trend of large-scale crowd-powered processing of app users' data generated in their daily activities. We consider the representative case of receipt scanning apps that have millions of users, and focus on the corresponding receipt transcription tasks that appear popularly on crowdsourcing platforms. We design and conduct an app user survey study (n=108) to explore how app users perceive privacy in the context of using receipt scanning apps. We also design and conduct a crowd worker survey study (n=102) to explore crowd workers' experiences on receipt and other types of transcription tasks as well as their attitudes towards such tasks. Overall, we found that most app users and crowd workers expressed strong concerns about the potential privacy risks to receipt owners, and they also had a very high level of agreement with the need for protecting receipt owners' privacy. Our work provides insights on app users' potential privacy risks in crowdsourcing, and highlights the need and challenges for protecting third party users' privacy on crowdsourcing platforms. We have responsibly disclosed our findings to the related crowdsourcing platform and app providers. 
    more » « less
  3. null (Ed.)
    Web tracking and advertising (WTA) nowadays are ubiquitously performed on the web, continuously compromising users' privacy. Existing defense solutions, such as widely deployed blocking tools based on filter lists and alternative machine learning based solutions proposed in prior research, have limitations in terms of accuracy and effectiveness. In this work, we propose WtaGraph, a web tracking and advertising detection framework based on Graph Neural Networks (GNNs). We first construct an attributed homogenous multi-graph (AHMG) that represents HTTP network traffic, and formulate web tracking and advertising detection as a task of GNN-based edge representation learning and classification in AHMG. We then design four components in WtaGraph so that it can (1) collect HTTP network traffic, DOM, and JavaScript data, (2) construct AHMG and extract corresponding edge and node features, (3) build a GNN model for edge representation learning and WTA detection in the transductive learning setting, and (4) use a pre-trained GNN model for WTA detection in the inductive learning setting. We evaluate WtaGraph on a dataset collected from Alexa Top 10K websites, and show that WtaGraph can effectively detect WTA requests in both transductive and inductive learning settings. Manual verification results indicate that WtaGraph can detect new WTA requests that are missed by filter lists and recognize non-WTA requests that are mistakenly labeled by filter lists. Our ablation analysis, evasion evaluation, and real-time evaluation show that WtaGraph can have a competitive performance with flexible deployment options in practice. 
    more » « less
  4. Crowdsourcing is popular for large-scale data collection and labeling, but a major challenge is on detecting low-quality submissions. Recent studies have demonstrated that behavioral features of workers are highly correlated with data quality and can be useful in quality control. However, these studies primarily leveraged coarsely extracted behavioral features, and did not further explore quality control at the fine-grained level, i.e., the annotation unit level. In this paper, we investigate the feasibility and benefits of using fine-grained behavioral features, which are the behavioral features finely extracted from a worker's individual interactions with each single unit in a subtask, for quality control in crowdsourcing. We design and implement a framework named Fine-grained Behavior-based Quality Control (FBQC) that specifically extracts fine-grained behavioral features to provide three quality control mechanisms: (1) quality prediction for objective tasks, (2) suspicious behavior detection for subjective tasks, and (3) unsupervised worker categorization. Using the FBQC framework, we conduct two real-world crowdsourcing experiments and demonstrate that using fine-grained behavioral features is feasible and beneficial in all three quality control mechanisms. Our work provides clues and implications for helping job requesters or crowdsourcing platforms to further achieve better quality control. 
    more » « less
  5. Existing studies have demonstrated that using traditional machine learning techniques, phishing detection simply based on the features of URLs can be very effective. In this paper, we explore the deep learning approach and build four RNN (Recurrent Neural Network) models that only use lexical features of URLs for detecting phishing attacks. We collect 1.5 million URLs as the dataset and show that our RNN models can achieve a higher than 99% detection accuracy without the need of any expert knowledge to manually identify the features. However, it is well known that RNNs and other deep learning techniques are still largely in black boxes. Understanding the internals of deep learning models is important and highly desirable to the improvement and proper application of the models. Therefore, in this work, we further develop several unique visualization techniques to intensively interpret how RNN models work internally in achieving the outstanding phishing detection performance. Especially, we identify and answer six important research questions, showing that our four RNN models (1) are complementary to each other and can be combined into an ensemble model with even better accuracy, (2) can well capture the relevant features that were manually extracted and used in the traditional machine learning approach for phishing detection, and (3) can help identify useful new features to enhance the accuracy of the traditional machine learning approach. Our techniques and experience in this work could be helpful for researchers to effectively apply deep learning techniques in addressing other real-world security or privacy problems. 
    more » « less
  6. Abstract Web measurement is a powerful approach to studying various tracking practices that may compromise the privacy of millions of users. Researchers have built several measurement frameworks and performed a few studies to measure web tracking on the desktop environment. However, little is known about web tracking on the mobile environment, and no tool is readily available for performing a comparative measurement study on mobile and desktop environments. In this work, we built a framework called WTPatrol that allows us and other researchers to perform web tracking measurement on both mobile and desktop environments. Using WTPatrol, we performed the first comparative measurement study of web tracking on 23,310 websites that have both mobile version and desktop version web-pages. We conducted an in-depth comparison of the web tracking practices of those websites between mobile and desktop environments from two perspectives: web tracking based on JavaScript APIs and web tracking based on HTTP cookies. Overall, we found that mobile web tracking has its unique characteristics especially due to mobile-specific trackers, and it has become increasingly as prevalent as desktop web tracking. However, the potential impact of mobile web tracking is more severe than that of desktop web tracking because a user may use a mobile device frequently in different places and be continuously tracked. We further gave some suggestions to web users, developers, and researchers to defend against web tracking. 
    more » « less
  7. null (Ed.)
    Crowdsourcing platforms are powerful tools for academic researchers. Proponents claim that crowdsourcing helps researchers quickly and affordably recruit enough human subjects with diverse backgrounds to generate significant statistical power, while critics raise concerns about unreliable data quality, labor exploitation, and unequal power dynamics between researchers and workers. We examine these concerns along three dimensions: methods, fairness, and politics. We find that researchers offer vastly different compensation rates for crowdsourced tasks, and address potential concerns about data validity by using platform-specific tools and user verification methods. Additionally, workers depend upon crowdsourcing platforms for a significant portion of their income, are motivated more by fear of losing access to work than by specific compensation rates, and are frustrated by a lack of transparency and occasional unfair treatment from job requesters. Finally, we discuss critical computing scholars’ proposals to address crowdsourcing’s problems, challenges with implementing these resolutions, and potential avenues for future research. 
    more » « less
  8. The number of smart home IoT (Internet of Things) devices has been growing fast in recent years. Along with the great benefits brought by smart home devices, new threats have appeared. One major threat to smart home users is the compromise of their privacy by traffic analysis (TA) attacks. Researchers have shown that TA attacks can be performed successfully on either plain or encrypted traffic to identify smart home devices and infer user activities. Tunneling traffic is a very strong countermeasure to existing TA attacks. However, in this work, we design a Signature based Tunneled Traffic Analysis (STTA) attack that can be effective even on tunneled traffic. Using a popular smart home traffic dataset, we demonstrate that our attack can achieve an 83% accuracy on identifying 14 smart home devices. We further design a simple defense mechanism based on adding uniform random noise to effectively protect against our TA attack without introducing too much overhead. We prove that our defense mechanism achieves approximate differential privacy. 
    more » « less
  9. Attention check questions have become commonly used in online surveys published on popular crowdsourcing platforms as a key mechanism to filter out inattentive respondents and improve data quality. However, little research considers the vulnerabilities of this important quality control mechanism that can allow attackers including irresponsible and malicious respondents to automatically answer attention check questions for efficiently achieving their goals. In this paper, we perform the first study to investigate such vulnerabilities, and demonstrate that attackers can leverage deep learning techniques to pass attention check questions automatically. We propose AC-EasyPass, an attack framework with a concrete model, that combines convolutional neural network and weighted feature reconstruction to easily pass attention check questions. We construct the first attention check question dataset that consists of both original and augmented questions, and demonstrate the effectiveness of AC-EasyPass. We explore two simple defense methods, adding adversarial sentences and adding typos, for survey designers to mitigate the risks posed by AC-EasyPass; however, these methods are fragile due to their limitations from both technical and usability perspectives, underlining the challenging nature of defense. We hope our work will raise sufficient attention of the research community towards developing more robust attention check mechanisms. More broadly, our work intends to prompt the research community to seriously consider the emerging risks posed by the malicious use of machine learning techniques to the quality, validity, and trustworthiness of crowdsourcing and social computing. 
    more » « less
  10. Creating effective access control policies is a significant challenge to many organizations. Over-privilege increases security risk from compromised credentials, insider threats, and accidental misuse. Under-privilege prevents users from performing their duties. Policies must balance between these competing goals of minimizing under-privilege vs. over-privilege. The Attribute Based Access Control (ABAC) model has been gaining popularity in recent years because of its advantages in granularity, flexibility, and usability. ABAC allows administrators to create policies based on attributes of users, operations, resources, and the environment. However, in practice, it is often very difficult to create effective ABAC policies in terms of minimizing under-privilege and over-privilege especially for large and complex systems because their ABAC privilege spaces are typically gigantic. In this paper, we take a rule mining approach to mine systems' audit logs for automatically generating ABAC policies which minimize both under-privilege and over-privilege. We propose a rule mining algorithm for creating ABAC policies with rules, a policy scoring algorithm for evaluating ABAC policies from the least privilege perspective, and performance optimization methods for dealing with the challenges of large ABAC privilege spaces. Using a large dataset of 4.7 million Amazon Web Service (AWS) audit log events, we demonstrate that our automated approach can effectively generate least privilege ABAC policies, and can generate policies with less over-privilege and under-privilege than a Role Based Access Control (RBAC) approach. Overall, we hope our work can help promote a wider and faster deployment of the ABAC model, and can help unleash the advantages of ABAC to better protect large and complex computing systems. 
    more » « less