Search for: All records

Award ID contains: 1801644

  1. QR Codes have become a pervasive mechanism for encoding machine-readable digital data in the offline world. As the Internet age has taught us, mechanisms that become pervasive very often engender privacy concerns regarding their use. As such, here we conduct an investigation of the privacy implications of the QR Code ecosystem as it exists today. We find that there are several shortener services with substantial popularity, and investigate the extent to which these shortener services conduct various types of tracking of individuals who interact with the created QR Codes. Additionally, we collect 948 QR Codes posted within the world, and evaluate them for various types of tracking as well. Overall, we find no evidence that QR Codes are a substantial or unique privacy threat when compared to other link sharing mechanisms available online. Even so, the theoretical potential for surreptitious tracking exists, and more in-depth study of the QR Code ecosystem will allow for deeper investigation of the relationship between online and offline tracking.
  2. With the ubiquity of data breaches, forgotten-about files stored in the cloud create latent privacy risks. We take a holistic approach to help users identify sensitive, unwanted files in cloud storage. We first conducted 17 qualitative interviews to characterize factors that make humans perceive a file as sensitive, useful, and worthy of either protection or deletion. Building on our findings, we conducted a primarily quantitative online study. We showed 108 long-term users of Google Drive or Dropbox a selection of files from their accounts. They labeled and explained these files’ sensitivity, usefulness, and desired management (whether they wanted to keep, delete, or protect them). For each file, we collected many metadata and content features, building a training dataset of 3,525 labeled files. We then built Aletheia, which predicts a file’s perceived sensitivity and usefulness, as well as its desired management. Aletheia improves over state-of-the-art baselines by 26% to 159%, predicting users’ desired file-management decisions with 79% accuracy. Notably, predicting subjective perceptions of usefulness and sensitivity led to a 10% absolute accuracy improvement in predicting desired file-management decisions. Aletheia’s performance validates a human-centric approach to feature selection when using inference techniques on subjective security-related tasks. It also improves upon the state of the art in minimizing the attack surface of cloud accounts.
  3. Current approaches to A/B testing in networks focus on limiting interference, the concern that treatment effects can “spill over” from treatment nodes to control nodes and lead to biased causal effect estimation. Prominent methods for network experiment design rely on two-stage randomization, in which sparsely-connected clusters are identified and cluster randomization dictates the node assignment to treatment and control. Here, we show that cluster randomization does not ensure sufficient node randomization and it can lead to selection bias in which treatment and control nodes represent different populations of users. To address this problem, we propose a principled framework for network experiment design which jointly minimizes interference and selection bias. We introduce the concepts of edge spillover probability and cluster matching and demonstrate their importance for designing network A/B testing. Our experiments on a number of real-world datasets show that our proposed framework leads to significantly lower error in causal effect estimation than existing solutions.
  4. Click data collected by modern recommendation systems are an important source of observational data that can be utilized to train learning-to-rank (LTR) systems. However, these data suffer from a number of biases that can result in poor performance for LTR systems. Recent methods for bias correction in such systems mostly focus on position bias, the fact that higher ranked results (e.g., top search engine results) are more likely to be clicked even if they are not the most relevant results given a user’s query. Less attention has been paid to correcting for selection bias, which occurs because clicked documents are reflective of what documents have been shown to the user in the first place. Here, we propose new counterfactual approaches which adapt Heckman's two-stage method and account for selection and position bias in LTR systems. Our empirical evaluation shows that our proposed methods are much more robust to noise and have better accuracy compared to existing unbiased LTR algorithms, especially when there is moderate to no position bias.
  5. When users post on social media, they protect their privacy by choosing an access control setting that is rarely revisited. Changes in users' lives and relationships, as well as social media platforms themselves, can cause mismatches between a post's active privacy setting and the desired setting. The importance of managing this setting, combined with the high volume of potential friend-post pairs needing evaluation, necessitates a semi-automated approach. We attack this problem through a combination of a user study and the development of automated inference of potentially mismatched privacy settings. A total of 78 Facebook users reevaluated the privacy settings for five of their Facebook posts, also indicating whether a selection of friends should be able to access each post. They also explained their decisions. With this user data, we designed a classifier to identify posts with currently incorrect sharing settings. This classifier shows a 317% improvement over a baseline classifier based on friend interaction. We also find that many of the most useful features can be collected without user intervention, and we identify directions for improving the classifier's accuracy.
  6. The causal effect of a treatment can vary from person to person based on their individual characteristics and predispositions. Mining for patterns of individual-level effect differences, a problem known as heterogeneous treatment effect estimation, has many important applications, from precision medicine to recommender systems. In this paper, we define and study a variant of this problem in which an individual-level threshold in treatment needs to be reached in order to trigger an effect. One of the main contributions of our work is that we not only estimate heterogeneous treatment effects with fixed treatments but can also prescribe individualized treatments. We propose a tree-based learning method to find the heterogeneity in the treatment effects. Our experimental results on multiple datasets show that our approach can learn the triggers better than existing approaches.
  7. Every day people share personal stories online, reaching millions of users around the world through blogs, social media and news websites. Why are some of these stories more attractive to readers than others? What features of these personal narratives make readers empathize with the storyteller? Do the readers’ personal characteristics and experiences play a role in feeling a connection to the story they read? Experimental studies in psychology show that there are several factors that increase empathy in the aggregate, but there is a need for deeper understanding of empathetic feelings at the individual level of storyteller, story, and reader. Here, we present the design and analysis of a survey that studied the impact of story features and reader predispositions and perceptions on the empathy they feel when reading online stories. We use causal trees to find the individual-level causal factors for empathy and to understand the heterogeneity in the treatment effects. One of our main findings is that empathy is contextual and, while reader personality plays a significant role in evoking empathy, the mood of the reader prior to reading the story and linguistic story features have an impact as well. The results of our analyses can be used to help people create content that others care about and to help them communicate more effectively.