Title: Fraud De-Anonymization for Fun and Profit
The persistence of search rank fraud in online peer-opinion systems, made possible by crowdsourcing sites and specialized fraud workers, shows that the current approach of detecting and filtering fraud is inefficient. We introduce a fraud de-anonymization approach to disincentivize search rank fraud: attribute user accounts flagged by fraud detection algorithms in online peer-opinion systems to the human workers in crowdsourcing sites who control them. We model fraud de-anonymization as a maximum likelihood estimation problem and introduce UODA, an unconstrained optimization solution. We develop a graph-based deep learning approach to predict ownership of account pairs by the same fraudster and use it to build discriminative fraud de-anonymization (DDA) and pseudonymous fraudster discovery (PFD) algorithms. To address the lack of ground truth fraud data and its pernicious impacts on online systems that employ fraud detection, we propose the first cheating-resistant fraud de-anonymization validation protocol, which transforms human fraud workers into ground-truth performance evaluation oracles. In a user study with 16 human fraud workers, UODA achieved a precision of 91%. On ground truth data that we collected starting from 23 other fraud workers, our co-ownership predictor significantly outperformed a state-of-the-art competitor and enabled DDA and PFD to discover tens of new fraud workers and attribute thousands of suspicious user accounts to existing and newly discovered fraudsters.
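To make the attribution step concrete, here is a minimal, hypothetical Python sketch of maximum-likelihood attribution of flagged accounts to known fraud workers, assuming a pairwise co-ownership probability estimator (such as the paper's graph-based predictor) is available; the function and variable names are illustrative, not the paper's UODA implementation.

```python
import math

def attribute_accounts(flagged_accounts, workers, co_own_prob):
    """Toy maximum-likelihood attribution (illustrative, not the paper's UODA).

    flagged_accounts: iterable of account ids flagged by a fraud detector.
    workers: dict mapping worker id -> list of seed accounts known to be theirs.
    co_own_prob(a, b): assumed estimator of the probability that accounts a and b
    share an owner (e.g., a learned co-ownership predictor).
    """
    attribution = {}
    for acct in flagged_accounts:
        best_worker, best_ll = None, float("-inf")
        for worker_id, seed_accounts in workers.items():
            # Log-likelihood that `acct` belongs to this worker, assuming the
            # pairwise co-ownership estimates are independent.
            ll = sum(math.log(max(co_own_prob(acct, s), 1e-9)) for s in seed_accounts)
            if ll > best_ll:
                best_worker, best_ll = worker_id, ll
        attribution[acct] = best_worker
    return attribution
```

In this toy version, each flagged account is simply assigned to the worker whose known accounts give the highest summed log co-ownership probability.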
Award ID(s):
1840714
PAR ID:
10095261
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security
Page Range / eLocation ID:
115 to 130
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Black Hat App Search Optimization (ASO), in the form of fake reviews and sockpuppet accounts, is prevalent in peer-opinion sites, e.g., app stores, with negative implications for the digital and real lives of their users. To detect and filter fraud, a growing body of research has provided insights into various aspects of fraud posting activities and made assumptions about the working procedures of the fraudsters from online data. However, such assumptions often lack empirical evidence from the actual fraud perpetrators. To address this problem, in this paper we present results of both a qualitative study with 18 ASO workers recruited from 5 freelancing sites, concerning activities they performed on Google Play, and a quantitative investigation with fraud-related data collected from 39 other ASO workers. We reveal findings concerning various aspects of ASO worker capabilities and behaviors, including novel insights into their working patterns and supporting evidence for several existing assumptions. Further, we found and report participant-revealed techniques to bypass Google-imposed verifications, concrete strategies to avoid detection, and even strategies that leverage fraud detection to enhance fraud efficacy. We report a Google site vulnerability that enabled us to infer the mobile device models used to post more than 198 million reviews in Google Play, including 9,942 fake reviews. We discuss the deeper implications of our findings, including their potential use to develop the next generation of fraud detection and prevention systems.
  2. Online merchants face difficulties in using existing card fraud detection algorithms, so in this paper we propose a novel proactive fraud detection model that uses what we call invariant diversity to reveal patterns among attributes of the devices (computers or smartphones) used to conduct the transactions. The model generates a regression function from a diversity index of various attribute combinations and uses it to detect anomalies inherent in certain fraudulent transactions. This approach allows for proactive fraud detection using a relatively small number of unsupervised transactions and is resistant to fraudsters' device obfuscation attempts. We tested our system successfully on real online merchant transactions, and it found several instances of previously undetected fraudulent transactions.
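As a rough illustration of the invariant diversity idea, not the authors' model, the sketch below computes a diversity index (Shannon entropy is an assumed stand-in) over device-attribute combinations for each group of transactions, fits a simple linear trend of diversity against group size, and scores groups by how far their diversity falls below that trend.

```python
from collections import Counter
import math

def shannon_diversity(values):
    """Shannon entropy of the observed attribute-combination frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def diversity_anomaly_scores(transaction_groups, attribute_keys):
    """Illustrative scoring, not the authors' implementation.

    transaction_groups: dict group id -> non-empty list of transaction dicts.
    attribute_keys: device attributes to combine, e.g. ["os", "browser", "screen"].
    """
    sizes, diversities, ids = [], [], []
    for gid, txns in transaction_groups.items():
        combos = [tuple(t.get(k) for k in attribute_keys) for t in txns]
        sizes.append(len(txns))
        diversities.append(shannon_diversity(combos))
        ids.append(gid)
    # Fit a simple linear trend of diversity vs. group size.
    n = len(sizes)
    mean_x, mean_y = sum(sizes) / n, sum(diversities) / n
    var_x = sum((x - mean_x) ** 2 for x in sizes) or 1.0
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, diversities)) / var_x
    intercept = mean_y - slope * mean_x
    # Higher score = diversity far below the fitted trend, i.e. suspiciously
    # uniform device attributes for a group of this size.
    return {g: (slope * x + intercept) - y for g, x, y in zip(ids, sizes, diversities)}
```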
  3. Attributed networks are a type of graph-structured data used in many real-world scenarios. Detecting anomalies on attributed networks has a wide spectrum of applications, such as spammer detection and fraud detection. Although this research area has drawn increasing attention in the last few years, previous works are mostly unsupervised because of the expensive cost of labeling ground truth anomalies. Many recent studies have shown that different types of anomalies are often mixed together on attributed networks, and such invaluable human knowledge could provide complementary insights for advancing anomaly detection on attributed networks. To this end, we study the novel problem of modeling and integrating human knowledge of different anomaly types for attributed network anomaly detection. Specifically, we first model prior human knowledge through a novel data augmentation strategy. We then integrate the modeled knowledge into a Siamese graph neural network encoder through a well-designed contrastive loss. In the end, we train a decoder to reconstruct the original networks from the node representations learned by the encoder and rank nodes according to their reconstruction errors as the anomaly metric. Experiments on five real-world datasets demonstrate that the proposed framework outperforms state-of-the-art anomaly detection algorithms.
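A minimal sketch of the two ingredients described above, assuming PyTorch is available: a contrastive loss over Siamese-encoder embeddings of original, knowledge-augmented positive, and injected-anomaly negative views, and ranking nodes by reconstruction error. The encoder, decoder, and augmentation details are placeholders rather than this paper's architecture.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_anchor, z_pos, z_neg, temperature=0.5):
    """z_*: [num_nodes, dim] embeddings from a Siamese encoder applied to the
    original view, a knowledge-guided positive view, and an injected-anomaly
    negative view (augmentation strategy assumed, not the paper's exact one)."""
    pos = torch.exp(F.cosine_similarity(z_anchor, z_pos) / temperature)
    neg = torch.exp(F.cosine_similarity(z_anchor, z_neg) / temperature)
    return -torch.log(pos / (pos + neg)).mean()

def anomaly_ranking(x, x_reconstructed):
    """Rank nodes by per-node reconstruction error (higher = more anomalous)."""
    errors = ((x - x_reconstructed) ** 2).mean(dim=1)
    return torch.argsort(errors, descending=True), errors
```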
  4. Targeting the right group of workers for crowdsourcing often achieves better quality results. One unique example of targeted crowdsourcing is seeking community-situated workers whose familiarity with the background and the norms of a particular group can help produce better outcomes or accuracy. These community-situated crowd workers can be recruited in different ways, from generic online crowdsourcing platforms or from online recovery communities. We evaluate three different approaches to recruiting generic and community-situated crowd workers in terms of the time and cost of recruitment and the accuracy of task completion. We consider the context of Alcoholics Anonymous (AA), the largest peer support group for recovering alcoholics, and the task of identifying and validating AA meeting information. We discuss the benefits and trade-offs of recruiting paid vs. unpaid community-situated workers and provide implications for future research in the recovery context and relevant domains of HCI, and for the design of crowdsourcing ICT systems.
  5. The effectiveness of clarification question models in engaging users within search systems is currently constrained, casting doubt on their overall usefulness. To improve the performance of these models, it is crucial to employ assessment approaches that encompass both real-time feedback from users (online evaluation) and the characteristics of clarification questions evaluated through human assessment (offline evaluation). However, the relationship between online and offline evaluations has been debated in information retrieval. This study investigates how this discordance holds in search clarification. We use user engagement as ground truth and employ several offline labels to investigate to what extent the offline ranked lists of clarifications resemble the ideal ranked lists based on online user engagement. Contrary to the current understanding that offline evaluations fall short of supporting online evaluations, we find that when identifying the most engaging clarification questions from the user's perspective, online and offline evaluations correspond with each other. We show that query length does not influence the relationship between online and offline evaluations, and that reducing uncertainty in online evaluation strengthens this relationship. We illustrate that an engaging clarification needs to excel from multiple perspectives, and that SERP quality and the characteristics of the clarification are equally important. We also investigate whether human labels can enhance the performance of Large Language Models (LLMs) and Learning-to-Rank (LTR) models in identifying the most engaging clarification questions from the user's perspective by incorporating offline evaluations as input features. Our results indicate that LTR models do not perform better than individual offline labels. However, GPT, an LLM, emerges as the standout performer, surpassing all LTR models and offline labels.
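One simple way to picture the offline/online agreement question, assuming SciPy is available and using made-up numbers: compare the ranking of clarification questions induced by an offline label with the ranking induced by online engagement via a rank correlation such as Kendall's tau. This is only a sketch of the kind of comparison described above, not the study's evaluation code.

```python
from scipy.stats import kendalltau

def offline_online_agreement(offline_scores, online_engagement):
    """Both inputs: score lists for the same clarification questions, aligned by
    index. Returns Kendall's tau (1 = identical ranking, -1 = reversed) and p-value."""
    tau, p_value = kendalltau(offline_scores, online_engagement)
    return tau, p_value

# Illustrative data: here the offline label ranks the questions exactly as the
# online engagement does, so tau = 1.0.
tau, p = offline_online_agreement([0.9, 0.4, 0.7], [120, 35, 80])
```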