

Title: Beyond Bot Detection: Combating Fraudulent Online Survey Takers
Different techniques have been recommended to detect fraudulent responses in online surveys, but little research has systematically tested the extent to which they actually work in practice. In this paper, we conduct an empirical evaluation of 22 anti-fraud tests in two complementary online surveys. The first survey recruits Rust programmers on public online forums and social media networks. We find that fraudulent respondents exhibit both bot and human characteristics. Among the anti-fraud tests, those designed around domain knowledge are the most effective. By combining individual tests, we can achieve detection performance as good as commercial techniques while making the results more explainable. To explore these tests in a broader context, we ran a different survey on Amazon Mechanical Turk (MTurk). The results show that for a generic survey that does not require respondents to have any domain knowledge, it is more difficult to distinguish fraudulent responses. However, a subset of the tests remains effective.
Award ID(s):
1955965 2030521
PAR ID:
10321051
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the Web Conference 2022
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
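The abstract's central idea of combining individual anti-fraud tests into an explainable detector can be sketched as follows. This is a minimal illustration, assuming a simple "fail at least N tests" rule; the test names, thresholds, and example answers are hypothetical and not taken from the paper.

```python
# Sketch of combining individual anti-fraud tests into one explainable
# detector. Each test maps a response to True when the response fails it;
# a response is flagged when it fails at least `threshold` tests, and the
# list of failed tests explains the decision. All names and cutoffs here
# are hypothetical.

def flag_response(response, tests, threshold):
    """Return (is_flagged, names_of_failed_tests) for one survey response."""
    failed = [name for name, test in tests.items() if test(response)]
    return len(failed) >= threshold, failed

# Illustrative tests: a speed check, an attention check, and a
# domain-knowledge check of the kind a Rust-programmer survey might use.
tests = {
    "too_fast": lambda r: r["duration_sec"] < 60,
    "attention_check": lambda r: r["attention_answer"] != "strongly agree",
    "domain_knowledge": lambda r: "borrow" not in r["rust_answer"].lower(),
}

suspect = {"duration_sec": 35,
           "attention_answer": "neutral",
           "rust_answer": "Rust uses a garbage collector"}
is_flagged, reasons = flag_response(suspect, tests, threshold=2)
# is_flagged -> True; reasons -> ["too_fast", "attention_check", "domain_knowledge"]
```

Keeping the detector as a named collection of simple tests is what makes a flagged response explainable: the output lists exactly which checks failed, unlike an opaque commercial fraud score.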
More Like this
  1. Online surveys are a popular method for collecting data in the social sciences. Despite their cost-effectiveness, concerns regarding the legitimacy of data from online surveys are increasing. One such concern is fraudulent responses, or “spam,” by malicious agents who intentionally game the survey process to gain monetary incentives or sway research results. The research costs of spam, namely its influence on research conclusions and its threat to scientific integrity, are not well understood. Here we show the differences in financial and research costs of spam using data from an online survey of transportation workers that was cleaned using a stringent battery of spam detection techniques, combining commercially available features with a custom spam detection algorithm. We found that we would have wasted about 73% of our budget on incentivizing spammers if we had stopped data collection upon reaching the intended sample size. We also found significant differences in research conclusions related to the relationships between key organizational constructs, including affective commitment, job satisfaction, and turnover intention, between subsamples with and without spam. Our results demonstrate that researchers who are unaware of spam or do not adequately clean their data may expend substantially more monetary and human resources and may derive misleading conclusions. This study highlights the importance of survey researchers being cognizant of spam responses and employing robust spam detection techniques to ensure the scientific integrity of non-probability online survey research.
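The 73% figure above follows from simple arithmetic: if spam is not filtered before payment, the wasted share of the incentive budget is just the spam fraction of the compensated sample. A minimal sketch, with made-up counts and incentive amount chosen only to reproduce the reported proportion:

```python
# Minimal sketch of the wasted-budget calculation: with a fixed incentive per
# response and no spam filtering before payment, the wasted share of the
# budget equals the spam fraction of the paid sample. The counts and the
# per-response incentive below are invented for illustration.

def wasted_budget(n_spam, n_valid, incentive_per_response):
    total = (n_spam + n_valid) * incentive_per_response
    wasted = n_spam * incentive_per_response
    return wasted, wasted / total

wasted, fraction = wasted_budget(n_spam=730, n_valid=270,
                                 incentive_per_response=5.00)
# wasted -> 3650.0, fraction -> 0.73
```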
  2. Effective fraud prevention and participant validation are essential for ensuring data quality in today's highly digitized research landscape. Increasingly sophisticated bots and high volumes of fraudulent participants have generated a need for more complex and nuanced methods to combat fraudulent activity. In this paper, we share our experiences with fraudulent survey responses, which we encountered in our work around abortion storytelling, and the multi-stage protocol that we developed to validate participants. We found that effective fraud prevention should start early and include a variety of flagging methods to encourage holistic pattern-searching in data. Researchers should overestimate the amount of time they will need to validate participants and consider asking participants to assist in the validation process. We encourage researchers to be transparent about the interpretive nature of this work. To this end, we contribute a Participant Validation Guide in supplemental materials for community members to adapt in their own practices.
  3. Privacy and security researchers often rely on data collected through online crowdsourcing platforms such as Amazon Mechanical Turk (MTurk) and Prolific. Prior work, which used data collected in the United States between 2013 and 2017, found that MTurk responses regarding security and privacy were generally representative for people under 50 or with some college education. However, the landscape of online crowdsourcing has changed significantly over the last five years, with the rise of Prolific as a major platform and the increasing presence of bots. This work attempts to replicate the prior results about the external validity of online privacy and security surveys. We conduct an online survey on MTurk (n=800), a gender-balanced survey on Prolific (n=800), and a representative survey on Prolific (n=800) and compare the responses to a probabilistic survey conducted by the Pew Research Center (n=4272). We find that MTurk response quality has degraded over the last five years, and our results do not replicate the earlier finding about the generalizability of MTurk responses. By contrast, we find that data collected through Prolific is generally representative for questions about user perceptions and experiences, but not for questions about security and privacy knowledge. We also evaluate the impact of Prolific settings, attention check questions, and statistical methods on the external validity of online surveys, and we develop recommendations about best practices for conducting online privacy and security surveys.
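One simple way to quantify how far a crowdsourced sample's answers sit from a probabilistic benchmark such as the Pew survey is the total variation distance between the two answer distributions. The sketch below uses invented answer data; the study itself applies more careful statistical comparisons.

```python
from collections import Counter

# Sketch: compare the answer distribution of a crowdsourced sample against a
# reference (e.g. probability-based) sample using total variation distance.
# 0.0 means identical distributions; 1.0 means no overlap at all.
# The answer data below is invented for illustration.

def answer_distribution(answers):
    """Map each answer option to its relative frequency in the sample."""
    counts = Counter(answers)
    n = len(answers)
    return {option: c / n for option, c in counts.items()}

def total_variation(dist_a, dist_b):
    """Half the L1 distance between two discrete distributions."""
    options = set(dist_a) | set(dist_b)
    return 0.5 * sum(abs(dist_a.get(o, 0.0) - dist_b.get(o, 0.0))
                     for o in options)

mturk = answer_distribution(["yes"] * 70 + ["no"] * 30)
pew = answer_distribution(["yes"] * 55 + ["no"] * 45)
distance = total_variation(mturk, pew)
# distance ≈ 0.15: the samples differ by 15 percentage points on this item
```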
  4. The rapid growth in popularity and hype surrounding digital assets such as art, video, and music in the form of non-fungible tokens (NFTs) has made them a lucrative investment opportunity, with NFT-based sales surpassing $25B in 2021 alone. However, the volatility and general lack of technical understanding of the NFT ecosystem have led to the spread of various scams. The success of an NFT heavily depends on its online virality. As a result, creators use dedicated promotion services to drive engagement to their projects on social media websites, such as Twitter. However, these services are also utilized by scammers to promote fraudulent projects that attempt to steal users' cryptocurrency assets, thus posing a major threat to the ecosystem of NFT sales. In this paper, we conduct a longitudinal study of 439 promotion services (accounts) on Twitter that have collectively promoted 823 unique NFT projects through giveaway competitions over a period of two months. Our findings reveal that more than 36% of these projects were fraudulent, comprising phishing, rug pull, and pre-mint scams. We also found that a majority of accounts engaging with these promotions (including those for fraudulent NFT projects) are bots that artificially inflate the popularity of the fraudulent NFT collections by increasing their likes, followers, and retweet counts. This manipulation results in significant engagement from real users, who then invest in these scams. We also identify several shortcomings in existing anti-scam measures, such as blocklists, browser protection tools, and domain hosting services, in detecting NFT-based scams. We utilize our findings to develop and open-source a machine learning classifier tool that was able to proactively detect 382 new fraudulent NFT projects on Twitter.
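The bot-driven engagement pattern described above, likes and retweets inflated far beyond what an account's genuine follower base would produce, suggests simple ratio features a classifier could use. The feature names, example numbers, and threshold below are hypothetical illustrations, not the paper's actual feature set or model.

```python
# Hypothetical illustration of an engagement-ratio feature of the kind a
# scam-promotion classifier might use: interaction counts that are large
# relative to an account's follower base can indicate artificial inflation.
# The field names and the threshold are assumptions, not the paper's model.

def engagement_ratio(likes, retweets, followers):
    """Interactions per follower; high values can suggest inflated engagement."""
    return (likes + retweets) / max(followers, 1)

def looks_inflated(tweet, ratio_threshold=2.0):
    """Flag a promotional tweet whose engagement ratio exceeds the threshold."""
    ratio = engagement_ratio(tweet["likes"], tweet["retweets"],
                             tweet["author_followers"])
    return ratio > ratio_threshold

promo = {"likes": 4800, "retweets": 1300, "author_followers": 900}
flagged = looks_inflated(promo)
# (4800 + 1300) / 900 ≈ 6.8, well above the threshold, so flagged is True
```

A single ratio like this would be only one input among many; the study's open-sourced classifier presumably combines many such signals rather than relying on a lone threshold.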
  5.
    Outreach and communication with the public have substantial value in polar research, in which studies often find changes of global importance that are happening far out of sight from the majority of people living at lower latitudes. Seeking evidence on the effectiveness of outreach programs, the U.S. National Science Foundation sponsored large-scale survey assessments before and after the International Polar Year in 2007/2008. Polar-knowledge questions have subsequently been tested and refined through other nationwide and regional surveys. More than a decade of such work has established that basic but fairly specific knowledge questions, with all answer choices sounding plausible but one being uniquely correct, can yield highly replicable results. Those results, however, paint a mixed picture of knowledge. Some factual questions seem to be interpreted by many respondents as if they had been asked for their personal beliefs about climate change, so their responses reflect sociopolitical identity rather than physical-world knowledge. Other factual questions, by design, do not link in obvious ways to climate-change beliefs, so responses have simpler interpretations in terms of knowledge gaps and education needs.