
Title: A comparison of classifiers and features for authorship authentication of social networking messages

This paper develops algorithms and evaluates a variety of classifiers for determining the authenticity of short Facebook postings, which average 20.6 words. The goal of this research is to determine the degree to which such postings can be authenticated as coming from the purported user rather than from an intruder. Various sets of stylometry and ad hoc social networking features were developed to categorize 9259 posts from 30 Facebook authors as authentic or non‐authentic. An algorithm that applies machine‐learning classifiers to this problem is described, and an additional voting algorithm that combines three classifiers is investigated. This research is one of the first works to focus on authorship authentication in short messages, such as postings on social network sites. The challenges of applying traditional stylometry techniques to short messages are discussed. Experimental results demonstrate an average accuracy rate of 79.6% among 30 users. Further empirical analyses evaluate the effect of sample size, feature selection, user writing style, and classification method on authorship authentication, indicating varying degrees of success compared with previous studies. Copyright © 2016 John Wiley & Sons, Ltd.
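
The abstract mentions a voting algorithm that combines three classifiers but does not describe it here. One common scheme, simple majority voting, can be sketched in Python as follows. The label convention (1 = authentic, 0 = non-authentic) and the three toy rule functions are illustrative assumptions, not the features or classifiers used in the paper.

```python
from collections import Counter
from typing import Callable, Sequence

# Assumed label convention: 1 = authentic, 0 = non-authentic.
Classifier = Callable[[str], int]


def majority_vote(classifiers: Sequence[Classifier], post: str) -> int:
    """Return the label that most classifiers predict for one post."""
    if len(classifiers) % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    votes = Counter(clf(post) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for three trained per-user classifiers.
def short_post_rule(post: str) -> int:
    return 1 if len(post.split()) <= 25 else 0


def exclamation_rule(post: str) -> int:
    return 1 if "!" in post else 0


def lowercase_rule(post: str) -> int:
    return 1 if post == post.lower() else 0


CLASSIFIERS = [short_post_rule, exclamation_rule, lowercase_rule]

if __name__ == "__main__":
    print(majority_vote(CLASSIFIERS, "see you all tonight!"))
```

With three voters, a post is accepted as authentic whenever at least two of them agree, so a single classifier's error on a short, noisy post can be outvoted.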

Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Concurrency and Computation: Practice and Experience
Sponsoring Org:
National Science Foundation
More Like this
  1. Facebook has become an important part of our daily life. From checking on relatives and showing off a new car to connecting with a high school classmate, abundant personally identifiable information (PII) is made visible to others through posts, images, and news. However, this free flow of information has also created significant cyber-security challenges that make us vulnerable to social engineering and cyber crimes. To confront these challenges, we propose a new behavioral biometric that verifies a user based on his or her widget interaction behavior when using Facebook. Specifically, we monitor activities on the user’s Facebook account using our own logging software and verify the user’s claimed identity with binary classifiers trained with two algorithms (SVM-rbf and Gradient Boosting Machines, GBM). Our novel dataset covers eight users over a month of data collection, with an average of 2.95k rows of data per user. We convert this activity data into meaningful features such as day-of-week, hour-of-day, widget type, and the duration the mouse stays on a widget. The performance shows that our novel widget interaction modality is promising for authentication. The SVM-rbf classifiers achieve a mean Equal Error Rate (EER) and mean Accuracy (ACC) of 3.91% and 97.79%, while the GBM classifiers achieve a mean EER and ACC of 2.76% and 97.88%, respectively. In addition, we perform an ablation study to understand the impact of individual features on authentication performance. Feature importance is ranked in descending order as hour-of-day, day-of-week, and widget types and duration.
  2. Social media platforms provide users with various ways of interacting with each other, such as commenting, reacting to posts, sharing content, and uploading pictures. Facebook is one of the most popular platforms, and its users frequently share and reshare posts, including research articles. Moreover, the reactions feature on Facebook allows users to express their feelings towards the content they view, providing valuable data for analysis. This study aims to predict the emotional impact of Facebook posts relating to research articles. We collected data on Facebook posts related to various scientific research domains, including Health Sciences, Social Sciences, Dentistry, Arts, and Humanities. We observed Facebook users’ reactions towards research articles and posts and found that ‘Like’ reactions were the most common. We also noticed that research articles from the Dentistry research domain received a lot of ‘Haha’ reactions. We used machine learning models to predict the sentiment of Facebook posts related to research articles. We used features such as the research article’s title sentiment, abstract sentiment, abstract length, author count, and research domain to build the models. We used five classifiers: Random Forest, Decision Tree, K-Nearest Neighbors, Logistic Regression, and Naïve Bayes. The models were evaluated using accuracy, precision, recall, and F-1 score metrics. The Random Forest classifier was the best model for two- and three-class labels, achieving accuracy measures of 86% and 66%, respectively. We also evaluated the feature importance for the Random Forest model and found that the sentiment of the research article’s title is crucial in predicting the sentiment of the Facebook post. This study has substantial implications for public engagement in science-related messages. 
    The emotional reactions of Facebook users towards research articles and posts can provide valuable insights into public engagement in science, and predicting the emotional impact of Facebook posts related to research articles can help researchers understand how the public perceives scientific research. The findings of the study can aid researchers in effectively communicating their research and engaging the public in scientific discourse.
  3. To enhance the usability of password authentication, typo-tolerant password authentication schemes permit certain deviations in the user-supplied password, to account for common typographical errors yet still allow the user to successfully log in. In prior work, analysis by Chatterjee et al. demonstrated that typo-tolerance indeed notably improves password usability, yet (surprisingly) does not appear to significantly degrade authentication security. In practice, major web services such as Facebook have employed typo-tolerant password authentication systems. In this paper, we revisit the security impact of typo-tolerant password authentication. We observe that the existing security analysis of such systems considers only password spraying attacks. However, this threat model is incomplete, as password authentication systems must also contend with credential stuffing and tweaking attacks. Factoring in these missing attack vectors, we empirically re-evaluate the security impact of password typo-tolerance using password leak datasets, discovering a significantly larger degradation in security. To mitigate this issue, we explore machine learning classifiers that predict when a password's security is likely affected by typo-tolerance. Our resulting models offer various suitable operating points on the functionality-security tradeoff spectrum, ultimately allowing for partial deployment of typo-tolerant password authentication, preserving its functionality for many users while reducing the security risks. 
  4. Social media platforms like Twitter and Facebook provide risk communicators with the opportunity to quickly reach their constituents at the time of an emerging infectious disease. On these platforms, messages gain exposure through message passing (called “sharing” on Facebook and “retweeting” on Twitter). This raises the question of how to optimize risk messages for diffusion across networks and, as a result, increase message exposure. In this study we add to this growing body of research by identifying message‐level strategies to increase message passing during high‐ambiguity events. In addition, we draw on the extended parallel process model to examine how threat and efficacy information influence the passing of Zika risk messages. In August 2016, we collected 1,409 Twitter messages about Zika sent by U.S. public health agencies’ accounts. Using content analysis methods, we identified intrinsic message features and then analyzed the influence of those features, the account sending the message, the network surrounding the account, and the saliency of Zika as a topic, using negative binomial regression. The results suggest that severity and efficacy information increase how frequently messages get passed on to others. Drawing on the results of this study, previous research on message passing, and diffusion theories, we identify a framework for risk communication on social media. This framework includes four key variables that influence message passing and identifies a core set of message strategies, including message timing, to increase exposure to risk messages on social media during high‐ambiguity events.

  5. Everyday use of the Internet has made us vulnerable in terms of the security and privacy of our data and systems. For example, large-scale data breaches have occurred at Yahoo and Equifax because of a lack of robust and secure data protection within systems. Therefore, it is imperative to find solutions to further boost data security and protect the privacy of our systems. To this end, we propose to authenticate users by utilizing score-level fusions based on mouse dynamics (e.g., mouse movement on a screen) and widget interactions (e.g., when clicking or hovering over different icons on a screen) on two novel datasets. In this study, we focus on two common applications, PayPal (a money transaction website) and Facebook (a social media platform). Though we fuse the same modalities for both applications, the purpose of investigating PayPal is to demonstrate how we can authenticate users when they interact with the app for only a short period of time, while the purpose of investigating Facebook is to authenticate users based on social media browsing activities. We have a total of 10 users for PayPal with an average of 12 minutes of data per user and a total of 15 users for Facebook with an average of 2 hours of data per user. By fusing a single mouse trajectory with the associated widget interactions that occur during the trajectory, our mean EERs (Equal Error Rates) with a score-level fusion of mouse dynamics and widget interactions are 7.64% (SVM-rbf) and 3.25% (GBM) for PayPal, and 5.49% (SVM-rbf) and 2.54% (GBM) for Facebook. To further improve the performance of our fusion, we combine decision scores from multiple consecutive trajectories, which yields a 0% mean EER after 11 decision scores across all the users for both PayPal and Facebook.
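
Items 1 and 5 above report mean Equal Error Rates (EER), the operating point at which the false accept rate (impostors accepted) equals the false reject rate (genuine users rejected). As a rough illustration of how an EER can be estimated from classifier decision scores, here is a minimal Python sketch. The function name, the threshold sweep over observed scores, and the example scores are assumptions made for illustration; none of the listed papers specifies its EER procedure here.

```python
from typing import Sequence


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Estimate the EER from decision scores, where higher means more genuine.

    Sweeps every observed score as a threshold and returns the mean of the
    false accept and false reject rates at the threshold where they are closest.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer = float("inf"), 1.0
    for threshold in sorted(set(genuine) | set(impostor)):
        # False accept: an impostor score at or above the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        # False reject: a genuine score below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.1, 0.2, 0.3, 0.6]
    print(equal_error_rate(genuine_scores, impostor_scores))
```

A lower EER means the genuine and impostor score distributions overlap less. Perfectly separated scores give an EER of 0, which is the sense in which item 5 reports a 0% mean EER after combining 11 decision scores.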