Phishing emails have certain characteristics, including wording related to urgency and unrealistic promises (i.e., “too good to be true”), that attempt to lure victims. To test whether these characteristics affected users’ suspiciousness of emails, users participated in a phishing judgment task in which we manipulated 1) email type (legitimate, phishing), 2) consequence amount (small, medium, large), 3) consequence type (gain, loss), and 4) urgency (present, absent). We predicted users would be most suspicious of phishing emails that were urgent and offered large gains. Results supporting the hypotheses indicate that users were more suspicious of phishing emails with a gain consequence type or large consequence amount. However, urgency was not a significant predictor of suspiciousness for phishing emails, but was for legitimate emails. These results have important cybersecurity-related implications for penetration testing and user training.
more »
« less
Phishing Email Detection Using Robust NLP Techniques
Even with many successful phishing email detectors, phishing emails still cost businesses and individuals millions of dollars per year. Most of these models seem to ignore features like word count, stopword count, and punctuations; they use features like n-grams and part of speech tagging. Previous phishing email research ignores or removes the stopwords, and features relating to punctuation only count as a minor part of the detector. Even with a strong unconventional focus on features like word counts, stopwords, punctuation, and uniqueness factors, an ensemble learning model based on a linear kernel SVM gave a true positive rate of 83% and a true negative rate of 96%. Moreover, these features are robustly detected even in noisy email data. It is much easier to detect our features than correct part-of-speech tags or named entities in emails.
more »
« less
- Award ID(s):
- 1659755
- PAR ID:
- 10137609
- Date Published:
- Journal Name:
- IEEE International Conference on Data Mining Workshops
- Page Range / eLocation ID:
- 7 to 12
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Phishing scam emails are emails that pretend to be something they are not in order to get the recipient of the email to undertake some action they normally would not. While technical protections against phishing reduce the number of phishing emails received, they are not perfect and phishing remains one of the largest sources of security risk in technology and communication systems. To better understand the cognitive process that end users can use to identify phishing messages, I interviewed 21 IT experts about instances where they successfully identified emails as phishing in their own inboxes. IT experts naturally follow a three-stage process for identifying phishing emails. In the first stage, the email recipient tries to make sense of the email, and understand how it relates to other things in their life. As they do this, they notice discrepancies: little things that are ``off'' about the email. As the recipient notices more discrepancies, they feel a need for an alternative explanation for the email. At some point, some feature of the email --- usually, the presence of a link requesting an action --- triggers them to recognize that phishing is a possible alternative explanation. At this point, they become suspicious (stage two) and investigate the email by looking for technical details that can conclusively identify the email as phishing. Once they find such information, then they move to stage three and deal with the email by deleting it or reporting it. I discuss ways this process can fail, and implications for improving training of end users about phishing.more » « less
-
Workplace environments are characterized by frequent interruptions that can lead to stress. However, measures of stress due to interruptions are typically obtained through self-reports, which can be affected by memory and emotional biases. In this paper, we use a thermal imaging system to obtain objective measures of stress and investigate personality differences in contexts of high and low interruptions. Since a major source of workplace interruptions is email, we studied 63 participants while multitasking in a controlled office environment with two different email contexts: managing email in batch mode or with frequent interruptions. We discovered that people who score high in Neuroticism are significantly more stressed in batching environments than those low in Neuroticism. People who are more stressed finish emails faster. Last, using Linguistic Inquiry Word Count on the email text, we find that higher stressed people in multitasking environments use more anger in their emails. These findings help to disambiguate prior conflicting results on email batching and stress.more » « less
-
null (Ed.)Phishing is a serious challenge that remains largely unsolved despite the efforts of many researchers. In this paper, we present datasets and tools to help phishing researchers. First, we describe our efforts on creating high quality, diverse and representative email and URL/website datasets for phishing and making them publicly available. Second, we describe PhishBench, a benchmarking framework, which automates the extraction of more than 200 features, implements more than 30 classifiers, and 12 evaluation metrics, for detection of phishing emails, websites and URLs. Using PhishBench, the research community can easily run their models and benchmark their work against the work of others, who have used common dataset sources for emails (Nazario, SpamAssassin, WikiLeaks, etc.) and URLs (PhishTank, APWG, Alexa, etc.).more » « less
-
null (Ed.)Phishing emails are scam communications that pretend to be something they are not in order to get people to take actions they otherwise would not. We surveyed a demographically matched sample of 297 people from across the United States and asked them to share their descriptions of a specific experience with a phishing email. Analyzing these experiences, we found that email users' experiences detecting phishing messages have many properties in common with how IT experts identify phishing. We also found that email users bring unique knowledge and valuable capabilities to this identification process that neither technical controls nor IT experts have. We suggest that targeting training toward how to use this uniqueness is likely to improve phishing prevention.more » « less