Title: fauci-email: a json digest of Anthony Fauci's released emails

We provide a processed JSON version of the 3,234-page PDF of Anthony Fauci's emails that was released in 2021 to give a better understanding of the United States government's response to the COVID-19 pandemic. The main JSON file contains 1289 email threads comprising 2761 emails, 101 of which are duplicates. For each email, we provide the sender, recipients, CC list, subject, body text, and timestamp (when available). We also provide a number of derived datasets stored in individual JSON files: 5 different types of derived email networks, 1 email hypergraph, 1 temporal graph, and 3 tensors. Details of the data conversion process, the construction of the derived datasets, and subsequent analyses can be found in an online technical report at https://arxiv.org/abs/2108.01239. Updated code for processing and analyzing the data can be found at https://github.com/nveldt/fauci-email.
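
As a rough illustration of how thread-structured email data of this kind might be summarized, the sketch below counts threads and emails and tallies directed sender-to-recipient pairs. The layout (a list of threads, each a list of email objects) and the field names `sender`, `recipients`, `cc`, `subject`, `body`, and `time` are assumptions made for this example, not the documented schema of the released files; the repository linked above is the authority on the actual format. The sender-to-recipient tally is only one plausible way to build a network and is not claimed to match any of the five derived networks.

```python
import json
from collections import Counter

# Hypothetical sample in an ASSUMED schema: a list of threads, where each
# thread is a list of email objects. Field names are illustrative only.
SAMPLE_JSON = """
[
  [
    {"sender": "A", "recipients": ["B"], "cc": ["C"],
     "subject": "Re: x", "body": "...", "time": "2020-03-01T09:00:00"},
    {"sender": "B", "recipients": ["A"], "cc": [],
     "subject": "Re: x", "body": "...", "time": null}
  ],
  [
    {"sender": "A", "recipients": ["C", "B"], "cc": [],
     "subject": "y", "body": "...", "time": "2020-03-02T10:30:00"}
  ]
]
"""


def summarize(threads):
    """Return (thread count, email count, sender->recipient edge counts)."""
    n_threads = len(threads)
    n_emails = sum(len(thread) for thread in threads)
    edges = Counter()
    for thread in threads:
        for email in thread:
            # One directed edge per (sender, recipient) pair; CC is ignored here.
            for recipient in email["recipients"]:
                edges[(email["sender"], recipient)] += 1
    return n_threads, n_emails, edges


threads = json.loads(SAMPLE_JSON)
n_threads, n_emails, edges = summarize(threads)
print(n_threads, n_emails)
print(edges.most_common())
```

With a real file, `json.loads(SAMPLE_JSON)` would be replaced by `json.load` on an opened file, after adjusting the field names to whatever the released data actually uses. Timestamps are marked "when available" in the description, so any code that reads them should tolerate missing values.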

Research was additionally supported by ARO Award W911NF-19-1-0057, ARO MURI, NSF CAREER Award IIS-2045555, NSF awards CCF-1909528 and IIS-2007481, and the Sloan Foundation.
Award ID(s):
2007481 1909528
NSF-PAR ID:
10351793
Publisher / Repository:
Zenodo
Subject(s) / Keyword(s):
Anthony Fauci email COVID-19
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation