Title: Who's Calling? Characterizing Robocalls through Audio and Metadata Analysis.
Unsolicited calls are one of the most prominent security issues facing individuals today. Despite widespread anecdotal discussion of the problem, many important questions remain unanswered. In this paper, we present the first large-scale, longitudinal analysis of unsolicited calls to a honeypot of up to 66,606 lines over 11 months. From call metadata, we characterize the long-term trends of unsolicited calls, develop the first techniques to measure voicemail spam and wangiri attacks, and identify unexplained high-volume call incidences. Additionally, we mechanically answer a subset of the call attempts we receive to cluster related calls into operational campaigns, allowing us to characterize how these campaigns use telephone numbers. Critically, we find no evidence that answering unsolicited calls increases the number of unsolicited calls received, overturning popular wisdom. We also find that we can reliably isolate individual call campaigns, in the process revealing the extent of two distinct Social Security scams while empirically demonstrating that the majority of campaigns rarely reuse phone numbers. These analyses comprise powerful new tools and perspectives for researchers, investigators, and a beleaguered public.
Award ID(s):
1849994
NSF-PAR ID:
10226592
Journal Name:
Proceedings of the 2020 USENIX Security Symposium
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Unsolicited bulk telephone calls — termed "robocalls" — nearly outnumber legitimate calls, overwhelming telephone users. While the vast majority of these calls are illegal, they are also ephemeral. Although telephone service providers, regulators, and researchers have ready access to call metadata, they do not have tools to investigate call content at the vast scale required. This paper presents SnorCall, a framework that scalably and efficiently extracts content from robocalls. SnorCall leverages the Snorkel framework that allows a domain expert to write simple labeling functions to classify text with high accuracy. We apply SnorCall to a corpus of transcripts covering 232,723 robocalls collected over a 23-month period. Among many other findings, SnorCall enables us to obtain first estimates on how prevalent different scam and legitimate robocall topics are, determine which organizations are referenced in these calls, estimate the average amounts solicited in scam calls, identify shared infrastructure between campaigns, and monitor the rise and fall of election-related political calls. As a result, we demonstrate how regulators, carriers, anti-robocall product vendors, and researchers can use SnorCall to obtain powerful and accurate analyses of robocall content and trends that can lead to better defenses. 
  2. System call checking is extensively used to protect the operating system kernel from user attacks. However, existing solutions such as Seccomp execute lengthy rule-based checking programs against system calls and their arguments, leading to substantial execution overhead. To minimize checking overhead, this paper proposes Draco, a new architecture that caches system call IDs and argument values after they have been checked and validated. System calls are first looked up in a special cache and, on a hit, skip all checks. We present both a software and a hardware implementation of Draco. The latter introduces a System Call Lookaside Buffer (SLB) to keep recently validated system calls, and a System Call Target Buffer to preload the SLB in advance. In our evaluation, we find that the average execution time of macro and micro benchmarks with conventional Seccomp checking is 1.14× and 1.25× higher, respectively, than on an insecure baseline that performs no security checks. With our software Draco, the average execution time reduces to 1.10× and 1.18× higher, respectively, than on the insecure baseline. With our hardware Draco, the execution time is within 1% of the insecure baseline.
  3. Increasingly sophisticated Android malware calls for new defensive techniques capable of protecting mobile users against novel threats. In this paper, we first extract the runtime Application Programming Interface (API) call sequences from Android apps, and then analyze higher-level semantic relations within the ecosystem to comprehensively characterize the apps. To model different types of entities (i.e., app, API, device, signature, affiliation) and the rich relations among them, we present a structured heterogeneous graph (HG). To efficiently classify nodes (e.g., apps) in the constructed HG, we propose the HG-Learning method to first obtain in-sample node embeddings and then learn representations of out-of-sample nodes without rerunning/adjusting HG embeddings at the first attempt. We then design a deep neural network classifier that takes the learned HG representations as inputs for real-time Android malware detection. Comprehensive experiments on large-scale, real sample collections from Tencent Security Lab are performed to compare various baselines. Promising results demonstrate that our developed system AiDroid, which integrates our proposed method, outperforms others in real-time Android malware detection.
  4. We present a novel AI-based methodology that identifies phases of a host-level cyber attack simply from system call logs. System calls emanating from cyber attacks on hosts such as honey pots are often recorded in audit logs. Our methodology first involves efficiently loading, caching, processing, and querying system events contained in audit logs in support of computer forensics. The output of these queries remains at the system call level and is difficult to process. The next step is to infer a sequence of abstracted actions, which we colloquially call a storyline, from the system calls given as observations to a latent-state probabilistic model. These storylines are then accurately identified with class labels using a learned classifier. We qualitatively and quantitatively evaluate methods and models for each step of the methodology using 114 different attack phases collected by logging the attacks of a red team on a server, on some likely benign sequences containing regular user activities, and on traces from a recent DARPA project. The resulting end-to-end system, which we call Cyberian, identifies the attack phases with a high level of accuracy, illustrating the benefit that this machine learning-based methodology brings to security forensics.
  5. Tor M. Aamodt; Natalie D. Enright Jerger; Michael M. Swift (Eds.)
    System calls are a critical building block in many serious security attacks, such as control-flow hijacking and privilege escalation attacks. Security-sensitive system calls (e.g., execve, mprotect) especially play a major role in completing attacks. Yet few defense efforts focus on ensuring their legitimate usage, allowing attackers to maliciously leverage system calls in attacks. In this paper, we propose a novel System Call Integrity, which enforces the correct use of system calls throughout runtime. We propose three new contexts enforcing (1) which system call is called and how it is invoked (Call Type), (2) how a system call is reached (Control Flow), and (3) that arguments are not corrupted (Argument Integrity). Our defense mechanism thwarts attacks by breaking the critical building block in their attack chains. We implement Bastion, a compiler and runtime monitor system, to demonstrate the efficacy of the three system call contexts. Our security case study shows that Bastion can effectively stop all the attacks, including real-world exploits and recent advanced attack strategies. Deploying Bastion on three popular system call-intensive programs, NGINX, SQLite, and vsFTPd, we show Bastion is secure and practical, demonstrating overhead of 0.60%, 2.01%, and 1.65%, respectively.