skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: C2Store: C2 Server Profiles at Your Fingertips
How can we build a definitive capability for tracking C2 servers? Having a large-scale continuously updating capability would be essential for understanding the spatiotemporal behaviors of C2 servers and, ultimately, for helping contain botnet activities. Unfortunately, existing information from threat intelligence feeds and previous works is often limited to a specific set of botnet families or short-term data collections. Responding to this need, we present C2Store, an initiative to provide the most comprehensive information on C2 servers. Our work makes the following contributions: (a) we develop techniques to collect, verify, and combine C2 server addresses from five types of sources, including uncommon platforms, such as GitHub and Twitter; (b) we create an open-access annotated database of 335,967 C2 servers across 133 malware families, which supports semantically-rich and smart queries; (c) we identify surprising behaviors of C2 servers with respect to their spatiotemporal patterns and behaviors. First, we successfully mine Twitter and GitHub and identify C2 servers with a precision of 97% and 94%, respectively. Furthermore, we find that the threat feeds identify only 24% of the servers in our database, with Twitter and GitHub providing 32%. A surprising observation is the identification of 250 IP addresses, each of which hosts more than 5 C2 servers for different botnet families at the same time. Overall, we envision C2Store as an ongoing effort that will facilitate research by providing timely, historical, and comprehensive C2 server information by critically combining multiple sources of information.  more » « less
Award ID(s):
2132642
PAR ID:
10650531
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
Proceedings of the ACM on Networking
Volume:
1
Issue:
CoNEXT3
ISSN:
2834-5509
Page Range / eLocation ID:
1 to 21
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Successful malware campaigns often rely on the ability of infected hosts to locate and contact their command-and-control (C2) servers. Malware campaigns often use DNS domains for this purpose, but DNS domains may be taken down by the registrar that sold them. In response to this threat, malware operators have begun using blockchain-based naming systems to store C2 server names. Blockchain naming systems are a threat to malware defenders because they are not subject to a centralized authority, such as a registrar, that can take down abused domains, either voluntarily or under legal pressure. In fact, blockchains are robust against a variety of interventions that work on DNS domains, which is bad news for defenders. We analyze the ecosystem of blockchain naming systems and identify new locations for defenders to stage interventions against malware. In particular, we find that malware is obligated to use centralized or semi-centralized infrastructure to connect to blockchain naming systems and modify the records stored within. In fact, scattered interventions have already been staged against this centralized infrastructure: we present case studies of several such instances. We also present a study of how blockchain naming systems are currently abused by malware operators, and discuss the factors that would cause a blockchain naming system to become an unstoppable threat. We conclude that existing blockchain naming systems still provide opportunities for defenders to prevent malware from contacting its C2 servers. 
    more » « less
  2. The term "threat intelligence" has swiftly become a staple buzzword in the computer security industry. The entirely reasonable premise is that, by compiling up-to-date information about known threats (i.e., IP addresses, domain names, file hashes, etc.), recipients of such information may be able to better defend their systems from future attacks. Thus, today a wide array of public and commercial sources distribute threat intelligence data feeds to support this purpose. However, our understanding of this data, its characterization and the extent to which it can meaningfully support its intended uses, is still quite limited. In this paper, we address these gaps by formally defining a set of metrics for characterizing threat intelligence data feeds and using these measures to systematically characterize a broad range of public and commercial sources. Further, we ground our quantitative assessments using external measurements to qualitatively investigate issues of coverage and accuracy. Unfortunately, our measurement results suggest that there are significant limitations and challenges in using existing threat intelligence data for its purported goals. 
    more » « less
  3. The constant and rapid evolution of technology has led to some amazing achievements. Normal people can communicate with others across the globe, relatively cheap Internet of Things (IoT) devices can be used to secure homes, track fitness and health, control appliances, etc., many people have access to a seemingly endless wealth of information in small devices in their pockets, organizations can provide high availability for important services by spinning up/down servers in minutes to scale with demand through cloud services, etc. However, not everyone who uses these technologies does so with a pure heart and good intentions, many people use them to commit or help commit crimes. A nefarious individual may use cloud services to host a highly available Command and Control (C2) server, a messaging app to form and communicate with a gang or hacking group, or IoT devices as part of a botnet designed to perform Distributed Denial of Service (DDoS) attacks. When these technologies are used in the commission of a crime, they hold valuable information that needs to be recovered forensically to use as evidence to convict the perpetrators. Unfortunately, that ever-evolving technology poses many challenges for digital forensics. This paper identifies and presents many of the challenges faced in digital forensics involving mobile devices, IoT devices, and cloud services in addition to proposing a framework for solving the IoT Forensic Data Analysis problem. 
    more » « less
  4. This paper introduces protocols for authenticated private information retrieval. These schemes enable a client to fetch a record from a remote database server such that (a) the server does not learn which record the client reads, and (b) the client either obtains the "authentic" record or detects server misbehavior and safely aborts. Both properties are crucial for many applications. Standard private-information-retrieval schemes either do not ensure this form of output authenticity, or they require multiple database replicas with an honest majority. In contrast, we offer multi-server schemes that protect security as long as at least one server is honest. Moreover, if the client can obtain a short digest of the database out of band, then our schemes require only a single server. Performing an authenticated private PGP-public-key lookup on an OpenPGP key server's database of 3.5 million keys (3 GiB), using two non-colluding servers, takes under 1.2 core-seconds of computation, essentially matching the time taken by unauthenticated private information retrieval. Our authenticated single-server schemes are 30-100 more costly than state-of-the-art unauthenticated single-server schemes, though they achieve incomparably stronger integrity properties. 
    more » « less
  5. Social media, especially Twitter, has always been a part of the professional lives of software developers, with prior work reporting on a diversity of usage scenarios, including sharing information, staying current, and promoting one’s work. However, previous studies of Twitter use by software developers typically lack information about activities of the study subjects (and their outcomes) on other platforms. To enable such future research, in this paper we propose a computational approach to cross-link users across Twitter and GitHub, revealing (at least) 70,427 users active on both. As a preliminary analysis of this dataset, we report on a case study of 786 tweets by open-source developers about GitHub work, combining automatic characterization of tweet authors in terms of their relationship to the GitHub items linked in their tweets with qualitative analysis of the tweet contents. We find that different developer roles tend to have different tweeting behaviors, with repository owners being perhaps the most distinctive group compared to other project contributors and followers. We also note a sizeable group of people who follow others on GitHub and tweet about these people’s work, but do not otherwise contribute to those open-source projects. Our results and public dataset open up multiple future research directions. 
    more » « less