
Title: Reading the Tea Leaves: A Comparative Analysis of Threat Intelligence
The term "threat intelligence" has swiftly become a staple buzzword in the computer security industry. The entirely reasonable premise is that, by compiling up-to-date information about known threats (e.g., IP addresses, domain names, file hashes), recipients of such information may be able to better defend their systems from future attacks. Thus, today a wide array of public and commercial sources distribute threat intelligence data feeds to support this purpose. However, our understanding of this data, its characterization and the extent to which it can meaningfully support its intended uses, is still quite limited. In this paper, we address these gaps by formally defining a set of metrics for characterizing threat intelligence data feeds and using these measures to systematically characterize a broad range of public and commercial sources. Further, we ground our quantitative assessments using external measurements to qualitatively investigate issues of coverage and accuracy. Unfortunately, our measurement results suggest that there are significant limitations and challenges in using existing threat intelligence data for its purported goals.
Award ID(s):
1705050 1629973
Journal Name:
USENIX Security Symposium
Sponsoring Org:
National Science Foundation
More Like this
  1. BACKGROUND Electromagnetic (EM) waves underpin modern society in profound ways. They are used to carry information, enabling broadcast radio and television, mobile telecommunications, and ubiquitous access to data networks through Wi-Fi and form the backbone of our modern broadband internet through optical fibers. In fundamental physics, EM waves serve as an invaluable tool to probe objects from cosmic to atomic scales. For example, the Laser Interferometer Gravitational-Wave Observatory and atomic clocks, which are some of the most precise human-made instruments in the world, rely on EM waves to reach unprecedented accuracies. This has motivated decades of research to develop coherent EM sources over broad spectral ranges with impressive results: Frequencies in the range of tens of gigahertz (radio and microwave regimes) can readily be generated by electronic oscillators. Resonant tunneling diodes enable the generation of millimeter (mm) and terahertz (THz) waves, which span from tens of gigahertz to a few terahertz. At even higher frequencies, up to the petahertz level, which are usually defined as optical frequencies, coherent waves can be generated by solid-state and gas lasers. However, these approaches often suffer from narrow spectral bandwidths, because they usually rely on well-defined energy states of specific materials, which results in a rather limited spectral coverage. To overcome this limitation, nonlinear frequency-mixing strategies have been developed. These approaches shift the complexity from the EM source to nonresonant-based material effects. Particularly in the optical regime, a wealth of materials exist that support effects that are suitable for frequency mixing. Over the past two decades, the idea of manipulating these materials to form guiding structures (waveguides) has provided improvements in efficiency, miniaturization, and production scale and cost and has been widely implemented for diverse applications. 
ADVANCES Lithium niobate, a crystal that was first grown in 1949, is a particularly attractive photonic material for frequency mixing because of its favorable material properties. Bulk lithium niobate crystals and weakly confining waveguides have been used for decades for accessing different parts of the EM spectrum, from gigahertz to petahertz frequencies. Now, this material is experiencing renewed interest owing to the commercial availability of thin-film lithium niobate (TFLN). This integrated photonic material platform enables tight mode confinement, which results in frequency-mixing efficiency improvements by orders of magnitude while at the same time offering additional degrees of freedom for engineering the optical properties by using approaches such as dispersion engineering. Importantly, the large refractive index contrast of TFLN enables, for the first time, the realization of lithium niobate–based photonic integrated circuits on a wafer scale. OUTLOOK The broad spectral coverage, ultralow power requirements, and flexibilities of lithium niobate photonics in EM wave generation provide a large toolset to explore new device functionalities. Furthermore, the adoption of lithium niobate–integrated photonics in foundries is a promising approach to miniaturize essential bench-top optical systems using wafer-scale production. Heterogeneous integration of active materials with lithium niobate has the potential to create integrated photonic circuits with rich functionalities. Applications such as high-speed communications, scalable quantum computing, artificial intelligence and neuromorphic computing, and compact optical clocks for satellites and precision sensing are expected to particularly benefit from these advances and provide a wealth of opportunities for commercial exploration. 
Also, bulk crystals and weakly confining waveguides in lithium niobate are expected to keep playing a crucial role in the near future because of their advantages in high-power and loss-sensitive quantum optics applications. As such, lithium niobate photonics holds great promise for unlocking the EM spectrum and reshaping information technologies for our society in the future. Lithium niobate spectral coverage. The EM spectral range and processes for generating EM frequencies when using lithium niobate (LN) for frequency mixing. AO, acousto-optic; AOM, acousto-optic modulation; χ(2), second-order nonlinearity; χ(3), third-order nonlinearity; EO, electro-optic; EOM, electro-optic modulation; HHG, high-harmonic generation; IR, infrared; OFC, optical frequency comb; OPO, optical parametric oscillator; OR, optical rectification; SCG, supercontinuum generation; SHG, second-harmonic generation; UV, ultraviolet.
  2. The implementation of Internet of Things (IoT) devices in medical environments has introduced a growing list of security vulnerabilities and threats. The lack of an extensible big data resource that captures medical device vulnerabilities limits the use of Artificial Intelligence (AI) based cyber defense systems in capturing, detecting, and preventing known and future attacks. We describe a system that generates a repository of Cyber Threat Intelligence (CTI) about various medical devices and their known vulnerabilities from sources such as manufacturer and ICS-CERT vulnerability alerts. We augment the intelligence repository with data sources such as Wikidata and public medical databases. The combined resources are integrated with threat intelligence in our Cybersecurity Knowledge Graph (CKG) from previous research. The augmented graph embeddings are useful in querying relevant information and can help in various AI-assisted cybersecurity tasks. Given the integration of multiple resources, we found the augmented CKG produced higher quality graph representations. The augmented CKG produced a 31% increase in the Mean Average Precision (MAP) value, computed over an information retrieval task.
  3. One of the staples of network defense is blocking traffic to and from a list of "known bad" sites on the Internet. However, few organizations are in a position to produce such a list themselves, so pragmatically this approach depends on the existence of third-party "threat intelligence" providers who specialize in distributing feeds of unwelcome IP addresses. However, the choice to use such a strategy, let alone which data feeds are trusted for this purpose, is rarely made public and thus little is understood about the deployment of these techniques in the wild. To explore this issue, we have designed and implemented a technique to infer proactive traffic blocking on a remote host and, through a series of measurements, to associate that blocking with the use of particular IP blocklists. In a pilot study of 220K US hosts, we find as many as one fourth of the hosts appear to blocklist based on some source of threat intelligence data, and about 2% use one of the 9 particular third-party blocklists that we evaluated.
  4. Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources that we are using to train and test cybersecurity entity models using the spaCy framework and exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools, and create methods for continuous integration of new information extracted from text.
  5. Aquatic environments encompass the world’s most extensive habitats, rich with sounds produced by a diversity of animals. Passive acoustic monitoring (PAM) is an increasingly accessible remote sensing technology that uses hydrophones to listen to the underwater world and represents an unprecedented, non-invasive method to monitor underwater environments. This information can assist in the delineation of biologically important areas via detection of sound-producing species or characterization of ecosystem type and condition, inferred from the acoustic properties of the local soundscape. At a time when worldwide biodiversity is in significant decline and underwater soundscapes are being altered as a result of anthropogenic impacts, there is a need to document, quantify, and understand biotic sound sources, potentially before they disappear. A significant step toward these goals is the development of a web-based, open-access platform that provides: (1) a reference library of known and unknown biological sound sources (by integrating and expanding existing libraries around the world); (2) a data repository portal for annotated and unannotated audio recordings of single sources and of soundscapes; (3) a training platform for artificial intelligence algorithms for signal detection and classification; and (4) a citizen science-based application for public users. Although individually, these resources are often met on regional and taxa-specific scales, many are not sustained and, collectively, an enduring global database with an integrated platform has not been realized. 
We discuss the benefits such a program can provide, previous calls for global data-sharing and reference libraries, and the challenges that need to be overcome to bring together bio- and ecoacousticians, bioinformaticians, propagation experts, web engineers, and signal processing specialists (e.g., artificial intelligence) with the necessary support and funding to build a sustainable and scalable platform that could address the needs of all contributors and stakeholders into the future.