skip to main content


Title: Intrusion-Tolerant and Confidentiality-Preserving Publish/Subscribe Messaging
We present Chios, an intrusion-tolerant publish/subscribe system which protects against Byzantine failures. Chios is the first publish/subscribe system achieving decentralized confidentiality with fine-grained access control and strong publication order guarantees. This is in contrast to existing publish/subscribe systems achieving much weaker security and reliability properties. Chios is flexible and modular, consisting of four fully-fledged publish/subscribe configurations (each designed to meet different goals). We have deployed and evaluated our system on Amazon EC2. We compare Chios with various publish/subscribe systems. Chios is as efficient as an unreplicated, single-broker publish/subscribe implementation, only marginally slower than Kafka and Kafka with passive replication, and at least an order of magnitude faster than all Hyperledger Fabric modules and publish/subscribe systems using Fabric.  more » « less
Award ID(s):
1919159
NSF-PAR ID:
10272155
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
2020 International Symposium on Reliable Distributed Systems (SRDS)
Page Range / eLocation ID:
319 to 328
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. OpenMSIStream provides seamless connection of scientific data stores with streaming infrastructure to allow researchers to leverage the power of decoupled, real-time data streaming architectures. Data streaming is the process of transmitting, ingesting, and processing data continuously rather than in batches. Access to streaming data has revolutionized many industries in the past decade and created entirely new standards of practice and types of analytics. While not yet commonly used in scientific research, data streaming has the potential to become a key technology to drive rapid advances in scientific data collection (e.g., Brookhaven National Lab (2022)). This paucity of streaming infrastructures linking complex scientific systems is due to a lack of tools that facilitate streaming in the diverse and distributed systems common in modern research. OpenMSIStream closes this gap between underlying streaming systems and common scientific infrastructure. Closing this gap empowers novel streaming applications for scientific data including automation of data curation, reduction, and analysis; real-time experiment monitoring and control; and flexible deployment of AI/ML to guide autonomous research. Streaming data generally refers to data continuously generated from multiple sources and passed in small packets (termed messages). Streaming data messages are typically organized in groups called topics and persist for periods of time conducive to processing for multiple uses either sequentially or in small groups. The resulting flows of raw data, metadata, and processing results form “ecosystems” that automate varied data-driven tasks. A strength of data streaming ecosystems is the use of publish-subscribe (“pub/sub”) messaging backbones that decouple data senders (publishers) and recipients (subscribers). Popular message-focused middleware solutions such as RabbitMQ (VMware, 2022), Apache Pulsar (Apache Software Foundation, 2022b), and Apache Kafka (Apache Software Foundation, 2022a) all provide differing capabilities as backbones. OpenMSIStream provides robust and efficient, yet easy, access to the rich data streaming systems of Apache Kafka. 
    more » « less
  2. Applications and middleware services, such as data placement engines, I/O scheduling, and prefetching engines, require low-latency access to telemetry data in order to make optimal decisions. However, typical monitoring services store their telemetry data in a database in order to allow applications to query them, resulting in significant latency penalties. This work presents Apollo: a low-latency monitoring service that aims to provide applications and middleware libraries with direct access to relational telemetry data. Monitoring the system can create interference and overhead, slowing down raw performance of the resources for the job. However, having a current view of the system can aid middleware services in making more optimal decisions which can ultimately improve the overall performance. Apollo has been designed from the ground up to provide low latency, using Publish–Subscribe (Pub-Sub) semantics, and low overhead, using adaptive intervals in order to change the length of time between polling the resource for telemetry data and machine learning in order to predict changes to the telemetry data between actual resource polling. This work also provides some high level abstractions called I/O curators, which can further aid middleware libraries and applications to make optimal decisions. Evaluations showcase that Apollo can achieve sub-millisecond latency for acquiring complex insights with a memory overhead of ~57MB and CPU overhead being only 7% more than existing state-of-the-art systems. 
    more » « less
  3. null (Ed.)
    Autonomous vehicle (AV) software systems are emerging to enable rapidly developed self-driving functionalities. Since such systems are responsible for safety-critical decisions, it is necessary to secure them in face of cyber attacks. Through an empirical study of representative AV software systems Baidu Apollo and Autoware, we discover a common over-privilege problem with the publish-subscribe communication model widely adopted by AV systems: due to the coarse-grained message design for the publish-subscribe communication, some message fields are over-granted with publish/subscribe permissions. To comply with the least-privilege principle and reduce the attack surface resulting from such problem, we argue that the publish/subscribe permissions should be defined and enforced at the granularity of message fields instead of messages. To systematically address such publish-subscribe over-privilege problems, we present AVGuardian, a system that includes (1) a static analysis tool that detects over-privilege instances in AV software and generates the corresponding access control policies at the message field granularity, and (2) a low-overhead, module-transparent, runtime publish/subscribe permission policy enforcement mechanism to perform online policy violation detection and prevention. Using our detection tool, we are able to automatically detect 581 over-privilege instances in total in Baidu Apollo. To demonstrate the severity, we further constructed several concrete exploits that can lead to vehicle collision and identity theft for AV owners, which have been reported to Baidu Apollo and confirmed as valid. For defense, we prototype and evaluate the policy enforcement mechanism, and find that it has very low overhead, does not affect original AV decision logic, and also is resilient to message replay attacks. 
    more » « less
  4. Power grids are undergoing major changes due to rapid growth in renewable energy and improvements in battery technology. Prompted by the increasing complexity of power systems, decentralized IoT solutions are emerging, which arrange local communities into transactive microgrids. The core functionality of these solutions is to provide mechanisms for matching producers with consumers while ensuring system safety. However, there are multiple challenges that these solutions still face: privacy, trust, and resilience. The privacy challenge arises because the time series of production and consumption data for each participant is sensitive and may be used to infer personal information. Trust is an issue because a producer or consumer can renege on the promised energy transfer. Providing resilience is challenging due to the possibility of failures in the infrastructure that is required to support these market based solutions. In this paper, we develop a rigorous solution for transactive microgrids that addresses all three challenges by providing an innovative combination of MILP solvers, smart contracts, and publish-subscribe middleware within a framework of a novel distributed application platform, called Resilient Information Architecture Platform for Smart Grid. Towards this purpose, we describe the key architectural concepts, including fault tolerance, and show the trade-off between market efficiency and resource requirements. 
    more » « less
  5. null (Ed.)
    Graph-based namespaces are being increasingly used to represent the organization of complex and ever-growing information eco-systems and individual user roles. Timely and accurate information dissemination requires an architecture with appropriate naming frameworks, adaptable to changing roles, focused on content rather than network addresses. Today's complex information organization structures make such dissemination very challenging. To address this, we propose POISE, a name-based publish/subscribe architecture for efficient topic-based and recipient-based content dissemination. POISE proposes an information layer, improving on state-of-the-art Information-Centric Networking solutions in two major ways: 1) support for complex graph-based namespaces, and 2) automatic name-based load-splitting. POISE supports in-network graph-based naming, leveraged in a dissemination protocol which exploits information layer rendezvous points (RPs) that perform name expansions. For improved robustness and scalability, POISE supports adaptive load-sharing via multiple RPs, each managing a dynamically chosen subset of the namespace graph. Excessive workload may cause one RP to turn into a ``hot spot'', impeding performance and reliability. To eliminate such traffic concentration, we propose an automated load-splitting mechanism, consisting of an enhanced, namespace graph partitioning complemented by a seamless, loss-less core migration procedure. Due to the nature of our graph partitioning and its complex objectives, off-the-shelf graph partitioning, e.g., METIS, is inadequate. We propose a hybrid, iterative bi-partitioning solution, consisting of an initial and a refinement phase. We also implemented POISE on a DPDK-based platform. Using the important application of emergency response, our experimental results show that POISE outperforms state-of-the-art solutions, demonstrating its effectiveness in timely delivery and load-sharing. 
    more » « less