
Title: Resilient distributed state estimation with mobile agents: overcoming Byzantine adversaries, communication losses, and intermittent measurements
Applications in environmental monitoring, surveillance and patrolling typically require a network of mobile agents to collectively gain information regarding the state of a static or dynamical process evolving over a region. However, these networks of mobile agents also introduce various challenges, including intermittent observations of the dynamical process, loss of communication links due to mobility and packet drops, and the potential for malicious or faulty behavior by some of the agents. The main contribution of this paper is the development of resilient, fully-distributed, and provably correct state estimation algorithms that simultaneously account for each of the above considerations, and in turn, offer a general framework for reasoning about state estimation problems in dynamic, failure-prone and adversarial environments. Specifically, we develop a simple switched linear observer for dealing with the issue of time-varying measurement models, and resilient filtering techniques for dealing with worst-case adversarial behavior subject to time-varying communication patterns among the agents. Our approach considers both communication patterns that recur in a deterministic manner, and patterns that are induced by random packet drops. For each scenario, we identify conditions on the dynamical system, the patrols, the nominal communication network topology, and the failure models that guarantee applicability of our proposed techniques. Finally, we complement our theoretical results with detailed simulations that illustrate the efficacy of our algorithms in the presence of the technical challenges described above.
Journal Name: Autonomous Robots
Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Conventional machine learning (ML) and deep learning (DL) methods use large amounts of data to construct prediction models in a central fusion center for recognizing human activities. However, such model training incurs high communication costs and leads to privacy infringement. To address the issues of high communication overhead and privacy leakage, we employed a widely used distributed ML technique called Federated Learning (FL), which generates a global model for predicting human activities by combining the participating agents’ local knowledge. The state-of-the-art FL model fails to maintain acceptable accuracy when there is a large number of unreliable agents that can inject false models, or resource-constrained agents that fail to perform an assigned computational task within a given time window. We developed an FL model for predicting human activities that monitors each agent’s contribution towards model convergence and excludes unreliable and resource-constrained agents from training. We assign a score to each client when it joins the network, and the score is updated based on the agent’s activities during training. We consider three mobile robots as FL clients that are heterogeneous in terms of their resources, such as processing capability, memory, bandwidth, battery life, and data volume. We use heterogeneous mobile robots to understand the effects of a real-world FL setting in the presence of resource-constrained agents. We consider an agent unreliable if it repeatedly responds slowly or injects incorrect models during training. By disregarding the unreliable and weak agents, we carry out the local training of the FL process on the selected agents. If a weak agent is nevertheless selected and starts showing straggler issues, we leverage an asynchronous FL mechanism that aggregates the local models whenever it receives a model update from the agents. Asynchronous FL eliminates the issue of waiting for a long time to receive model updates from the weak agents. To this end, we simulate how the behavior of the agents can be tracked through a reward-punishment scheme and present the influence of unreliable and resource-constrained agents on the FL process. We found that FL performs slightly worse than centralized models if there are no unreliable or resource-constrained agents. However, as the number of malicious and straggler clients increases, our proposed model recognizes human activities more effectively than state-of-the-art FL and ML approaches by identifying and avoiding those agents.
  2. Packet drops caused by congestion are a fundamental problem in network operation. Yet, it is difficult to detect where drops are happening, let alone which flows are most affected. Detecting the small-timescale drops caused by short bursts of traffic is even more challenging, and traditional monitoring techniques can easily miss them. To uncover packet drops as they occur inside a switch, the analysis must be real-time, fine-grained, and efficient. However, modern switches have distributed packet-processing pipelines that see either the arriving or departing traffic, but not the packet drops. Plus, they do not have enough memory to store per-flow state. Our MIDST system addresses these challenges through a distributed compact data structure with lightweight coordination between ingress and egress pipelines. MIDST identifies the flows experiencing loss, as well as the bursty flows responsible, across different burst durations. Our evaluation with real-world traces and TCP connections shows that MIDST uses little memory (e.g., 320KB) while providing high accuracy (95% to 98%) under varying loss rates and burst durations. We evaluate a low-rate DDoS attack and demonstrate the potential use of our measurement results for attack detection and mitigation. 
  3. We present the first all-optical network, Baldur, to enable power-efficient and high-speed communications in future exascale computing systems. The essence of Baldur is its ability to perform packet routing on-the-fly in the optical domain using an emerging technology called the transistor laser (TL), which presents interesting opportunities and challenges at the system level. Optical packet switching readily eliminates many inefficiencies associated with the crossings between optical and electrical domains. However, TL gates consume high power at the current technology node, which makes TL-based buffering and optical clock recovery impractical. Consequently, we must adopt novel (bufferless and clock-less) architecture and design approaches that are substantially different from those used in current networks. At the architecture level, we support a bufferless design by turning to techniques that have fallen out of favor for current networks. Baldur uses a low-radix, multi-stage network with a simple routing algorithm that drops packets to handle congestion, and we further incorporate path multiplicity and randomness to minimize packet drops. This design also minimizes the number of TL gates needed in each switch. At the logic design level, a non-conventional, length-based data encoding scheme is used to eliminate the need for clock recovery. We thoroughly validate and evaluate Baldur using a circuit simulator and a network simulator. Our results show that Baldur achieves up to 3,000X lower average latency while consuming 3.2X-26.4X less power than various state-of-the-art networks under a wide variety of traffic patterns and real workloads, for the scale of 1,024 server nodes. Baldur is also highly scalable, since its power per node stays relatively constant as we increase the network size to over 1 million server nodes, which corresponds to 14.6X-31.0X power improvements compared to state-of-the-art networks at this scale.
  4. Context. Large multi-site neuroimaging datasets have significantly advanced our quest to understand brain-behavior relationships and to develop biomarkers of psychiatric and neurodegenerative disorders. Yet, such data collections come at a cost, as the inevitable differences across samples may lead to biased or erroneous conclusions. Objective. We aim to validate the estimation of individual brain network dynamics fingerprints and appraise sources of variability in large resting-state functional magnetic resonance imaging (rs-fMRI) datasets by providing a novel point of view based on data-driven dynamical models. Approach. Previous work has investigated this critical issue in terms of effects on static measures, such as functional connectivity and brain parcellations. Here, we utilize dynamical models (hidden Markov models, HMMs) to examine how diverse scanning factors in multi-site fMRI recordings affect our ability to infer the brain’s spatiotemporal wandering between large-scale networks of activity. Specifically, we leverage a stable HMM trained on the Human Connectome Project (homogeneous) dataset, which we then apply to a heterogeneous dataset of traveling subjects scanned under a multitude of conditions. Main Results. Building upon this premise, we first replicate previous work on the emergence of non-random sequences of brain states. We next highlight how these time-varying brain activity patterns are robust subject-specific fingerprints. Finally, we suggest these fingerprints may be used to assess which scanning factors induce high variability in the data. Significance. These results demonstrate that we can (i) use large-scale datasets to train models that can then be used to interrogate subject-specific data, and (ii) recover the unique trajectories of brain activity changes in each individual, but also (iii) urge caution, as our ability to infer such patterns is affected by how, where and when we do so.
  5. Wireless networks are being applied in various industrial sectors, and they are poised to support mission-critical industrial IoT applications that require ultra-reliable, low-latency communications (URLLC). Ensuring predictable per-packet communication reliability is a basis of predictable URLLC, and scheduling and power control are two basic enablers. Scheduling and power control, however, are subject to challenges such as harsh environments, dynamic channels, and distributed network settings in industrial IoT. Existing solutions are mostly based on heuristic algorithms or asymptotic analysis of network performance, and field-deployable algorithms for ensuring predictable per-packet reliability are lacking. Towards addressing this gap, we examine the cross-layer design of joint scheduling and power control and analyze the associated challenges. We use the Perron–Frobenius theorem to demonstrate that scheduling is a must for ensuring predictable communication reliability, and by investigating characteristics of interference matrices, we show that scheduling that keeps close-by links silent effectively constructs a set of links whose required reliability is feasible with proper transmission power control. Given that scheduling alone is unable to ensure predictable communication reliability while also ensuring high throughput and addressing fast-varying channel dynamics, we demonstrate how power control can help improve both the reliability at each time instant and the throughput in the long term. Based on the analysis, we propose a candidate framework of joint scheduling and power control, and we demonstrate how this framework behaves in guaranteeing per-packet communication reliability in the presence of wireless channel dynamics of different time scales. Collectively, these findings provide insight into the cross-layer design of joint scheduling and power control for ensuring predictable per-packet reliability in the presence of wireless network dynamics and uncertainties.