NSF PAR Search | NSF Public Access Repository

RoboRebound: Multi-Robot System Defense with Bounded-Time Interaction

https://doi.org/10.1145/3689031.3696079

Gandhi, Neeraj; Cai, Yifan; Haeberlen, Andreas; Phan, Linh Thi Xuan (March 2025, ACM)

Byzantine Fault Tolerance (BFT) is a classic technique for defending distributed systems against a wide range of faults and attacks. However, existing solutions are designed for systems where nodes can interact only by exchanging messages. They are not directly applicable to systems where nodes have sensors and actuators and can also interact in the physical world – perhaps by blocking each other’s path or by crashing into each other. In this paper, we take a first stab at extending BFT to this larger class of systems. We focus on multi-robot systems (MRS), an emerging technology that is increasingly being deployed for applications such as target tracking, warehouse logistics, and exploration. An MRS can consist of dozens of interacting robots and is thus a bona fide distributed system. The classic masking guarantee is not practical in an MRS, but we propose a variant called bounded-time interaction that can be implemented, and we present an algorithm that achieves it in combination with a few small hardware tweaks. We built a simulator and prototyped wheeled robots to show that our algorithm is effective and that it has a reasonable overhead.
Free, publicly accessible full text available March 30, 2026
Rotor Fault Detection and Isolation in Aerial Vehicles with Dozens of Rotors

https://doi.org/10.1109/LARS64411.2024.10786448

Gandhi, Neeraj; Xu, Jiawei; Saldaña, David; Phan, Linh Thi Xuan (November 2024, IEEE)

Aerial vehicles with dozens of rotors are becoming increasingly common in important applications such as transportation and construction. One challenge in building such a system is ensuring that it is robust against faults: as the number of rotors increases, so does the likelihood that a rotor fails during operation, and despite the spare thrust capacity provided by the redundant rotors, a rotor fault can significantly impact the motion and safety of the system. This paper presents an efficient fault detection and isolation (FDI) method for aerial vehicles with a large number of rotors. Our approach relies on two key insights. First, a faulty rotor directly affects the tracking error in roll and in pitch, a property that can be used to order the search space of candidate faulty rotors. Second, the error in either roll or pitch is related to both the distance from the relevant axis and the severity of the fault. With these observations, we can use probe faults to isolate faulty rotors. Evaluation results show that our technique can efficiently detect and isolate faults in multi-rotor aerial vehicles with up to 64 rotors (8 more rotors than in existing FDI work), and that it can help improve robustness. To the best of our knowledge, our FDI method is the first that scales to several dozens of rotors.
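The two insights above can be illustrated with a small sketch. It is not the authors' method (it does not model the probe-fault step), and it assumes a simplified linear model: a thrust loss at a rotor with hypothetical body-frame position (x, y) produces a roll error proportional to y and a pitch error proportional to x, scaled by the fault severity and an assumed gain. Under that model, candidate rotors can be ordered by how well their position direction matches the observed error direction, and the error magnitude divided by the rotor's distance gives a rough severity estimate.

```python
import math
from dataclasses import dataclass


@dataclass
class Rotor:
    rotor_id: int
    x: float  # hypothetical body-frame position (m)
    y: float


def rank_suspect_rotors(rotors, roll_err, pitch_err):
    """Order rotors by how well their position explains the observed error.

    Assumed sign convention (hypothetical): a thrust loss at (x, y) yields a
    roll error proportional to y and a pitch error proportional to x.
    """
    err_norm = math.hypot(roll_err, pitch_err)
    if err_norm == 0.0:
        return []  # no tracking error, nothing to isolate

    def alignment(r):
        # Cosine similarity between the predicted error direction (y, x)
        # and the observed error (roll_err, pitch_err).
        dist = math.hypot(r.x, r.y)
        if dist == 0.0:
            return -1.0  # a rotor at the center produces no roll/pitch error
        return (r.y * roll_err + r.x * pitch_err) / (dist * err_norm)

    return sorted(rotors, key=alignment, reverse=True)


def estimate_severity(rotor, roll_err, pitch_err, gain):
    # Assumed linear model: |error| = gain * severity * distance from the axes.
    dist = math.hypot(rotor.x, rotor.y)
    return math.hypot(roll_err, pitch_err) / (gain * dist)
```

For example, with eight rotors evenly spaced on a unit circle and a simulated 50% thrust loss on rotor 2, the ranking places rotor 2 first and the severity estimate recovers 0.5. A real vehicle would need the actual geometry, sign conventions, and controller dynamics in place of the assumed gain.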
Full Text Available
Online Rotor Fault Detection and Isolation for Vertical Takeoff and Landing Vehicles

https://doi.org/10.1109/IROS58592.2024.10802021

Lian, Jiaqi; Gandhi, Neeraj; Wang, Yifan; Phan, Linh Thi Xuan (October 2024, IEEE)

Vertical takeoff and landing (VTOL) vehicles are becoming increasingly popular for real-world transport, but, as with any vehicle, guaranteeing safety is both extremely critical and highly challenging due to issues like rotor faults. Existing fault detection and isolation (FDI) techniques usually focus on multirotor systems or fixed-wing systems rather than on hybrid VTOLs. Since VTOLs have both rotors and ailerons, a fault in a rotor may be masked by the (correctly working) ailerons, which makes faults much more difficult to detect. However, this masking only works when the ailerons are in use (e.g., during cruising), leaving takeoff and landing vulnerable to crashes. This paper presents an online rotor FDI method for VTOLs. The approach uses pose analysis and aileron command data to quickly and accurately identify the faulty rotor and to compute the severity of the fault. Our method works for hard-to-detect fault scenarios, such as low-severity faults that are masked during cruise flight but not during vertical motion. We evaluated our technique in a SITL PX4 simulation of a modified Deltaquad QuadPlane. The results show that our FDI technique can detect and isolate faults in real time (within 1 to 2.5 s), achieves a high isolation success rate (91.67%) across six rotors, and can estimate the severity of faults to within 2%. When a simple recovery process was applied after isolation, the system consistently achieved a safe landing.
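One plausible way to turn command data into a detection signal, offered here as an assumption rather than as the paper's technique, is a generic residual-threshold detector with a persistence window: a hypothetical residual (for instance, the gap between the commanded and nominal aileron deflection) must stay above a threshold for several consecutive samples before a fault is flagged. The threshold and window values below are illustrative only.

```python
from collections import deque


class PersistentResidualDetector:
    """Flag a fault only after a residual stays above a threshold for a window.

    Hypothetical generic detector: the residual could be, for example, the
    difference between the commanded and nominal aileron deflection.
    """

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        # Keeps only the most recent `window` exceedance flags.
        self._recent = deque(maxlen=window)

    def update(self, residual):
        """Feed one residual sample; return True once a fault is flagged."""
        self._recent.append(abs(residual) > self.threshold)
        return len(self._recent) == self.window and all(self._recent)
```

With a threshold of 0.1 and a window of 3, the residual sequence 0.0, 0.2, 0.05, 0.2, 0.3, 0.25 raises a flag only on the last sample. Requiring persistence trades some detection latency for fewer false alarms from transient disturbances such as gusts.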
Full Text Available
DNA: Dynamic Resource Allocation for Soft Real-Time Multicore Systems

https://doi.org/10.1109/RTAS52030.2021.00024

Gifford, Robert; Gandhi, Neeraj; Phan, Linh Thi; Haeberlen, Andreas (May 2021, Proceedings of the 27th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS '21))
Modern latency-sensitive and real-time systems often use multi-core platforms; thus, tasks on different cores share certain hardware resources, such as the memory bus and certain cache levels. This has two undesirable consequences: (1) tasks can interfere with each other, causing high latency for the system as a whole, and (2) it becomes difficult to meet deadlines, since the worst-case timing of a given task depends on the most demanding task it might have to compete with. Static partitioning isolates tasks from each other by allocating a fixed fraction of the resources to each; however, many tasks execute in different phases (e.g., memory-intensive and CPU-intensive) that have different requirements. Thus, system designers are left with a choice between overprovisioning, based on the most demanding phase, and suboptimal performance. In this paper, we propose a pair of techniques, called DNA and DADNA, to address this challenge. DNA increases throughput and decreases latency by building an execution profile of each task to identify its phases, and then dynamically allocating resources based on which task can benefit the most; DADNA further adds support for soft real-time workloads by taking deadlines into account. We have built a prototype of both techniques in the Xen hypervisor; our experimental results show that, compared to a state-of-the-art solution, DNA and DADNA can substantially improve schedulability, reduce job deadline miss ratios, and cut latencies by more than a factor of two, even in extremely overloaded situations.
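The idea of "allocating resources based on which task can benefit the most" can be sketched as a greedy marginal-benefit allocator. This is a generic technique, not necessarily what DNA does: it assumes hypothetical per-task benefit curves (performance with 0, 1, 2, ... units of a resource such as cache ways), which in a phase-aware system would be taken from the profile of each task's current phase and recomputed at phase changes.

```python
import heapq


def allocate_greedy(benefit, total_units):
    """Greedily hand out resource units (e.g., cache ways) one at a time.

    benefit[t][k] is the hypothetical performance of task t when given k units
    (k = 0 .. len(benefit[t]) - 1), e.g. from an assumed per-phase profile.
    """
    alloc = {t: 0 for t in benefit}
    # Max-heap (via negated keys) of the gain from giving each task one more unit.
    heap = []
    for t, curve in benefit.items():
        if len(curve) > 1:
            heapq.heappush(heap, (-(curve[1] - curve[0]), t))
    for _ in range(total_units):
        if not heap:
            break  # every task is already at the end of its curve
        _, t = heapq.heappop(heap)
        alloc[t] += 1
        k = alloc[t]
        curve = benefit[t]
        if k + 1 < len(curve):
            heapq.heappush(heap, (-(curve[k + 1] - curve[k]), t))
    return alloc
```

For example, with three units and curves {"mem": [0, 5, 8, 9], "cpu": [0, 2, 3, 3.5]}, the memory-intensive task receives two units and the CPU-intensive task one. Greedy allocation is optimal only when the benefit curves are concave; cache behavior often is not, which is one reason a real system might need a more careful policy.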
Full Text Available
REBOUND: Defending Distributed Systems Against Attacks with Bounded-Time Recovery

https://doi.org/10.1145/3447786.3456257

Gandhi, Neeraj; Roth, Edo; Sandler, Brian; Haeberlen, Andreas; Phan, Linh Thi (April 2021, Proceedings of the 16th European Conference on Computer Systems (EuroSys'21))
This paper shows how to use bounded-time recovery (BTR) to defend distributed systems against non-crash faults and attacks. Unlike many existing fault-tolerance techniques, BTR does not attempt to completely mask all symptoms of a fault; instead, it ensures that the system returns to correct behavior within a bounded amount of time. This weaker guarantee is sufficient, e.g., for many cyber-physical systems, where physical properties (such as inertia and thermal capacity) prevent quick state changes and thus limit the damage that can result from a brief period of undefined behavior. We present an algorithm called REBOUND that can provide BTR for the Byzantine fault model. REBOUND works by detecting faults and then reconfiguring the system to exclude the faulty nodes. This supports very fine-grained responses to faults: for instance, the system can move or replace existing tasks, or drop less critical tasks entirely to conserve resources. REBOUND can take useful actions even when a majority of the nodes is compromised, and it requires less redundancy than full fault tolerance.
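To make the BTR guarantee concrete, here is a small checker for a simplified, sampled reading of the property (our assumption, not the paper's formal definition): given a hypothetical trace of (timestamp, is_correct) samples, every run of incorrect behavior must end within `bound` time units of its first incorrect sample. It checks the guarantee on a trace; it does not implement REBOUND's detection or reconfiguration.

```python
def satisfies_btr(trace, bound):
    """Check a simplified bounded-time-recovery property on a sampled trace.

    trace: list of (timestamp, is_correct) pairs sorted by time (hypothetical format).
    Returns False if the system stays incorrect, or is first seen correct again,
    more than `bound` time units after the start of an incorrect run.
    """
    fault_start = None  # time of the first incorrect sample in the current run
    for t, correct in trace:
        if correct:
            if fault_start is not None and t - fault_start > bound:
                return False  # recovered, but later than the bound
            fault_start = None
        elif fault_start is None:
            fault_start = t  # a new incorrect run begins
        elif t - fault_start > bound:
            return False  # still incorrect past the bound
    return True  # the trace may end mid-run; no violation observed so far
```

For instance, a trace that is incorrect at times 1 and 2 and correct again at time 3 satisfies a bound of 2, whereas a trace still incorrect at time 3 after a fault at time 0 does not. Because the check only sees samples, it is conservative about when recovery happened between them.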
Full Text Available
Self-Reconfiguration in Response to Faults in Modular Aerial Systems

https://doi.org/10.1109/LRA.2020.2970685

Gandhi, Neeraj; Saldana, David; Kumar, Vijay; Phan, Linh Thi (April 2020, IEEE Robotics and Automation Letters)

Full Text Available
RTNF: Predictable Latency for Network Function Virtualization

https://doi.org/10.1109/RTAS.2019.00038

Abedi, Saeed; Gandhi, Neeraj; Demoulin, Henri Maxime; Li, Yang; Wu, Yang; Phan, Linh Thi (April 2019, IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS))

Full Text Available