Title: Separation of learning and control for cyber-physical systems
Most cyber-physical systems (CPS) encounter a large volume of data that arrives gradually in real time rather than all at once in advance. In this paper, we provide a theoretical framework at the intersection of control theory and learning that yields optimal control strategies for such CPS. In the proposed framework, we use the actual CPS, i.e., the "true" system that we seek to optimally control online, in parallel with an available model of the CPS. We then institute an information state for the system that does not depend on the control strategy. An important consequence of this independence is that, for any given choice of a control strategy and any realization of the system's variables up to time t, the information states at future times do not depend on the choice of the control strategy at time t but only on the realization of the decision at time t; they are thus related to the concept of separation between state estimation and control. Namely, the future information states are separated from the choice of the current control strategy, and such control strategies are called separated control strategies. Hence, we can derive offline the optimal control strategy of the system with respect to the information state, which might not be precisely known due to model uncertainties or the complexity of the system, and then use standard learning approaches to learn the information state online as data are added to the system in real time. We show that once the information state becomes known, the separated control strategy derived offline from the CPS model is optimal for the actual system. We illustrate the proposed framework on a dynamic system consisting of two subsystems with a delayed sharing information structure.
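The offline/online split described in the abstract can be sketched minimally for a scalar linear-quadratic system: a feedback gain is computed offline on the model, and online the controller acts only on an information state (here, a simple model-based state prediction standing in for the learned information state). All names and parameters below are illustrative, not from the paper.

```python
# Hedged sketch: scalar system x' = a*x + b*u + w. The offline phase
# solves a Riccati recursion on the *model*; the online phase applies
# the resulting gain to an information state updated as data arrive.

def offline_policy_gain(a, b, q, r, horizon=50):
    """Scalar Riccati recursion: returns a gain k so that
    u = -k * (information state) is the separated strategy."""
    p = q
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)
        p = q + a * p * (a - b * k)
    return k

def run_online(k, a_true, b_true, noises, x0=1.0):
    """Apply the offline gain to an online information state.
    The control depends only on the information state estimate,
    never directly on the (unobserved) true state."""
    x, est = x0, x0
    traj = []
    for w in noises:
        u = -k * est                      # separated control law
        x = a_true * x + b_true * u + w   # true system evolves
        est = a_true * est + b_true * u   # model-based prediction
        traj.append(x)
    return traj

k = offline_policy_gain(a=1.1, b=1.0, q=1.0, r=1.0)
traj = run_online(k, a_true=1.1, b_true=1.0, noises=[0.0] * 20)
```

With the model matching the true system and no noise, the closed loop is stable and the trajectory decays toward the origin; model mismatch would be absorbed by learning the information state online.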
Award ID(s): 2149520, 2219761
PAR ID: 10421256
Author(s) / Creator(s):
Date Published:
Journal Name: Automatica
Volume: 151
Issue: 110912
ISSN: 0005-1098
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. The predictive monitoring problem asks whether a deployed system is likely to fail over the next T seconds under some environmental conditions. This problem is of the utmost importance for cyber-physical systems, and has inspired real-time architectures capable of adapting to such failures upon forewarning. In this paper, we present a linear model-predictive scheme for the real-time monitoring of linear systems governed by time-triggered controllers and time-varying disturbances. The scheme uses a combination of offline (advance) and online computations to decide whether a given plant has entered a state from which, no matter what control is applied, the disturbance has a strategy to drive the system to an unsafe region. Our approach is independent of the control strategy used: this allows us to handle plants controlled by model-predictive control techniques, or even opaque machine-learning-based control algorithms that are hard to reason about with existing reachable-set estimation algorithms. Our online computation reuses the symbolic reachable sets computed offline. The real-time monitor instantiates the reachable set with a concrete state estimate and repeatedly performs emptiness checks with respect to a safety property. We classify the various alarms raised by our approach in terms of what they imply about the system as a whole. We implement our real-time monitoring approach on numerous linear system benchmarks and show that the computation can be performed rapidly in practice. Furthermore, we examine the alarms reported by our approach and show how some of them can be used to improve the controller.
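The offline/online division of labor in the monitoring scheme above can be sketched in one dimension: the growth of the reachable set under a bounded disturbance is precomputed symbolically (independent of the concrete state), and the online monitor centers those precomputed sets at the current state estimate and checks emptiness against an unsafe region. All names and numbers are illustrative.

```python
# Hedged sketch for x' = a*x + u + w with |w| <= w_max: offline we
# compute the radius of the disturbance-reachable interval after each
# step; online we instantiate it at a concrete state estimate.

def offline_growth(a, w_max, steps):
    """Symbolic part, computed once in advance: the reachable-interval
    radius after each step, independent of the concrete state."""
    radii, r = [], 0.0
    for _ in range(steps):
        r = abs(a) * r + w_max   # interval arithmetic for the disturbance
        radii.append(r)
    return radii

def online_monitor(x_est, u_plan, a, radii, unsafe_lo):
    """Online part: center the precomputed radii at the predicted
    nominal trajectory; raise an alarm if any reachable interval
    intersects the unsafe region [unsafe_lo, +inf)."""
    center = x_est
    for u, r in zip(u_plan, radii):
        center = a * center + u
        if center + r >= unsafe_lo:   # nonempty intersection -> alarm
            return True
    return False

radii = offline_growth(a=1.0, w_max=0.1, steps=5)
alarm = online_monitor(x_est=0.0, u_plan=[0.0] * 5, a=1.0,
                       radii=radii, unsafe_lo=1.0)
```

The repeated online check is cheap (a few arithmetic operations per step) precisely because the symbolic growth terms were computed offline, mirroring the paper's reuse of offline reachable sets.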
  2. Abstract People tend to employ suboptimal attention control strategies during visual search. Here we question why people are suboptimal, specifically investigating how knowledge of the optimal strategies and the time available to apply such strategies affect strategy use. We used the Adaptive Choice Visual Search (ACVS), a task designed to assess attentional control optimality. We used explicit strategy instructions to manipulate explicit strategy knowledge, and we used display previews to manipulate time to apply the strategies. In the first two experiments, the strategy instructions increased optimality. However, the preview manipulation did not significantly boost optimality for participants who did not receive strategy instruction. Finally, in Experiments 3A and 3B, we jointly manipulated preview and instruction with a larger sample size. Preview and instruction both produced significant main effects; furthermore, they interacted significantly, such that the beneficial effect of instructions emerged with greater preview time. Taken together, these results have important implications for understanding the strategic use of attentional control. Individuals with explicit knowledge of the optimal strategy are more likely to exploit relevant information in their visual environment, but only to the extent that they have the time to do so. 
  3. Hybrid electric vehicles can achieve better fuel economy than conventional vehicles by utilizing multiple power sources. While these power sources have traditionally been managed by rule-based or optimization-based control algorithms, recent studies have shown that machine-learning-based control algorithms such as online Deep Reinforcement Learning (DRL) can control them effectively as well. However, the optimization and training processes for an online DRL-based powertrain control strategy can be very time- and resource-intensive. In this paper, a new offline-online hybrid DRL strategy is presented in which offline vehicle data are exploited to build an initial model, and an online learning algorithm then explores a new control policy to further improve fuel economy. In this manner, the agent is expected to learn the environment, consisting of the vehicle dynamics in a given driving condition, more quickly than online algorithms that learn the optimal control policy by interacting with the vehicle model from zero initial knowledge. By incorporating a priori offline knowledge, the simulation results show that the proposed approach not only accelerates and stabilizes the learning process but also leads to better fuel economy than online-only learning algorithms.
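The offline-warm-start idea above can be sketched with tabular Q-learning: an initial Q-function is built by replaying logged transitions, and online learning then continues from that warm start instead of from zero knowledge. The tiny two-state MDP and all names here are illustrative only, not the paper's vehicle model.

```python
# Hedged sketch of the offline-online hybrid: warm-start a tabular
# Q-function from logged (s, a, r, s') transitions, then keep
# improving it online.

def q_update(q, s, a, r, s2, alpha=0.5, gamma=0.9):
    """Standard Q-learning update toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[s2].values())
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])

# Offline phase: replay logged data to build an initial model.
states, actions = [0, 1], ["engine", "motor"]
q = {s: {a: 0.0 for a in actions} for s in states}
offline_log = [(0, "motor", 1.0, 1), (1, "engine", 0.0, 0)] * 20
for s, a, r, s2 in offline_log:
    q_update(q, s, a, r, s2)

# Online phase: continue learning from new interactions, starting
# from the warm-started Q instead of zero initial knowledge.
online_stream = [(0, "motor", 1.0, 1), (1, "motor", 0.5, 0)] * 10
for s, a, r, s2 in online_stream:
    q_update(q, s, a, r, s2)

best_action = max(q[0], key=q[0].get)
```

The online phase starts from value estimates already shaped by the offline log, which is the mechanism the abstract credits for faster and more stable learning.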
  4. Operating distributed cloudlets at optimal cost is nontrivial in the face of not only dynamic, unpredictable resource prices and user requests, but also the low efficiency of today's immature cloudlet infrastructures. We propose to control cloudlet networks at multiple granularities: fine-grained control of servers inside cloudlets and coarse-grained control of cloudlets themselves. We model this problem as a mixed-integer nonlinear program with switching cost over time. To solve this problem online, we first linearize, "regularize," and decouple it into a series of one-shot subproblems solved at each corresponding time slot; we then design an iterative dependent-rounding framework, using our proposed randomized pairwise rounding algorithm, to convert the fractional control decisions into integral ones at each time slot. Via rigorous theoretical analysis, we establish our approach's performance guarantee in terms of the competitive ratio and the multiplicative integrality gap with respect to the offline optimal integral decisions. Extensive evaluations with real-world data confirm the empirical superiority of our approach over single-granularity server control and state-of-the-art algorithms.
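The randomized pairwise rounding step named above can be sketched generically: repeatedly pick two fractional coordinates and shift mass between them so that the total is preserved exactly, at least one coordinate becomes integral per step, and each coordinate rounds to 1 with probability equal to its fractional value. This is a generic dependent-rounding sketch, not the paper's exact algorithm.

```python
# Hedged sketch of randomized pairwise (dependent) rounding: the two
# candidate moves are chosen with probabilities that make the expected
# change of each coordinate zero, preserving marginals.
import random

def pairwise_round(x, rng=random.random):
    x = list(x)
    frac = [i for i, v in enumerate(x) if 0.0 < v < 1.0]
    while len(frac) >= 2:
        i, j = frac[0], frac[1]
        d1 = min(1.0 - x[i], x[j])   # largest move of mass j -> i
        d2 = min(x[i], 1.0 - x[j])   # largest move of mass i -> j
        if rng() < d2 / (d1 + d2):   # probabilities preserve marginals
            x[i] += d1; x[j] -= d1
        else:
            x[i] -= d2; x[j] += d2
        # At least one of x[i], x[j] is now integral, so this terminates.
        frac = [k for k, v in enumerate(x) if 1e-12 < v < 1.0 - 1e-12]
    return [round(v) for v in x]

rounded = pairwise_round([0.5, 0.5, 0.7, 0.3])
```

Because each pairwise move conserves the sum, a fractional solution whose coordinates sum to an integer rounds to an integral solution with exactly that sum, which is what makes the integrality-gap analysis tractable.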
  5. Abstract Traditionally, neuroscience and psychology have studied the human brain during periods of “online” attention to the environment, while participants actively engage in processing sensory stimuli. However, emerging evidence shows that the waking brain also intermittently enters an “offline” state, during which sensory processing is inhibited and our attention shifts inward. In fact, humans may spend up to half of their waking hours offline [Wamsley, E. J., & Summer, T. Spontaneous entry into an “offline” state during wakefulness: A mechanism of memory consolidation? Journal of Cognitive Neuroscience, 32, 1714–1734, 2020; Killingsworth, M. A., & Gilbert, D. T. A wandering mind is an unhappy mind. Science, 330, 932, 2010]. The function of alternating between online and offline forms of wakefulness remains unknown. We hypothesized that rapidly switching between online and offline states enables the brain to alternate between the competing demands of encoding new information and consolidating already-encoded information. A total of 46 participants (34 female) trained on a memory task just before a 30-min retention interval, during which they completed a simple attention task while undergoing simultaneous high-density EEG and pupillometry recording. We used a data-driven method to parse this retention interval into a sequence of discrete online and offline states, with a 5-sec temporal resolution. We found evidence for three distinct states, one of which was an offline state with features well-suited to support memory consolidation, including increased EEG slow oscillation power, reduced attention to the external environment, and increased pupil diameter (a proxy for increased norepinephrine). Participants who spent more time in this offline state following encoding showed improved memory at delayed test. These observations are consistent with the hypothesis that even brief, seconds-long entry into an offline state may support the early stages of memory consolidation.