This content will become publicly available on May 13, 2025
- Award ID(s):
- 2333980
- PAR ID:
- 10499418
- Publisher / Repository:
- International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things (FMSys)
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Cyber-Physical Systems (CPS) are integrations of computation, networking, and physical processes. The autonomy and self-adaptation capabilities of CPS mark a significant evolution from traditional control systems. Machine learning significantly enhances the functionality and efficiency of CPS, and Large Language Models (LLMs), such as GPT-4, can raise that functionality to a new level by providing advanced intelligence support. However, LLM outputs can be incorrect or unpredictable, which makes such applications potentially unsafe, and thus untrustworthy, if deployed in the real world. We propose a comprehensive and general assurance framework for LLM-enabled CPS. The framework consists of three modules: (i) the context grounding module assures that the task context has been accurately grounded; (ii) the temporal-logic requirements specification module translates the temporal requirements into logic specifications for prompting and further verification; and (iii) the formal verification module verifies the output of the LLM and provides feedback that guides the LLM's next attempt. The three modules execute iteratively until the output of the LLM is verified. Experimental results demonstrate that our assurance framework can provide assurance for LLM-enabled CPS.
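A minimal sketch of the iterative generate-verify-feedback loop this abstract describes, assuming caller-supplied callables; the names `llm` and `verify` and the prompt format are hypothetical placeholders, since the abstract does not name concrete interfaces:

```python
# Minimal sketch of the three-module assurance loop (hypothetical API names).
def assured_generation(context, spec, llm, verify, max_iters=10):
    """Iterate until the LLM output satisfies the temporal-logic spec.

    context: grounded task context            (module i)
    spec:    temporal-logic requirement       (module ii)
    llm:     callable, prompt -> candidate plan
    verify:  callable, (plan, spec) -> (ok, counterexample)   (module iii)
    """
    feedback = ""
    for _ in range(max_iters):
        plan = llm(f"{context}\nRequirement: {spec}\n{feedback}")
        ok, counterexample = verify(plan, spec)
        if ok:
            return plan  # formally verified output
        feedback = f"Verification failed: {counterexample}. Revise the plan."
    raise RuntimeError("no verified plan within the iteration budget")
```

The loop terminates only on a verified plan, matching the abstract's claim that the three modules execute iteratively until verification succeeds.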
-
Owing to the immense growth of internet-connected and learning-enabled cyber-physical systems (CPSs) [1], several new types of attack vectors have emerged. Analyzing the security and resilience of these complex CPSs is difficult because it requires evaluating many subsystems and factors in an integrated manner. Integrated simulation of physical systems and communication networks can provide an underlying framework for creating a reusable and configurable testbed for such analyses. Using a model-based integration approach and IEEE High-Level Architecture (HLA) [2] based distributed simulation software, we have created a testbed for integrated evaluation of large-scale CPS. Our testbed supports web-based collaborative metamodeling and modeling of CPS systems and experiments, and a cloud computing environment for executing integrated networked co-simulations. A modular and extensible cyber-attack library enables validating the CPS under a variety of configurable cyber-attacks, such as DDoS and integrity attacks. Hardware-in-the-loop simulation is also supported, along with several hardware attacks. Further, a scenario modeling language allows modeling of alternative paths (Courses of Action), which enables validating the CPS under different what-if scenarios as well as conducting cyber-gaming experiments. These capabilities make our testbed well suited for analyzing the security and resilience of CPS. In addition, the web-based modeling and cloud-hosted execution infrastructure lets one exercise the entire testbed using simply a web browser, with integrated live display of experimental results.
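To illustrate what a configurable attack scenario might look like, here is a hypothetical sketch in Python; the field names, targets, and step size are our invention, since the testbed's actual HLA federates and scenario modeling language are not shown in the abstract:

```python
# Hypothetical attack-scenario configuration, in the spirit of the testbed's
# modular attack library (DDoS and integrity attacks on scheduled windows).
scenario = {
    "duration_s": 300,
    "attacks": [
        {"type": "ddos", "target": "sensor_gateway", "start_s": 60, "stop_s": 120},
        {"type": "integrity", "target": "plant_setpoint", "start_s": 180, "bias": 0.25},
    ],
}

def active_attacks(cfg, t):
    """Return the attacks active at simulation time t (checked each co-sim step)."""
    return [a for a in cfg["attacks"]
            if a["start_s"] <= t < a.get("stop_s", cfg["duration_s"])]

for t in range(0, scenario["duration_s"], 30):   # 30 s co-simulation steps
    for atk in active_attacks(scenario, t):
        print(f"t={t:3d}s  inject {atk['type']} attack on {atk['target']}")
```

A declarative scenario like this is what makes what-if (Courses of Action) experiments cheap: swapping attack windows or targets requires no change to the simulation code itself.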
-
Digitally enabled technologies are increasingly cyber-physical systems (CPSs). They are networked in nature and made up of geographically dispersed components that manage and control data received from humans, equipment, and the environment. Researchers evaluating such technologies are thus challenged to include CPS subsystems and dynamics that might not be obvious components of a product system. Although analysts might assume CPS have negligible or purely beneficial impact on environmental outcomes, such assumptions require justification. As the physical environmental impacts of digital processes (e.g. cryptocurrency mining) gain attention, the need for explicit attention to CPS in environmental assessment becomes more salient. This review investigates how the peer-reviewed environmental assessment literature treats environmental implications of CPS, with a focus on journal articles published in English between 2010 and 2020. We identify nine CPS subsystems and dynamics addressed in this literature: energy system, digital equipment, non-digital equipment, automation and management, network infrastructure, direct costs, social and health effects, feedbacks, and cybersecurity. Based on these categories, we develop a ‘cyber-consciousness score’ reflecting the extent to which the 115 studies that met our evaluation criteria address CPS, then summarize analytical methods and modeling techniques drawn from reviewed literature to facilitate routine inclusion of CPS in environmental assessment. We find that, given challenges in establishing system boundaries, limited standardization of how to evaluate CPS dynamics, and failure to recognize the role of CPS in a product system under evaluation, the extant environmental assessment literature in peer-reviewed journals largely ignores CPS subsystems and dynamics when evaluating digital or digitally-enabled technologies.
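The review's exact scoring rubric is not reproduced here; a plausible minimal sketch, assuming one point per CPS subsystem or dynamic that a study explicitly addresses, normalized over the nine categories named above:

```python
# Hypothetical scoring sketch; the paper's actual rubric may weight categories
# differently or score them non-binarily.
CATEGORIES = {
    "energy system", "digital equipment", "non-digital equipment",
    "automation and management", "network infrastructure", "direct costs",
    "social and health effects", "feedbacks", "cybersecurity",
}

def cyber_consciousness_score(addressed):
    """Fraction of the nine CPS categories a study explicitly addresses."""
    return len(CATEGORIES & set(addressed)) / len(CATEGORIES)

# Example: a study that models only energy use and network infrastructure.
print(cyber_consciousness_score({"energy system", "network infrastructure"}))  # ~0.22
```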
-
Cyber-physical systems (CPS) have experienced rapid growth in recent decades. However, as with any other computer-based system, malicious attacks against CPS evolve continually, driving CPS to undesirable physical states and potentially causing catastrophes. Although the current state of the art is well aware of this issue, the majority of researchers have not focused on CPS recovery, the procedure we define as restoring a CPS's physical state to a target condition under adversarial attack. To call attention to CPS recovery and identify existing efforts, we surveyed a total of 30 relevant papers. We identify a major partition of the proposed recovery strategies: shallow recovery vs. deep recovery, where the former does not use a dedicated recovery controller while the latter does. Additionally, we surveyed exploratory research on topics that facilitate recovery. From these publications, we discuss the current state of the art of CPS recovery with respect to applications, attack types, attack surfaces, and system dynamics. We then identify untouched sub-domains in this field and suggest possible future directions for researchers.
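To make the shallow vs. deep distinction concrete, a minimal illustrative sketch (ours, not drawn from any surveyed paper): shallow recovery falls back to a pre-defined safe action with no dedicated recovery controller, while deep recovery runs a controller that actively drives the state back to the target.

```python
# Illustrative sketch of the two recovery strategies on a scalar state.
def shallow_recovery(state):
    """No dedicated recovery controller: fall back to a pre-defined safe action."""
    return 0.0  # e.g., cut actuation and coast into a fail-safe mode

def deep_recovery(state, target, gain=0.5):
    """Dedicated recovery controller: actively drive the state back to the target."""
    return gain * (target - state)  # simple proportional recovery law

state, target = 4.0, 1.0            # attack has pushed the state away from target
for _ in range(8):
    state += deep_recovery(state, target)  # first-order closed-loop update
print(f"state after recovery: {state:.3f} (target {target})")  # ~1.012
```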
-
Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on high-resource programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available (e.g., OCaml, Racket, and several others). This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, called MultiPL-T, generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM. MultiPL-T translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize unit tests for commented code from a high-resource source language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate the code from the high-resource source language to a target low-resource language. This gives us a corpus of candidate training data in the target language, but many of these translations are wrong. 3) We use a lightweight compiler to compile the test cases generated in (1) from the source language to the target language, which allows us to filter out obviously wrong translations. The result is a training corpus in the target low-resource language where all items have been validated with test cases. We apply this approach to generate tens of thousands of new, validated training items for five low-resource languages: Julia, Lua, OCaml, R, and Racket, using Python as the source high-resource language. Furthermore, we use an open Code LLM (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done. Using datasets generated with MultiPL-T, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket that outperform other fine-tunes of these base models on the natural-language-to-code task. We also present Racket fine-tunes for two very recent models, DeepSeek Coder and StarCoder2, to show that MultiPL-T continues to outperform other fine-tuning approaches for low-resource languages. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer.
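A high-level sketch of the three MultiPL-T stages described above; every callable here is a hypothetical stand-in for the paper's actual tooling (the Code LLM prompts, a coverage measure, and the lightweight test-case compiler), and the 0.8 coverage threshold is our invented default:

```python
# Hypothetical sketch of the MultiPL-T pipeline (all callables are placeholders).
def multipl_t(source_items, synth_tests, coverage, translate, compile_tests,
              run_tests, min_coverage=0.8):
    """Build a validated low-resource training corpus from high-resource code."""
    corpus = []
    for item in source_items:                   # commented Python functions
        tests = synth_tests(item)               # (1) LLM-synthesized unit tests
        if not tests or coverage(item, tests) < min_coverage:
            continue                            # drop faulty or low-coverage tests
        candidate = translate(item)             # (2) LLM translation to the target language
        target_tests = compile_tests(tests)     # (3) lightweight test-case compilation
        if run_tests(candidate, target_tests):  # keep only translations that pass
            corpus.append(candidate)
    return corpus
```

The key design point is that validation travels with the data: because the tests are compiled rather than re-generated in the target language, a passing candidate is checked against the same behavior that was verified in the source language.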