Streamlined Integration of GR(1) Synthesis and Reinforcement Learning for Optimizing Critical Cyber-Physical Systems
Cyber-physical systems (CPSs) must meet safety-critical requirements and optimization goals. Reactive controller synthesis (RCS) from temporal logic (TL) specifications can automatically construct agent behaviors that ensure safety and liveness, but it struggles with time and probability aspects. Reinforcement learning (RL) can optimize system behaviors, but cannot guarantee correctness. RL and RCS can be combined by shielding an agent with a synthesized controller, ensuring that the agent's behavior satisfies the TL specification. However, this requires aligning two different models: the TL specification and the often hand-crafted RL training environment code. We propose a three-step method, centered around code generation and manual refinement, that streamlines this alignment: (1) Formalize discrete environment assumptions and guarantees as a Spectra TL specification. (2) From it, automatically generate Gymnasium-compatible training environment code. For the target code, we employ BPpy, which enables a one-to-one mapping of the Spectra properties to individual code modules. This allows users to refine time and probability aspects as well as optimization goals without misaligning the models. (3) Synthesize a permissive controller from the TL specification and integrate it as a shield for the agent. In contrast to previous work, our approach (a) guarantees not only safety but also liveness, and (b) remains applicable even when the system is too complex for RCS, since the generated BPpy code can then serve as a model-at-runtime that shields the agent from unsafe actions.
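To make step (2) concrete, below is a minimal sketch, not the paper's generated code. It assumes a toy Spectra specification with one environment variable (an obstacle sensor) and one system variable (a motion command), and uses BPpy's decorator-based API (`bp.thread`, `bp.sync`, `bp.BEvent`) to map each property to its own b-thread module; all event names and the guarantee itself are illustrative.

```python
import bppy as bp

# Illustrative event alphabet: environment sensing and system motion commands.
FWD, STOP = bp.BEvent("FWD"), bp.BEvent("STOP")
OBSTACLE, CLEAR = bp.BEvent("OBSTACLE"), bp.BEvent("CLEAR")
SENSE, MOVE = [OBSTACLE, CLEAR], [FWD, STOP]

@bp.thread
def environment():
    # Assumption module: a fixed toy sensing pattern for this demo run.
    for sensed in [CLEAR, OBSTACLE, CLEAR, OBSTACLE]:
        yield bp.sync(request=sensed)

@bp.thread
def agent():
    # Placeholder for the learned policy: proposes any motion each round
    # and lets event selection (constrained by the guarantee) decide.
    for _ in range(4):
        yield bp.sync(request=MOVE)

@bp.thread
def no_fwd_into_obstacle():
    # Guarantee module for a property like "G !(obstacle & motion = FWD)":
    # once OBSTACLE is sensed, block FWD until CLEAR is sensed. Refining
    # this module leaves all other property modules untouched.
    while True:
        e = yield bp.sync(waitFor=SENSE)
        while e == OBSTACLE:
            e = yield bp.sync(waitFor=SENSE, block=FWD)

if __name__ == "__main__":
    bp.BProgram(
        bthreads=[environment(), agent(), no_fwd_into_obstacle()],
        event_selection_strategy=bp.SimpleEventSelectionStrategy(),
        listener=bp.PrintBProgramRunnerListener(),
    ).run()
```

Because each Spectra property lives in its own b-thread, a user can, for example, replace the fixed sensing pattern with a probabilistic one without touching the guarantee module.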
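For step (3), the following sketch shows one way a synthesized permissive strategy could be wrapped around a Gymnasium environment as a shield. The `controller` object and its `reset`/`allowed`/`update` interface are hypothetical, introduced only for illustration, and are not a Spectra or BPpy API; the wrapper itself uses only the standard Gymnasium `Wrapper` interface.

```python
import gymnasium as gym

class ShieldWrapper(gym.Wrapper):
    """Overrides agent actions that the permissive controller forbids."""

    def __init__(self, env, controller):
        super().__init__(env)
        self.controller = controller  # hypothetical: tracks the GR(1) strategy state

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.controller.reset(obs)            # hypothetical interface
        return obs, info

    def step(self, action):
        allowed = self.controller.allowed()   # hypothetical: set of permitted actions
        if action not in allowed:
            action = next(iter(allowed))      # substitute any permitted action
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.controller.update(obs, action)   # hypothetical: advance strategy state
        return obs, reward, terminated, truncated, info
```

Since the controller is permissive, the shield only restricts the action set rather than dictating a single move, leaving the RL agent free to optimize within the correct behaviors.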