SHERPA: A Model-Driven Framework for Large Language Model Execution (MODELS 2025 - Research Papers)

Who

Boqi Chen, Kua Chen, José Antonio Hernández López, Gunter Mussbacher, Daniel Varro, Amir Feizpour

Track

MODELS 2025 Research Papers

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 8 Oct 2025 14:54 - 15:12 at DCIH 102 - Session 3: Large Language Models and Modeling. Chair(s): Bentley Oakes

Abstract

Large language models (LLMs) have recently been applied widely across many fields. Despite their impressive capabilities, LLMs lack structured reasoning ability, particularly for complex tasks that require domain-specific best practices, which are often absent from the training data. Multi-step prompting methods that incorporate human best practices, such as chain-of-thought and tree-of-thought, have gained popularity, but they lack a general mechanism to control LLM behavior. In this paper, we propose SHERPA, a model-driven framework that improves LLM performance on complex tasks by explicitly incorporating domain-specific best practices into hierarchical state machines. By structuring LLM execution with state machines, SHERPA enables finer-grained control over LLM behavior via rules or via decisions made by machine learning-based approaches, including LLMs. We show that SHERPA applies to a wide variety of tasks (specifically code generation, class name generation, and question answering), replicating previously proposed approaches while further improving performance. We demonstrate its effectiveness on these tasks using various LLMs. Our systematic evaluation compares different state machine configurations against baseline approaches without state machines. Results show that integrating well-designed state machines significantly improves the quality of LLM outputs. SHERPA is particularly beneficial for complex tasks that have well-established human best practices but little data available for training LLMs.
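
As a rough illustration of the idea in the abstract, the sketch below drives a draft/check/repair loop for a code generation task with a small state machine, where a rule decides the next state after each step. It is not SHERPA's API: the names `State`, `run`, and `fake_llm` are hypothetical, the LLM is replaced by a canned stub so the example runs offline, and the machine is flat rather than hierarchical to keep it short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns canned answers."""
    if prompt.startswith("Draft"):
        return "def add(a, b): return a - b"  # deliberately buggy first draft
    if prompt.startswith("Repair"):
        return "def add(a, b): return a + b"
    return ""


@dataclass
class State:
    name: str
    action: Callable[[dict], None]                # work done while in this state
    choose_next: Callable[[dict], Optional[str]]  # rule for the next state; None stops


def run(states: Dict[str, State], start: str, ctx: dict, max_steps: int = 10) -> dict:
    """Execute states until a rule returns None or the step budget runs out."""
    current: Optional[str] = start
    for _ in range(max_steps):
        if current is None:
            break
        state = states[current]
        ctx["trace"].append(state.name)
        state.action(ctx)
        current = state.choose_next(ctx)
    return ctx


def draft(ctx: dict) -> None:
    ctx["code"] = fake_llm("Draft a function for: " + ctx["task"])


def check(ctx: dict) -> None:
    # Best practice encoded as a state: test the generated code before accepting it.
    namespace: dict = {}
    exec(ctx["code"], namespace)
    ctx["passed"] = namespace["add"](2, 3) == 5


def repair(ctx: dict) -> None:
    ctx["code"] = fake_llm("Repair this code: " + ctx["code"])


states = {
    "draft": State("draft", draft, lambda c: "check"),
    "check": State("check", check, lambda c: None if c["passed"] else "repair"),
    "repair": State("repair", repair, lambda c: "check"),
}

result = run(states, "draft", {"task": "add two numbers", "trace": []})
print(result["trace"], result["passed"])
```

In this sketch the transition rules are plain Python lambdas; the abstract notes that such decisions could also be made by machine learning-based approaches, including an LLM, which would amount to swapping a different callable into `choose_next`.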

Link to Preprint

https://arxiv.org/abs/2509.00272

Boqi Chen

McGill University

Canada

Kua Chen

McGill University

Canada

José Antonio Hernández López

Department of Computer Science and Systems, University of Murcia

Spain

Gunter Mussbacher

McGill University

Canada

Daniel Varro

Linköping University / McGill University

Sweden

Amir Feizpour

Aggregate Intellect

Canada

Session Program

Wed 8 Oct
Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30	Session 3: Large Language Models and Modeling. Research Papers / New Ideas and Emerging Results (NIER) at DCIH 102. Chair(s): Bentley Oakes (Polytechnique Montréal). Hybrid

14:00 18m Talk		MCeT: Behavioral Model Correctness Evaluation using Large Language Models. FT, Research Papers. Khaled Ahmed (Huawei Research Canada, University of British Columbia (UBC)); Jialing Song (Huawei Technologies Canada); Boqi Chen (McGill University); Ou Wei (Huawei Technologies Canada); Bingzhou Zheng (Huawei Technologies Canada). Pre-print
14:18 18m Talk		Model-Driven Quantum Code Generation Using Large Language Models and Retrieval-Augmented Generation. New Ideas and Emerging Results (NIER). Nazanin Siavash (University of Colorado Colorado Springs (UCCS)); Armin Moin (University of Colorado Colorado Springs)
14:36 18m Talk		Towards LLM-enhanced Conflict Detection and Resolution in Model Versioning. New Ideas and Emerging Results (NIER). Martin Eisenberg (Johannes Kepler University, Linz); Stefan Klikovits (Johannes Kepler University, Linz); Manuel Wimmer (JKU Linz); Konrad Wieland (LieberLieber Software GmbH)
14:54 18m Talk		SHERPA: A Model-Driven Framework for Large Language Model Execution. FT, Research Papers. Boqi Chen (McGill University); Kua Chen (McGill University); José Antonio Hernández López (Department of Computer Science and Systems, University of Murcia); Gunter Mussbacher (McGill University); Daniel Varro (Linköping University / McGill University); Amir Feizpour (Aggregate Intellect). Pre-print
15:12 18m Talk		Accurate and Consistent Graph Model Generation from Text with Large Language Models. FT, Research Papers. Boqi Chen (McGill University); Ou Wei (Huawei Technologies Canada); Bingzhou Zheng (Huawei Technologies Canada); Gunter Mussbacher (McGill University). Pre-print