A Model Cleansing Pipeline for Model-driven Engineering: Mitigating the Garbage In, Garbage Out Problem for Open Model Repositories (MODELS 2025 - Research Papers)

Sun 5 - Fri 10 October 2025 Grand Rapids, Michigan, United States

Who

Andjela Djelic, Syed Juned Ali, Charlotte Verbruggen, Julia Neidhardt, Dominik Bork

Track

MODELS 2025 Research Papers

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

When

Wed 8 Oct 2025 16:54 - 17:12 at DCIH 102 - Session 5: Quality Assurance and Model Management Chair(s): Houari Sahraoui

Abstract

In data-driven research within Model-Driven Engineering (MDE), the extraction of conceptual models, such as UML diagrams, from software repositories is a crucial step for analyzing software design, evolution, and quality. However, these extracted models often contain inconsistencies, redundancies, and noise because most model repositories are not curated. Without effective data cleansing, the reliability of empirical and machine learning (ML)-based MDE studies working with these repositories is seriously threatened. This paper proposes a data cleansing pipeline designed to effectively cleanse model repositories. Our approach systematically addresses common data quality issues by offering a sequence of automated pre-processing, validation, and filtering steps based on rule-based heuristics and ML techniques. By integrating conceptual modeling-specific data cleansing techniques into an automated pipeline, our approach reduces manual intervention, enhances reproducibility, and supports scalable analysis of model repositories. In an experimental evaluation of open-source UML diagram repositories, we demonstrate the effectiveness of our method in cleansing models. In two reproducibility studies, we further show the statistically significant effect the use of our MCP4CM pipeline has on downstream tasks.
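The abstract describes the pipeline only at a high level: a sequence of automated pre-processing, validation, and filtering steps. The sketch below illustrates that general shape, not the MCP4CM implementation. The `Model` record, the step functions, the `PLACEHOLDERS` set, and the `max_ratio` threshold are all hypothetical, and the ML-based steps mentioned in the abstract are replaced here by simple rule-based heuristics.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Model:
    """Hypothetical in-memory stand-in for one extracted UML model."""
    source: str
    element_names: Tuple[str, ...]


Step = Callable[[List[Model]], List[Model]]

# Assumed examples of dummy names; a real list would be derived from the data.
PLACEHOLDERS = {"class1", "class2", "test", "foo", "untitled"}


def normalize_names(models: List[Model]) -> List[Model]:
    # Pre-processing: collapse runs of whitespace inside element names.
    return [
        Model(m.source, tuple(" ".join(n.split()) for n in m.element_names))
        for m in models
    ]


def drop_empty(models: List[Model]) -> List[Model]:
    # Validation: keep only models with at least one non-empty element name.
    return [m for m in models if any(m.element_names)]


def drop_duplicates(models: List[Model]) -> List[Model]:
    # Filtering: treat models with the same case-insensitive name set as duplicates.
    seen = set()
    kept = []
    for m in models:
        key = tuple(sorted(n.lower() for n in m.element_names))
        if key not in seen:
            seen.add(key)
            kept.append(m)
    return kept


def drop_placeholder_models(models: List[Model], max_ratio: float = 0.5) -> List[Model]:
    # Filtering: drop models whose names are mostly placeholders.
    kept = []
    for m in models:
        if not m.element_names:
            continue
        dummy = sum(n.lower() in PLACEHOLDERS for n in m.element_names)
        if dummy / len(m.element_names) <= max_ratio:
            kept.append(m)
    return kept


def run_pipeline(models: List[Model], steps: List[Step]):
    # Apply each step in order and record how many models it removed.
    report = []
    current = list(models)
    for step in steps:
        before = len(current)
        current = step(current)
        report.append((step.__name__, before - len(current)))
    return current, report


raw = [
    Model("a.uml", ("Order", "Customer  Account")),
    Model("b.uml", ("order", "Customer Account")),
    Model("c.uml", ()),
    Model("d.uml", ("Class1", "Class2", "Invoice")),
]
steps = [normalize_names, drop_empty, drop_duplicates, drop_placeholder_models]
cleaned, report = run_pipeline(raw, steps)
```

Recording the number of models removed per step is one way to keep such a pipeline reproducible and auditable, since each filtering decision can be traced back to a named rule.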

Andjela Djelic, TU Wien, Austria
Syed Juned Ali, TU Wien, Austria
Charlotte Verbruggen, TU Wien, Austria
Julia Neidhardt, TU Wien, Austria
Dominik Bork, TU Wien, Austria

Session Program

Wed 8 Oct
Displayed time zone: Eastern Time (US & Canada)

16:00 - 17:30	Session 5: Quality Assurance and Model Management (New Ideas and Emerging Results (NIER) / Research Papers / Journal-First) at DCIH 102. Chair(s): Houari Sahraoui (DIRO, Université de Montréal). Hybrid.

16:00 18m Talk		Streamlined Integration of GR(1) Synthesis and Reinforcement Learning for Optimizing Critical Cyber-Physical Systems. FT, Research Papers. Eric Roslin Wete Poaka (Leibniz Universität Hannover), Joel Greenyer (FHDW Hannover), Tom Yaacov (Ben-Gurion University of the Negev), Daniel Kudenko (L3S Research Center, Leibniz Universität Hannover, Germany), Wolfgang Nejdl (Leibniz Universität Hannover)
16:18 18m Talk		Refactoring with Confidence: An Assistant for Repair-Integrated Refactoring in Block-based Industrial Models. PT, Research Papers. Michael Oberlehner (LIT CPS Lab, Johannes Kepler University Linz), Bianca Wiesmayr (Johannes Kepler University Linz), Alois Zoitl (LIT CPS Lab, Johannes Kepler University Linz)
16:36 18m Talk		Inclusive Model-Driven Engineering for Accessible Software. New Ideas and Emerging Results (NIER). Dominik Bork (TU Wien), Stefan Klikovits (Johannes Kepler University, Linz), Judith Michael (University of Regensburg), Lukas Netz (RWTH Aachen University), Bernhard Rumpe (RWTH Aachen University)
16:54 18m Talk		A Model Cleansing Pipeline for Model-driven Engineering: Mitigating the Garbage In, Garbage Out Problem for Open Model Repositories. FT, Research Papers. Andjela Djelic (TU Wien), Syed Juned Ali (TU Wien), Charlotte Verbruggen (TU Wien), Julia Neidhardt (TU Wien), Dominik Bork (TU Wien)
17:12 18m Talk		Modeling with Gentleman: a web-based projectional editor. Journal-First. Louis-Edouard Lafontant (University of Montreal), Eugene Syriani (Université de Montréal)