Master's thesis

Automating the Evaluation of the DiffKemp Tool on Open-Source Projects


Author: Ing. Lukáš Petr

Academic year: 2024/2025

Supervisor: Ing. Viktor Malík, Ph.D.

Reviewer: Ing. David Kozák

Abstract:

The goal of this thesis is to design and implement two automations for DiffKemp, a tool that checks semantic equivalence, that would be useful for its development. The first goal is to automate the evaluation of the impact of new improvements to the tool. The second is to create a solution that automatically evaluates new versions of open-source projects with DiffKemp, makes their results easier to assess, and allows them to be classified and stored. The first goal was achieved and is documented in this thesis; the second part is implemented but not documented. The EqBench dataset and RHEL kernels were selected for the automation. To evaluate the impact, a method was chosen that compares the results on these projects obtained with the original version of DiffKemp against those obtained with the improved version. A GitHub App was built using the Probot framework and the Podman container engine. Finally, the solution was evaluated on earlier improvements that had been added to DiffKemp.
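
As a rough illustration of the comparison method described in the abstract, the following minimal Python sketch counts how per-function verdicts change between a run with the original tool version and a run with the improved one. It is not the thesis implementation: the function name `compare_runs`, the verdict labels `"equal"` and `"non-equal"`, and the dictionary-based result format are all assumptions made for this example.

```python
from collections import Counter


def compare_runs(baseline: dict[str, str], improved: dict[str, str]) -> Counter:
    """Count verdict transitions for functions present in both runs.

    Both arguments map a (hypothetical) function name to a verdict label.
    The result maps (baseline_verdict, improved_verdict) pairs to counts.
    """
    changes: Counter = Counter()
    for name in baseline.keys() & improved.keys():
        changes[(baseline[name], improved[name])] += 1
    return changes


# Hypothetical results of the two tool versions on the same three functions.
baseline = {"f1": "non-equal", "f2": "equal", "f3": "non-equal"}
improved = {"f1": "equal", "f2": "equal", "f3": "non-equal"}

changes = compare_runs(baseline, improved)
for (before, after), count in sorted(changes.items()):
    print(f"{before} -> {after}: {count}")
```

A transition such as `non-equal -> equal` would then need manual inspection to decide whether the new version removed a false positive or introduced a false negative.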

Keywords:

DiffKemp, semantic differences, automation, software development, pull request, evaluation, continuous integration, Linux kernel, EqBench benchmark, GitHub App

Defense date

23.06.2025

Defense result

defended (the thesis was successfully defended)


Grade

C

Course of the defense

The student first presented the results achieved in the thesis. The committee then reviewed the supervisor's evaluation and the reviewer's report. The student subsequently answered the reviewer's questions and further questions from those present, e.g. regarding the automatic execution of the DiffKemp framework. Based on the reviewer's report, the supervisor's evaluation, the presentation, and the student's answers to the questions, the committee decided to grade the thesis C (good).

Questions for the defense

  1. In the submitted technical solution, the diffkemp-dev-bot is implemented using the technologies described in the thesis (TypeScript, Probot, etc.), but the diffkemp-automation component is built with a different development stack, namely Python and Flask. In my opinion, this polyglot approach adds complexity to the system. Could you please explain the reasoning behind choosing multiple development stacks?
  2. Could you please briefly showcase your solution for the automatic execution of DiffKemp on new versions or patches of the selected projects?

Language of the thesis

English

Faculty

Department

Study programme

Information Technology and Artificial Intelligence (MITAI)

Specialization

Information Systems and Databases (NISD)

Committee composition

doc. Dr. Ing. Dušan Kolář (chair)
RNDr. Marek Rychlý, Ph.D. (member)
Ing. Zbyněk Křivka, Ph.D. (member)
Ing. Šárka Květoňová, Ph.D. (member)
Ing. Radek Hranický, Ph.D. (member)
Ing. Jiří Hynek, Ph.D. (member)

Supervisor's review
Ing. Viktor Malík, Ph.D.

The student did a good amount of work and implemented working solutions combining multiple non-trivial technologies which can be deployed to production with minimal changes required. Unfortunately, the overall impression of the work is spoiled by the unfinished second part which is missing evaluation and description in the final report. Despite this shortcoming, I believe that the student has still proven sufficient engineering skills and capabilities and I would recommend accepting this diploma thesis. Except for the last few weeks, the student's approach was excellent and therefore I propose grade B (very good).

Evaluation criterion / Verbal evaluation
Assignment information

The thesis consists of implementing two automation solutions for the existing static code analyser DiffKemp, a research project whose priority is applicability in an industrial environment. The created automations should help achieve this goal by simplifying the evaluation of DiffKemp on existing open-source projects. On top of studying and understanding DiffKemp itself, the automations required designing and implementing end-to-end solutions in different real-world environments (GitHub Actions, custom web service). That required the student to familiarize himself with a handful of technologies, evaluate them, and pick the best ones for implementation. Therefore, I consider the complexity of the assignment slightly above average. While both automations were implemented and are now in the process of being deployed, the second part is not described in the thesis text and no experiments were performed with it. This was caused by the student's sudden drop in productivity during the final weeks of the semester. Despite this shortcoming, I believe that the extent of the parts that were implemented and are described in the text is sufficient for a diploma thesis project.

Activity during completion

While the student's activity during the first three quarters of the academic year was exceptional, it was exactly the opposite during the final weeks, when the student was supposed to perform additional experiments and finish the text. This sudden drop in productivity (the student was still responsive and acknowledged the problem) caused the thesis not to be entirely finished. In particular, some parts of the final report were written in a hasty manner, without me having sufficient time to review them.

Publications, awards

The implementation of the first part (the one described in the thesis) is currently a private GitHub project under the DiffKemp organization and is undergoing review by the DiffKemp maintainers. Once the reviews are finished, the project will be made public. The same should eventually happen with the second part of the thesis.

Use of literature

The thesis is mostly about implementation, so not much literature needed to be studied. On the other hand, the assignment required exploring a rather large number of different technologies, projects, and libraries, which the student did in a very thorough manner.

Activity during the work, consultations, communication

From the beginning of the academic year, the student was very active; we met regularly, and he was almost always able to present new progress on the thesis.

Total points proposed by the supervisor: 80

Grade proposed by the supervisor: B

Reviewer's report
Ing. David Kozák

Overall, the thesis is well-structured and easy to follow, with outcomes that should be immediately useful to the DiffKemp developers. Unfortunately, goal 3) was not addressed in the text (although it was properly implemented), which prevents me from recommending a higher grade. That said, even without the inclusion of goal 3), the thesis is of appropriate length, covers sufficient detail, and meets the expectations for a high-quality master’s thesis. Given the circumstances and the amount of work the student has done, I believe the grade could be further improved during the thesis defence if the student delivers a strong presentation.

Evaluation criterion / Verbal evaluation / Points
Extent to which the assignment requirements were fulfilled

Rating: assignment almost fulfilled, with minor reservations

My main objection is the lack of discussion about goal 3) in the text, which means that only the automated evaluation of new DiffKemp features is covered, but not the automated execution of DiffKemp on new versions or patches of selected projects.

Length of the technical report

Rating: within the usual range

The thesis is of typical length, with all chapters containing meaningful content and being appropriately structured. It reads well and is highly informative.

Presentation quality of the technical report

The thesis is well-structured into logical sections that provide a natural reading flow. The topic is clearly described, and the text is easy to follow.

Points: 90

Formal quality of the technical report

The text is neatly typeset and written in clear, appropriate language.

Points: 90

Use of literature

The thesis includes an in-depth review of relevant topics and the current state of DiffKemp development. It is evident that the student dedicated a sufficient amount of time to understanding the domain before proceeding to the implementation.

Points: 95

Implementation output

The technical solution is well-structured, with clearly documented and easy-to-follow code. It should be both reusable and easily extensible. Moreover, it addresses all the goals outlined in the thesis assignment, making it clear that, although the text does not cover goal 3), the implementation itself is thorough and complete.

Points: 95

Usability of the results

The output of this work is expected to be valuable to the developers of DiffKemp, making the thesis a meaningful and valuable contribution to the project.

Difficulty of the assignment

Rating: more difficult assignment

I find the thesis assignment challenging, as it involves two core goals that, while somewhat overlapping, address different use cases. Each of these goals would be a complex task on its own, so developing a solution that fulfils both represents a significant effort.

Total points proposed by the reviewer: 79

Grade proposed by the reviewer: C

Responsible person: Mgr. et Mgr. Hana Odstrčilová