diplomová práce

Heuristiky využívající strojového učení pro GraalVM Native Image

Autor práce: Ing. Tomáš Kender

Ak. rok: 2024/2025

Abstrakt:

Důležitou součástí optimalizace kódu jsou volání metod. Každé volání metody má navíc výpočetní režii, které se lze vyhnout inlinováním, tj. nahrazením volání metody tělem metody. Tato práce se zaměřuje na vylepšení heuristiky používané k inlinování metod po mocí strojového učení v nástroji Native Image, který je součástí sady nástrojů GraalVM. Za tímto účelem optimalizuje vnitřnou reprezentaci kódu pomocí neuronových sítí za ložených na grafu. K trénování těchto sítí jsme navrhli kódové řešení inspirované ge netickými algoritmy. Toto řešení nasazuje vygenerované modely, vyhodnocuje je pomocí srovnávacího testu a nejlepší modely z aktuální generace používá jako referenční pro bu doucí generace modelů. Trénujeme a testujeme dvě varianty architektur modelů, jedna je tradiční neuronová síť s dopředným posuvem a druhá je konvoluční grafová síť. Pro každý typ ověřujeme nejlepší sítové konfigurace na nové sadě scénářů, než která byla použita pro trénování.

Klíčová slova:

Java, GraalVM, kompilátor, function inlining, Torch, graf, neuronová síť, optimalizace

Termín obhajoby

24.06.2025

Výsledek obhajoby

obhájeno (práce byla úspěšně obhájena)

znamkaCznamka

Klasifikace

Průběh obhajoby

Student nejprve prezentoval výsledky, kterých dosáhl v rámci své práce. Komise se poté seznámila s hodnocením vedoucího a posudkem oponenta práce. Student následně odpověděl na otázky oponenta a na další otázky přítomných. Komise se na základě posudku oponenta, hodnocení vedoucího, přednesené prezentace a odpovědí studenta na položené otázky rozhodla práci hodnotit stupněm C.

Otázky k obhajobě

The used machine learning approach based on genetic algorithms is quite non-traditional. Have you considered some of the more traditional algorithms for training the neural networks using unsupervised learning?
In your method, you implement a number of artificial safeguards to drive the learning in a safe direction, however, these seem to be one of the most important limitations of the approach. In the report, you also propose to lift some of these safeguards for further work to improve the results. Can you elaborate on which particular safeguards could be potentially lifted and how it could be done?
Kolik benchmarků jste použil?
Nejsou tři programy málo pro důkladné otestování?
Mění inlinování semantiku programu?

Jazyk práce

angličtina

Fakulta

Fakulta informačních technologií

Ústav

Ústav inteligentních systémů

Studijní program

Informační technologie a umělá inteligence (MITAI)

Specializace

Kybernetická bezpečnost (NSEC)

Složení komise

doc. Mgr. Kamil Malinka, Ph.D. (člen)
Ing. Zbyněk Křivka, Ph.D. (člen)
Ing. Radek Hranický, Ph.D. (člen)
Ing. Matěj Grégr, Ph.D. (člen)
Dr. Ing. Petr Peringer (člen)
doc. Ing. František Zbořil, CSc. (předseda)

Posudek vedoucího
Ing. David Kozák

The student delivered both the implementation and the written thesis at a satisfactory level of quality. However, additional work is needed for the trained models to be fully integrated into GraalVM Native Image. Nevertheless, the student successfully fulfilled all the goals outlined in the thesis assignment and contributed several enhancements beyond the original scope.

Kritérium hodnocení	Slovní hodnocení
Informace k zadání	The topic of this thesis was challenging, requiring the student to study the state of the art in both compilers and machine learning. The student met all the goals of the thesis assignment and successfully delivered a working prototype. However, the results indicate that further work is needed before the prototype can be integrated into the official release of GraalVM Native Image. That being said, I am overall satisfied with the outcome of the work.
Aktivita při dokončování	The student completed both the implementation and the written thesis on time, allowing me to review both components without time pressure.
Publikační činnost, ocenění	The resulting tool has been open-sourced on GitHub and is expected to serve as an inspiration for future work in this area.
Práce s literaturou	The student worked independently, conducted thorough literature reviews, and appropriately cited relevant sources throughout the thesis. I particularly appreciate the student's initiative in studying and selecting appropriate machine learning techniques.
Aktivita během řešení, konzultace, komunikace	The student demonstrated a responsible approach, consistently attended meetings, and was always well prepared.

Výsledný počet bodů navržený vedoucím: 83

Známka navržená vedoucím: B

Posudek oponenta
Ing. Viktor Malík, Ph.D.

Overall, the thesis proposes an interesting aproach for employing machine learning in the context of compiler optimizations, in particular function inlining. The student has designed and implemented a machine learning pipeline based on genetic algorithms with two different kinds of neural networks. Even though the impression of the thesis is worsened by a weaker text and the obtained results are not very promising for practical usage, the student has managed to explore the possibilities of applying machine learning in quite a non-traditional area and his obtained results can help in future development and research of this direction. He has certainly proved his engineering capabilities and therefore I recommend to accept the thesis with the final grade C (good).

Kritérium hodnocení	Slovní hodnocení	Body
Rozsah splnění požadavků zadání	Stupeň hodnocení: zadání splněno
Rozsah technické zprávy	Stupeň hodnocení: splňuje pouze minimální požadavky The text is a bit shorter, which is not necessarily wrong, however in this case, I would appreciate a bit better description of some parts. Especially the chapter on GraalVM and Native Image is quite incomplete and I struggled to understand some of the important parts. In addition, the text could greatly benefit from more detailed examples (using some simple programs) illustrating the concepts described throughout the entire thesis.
Prezentační úroveň technické zprávy	The presentation level of the report is not very good. The text is sometimes quite hard to read, many parts are confusing and not well linked together. Often, there are figures and tables which are not referenced from the text or are referenced only several pages later. The division into chapters is weird, for instance Chapters 5 and 6 describe the two different neural network layouts that the student experimented with, however, Chapter 5 is titled "Implementation" while Chapter 6 is titled "Inlining Based on Graph Neural Networks". Also, the description of the underlying concepts (Native Image, function inlining, machine learning principles and algorithms) is not unified and is scattered among the individual chapters. Overall, the structure of the text could use a lot of improvement which would greatly help understanding the thesis and would improve the overall impression of the thesis.	65
Formální úprava technické zprávy	The typesetting of the thesis is average, it doesn't contain any substantial defects, however, there's a number of smaller issues such as some amount of unnecessary vertical whitespace at the end of pages, overlapping texts in Figure 2.2, etc. The text itself is written in a decent English with only a small amount of lingustic or grammar mistakes.	75
Práce s literaturou	The student has explored a decent amount of related work, especially on compiler optimizations and on using machine learning within their context. However, I would expect and welcome a more detailed description (preferably an entire section) of the relevant approaches, possibly including a comparison with the author's proposed approach.	75
Realizační výstup	The student has designed a machine learning pipeline based on genetic algorithms and implemented two solutions using two different neural network layouts - one simple feed-forward network and the second more complicated convolutional graph neural network. On top of that, he modified the GraalVM Native Image compiler to communicate with the neural network for querying inlining decisions using a simple REST API. The amount of the implemented code is quite small (some 400 lines of Python code for the pipeline and one extra loop in GraalVM), however, it has taken a significant amount of experimenting to reach it. The student has also evaluated the final models on three different benchmarks.	70
Využitelnost výsledků	Unfortunately, the proposed results do not seem to be usable in practice. It turned out that the simple naive feed-forward network performs pretty much the same as the complex graph convolutional network which should, at least in theory, be much better. From my perspective, this is caused by a combination of the selection of a very simple fitness function in the genetic algorithm and the introduction of artificial guardrails which did not allow the algorithm to explore a sufficient variety of network configurations. On top of that, it is unclear whether changing some of these properties of the algorithm would lead to inferrence of networks which would perform better than the reference inlining heuristic. Despite that, since this field is very much unexplored, even this result can have certain value for further work.
Náročnost zadání	Stupeň hodnocení: obtížnější zadání The goal of the thesis was to develop a machine learning based heuristic for improving one aspect of Java programs compilation in GraalVM Native Image. This required to explore different non-trivial technologies - internals of GraalVM and Native Image itself as well as various machine learning approaches - and to propose and implement a machine learning pipeline. I believe that the scope of this task is above an average diploma thesis.

Otázky k obhajobě:

The used machine learning approach based on genetic algorithms is quite non-traditional. Have you considered some of the more traditional algorithms for training the neural networks using unsupervised learning?
In your method, you implement a number of artificial safeguards to drive the learning in a safe direction, however, these seem to be one of the most important limitations of the approach. In the report, you also propose to lift some of these safeguards for further work to improve the results. Can you elaborate on which particular safeguards could be potentially lifted and how it could be done?

Výsledný počet bodů navržený oponentem: 70

Známka navržená oponentem: C

Odpovědnost: Mgr. et Mgr. Hana Odstrčilová

VUT

Fakulty a vysokoškolské ústavy

Součásti

Heuristiky využívající strojového učení pro GraalVM Native Image