Master's Thesis
Author of thesis: Bc. Samuel Kuchta
Acad. year: 2025/2026
Supervisor: Ing. Michal Hradiš, Ph.D.
Reviewer: doc. Ing. Michal Španěl, Ph.D.
This thesis contributes a purpose-built environment, GridMazeWorld, designed to confront reinforcement learning agents with multiple, interdependent memory demands under partial observability. The environment combines procedurally generated mazes, non‑local button–door dependencies, periodic dynamics, regrowing resources, and an energy budget, requiring simultaneous spatial, sequential, and relational memory. Alongside a comprehensive survey of memory‑focused benchmarks and mechanisms, we implement and compare recurrent (LSTM) and attention‑based (Transformer) architectures under controlled training with curriculum management. Our empirical study examines the influence of hyperparameter choices, network scaling, grid size, and curriculum design on the agents’ ability to form and exploit internal memory, offering a systematic analysis of memory in reinforcement learning under partial observability.
partial observability, reinforcement learning, memory, GridMazeWorld, recurrent neural networks.
Date of defence
24.06.2026
Result of the defence
Defended (thesis was successfully defended)
Grading
D
Process of defence
The student first presented the results achieved in the thesis. The committee then reviewed the supervisor's assessment and the reviewer's report. The student subsequently answered the reviewer's questions and further questions from those present. Based on the reviewer's report, the supervisor's assessment, the presentation, and the student's answers to the questions asked, the committee decided to grade the thesis D.
Topics for thesis defence
Language of thesis
English
Faculty
Faculty of Information Technology
Department
Department of Computer Graphics and Multimedia
Study programme
Information Technology and Artificial Intelligence (MITAI)
Specialization
Computer Vision (NVIZ)
Composition of Committee
prof. Ing. Adam Herout, Ph.D. (chair)
prof. Ing. Martin Čadík, Ph.D. (vice-chair)
doc. RNDr. Milan Češka, Ph.D. (member)
prof. Dr. Ing. Pavel Zemčík, dr. h. c. (member)
Ing. David Bařina, Ph.D. (member)
Ing. Tomáš Milet, Ph.D. (member)
Supervisor’s report: Ing. Michal Hradiš, Ph.D.
The student has a clear interest in the topic and a determination to understand the current state of knowledge more deeply. For various reasons, however, his activity was sporadic and he finished the thesis at the last minute.
The topic is fairly demanding and stems from the student's own interests. The goal was to investigate advanced aspects of reinforcement learning methods. The topic is demanding because it requires a good understanding of the field, the behaviour of the methods used is hard to interpret, and the experiments are difficult to design. The student proposed suitable experiments, but because of gaps in his activity their execution was limited compared with the plan.
A substantial part of the experiments was carried out at the last minute, and the thesis was submitted after the deadline.
The student actively sought out the necessary sources and made use of them.
Consultations were very limited and the student's activity was sporadic.
Grade proposed by supervisor: D
Reviewer’s report: doc. Ing. Michal Španěl, Ph.D.
Mr Kuchta did a great job designing and implementing the GridMazeWorld environment. He clearly has deep knowledge of grid-like RL environments and their challenges. The second part, dedicated to RL experiments, appears unfinished: basic experiments were conducted, but the unsatisfactory transformer results were not further addressed. Time pressure likely accounts for both the unfinished state of the RL experiments and the many shortcomings in the technical report.
Evaluation level: assignment fulfilled
The work designs and effectively implements a procedurally generated environment with a dynamic curriculum manager and experiments with baseline LSTM and transformer agents. The main emphasis is on the first part.
Evaluation level: within the usual range
The work contains several chapters dedicated to RL and an overview of existing methods, but the author merely meets the common requirement of providing an overview without apparent effort to apply the acquired knowledge to his own design or identify the most suitable SotA approaches.
Chapter 6 presents the environment design clearly with well-motivated choices, though figures and pseudocode or flowcharts would improve clarity.
Figures 7.1 and 7.2 are overly simplistic and do not adequately describe the model architectures.
The structure of Chapters 8 and 9 is unnatural. Experiments are described on half a page with findings summarised in one sentence, while all relevant graphs appear in the next chapter without explanation of what they reveal or how to interpret them. It would be more appropriate to combine the graphs for different parameter settings (e.g., Figure 9.6) into a single graph, with one curve for each parameter.
I did not find any formulation of the rewards the agent receives; omitting such a “detail” is a pity.
The typographic and linguistic aspects of the work are good.
The literature is broad, but the author's presentation does not indicate how it influenced his RL model design. Chapter 5 on existing environments is well written, briefly summarising many variants and their limitations, and noting the potential of his own solution, GridMazeWorld. However, an overview of which specific RL models and techniques have been successful on benchmarks is missing; this would have been an ideal starting point for the author's own baseline.
The program solution is the most extensive result and deserves recognition. The algorithmic design of the environment generator is well thought out, implemented in C++ for efficiency, with parallel generation and execution, and includes Python bindings.
Code quality and structure are generally good. However, large parts of the C++ code are undocumented. The GridMazeWorld class is too large and should be split into smaller components: world representation, generation algorithms, execution logic, and customisable parts such as reward calculation.
The environment generator and runtime scripts offer an interesting alternative to existing solutions. The author also presents some nice ideas for extending the environment!
Evaluation level: more demanding assignment
The topic places high demands on the literature review of reinforcement learning and on the realisation of the environment itself. Understanding and training RL techniques that leverage memory requires a deep understanding of those principles.
Grade proposed by reviewer: C
Responsibility: Mgr. et Mgr. Hana Odstrčilová