Master's Thesis

Computer Vision on Malleable Glyphs (Počítačové vidění nad Malleable Glyphs)

Author of thesis: Bc. Přemek Janda

Acad. year: 2025/2026

Supervisor: prof. Ing. Adam Herout, Ph.D.

Abstract:

While humans can intuitively estimate continuous values from visuals such as glyphs, this task presents a significant challenge for computer vision systems, especially under limitations such as sparse or out-of-distribution input data. To tackle this problem, this thesis proposes a deep learning approach that frames glyph analysis as a continuous regression task. The work introduces a generation pipeline for rasterized malleable glyphs and evaluates architectures based on Convolutional Neural Networks (CNN) and Vision Transformers (ViT). A core contribution of this work is the design of a VAE-assisted architecture utilizing a probabilistically regularized latent space, which decouples the geometric identity of a glyph from its magnitude. Through a series of experiments, the thesis evaluates glyph perception and the models' capacity for interpolation and zero-shot transfer. The final results confirm that lightweight CNN backbones coupled with structured latent space division yield the highest stability and generalization performance.

Keywords:

Malleable glyphs, Computer vision, Deep learning, Information visualization, Value estimation, Regression, Synthetic data, CNN, Vision Transformer

Date of defence

24.06.2026

Result of the defence

Defended (thesis was successfully defended)

Grading

C

Process of defence

The student first presented the results achieved in his thesis. The committee then reviewed the supervisor's evaluation and the reviewer's report. The student subsequently answered the reviewer's questions and further questions from those present. Based on the reviewer's report, the supervisor's evaluation, the presentation, and the student's answers to the questions, the committee decided to grade the thesis C.

Topics for thesis defence

Could you precisely explain the terms “visual attribute”, “rendering style”, “visual features”, and “glyph type”? Are these general terms, or are they specific to the rendering package used?
Did all experiments use augmentations, and what was the augmentation strength? How do the augmentations interact with glyphs that rely on the augmented properties, such as size, rotation, or color?
Does it make sense to test out-of-distribution performance using such large distribution shifts, for example from “letters” to “VUT”?
The text mentions that the networks could potentially process images captured by a camera. How would this work for glyphs that rely on properties such as size, rotation, or color?
Does it make sense for both humans and, especially, the proposed neural networks to estimate the “value” of a glyph without clearly indicating the limits of the glyph? How can a network infer the absolute value of a novel glyph?
In the perception experiment, was the glyph ordering the same for all participants? Could the other option be more suitable?
Why did you print the thesis in black and white?
In what format would you like to submit the final thesis?
What semantic sense does interpolation between glyphs make?

Language of thesis

English

Faculty

Faculty of Information Technology

Department

Department of Computer Graphics and Multimedia

Study programme

Information Technology and Artificial Intelligence (MITAI)

Specialization

Computer Vision (NVIZ)

Composition of Committee

prof. Ing. Adam Herout, Ph.D. (chair)
prof. Ing. Martin Čadík, Ph.D. (vice-chair)
doc. RNDr. Milan Češka, Ph.D. (member)
prof. Dr. Ing. Pavel Zemčík, dr. h. c. (member)
Ing. David Bařina, Ph.D. (member)
Ing. Tomáš Milet, Ph.D. (member)

Supervisor’s report
prof. Ing. Adam Herout, Ph.D.

The student managed to carry out a substantial amount of work and to propose several approaches for experimental work with computer vision on malleable glyphs. The work might have benefited from more intensive consultation with the supervisor.

Evaluation criteria	Verbal classification
Information on the assignment	The assignment was research-oriented; the student was expected to come up with innovative approaches to a new problem. The student managed to propose several interesting approaches, and his outputs are valuable.
Activity during completion	The thesis was completed on time. During completion, too, the student worked independently and hardly consulted the wording of the technical report.
Publications, awards	N/A
Work with literature	The student gained a good understanding of the field, became familiar with many machine learning and computer vision methods, and acquired practical skills in this area.
Activity during the work, consultations, communication	The student worked very independently, attended consultations infrequently, and only after reminders.

Points proposed by supervisor: 75

Grade proposed by supervisor: C

Reviewer’s report
Ing. Michal Hradiš, Ph.D.

The student explored a novel area and tested interesting approaches in a large number of experiments. However, the general motivation is not clear, some experiments may be flawed, and the presentation of results is rather confusing. The text is hard to follow, and related work is missing for TTA and auxiliary reconstruction loss.

Evaluation criteria	Verbal classification	Points
Extent to which the assignment requirements were met	Evaluation level: assignment fulfilled
Extent of the technical report	Evaluation level: exceeds the usual range. The thesis is too long; it could have been shorter and more concise.
Presentation level of the technical report	The text is often rather confusing, imprecise and hard to follow. Sometimes it is not clear whether the text explains "general" ideas or some specific instances, and it does not distinguish precisely between the author's own ideas and previous ones. The structure is often confusing. Selected specific comments (only some):
Datasets in experiments are not defined well enough.
Description of experiments and their analysis is often confusing.
Introduction is not clear and motivation is weird. Why are there LLMs?
Chapter 2 Glyphs and Visual Perception should be more focused on topics needed to understand the thesis. Should be more precise.
2.2 Representational Formats - vector and raster graphics probably not relevant. If it is an overview, other possibilities exist: implicit shapes, physical, ...
Figure 2.4 - The caption is probably for a different figure.
2.4.2 - References some "artificial neural network", but the reader has no clue what that is.
Page 13 - The explanation of "rendering style" is confusing and unclear.
Chapter 3 unnecessarily presents some basic concepts. DeiT, DINO, and MAE are not "architectures".
3.1.3 Adaptation for Value Estimation (Regression) - creating a regression head and training it is trivial. Using sigmoid at the output is problematic. This should be a review of existing options, but it is not.
3.2.2 Variational Autoencoders (VAE) - Using "reconstruction" as an auxiliary task is not a novel idea. The text does not mention that.
Spherical Linear Interpolation (SLERP) - are these "novel" ideas? No prior work mentioned.
3.3 Final Considerations - why does it mention vector images - using them as network input was (is) never considered.
4.1.2 Issue of Interpolation and Extrapolation - This section is imprecise, confusing and the thesis would be better without it.
4.1.3 Curse of Dimensionality - Similar to 4.1.2 and probably not relevant.
4.1.6 Input Representation Analysis - not relevant.
4.1.7 Resolution - Needs source or evidence for "To keep glyphs visually distinct, the resolution should be at least 50 px."
Figures 4.5 to 4.7 - Why do train and data have the same distribution in all cases?
4.2.2 - The zip files are an irrelevant technical detail.
B-splines are mentioned only 2x in the text. I did not understand how exactly they are used.
5.2.4 Dynamic Decoder - What is "target resolution’s base grid"?
5.4.2 Test-Time Adaptation - Original idea or missing references?
5.4.3 Hardware and Parallelization - Why consider training parallelization - running many experiments. What is "GPU thread locks occurred quite frequently"? Does not make sense. What is the "default size" (5, 100, 200)?
...	65
Formal layout of the technical report	The thesis contains many high-quality figures and tables. It is also generally well formatted. On the other hand, it contains a rather large number of typos and wrong or incomprehensible sentences. Typographic issues: Referencing figures without “figure”, e.g. B.2a. Missing references to figures. Figures and tables in the middle of text. Tables overflowing into margins. Many equations are not numbered. No text between titles.	75
Work with literature	The thesis references 61 high-quality sources. However, most sources cover topics of either glyphs and visual perception or various neural networks (image backbone architectures, their pre-training, generative models, VAE). The thesis is missing a literature review on how to solve tasks related to the one addressed (e.g. visual object ranking, domain adaptation), and it does not mention sources for the techniques used (e.g. Test-Time Adaptation, auxiliary reconstruction losses). Sometimes it is not precisely clear which ideas are novel and original. Some parts are missing sources. A few specific issues not mentioned before: 2.3.3 Wording of the Challenge - Contains a block citation without a source. 3.1.3 Adaptation for Value Estimation - Missing references. 5.4.2 Test-Time Adaptation (TTA) - No sources.	65
Implementation output	The student performed many experiments, including a human study. He also proposed and tested two interesting ideas: Test-Time Adaptation and an auxiliary reconstruction loss with latent variable disentanglement. On the other hand, the presentation of results is rather confusing, and some of the experiments may be poorly designed or flawed.	90
Usability of results	The student explored a novel area and tested interesting approaches. However, the motivation is not clear, some experiments may be flawed, and the presentation of results is rather confusing.
Difficulty of the assignment	Evaluation level: more difficult assignment. The thesis addresses a slightly unusual and unexplored topic, where the student had to apply his own judgement as to which directions are worth pursuing, including methods, evaluation methodologies, and experiments.

Topics for thesis defence:

Could you precisely explain the terms “visual attribute”, “rendering style”, “visual features”, and “glyph type”? Are these general terms, or are they specific to the rendering package used?
In the perception experiment, was the glyph ordering the same for all participants? Could the other option be more suitable?
Does it make sense to test out-of-distribution performance using such large distribution shifts, for example from “letters” to “VUT”?
Does it make sense for both humans and, especially, the proposed neural networks to estimate the “value” of a glyph without clearly indicating the limits of the glyph? How can a network infer the absolute value of a novel glyph?
The text mentions that the networks could potentially process images captured by a camera. How would this work for glyphs that rely on properties such as size, rotation, or color?
Did all experiments use augmentations, and what was the augmentation strength? How do the augmentations interact with glyphs that rely on the augmented properties, such as size, rotation, or color?

Points proposed by reviewer: 77

Grade proposed by reviewer: C

Responsibility: Mgr. et Mgr. Hana Odstrčilová
