Master's Thesis

Computer Vision on Malleable Glyphs (Počítačové vidění nad Malleable Glyphs)

Author of thesis: Bc. Přemek Janda

Acad. year: 2025/2026

Supervisor: prof. Ing. Adam Herout, Ph.D.

Reviewer: Ing. Michal Hradiš, Ph.D.

Abstract:

While humans can intuitively estimate continuous values from visuals such as glyphs, for computer vision systems, this task presents a significant challenge, especially under limitations such as sparse or out-of-distribution input data. To tackle this problem, this thesis proposes a deep learning approach framing glyph analysis as a continuous regression task. The work introduces a generation pipeline for rasterized malleable glyphs and evaluates architectures based on Convolutional Neural Networks (CNN) and Vision Transformers (ViT). A core contribution of this work is the design of a VAE-assisted architecture utilizing a probabilistically regularized latent space, which decouples the geometric identity of a glyph from its magnitude. Through a series of experiments, the thesis evaluates human perception and the models' capacity for interpolation and zero-shot transfer. The final results confirm that lightweight CNN backbones coupled with structured latent space division yield the highest stability and generalization performance.

Keywords:

Malleable glyphs, Computer vision, Deep learning, Information visualization, Value estimation, Regression, Synthetic data, CNN, Vision Transformer

Date of defence

24.06.2026

Result of the defence

Defended (thesis was successfully defended)

Grading

C

Process of defence

The student first presented the results achieved in his thesis. The committee then reviewed the supervisor's evaluation and the reviewer's report. The student subsequently answered the reviewer's questions and further questions from those present. Based on the reviewer's report, the supervisor's evaluation, the presentation, and the student's answers to the questions posed, the committee decided to grade the thesis C.

Topics for thesis defence

  1. Could you precisely explain the terms “visual attribute”, “rendering style”, “visual features”, and “glyph type”? Are these general terms, or are they specific to the rendering package used?
  2. Did all experiments use augmentations, and what was the augmentation strength? How do the augmentations interact with glyphs that rely on the augmented properties, such as size, rotation, or color?
  3. Does it make sense to test out-of-distribution performance using such large distribution shifts, for example from “letters” to “VUT”?
  4. The text mentions that the networks could potentially process images captured by a camera. How would this work for glyphs that rely on properties such as size, rotation, or color?
  5. Does it make sense for both humans and, especially, the proposed neural networks to estimate the “value” of a glyph without clearly indicating the limits of the glyph? How can a network infer the absolute value of a novel glyph?
  6. In the perception experiment, was the glyph ordering the same for all participants? Could the other option be more suitable?
  7. Why did you print the thesis in black and white?
  8. In what format would you like to submit the final thesis?
  9. What semantic sense does interpolation between glyphs make?

Language of thesis

English

Faculty

Department

Study programme

Information Technology and Artificial Intelligence (MITAI)

Specialization

Computer Vision (NVIZ)

Composition of Committee

prof. Ing. Adam Herout, Ph.D. (chair)
prof. Ing. Martin Čadík, Ph.D. (vice-chair)
doc. RNDr. Milan Češka, Ph.D. (member)
prof. Dr. Ing. Pavel Zemčík, dr. h. c. (member)
Ing. David Bařina, Ph.D. (member)
Ing. Tomáš Milet, Ph.D. (member)

Supervisor’s report
prof. Ing. Adam Herout, Ph.D.

The student managed to do a considerable amount of work and to propose several procedures for experimental work with computer vision on malleable glyphs. The work might have benefited from more intensive consultation with the supervisor.

Evaluation criteria Verbal classification
Information on the assignment

The assignment was research-oriented; the student was expected to come up with innovative approaches to a new problem. The student managed to propose several interesting approaches, and his outputs are valuable.

Activity during completion

The thesis was completed on time. During completion, too, the student worked independently and hardly consulted the supervisor on the wording of the technical report.

Publications, awards

N/A

Work with literature

The student gained a good understanding of the subject area, became familiar with a number of machine learning and computer vision techniques, and acquired practical working skills in this field.

Activity during the work, consultations, communication

The student worked very independently and attended consultations infrequently, and only after being urged.

Points proposed by supervisor: 75

Grade proposed by supervisor: C

Reviewer’s report
Ing. Michal Hradiš, Ph.D.

The student explored a novel area and tested interesting approaches in a large number of experiments. However, the general motivation is not clear, some experiments may be flawed, and the presentation of results is rather confusing. The text is hard to follow, and related work is missing for TTA and the auxiliary reconstruction loss.

Evaluation criteria Verbal classification Points
Extent of fulfilment of the assignment requirements

Evaluation level: assignment fulfilled

Extent of the technical report

Evaluation level: exceeds the usual range

The thesis is too long; it could have been shorter and more concise.

Presentation level of the technical report

The text is often rather confusing, imprecise and hard to follow. Sometimes it is not clear whether the text explains "general" ideas or specific instances, and it does not distinguish precisely between the author's own ideas and prior ones. The structure is often confusing.

Selected specific comments (only some):

  • The datasets used in the experiments are not defined well enough. The description of the experiments and their analysis is often confusing.
  • The introduction is not clear and the motivation is weird. Why are there LLMs?
  • Chapter 2 Glyphs and Visual Perception should be more focused on the topics needed to understand the thesis. It should be more precise.
  • 2.2 Representational Formats - vector and raster graphics are probably not relevant. If it is an overview, other possibilities exist: implicit shapes, physical, ...
  • Figure 2.4 - The caption is probably for a different figure.
  • 2.4.2 - References some "artificial neural network", but the reader has no clue what that is.
  • Page 13 - The explanation of "rendering style" is confusing and unclear.
  • Chapter 3 unnecessarily presents some basic concepts.
  • DeiT, DINO, and MAE are not "architectures".
  • 3.1.3 Adaptation for Value Estimation (Regression) - creating a regression head and training it is trivial. Using sigmoid at the output is problematic. This should be a review of existing options, but it is not.
  • 3.2.2 Variational Autoencoders (VAE) - Using "reconstruction" as an auxiliary task is not a novel idea. The text does not mention that.
  • Spherical Linear Interpolation (SLERP) - Are these "novel" ideas? No prior work is mentioned.
  • 3.3 Final Considerations - why does it mention vector images - using them as network input was (is) never considered.
  • 4.1.2 Issue of Interpolation and Extrapolation - This section is imprecise, confusing and the thesis would be better without it.
  • 4.1.3 Curse of Dimensionality - Similar to 4.1.2 and probably not relevant.
  • 4.1.6 Input Representation Analysis - not relevant.
  • 4.1.7 Resolution - Needs source or evidence for "To keep glyphs visually distinct, the resolution should be at least 50 px."
  • Figures 4.5 to 4.7 - Why do train and data have the same distribution in all cases?
  • 4.2.2 - The zip files are an irrelevant technical detail.
  • B-splines are mentioned only 2x in the text. I did not understand how exactly they are used.
  • 5.2.4 Dynamic Decoder - What is "target resolution’s base grid"?
  • 5.4.2 Test-Time Adaptation - Original idea or missing references?
  • 5.4.3 Hardware and Parallelization - Why consider training parallelization - running many experiments.
  • What is "GPU thread locks occurred quite frequently"? Does not make sense.
  • What is the "default size" (5, 100, 200)?
  • ...
Points: 65
Formal presentation of the technical report

The thesis contains many high-quality figures and tables. It is also generally well formatted. On the other hand, it contains a rather large number of typos and wrong or incomprehensible sentences. Typographic issues:

  • Referencing Figures without “figure” e.g. B.2a
  • Missing references to figures.
  • Figures and tables in the middle of text. 
  • Tables overflowing into margins.
  • Many equations are not numbered.
  • No text between titles.
Points: 75
Work with literature

The thesis references 61 high-quality sources. However, most sources cover either glyphs and visual perception or various neural networks (image backbone architectures, their pre-training, generative models, VAE). The thesis is missing a literature review on how to solve tasks related to the one addressed (e.g. visual object ranking, domain adaptation), and it does not mention sources for the techniques used (e.g. Test Time Adaptation, auxiliary reconstruction losses). Sometimes it is not precisely clear which ideas are novel and original. Some parts are missing sources.

A few specific issues not mentioned before:

  • 2.3.3 Wording of the Challenge - Contains a block citation without a source.
  • 3.1.3 Adaptation for Value Estimation - Missing references.
  • 5.4.2 Test-Time Adaptation (TTA) - No sources.
Points: 65
Implementation output

Student performed many experiments including a human study. He also proposed and tested two interesting ideas - Test-Time Adaptation and auxiliary reconstruction loss with latent variable disentanglement. On the other hand, the presentation of results is rather confusing and some of the experiments may be poorly designed or flawed.

Points: 90
Usability of results

The student explored a novel area and tested interesting approaches. However, the motivation is not clear, some experiments may be flawed and the presentation of results is rather confusing.

Difficulty of the assignment

Evaluation level: more demanding assignment

The thesis addresses a slightly unusual and unexplored topic, where the student had to apply his own judgement as to which directions were worth pursuing, including methods, evaluation methodologies, and experiments.

Topics for thesis defence:
  1. Could you precisely explain the terms “visual attribute”, “rendering style”, “visual features”, and “glyph type”? Are these general terms, or are they specific to the rendering package used?
  2. In the perception experiment, was the glyph ordering the same for all participants? Could the other option be more suitable?
  3. Does it make sense to test out-of-distribution performance using such large distribution shifts, for example from “letters” to “VUT”?
  4. Does it make sense for both humans and, especially, the proposed neural networks to estimate the “value” of a glyph without clearly indicating the limits of the glyph? How can a network infer the absolute value of a novel glyph?
  5. The text mentions that the networks could potentially process images captured by a camera. How would this work for glyphs that rely on properties such as size, rotation, or color?
  6. Did all experiments use augmentations, and what was the augmentation strength? How do the augmentations interact with glyphs that rely on the augmented properties, such as size, rotation, or color?
Points proposed by reviewer: 77

Grade proposed by reviewer: C

Responsibility: Mgr. et Mgr. Hana Odstrčilová