Master's Thesis
Author of thesis: Ing. Petr Mičulek
Acad. year: 2022/2023
Supervisor: Ing. Jakub Špaňhel, Ph.D.
Reviewer: doc. Ing. Vítězslav Beran, Ph.D.
The goal of this thesis is to explore, develop, and evaluate explainable face presentation attack detection (PAD) systems. PAD systems act as security filters for face recognition, preventing spoofed faces from reaching the identification phase. These systems are a necessary component enabling the recent rise of biometric systems used in smartphones and security cameras. While neural networks are the standard method for this task, they are commonly a black-box method providing no explanation. To provide a better understanding of the detection process, input attribution methods are applied. Their suitability is studied and various variants are compared. Of the seven methods compared, GradCAM using test-time augmentation is evaluated as the best, achieving a deletion metric AUC of 0.658 and an insertion metric AUC of 0.908. Experiments with the explanations show their limited capability at helping understand the model, but provide hints at how the predictive accuracy of the PAD system can be verified, and possibly improved.
Machine Learning (ML), Interpretable ML, ML Explainability, Convolutional Neural Networks, Face Liveness, Face Presentation Attack Detection
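The deletion and insertion metrics quoted in the abstract are commonly computed by perturbing the input in order of saliency and integrating the model score over the fraction of pixels changed. Under that convention, a lower deletion AUC and a higher insertion AUC indicate a more faithful explanation. The following is a minimal sketch of that general idea, not the thesis implementation: the function names `perturbation_curve` and `curve_auc`, the toy model, and the all-zero baseline are assumptions made for illustration.

```python
import numpy as np


def perturbation_curve(model, image, baseline, saliency, n_steps=20, mode="deletion"):
    """Score the model while pixels are replaced in order of decreasing saliency.

    mode="deletion": start from the image and overwrite pixels with the baseline.
    mode="insertion": start from the baseline and restore pixels from the image.
    `model` is any callable mapping an array to a scalar score (hypothetical here).
    """
    h, w = saliency.shape
    order = np.argsort(saliency.ravel())[::-1]  # most salient pixel first
    if mode == "deletion":
        current, source = np.array(image, dtype=float), np.asarray(baseline, dtype=float)
    else:
        current, source = np.array(baseline, dtype=float), np.asarray(image, dtype=float)

    scores = [model(current)]
    step = int(np.ceil(order.size / n_steps))
    for start in range(0, order.size, step):
        rows, cols = np.unravel_index(order[start:start + step], (h, w))
        current[rows, cols] = source[rows, cols]
        scores.append(model(current))
    return np.array(scores, dtype=float)


def curve_auc(scores):
    """Trapezoidal area under the score curve, with the x-axis normalised to [0, 1]."""
    x = np.linspace(0.0, 1.0, len(scores))
    return float(np.sum((scores[1:] + scores[:-1]) / 2.0 * np.diff(x)))


# Toy example: the "model" only looks at the top-left pixel,
# and the saliency map correctly marks that pixel as most important.
toy_image = np.ones((4, 4))
toy_baseline = np.zeros((4, 4))
toy_saliency = np.zeros((4, 4))
toy_saliency[0, 0] = 1.0


def toy_model(x):
    return float(x[0, 0])


deletion = curve_auc(perturbation_curve(toy_model, toy_image, toy_baseline, toy_saliency, 4, "deletion"))
insertion = curve_auc(perturbation_curve(toy_model, toy_image, toy_baseline, toy_saliency, 4, "insertion"))
print(deletion, insertion)  # 0.125 0.875
```

In the toy case the saliency map is ideal, so the score collapses after the first deletion step and recovers after the first insertion step. A real evaluation would replace `toy_model` with the classifier's score for the class being explained and would typically use a blurred or mean-valued baseline.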
Date of defence
24.08.2023
Result of the defence
Defended (thesis was successfully defended)
Grading
C
Process of defence
The student first presented the results achieved in the thesis. The committee then reviewed the supervisor's evaluation and the reviewer's report. The student subsequently answered the reviewer's questions and further questions from those present. Based on the reviewer's report, the supervisor's evaluation, the presentation, and the student's answers to the questions, the committee decided to grade the thesis C.
Topics for thesis defence
Language of thesis
English
Faculty
Faculty of Information Technology
Department
Department of Computer Graphics and Multimedia
Study programme
Information Technology and Artificial Intelligence (MITAI)
Specialization
Sound, Speech and Natural Language Processing (NSPE)
Composition of Committee
prof. Ing. Adam Herout, Ph.D. (chair)
prof. Ing. Martin Čadík, Ph.D. (member)
Ing. František Grézl, Ph.D. (member)
Ing. Michal Hradiš, Ph.D. (member)
Ing. David Bařina, Ph.D. (member)
doc. Mgr. Adam Rogalewicz, Ph.D. (member)
Supervisor’s report: Ing. Jakub Špaňhel, Ph.D.
The student fulfilled the assignment as required. He trained a model for face liveness classification and ran experiments with explainability methods on this model.
The student's task was to experiment with explainability methods for face liveness classification (Face Anti-Spoofing / Presentation Attack Detection). The goal was to test the individual methods and their suitability for this type of task, and to determine what requirements the neural network architecture / model must meet for an explainability method to be applicable.
The student followed the supervisor's instructions and the thesis assignment. He also found all the necessary sources and further literature on his own.
The student was active in the first semester. He then left for an Erasmus stay in France, and the work progressed very slowly. Consultations took place mainly at the beginning and towards the end of the period allotted for the thesis. However, the student always came to the consultations prepared.
The thesis was completed very close to the submission deadline. I had the text of the thesis available throughout the work. The student consulted the final version of the text with me before submission.
Grade proposed by supervisor: C
Reviewer’s report: doc. Ing. Vítězslav Beran, Ph.D.
The author was introduced to a relatively new and advanced topic of explainable ML. Using existing architectures, he trained a CNN model for Presentation Attack Detection on a custom dataset. He selected and used appropriate state-of-the-art techniques to analyze and explain the behavior of one of the models. The weaker part is the scope and clarity of some parts of the technical report. The experiments and their results are presented well, but it is not very clear what modifications to the model being explained these results should lead to. The software solution is of very good quality. The thesis deals with an advanced problem, is of an excellent overall standard, and is carefully elaborated.
Evaluation level: more demanding assignment
The assignment requires study and a good understanding of the fairly new and advanced issue of explainability and interpretability in ML.
Evaluation level: assignment fulfilled
Evaluation level: meets only the minimum requirements
The technical report contains all relevant information. Nevertheless, its quality would benefit from more space devoted to information on existing Presentation Attack Detection systems (Chapter 2), a more comprehensive overview of the explainable ML approaches and their properties for the solved problem (Chapter 3), and above all an explanation of the choice of methods and their properties and how they are used in the solved problem (Chapter 4).
The technical report has a logical structure, but the sub-chapter headings could be better balanced, both in content and scope. Although there are links between the chapters, some essential information is not very clear. The overall clarity of the issues and solutions presented would have been helped by a better indication, in the introductions of the (sub)chapters, of the context of and reason for each section of text. The author presents methods for explainability, but does not further specify what the output of the different approaches is and, above all, how the outputs are interpreted and thus how they "explain" the model (e.g. Figures 3.1-3.3).
The presentation of the experiments could include a clearer and more understandable explanation of the problem each experiment aims to solve, including e.g. the ideal or worst-case outcome. While the text does include a discussion of the experiments and results, it is quite difficult for the reader to understand their meaning. Furthermore, it is not clear why the author experiments with the two classification architectures ResNet18 and EfficientNetV2s, what the conclusions from the "all-attacks" experiments are, and which architecture is subsequently used for the other experiments and scenarios. The results in Chapter 6.1 are only presented, not interpreted. The significance of the experiments presented in Chapter 6.2 is not clear. Minor errors include e.g. a missing reference to a figure (p. 23) and an error in Chapter 6.1.2 (it should read "unseen-attack", not "one-attack").
Fully addressing the above-mentioned shortcomings would be very demanding for a topic of this kind, and therefore the presentation level of the technical report can still be assessed as good.
The technical report is written in English and, in the reviewer's experience, the language quality can be assessed as very good. The text is more or less free of errors and is written in a professional and comprehensible manner. Similarly, the typographical level of the report is excellent.
The author draws on 55 study sources, which are primarily scientific articles. This is a relevant selection of literature from which the author draws appropriately. It is clear from the technical report that the author understands the methods he has chosen and used for his solution. The question is whether it would have been more appropriate to base the review of the topic on a relevant book (e.g. [28]) and then make a shortlist of publications. After all, it is a challenging task to comprehensively process almost 50 scientific publications into a reasonably concise theoretical overview. Source [26] does not list a publisher, and source [12] would be better suited to a footnote.
The realization output is a custom dataset built on the relevant existing ROSE-Youtu dataset, two trained models for Presentation Attack Detection built on the CNN architectures ResNet-18 and EfficientNetV2_s, and experimental results explaining the behavior of the selected "all-attacks" model. The selected detection model is explained using different variants of the Class Activation Mapping (CAM) method and by visual analysis of sample perturbations using the PCA and t-SNE approaches. The preparation of the dataset itself uses a variety of supporting tools (e.g. MTCNN for sample alignment). The realization output includes several relevant scripts for data handling, model training, processing and presentation of experimental results, visualizations for model explanation, etc. The software solution is built on relevant and up-to-date libraries and other existing solutions, which are clearly separated by authorship. It is well documented and contains careful descriptions of functions and authorship.
The result is a trained model for Presentation Attack Detection and an analysis explaining the model behavior. The analysis is based on an appropriate procedure and is relevant. To make better use of the results of the work, conclusions should be drawn that would lead to further modifications of the model, and the other scenarios ("unseen-attack" and "one-attack") should also be analyzed. The results can serve as a good starting point for further analyses in the Presentation Attack Detection task as well as in other image classification tasks.
Grade proposed by reviewer: B
Responsibility: Mgr. et Mgr. Hana Odstrčilová