Přístupnostní navigace
E-application
Search Search Close
Bachelor's Thesis
Author of thesis: Bc. Lucie Klímová
Acad. year: 2024/2025
Supervisor: doc. Mgr. Lukáš Holík, Ph.D.
Reviewer: Mgr. Juraj Síč
This thesis is focused on the possibilities of using finite automata to accelerate the detection of structural domains of transposons. The central part of the thesis introduces a method based on deterministic finite automata (DFA) as a faster alternative to the BLASTX tool. BLASTX is used by the TE-greedy-nester tool, which is designed to detect LTR retrotransposons. As a starting point, we used the HMMER tool, which uses a profile hidden Markov model (PHMM) to precisely describe the character of the searched sequence. Due to the significant nondeterminism of PHMMs, the determinization of the model of the entire domain proved unfeasible. Instead, a method to transform a PHMM into several smaller DFAs designed to detect domain subsequences was introduced. Closely located occurrences of these subsequences are subsequently interpreted as occurrences of the entire domain. The experimental evaluation demonstrated that the presented approach maintains high accuracy while achieving up to a tenfold search speedup compared to BLASTX.
profile hidden Markov models (PHMM), deterministic finite automata (DFA), LTR retrotransposons
Date of defence
16.06.2025
Result of the defence
Defended (thesis was successfully defended)
Grading
A
Process of defence
Studentka nejprve prezentovala výsledky, kterých dosáhla v rámci své práce. Komise se poté seznámila s hodnocením vedoucího a posudkem oponenta práce. Studentka následně odpověděla na otázky oponenta a na další otázky přítomných. Komise se na základě posudku oponenta, hodnocení vedoucího, přednesené prezentace a odpovědí studentky na položené otázky rozhodla práci hodnotit stupněm A.
Topics for thesis defence
Language of thesis
English
Faculty
Fakulta informačních technologií
Department
Department of Intelligent Systems
Study programme
Information Technology (BIT)
Composition of Committee
doc. RNDr. Milan Češka, Ph.D. (předseda) Ing. Zbyněk Křivka, Ph.D. (člen) Ing. Zdeněk Materna, Ph.D. (člen) doc. Ing. Jan Kořenek, Ph.D. (člen) Ing. Jaroslav Rozman, Ph.D. (člen)
Supervisor’s reportdoc. Mgr. Lukáš Holík, Ph.D.
The work and results deserve to be priced by themselves, but the students tenacity and overall ability to conduct all kinds of research task highly independently stands out even among the best students.
An extremely difficult research assignment. It combines a research on advanced automata-based pattern matching with advanced technology of gene annotations. The assignment is even more difficult due to the supervisor's illiteracy in gene annotation and computational biology. It is also very ambitious, aiming at outperforming state-of-the-art tools in gene annotation.
Exceptional. The student has compensated for my lack of knowledge in computational biology and studied on need literature form that field by herself, and generally worked with the literature competently and efficiently.
This project has been going on for several years, starting by an experimental collaboration and slowly evolving into a serious research endevour with a high potential to generate publications. Most of the credit for this goes to the student, who compensated for my lack of knowledge in computational biology, carried on with tenacity even after major setbacks (we run into several dead ends on the way, for instance with the Alergia learning approach), solved technical problems, understood quickly her advisor's half baked ideas and turned them into meaningful conversations and later solutions, and came up with very good and fresh ideas by herself. She has shown an exceptional abilities in theory and algorithm design as well as in implementation. The amount of work actually exceeds an excellent diploma thesis.
I was able to review the entire text before it was handed in.
The work is almost ready for publication, it may even turn into a research direction.
Grade proposed by supervisor: A
Reviewer’s reportMgr. Juraj Síč
The student presented two novel techniques for the detection of structural domains of transposons. The first approach, based on learning automata, did not yield competitive results. The second one is based on splitting profile hidden Markov models and turning them into deterministic finite automata which are then used for fast matching. The results show that it can outcompete the state-of-the-art tool BLAST. Considering the quality and scope of the work, I believe it would be suitable even for a master's thesis. Therefore, I recommend awarding the grade A.
Evaluation level: obtížnější zadání
I find the assignment quite difficult, as it required the student to study advanced methods of biological sequence alignment along with related topics in automata theory.
The thesis is very well written, with a good number of examples showing how the proposed algorithm works.
The report is written in solid English, with only very minor problems.
The proposed algorithm was implemented in a new tool named DUCK. The tool is properly documented and was extensively compared with existing tools BLAST and HMMER.
I think that the work could lead to a publication.
Evaluation level: zadání splněno
Evaluation level: přesahuje obvyklé rozmezí
Even though the thesis is longer than usual, it contains no unnecessary information. The length is justified and reflects the substantial amount of work the student has completed.
The thesis contains an extensive overview of existing methods, with appropriate selection and citation of relevant sources. The student’s own work is clearly distinguished from existing approaches.
Grade proposed by reviewer: A
Responsibility: Mgr. et Mgr. Hana Odstrčilová