Bachelor's Thesis
Author of thesis: Timur Nurtdinov
Acad. year: 2025/2026
Supervisor: doc. Ing. Vítězslav Beran, Ph.D.
Reviewer: Ing. Michal Hradiš, Ph.D.
This bachelor’s thesis presents a system that automates the creation of multi-voice audiobooks from EPUB files to overcome the high costs and time constraints of traditional production. The solution leverages Large Language Models to automatically analyze the text, detect scenes, extract characters, and attribute dialogue. Speech is then synthesized using advanced Text-to-Speech engines. Key features of the system include a multi-stage text analysis pipeline and a web application that enables users to select specific voices for characters, configure the narrator’s style, and overlay ambient sound onto the final audio. This approach significantly accelerates audiobook production while preserving the user’s creative control.
audiobook, multi-voice speech synthesis, text-to-speech, large language models, user interface, web application
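The dialogue attribution and voice selection steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the thesis uses Large Language Models for attribution, whereas this sketch swaps in a simple regular-expression heuristic so that it runs without any external service. All names here (`Segment`, `attribute_dialogue`, `assign_voices`, the voice identifiers) are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass

NARRATOR = "narrator"


@dataclass
class Segment:
    speaker: str  # character name, "narrator", or "unknown"
    text: str     # text to be synthesized for this speaker


# A span enclosed in straight double quotes.
QUOTE = re.compile(r'"([^"]+)"')
# Assumed stand-in for LLM attribution: a speech verb followed by a
# capitalized name directly after the closing quote.
SPEAKER_AFTER = re.compile(r'^\s*,?\s*(?:said|asked|replied)\s+([A-Z][a-z]+)')


def attribute_dialogue(paragraph: str) -> list[Segment]:
    """Split a paragraph into narrator and character segments."""
    segments = []
    pos = 0
    for match in QUOTE.finditer(paragraph):
        # Text between the previous quote and this one is narration.
        narration = paragraph[pos:match.start()].strip()
        if narration:
            segments.append(Segment(NARRATOR, narration))
        # Look for a speaker tag immediately after the quote.
        tag = SPEAKER_AFTER.match(paragraph[match.end():])
        speaker = tag.group(1) if tag else "unknown"
        segments.append(Segment(speaker, match.group(1)))
        pos = match.end()
    rest = paragraph[pos:].strip()
    if rest:
        segments.append(Segment(NARRATOR, rest))
    return segments


def assign_voices(segments, voice_map, default_voice="voice_default"):
    """Pair each segment with a user-selected voice, or a fallback voice."""
    return [(voice_map.get(s.speaker, default_voice), s.text) for s in segments]


if __name__ == "__main__":
    text = 'The door opened. "Who is there?" asked Anna. "Only me," said Tom.'
    user_choices = {"narrator": "voice_n", "Anna": "voice_a"}
    for voice, line in assign_voices(attribute_dialogue(text), user_choices):
        print(voice, "|", line)
```

Keeping attribution separate from voice assignment mirrors the split the abstract describes between automated analysis and user control: the attribution step could be replaced by a model call without touching the mapping that the user edits.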
Date of defence
15.06.2026
Result of the defence
Defended (thesis was successfully defended)
Grading
A
Process of defence
The student first presented the results achieved in his thesis. The committee then reviewed the supervisor's evaluation and the reviewer's report. The student subsequently answered the reviewer's questions and further questions from those present. Based on the reviewer's report, the supervisor's evaluation, the presentation, and the student's answers to the questions asked, the committee decided to grade the thesis A.
Topics for thesis defence
Language of thesis
English
Faculty
Faculty of Information Technology
Department
Department of Computer Graphics and Multimedia
Study programme
Information Technology (BIT)
Composition of Committee
prof. Ing. Adam Herout, Ph.D. (chair)
doc. Mgr. Adam Rogalewicz, Ph.D. (vice-chair)
Ing. Vladimír Bartík, Ph.D. (member)
Ing. Michal Hradiš, Ph.D. (member)
Ing. Josef Strnadel, Ph.D. (member)
Supervisor’s report: doc. Ing. Vítězslav Beran, Ph.D.
Student Timur Nurtdinov dedicated himself to the project conscientiously and with great interest, demonstrating an outstanding capacity for independent work. Following the successful deployment of the core system, he methodically expanded the solution to include a high-quality GUI for the management and user parameterisation of the entire process. The student successfully proposed partial solutions, effectively resolved technical challenges during integration, and delivered an exceptionally high-quality final product.
The bachelor's thesis focuses on automating the processing of electronic books to generate audio versions with dramatisation. The topic requires a deep understanding of LLM attributes, including both text analysis and text-to-speech generation. The student successfully fulfilled all aspects of the assignment: he designed the automated process while preserving creative user control, selected suitable models, prepared a testing data set, and executed evaluation experiments. The resulting solution meets high standards, is entirely self-contained, and does not depend on previous projects.
The student drew from an extensive list of relevant technical literature and materials regarding LLMs and speech generation. He utilised several essential academic sources effectively, while less methodical, non-peer-reviewed references, such as documentation and software manuals, are mostly cited in the footnotes.
Timur Nurtdinov was highly active and intensely interested in the topic. He attended consultations thoroughly prepared and according to the schedule. In the initial phase, the student focused primarily on the functionality of individual system elements and their integration into the overall pipeline. Gradually, he adopted a more methodical approach, focusing on data structures, user inputs, intermediate results, and the interfaces between system components, thereby successfully shifting attention toward specific tasks with high technical added value.
The work on developing the pipeline and preparing the data set progressed continuously according to the schedule, allowing the thesis to be completed well ahead of the deadline. After implementing the basic functional version, the student methodically elaborated on individual components, finalised the user interaction process and control mechanisms, and built a functional web application with a GUI. The final content was fully consulted, and all recommendations were incorporated.
The paper was presented at the Excel@FIT 2026 student conference.
Grade proposed by supervisor: A
Reviewer’s report: Ing. Michal Hradiš, Ph.D.
The student created a relatively complex application that, in its current form, is already usable for the intended task. He worked creatively and systematically, and he comprehensively tested the resulting solution. With further work, the software could be turned into a successful open-source project or product.
Evaluation level: more demanding assignment
The topic is complex and involves integrating many components.
The text is understandable and readable. I like that it starts with “Traditional Audiobook Production.” It maintains a relatively high level of abstraction, but I find that suitable, and it keeps the text length within a reasonable range.
I would probably separate the UI description from some of the implementation details, but I find the current arrangement acceptable.
I am missing a review of at least some of the many existing tools for automatic and semi-automatic audiobook production, including their capabilities and limitations.
The text is well written and contains only very few errors, although it is sometimes missing a definite or indefinite article. The formatting is often good, but there are also several issues:
The created tool is functional and usable. It is able to segment speech, assign voices, select background sounds for individual scenes, and render a full audiobook. The created web application allows users to manage the process and correct or adjust some aspects of audio generation. The automated processing steps were well tested on a custom annotated dataset. The student also performed user testing.
The application definitely still has room for additional functionality, but most of it is very well summarized in the “Future Work” section of the thesis.
The created application could be a good starting point, for example, for a cool open-source project.
Evaluation level: assignment fulfilled
Evaluation level: within the usual range
The thesis cites 29 relevant sources, consisting of a mix of peer-reviewed publications, documentation, web sources, a book, and a standard. The sources are mostly sufficient for the thesis, except for the missing review of similar existing tools.
In a few places, the sources should be cited a bit more diligently, but overall I find the sources and their use reasonable.
Grade proposed by reviewer: A
Responsibility: Mgr. et Mgr. Hana Odstrčilová