Bachelor's Thesis
Author of thesis: Marek Hric
Acad. year: 2025/2026
Supervisor: Ing. Vladimír Malenovský, Ph.D.
Reviewer: Santosh Kesiraju, Ph.D.
This bachelor's thesis focuses on the problem of speech and music classification under real-time and causality constraints. The standard 2-class speech-vs-music formulation of the reference methods is extended with a third class for silence and ambient noise. The main objective is to design and implement a classifier satisfying these constraints and to objectively evaluate it against existing reference methods on a sufficiently diverse dataset. The first part of the thesis provides an overview of current approaches to audio signal classification, covering both traditional machine learning methods based on handcrafted features and modern neural network-based approaches. The second part focuses on the design and implementation of the dataset construction pipeline used to assemble the dataset for training and evaluation. The third part presents the experimental work. Four reference methods are implemented and compared using a benchmark on the constructed dataset. The subsequent experiments focus on the best-performing Temporal Convolutional Network (TCN). Based on these experiments, the proposed method called TCN-S is derived. The reference and proposed methods are then compared and analyzed. The results show that TCN-S achieves performance comparable to the reference TCN architecture while using only a fraction of the parameters and exhibiting lower computational latency.
speech and music classification, on-line audio classification, causal Temporal Convolutional Network (TCN), deep learning, machine learning, dataset construction
Date of defence
16.06.2026
Result of the defence
Defended (thesis was successfully defended)
Grading
A
Process of defence
The student first presented the results achieved in the thesis. The committee then reviewed the supervisor's evaluation and the reviewer's report. The student subsequently answered the reviewer's question and further questions from those present. Based on the reviewer's report, the supervisor's evaluation, the presentation, and the student's answers to the questions posed, the committee decided to grade the thesis A.
Topics for thesis defence
Language of thesis
English
Faculty
Faculty of Information Technology
Department
Department of Computer Graphics and Multimedia
Study programme
Information Technology (BIT)
Composition of Committee
doc. Ing. Lukáš Burget, Ph.D. (chair), doc. Mgr. Adam Rogalewicz, Ph.D. (vice-chair), Ing. Libor Polčák, Ph.D. (member), Ing. Michal Hradiš, Ph.D. (member), Ing. Martin Žádník, Ph.D. (member)
Supervisor’s report
Ing. Vladimír Malenovský, Ph.D.
The student demonstrated solid technical skills, particularly in dataset construction and experimental analysis, and proposed a lighter variant of a TCN model with competitive performance. The results are well supported by experiments, but the contribution remains mainly incremental. Overall, the work meets the assignment, though it lacks stronger originality.
This bachelor's thesis addresses the problem of real-time speech/music classification. The overall difficulty is moderate, as the topic has been extensively studied and numerous references in the literature describe baseline methods, evaluation metrics, and common approaches. The task combines the evaluation of classical methods and neural networks, the implementation of the chosen approaches, dataset design, and experimental evaluation.
The student worked independently, consulted regularly, and made steady progress. A strong aspect of this work is the construction of a reasonably balanced dataset, together with the implementation and evaluation of several reference methods.
The experimental part focuses on the Temporal Convolutional Network and leads to a simplified variant (TCN-S) with comparable accuracy and lower computational complexity. This represents a useful and practical contribution.
However, the assignment also required proposing improvements over the state-of-the-art method, which was not fully achieved. The student explored only one neural architecture. The results are technically sound but mainly incremental.
Despite the lack of originality, in my opinion the thesis fulfills the assignment.
The student worked well with the relevant sources on speech/music classification. Due to lack of time, he did not explore some of the latest state-of-the-art neural techniques in greater detail, but this did not have a negative impact on the experimental part of his work. Based on his study, he was able to select and successfully implement the chosen methods for evaluation. Overall, the student demonstrated the ability to study and use the available literature for independent work.
The student was moderately active throughout the work. He regularly attended consultations and was generally well prepared for them. He communicated his progress and responded well to my feedback.
The agreed sub-tasks were mostly followed, and the student demonstrated a consistent effort during the development of the thesis. Overall, the level of activity, communication, and engagement was satisfactory.
The thesis was completed very close to the submission deadline, and the final version was discussed with the supervisor only to a limited extent. While the main parts of the work were discussed during development, the final form could have benefited from more thorough consultation and refinement.
To the best of my knowledge, there has been no publication activity or awards related to the thesis. The thesis is accompanied by an open-source software demo that allows thorough examination and possible follow-up work.
Grade proposed by supervisor: C
Reviewer’s report
Santosh Kesiraju, Ph.D.
The work done in the thesis is very solid in terms of presentation and experiments. The presentation of the results is especially clear and elegant.
Evaluation level: assignment of average difficulty
The report is fairly well written. All chapters are logically connected.
- The work may have practical use cases, as it builds on existing work with an emphasis on practical usability: real-time classification within a limited computational budget.
Evaluation level: assignment fulfilled
Evaluation level: within the usual range
Grade proposed by reviewer: A
Responsibility: Mgr. et Mgr. Hana Odstrčilová