Master's Thesis

Audio inpainting in the time-frequency domain using instantaneous frequency

Final Thesis 4.8 MB Appendix 10.36 MB

Author of thesis: Ing. Peter Balušík

Acad. year: 2024/2025

Supervisor: prof. Mgr. Pavel Rajmic, Ph.D.

Reviewer: Dr. Kohei Yatabe

Abstract:

The master's thesis focuses on the problem of audio inpainting in the time-frequency domain. The main goal of this thesis was to propose a method that solves this problem using a phase-aware optimization algorithm that exploits the instantaneous frequency. Two methods of solving this problem were proposed. One method solves it using the Chambolle–Pock algorithm. It acts solely in the time-frequency domain. The other method solves the problem using the generalized Chambolle–Pock algorithm. Instead of working only in the time-frequency domain, it utilizes the short-time Fourier transform to alternate between the time domain and the time-frequency domain, improving the overall quality of the reconstruction. The proposed methods were objectively and subjectively compared with other established inpainting methods in the time-frequency domain. The proposed method utilizing the generalized Chambolle–Pock algorithm outperformed all other methods in the objective evaluation and in the conducted listening test. The other proposed method performed similarly to one of the already established methods, both by objective metrics and subjectively. In addition, the proposed methods were less computationally demanding than the established methods.
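
As an illustration of the primal-dual iteration named in the abstract, the following is a minimal sketch of the basic Chambolle–Pock algorithm applied to a toy gap-filling problem in the time domain. It is not the method proposed in the thesis. The total-variation regularizer, the function name `chambolle_pock_tv_inpaint`, the step sizes, and the test signal are all assumptions chosen for brevity, whereas the thesis fills missing spectrogram columns with a phase-aware regularizer based on the instantaneous frequency.

```python
def chambolle_pock_tv_inpaint(observed, mask, n_iter=3000, tau=0.49, sigma=0.49):
    """Toy gap filling: minimize ||Dx||_1 subject to x[i] == observed[i] where mask[i].

    D is the forward-difference operator, (Dx)[i] = x[i + 1] - x[i].
    All names and parameter values here are illustrative assumptions.
    """
    n = len(observed)
    x = [v if m else 0.0 for v, m in zip(observed, mask)]  # primal variable
    x_bar = x[:]                                            # extrapolated primal point
    y = [0.0] * (n - 1)                                     # dual variable

    for _ in range(n_iter):
        # Dual step: ascent along D x_bar, then the prox of the conjugate of
        # the l1 norm, which is a projection onto the interval [-1, 1].
        for i in range(n - 1):
            y[i] = max(-1.0, min(1.0, y[i] + sigma * (x_bar[i + 1] - x_bar[i])))

        # Primal step: descent along D^T y, then the prox of the indicator of
        # the data-consistent set, which resets the reliable samples.
        x_new = []
        for i in range(n):
            dt_y = (y[i - 1] if i > 0 else 0.0) - (y[i] if i < n - 1 else 0.0)
            x_new.append(observed[i] if mask[i] else x[i] - tau * dt_y)

        # Extrapolation with theta = 1.
        x_bar = [2.0 * a - b for a, b in zip(x_new, x)]
        x = x_new
    return x


# Hypothetical test signal: a step with six missing samples in a flat region.
truth = [0.0] * 20 + [1.0] * 20
mask = [not (25 <= i <= 30) for i in range(40)]
observed = [t if m else 0.0 for t, m in zip(truth, mask)]

restored = chambolle_pock_tv_inpaint(observed, mask)
max_err = max(abs(r - t) for r, t in zip(restored, truth))
print("maximum reconstruction error:", max_err)
```

The step sizes are chosen so that tau * sigma * ||D||^2 < 1, the usual convergence condition for this iteration; for the forward difference, ||D||^2 is below 4. The gap lies inside a constant segment, so the total-variation minimizer is unique and equals the original signal.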

Keywords:

audio inpainting, convex optimization, Chambolle–Pock algorithm, instantaneous frequency, phase-aware optimization, short-time Fourier transform, time-frequency domain

Date of defence

09.06.2025

Result of the defence

Defended (thesis was successfully defended)

Grading

A

Process of defence

The student presented the results of his thesis, and the committee was acquainted with the reports. Reviewer's questions: What made the subjective quality so different (as in Fig. 5.9) among the four compared methods? Which part of the proposed method contributes to the difference? How can the proposed method be improved in future work? The student defended the master's thesis and answered the questions of the committee members and the reviewer.

Language of thesis

English

Faculty

Department

Study programme

Communications and Informatics (MPC-TIT)

Composition of Committee

prof. Ing. Zdeněk Smékal, CSc. (chair)
Ing. Ondřej Mokrý, Ph.D. (member)
Ing. Rudolf Vohnout, Ph.D. (member)
Ing. Jiří Přinosil, Ph.D. (member)
doc. Ing. Pavel Šilhavý, Ph.D. (member)
doc. Ing. Martin Vaculík, Ph.D. (member)
prof. Ing. Aleš Prokeš, Ph.D. (vice-chair)

Supervisor’s report
prof. Mgr. Pavel Rajmic, Ph.D.

Bc. Peter Balušík adapted the PHAIN algorithm available in the literature to the case of filling in missing spectrogram columns. The student consulted regularly, yet he worked very independently and came up with his own ideas for resolving difficulties. The result of the master's thesis is an algorithm that, by both objective and subjective metrics, outperforms all other methods for this type of task. The text is written in English, with a minimum of language or typographical errors. One possible criticism is the rather lengthy explanation of the background; the reader reaches the proposal itself only after 40 pages of text.
Points proposed by supervisor: 98

Grade proposed by supervisor: A

Reviewer’s report
Dr. Kohei Yatabe

The diploma thesis of Bc. Peter Balušík addresses audio inpainting in the time-frequency domain.
In the thesis, new audio inpainting methods were proposed based on the existing method for the time domain setting. One of the proposed methods works well and outperforms the existing methods in the time-frequency domain setting. Its idea is interesting and seems promising. I hope I can see it somewhere in the literature.
The thesis is easy to read, but the mathematical notation and theoretical parts could be polished to make them more consistent across the entire thesis. I understand that the topics are somewhat complicated, but in that case the author should take more care of the readers, especially those from other fields. Moreover, it would be better for the author to draw the figures himself instead of adopting them from the existing literature, because drawing the figures improves the understanding of the topic.
I have some specific comments regarding the technical contents as follows:
- [pp. 21-22] When the author wrote about the Fourier transform of the window functions, "main lobe" and "side lobe" are labelled in the equations. Are these labels correct? In the equations, "main lobe" is labelled for the oscillatory functions that are not concentrated around 0 Hz.
- [pp. 29-30] Algorithm 1 is described as "general CP algorithm (without relaxation)" in the text. Is this correct? The 7th line (containing alpha) seems to be the relaxation procedure.
- [pg. 87] Appendix A.2 says, "Computing a canonical tight window does not effect the shape of the window function only its amplitude." Is this correct? Some assumptions are necessary to make it correct.
- [Table 5.1] Computational time is given without the processor names of the CPUs (e.g., i7-14700K). It should be provided for interpretability. (Writing the MATLAB version is also preferable because the runtime can be significantly different depending on the version of MATLAB.)
Topics for thesis defence:
  1. What made the subjective quality so different (as in Fig. 5.9) among the four compared methods? Which part of the proposed method contributes to the difference? How can the proposed method be improved in future work?
Points proposed by reviewer: 90

Grade proposed by reviewer: A

File inserted by the reviewer Size
Reviewer's report [.pdf] 129.58 kB

Responsibility: Mgr. et Mgr. Hana Odstrčilová