Master's Thesis

Interpretation of emotions from text on social media

Author of thesis: Ing. Vít Tlustoš

Acad. year: 2023/2024

Supervisor: prof. Aamir Saeed Malik, Ph.D.

Abstract:

Most human interactions are either text-based or can be converted to text using speech-to-text technologies. This thesis is dedicated to recognizing emotions from these texts. Despite extensive research in this domain, three significant challenges have persisted: unexplored or limited cross-domain efficacy of the methods, superficial analysis of the results, and limited usability of the outcomes. We address these challenges by proposing two models based on the RoBERTa model, which we call EmoMosaic-base and EmoMosaic-large. These models were trained on four datasets: SemEval-2018 Task 1: Affect in Tweets, GoEmotions, XED, and DailyDialog. In contrast to prior studies, we trained our models on all the datasets simultaneously while preserving their original categories. This resulted in models that exhibit strong performance across diverse domains and are directly comparable to other methods. In fact, EmoMosaic-large outperforms recent single-domain state-of-the-art models on the SemEval-2018 Task 1: Affect in Tweets and GoEmotions datasets, demonstrating outstanding cross-domain performance. To promote the usability and reproducibility of our research, we make all our code and models publicly available at: https://huggingface.co/vtlustos.
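
This record does not describe how the joint training is implemented. As an illustration only, one way to train on several datasets at once while keeping each dataset's original categories is to give every (dataset, label) pair its own output slot and compute the loss only over the slots of the dataset a sample came from. The sketch below is an assumption about how this could look, not the author's method: the label sets are shortened and hypothetical, and `LABEL_SPACES`, `SLOTS`, and `masked_bce` are names invented for this example.

```python
import math

# Hypothetical, shortened label sets (assumption); the real datasets define more categories.
LABEL_SPACES = {
    "semeval": ["anger", "joy", "sadness"],
    "goemotions": ["admiration", "anger", "joy"],
}

# One output slot per (dataset, label) pair, so a label name shared by
# two datasets (e.g. "anger") is still kept as two separate categories.
SLOTS = [(ds, lab) for ds, labs in LABEL_SPACES.items() for lab in labs]


def masked_bce(logits, dataset, gold_labels):
    """Mean binary cross-entropy over the slots that belong to `dataset`.

    Slots of other datasets are skipped, so a sample never contributes
    a loss term for categories its source dataset does not annotate.
    """
    total, count = 0.0, 0
    for (ds, lab), z in zip(SLOTS, logits):
        if ds != dataset:
            continue  # masked out: not part of this sample's label space
        y = 1.0 if lab in gold_labels else 0.0
        # Numerically stable binary cross-entropy computed from the logit z.
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        count += 1
    return total / count


# A sample from the "semeval" set labelled "joy"; logits would come from the shared encoder.
loss = masked_bce([0.2, 1.5, -0.7, 0.0, 0.0, 0.0], "semeval", {"joy"})
```

Because predictions stay in each dataset's own label space, such a model could be scored against single-dataset baselines without any label mapping, which is consistent with the comparability claim in the abstract.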

Keywords:

emotion classification from text, emotion recognition from text, cross-domain emotion recognition, GoEmotions, DailyDialog, XED, SemEval-2018 Task 1

Date of defence

20.06.2024

Result of the defence

Defended (thesis was successfully defended)

Grading

A

Process of defence

The student first presented the results achieved in his thesis. The committee then reviewed the supervisor's evaluation and the reviewer's report. The student subsequently answered the questions from those present. Based on the reviewer's report, the supervisor's evaluation, the presentation, and the student's answers to the questions asked, the committee decided to grade the thesis A.

Topics for thesis defence

How does your work handle sentences that are meant ironically?
Why did you choose the RoBERTa language model in particular?
How exactly do you perform the classification?

Language of thesis

English

Faculty

Faculty of Information Technology

Department

Department of Computer Systems

Study programme

Information Technology and Artificial Intelligence (MITAI)

Specialization

Computer Vision (NVIZ)

Composition of Committee

prof. Ing. Adam Herout, Ph.D. (chair)
prof. Dr. Ing. Jan Černocký (member)
doc. RNDr. Milan Češka, Ph.D. (member)
Ing. Michal Hradiš, Ph.D. (member)
doc. Ing. Peter Chudý, Ph.D., MBA (member)
Ing. David Bařina, Ph.D. (member)

Supervisor’s report
prof. Aamir Saeed Malik, Ph.D.

The student has completed all the objectives specified in the project description. He has developed two models based on RoBERTa that analyze text to detect various emotions. He has tested them on four different datasets, and the results are promising. The student has provided detailed results and comparisons with existing methods. It is very good work, and the thesis is easy to read and well written.

Evaluation criteria	Verbal classification
Information on the assignment	The thesis concerned the detection of emotions from text on social media platforms. The work was challenging because it required not only knowledge of natural language processing and machine learning but also an understanding of emotions from a psychological perspective. The complexity of the project increased due to the multi-level classification, that is, a variety of emotions. The student proposed two models based on the RoBERTa model, a large language model, to achieve this objective and tested them on four different datasets. The results are promising.
Activity during completion	The student regularly consulted me before the submission of the thesis. The work was challenging because four different datasets were involved. The student proposed two models based on RoBERTa and performed extensive experimentation. Despite that, the student completed all the development in time and submitted the thesis within the given timeframe.
Publication activity, awards	The thesis is well structured and organized, nicely written, and contains publishable content. The student is willing to assist in preparing the draft of a journal paper. The journal paper is being prepared and will be submitted to a Q1 journal. In addition, he has provided a working web page on Hugging Face where a user can type in some text and the model will predict the emotion.
Work with literature	The student has provided a detailed literature review in chapter 2. The chapter starts with a description of emotions and the corresponding emotion models. It then discusses the datasets that have been used by researchers in this domain. After the datasets, the quality metrics are discussed in detail. Finally, many articles that have attempted to solve this problem are presented. The last section provides a critical review highlighting the limitations of the existing methods. Overall, it is a very well written chapter.
Activity during the work, consultations, communication	The student had regular weekly meetings throughout the two semesters while working on the thesis. He was punctual and always came to meetings well prepared. I found him to be a very dedicated and hardworking student.

Points proposed by supervisor: 94

Grade proposed by supervisor: A

Reviewer’s report
Ing. Vlastimil Košař, Ph.D.

The technical report is well written and precise. The software solution works well, and the achieved results are competitive with the state of the art. Therefore, I grade the work overall as excellent (A).

Evaluation criteria	Verbal classification	Points
Extent of fulfilment of the assignment requirements	Evaluation level: assignment fulfilled
Extent of the technical report	Evaluation level: within the usual range. The technical report is approximately 83 standard pages long, falling within the usual range. All sections of the report contain essential and information-rich content.
Presentation level of the technical report	The technical report is logically structured. First, the author presents a review of the literature on psychological models of emotions and approaches to emotion recognition. Available datasets are described and evaluated. Limitations and gaps in existing approaches are described. Next, research goals are outlined, and a methodology to address the goals is proposed and fully described. Finally, the results of all experiments are thoroughly analysed and extensively evaluated over 14 pages. The achieved results are competitive with the state of the art. The chapters are reasonably extensive and easy for the reader to understand. The continuity between individual chapters is also at a high level.	96
Formal aspects of the technical report	The document is well designed from a typographic standpoint. There is only one instance of text and a table overflowing to the page border. The technical report is written in clear and proper English. The language used is well crafted, with no apparent mistakes in style or grammar. The text is clear and easy to understand.	92
Work with literature	The study sources are well chosen and represent the current state of the art. The sources are mostly scientific papers and are relevant to the assignment. The author works well with the sources and cites them accordingly. Only in the chapter on metrics of classification models should the lesser-known metrics, such as the Brier Score or Expected Calibration Error, probably have had sources specified. However, this issue is negligible in the context of the whole work. The rest of the metrics can safely be regarded as common knowledge. Bibliographic citations are complete without any apparent problems.	90
Implementation output	The software solution works well and is well structured. It includes a model training component, evaluation scripts, and a demo application. The code is sufficiently commented and documented. The trained models were thoroughly evaluated. Also, the author made the models, application, and results freely available to the community, which is commendable.	92
Usability of results	The thesis is research-oriented and the achieved results are competitive with the state of the art, indicating potential for publication.
Difficulty of the assignment	Evaluation level: more difficult assignment. The assignment is in a more difficult category. It is a research-oriented assignment, requiring deep research into the state of the art and rigorous evaluation of results.

Points proposed by reviewer: 95

Grade proposed by reviewer: A

Responsibility: Mgr. et Mgr. Hana Odstrčilová
