Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19234
Full metadata record
DC FieldValueLanguage
dc.contributor.authorΣωτήρου, Θεόδωρος-
dc.date.accessioned2024-07-25T07:16:22Z-
dc.date.available2024-07-25T07:16:22Z-
dc.date.issued2024-07-17-
dc.identifier.urihttp://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19234-
dc.description.abstractMusic Information Retrieval (MIR) is a field of research concerned with the extraction and analysis of information from music. Among other tasks, it includes music regression/classification and specifically mood detection and genre recognition. Alongside the growth seen in artificial intelligence (AI) fields, MIR has also experienced significant advancements, including the availability of extensive datasets, the integration of new technologies and multimodal approaches as well as the development and application of advanced explainability methods. In this thesis, we dive into explaining music emotion and genre classification multimodal models. Firstly we look for available datasets that provide multimodal and multi task capabilities. We choose Music4All [54], offering lyrics and audio as well as emotion and genre metadata for each song and proceed by analysing, refining and slightly augmenting this work. We continue by utilizing pretrained transformer architectures, namely Robustly Optimized BERT Pretraining Approach (RoBERTa) and Audio Spectrogram Transformer (AST), so as to classify music creations into 9 distinct emotion and genre categories utilizing their lyrics, their audio and a combination of the two. Finally, we look for methods to explain each model and propose a way to generate multimodal explanations from lyrics and audio, using the power of LIME [51] and its audio implementation auioLIME [25]. Finally we generate global aggregates [35] of LIME explanations, providing insights into the models performance and the models ability to detect themes and elements distinct for each class.en_US
dc.languageenen_US
dc.subjectMusic Information Retrievalen_US
dc.subjectDeep Learningen_US
dc.subjectMultimodalityen_US
dc.subjectMusic Genre Classificationen_US
dc.subjectLocal Explanationsen_US
dc.subjectMultimodal Explainabilityen_US
dc.titleExplaining Multimodal Music Emotion and Genre Recognitionen_US
dc.description.pages101en_US
dc.contributor.supervisorΣτάμου Γιώργοςen_US
dc.departmentΤομέας Τεχνολογίας Πληροφορικής και Υπολογιστώνen_US
Appears in Collections:Διπλωματικές Εργασίες - Theses

Files in This Item:
File Description SizeFormat 
Diploma_Sotirou_Final.pdf4.61 MBAdobe PDFView/Open


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.