Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19234
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Σωτήρου, Θεόδωρος | - |
dc.date.accessioned | 2024-07-25T07:16:22Z | - |
dc.date.available | 2024-07-25T07:16:22Z | - |
dc.date.issued | 2024-07-17 | - |
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19234 | - |
dc.description.abstract | Music Information Retrieval (MIR) is a field of research concerned with the extraction and analysis of information from music. Among other tasks, it includes music regression and classification, in particular mood detection and genre recognition. Alongside the growth of artificial intelligence (AI), MIR has also advanced significantly, with the availability of extensive datasets, the integration of new technologies and multimodal approaches, and the development and application of advanced explainability methods. In this thesis, we focus on explaining multimodal models for music emotion and genre classification. First, we look for available datasets that support multimodal and multi-task learning. We choose Music4All [54], which provides lyrics and audio as well as emotion and genre metadata for each song, and we analyse, refine and slightly augment it. Next, we use pretrained transformer architectures, namely the Robustly Optimized BERT Pretraining Approach (RoBERTa) and the Audio Spectrogram Transformer (AST), to classify songs into 9 distinct emotion and genre categories using their lyrics, their audio, and a combination of the two. We then look for methods to explain each model and propose a way to generate multimodal explanations from lyrics and audio, using LIME [51] and its audio implementation, audioLIME [25]. Finally, we generate global aggregates [35] of LIME explanations, which provide insights into the models' performance and their ability to detect themes and elements distinctive of each class. | en_US |
dc.language | en | en_US |
dc.subject | Music Information Retrieval | en_US |
dc.subject | Deep Learning | en_US |
dc.subject | Multimodality | en_US |
dc.subject | Music Genre Classification | en_US |
dc.subject | Local Explanations | en_US |
dc.subject | Multimodal Explainability | en_US |
dc.title | Explaining Multimodal Music Emotion and Genre Recognition | en_US |
dc.description.pages | 101 | en_US |
dc.contributor.supervisor | Στάμου Γιώργος | en_US |
dc.department | Division of Computer Science (Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών) | en_US |
Appears in Collections: | Diploma Theses (Διπλωματικές Εργασίες) |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Diploma_Sotirou_Final.pdf | | 4.61 MB | Adobe PDF |
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.