Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19922
Full metadata record
DC Field | Value | Language
dc.contributor.author | Μπότσας, Χαράλαμπος Σπυρίδων | -
dc.date.accessioned | 2025-11-12T15:09:54Z | -
dc.date.available | 2025-11-12T15:09:54Z | -
dc.date.issued | 2025-10-13 | -
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19922 | -
dc.description.abstract | Recent advances in Music Information Retrieval (MIR) have showcased the ability of pretrained audio representations to achieve competitive performance across various downstream tasks. However, their potential for music recommender systems, and particularly for sequential recommendation, has not been fully explored. The present work examines whether audio-based embeddings from pretrained models can prove useful in a session-based recommendation setting. Using Music4All, an openly available dataset of listening histories paired with audio segments, we generate listening sessions, extract embeddings from three pretrained models (MusiCNN, MERT, and a custom artist-based model), and experiment with different strategies for integrating them into Transformer-based recommender architectures built upon two sequential recommendation frameworks: Transformers4Rec and Represent-Then-Aggregate (RTA). Results show that while audio representations accelerate convergence, they do not consistently outperform randomly initialized baselines under the Transformers4Rec setup. In contrast, within the RTA framework, content-based embeddings yield modest but consistent improvements, yet fail to surpass other metadata-driven baselines. These contrasting trends across the two frameworks can be partly attributed to their different task formulations and training objectives, highlighting that the benefit of pretrained audio representations depends on how well the downstream objective aligns with the semantics of the embedding space. This is further supported by the finding that fine-tuning the original embeddings on behavioral data improves their effectiveness, outperforming the out-of-the-box embeddings and demonstrating the value of lightweight domain adaptation. Finally, an ablation study highlights the crucial role of proper hyperparameter choices in leveraging the full potential of the pretrained representations. | en_US
dc.language | en | en_US
dc.subject | Music Recommender Systems | en_US
dc.subject | Music Information Retrieval | en_US
dc.subject | Representation Learning | en_US
dc.subject | Sequential Recommendation | en_US
dc.subject | Neural Networks | en_US
dc.title | Evaluating the Effectiveness of Pretrained Audio Representations in Session-Based Music Recommender Systems | en_US
dc.description.pages | 117 | en_US
dc.contributor.supervisor | Μαραγκός Πέτρος | en_US
dc.department | Τομέας Σημάτων, Ελέγχου και Ρομποτικής | en_US
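To illustrate the kind of integration strategy the abstract describes, the following is a minimal, hypothetical sketch (not the thesis code) of initializing the item-embedding table of a small Transformer-based next-item recommender from precomputed audio embeddings; all names here (AudioSeqRec, audio_emb, the layer sizes) are assumptions made for illustration only.

    import torch
    import torch.nn as nn

    class AudioSeqRec(nn.Module):
        """Minimal next-item recommender whose item embeddings start from audio features."""
        def __init__(self, audio_emb, d_model=128, n_heads=4, n_layers=2, freeze=False):
            super().__init__()
            n_items, audio_dim = audio_emb.shape
            # Item-embedding table initialized from precomputed track embeddings
            # (e.g., extracted with a pretrained audio model); optionally kept frozen.
            self.item_emb = nn.Embedding.from_pretrained(audio_emb, freeze=freeze)
            self.proj = nn.Linear(audio_dim, d_model)      # audio space -> model space
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.head = nn.Linear(d_model, n_items)        # scores over the track catalogue

        def forward(self, sessions):
            # sessions: (batch, seq_len) tensor of track ids from listening sessions
            h = self.proj(self.item_emb(sessions))
            h = self.encoder(h)
            return self.head(h[:, -1])                     # next-item logits

    # Toy usage with random vectors standing in for real audio embeddings.
    audio_emb = torch.randn(1000, 768)                     # 1000 tracks, 768-dim features
    model = AudioSeqRec(audio_emb)
    logits = model(torch.randint(0, 1000, (8, 20)))        # 8 sessions of 20 tracks each
    print(logits.shape)                                    # torch.Size([8, 1000])

Whether such pretrained embeddings help over a randomly initialized table, and whether they should be frozen or fine-tuned, is exactly the question the thesis evaluates empirically.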
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File | Description | Size | Format
Botsas_DiplomaThesis.pdf | | 10.77 MB | Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.