Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19922

| Title: | Evaluating the Effectiveness of Pretrained Audio Representations in Session-Based Music Recommender Systems |
| Authors: | Μπότσας, Χαράλαμπος Σπυρίδων; Μαραγκός Πέτρος |
| Keywords: | Music Recommender Systems; Music Information Retrieval; Representation Learning; Sequential Recommendation; Neural Networks |
| Issue Date: | 13-Oct-2025 |
| Abstract: | Recent advances in Music Information Retrieval (MIR) have shown that pretrained audio representations achieve competitive performance on a variety of downstream tasks. However, their potential in music recommender systems, and particularly in sequential recommendation, has not been fully explored. The present work examines whether audio-based embeddings from pretrained models are useful in a session-based recommendation setting. Using Music4All, an openly available dataset of listening histories paired with audio segments, we generate listening sessions and extract embeddings from three pretrained models: MusiCNN, MERT, and a custom artist-based model. We then experiment with different strategies for integrating these embeddings into Transformer-based recommender architectures built on two sequential recommendation frameworks: Transformers4Rec and Represent-Then-Aggregate (RTA). Results show that, while audio representations accelerate convergence, they do not consistently outperform randomly initialized baselines under the Transformers4Rec setup. In contrast, within the RTA framework, content-based embeddings yield modest but consistent improvements, yet fail to surpass other metadata-driven baselines. These opposite trends across the two frameworks can be partly attributed to their different task formulations and training objectives, highlighting that the benefit of pretrained audio representations depends on how well the downstream objective aligns with the semantics of the embedding space. This is further supported by the finding that fine-tuning the original embeddings on behavioral data improves their effectiveness over the out-of-the-box embeddings, demonstrating the value of lightweight domain adaptation. Finally, an ablation study reveals the crucial role of hyperparameter choice in leveraging the full potential of the pretrained representations. |
| URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19922 |
| Appears in Collections: | Διπλωματικές Εργασίες - Theses |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| Botsas_DiplomaThesis.pdf | | 10.77 MB | Adobe PDF | View/Open |
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.