Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19777
Title: | Εκμάθηση πολυπολιτισμικών αναπαραστάσεων για ανάλυση μουσικών σημάτων (Multicultural representation learning for music signal analysis) |
Authors: | Παπαϊωάννου, Χαρίλαος; Ποταμιάνος, Αλέξανδρος |
Keywords: | Machine learning; Signal processing; Music information retrieval; Audio processing; Computational musicology; Cross-cultural music similarity |
Issue Date: | 27-Aug-2025 |
Abstract: | Music Information Retrieval (MIR) research has traditionally focused on Western musical traditions, creating a significant gap in computational approaches to diverse world music cultures. This dissertation addresses this gap by developing and evaluating methods for multicultural music representation learning, aiming to create more culture-aware computational approaches that can effectively capture and analyze the distinctive characteristics of various musical traditions. The research develops the Lyra dataset, a comprehensive collection of Greek traditional and folk music comprising 1570 pieces with rich metadata, and explores cross-cultural knowledge transfer through systematic evaluation of deep audio embedding models across Western, Mediterranean, and Indian musical traditions. To address the scarcity of annotated data, the dissertation introduces Label-Combination Prototypical Networks (LC-Protonets), a novel multi-label few-shot learning approach that creates prototypes for label combinations rather than individual labels. The work evaluates state-of-the-art foundation models across diverse musical corpora and introduces CultureMERT, a multi-culturally adapted foundation model developed through continual pre-training on Greek, Turkish, and Indian music. The final investigation presents a comprehensive analysis of cross-cultural music similarity that bridges human perception, signal processing features, and foundation models, based on human annotations from 125 participants who evaluated 1130 audio pairs across Western, Mediterranean, Indian, and Chinese cultures. Results demonstrate that foundation models achieve the strongest alignment with human perception, while melody emerges as the most important perceptual dimension.
By advancing dataset development, transfer learning, few-shot learning, foundation model adaptation, and human-centered evaluation, this dissertation contributes computational methodologies for analyzing diverse musical traditions and provides insights into the relationship between human cross-cultural music perception and computational music understanding. |
URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19777 |
Appears in Collections: | Διδακτορικές Διατριβές - Ph.D. Theses |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
Multicultural_Representation_Learning_for_Music_Signal_Analysis.pdf | Doctoral Dissertation of Charilaos Papaioannou | 18.54 MB | Adobe PDF |
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.