Multicultural representation learning for music signal analysis

Papaioannou, Charilaos

National Technical University of Athens

School of Electrical and Computer Engineering

Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19777

Title:	Multicultural representation learning for music signal analysis
Authors:	Papaioannou, Charilaos; Potamianos, Alexandros
Keywords:	Machine learning; Signal processing; Music information retrieval; Audio processing; Computational musicology; Cross-cultural music similarity
Issue Date:	27-Aug-2025
Abstract:	Music Information Retrieval (MIR) research has traditionally focused on Western musical traditions, leaving a significant gap in computational approaches to diverse world music cultures. This dissertation addresses this gap by developing and evaluating methods for multicultural music representation learning, with the aim of creating culture-aware computational approaches that can capture and analyze the distinctive characteristics of various musical traditions. The research develops the Lyra dataset, a collection of Greek traditional and folk music comprising 1570 pieces with rich metadata, and explores cross-cultural knowledge transfer through systematic evaluation of deep audio embedding models across Western, Mediterranean, and Indian musical traditions. To address the scarcity of annotated data, the dissertation introduces Label-Combination Prototypical Networks (LC-Protonets), a novel multi-label few-shot learning approach that creates prototypes for label combinations rather than for individual labels. The work evaluates state-of-the-art foundation models across diverse musical corpora and introduces CultureMERT, a multi-culturally adapted foundation model developed through continual pre-training on Greek, Turkish, and Indian music. The final investigation analyzes cross-cultural music similarity, relating human perception, signal processing features, and foundation models, using annotations from 125 participants who evaluated 1130 audio pairs across Western, Mediterranean, Indian, and Chinese cultures. Results demonstrate that foundation models achieve the strongest alignment with human perception, while melody emerges as the most important perceptual dimension.
By advancing dataset development, transfer learning, few-shot learning, foundation model adaptation, and human-centered evaluation, this dissertation contributes computational methodologies for analyzing diverse musical traditions and provides insights into the relationship between human cross-cultural music perception and computational music understanding.
URI:	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19777
Appears in Collections:	Ph.D. Theses

Files in This Item:

File	Description	Size	Format
Multicultural_Representation_Learning_for_Music_Signal_Analysis.pdf	Doctoral Dissertation of Charilaos Papaioannou	18.54 MB	Adobe PDF
