Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19654
Title: Automatic Cover Song Generation from Audio Signal
Authors: Μαρκαντωνάτος, Γεράσιμος
Ποταμιάνος, Αλέξανδρος
Keywords: Music Information Retrieval
Domain Adaptation
Transfer Learning
Cross-Cultural Transfer
Cross-Instrumental Transfer
Deep Learning
Cover Song Generation
Issue Date: 1-Jul-2025
Abstract: Cover song generation represents a challenging task in Music Information Retrieval, requiring systems to preserve the musical essence of original compositions while adapting them to specific instruments and styles. This thesis addresses key limitations in the field: the scarcity of training data for non-pop musical genres and the lack of cover generation models for instruments beyond piano. We present two key dataset contributions: the GreekSong2Piano dataset, containing 659 Greek songs paired with piano covers across eight distinct genres (Rembetiko, Laiko, Entexno, etc.), and the Pop2Guitar dataset with 40 song-guitar pairs for cross-instrument domain adaptation. These datasets enable systematic investigation of transfer learning approaches in low-resource scenarios.

Our methodology employs a T5-based encoder-decoder Transformer architecture that treats cover generation as a sequence-to-sequence translation problem, converting audio spectrograms to symbolic MIDI representations. We systematically compare three training strategies: from-scratch training, partial fine-tuning, and full fine-tuning. Additionally, we introduce a novel sequential fine-tuning approach that performs multi-step domain adaptation from Western pop piano covers to Greek piano covers to guitar covers.

Experimental results demonstrate clear advantages for transfer learning approaches over from-scratch training. For Greek piano covers, fine-tuning strategies achieve up to 21.0% improvement in Melody Chroma Accuracy compared to baseline models. The sequential fine-tuning approach shows particular promise for guitar generation, with the partially fine-tuned model achieving the highest similarity ratings (3.31±0.33) in user studies, approaching human performance (4.17±0.28).
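The three training strategies compared in the abstract differ only in which parameters are updated. As a minimal illustrative sketch (not the thesis's actual code: a toy two-layer module stands in for the real pretrained T5 model), "partial fine-tuning" can be expressed in PyTorch by freezing the encoder and passing only the still-trainable decoder parameters to the optimizer:

```python
import torch
from torch import nn

class ToyEncoderDecoder(nn.Module):
    """Toy stand-in for a T5-style encoder-decoder (hypothetical, for illustration)."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)  # stands in for the pretrained encoder
        self.decoder = nn.Linear(dim, dim)  # stands in for the decoder to adapt

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(torch.relu(self.encoder(x)))

model = ToyEncoderDecoder()

# Partial fine-tuning: freeze the encoder so no gradients are computed for it.
for p in model.encoder.parameters():
    p.requires_grad = False

# The optimizer only ever sees the trainable (decoder) parameters.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

Full fine-tuning would skip the freezing loop and optimize all parameters, while from-scratch training would start from randomly initialized weights instead of a pretrained checkpoint; sequential fine-tuning repeats this adaptation step once per target domain (pop piano, then Greek piano, then guitar).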
Our comprehensive evaluation framework combines objective metrics (melody similarity, cover song identification, embedding-based measures) with subjective user assessment, demonstrating strong correlation between computational measures and human perception. This work establishes a foundation for culturally-aware and instrument-diverse music arrangement systems, contributing to both creative applications and computational understanding of musical translation across cultural and instrumental boundaries.
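Melody Chroma Accuracy, the abstract's headline objective metric, compares melodies by pitch class so that octave errors are not penalized. Below is a minimal sketch of one plausible frame-level formulation, assuming time-aligned per-frame MIDI melody sequences (the thesis's exact definition may differ, e.g. in voicing handling):

```python
def melody_chroma_accuracy(reference, prediction):
    """Fraction of voiced reference frames whose predicted note matches
    the reference in pitch class (MIDI pitch mod 12), ignoring octave.

    reference, prediction: equal-length lists of MIDI pitches per frame,
    with None marking silent (unvoiced) frames.
    """
    if len(reference) != len(prediction):
        raise ValueError("frame sequences must be time-aligned")
    voiced = [(r, p) for r, p in zip(reference, prediction) if r is not None]
    if not voiced:
        return 0.0
    matches = sum(1 for r, p in voiced
                  if p is not None and r % 12 == p % 12)
    return matches / len(voiced)

# Example: the octave error (60 vs 72, both pitch class C) still counts
# as correct, while the semitone error (64 vs 65) does not.
ref  = [60, 62, 64, None, 67]
pred = [72, 62, 65, None, 67]
print(melody_chroma_accuracy(ref, pred))  # → 0.75
```

Scoring by chroma rather than absolute pitch is a common choice in melody evaluation, since cover arrangements routinely transpose melodic lines by an octave to fit an instrument's range.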
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19654
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: diploma_thesis_final_version.pdf (6.24 MB, Adobe PDF)


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.