Title: Lyrics and Vocal Melody Generation conditioned on Accompaniment
Authors: Melistas, Thomas
Potamianos, Alexandros
Keywords: lyrics and symbolic music generation
deep learning
natural language processing
efficient attention
language modeling
music analysis
Issue Date: 12-Mar-2021
Abstract: The purpose of this dissertation is to study the generation of lyrics and vocal melody for a given instrumental music piece, a novel, previously unexplored task. In recent years there has been growing research interest in lyrics generation, as a case of language modeling with domain-specific structure and attributes, as well as in symbolic music generation. The correlation between lyrics and the corresponding vocal melody has also recently started gaining attention, and a few models that generate lyrics conditioned on melody, and vice versa, have been developed. While these research directions are promising, they fail to capture the general musical context of the songwriting process. In the majority of contemporary music, singing coexists with accompaniment, and its function is both to provide a melodic line that is grounded in the instrumental part and advances it musically, and to promote the unfolding of a story through lyrical imagery. Moreover, former research on the matter has followed a proof-of-concept approach, working at the level of one or a few sentences, which is insufficient for capturing the structure and the recurring musical and lyrical themes present in a song. Our work models lyrics and vocal melody generation for a given music piece as a sequence-to-sequence task, using for the first time an efficient-attention Transformer architecture trained on text event sequences that describe entire songs. We build a symbolic music dataset suitable for the described task and apply music theory analysis, successfully compressing our training data and making it key-independent. As a result, our models become faster to train and more robust. Furthermore, we propose a novel architecture that decouples lyric and melody generation, while also providing the ability to use any pretrained language model and optional conditioning on predefined lyrics.
Finally, the output is used together with a singing voice synthesis model to create vocals and add them to instrumental tracks, which we use for qualitative evaluation. To the best of our knowledge, this is the first attempt to study both the melodic and lyrical content of singing in relation to the musical context in which it appears and, through that, to automate the process a singer or songwriter would follow to enrich an instrumental music piece with vocals. We believe that our work can fuel human creativity and provide interesting musical ideas.
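The abstract mentions representing entire songs as text event sequences for the sequence-to-sequence Transformer. A minimal sketch of what such a flattening could look like, interleaving lyric syllables with pitch and duration events (the token names and tuple layout here are illustrative assumptions, not the thesis vocabulary):

```python
# Hypothetical illustration of a "text event sequence": vocal notes and
# lyric syllables flattened into one token stream that a
# sequence-to-sequence model could consume. Token names (LYR_, NOTE_,
# DUR_) are assumptions for illustration only.

def to_events(notes):
    """notes: list of (syllable, midi_pitch, duration_in_16ths) tuples."""
    events = []
    for syllable, pitch, dur in notes:
        # One lyric token followed by its pitch and duration tokens.
        events += [f"LYR_{syllable}", f"NOTE_{pitch}", f"DUR_{dur}"]
    return " ".join(events)

song = [("twin", 60, 4), ("kle", 60, 4), ("twin", 67, 4), ("kle", 67, 4)]
sequence = to_events(song)
# sequence starts with "LYR_twin NOTE_60 DUR_4 LYR_kle ..."
```

Flattening to plain text like this lets a standard language-model tokenizer and training loop be reused unchanged for symbolic music.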
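The key-independence step described in the abstract (music theory analysis that compresses the training data) can be pictured as transposing every song so its estimated tonic becomes C, collapsing all twelve keys onto one. A minimal sketch under that assumption, using a naive scale-coverage key estimate (function names and the detection heuristic are illustrative, not the thesis implementation):

```python
# Hypothetical sketch of key normalization: estimate a song's tonic and
# transpose all MIDI pitches so the song is expressed in C. The naive
# major-scale-coverage heuristic below is an assumption for
# illustration; real systems use more robust key-finding methods.

# Pitch classes of a major scale rooted at 0 (C major).
MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}

def estimate_key(pitches):
    """Return the tonic (0-11) whose major scale covers the most notes."""
    best_tonic, best_hits = 0, -1
    for tonic in range(12):
        hits = sum(1 for p in pitches if (p - tonic) % 12 in MAJOR_SCALE)
        if hits > best_hits:
            best_tonic, best_hits = tonic, hits
    return best_tonic

def transpose_to_c(pitches):
    """Shift a MIDI pitch sequence so its estimated tonic maps to C."""
    tonic = estimate_key(pitches)
    # Subtracting the tonic offset makes every key collapse onto one,
    # so the model never has to relearn the same melody in 12 keys.
    return [p - tonic for p in pitches]
```

Because transposed songs in different keys become identical sequences, the effective vocabulary and data variance shrink, which is consistent with the abstract's claim of faster training and greater robustness.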
Appears in Collections: Diploma Theses

Files in This Item:
File: melistas_lyrics_vocals_generation.pdf
Size: 3.8 MB
Format: Adobe PDF

Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.