Please use this identifier to cite or link to this item:
Title: Automated Audio Captioning
Authors: Κουζέλης, Θοδωρής
Ποταμιάνος Αλέξανδρος
Keywords: Deep Learning
Automated Audio Captioning
Issue Date: 14-Jul-2022
Abstract: The purpose of this dissertation is to study Automated Audio Captioning. The aim of this task is to describe the content of an audio clip using natural language. It is a cross-modal translation task at the intersection of audio signal processing and natural language processing. Audio Captioning focuses on the audio events and their spatiotemporal relationships in an audio clip and expresses them in natural language. It is a recent and rather unexplored task that has great potential for practical applications. In this work, we model caption generation for a given audio clip as a sequence-to-sequence task, using a Transformer architecture. We show that using recent strategies from the related field of Audio Tagging allows us to significantly reduce the complexity of our model without affecting performance. In order to generate rich and varied descriptions, we investigate decoding algorithms that minimize the trade-off between semantic fidelity and diversity in captions. As a real-world application of Automated Audio Captioning, we propose a novel task in which, given the audio of a movie, a system aims to generate captions of salient sound events. Essentially, the aim of our proposed task is to automatically generate Subtitles for the Deaf and Hard of Hearing (SDH). Our proposed system detects the segments of sound events using a pre-trained tagging model and generates a textual description using our model for Audio Captioning. To improve the performance of our audio captioning model, we create a task-specific dataset using SDH subtitles and movies. Furthermore, we integrate the textual information of the tagging model into caption generation by building a model for text-guided audio captioning. Finally, we propose a novel metric to evaluate our results.
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: Thesis.pdf
Size: 12.76 MB
Format: Adobe PDF

Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.