Multi-Stage Unsupervised Domain Adaptation For Automatic Speech Recognition

Damianos, Dimitris

National Technical University of Athens

School of Electrical and Computer Engineering

Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19363

Title:	Multi-Stage Unsupervised Domain Adaptation For Automatic Speech Recognition
Authors:	Damianos, Dimitris; Potamianos, Alexandros
Keywords:	Unsupervised Domain Adaptation; Automatic Speech Recognition; Self-supervision; Semi-supervision; Pseudo-labeling
Issue date:	18-Oct-2024
Abstract:	This diploma thesis studies unsupervised domain adaptation for Automatic Speech Recognition (ASR). In unsupervised domain adaptation, we work with two distinct data distributions, the source domain and the target domain. Input data is available in both domains, but the corresponding labels are accessible only in the source domain. The goal is to develop a model that can be applied effectively to the target domain by leveraging both the labeled and the unlabeled data. We first discuss the fundamentals of machine learning and the challenges of speech recognition, covering both traditional and modern approaches. We then review the literature on domain adaptation methods, categorizing the approaches into three major groups, including semi-supervised learning and self-supervision techniques. Next, we explore the capabilities of the Meta PL domain adaptation framework, previously applied to image recognition tasks, for ASR. We also introduce Multi-Stage Domain Adaptation, a two-stage method that combines self-supervised strategies with semi-supervised techniques. It is designed to improve the robustness and generalization of ASR models for low-resource languages, such as Greek, and for weakly supervised data, where labeled data is scarce or noisy. Our extensive experiments show that Meta PL can be applied effectively to ASR tasks, yielding an average word error rate (WER) improvement of 4%. We further demonstrate that Multi-Stage Domain Adaptation improves on the WER of our baselines by 7% on average, providing a more robust solution for domain adaptation in ASR, especially in underrepresented linguistic settings.
Finally, we examine the limitations of integrating self-supervised tasks with semi-supervised training within the Meta PL framework and conclude that self-supervised tasks should be applied separately from semi-supervised training.
URI:	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19363
Appears in collections:	Diploma Theses
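
The abstract reports its results as word error rate (WER) improvements. As a reading aid, the following is a minimal sketch of the standard WER definition: the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration and are not taken from the thesis. The abstract does not state whether the reported 4% and 7% figures are absolute or relative, and this sketch does not settle that.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is a plain
    whitespace split, which is an assumption, not the thesis' setup.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a c d")` counts one deletion against four reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.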

Files in this item:

File	Description	Size	Format
damianos_thesis.pdf		2.29 MB	Adobe PDF

All items on this site are protected by copyright.