Domain Generalization in Robust Vision Transformers for Semantic Segmentation in Autonomous Driving

Τζόκας, Γιώργος

National Technical University of Athens

School of Electrical and Computer Engineering


Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19396

Title:	Domain Generalization in Robust Vision Transformers for Semantic Segmentation in Autonomous Driving
Authors:	Τζόκας, Γιώργος; Βουλόδημος, Αθανάσιος
Keywords:	Neural Networks; Deep Learning; Image Segmentation; Domain Generalization; Autonomous Vehicles; Real-Time Semantic Segmentation
Issue date:	25-Oct-2024
Abstract:	Recent advances in artificial intelligence have led to the widespread use of deep learning across a wide range of applications. Computer vision in particular has benefited from these developments, with highly efficient models now deployed in real-time scenarios. One notable application is semantic segmentation for autonomous driving, which gives self-driving vehicles a detailed understanding of their surroundings so that they can make informed decisions in real time. These applications require models that maintain high accuracy across diverse environmental conditions while operating in real time. The goal of this thesis was to develop a model that harnesses the robustness of transformer encoders while being more efficient than state-of-the-art generalization architectures. To demonstrate that the model maintains accuracy, we conducted a generalization experiment comparing it against robust models and a real-time architecture. Additionally, we performed two experiments in different knowledge domains to show that the capabilities of these models extend beyond autonomous driving. The experiments showed that although transformers are robust and unaffected by domain shifts, they are far from a viable solution for real-time operation. In our case, using an efficient decoder reduced inference time without sacrificing accuracy. However, this small reduction is not enough to achieve real-time segmentation or speeds comparable to those of convolutional models. In conclusion, efforts should be made to reduce the computational burden of transformer models, as they appear to be the main source of the higher inference times compared to convolutional architectures.
URI:	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19396
Appears in collections:	Diploma Theses

Files in this item:

File	Description	Size	Format
thesis_Tzokas_Georgios.pdf		9.61 MB	Adobe PDF

