Domain Generalization in Robust Vision Transformers for Semantic Segmentation in Autonomous Driving

Τζόκας, Γιώργος

National Technical University of Athens

School of Electrical and Computer Engineering

Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19396

Full metadata record

DC Field	Value	Language
dc.contributor.author	Τζόκας, Γιώργος	-
dc.date.accessioned	2024-11-08T07:55:40Z	-
dc.date.available	2024-11-08T07:55:40Z	-
dc.date.issued	2024-10-25	-
dc.identifier.uri	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19396	-
dc.description.abstract	Recent advances in artificial intelligence have led to the widespread use of deep learning across many applications. Computer vision in particular has benefited from these developments, with highly efficient models now deployed in real-time scenarios. One notable application is semantic segmentation for autonomous driving, which gives self-driving vehicles a detailed understanding of their surroundings so that they can make informed decisions in real time. For these applications, it is crucial that models maintain high accuracy across diverse environmental conditions while operating in real time. The goal of this thesis was to develop a model that harnesses the robustness of transformer encoders while improving efficiency compared to state-of-the-art generalization architectures. To demonstrate that the model maintains accuracy, we conducted a generalization experiment comparing it against robust models and a real-time architecture. Additionally, we performed two experiments in different domains to show that the capabilities of these models extend beyond autonomous driving. The experiments showed that although transformers are robust and unaffected by domain shifts, they are far from being a viable solution for real-time operation. In our case, using an efficient decoder accelerated inference without sacrificing accuracy. However, this small reduction in inference time is not enough to achieve real-time segmentation or speeds comparable to those of convolutional models. In conclusion, efforts should be made to reduce the computational burden of transformer models, as they appear to be the main source of the increase in inference time compared to convolutional architectures.	en_US
dc.language	en	en_US
dc.subject	Neural Networks	en_US
dc.subject	Deep Learning	en_US
dc.subject	Image Segmentation	en_US
dc.subject	Domain Generalization	en_US
dc.subject	Autonomous Vehicles	en_US
dc.subject	Real-Time Semantic Segmentation	en_US
dc.title	Domain Generalization in Robust Vision Transformers for Semantic Segmentation in Autonomous Driving	en_US
dc.description.pages	91	en_US
dc.contributor.supervisor	Βουλόδημος Αθανάσιος	en_US
dc.department	Division of Electromagnetics Applications, Electro-optics and Electronic Materials	en_US
Appears in Collections:	Diploma Theses

Files in This Item:

File	Description	Size	Format
thesis_Tzokas_Georgios.pdf		9.61 MB	Adobe PDF