Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19929

| Title: | Cross-modal Flow Matching for Domain Generalization |
| Authors: | Κρητικός, Αντώνιος; Βουλόδημος Αθανάσιος |
| Keywords: | machine learning; computer vision; domain generalization; contrastive learning; modality gap; flow matching; optimal transport |
| Issue date: | 11-Nov-2025 |
| Abstract: | Domain generalization (DG) requires models to perform robustly under domain shift, which in computer vision mainly takes the form of stylistic variation between samples. Despite intensive research, DG remains a major challenge, as models often overfit to domain-specific appearance cues and fail to capture class semantics. Many efforts have therefore explored the use of natural language, owing to its inherent domain invariance. However, current multimodal approaches, specifically those relying on cosine similarity for cross-modal contrastive alignment in a joint embedding space, suffer from the modality gap, a phenomenon in which image and text embeddings occupy separate regions despite semantic alignment. In this thesis, we address this residual gap by applying flow matching to learn a continuous transformation between unnormalized image and text embeddings of the same class in the joint Euclidean latent space. Unlike most prior work, which uses simple source distributions, we train a vector field that explicitly flows potentially domain-biased image embeddings to domain-invariant text embeddings. The resulting framework, CrossFlowDG, is tested with the efficient VMamba image encoder, which has linear complexity in contrast to the quadratic complexity of widely used transformer backbones, and it establishes state-of-the-art classification accuracy among similar methods across several challenging domains from relevant benchmarks. To further improve inference efficiency, and therefore deployability on edge devices with restricted compute, we propose an optimal transport-informed variant called OT-CrossFlowDG. This second framework incorporates optimal transport alignment to minimize the 2-Wasserstein distance between the modality distributions of each class in the Euclidean space. By running the Sinkhorn-Knopp algorithm to obtain approximate optimal couplings between image and text embeddings of the same class, and using the resulting barycentric targets as refined flow supervision, OT-CrossFlowDG achieves its peak performance in only 1 inference step, highlighting its efficiency in computationally constrained deployment scenarios. |
| URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19929 |
| Appears in collections: | Diploma Theses - Theses |
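
The abstract's three ingredients (a flow matching target between paired embeddings, Sinkhorn couplings with barycentric targets, and one-step inference) can be illustrated with a minimal, dependency-free sketch. This is not the thesis implementation. The function names, the squared Euclidean cost, the uniform marginals, the regularization strength `eps`, and the straight-line interpolation path are all assumptions made for illustration, and the trained neural vector field is replaced by a hypothetical `velocity_fn` callable.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance (the ground cost underlying the 2-Wasserstein distance)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Approximate entropy-regularized OT plan with uniform marginals (assumed)."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a, b = 1.0 / n, 1.0 / m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def barycentric_targets(images, texts, eps=0.1):
    """Map each image embedding to the plan-weighted mean of the text embeddings."""
    cost = [[sq_dist(x, y) for y in texts] for x in images]
    plan = sinkhorn(cost, eps)
    dim = len(texts[0])
    targets = []
    for row in plan:
        mass = sum(row)
        targets.append(
            [sum(p * y[d] for p, y in zip(row, texts)) / mass for d in range(dim)]
        )
    return targets


def flow_matching_pair(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t, and its target velocity."""
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, velocity


def euler_integrate(x0, velocity_fn, n_steps=1):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with explicit Euler steps."""
    x, dt = list(x0), 1.0 / n_steps
    for k in range(n_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Toy 2-D "embeddings" of one class; the values are invented for the demo.
    images = [[0.0, 0.0], [1.0, 0.0]]
    texts = [[0.0, 1.0], [1.0, 1.0]]
    targets = barycentric_targets(images, texts)
    x_t, v = flow_matching_pair(images[0], targets[0], t=0.5)
    print("barycentric target:", targets[0])
    print("training pair at t=0.5:", x_t, v)
    # Stand-in for a trained field: the exact constant velocity of the path.
    print("one-step output:", euler_integrate(images[0], lambda x, t: v))
```

On a straight-line path the target velocity is constant in time, so a field that fits it well can be integrated accurately with a single Euler step. That is one plausible reading of why barycentric supervision would favor 1-step inference, though the thesis itself should be consulted for the actual argument.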
Files in this item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| Antonios_Kritikos_Diploma_Thesis.pdf | | 901.18 kB | Adobe PDF | View/Open |
All items on this site are protected by copyright.