Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19886
| Title: | Efficient Incomplete Multimodal-Diffused Emotion Recognition |
| Authors: | Ασπρογέρακας, Ιωάννης; Ποταμιάνος Αλέξανδρος |
| Keywords: | Diffusion Models; Multimodal Emotion Recognition; Stochastic Differential Equations; Multimodal Deep Learning; Deep Generative Modeling |
| Issue date: | 24-Oct-2025 |
| Abstract: | Multimodal Emotion Recognition (MER) aims to model human affect by integrating complementary signals from language, vision, and audio. While deep learning methods have achieved impressive results through cross-modal fusion, most assume complete modality availability during training and inference, a condition rarely met in real-world deployments, where occlusions, noise, or sensor failures frequently cause missing modalities. Addressing this problem requires robust imputation strategies that can recover missing signals without sacrificing efficiency. In this work, we explore the design space of diffusion models for missing modality imputation, building upon and extending the IMDER framework. We propose a decoupled two-stage training scheme in which modality-specific diffusion models are pre-trained independently and then integrated into the MER pipeline. This design avoids the instability of end-to-end IMDER training, where untrained diffusion models initially degrade classifier performance. In addition, we systematically compare stochastic differential equation (SDE) formulations, specifically Variance Preserving (VP) and Variance Exploding (VE) processes, evaluate alternative conditioning mechanisms with transformer-based backbones, and investigate multiple sampling strategies to balance efficiency and accuracy. Extensive experiments on CMU-MOSI and CMU-MOSEI demonstrate consistent improvements across both fixed and random missing protocols. Our quality-focused configuration achieves superior accuracy, with gains of up to +2% F1 and +1.5% ACC2 over IMDER, while delivering 5× faster inference. Meanwhile, our speed-optimized configuration maintains competitive performance (+1% ACC2, +0.5% F1) while achieving 15× faster inference, making it competitive for real-time MER applications. |
| URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19886 |
| Appears in collections: | Διπλωματικές Εργασίες - Theses |
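
The abstract contrasts Variance Preserving (VP) and Variance Exploding (VE) diffusion processes. As a minimal sketch, not taken from the thesis, the closed-form forward perturbation kernels commonly used for these two SDE families in the score-based generative modeling literature can be written as follows. The function names, the linear and geometric noise schedules, and the default constants (`beta_min`, `beta_max`, `sigma_min`, `sigma_max`) are illustrative assumptions, not the configuration used in the thesis.

```python
import math
import random


def vp_marginal(t, beta_min=0.1, beta_max=20.0):
    """Scale and std of x_t given x_0 under a VP SDE with a linear beta(t).

    Assumed schedule: beta(s) = beta_min + s * (beta_max - beta_min), t in [0, 1].
    """
    # Integral of beta(s) from 0 to t.
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t * t
    mean_scale = math.exp(-0.5 * integral)         # the signal is shrunk
    std = math.sqrt(1.0 - math.exp(-integral))     # noise grows towards 1
    return mean_scale, std


def ve_marginal(t, sigma_min=0.01, sigma_max=50.0):
    """Scale and std of x_t given x_0 under a VE SDE with a geometric sigma(t).

    Assumed schedule: sigma(t) = sigma_min * (sigma_max / sigma_min) ** t.
    """
    sigma_t = sigma_min * (sigma_max / sigma_min) ** t
    # The signal is left unscaled; the variance sigma(t)^2 - sigma(0)^2 grows.
    std = math.sqrt(max(0.0, sigma_t ** 2 - sigma_min ** 2))
    return 1.0, std


def perturb(x0, mean_scale, std, rng):
    """Draw x_t = mean_scale * x_0 + std * eps with eps ~ N(0, I)."""
    return [mean_scale * v + std * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    features = [0.5, -1.2, 0.3]  # hypothetical stand-in for a modality feature vector
    for name, marginal in (("VP", vp_marginal), ("VE", ve_marginal)):
        scale, std = marginal(0.5)
        print(name, scale, std, perturb(features, scale, std, rng))
```

Under these assumptions, the VP kernel keeps the total variance bounded (the squared scale and the noise variance sum to one), whereas the VE kernel leaves the signal unscaled and lets the noise variance grow with `t`. That difference is what a comparison between the two formulations would probe.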
Files in this item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| ioannisasprogerakas_thesis.pdf | | 28.44 MB | Adobe PDF | View/Open |
All items on this site are protected by copyright.