Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19886
Full metadata record
DC Field: Value
dc.contributor.author: Ασπρογέρακας, Ιωάννης
dc.date.accessioned: 2025-11-04T08:07:14Z
dc.date.available: 2025-11-04T08:07:14Z
dc.date.issued: 2025-10-24
dc.identifier.uri: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19886
dc.description.abstract: Multimodal Emotion Recognition (MER) aims to model human affect by integrating complementary signals from language, vision, and audio. While deep learning methods have achieved impressive results through cross-modal fusion, most assume complete modality availability during training and inference, a condition rarely met in real-world deployments, where occlusions, noise, or sensor failures frequently cause missing modalities. Addressing this problem requires robust imputation strategies that can recover missing signals without sacrificing efficiency. In this work, we explore the design space of diffusion models for missing-modality imputation, building upon and extending the IMDER framework. We propose a decoupled two-stage training scheme in which modality-specific diffusion models are pre-trained independently and then integrated into the MER pipeline. This design avoids the instability of end-to-end IMDER training, where untrained diffusion models initially degrade classifier performance. In addition, we systematically compare stochastic differential equation (SDE) formulations, specifically Variance Preserving (VP) and Variance Exploding (VE) processes, evaluate alternative conditioning mechanisms with transformer-based backbones, and investigate multiple sampling strategies to balance efficiency and accuracy. Extensive experiments on CMU-MOSI and CMU-MOSEI demonstrate consistent improvements under both fixed and random missing protocols. Our quality-focused configuration achieves superior accuracy, with up to +2% F1 and +1.5% ACC2 over IMDER, while delivering 5× faster inference. Our speed-optimized configuration maintains competitive performance (+1% ACC2, +0.5% F1) with 15× faster inference, making it suitable for real-time MER applications.
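For reference, the VP and VE processes compared in the abstract are the standard forward SDE formulations from the score-based generative modeling literature; the notation below (noise schedule β(t), noise scale σ(t), Wiener process w) follows that literature and is not taken from the thesis itself:

```latex
% Variance Preserving (VP) forward SDE:
%   shrinks the signal while injecting noise, keeping variance bounded.
\mathrm{d}\mathbf{x} = -\tfrac{1}{2}\beta(t)\,\mathbf{x}\,\mathrm{d}t
  + \sqrt{\beta(t)}\,\mathrm{d}\mathbf{w}

% Variance Exploding (VE) forward SDE:
%   no drift term; the noise variance sigma^2(t) grows without bound.
\mathrm{d}\mathbf{x} = \sqrt{\frac{\mathrm{d}\bigl[\sigma^{2}(t)\bigr]}{\mathrm{d}t}}\,\mathrm{d}\mathbf{w}
```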
dc.language: en
dc.subject: Diffusion Models
dc.subject: Multimodal Emotion Recognition
dc.subject: Stochastic Differential Equations
dc.subject: Multimodal Deep Learning
dc.subject: Deep Generative Modeling
dc.title: Efficient Incomplete Multimodal-Diffused Emotion Recognition
dc.description.pages: 161
dc.contributor.supervisor: Ποταμιάνος Αλέξανδρος
dc.department: Τομέας Σημάτων, Ελέγχου και Ρομποτικής (Division of Signals, Control and Robotics)
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: ioannisasprogerakas_thesis.pdf
Size: 28.44 MB
Format: Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.