Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19886
Title: Efficient Incomplete Multimodal-Diffused Emotion Recognition
Authors: Ασπρογέρακας, Ιωάννης
Ποταμιάνος, Αλέξανδρος
Keywords: Diffusion Models
Multimodal Emotion Recognition
Stochastic Differential Equations
Multimodal Deep Learning
Deep Generative Modeling
Issue Date: 24-Oct-2025
Abstract: Multimodal Emotion Recognition (MER) aims to model human affect by integrating complementary signals from language, vision, and audio. While deep learning methods have achieved impressive results through cross-modal fusion, most assume that all modalities are available during both training and inference, a condition rarely met in real-world deployments, where occlusions, noise, or sensor failures frequently cause missing modalities. Addressing this problem requires robust imputation strategies that can recover missing signals without sacrificing efficiency. In this work, we explore the design space of diffusion models for missing-modality imputation, building upon and extending the IMDER framework. We propose a decoupled two-stage training scheme in which modality-specific diffusion models are pre-trained independently and then integrated into the MER pipeline. This design avoids the instability of end-to-end IMDER training, where untrained diffusion models initially degrade classifier performance. In addition, we systematically compare stochastic differential equation (SDE) formulations, specifically Variance Preserving (VP) and Variance Exploding (VE) processes, evaluate alternative conditioning mechanisms with transformer-based backbones, and investigate multiple sampling strategies to balance efficiency and accuracy. Extensive experiments on CMU-MOSI and CMU-MOSEI demonstrate consistent improvements under both fixed and random missing protocols. Our quality-focused configuration improves on IMDER by up to +2% F1 and +1.5% ACC2 while delivering 5× faster inference. Our speed-optimized configuration remains competitive (+1% ACC2, +0.5% F1) while achieving 15× faster inference, making it viable for real-time MER applications.
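For readers unfamiliar with the two SDE families the abstract compares, the following is a minimal sketch of their forward perturbation kernels in the standard score-based formulation: a VP process with a linear noise schedule and a VE process with a geometric one. It is not taken from the thesis. The function names and the default values of beta_min, beta_max, sigma_min and sigma_max are illustrative assumptions, and the thesis may use different schedules or parameters.

```python
import math
import random


def vp_marginal(t, beta_min=0.1, beta_max=20.0):
    """Mean scale and std of x_t given x_0 under a VP SDE.

    Assumes a linear schedule beta(t) = beta_min + t * (beta_max - beta_min)
    on t in [0, 1]; the default values are illustrative, not from the thesis.
    """
    # Integral of beta(s) ds from 0 to t.
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t * t
    mean_scale = math.exp(-0.5 * integral)
    std = math.sqrt(1.0 - math.exp(-integral))
    return mean_scale, std


def ve_marginal(t, sigma_min=0.01, sigma_max=50.0):
    """Mean scale and std of x_t given x_0 under a VE SDE.

    Assumes a geometric schedule sigma(t) = sigma_min * (sigma_max / sigma_min) ** t;
    the default values are illustrative, not from the thesis.
    """
    sigma_t = sigma_min * (sigma_max / sigma_min) ** t
    # The mean is left untouched; the variance grows as sigma(t)^2 - sigma(0)^2.
    std = math.sqrt(sigma_t ** 2 - sigma_min ** 2)
    return 1.0, std


def perturb(x0, t, marginal, rng):
    """Sample x_t given x_0 (a list of floats) from the chosen marginal."""
    mean_scale, std = marginal(t)
    return [mean_scale * v + std * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    features = [0.5, -1.0, 2.0]  # stand-in for one modality's feature vector
    print(perturb(features, 0.5, vp_marginal, rng))
    print(perturb(features, 0.5, ve_marginal, rng))
```

Under the VP kernel the signal is attenuated as noise is added, so the squared mean scale and the variance always sum to one. Under the VE kernel the signal is kept at full scale and the noise variance grows without that bound, which is the distinction the two names refer to.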
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19886
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: ioannisasprogerakas_thesis.pdf
Size: 28.44 MB
Format: Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.