Investigating Optimization Techniques for Multimodal Neural Networks

Καφφέζα, Ιωάννα

National Technical University of Athens

School of Electrical and Computer Engineering

Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19545

Title:	Investigating Optimization Techniques for Multimodal Neural Networks
Authors:	Καφφέζα, Ιωάννα; Ποταμιάνος, Αλέξανδρος
Keywords:	Machine Learning; Multimodal Neural Networks; Sentiment Analysis; Backpropagation Algorithm; Imbalanced Learning; Optimization Techniques
Issue Date:	25-Feb-2025
Abstract:	Multimodal learning has gained significant attention in sentiment analysis, yet multimodal models often perform worse than their unimodal counterparts, a counterintuitive phenomenon. Imbalanced learning dynamics, in which certain modalities dominate the learning process while others remain underutilized, lead to suboptimal model performance. This thesis investigates the impact of optimization techniques on multimodal neural networks, focusing on how different strategies influence imbalanced learning dynamics in sentiment analysis. We evaluate two categories of optimization techniques on the CMU-MOSI and CMU-MOSEI datasets for sentiment classification. The first category, comprising OGM-GE and AGM, applies direct gradient adjustments during backpropagation to balance the contributions of each modality. The second, comprising PMR and ReconBoost, takes a multi-loss approach: PMR introduces a penalty-boosting loss scheme, while ReconBoost incorporates an alternating learning paradigm. Additionally, we assess architectural choices, including optimizer selection, batch size, and the use of a development set for unbiased auxiliary calculations in dynamic adjustments. While gradient-based and multi-loss approaches help balance learning dynamics, no single method fully resolves modality imbalance in our tasks. Established baselines, such as Late Concatenation and Uni-Pre Finetuned, remain superior in accuracy. The use of a development set enhances stability and reduces bias, while Adam proves to be the most effective optimizer. Despite these advances, multimodal optimization remains an open challenge: dynamic optimization techniques improve modality balance but do not consistently enhance overall performance, highlighting the need for more adaptive and modality-aware optimization strategies. These findings provide a deeper understanding of multimodal learning dynamics and offer insights for future work in multimodal sentiment analysis.
URI:	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19545
Appears in Collections:	Διπλωματικές Εργασίες - Theses
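
Illustrative note: the abstract describes methods that apply direct gradient adjustments during backpropagation so that no single modality dominates. The sketch below shows one way such an adjustment could look for two modalities. It is not taken from the thesis: the function names, the tanh-based coefficient, and the `alpha` parameter are assumptions chosen for illustration, and the noise term that some methods add after scaling is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modulation_coefficients(logits_a, logits_b, labels, alpha=1.0):
    """Return (coef_a, coef_b), scaling factors for each modality's gradients.

    logits_a, logits_b: per-sample unimodal logits for a batch (hypothetical inputs).
    labels: the true class index for each sample.
    alpha: assumed hyperparameter controlling how strongly the leader is damped.
    """
    # Batch-level confidence of each modality on the correct class.
    score_a = sum(softmax(l)[y] for l, y in zip(logits_a, labels))
    score_b = sum(softmax(l)[y] for l, y in zip(logits_b, labels))

    # Discrepancy ratios: a value above 1 means that modality currently dominates.
    ratio_a = score_a / score_b
    ratio_b = score_b / score_a

    # Assumption: damp only the dominant modality; leave the weaker one untouched.
    coef_a = 1.0 - math.tanh(alpha * ratio_a) if ratio_a > 1.0 else 1.0
    coef_b = 1.0 - math.tanh(alpha * ratio_b) if ratio_b > 1.0 else 1.0
    return coef_a, coef_b


def scale_gradients(grads, coef):
    """Apply a modulation coefficient to a flat list of gradient values."""
    return [g * coef for g in grads]


if __name__ == "__main__":
    # Modality A is confident and correct; modality B is uninformative.
    batch_a = [[4.0, 0.0], [0.0, 4.0]]
    batch_b = [[0.0, 0.0], [0.0, 0.0]]
    coef_a, coef_b = modulation_coefficients(batch_a, batch_b, labels=[0, 1])
    print("coefficients:", coef_a, coef_b)
    print("scaled gradients for A:", scale_gradients([0.5, -0.2], coef_a))
```

In this sketch the dominant modality's encoder gradients shrink while the weaker modality's gradients pass through unchanged, which is the balancing effect the abstract attributes to the gradient-based category.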

Files in This Item:

File	Description	Size	Format
ioanna_kaffeza_thesis.pdf		4.85 MB	Adobe PDF
