Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17971
Title: Context-Based Visual Emotion Recognition Using Deep Neural Networks
Authors: Πίκουλης, Ιωάννης
Μαραγκός, Πέτρος
Keywords: emotion recognition
deep neural networks
body
face
pose
visual-semantic context
CNN
GCN
TSN
ST-GCN
network ensemble
Issue Date: 24-Jun-2021
Abstract: Visual emotion recognition constitutes a major subject in the interdisciplinary field of Computer Vision. It concerns the identification of human emotion at a categorical (discrete) and/or dimensional (continuous) level, as depicted in still images or video sequences. A review of the related literature reveals that most past efforts in visual emotion recognition have been limited to the analysis of facial expressions, while some studies have either incorporated information about body pose or have attempted emotion recognition solely on the basis of body movements and gestures. While some of these approaches perform well in controlled environments, they fail in real-world scenarios, where unpredictable social settings can render one or more of the aforementioned sources of affective information inaccessible. Moreover, evidence from psychological studies suggests that visual context, in addition to facial expression and body pose, provides important cues for the perception of people's emotions. In this work, we aim to reinforce the concept of context-based visual emotion recognition. To this end, we conduct extensive experiments on two newly assembled and challenging databases, namely the EMOTions In Context (EMOTIC) database and the Body Language Dataset (BoLD), tackling both the image-based and the video-based versions of the problem. More specifically, we:
• Extend already successful baseline architectures by incorporating multiple input streams that encode bodily, facial, contextual, and scene-related features, thus enhancing our models' understanding of visual context and of emotion in general (a fusion sketch follows the abstract).
• Directly infuse scene classification scores and attributes as additional features into the emotion recognition process, where they function in a complementary manner with respect to all other sources of affective information. To the best of our knowledge, our approach is the first to do so.
• Exploit the categorical emotion label dependencies that reside within the datasets, through the use of Graph Convolutional Networks (GCN) and the addition of a metric-learning-inspired loss based on GloVe word embeddings (see the second and third sketches below).
• Achieve competitive results on EMOTIC and significant improvements over state-of-the-art techniques on BoLD.
A large portion of our contributions [86] was submitted to the 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG), with the authors being Ioannis Pikoulis, Panagiotis Paraskevas Filntisis and Petros Maragos.
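To make the multi-stream fusion with infused scene scores concrete, here is a minimal PyTorch sketch, not the thesis code: the module and parameter names are hypothetical, and the scene-score dimensionality (e.g. 365 Places365 scene classes plus 102 scene attributes) is an assumption; only the 26 categorical emotion labels of EMOTIC come from the dataset itself.

```python
# Hypothetical sketch of the multi-stream fusion idea: features from body,
# face and whole-image (context) streams are concatenated with scene
# classification scores/attributes before the emotion classification head.
# All names and dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class ContextFusionHead(nn.Module):
    def __init__(self, body_dim=2048, face_dim=512, context_dim=2048,
                 num_scene_scores=365 + 102,   # scene classes + attributes (assumed)
                 num_emotions=26):             # EMOTIC's 26 categorical labels
        super().__init__()
        fused_dim = body_dim + face_dim + context_dim + num_scene_scores
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_emotions),
        )

    def forward(self, body_feat, face_feat, context_feat, scene_scores):
        # Scene scores act as an extra, complementary feature stream.
        fused = torch.cat([body_feat, face_feat, context_feat, scene_scores], dim=1)
        return self.classifier(fused)

# Usage with random tensors standing in for per-stream CNN features:
head = ContextFusionHead()
logits = head(torch.randn(4, 2048), torch.randn(4, 512),
              torch.randn(4, 2048), torch.randn(4, 467))
print(logits.shape)  # torch.Size([4, 26])
```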
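Exploiting label dependencies with a GCN can be pictured in the spirit of ML-GCN (Chen et al., 2019): GloVe embeddings of the emotion-label words are propagated over a label co-occurrence graph, and the resulting node vectors act as per-class classifiers applied to the image features. The sketch below is a generic illustration under those assumptions; in practice the adjacency matrix would be estimated from label co-occurrence statistics rather than the identity placeholder used here.

```python
# Minimal sketch of a two-layer label GCN producing per-class classifiers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelGCN(nn.Module):
    def __init__(self, adj, emb_dim=300, hidden_dim=512, feat_dim=2048):
        super().__init__()
        # adj: (C, C) normalized label co-occurrence matrix (with self-loops)
        self.register_buffer("adj", adj)
        self.w1 = nn.Linear(emb_dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, feat_dim, bias=False)

    def forward(self, label_embeddings, image_features):
        # Graph convolution: H' = A H W, applied twice
        h = F.leaky_relu(self.adj @ self.w1(label_embeddings), 0.2)
        classifiers = self.adj @ self.w2(h)       # (C, feat_dim)
        return image_features @ classifiers.t()   # (B, C) logits

C = 26                       # EMOTIC categorical labels
adj = torch.eye(C)           # placeholder; real A comes from co-occurrence stats
glove = torch.randn(C, 300)  # placeholder for GloVe vectors of the label words
logits = LabelGCN(adj)(glove, torch.randn(8, 2048))
print(logits.shape)          # torch.Size([8, 26])
```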
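For the metric-learning-inspired GloVe loss, one plausible formulation pulls a predicted embedding toward the mean GloVe vector of the ground-truth emotion labels under cosine distance; the exact loss used in the thesis may differ, so the following is purely illustrative.

```python
# Hypothetical auxiliary loss over GloVe word embeddings.
import torch
import torch.nn.functional as F

def glove_embedding_loss(pred_emb, target_multi_hot, label_glove):
    # pred_emb:         (B, 300) predicted emotion embeddings
    # target_multi_hot: (B, C)   ground-truth categorical labels
    # label_glove:      (C, 300) GloVe vectors of the emotion words
    weights = target_multi_hot / target_multi_hot.sum(dim=1, keepdim=True).clamp(min=1)
    target_emb = weights @ label_glove  # mean embedding of the positive labels
    return (1 - F.cosine_similarity(pred_emb, target_emb, dim=1)).mean()

loss = glove_embedding_loss(torch.randn(8, 300),
                            (torch.rand(8, 26) > 0.8).float(),
                            torch.randn(26, 300))
print(loss.item())
```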
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17971
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: el15198_thesis_final.pdf
Description: Diploma Thesis - Main Document (Διπλωματική Εργασία - Κύριο Έγγραφο)
Size: 22.94 MB
Format: Adobe PDF

