Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17971
Title: Context-Based Visual Emotion Recognition Using Deep Neural Networks
Authors: Πίκουλης, Ιωάννης
Μαραγκός, Πέτρος
Keywords: emotion recognition
deep neural networks
body
face
pose
visual-semantic context
CNN
GCN
TSN
ST-GCN
network ensemble
Issue Date: 24-Jun-2021
Abstract: Visual emotion recognition constitutes a major subject in the interdisciplinary field of Computer Vision. It concerns the identification of human emotion at a categorical (discrete) and/or dimensional (continuous) level, as depicted in still images or video sequences. A review of the related literature reveals that most past efforts in visual emotion recognition have been limited to the analysis of facial expressions, while some studies have either incorporated information about body pose or have attempted emotion recognition solely on the basis of body movements and gestures. While some of these approaches perform well in controlled environments, they fail in real-world scenarios, where unpredictable social settings can render one or more of the aforementioned sources of affective information inaccessible. Moreover, evidence from psychological studies suggests that visual context, in addition to facial expression and body pose, provides important cues for the perception of people's emotions. In this work, we aim to reinforce the concept of context-based visual emotion recognition. To this end, we conduct extensive experiments on two newly assembled and challenging databases, namely the EMOTions In Context (EMOTIC) database and the Body Language Dataset (BoLD), tackling both the image-based and the video-based versions of the problem. More specifically, we:
• Extend already successful baseline architectures by incorporating multiple input streams that encode bodily, facial, contextual, and scene-related features, thus enhancing our models' understanding of visual context and of emotion in general (a fusion sketch follows the abstract).
• Directly infuse scene classification scores and attributes as additional features into the emotion recognition process, where they function in a complementary manner with respect to all other sources of affective information. To the best of our knowledge, our approach is the first to do so.
• Exploit the categorical emotion label dependencies that reside within the datasets, through the use of Graph Convolutional Networks (GCN) and the addition of a metric-learning-inspired loss based on GloVe word embeddings (see the second and third sketches below).
• Achieve competitive results on EMOTIC and significant improvements over state-of-the-art techniques on BoLD.
A large portion of our contributions [86] was submitted to the 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG), with the authors being Ioannis Pikoulis, Panagiotis Paraskevas Filntisis and Petros Maragos.
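To make the multi-stream fusion with infused scene scores concrete, here is a minimal PyTorch sketch, not the thesis code: the module and parameter names are hypothetical, and the scene-score dimensionality (e.g. 365 Places365 scene classes plus 102 scene attributes) is an assumption; only the 26 categorical emotion labels of EMOTIC come from the dataset itself.

```python
# Hypothetical sketch of the multi-stream fusion idea: features from body,
# face and whole-image (context) streams are concatenated with scene
# classification scores/attributes before the emotion classification head.
# All names and dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class ContextFusionHead(nn.Module):
    def __init__(self, body_dim=2048, face_dim=512, context_dim=2048,
                 num_scene_scores=365 + 102,   # scene classes + attributes (assumed)
                 num_emotions=26):             # EMOTIC's 26 categorical labels
        super().__init__()
        fused_dim = body_dim + face_dim + context_dim + num_scene_scores
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_emotions),
        )

    def forward(self, body_feat, face_feat, context_feat, scene_scores):
        # Scene scores act as an extra, complementary feature stream.
        fused = torch.cat([body_feat, face_feat, context_feat, scene_scores], dim=1)
        return self.classifier(fused)

# Usage with random tensors standing in for per-stream CNN features:
head = ContextFusionHead()
logits = head(torch.randn(4, 2048), torch.randn(4, 512),
              torch.randn(4, 2048), torch.randn(4, 467))
print(logits.shape)  # torch.Size([4, 26])
```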
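Exploiting label dependencies with a GCN can be pictured in the spirit of ML-GCN (Chen et al., 2019): GloVe embeddings of the emotion-label words are propagated over a label co-occurrence graph, and the resulting node vectors act as per-class classifiers applied to the image features. The sketch below is a generic illustration under those assumptions; in practice the adjacency matrix would be estimated from label co-occurrence statistics rather than the identity placeholder used here.

```python
# Minimal sketch of a two-layer label GCN producing per-class classifiers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelGCN(nn.Module):
    def __init__(self, adj, emb_dim=300, hidden_dim=512, feat_dim=2048):
        super().__init__()
        # adj: (C, C) normalized label co-occurrence matrix (with self-loops)
        self.register_buffer("adj", adj)
        self.w1 = nn.Linear(emb_dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, feat_dim, bias=False)

    def forward(self, label_embeddings, image_features):
        # Graph convolution: H' = A H W, applied twice
        h = F.leaky_relu(self.adj @ self.w1(label_embeddings), 0.2)
        classifiers = self.adj @ self.w2(h)       # (C, feat_dim)
        return image_features @ classifiers.t()   # (B, C) logits

C = 26                       # EMOTIC categorical labels
adj = torch.eye(C)           # placeholder; real A comes from co-occurrence stats
glove = torch.randn(C, 300)  # placeholder for GloVe vectors of the label words
logits = LabelGCN(adj)(glove, torch.randn(8, 2048))
print(logits.shape)          # torch.Size([8, 26])
```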
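For the metric-learning-inspired GloVe loss, one plausible formulation pulls a predicted embedding toward the mean GloVe vector of the ground-truth emotion labels under cosine distance; the exact loss used in the thesis may differ, so the following is purely illustrative.

```python
# Hypothetical auxiliary loss over GloVe word embeddings.
import torch
import torch.nn.functional as F

def glove_embedding_loss(pred_emb, target_multi_hot, label_glove):
    # pred_emb:         (B, 300) predicted emotion embeddings
    # target_multi_hot: (B, C)   ground-truth categorical labels
    # label_glove:      (C, 300) GloVe vectors of the emotion words
    weights = target_multi_hot / target_multi_hot.sum(dim=1, keepdim=True).clamp(min=1)
    target_emb = weights @ label_glove  # mean embedding of the positive labels
    return (1 - F.cosine_similarity(pred_emb, target_emb, dim=1)).mean()

loss = glove_embedding_loss(torch.randn(8, 300),
                            (torch.rand(8, 26) > 0.8).float(),
                            torch.randn(26, 300))
print(loss.item())
```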
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17971
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: el15198_thesis_final.pdf
Description: Diploma Thesis - Main Document (Διπλωματική Εργασία - Κύριο Έγγραφο)
Size: 22.94 MB
Format: Adobe PDF

