Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18566
Title: Multimodal Deep Learning for Emotion Recognition and Expression Synthesis with Applications in Human-Robot Interaction
Authors: Φιλντίσης, Παναγιώτης Παρασκευάς
Μαραγκός, Πέτρος
Keywords: Emotion Recognition, Affect, Body Language, Context, Expression Synthesis, Audiovisual, Deep Learning, 3D Reconstruction, Visual Speech, 3D Morphable Models
Issue Date: 7-Nov-2022
Abstract: Affective computing is an exciting new research area with the goal of equipping computers and robots with the capability of recognizing, expressing, modeling, and even "feeling" emotions. An interdisciplinary field, affective computing draws resources from computer science, mathematics, cognitive science, and psychology. In this thesis, which is split into two major parts, we explore two aspects of affective computing, namely "emotion recognition" and "expression synthesis", since they constitute the most important aspects one needs to consider when building human-robot interaction systems. To this end, in the first part we explore and study various information streams that carry valuable information for recognizing the emotions of a human, and design deep learning architectures that can efficiently combine information from these streams, with the ultimate goal of deploying the system for human-robot interaction, with an emphasis on child-robot interaction scenarios. While traditional approaches to emotion recognition have mostly focused on facial expressions and speech, we also take into account body language and context, and employ embeddings that accurately capture the semantic distances between discrete emotions. In the second part, we first enhance existing methods for audiovisual speech synthesis by giving them the capability to both combine emotions and express them at different intensity levels. Then, we design a deep learning-based architecture for expressive audiovisual speech synthesis that achieves a high level of realism and expressiveness, outperforming previous methods. Lastly, we present the first method for visual speech-aware monocular perceptual 3D reconstruction in the wild. This work tackles the traditional bottleneck of collecting high-fidelity 3D ground-truth data and offers the field of affective computing an easier way to acquire expressive 3D facial data from monocular videos.
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18566
Appears in Collections: Ph.D. Theses

Files in This Item:
File: Ph__D__Dissertation-5.pdf
Description: Doctoral dissertation text
Size: 53.1 MB
Format: Adobe PDF


All items on this site are protected by copyright.