Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18259
Full metadata record
DC Field: Value [Language]
dc.contributor.author: Παραπέρας Παπαντωνίου, Φοίβος
dc.date.accessioned: 2022-03-10T08:40:56Z
dc.date.available: 2022-03-10T08:40:56Z
dc.date.issued: 2022-02-28
dc.identifier.uri: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18259
dc.description.abstract: Recent advances in generative Deep Learning have made it possible to synthesize and manipulate images and videos with unprecedented realism, giving rise to a plethora of creative applications at the intersection of Computer Vision and Computer Graphics. In particular, a class of generative models known as Generative Adversarial Networks (GANs) has proven remarkably successful at generating images of human faces, ushering in a new era of synthetic visual facial content known as “deepfakes”. Deepfake techniques such as face swapping or attribute manipulation (e.g. hair color, gender) have become popular because they rely solely on neural networks, without requiring expertise in digital effects. Yet, when it comes to manipulating the dynamic facial expressions encountered in videos of talking faces, explicit prior knowledge of the face’s structure is usually needed. To this end, challenging applications such as face reenactment typically employ 3D face representations obtained by fitting a statistical morphable model (3D Morphable Model, 3DMM) to a given image or video in a way that disentangles the facial expressions from the face’s remaining modes of variation (a generic sketch of such a parametric model is given after this record). Still, these methods are often limited to making a target actor directly mimic the expressions of a source actor, without any further semantic control over those expressions. Motivated by this, our goal in this thesis is simple yet challenging: to develop a novel deepfake system that alters the dynamic emotion conveyed by an actor in a video in an easily interpretable way, i.e. using, even as sole input, the semantic labels of the desired emotions, while preserving the original words of the talking person. Our main contributions can be summarized as follows:
• We perform an in-depth review of the literature on photo-realistic emotion manipulation in face images, drawing conclusions about the limitations and challenges of the current state of the art (SOTA). We also provide an overview of the latest developments in 3D face modelling and GAN-based image synthesis, some of which are carefully integrated into our system.
• We propose the first, to our knowledge, deep learning method, which we call Neural Emotion Director, for “directing” the emotional state of actors in unconstrained (“in-the-wild”) videos by translating their facial expressions into multiple unseen emotions or styles without altering the lip movements.
• We introduce a GAN-based network, called the 3D-based Emotion Manipulator, that receives a sequence of facial expression parameters across consecutive frames and translates them to a given target emotion or a specific reference style (a hypothetical sketch of this interface follows the record below). We then design a video-based neural face renderer that decodes the parametric representation of the altered expressions back into photo-realistic frames; we modify only the face area, while the background remains unchanged (see the compositing sketch below).
• We assess our method through extensive qualitative and quantitative experiments, user and ablation studies, and comparisons with recent state-of-the-art methods, demonstrating its superiority and advantages. We achieve promising results even in very challenging scenarios, such as movie scenes with moving background objects.
Our work [93] was accepted to the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), with authors Foivos Paraperas Papantoniou, Panagiotis P. Filntisis, Petros Maragos and Anastasios Roussos. Our demo YouTube video and source code can be found on our project website: https://foivospar.github.io/NED/ [en_US]
dc.language: en [en_US]
dc.subject: emotion manipulation [en_US]
dc.subject: facial expressions [en_US]
dc.subject: deepfakes [en_US]
dc.subject: GANs [en_US]
dc.subject: 3DMMs [en_US]
dc.subject: neural rendering [en_US]
dc.subject: deep neural networks [en_US]
dc.subject: video editing [en_US]
dc.subject: VFX [en_US]
dc.title: Photo-realistic neural rendering for emotion-related semantic manipulation of unconstrained facial videos [en_US]
dc.description.pages: 118 [en_US]
dc.contributor.supervisor: Μαραγκός Πέτρος (Petros Maragos) [en_US]
dc.department: Τομέας Σημάτων, Ελέγχου και Ρομποτικής (Division of Signals, Control and Robotics) [en_US]
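
The abstract appeals to 3DMM fitting that disentangles expression from the face’s remaining modes of variation. As a point of reference, the following is a minimal Python/NumPy sketch of the standard linear morphable-model formulation; all names and dimensions here are illustrative assumptions, not the model or code used in the thesis.

import numpy as np

def reconstruct_face(mean_shape, id_basis, exp_basis, alpha, beta):
    # Linear 3DMM: S = s_mean + U_id @ alpha + U_exp @ beta.
    # alpha controls identity and beta controls expression; because the two
    # bases are separate, beta can be edited without changing the identity.
    return mean_shape + id_basis @ alpha + exp_basis @ beta

# Toy, made-up dimensions: 3N stacked vertex coordinates,
# 80 identity modes, 50 expression modes.
N = 5023
mean_shape = np.zeros(3 * N)
id_basis = np.random.randn(3 * N, 80) * 1e-3
exp_basis = np.random.randn(3 * N, 50) * 1e-3
alpha, beta = np.random.randn(80), np.random.randn(50)
vertices = reconstruct_face(mean_shape, id_basis, exp_basis, alpha, beta).reshape(N, 3)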
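
The 3D-based Emotion Manipulator is described only at the interface level: a sequence of per-frame expression parameters goes in, and the same sequence re-enacted in a target emotion comes out. The PyTorch sketch below imitates that interface with a hypothetical recurrent generator conditioned on a discrete emotion label; the architecture, dimensions and conditioning scheme are all assumptions and do not reproduce the thesis’s actual GAN.

import torch
import torch.nn as nn

class ExpressionTranslator(nn.Module):
    """Hypothetical generator: maps (B, T, exp_dim) expression-parameter
    sequences to sequences of the same shape in a target emotion."""
    def __init__(self, exp_dim=50, num_emotions=7, hidden=128):
        super().__init__()
        self.emotion_embed = nn.Embedding(num_emotions, hidden)
        self.encoder = nn.GRU(exp_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, exp_dim)

    def forward(self, exp_seq, target_emotion):
        h, _ = self.encoder(exp_seq)                 # (B, T, hidden)
        style = self.emotion_embed(target_emotion)   # (B, hidden)
        h, _ = self.decoder(h + style.unsqueeze(1))  # condition every frame
        return self.head(h)                          # (B, T, exp_dim)

model = ExpressionTranslator()
clip = torch.randn(2, 30, 50)      # 2 clips, 30 frames, 50 expression params
target = torch.tensor([0, 3])      # invented emotion-label indices
translated = model(clip, target)   # same shape as `clip`

In a full GAN, a generator of this kind would be trained against a discriminator judging whether the translated sequences look like genuine examples of the target emotion; that training loop is omitted here.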
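
Finally, the abstract states that only the face area is modified while the background remains unchanged, which suggests a per-pixel masked composite of the neural renderer’s output into the original frame. The sketch below shows such a blend; the mask source and all names are assumptions rather than the renderer described in the thesis.

import numpy as np

def composite(frame, rendered_face, face_mask):
    # face_mask: (H, W) soft mask, 1.0 inside the face region, 0.0 outside.
    # Background pixels come straight from the original frame, so the
    # background is untouched by construction.
    m = face_mask[..., None].astype(np.float32)
    out = m * rendered_face.astype(np.float32) + (1.0 - m) * frame.astype(np.float32)
    return out.astype(frame.dtype)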
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: FoivosPP_NTUA_thesis_final.pdf (56.25 MB, Adobe PDF)


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.