Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18259
Full metadata record
DC Field: Value [Language]
dc.contributor.author: Παραπέρας Παπαντωνίου, Φοίβος
dc.date.accessioned: 2022-03-10T08:40:56Z
dc.date.available: 2022-03-10T08:40:56Z
dc.date.issued: 2022-02-28
dc.identifier.uri: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18259
dc.description.abstract: Recent advances in generative Deep Learning have made it possible to synthesize and manipulate images and videos with unprecedented realism, giving rise to a plethora of creative applications at the intersection of Computer Vision and Computer Graphics. In particular, a class of generative models known as Generative Adversarial Networks (GANs) has proven remarkably successful at generating images of human faces, ushering in a new era of synthetic visual facial content known as “deepfakes”. Deepfake techniques such as face swapping or attribute manipulation (e.g. hair color, gender) have become popular because they rely solely on neural networks, without requiring expertise in digital effects. Yet, when it comes to manipulating the dynamic facial expressions encountered in videos of talking faces, explicit prior knowledge of the face’s structure is usually needed. To this end, challenging applications such as face reenactment typically employ 3D face representations obtained by fitting a statistical morphable model (3D Morphable Model, 3DMM) to a given image or video in a way that disentangles the facial expressions from the face’s remaining modes of variation (a generic sketch of such a parametric model is given after this record). Still, these methods are often limited to making a target actor directly mimic the expressions of a source actor, without any further semantic control over those expressions. Motivated by this, our goal in this thesis is simple yet challenging: to develop a novel deepfake system that alters the dynamic emotion conveyed by an actor in a video in an easily interpretable way, i.e. using, even as sole input, the semantic labels of the desired emotions, while preserving the original words of the talking person. Our main contributions can be summarized as follows:
• We perform an in-depth review of the literature on photo-realistic emotion manipulation in face images, drawing conclusions about the limitations and challenges of the current state of the art (SOTA). We also provide an overview of the latest developments in 3D face modelling and GAN-based image synthesis, some of which are carefully integrated into our system.
• We propose the first, to our knowledge, deep learning method, which we call Neural Emotion Director, for “directing” the emotional state of actors in unconstrained (“in-the-wild”) videos by translating their facial expressions into multiple unseen emotions or styles without altering the lip movements.
• We introduce a GAN-based network, called the 3D-based Emotion Manipulator, that receives a sequence of facial expression parameters across consecutive frames and translates them to a given target emotion or a specific reference style (a hypothetical sketch of this interface follows the record below). We then design a video-based neural face renderer that decodes the parametric representation of the altered expressions back into photo-realistic frames; we modify only the face area, while the background remains unchanged (see the compositing sketch below).
• We assess our method through extensive qualitative and quantitative experiments, user and ablation studies, and comparisons with recent state-of-the-art methods, demonstrating its superiority and advantages. We achieve promising results even in very challenging scenarios, such as movie scenes with moving background objects.
Our work [93] was accepted to the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), with authors Foivos Paraperas Papantoniou, Panagiotis P. Filntisis, Petros Maragos and Anastasios Roussos. Our demo YouTube video and source code can be found on our project website: https://foivospar.github.io/NED/ [en_US]
dc.language: en [en_US]
dc.subject: emotion manipulation [en_US]
dc.subject: facial expressions [en_US]
dc.subject: deepfakes [en_US]
dc.subject: GANs [en_US]
dc.subject: 3DMMs [en_US]
dc.subject: neural rendering [en_US]
dc.subject: deep neural networks [en_US]
dc.subject: video editing [en_US]
dc.subject: VFX [en_US]
dc.title: Photo-realistic neural rendering for emotion-related semantic manipulation of unconstrained facial videos [en_US]
dc.description.pages: 118 [en_US]
dc.contributor.supervisor: Μαραγκός Πέτρος (Petros Maragos) [en_US]
dc.department: Τομέας Σημάτων, Ελέγχου και Ρομποτικής (Division of Signals, Control and Robotics) [en_US]
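
The abstract appeals to 3DMM fitting that disentangles expression from the face’s remaining modes of variation. As a point of reference, the following is a minimal Python/NumPy sketch of the standard linear morphable-model formulation; all names and dimensions here are illustrative assumptions, not the model or code used in the thesis.

import numpy as np

def reconstruct_face(mean_shape, id_basis, exp_basis, alpha, beta):
    # Linear 3DMM: S = s_mean + U_id @ alpha + U_exp @ beta.
    # alpha controls identity and beta controls expression; because the two
    # bases are separate, beta can be edited without changing the identity.
    return mean_shape + id_basis @ alpha + exp_basis @ beta

# Toy, made-up dimensions: 3N stacked vertex coordinates,
# 80 identity modes, 50 expression modes.
N = 5023
mean_shape = np.zeros(3 * N)
id_basis = np.random.randn(3 * N, 80) * 1e-3
exp_basis = np.random.randn(3 * N, 50) * 1e-3
alpha, beta = np.random.randn(80), np.random.randn(50)
vertices = reconstruct_face(mean_shape, id_basis, exp_basis, alpha, beta).reshape(N, 3)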
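
The 3D-based Emotion Manipulator is described only at the interface level: a sequence of per-frame expression parameters goes in, and the same sequence re-enacted in a target emotion comes out. The PyTorch sketch below imitates that interface with a hypothetical recurrent generator conditioned on a discrete emotion label; the architecture, dimensions and conditioning scheme are all assumptions and do not reproduce the thesis’s actual GAN.

import torch
import torch.nn as nn

class ExpressionTranslator(nn.Module):
    """Hypothetical generator: maps (B, T, exp_dim) expression-parameter
    sequences to sequences of the same shape in a target emotion."""
    def __init__(self, exp_dim=50, num_emotions=7, hidden=128):
        super().__init__()
        self.emotion_embed = nn.Embedding(num_emotions, hidden)
        self.encoder = nn.GRU(exp_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, exp_dim)

    def forward(self, exp_seq, target_emotion):
        h, _ = self.encoder(exp_seq)                 # (B, T, hidden)
        style = self.emotion_embed(target_emotion)   # (B, hidden)
        h, _ = self.decoder(h + style.unsqueeze(1))  # condition every frame
        return self.head(h)                          # (B, T, exp_dim)

model = ExpressionTranslator()
clip = torch.randn(2, 30, 50)      # 2 clips, 30 frames, 50 expression params
target = torch.tensor([0, 3])      # invented emotion-label indices
translated = model(clip, target)   # same shape as `clip`

In a full GAN, a generator of this kind would be trained against a discriminator judging whether the translated sequences look like genuine examples of the target emotion; that training loop is omitted here.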
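
Finally, the abstract states that only the face area is modified while the background remains unchanged, which suggests a per-pixel masked composite of the neural renderer’s output into the original frame. The sketch below shows such a blend; the mask source and all names are assumptions rather than the renderer described in the thesis.

import numpy as np

def composite(frame, rendered_face, face_mask):
    # face_mask: (H, W) soft mask, 1.0 inside the face region, 0.0 outside.
    # Background pixels come straight from the original frame, so the
    # background is untouched by construction.
    m = face_mask[..., None].astype(np.float32)
    out = m * rendered_face.astype(np.float32) + (1.0 - m) * frame.astype(np.float32)
    return out.astype(frame.dtype)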
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: FoivosPP_NTUA_thesis_final.pdf (56.25 MB, Adobe PDF)


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.