Title: Video Anonymization and Neural Rendering of Photo-realistic Human Actor Videos with applications to Sign Language
Authors: Tze, Christina Ourania
Maragos, Petros
Keywords: Sign Language Video Anonymization
Sign Language Production
Photo-realistic Video Synthesis
Generative Adversarial Networks
Neural Rendering
Human Motion Retargeting
Performance Cloning
3D Human Pose Estimation
Issue Date: 17-Oct-2022
Abstract: Sign languages are languages that have evolved in deaf communities and use the visual-manual modality to convey meaning. Computer vision researchers have been studying sign languages for the last three decades. However, for many years, research focused on recognizing isolated signs, mainly due to the lack of large-scale datasets for training and evaluation. Participants' reluctance to contribute to data collection was partly related to their worries about privacy and video misuse, and continues to concern the research community. More recently, the availability of some sign language (SL) corpora, as well as the development of algorithms that can learn from weak annotations, has moved research towards continuous sign language recognition (CSLR) and sign language translation (SLT), i.e., recognizing signs from continuous signing videos and translating sign languages to spoken languages, respectively. One of the most challenging open problems of SL technologies is the generation of synthetic SL videos that allow SL users to experience natural and fluid communication, similar to human-to-human SL communication. Most existing sign language production (SLP) techniques are based on the animation of a computer-generated 3D avatar, followed by traditional 3D graphics rendering. However, this typically results in a low level of realism, as far as the appearance and motion of the avatars are concerned. As with immature speech-synthesis technologies (e.g., robot-like synthesized voices), this undermines the plausibility of such technologies and users' engagement with them. Motivated by the aforementioned concerns and challenges in the SL field, our goal in this thesis is twofold: to propose a novel method for anonymizing videos using animated cartoon characters, and to develop a novel system for photo-realistic human video generation based on neural rendering.
Our primary focus is to apply the proposed methods to the synthesis of new SL videos: given an input SL video, we can generate a video of a cartoon character or a human target signer making the same body movements, hand gestures, and facial expressions as the source signer. Our work [159] was accepted at the 2022 IEEE Image, Video, and Multidimensional Signal Processing Workshop (IVMSP), with the authors being Christina O. Tze, Panagiotis P. Filntisis, Anastasios Roussos and Petros Maragos.
Appears in Collections: Diploma Theses

Files in This Item:
File: Tze_Christina_Ourania_Thesis.pdf
Size: 42.22 MB
Format: Adobe PDF

Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.