Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19557
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Πρατικάκη, Χρύσα | - |
dc.date.accessioned | 2025-03-17T18:16:31Z | - |
dc.date.available | 2025-03-17T18:16:31Z | - |
dc.date.issued | 2025-03-07 | - |
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19557 | - |
dc.description.abstract | Sign Languages are the primary form of communication for Deaf communities across the world. It is estimated that more than 70 million people belong to the deaf and hard-of-hearing (DHH) community, and more than 200 Sign Languages are in use worldwide. To break the communication barriers between the DHH and hearing communities, it is imperative to build systems capable of translating spoken language into sign language and vice versa. To this end, continuous sign language translation and production are the two necessary components of such a machine-learning-based system. Over the past three decades, Sign Language Translation has gained significant interest, resulting in a plethora of publications exploring various technologies to address the challenge. However, Sign Language Production (SLP) remains one of the most challenging open problems in Sign Language technology. The most recently proposed approaches to SLP tackle the synthesis of photorealistic sign language videos with neural machine translation and a variety of generative models; although these methods show encouraging results, there remains potential for further adaptation and innovation. Building on insights from previous research, the central objective of this thesis is to develop a robust deep learning model for SLP. We tackle this task with a transformer-based architecture that translates text input into human pose keypoints. Furthermore, we explore the photorealistic aspect of the problem, aiming to create a complete SLP pipeline that transforms text directly into realistic human sign language videos. For the photorealistic module, we harness Generative Adversarial Networks (GANs) to perform neural rendering on the pose sequences generated by the transformer model. Finally, we evaluate the effectiveness of the proposed pipeline on three different datasets through an extensive series of comparative analyses, ablation studies, and user studies. Part of our work was accepted at the 18th International Conference on PErvasive Technologies Related to Assistive Environments (PETRA 2025), titled "A Transformer-Based Framework for Greek Sign Language Production using Extended Skeletal Motion Representations", with authors Chrysa Pratikaki, Panagiotis Filntisis, Athanasios Katsamanis, Anastasios Roussos and Petros Maragos. | en_US |
dc.language | en | en_US |
dc.subject | Sign Language Production | en_US |
dc.subject | Deep Learning | en_US |
dc.subject | Transformers | en_US |
dc.subject | Generative Adversarial Networks | en_US |
dc.subject | Neural Rendering | en_US |
dc.subject | Pose Estimation | en_US |
dc.title | Photorealistic Sign Language Production from Text using Transformer Networks and Neural Rendering | en_US |
dc.description.pages | 115 | en_US |
dc.contributor.supervisor | Μαραγκός Πέτρος | en_US |
dc.department | Τομέας Σημάτων, Ελέγχου και Ρομποτικής (Division of Signals, Control and Robotics) | en_US |
Appears in Collections: | Διπλωματικές Εργασίες - Theses |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
thesis_chrysa_pratikaki.pdf |  | 20.62 MB | Adobe PDF |
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.