Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19557
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Πρατικάκη, Χρύσα | - |
dc.date.accessioned | 2025-03-17T18:16:31Z | - |
dc.date.available | 2025-03-17T18:16:31Z | - |
dc.date.issued | 2025-03-07 | - |
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19557 | - |
dc.description.abstract | Sign Languages are the primary form of communication for Deaf communities across the world. It is estimated that more than 70 million people belong to the deaf and hard-of-hearing (DHH) community, and more than 200 Sign Languages are in use worldwide. To break the communication barriers between the DHH and hearing communities, it is imperative to build systems capable of translating spoken language into sign language and vice versa. To this end, continuous sign language translation and production are the two necessary components of such a machine-learning-based system. Over the past three decades, Sign Language Translation has gained significant interest, resulting in a plethora of publications exploring various technologies to address the challenge. However, Sign Language Production (SLP) remains one of the most challenging open problems in Sign Language technology. The most recently proposed approaches to SLP tackle the synthesis of photorealistic sign language videos with neural machine translation and a variety of generative models; although these methods show encouraging results, there remains potential for further adaptation and innovation. Building on insights from previous research, the central objective of this thesis is to develop a robust deep learning model for SLP. We tackle this task with a transformer-based architecture that translates text input into human pose keypoints. Furthermore, we explore the photorealistic aspect of the problem, aiming to create a complete SLP pipeline that transforms text directly into realistic human sign language videos. For the photorealistic module, we harness Generative Adversarial Networks (GANs) to perform neural rendering on the pose sequences generated by the transformer model. Finally, we evaluate the effectiveness of the proposed pipeline on three different datasets through an extensive series of comparative analyses, ablation studies, and user studies. Part of our work was accepted at the 18th International Conference on PErvasive Technologies Related to Assistive Environments (PETRA 2025), titled "A Transformer-Based Framework for Greek Sign Language Production using Extended Skeletal Motion Representations", with authors Chrysa Pratikaki, Panagiotis Filntisis, Athanasios Katsamanis, Anastasios Roussos and Petros Maragos. | en_US |
dc.language | en | en_US |
dc.subject | Sign Language Production | en_US |
dc.subject | Deep Learning | en_US |
dc.subject | Transformers | en_US |
dc.subject | Generative Adversarial Networks | en_US |
dc.subject | Neural Rendering | en_US |
dc.subject | Pose Estimation | en_US |
dc.title | Photorealistic Sign Language Production from Text using Transformer Networks and Neural Rendering | en_US |
dc.description.pages | 115 | en_US |
dc.contributor.supervisor | Μαραγκός Πέτρος | en_US |
dc.department | Τομέας Σημάτων, Ελέγχου και Ρομποτικής (Division of Signals, Control and Robotics) | en_US |
Appears in Collections: | Διπλωματικές Εργασίες - Theses |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
thesis_chrysa_pratikaki.pdf |  | 20.62 MB | Adobe PDF |
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.