Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19557
Full metadata record
DC Field | Value | Language
dc.contributor.author | Πρατικάκη, Χρύσα | -
dc.date.accessioned | 2025-03-17T18:16:31Z | -
dc.date.available | 2025-03-17T18:16:31Z | -
dc.date.issued | 2025-03-07 | -
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19557 | -
dc.description.abstract | Sign Languages are the primary form of communication for Deaf communities across the world. It is estimated that more than 70 million people are part of the deaf and hard-of-hearing (DHH) community, and more than 200 Sign Languages are in use worldwide. To break the communication barriers between the DHH and hearing communities, it is imperative to build systems capable of translating spoken language into sign language and vice versa. To this end, continuous sign language translation and production are the two components needed to build such machine-learning-based systems. Over the past three decades, Sign Language Translation has attracted significant interest, resulting in a plethora of publications exploring various technologies to address the challenge. However, Sign Language Production (SLP) is considered one of the most challenging open problems in Sign Language technology. The most recently proposed approaches to SLP tackle the synthesis of photorealistic sign language videos with neural machine translation and a variety of generative models; although these methods show encouraging results, there remains room for further adaptation and innovation. Building on insights from previous research, the central objective of this thesis is to develop a robust deep learning model for SLP. We tackle this task with a transformer-based architecture that translates text input into sequences of human pose keypoints. Furthermore, we explore the photorealistic aspect of the problem, aiming to create a complete SLP pipeline that transforms text directly into realistic human SL videos. For the photorealistic module, we harness Generative Adversarial Networks (GANs) to perform neural rendering on the pose sequences generated by the transformer model. Finally, we evaluate the effectiveness of the proposed pipeline on three different datasets through an extensive series of comparative analyses, ablation studies, and user studies. Part of this work was accepted at the 18th International Conference on PErvasive Technologies Related to Assistive Environments (PETRA 2025) under the title "A Transformer-Based Framework for Greek Sign Language Production using Extended Skeletal Motion Representations", authored by Chrysa Pratikaki, Panagiotis Filntisis, Athanasios Katsamanis, Anastasios Roussos and Petros Maragos. | en_US
dc.language | en | en_US
dc.subject | Sign Language Production | en_US
dc.subject | Deep Learning | en_US
dc.subject | Transformers | en_US
dc.subject | Generative Adversarial Networks | en_US
dc.subject | Neural Rendering | en_US
dc.subject | Pose Estimation | en_US
dc.title | Photorealistic Sign Language Production from Text using Transformer Networks and Neural Rendering | en_US
dc.description.pages | 115 | en_US
dc.contributor.supervisor | Μαραγκός Πέτρος | en_US
dc.department | Τομέας Σημάτων, Ελέγχου και Ρομποτικής | en_US
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File | Description | Size | Format
thesis_chrysa_pratikaki.pdf | | 20.62 MB | Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.
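
The abstract above describes a pipeline that first maps text to human pose keypoints with a transformer and then renders the keypoints photorealistically with a GAN. The following is a minimal illustrative sketch of the first stage only, assuming a PyTorch encoder-decoder transformer that regresses 2D keypoints; the class name, dimensions, and autoregressive decoding scheme are assumptions for illustration and do not reproduce the thesis implementation.

# Illustrative sketch (not the thesis code): a transformer mapping a tokenized
# text sequence to a sequence of 2D pose keypoints. All hyperparameters are
# placeholder assumptions.
import torch
import torch.nn as nn

class TextToPoseTransformer(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, n_keypoints=50,
                 nhead=4, num_layers=3, max_len=512):
        super().__init__()
        self.pose_dim = n_keypoints * 2                     # (x, y) per keypoint
        self.token_emb = nn.Embedding(vocab_size, d_model)  # text token embeddings
        self.pos_emb = nn.Embedding(max_len, d_model)       # learned positions
        self.pose_in = nn.Linear(self.pose_dim, d_model)    # embed previous pose frames
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.pose_out = nn.Linear(d_model, self.pose_dim)   # regress keypoint coordinates

    def forward(self, text_ids, prev_poses):
        # text_ids:   (batch, src_len) integer token ids
        # prev_poses: (batch, tgt_len, pose_dim) previously produced keypoint frames
        src_len = text_ids.shape[1]
        tgt_len = prev_poses.shape[1]
        src = self.token_emb(text_ids) + self.pos_emb(
            torch.arange(src_len, device=text_ids.device))
        tgt = self.pose_in(prev_poses) + self.pos_emb(
            torch.arange(tgt_len, device=text_ids.device))
        # Causal mask so each output frame attends only to earlier frames.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt_len).to(text_ids.device)
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.pose_out(hidden)                         # (batch, tgt_len, pose_dim)

if __name__ == "__main__":
    model = TextToPoseTransformer()
    text = torch.randint(0, 1000, (2, 12))       # two toy sentences
    poses = torch.zeros(2, 8, model.pose_dim)    # seed pose frames
    print(model(text, poses).shape)              # torch.Size([2, 8, 100])

In a setup like this, training would typically minimize a regression loss between predicted and ground-truth keypoints, and the resulting keypoint sequences would be passed to a separate GAN-based neural renderer to produce photorealistic video frames, as outlined in the abstract.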