Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19875
Title: Occlusion-Robust Audiovisual Face Reconstruction with Temporal Modeling
Authors: Αγγελική Τσινούκα
Μαραγκός Πέτρος
Keywords: Audiovisual Face Reconstruction
Multimodal Learning
Talking Avatars
Face Modeling
Synthetic Occlusions
Issue Date: 28-Oct-2025
Abstract: Recent progress in deep learning for 3D face reconstruction and animation has enabled the creation of realistic digital humans capable of reproducing subtle expressions and natural speech-driven motion. These advances open new opportunities for applications across communication, entertainment, AR/VR, and education. Nevertheless, significant challenges remain. Conventional approaches often fail to fully capture the expressive dynamics of human faces, struggle with temporal consistency, and are not robust to real-world conditions such as partial occlusions or interfering background voices. As the demand for detailed digital avatars grows, especially in interactive systems, developing methods that combine realism, expressiveness, and robustness becomes increasingly crucial. This thesis addresses these challenges in 3D face reconstruction through the design of an audiovisual learning technique from input video, which we call FAVOR, extending the SMIRK framework. Our method applies synthetic occlusions to the training dataset to improve robustness and employs a lip-reading loss for supervision, guiding the model toward more accurate mouth movements. By combining multimodal signals with training strategies that reflect real-world variability, the proposed approach generates talking avatars that remain coherent and natural even when visual information is missing or corrupted. The system also ensures proper synchronization between speech and facial motion, reducing common artifacts. Extensive experimental analysis on videos with natural occlusions demonstrates that the proposed model achieves robust and temporally consistent results compared to single-modality methods, in both qualitative and quantitative evaluations. A user study further confirms the perceptual quality of the generated avatars, while an ablation study highlights the contribution of each component to the overall performance.
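The abstract mentions applying synthetic occlusions to training frames to improve robustness. As an illustration only, a minimal sketch of this kind of augmentation is shown below; the function name, the rectangular patch shape, and the black fill value are assumptions for demonstration and do not reflect FAVOR's actual implementation:

```python
import numpy as np

def apply_synthetic_occlusion(frame, rng, max_frac=0.4):
    """Mask a random rectangular patch of a video frame (H, W, C).

    Generic occlusion-augmentation sketch, NOT the thesis's method:
    patch geometry and fill value are illustrative assumptions.
    """
    h, w = frame.shape[:2]
    # Sample a patch covering at most max_frac of each dimension.
    ph = int(rng.integers(1, max(2, int(h * max_frac))))
    pw = int(rng.integers(1, max(2, int(w * max_frac))))
    # Sample a top-left corner so the patch fits inside the frame.
    y = int(rng.integers(0, h - ph + 1))
    x = int(rng.integers(0, w - pw + 1))
    occluded = frame.copy()
    occluded[y:y + ph, x:x + pw] = 0.0  # black patch as a stand-in occluder
    return occluded

rng = np.random.default_rng(0)
frame = np.ones((64, 64, 3), dtype=np.float32)
aug = apply_synthetic_occlusion(frame, rng)
```

During training, such an augmented frame would replace the clean input while the supervision targets (e.g. the lip-reading loss on the rendered mouth region) are still computed from the unoccluded signal, encouraging the model to infer the hidden facial regions from audio and temporal context.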
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: thesis_angeliki_tsinouka.pdf
Size: 35.49 MB
Format: Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.