Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19875
| Title: | Occlusion-Robust Audiovisual Face Reconstruction with Temporal Modeling |
| Authors: | Αγγελική Τσινούκα, Μαραγκός Πέτρος |
| Keywords: | Audiovisual Face Reconstruction, Multimodal Learning, Talking Avatars, Face Modeling, Synthetic Occlusions |
| Issue Date: | 28-Oct-2025 |
| Abstract: | Recent progress in deep learning for 3D face reconstruction and animation has enabled the creation of realistic digital humans capable of reproducing subtle expressions and natural speech-driven motion. These advances open new opportunities for applications in communication, entertainment, AR/VR, and education. Nevertheless, significant challenges remain: conventional approaches often fail to fully capture the expressive dynamics of human faces, struggle with temporal consistency, and are not robust to real-world conditions such as partial occlusions or interfering background voices. As the demand for detailed digital avatars grows, especially in interactive systems, developing methods that combine realism, expressiveness, and robustness becomes increasingly important. This thesis addresses these challenges through an audiovisual learning technique for 3D face reconstruction from input video, which we call FAVOR, extending the SMIRK framework. Our method applies synthetic occlusions to the training dataset to improve robustness and employs a lip-reading loss for supervision, guiding the model toward more accurate mouth movements. By combining multimodal signals with training strategies that reflect real-world variability, the proposed approach generates talking avatars that remain coherent and natural even when visual information is missing or corrupted. The system also ensures proper synchronization between speech and facial motion, reducing common artifacts. Extensive experiments on videos with natural occlusions demonstrate that the proposed model achieves more robust and temporally consistent results than single-modality methods, in both qualitative and quantitative evaluations. A user study further confirms the perceptual quality of the generated avatars, while an ablation study highlights the contribution of each component to overall performance. |
| URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19875 |
| Appears in Collections: | Diploma Theses |
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| thesis_angeliki_tsinouka.pdf | | 35.49 MB | Adobe PDF |
All items in this site are protected by copyright.