Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19642
Full metadata record
DC Field | Value | Language
dc.contributor.author | Αραβανής, Τηλέμαχος | -
dc.date.accessioned | 2025-07-02T07:28:38Z | -
dc.date.available | 2025-07-02T07:28:38Z | -
dc.date.issued | 2025-07-01 | -
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19642 | -
dc.description.abstract | Generative image models have seen significant advancements in the past few years, enabling the creation of highly realistic images from text prompts. However, while proficient at high-fidelity image generation and alignment with the text prompt, text-to-image generative models do not offer the user the desired degree of controllability through text alone. For this reason, personalization algorithms have been developed for these models to allow users to guide image generation in ways that reflect specific preferences, making the output more tailored and meaningful. More specifically, an envisioned research direction is the rendition of images that share the same visual interpretation of a text-specified style. Recent state-of-the-art personalization techniques for generative text-to-image models aim to achieve this by finetuning the model's backbone on a set of images that share common visual stylistic elements. To address the high computational cost of this optimization, more recent methods use the attention layers of the model during batched inference to transfer stylistic visual elements from a reference image to the other images in the batch. However, these stylistic alignment approaches often fail to effectively separate semantic content from stylistic elements, leading to content leakage from the reference image due to their uniform application across instances. We contend that the inherent variability in text-to-image models, stemming from input prompts and noise, necessitates an adaptive approach within these style alignment methods. To address this challenge, we exploit the explainability of the attention mechanism and propose a novel method that mitigates content leakage in a semantically coherent manner within the context of attention-based style alignment, while preserving stylistic consistency. Furthermore, to enhance adaptivity, we introduce a content leakage localization process during inference, allowing the stylistic alignment process to be tuned so that it faithfully transfers the desired style. Evaluation of our method across diverse image objects and styles demonstrates a significant improvement over state-of-the-art style alignment methods, removing the undesired effect of content leakage while maintaining the desired stylistic alignment. | en_US
dc.language | el | en_US
dc.subject | generative models | en_US
dc.subject | text-to-image generation | en_US
dc.subject | attention-based models | en_US
dc.subject | explainability | en_US
dc.subject | personalization | en_US
dc.subject | stylistic alignment | en_US
dc.subject | content leakage | en_US
dc.title | Text to image stylistic alignment via explainability of attention | en_US
dc.description.pages | 116 | en_US
dc.contributor.supervisor | Μαραγκός Πέτρος | en_US
dc.department | Τομέας Σημάτων, Ελέγχου και Ρομποτικής | en_US
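Note: the attention-based style transfer mentioned in the abstract (a reference image's attention context is exposed to the other images generated in the same batch) can be illustrated with a minimal, hypothetical sketch. The code below is not the thesis implementation; the function name, shapes, ref_index parameter, and the use of PyTorch are assumptions made purely for illustration of the general shared-attention idea.

# Minimal, hypothetical sketch of attention-based style sharing.
# Every image in the batch attends to its own keys/values *and* those of a
# designated reference image, so stylistic statistics of the reference are
# propagated to the rest of the batch. All names and shapes are illustrative.

import torch

def shared_style_attention(q, k, v, ref_index=0):
    """q, k, v: (B, H, N, D). The image at `ref_index` acts as the style
    reference; its keys/values are appended to every image's attention context."""
    B, H, N, D = q.shape
    k_ref = k[ref_index:ref_index + 1].expand(B, -1, -1, -1)   # (B, H, N, D)
    v_ref = v[ref_index:ref_index + 1].expand(B, -1, -1, -1)
    k_cat = torch.cat([k, k_ref], dim=2)                        # (B, H, 2N, D)
    v_cat = torch.cat([v, v_ref], dim=2)
    attn = torch.softmax(q @ k_cat.transpose(-2, -1) / D ** 0.5, dim=-1)
    return attn @ v_cat                                          # (B, H, N, D)

# Toy usage with random projections standing in for a diffusion model's
# self-attention inputs: batch of 4 images, 8 heads, 64 tokens, head dim 32.
q = torch.randn(4, 8, 64, 32)
k = torch.randn(4, 8, 64, 32)
v = torch.randn(4, 8, 64, 32)
out = shared_style_attention(q, k, v)   # (4, 8, 64, 32)

Because the reference keys/values are injected uniformly for every token of every image, semantic content of the reference can be copied along with its style; the thesis's contribution, per the abstract, is to localize and mitigate this content leakage adaptively using the explainability of the attention maps.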
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File | Description | Size | Format
Thesis_Aravanis-2.pdf |  | 122.54 MB | Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.