Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19642
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Αραβανής, Τηλέμαχος | - |
dc.date.accessioned | 2025-07-02T07:28:38Z | - |
dc.date.available | 2025-07-02T07:28:38Z | - |
dc.date.issued | 2025-07-01 | - |
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19642 | - |
dc.description.abstract | Generative image models have advanced significantly in the past few years, enabling the creation of highly realistic images from text prompts. However, while proficient at high-fidelity image generation and alignment with the text prompt, text-to-image generative models do not give the user the desired degree of control through text alone. For this reason, personalization algorithms have been developed for these models to let users guide image generation in ways that reflect specific preferences, making the output more tailored and meaningful. More specifically, an envisioned research direction is the rendering of images that share the same visual interpretation of a text-specified style. Recent state-of-the-art personalization techniques in generative text-to-image models aim to achieve this by fine-tuning the model's backbone on a set of images that share common visual stylistic elements. To address the high computational cost of this optimization, more recent methods exploit the attention layers of the model during batched inference to transfer stylistic visual elements from a reference image to the other images in the batch. However, because they are applied uniformly across instances, these stylistic alignment approaches often fail to separate semantic content from stylistic elements, leading to content leakage from the reference image. We contend that the inherent variability in text-to-image models, stemming from input prompts and noise, necessitates an adaptive approach within these style alignment methods. To address this challenge, we exploit the explainability of the attention mechanism and propose a novel method that mitigates content leakage in a semantically coherent manner within the context of attention-based style alignment, while preserving stylistic consistency. Furthermore, to enhance adaptivity, we introduce a content leakage localization process during inference, allowing the stylistic alignment process to be tuned to transfer the desired style faithfully. Evaluation of our method across diverse objects and styles demonstrates a significant improvement over state-of-the-art style alignment methods, removing the undesired effect of content leakage while maintaining the desired stylistic alignment. (An illustrative sketch of the attention-sharing idea appears after this record.) | en_US |
dc.language | el | en_US |
dc.subject | generative models | en_US |
dc.subject | text-to-image generation | en_US |
dc.subject | attention-based models | en_US |
dc.subject | explainability | en_US |
dc.subject | personalization | en_US |
dc.subject | stylistic alignment | en_US |
dc.subject | content leakage | en_US |
dc.title | Text to image stylistic alignment via explainability of attention | en_US |
dc.description.pages | 116 | en_US |
dc.contributor.supervisor | Μαραγκός Πέτρος | en_US |
dc.department | Τομέας Σημάτων, Ελέγχου και Ρομποτικής (Division of Signals, Control and Robotics) | en_US |
Appears in Collections: | Διπλωματικές Εργασίες - Theses |
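
The abstract above describes attention-based style alignment, in which images generated in the same batch attend to the features of a reference image so that its stylistic elements propagate to the rest of the batch, and notes that the attention maps themselves provide the explainability signal used to localize content leakage. Below is a minimal sketch of that attention-sharing idea, assuming a PyTorch setting; the function name, shapes, and the concatenation scheme are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal sketch of attention-based style sharing during batched
# inference. Everything here is an illustrative assumption, not the
# method proposed in the thesis.
import torch

def shared_style_attention(q, k, v, ref_index=0):
    """Self-attention in which every image in the batch also attends to
    the keys/values of a reference image, so the reference's stylistic
    statistics can mix into the other images.

    q, k, v: (batch, tokens, dim) projections from one attention layer.
    ref_index: position of the style reference image in the batch.
    """
    ref_k = k[ref_index].expand_as(k)      # broadcast reference keys
    ref_v = v[ref_index].expand_as(v)      # broadcast reference values
    # Concatenate each image's own keys/values with the reference's,
    # so attention can blend self content with reference style.
    k_ext = torch.cat([k, ref_k], dim=1)   # (batch, 2*tokens, dim)
    v_ext = torch.cat([v, ref_v], dim=1)
    scores = q @ k_ext.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    attn = scores.softmax(dim=-1)
    # The mass each query places on the reference tokens (the second
    # half of attn's last axis) is the kind of attention-map signal the
    # abstract exploits to localize content leakage.
    return attn @ v_ext

# Toy usage: a batch of 3 "images" with 16 tokens of dimension 8.
q, k, v = (torch.randn(3, 16, 8) for _ in range(3))
out = shared_style_attention(q, k, v)
print(out.shape)  # torch.Size([3, 16, 8])
```

Uniformly concatenating the reference keys/values, as above, is exactly the instance-agnostic behavior the abstract identifies as the cause of content leakage; the thesis's contribution is to adapt this mixing per instance using the attention maps.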
Files in This Item:
File | Description | Size | Format
---|---|---|---
Thesis_Aravanis-2.pdf | | 122.54 MB | Adobe PDF
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.