Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19577
Title: | Knowledge Transfer from Large Vision-Language Models for Localization and Segmentation in 2D Medical Imaging |
Authors: | Τριανταφύλλης, Γεώργιος; Βουλόδημος, Αθανάσιος |
Keywords: | Medical Imaging, Grounded Segmentation, Large Vision-Language Models, Fine-Tuning, GroundingDINO, SAM2, MedSAM2, MRI, CT |
Issue Date: | 26-Mar-2025 |
Abstract: | Grounded segmentation of medical images is a challenging task that requires expert-annotated datasets, which are scarce. To address this problem, we employ Large Vision-Language Models (LVLMs) as well as deterministic algorithms to generate the missing textual descriptions for organ masks. For the grounded segmentation task, a pipeline is developed consisting of GroundingDINO and SAM2 or MedSAM2, with only GroundingDINO being fine-tuned. The dataset used for this study is RAOS, which includes CT scans and synthetic MRI images. Our experiments assess the accuracy of LLaVA-Med's responses and the performance of the proposed fine-tuned pipeline under various prompting strategies on both in-distribution and out-of-distribution images. The results indicate that LLaVA-Med alone cannot reliably generate the textual descriptions due to its limited reasoning ability. They also show that the proposed pipeline performs well within the closed setting in which it was applied, while acknowledging inherent limitations. (An illustrative sketch of this pipeline is included below the record metadata.) |
URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19577 |
Appears in Collections: | Διπλωματικές Εργασίες - Theses |
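
The abstract describes a two-stage grounded-segmentation pipeline: GroundingDINO maps a textual organ prompt to bounding boxes, and SAM2 (or MedSAM2) converts each box into a mask. Below is a minimal sketch of such a pipeline using the publicly available GroundingDINO and SAM2 inference APIs. It is an assumption-laden illustration, not the thesis code: the config and checkpoint paths, the input image, the organ prompt, and the thresholds are placeholders, and off-the-shelf weights are loaded rather than the fine-tuned weights described in the thesis.

```python
# Minimal sketch: GroundingDINO proposes boxes from a text prompt,
# then SAM2 turns each box into a segmentation mask.
# Paths, the organ prompt, and thresholds are placeholders (assumptions);
# off-the-shelf weights are used instead of the thesis's fine-tuned ones.
import torch
from groundingdino.util.inference import load_model, load_image, predict
from groundingdino.util import box_ops
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Text-conditioned detection with GroundingDINO.
dino = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",  # placeholder config path
    "weights/groundingdino_swint_ogc.pth",               # placeholder checkpoint
    device=DEVICE,
)
image_source, image = load_image("ct_slice.png")          # placeholder CT slice
boxes, logits, phrases = predict(
    model=dino,
    image=image,
    caption="liver . left kidney . right kidney .",       # illustrative organ prompt
    box_threshold=0.35,
    text_threshold=0.25,
    device=DEVICE,
)

# GroundingDINO returns normalized (cx, cy, w, h) boxes;
# SAM2 expects pixel-space (x1, y1, x2, y2).
h, w, _ = image_source.shape
boxes_xyxy = box_ops.box_cxcywh_to_xyxy(boxes) * torch.tensor([w, h, w, h])

# 2) Box-prompted segmentation with SAM2 (MedSAM2 weights could be swapped in here).
sam2 = build_sam2(
    "configs/sam2.1/sam2.1_hiera_l.yaml",                 # placeholder config
    "checkpoints/sam2.1_hiera_large.pt",                  # placeholder checkpoint
    device=DEVICE,
)
predictor = SAM2ImagePredictor(sam2)
predictor.set_image(image_source)                          # HWC uint8 RGB image
masks, scores, _ = predictor.predict(
    box=boxes_xyxy.cpu().numpy(),                          # one mask per detected box
    multimask_output=False,
)

for phrase, score in zip(phrases, scores):
    print(phrase, float(score.max()))
```

Because segmentation here is prompted purely with boxes, only the detector has to understand the text, which is consistent with the abstract's choice of fine-tuning GroundingDINO alone while keeping SAM2/MedSAM2 frozen.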
Files in This Item:
| File | Description | Size | Format | |
| --- | --- | --- | --- | --- |
| DIPLOMATIKI_english.pdf | | 7.68 MB | Adobe PDF | View/Open |
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.