Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19577
Title: Knowledge Transfer from Large Vision-Language Models for Localization and Segmentation in 2D Medical Imaging
Authors: Τριανταφύλλης, Γεώργιος
Βουλόδημος, Αθανάσιος
Keywords: Medical Imaging, Grounded Segmentation, Large Vision-Language Models, Fine-Tuning, GroundingDINO, SAM2, MedSAM2, MRI, CT
Issue Date: 26-Mar-2025
Abstract: Grounded segmentation of medical images is a challenging task requiring expert-annotated datasets, which are scarce. To address this problem, we employ Large Vision-Language Models (LVLMs) as well as deterministic algorithms to generate the missing textual descriptions for organ masks. For the grounded segmentation task, a pipeline is developed consisting of GroundingDINO and SAM2 or MedSAM2, with only GroundingDINO being fine-tuned. The dataset used for this study is RAOS, which includes CT scans and synthetic MRI images. Our experiments assess the accuracy of LLaVA-Med’s responses and the performance of the proposed fine-tuned pipeline under various prompting strategies on both in-distribution and out-of-distribution images. The results indicate that LLaVA-Med alone cannot reliably generate the textual descriptions due to its limited reasoning ability. Additionally, our results show that the proposed pipeline performs well within the closed setting in which it was applied, while acknowledging its inherent limitations.
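The two-stage pipeline described in the abstract (a fine-tuned GroundingDINO detector prompted with text, feeding box prompts into a frozen SAM2/MedSAM2 segmenter) can be illustrated with the sketch below. This is a structural outline only: `detect_regions` and `segment_box` are hypothetical stand-ins for the real model calls, whose actual APIs differ, and the dummy box/mask logic exists solely to make the data flow concrete.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Box:
    x0: int
    y0: int
    x1: int
    y1: int

def detect_regions(image: np.ndarray, prompt: str) -> List[Box]:
    """Stand-in for the fine-tuned GroundingDINO stage: maps a textual
    organ description (e.g. "liver") to candidate bounding boxes.
    Here it just returns one dummy box in the image center."""
    h, w = image.shape[:2]
    return [Box(w // 4, h // 4, 3 * w // 4, 3 * h // 4)]

def segment_box(image: np.ndarray, box: Box) -> np.ndarray:
    """Stand-in for the frozen SAM2 / MedSAM2 stage: converts a box
    prompt into a binary segmentation mask. Here the "mask" is simply
    the box interior."""
    mask = np.zeros(image.shape[:2], dtype=bool)
    mask[box.y0:box.y1, box.x0:box.x1] = True
    return mask

def grounded_segmentation(image: np.ndarray, prompt: str) -> List[np.ndarray]:
    """Text prompt -> boxes -> one mask per box."""
    return [segment_box(image, b) for b in detect_regions(image, prompt)]

# Stand-in for a single 2D CT/MRI slice.
scan = np.zeros((256, 256), dtype=np.float32)
masks = grounded_segmentation(scan, "liver")
print(len(masks), int(masks[0].sum()))  # → 1 16384 (one 128x128 dummy mask)
```

Only the detector is fine-tuned in the thesis pipeline; the segmenter consumes its boxes unchanged, which is why the two stages are kept as separate functions here.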
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19577
Appears in Collections:Διπλωματικές Εργασίες - Theses

Files in This Item:
File                      Size     Format
DIPLOMATIKI_english.pdf   7.68 MB  Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.