Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19755
Title: Optimizing Cardiology Diagnostic Accuracy via Synergistic AI-Human Integration
Authors: Kalogeropoulos, Michael
Καλογερόπουλος, Μιχαήλ
Νικήτα Κωνσταντίνα
Keywords: Artificial Intelligence
Large Language Models
Diagnostic Accuracy
Clinical Decision-Making
Electrocardiography
Human-AI Interaction
Issue Date: 4-Jul-2025
Abstract: Introduction: Artificial intelligence (AI) is becoming an increasingly important part of healthcare, especially as a tool to support clinical decision-making. Electrocardiogram (ECG) interpretation is a routine but critical task in medical practice, making it a strong candidate for AI assistance. While AI models like GPT-4o have shown strong accuracy in diagnosing ECGs, there is still little real-world evidence on how they actually influence doctors’ decisions during interpretation. This study aims to close that gap by examining how GPT-4o affects clinicians’ diagnostic accuracy when reading ECGs.
Methods: We carried out a controlled, questionnaire-based study with 25 physicians at different levels of experience: 10 cardiologists, 10 experienced internists, and 5 less experienced internists. Each participant reviewed the same set of 50 ECG cases twice, first on their own and then with GPT-4o’s diagnostic suggestion for each case. The cases included 20 everyday ECGs, 20 more challenging ones, and 10 additional challenging cases in which the AI provided an intentionally incorrect suggestion, to test whether it could lead physicians into making errors. For each case, physicians recorded their initial diagnosis and had the option of revising it after seeing the AI’s suggestion. We compared diagnostic accuracy with and without AI assistance across experience levels and case difficulty, and we also tracked how often participants changed their answers. Statistical tests were used to assess whether these differences were significant.
Results: GPT-4o achieved an accuracy of 72.5% on the ECG cases. With AI assistance, diagnostic accuracy improved for all physician groups. Cardiologists improved from 81.6% without AI to 84.8% with it. Experienced internists saw their accuracy rise from 63.4% to 73.8%, while less experienced internists improved from 43.2% to 55.6%. The biggest improvement was seen in the most difficult cases, where less experienced internists improved from 38% to 75% with AI support. Overall, physicians mostly used the AI to fix their initial mistakes: wrong-to-right answer changes outnumbered right-to-wrong changes by about 4:1. However, in the cases with deliberately misleading AI suggestions, less experienced internists were completely misled, with their accuracy dropping to 0% due to overreliance on the AI. All improvements in accuracy were statistically significant (p < 0.05), although the size of the benefit varied with experience level.
Conclusion: GPT-4o significantly improved ECG interpretation accuracy across all physician groups, with the biggest gains seen among less experienced clinicians. Even senior doctors saw modest benefits. However, the study also highlights the risk of automation bias: when the AI was wrong, less experienced physicians were especially prone to follow its suggestions, leading to major drops in accuracy. These findings show that while AI like GPT-4o can be a valuable diagnostic aid, it must be used with caution.
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19755
Appears in Collections: Μεταπτυχιακές Εργασίες - M.Sc. Theses

Files in This Item:
File: MICHAEL KALOGEROPOULOS MSc THESIS .pdf
Size: 3.92 MB
Format: Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.