Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/20087

| Title: | Abductive Event Reasoning with Large Language Models |
| Authors: | Καραφύλλης, Νικόλαος; Βουλόδημος Αθανάσιος |
| Keywords: | Abductive Reasoning; Large Language Models; Causality; Causal Graphs; Retrieval-Augmented Generation; Prompt Engineering; Multi-agent Systems; Prompt Optimisation |
| Issue Date: | 17-Mar-2026 |
| Abstract: | Abductive reasoning, the process of inferring the most plausible causes from incomplete evidence, remains a significant challenge for Large Language Models (LLMs), demanding simultaneous evaluation of competing hypotheses under uncertainty. This diploma thesis addresses this challenge through the lens of SemEval 2026 Task 12: Abductive Event Reasoning, where a system must identify all plausible direct causes of a target event from four candidate explanations, using multi-document evidence as context. We develop two complementary approaches: a three-stage direct prompting pipeline combining hybrid GraphRAG retrieval, structured XML prompting refined through GEPA prompt optimization, and eight deterministic post-hoc verification rules; and an auxiliary multi-expert causal graph in which four specialized experts collaboratively construct explicit directed acyclic graphs with confidence-scored edges, providing interpretable causal chains that support human verification of the system’s reasoning. Our system achieves an accuracy of 0.95 on the test set, ranking first on the SemEval 2026 Task 12 evaluation-phase leaderboard. Through a cross-model error analysis spanning 15 configurations across 7 LLM families and the Causal Graph System, we identify three shared inductive biases: a single-cause default that reduces the annotated cause count by 47%, temporal proximity preference driving all wrong-answer failures, and salience preference favouring dramatic over subtler contributing causes. The Causal Graph System partially mitigates these biases, exhibiting the smallest multi-answer gap (−14.7 pp) and contributing 12 unique correct predictions, the most of any individual system. |
| URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/20087 |
| Appears in Collections: | Diploma Theses (Διπλωματικές Εργασίες - Theses) |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| Diploma_Thesis_N_Karafyllis .pdf | | 1.85 MB | Adobe PDF | View/Open |
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.