Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19369
Title: Investigating the Capabilities of Language Models in Puzzle Reasoning: A Survey and Experimental Analysis
Authors: Γιαδικιάρογλου, Παναγιώτης; Στάμου, Γιώργος
Keywords: Large Language Models; Reasoning; Puzzle Solving; Prompting; Neurosymbolic Methods
Issue Date: 24-Oct-2024
Abstract: Puzzle-solving has long served as a benchmark for evaluating artificial intelligence, testing a model’s ability to reason, infer, and strategize across complex problem spaces. Traditional AI and machine learning methods, such as symbolic reasoning and reinforcement learning, have made notable strides in structured domains like board games and logic puzzles. However, as neural networks and, more recently, large language models (LLMs) have evolved, new possibilities have emerged for tackling a broader range of puzzle types, including those requiring nuanced commonsense reasoning, abstract pattern recognition, and complex multi-step calculations. LLMs, with their vast data-driven language capabilities, hold unique potential to bridge structured logical tasks and less formal, knowledge-based puzzles. Despite these advances, the current landscape of puzzle-solving with LLMs reveals both achievements and limitations, particularly when models are tasked with problems that demand interpretative reasoning and precise calculation. This thesis explores the evolving role of LLMs in solving such complex reasoning tasks, focusing specifically on their puzzle-solving capabilities. Divided into two main sections, the thesis first provides a comprehensive survey of recent advancements in LLM methodologies, covering diverse prompting techniques, neuro-symbolic approaches, and fine-tuning strategies for puzzles. Using a newly proposed taxonomy, puzzles are categorized into rule-based and rule-less types, and each category is examined for the distinct cognitive demands it places on LLMs. The second section presents experimental evaluations on four datasets: two math-based datasets (GSM8K, SVAMP) and two puzzle-focused datasets (Game of 24 and RiddleSense). Various reasoning techniques, including Input-Output (IO) prompting, Chain-of-Thought (CoT), Least-to-Most (LtM), and Faithful-CoT, are employed to assess LLM performance. Models of varying scales, particularly smaller LLMs such as the Llama-3.1 family and Mistral, are tested across zero-shot, few-shot, and self-consistency settings to evaluate their efficacy on complex, multi-step reasoning tasks. The thesis provides critical insights into the performance limitations of current LLMs in puzzle-solving, noting in particular that advanced reasoning methods such as Faithful-CoT and puzzle-translation techniques yield inconsistent improvements with smaller models. Finally, it outlines future research directions, advocating for expanded dataset creation, neuro-symbolic integration, and advances in puzzle generation. This thesis aims to deepen our understanding of LLMs' reasoning abilities and to highlight pathways for enhancing their performance in complex cognitive tasks.
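The abstract names several prompting setups (IO prompting, CoT, and self-consistency) without detailing them. The sketch below is a minimal, hypothetical illustration of how those three setups differ; it is not code from the thesis. The `generate` callable is an assumed stand-in for whatever LLM completion endpoint is used (for example, a locally served Llama-3.1 or Mistral model), and `extract_answer` assumes the model ends its reply with an "Answer:" line.

```python
# Minimal sketch (not the thesis code): IO vs. CoT prompting, plus
# self-consistency voting over several sampled CoT completions.
from collections import Counter
from typing import Callable


def io_prompt(question: str) -> str:
    # Input-Output prompting: ask directly for the final answer.
    return f"Question: {question}\nAnswer:"


def cot_prompt(question: str) -> str:
    # Chain-of-Thought prompting: elicit intermediate reasoning first.
    return (
        f"Question: {question}\n"
        "Let's think step by step, then give the final answer on a line "
        "starting with 'Answer:'."
    )


def extract_answer(completion: str) -> str:
    # Assumption: the model finishes with an 'Answer:' line we can split on.
    marker = "Answer:"
    if marker in completion:
        return completion.rsplit(marker, 1)[-1].strip()
    return completion.strip()


def self_consistency(generate: Callable[[str], str], question: str, n_samples: int = 5) -> str:
    # `generate` is a hypothetical LLM call (prompt -> completion).
    # Sample several CoT completions and majority-vote the extracted answers.
    answers = [extract_answer(generate(cot_prompt(question))) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

For a Game of 24 instance, `question` might read "Use the numbers 4, 9, 10, 13 with +, -, *, / to make 24"; Least-to-Most and Faithful-CoT would instead decompose or translate the problem into executable sub-steps, which this sketch does not attempt.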
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19369
Appears in Collections: Διπλωματικές Εργασίες - Theses
Files in This Item:
File | Description | Size | Format
---|---|---|---
Diploma_Thesis_Giadikiaroglou.pdf | Diploma Thesis | 4 MB | Adobe PDF