Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19369
Title: Investigating the Capabilities of Language Models in Puzzle Reasoning: A Survey and Experimental Analysis
Authors: Γιαδικιάρογλου, Παναγιώτης; Στάμου, Γιώργος
Keywords: Large Language Models; Reasoning; Puzzle Solving; Prompting; Neurosymbolic Methods
Issue Date: 24-Oct-2024
Abstract: Puzzle-solving has long served as a benchmark for evaluating artificial intelligence, testing a model's ability to reason, infer, and strategize across complex problem spaces. Traditional AI and machine learning methods, such as symbolic reasoning and reinforcement learning, have made notable strides in structured domains like board games and logic puzzles. However, as neural networks and, more recently, large language models (LLMs) have evolved, new possibilities have emerged for tackling a broader range of puzzle types, including those requiring nuanced commonsense reasoning, abstract pattern recognition, and complex multi-step calculations. LLMs, with their vast data-driven language capabilities, hold unique potential to bridge structured logical tasks and less formal, knowledge-based puzzles. Despite these advances, the current landscape of puzzle-solving with LLMs reveals both achievements and limitations, particularly when models are tasked with problems that demand interpretative reasoning and precise calculation.

This thesis explores the evolving role of LLMs in solving such complex reasoning tasks, focusing specifically on their puzzle-solving capabilities. The thesis is divided into two main sections. The first provides a comprehensive survey of recent advances in LLM methodologies for puzzles, covering diverse prompting techniques, neuro-symbolic approaches, and fine-tuning strategies. Using a newly proposed taxonomy, puzzles are categorized into rule-based and rule-less types, and each category is examined for the distinct cognitive demands it places on LLMs.

The second section presents experimental evaluations on four datasets: two math-based datasets (GSM8K, SVAMP) and two puzzle-focused datasets (Game of 24 and RiddleSense). Several reasoning techniques, including Input-Output (IO) prompting, Chain-of-Thought (CoT), Least-to-Most (LtM), and Faithful-CoT, are employed to assess LLM performance. Models of varying scales, particularly smaller LLMs such as the Llama-3.1 family and Mistral, are tested in zero-shot, few-shot, and self-consistency settings to evaluate their efficacy on complex, multi-step reasoning tasks. The thesis provides critical insights into the performance limitations of current LLMs in puzzle-solving, noting in particular that advanced reasoning methods such as Faithful-CoT and puzzle-translation techniques yield inconsistent improvements with smaller models. Finally, it outlines future research directions, advocating for expanded dataset creation, neuro-symbolic integration, and advances in puzzle generation. The thesis aims to deepen our understanding of LLMs' reasoning abilities and to highlight pathways for enhancing their performance in complex cognitive tasks.
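The prompting styles named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's experimental code: the `generate` callable is a hypothetical stand-in for any LLM completion call (for instance, a locally hosted Llama-3.1 or Mistral model), and the prompt templates and `self_consistency` helper are illustrative assumptions.

```python
# Minimal sketch of IO, Chain-of-Thought (CoT), Least-to-Most (LtM) prompting,
# and self-consistency voting. `generate` is a hypothetical LLM call.
from collections import Counter


def io_prompt(question: str) -> str:
    # Input-Output prompting: ask directly for the answer, no reasoning requested.
    return f"Q: {question}\nA:"


def cot_prompt(question: str) -> str:
    # Zero-shot CoT: ask the model to reason step by step before answering.
    return f"Q: {question}\nA: Let's think step by step."


def least_to_most_prompt(question: str) -> str:
    # LtM: decompose the problem into simpler subquestions, then solve them in order.
    return (
        f"Q: {question}\n"
        "First, break the problem into simpler subquestions.\n"
        "Then answer each subquestion in order and combine the results.\nA:"
    )


def self_consistency(generate, prompt: str, n_samples: int = 5) -> str:
    """Sample several reasoning paths and return the majority-vote final answer.

    `generate` is a hypothetical callable mapping a prompt to a completion
    whose last line contains the final answer.
    """
    answers = []
    for _ in range(n_samples):
        completion = generate(prompt)
        answers.append(completion.strip().splitlines()[-1])
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy stand-in for an LLM so the sketch runs without a model.
    def fake_generate(prompt: str) -> str:
        return "Step 1: ...\nStep 2: ...\n18"

    question = "A GSM8K-style word problem goes here."
    print(io_prompt(question))
    print(self_consistency(fake_generate, cot_prompt(question)))
```

Self-consistency simply samples several completions and takes a majority vote over the final answers, which is why it is typically paired with CoT-style prompts that produce diverse reasoning paths.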
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19369
Appears in Collections: Διπλωματικές Εργασίες - Theses
Files in This Item:
File | Description | Size | Format
---|---|---|---
Diploma_Thesis_Giadikiaroglou.pdf | Diploma Thesis | 4 MB | Adobe PDF
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.