Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19729
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Σταματίου, Σπυρίδων | - |
dc.date.accessioned | 2025-07-15T10:26:37Z | - |
dc.date.available | 2025-07-15T10:26:37Z | - |
dc.date.issued | 2025-07-03 | - |
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19729 | - |
dc.description.abstract | Large Language Models (LLMs) have rapidly advanced in recent years, demonstrating impressive capabilities in natural language understanding, generation, and multi-turn conversation. These models are capable not only of responding fluently and contextually to prompts but also of simulating human-like behavior in dialogue. As models evolve, the potential for LLMs to engage in deceptive behavior, intentionally or emergently, has raised important questions about their transparency, interpretability, and the ethical implications of their deployment. Prior research has shown that LLMs can mimic human discourse to a degree that can make distinguishing between human and machine increasingly difficult, especially in open-ended or strategic communication settings. Building upon this foundation, the present thesis introduces am experimental framework for studying deception and detection among LLMs in controlled conversational environments. In this framework, three LLMs are assigned roles as Alice, Bob, and Charlie, and are prompted to engage in structured three-person dialogues. Each model is explicitly instructed to behave as if it were human, with two competing goals: concealing their identity while trying to detect other LLMs. The models are organized into groups based on their parameter size , and engage in multi turn conversations of varying lengths . After each conversation, every model casts a vote for the identity (human or AI) of the other two participants, along with a natural language explanation justifying each classification. These explanations are collected and categorized, resulting in visual representations of the reasoning strategies used by LLMs when attempting to detect or deceive others. The results before the Persona Prompts were varying, with most of the top performing models in the smaller model groups averaging ~50% AI detection rates. The Sate-of-the-art models' best performer, Claude 3.7 Sonnet ranged from 19.08% in shorter conversations, up to 66.17% AI detection in bigger conversation lengths. To assess the influence of persona construction on deception effectiveness, the experiment is repeated with models prompted to adopt human-like personas. The results are afterwards compared to evaluate whether an enhanced persona engineering improves the models’ ability to deceive or alter their judgment when classifying others. Especially in the larger models, there was significant success, with Claude 3.7 Sonnet and Llama 3.1 (405B) managing to avoid detection by up to 100% in certain experimental setups. | en_US |
dc.language | en | en_US |
dc.subject | Machine Learning | en_US |
dc.subject | Large Language Models | en_US |
dc.subject | Prompt Engineering | en_US |
dc.subject | Conversational AI | en_US |
dc.subject | Deception | en_US |
dc.title | Creating a Conversational Framework Between LLMs for Measuring Deception in Three-Party Dialogue Scenarios | en_US |
dc.description.pages | 186 | en_US |
dc.contributor.supervisor | Στάμου Γιώργος | en_US |
dc.department | Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών | en_US |
Appears in Collections: | Διπλωματικές Εργασίες - Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
thesis_stamatiou_final.pdf | 4.11 MB | Adobe PDF | View/Open |
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.