Creating a Conversational Framework Between LLMs for Measuring Deception in Three-Party Dialogue Scenarios

Σταματίου, Σπυρίδων

Εθνικό Μετσόβιο Πολυτεχνείο

Σχολή Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών

Καλώς ήρθατε στο Άρτεμις

Σκοπός του Άρτεμις είναι η συστηματική αρχειοθέτηση και διαδοση της πνευματικής παραγωγής της Σχολής Ηλεκτρολόγων Μηχανικών και Μηχανικών Υπολογιστών του Εθνικού Μετσόβιου Πολυτεχνείου, με τη βοήθεια της τεχνολογίας των ψηφιακών βιβλιοθηκών.

Παρακαλώ χρησιμοποιήστε αυτό το αναγνωριστικό για να παραπέμψετε ή να δημιουργήσετε σύνδεσμο προς αυτό το τεκμήριο: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19729

Πλήρες αρχείο μεταδεδομένων

Πεδίο DC	Τιμή	Γλώσσα
dc.contributor.author	Σταματίου, Σπυρίδων	-
dc.date.accessioned	2025-07-15T10:26:37Z	-
dc.date.available	2025-07-15T10:26:37Z	-
dc.date.issued	2025-07-03	-
dc.identifier.uri	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19729	-
dc.description.abstract	Large Language Models (LLMs) have rapidly advanced in recent years, demonstrating impressive capabilities in natural language understanding, generation, and multi-turn conversation. These models are capable not only of responding fluently and contextually to prompts but also of simulating human-like behavior in dialogue. As models evolve, the potential for LLMs to engage in deceptive behavior, intentionally or emergently, has raised important questions about their transparency, interpretability, and the ethical implications of their deployment. Prior research has shown that LLMs can mimic human discourse to a degree that can make distinguishing between human and machine increasingly difficult, especially in open-ended or strategic communication settings. Building upon this foundation, the present thesis introduces am experimental framework for studying deception and detection among LLMs in controlled conversational environments. In this framework, three LLMs are assigned roles as Alice, Bob, and Charlie, and are prompted to engage in structured three-person dialogues. Each model is explicitly instructed to behave as if it were human, with two competing goals: concealing their identity while trying to detect other LLMs. The models are organized into groups based on their parameter size , and engage in multi turn conversations of varying lengths . After each conversation, every model casts a vote for the identity (human or AI) of the other two participants, along with a natural language explanation justifying each classification. These explanations are collected and categorized, resulting in visual representations of the reasoning strategies used by LLMs when attempting to detect or deceive others. The results before the Persona Prompts were varying, with most of the top performing models in the smaller model groups averaging ~50% AI detection rates. The Sate-of-the-art models' best performer, Claude 3.7 Sonnet ranged from 19.08% in shorter conversations, up to 66.17% AI detection in bigger conversation lengths. To assess the influence of persona construction on deception effectiveness, the experiment is repeated with models prompted to adopt human-like personas. The results are afterwards compared to evaluate whether an enhanced persona engineering improves the models’ ability to deceive or alter their judgment when classifying others. Especially in the larger models, there was significant success, with Claude 3.7 Sonnet and Llama 3.1 (405B) managing to avoid detection by up to 100% in certain experimental setups.	en_US
dc.language	en	en_US
dc.subject	Machine Learning	en_US
dc.subject	Large Language Models	en_US
dc.subject	Prompt Engineering	en_US
dc.subject	Conversational AI	en_US
dc.subject	Deception	en_US
dc.title	Creating a Conversational Framework Between LLMs for Measuring Deception in Three-Party Dialogue Scenarios	en_US
dc.description.pages	186	en_US
dc.contributor.supervisor	Στάμου Γιώργος	en_US
dc.department	Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών	en_US
Εμφανίζεται στις συλλογές:	Διπλωματικές Εργασίες - Theses

Αρχεία σε αυτό το τεκμήριο:

Αρχείο	Περιγραφή	Μέγεθος	Μορφότυπος
thesis_stamatiou_final.pdf		4.11 MB	Adobe PDF	Εμφάνιση/Άνοιγμα

Δείξε τη σύντομη περιγραφή του τεκμηρίου

Όλα τα τεκμήρια του δικτυακού τόπου προστατεύονται από πνευματικά δικαιώματα.