Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19186
Full metadata record
DC Field | Value | Language
dc.contributor.author | Ανδρεάδης, Δημήτριος | -
dc.date.accessioned | 2024-07-19T07:59:08Z | -
dc.date.available | 2024-07-19T07:59:08Z | -
dc.date.issued | 2024-07-16 | -
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19186 | -
dc.description.abstract | Large language models (LLMs) are becoming mainstream and easily accessible, ushering in an explosion of machine-generated content across various channels, such as news, social media, question-answering forums, and educational and even academic contexts. Recent LLMs, such as ChatGPT and GPT-4, generate remarkably fluent responses to a wide variety of user queries. The articulate nature of such generated texts makes LLMs attractive for replacing human labor in many scenarios. However, this has also raised concerns about their potential misuse, such as spreading misinformation and causing disruptions in the education system. Since humans perform only slightly better than chance when classifying machine-generated versus human-written text, automatic systems are needed to identify machine-generated text and mitigate its potential misuse. This need is addressed by the 8th task of the SemEval Workshop 2024. In this thesis, we take a substantial step towards exploring this task by addressing Subtasks A and B of the 8th SemEval task. As a starting point, we experimented with fine-tuning pre-trained language models (PLMs) for machine-generated text detection (MGTD), examining the effect of hyperparameters on accuracy. We suggest the use of prompt tuning as an effective adapter technique that further boosts performance. Moreover, we applied our findings to the more difficult subtask of author attribution (AA). For the multilingual track of MGTD, we detected the source language of the texts and translated them, and we also used language adapters, to test whether further improvements could be achieved. Apart from model and adapter tuning, we explored another approach: using multiple PLMs, we calculated fixed-length perplexity scores. Overall, this thesis attempts to unveil the potential of these methods for solving MGTD and AA, reaching insightful conclusions. | en_US
dc.language | el | en_US
dc.subject | Machine Generated Text Detection | en_US
dc.subject | Author Attribution | en_US
dc.subject | Pretrained Language Models | en_US
dc.subject | Large Language Models | en_US
dc.subject | Adapter Tuning | en_US
dc.subject | Prompt Tuning | en_US
dc.subject | Fixed-length Perplexity | en_US
dc.subject | Multilingual | en_US
dc.title | Large Language Models, Adapters and Perplexity Scores for Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection | en_US
dc.description.pages | 81 | en_US
dc.contributor.supervisor | Στάμου Γιώργος (Giorgos Stamou) | en_US
dc.department | Τομέας Συστημάτων Μετάδοσης Πληροφορίας και Τεχνολογίας Υλικών (Division of Information Transmission Systems and Material Technology) | en_US
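
The abstract above suggests prompt tuning as an effective adapter technique for machine-generated text detection. As an illustration only, and not the thesis's actual configuration, the sketch below attaches trainable soft-prompt embeddings to a frozen pre-trained classifier via the Hugging Face peft library; the base checkpoint ("roberta-base"), label count, and number of virtual tokens are assumed values.

```python
# A minimal prompt-tuning sketch for binary MGTD (human vs. machine).
# Assumptions: roberta-base as the PLM and 20 virtual tokens; neither
# is taken from the thesis itself.
from peft import PromptTuningConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)      # 2 classes: human / machine
config = PromptTuningConfig(
    task_type=TaskType.SEQ_CLS,        # sequence classification task
    num_virtual_tokens=20)             # length of the learned soft prompt
model = get_peft_model(base, config)   # freezes the PLM, adds the prompt
model.print_trainable_parameters()     # only the soft prompt and the
                                       # classification head remain trainable
```

Since gradients flow only into the soft-prompt embeddings (and the small classification head), training touches a tiny fraction of the parameters while the PLM itself stays frozen, which is what makes prompt tuning attractive as an adapter technique.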
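The abstract also mentions calculating fixed-length perplexities with multiple PLMs. The sketch below shows one plausible reading of that idea under stated assumptions: a text is scored by the perplexity of a causal language model computed over fixed-length token chunks. The model name, chunk length, and function name are illustrative, not taken from the thesis.

```python
# A minimal fixed-length perplexity sketch; gpt2 and chunk_len=128 are
# illustrative choices, not the thesis's settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def fixed_length_perplexity(text: str, model_name: str = "gpt2",
                            chunk_len: int = 128) -> float:
    """Perplexity of `text` from mean next-token NLL over fixed-length chunks."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    nlls = []
    for start in range(0, len(ids), chunk_len):
        chunk = ids[start:start + chunk_len].unsqueeze(0)
        if chunk.size(1) < 2:              # need at least one next-token target
            continue
        with torch.no_grad():
            # labels=input_ids -> the model returns the chunk's mean
            # next-token negative log-likelihood as `loss`
            out = model(chunk, labels=chunk)
        nlls.append(out.loss)
    # exponentiate the average chunk NLL (chunks weighted equally)
    return torch.exp(torch.stack(nlls).mean()).item()
```

Intuitively, a language model similar to the generator finds machine-generated text less surprising (lower perplexity) than human-written text, so repeating this computation with several PLMs yields a perplexity feature vector per text that a downstream classifier can use.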
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File | Description | Size | Format
Diploma_Thesis_Andreadis_Dimitrios.pdf | - | 1.51 MB | Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.