Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19186
Title: Large Language Models, Adapters and Perplexity Scores for Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection
Authors: Ανδρεάδης, Δημήτριος
Στάμου, Γιώργος
Keywords: Machine Generated Text Detection
Author Attribution
Pretrained Language Models
Large Language Models
Adapter Tuning
Prompt Tuning
Fixed-length Perplexity
Multilingual
Issue Date: 16-Jul-2024
Abstract: Large language models (LLMs) are becoming mainstream and easily accessible, ushering in an explosion of machine-generated content across channels such as news, social media, question-answering forums, and educational and even academic contexts. Recent LLMs, such as ChatGPT and GPT-4, generate remarkably fluent responses to a wide variety of user queries, and the articulate nature of these texts makes LLMs attractive for replacing human labor in many scenarios. However, this has also raised concerns about potential misuse, such as spreading misinformation and disrupting the education system. Since humans perform only slightly better than chance when distinguishing machine-generated from human-written text, automatic systems are needed to identify machine-generated text and mitigate its potential misuse. This need is addressed by Task 8 of SemEval-2024. In this thesis, we work towards this goal by addressing Subtasks A and B of that task. As a starting point, we experiment with fine-tuning pre-trained language models (PLMs) for machine-generated text detection (MGTD), examining the effect of hyperparameters on accuracy. We propose prompt tuning as an effective adapter technique that further boosts performance, and we apply these findings to the harder subtask of author attribution (AA). For the multilingual track of MGTD, we detect the source language of each text and then both translate the texts and use language adapters to test whether further improvements can be achieved. Beyond model and adapter tuning, we explore a complementary approach: using multiple PLMs, we compute fixed-length perplexity scores for detection. Overall, this thesis examines the potential of these methods for MGTD and AA and draws conclusions about their effectiveness.
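The fixed-length perplexity scores mentioned in the abstract can be illustrated with a minimal sketch (not taken from the thesis; the model name "gpt2", the window length, and the stride below are assumptions chosen only for the example): a pre-trained causal language model scores a text in windows of a fixed number of tokens, and the exponentiated average negative log-likelihood serves as the perplexity feature.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed model for illustration; the thesis combines scores from multiple PLMs.
    model_name = "gpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    def fixed_length_perplexity(text, max_length=512, stride=512):
        # Tokenize once, then score the text in fixed-length windows so that
        # every evaluation uses the same context size regardless of text length.
        input_ids = tokenizer(text, return_tensors="pt").input_ids
        nlls, n_tokens = [], 0
        for start in range(0, input_ids.size(1), stride):
            window = input_ids[:, start:start + max_length]
            if window.size(1) < 2:
                break
            with torch.no_grad():
                # With labels equal to the inputs, the model returns the mean token NLL.
                loss = model(window, labels=window).loss
            nlls.append(loss * (window.size(1) - 1))
            n_tokens += window.size(1) - 1
        # Perplexity = exp(total NLL / number of predicted tokens).
        return torch.exp(torch.stack(nlls).sum() / n_tokens).item()

    print(fixed_length_perplexity("Large language models generate remarkably fluent text."))

A lower perplexity under a given PLM indicates text that is more predictable to that model, which is one signal a detector can exploit when combining scores from several PLMs.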
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19186
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: Diploma_Thesis_Andreadis_Dimitrios.pdf (1.51 MB, Adobe PDF)


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.