Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19587
Title: | Unlearning Sensitive Content from Large Language Models |
Authors: | Premptis, Iraklis; Stamou, Georgios |
Keywords: | Large Language Models; Machine Unlearning; Gradient Ascent; Gradient Descent |
Issue Date: | 21-Mar-2025 |
Abstract: | Large Language Models (LLMs) have demonstrated remarkable proficiency in natural language processing tasks, exhibiting unprecedented scalability and adaptability. However, their inherent tendency to memorize training data raises critical ethical and legal concerns, particularly regarding the retention of sensitive or copyrighted information. This issue is further compounded by regulatory frameworks such as the "right to be forgotten" (RTBF), which mandates the selective removal of data while preserving overall model functionality. Traditional approaches to machine unlearning, originally developed for small-scale classifiers, struggle to extend to LLMs due to their high-dimensional parameter spaces, interdependent data representations, and computationally expensive retraining requirements. As a result, developing efficient, targeted, and scalable unlearning mechanisms for LLMs remains an open challenge. This thesis introduces a novel framework for machine unlearning in LLMs, leveraging parameter-efficient fine-tuning (PEFT) techniques to achieve targeted data removal without degrading general model capabilities. Specifically, we explore gradient-based methods employing low-rank adaptation (LoRA) modules and selective fine-tuning of the final layers while keeping the majority of model parameters frozen. These approaches facilitate efficient knowledge removal while mitigating catastrophic forgetting, ensuring robust retention of unrelated knowledge. Additionally, we propose alternative strategies, such as alternating gradient ascent-descent and sequential unlearning via gradient difference, to enhance computational efficiency and unlearning effectiveness. Experimental validation against a retraining-from-scratch baseline demonstrates that our methods achieve high unlearning fidelity while preserving reasoning abilities and general knowledge, offering a scalable solution to the unlearning problem in LLMs. |
URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19587 |
Appears in Collections: | Διπλωματικές Εργασίες - Theses |
Files in This Item:
File | Description | Size | Format
---|---|---|---
Diploma Thesis.pdf | | 3.64 MB | Adobe PDF
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.