Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19587
Title: | Unlearning Sensitive Content from Large Language Models |
Authors: | Premptis, Iraklis; Stamou, Georgios |
Keywords: | Large Language Models; Machine Unlearning; Gradient Ascent; Gradient Descent |
Issue Date: | 21-Mar-2025 |
Abstract: | Large Language Models (LLMs) have demonstrated remarkable proficiency in natural language processing tasks, exhibiting unprecedented scalability and adaptability. However, their inherent tendency to memorize training data raises critical ethical and legal concerns, particularly regarding the retention of sensitive or copyrighted information. This issue is further compounded by regulatory frameworks such as the "right to be forgotten" (RTBF), which mandates the selective removal of data while preserving overall model functionality. Traditional approaches to machine unlearning, originally developed for small-scale classifiers, struggle to extend to LLMs due to their high-dimensional parameter spaces, interdependent data representations, and computationally expensive retraining requirements. As a result, developing efficient, targeted, and scalable unlearning mechanisms for LLMs remains an open challenge. This thesis introduces a novel framework for machine unlearning in LLMs, leveraging parameter-efficient fine-tuning (PEFT) techniques to achieve targeted data removal without degrading general model capabilities. Specifically, we explore gradient-based methods employing low-rank adaptation (LoRA) modules and selective fine-tuning of the final layers while keeping the majority of model parameters frozen. These approaches facilitate efficient knowledge removal while mitigating catastrophic forgetting, ensuring robust retention of unrelated knowledge. Additionally, we propose alternative strategies, such as alternating gradient ascent-descent and sequential unlearning via gradient difference, to enhance computational efficiency and unlearning effectiveness. Experimental validation against a retraining-from-scratch baseline demonstrates that our methods achieve high unlearning fidelity while preserving reasoning abilities and general knowledge, offering a scalable solution to the unlearning problem in LLMs. |
URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19587 |
Appears in Collections: | Διπλωματικές Εργασίες - Theses |
Files in This Item:
File | Description | Size | Format
---|---|---|---
Diploma Thesis.pdf | | 3.64 MB | Adobe PDF
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.