Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19890
Full metadata record
DC Field    Value    Language
dc.contributor.author    Glarou, Maria Ios    -
dc.date.accessioned    2025-11-04T15:34:38Z    -
dc.date.available    2025-11-04T15:34:38Z    -
dc.date.issued    2025-05-16    -
dc.identifier.uri    http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19890    -
dc.description.abstract    Data selection during LLM pretraining is a major driver of downstream performance. Steering the training mixture with signals from target tasks can orient learning toward representations that better serve those objectives. However, most existing approaches tune mixtures for a single target, yielding narrow specialization and weak robustness across tasks. This thesis introduces GRAPE (Group-Robust Multi-target Adaptive PrEtraining), a multi-source, multi-target reweighting framework that discovers effective training mixtures for multiple targets simultaneously. GRAPE maintains two sets of weights: task weights, encoding the relative priority of each target task, and source weights, specifying sampling proportions over source domains. Derived from a minimax formulation, the method couples two interleaved reweighting updates. First, a task-reweighting step—using group distributionally robust optimization (GDRO)—reallocates task weights toward targets showing the least progress, correspondingly easing emphasis on better-served tasks. Second, a source-reweighting step updates the sampling weights over source domains, shifting probability toward domains whose updates most effectively reduce loss on the targets in focus. Together, these updates instantiate the minimax design and dynamically steer data selection, closing performance gaps and yielding balanced, sample-efficient improvements across targets. Empirically, on ClimbLab, GRAPE outperforms strong baselines with higher average accuracy, superior data efficiency, and more balanced task-wise gains across six targets, while generalizing better to unseen reasoning benchmarks. On SlimPajama, across twelve multi-task suites, we observe consistent improvements over competing methods. In multilingual experiments on Wiki40B, GRAPE leverages six high-resource languages to improve low-resource suites of sizes 4 and 8, achieving faster convergence and lower final perplexity.    en_US
dc.language    en    en_US
dc.subject    Large Language Models    en_US
dc.subject    Minimax Optimization    en_US
dc.subject    Multi-Target Learning    en_US
dc.subject    Domain Reweighting    en_US
dc.subject    Group Distributionally Robust Optimization    en_US
dc.subject    Data Mixture Optimization    en_US
dc.title    On Learning What to Learn: Adaptive Data Mixtures for Robust Multi-Target LLM Pretraining    en_US
dc.description.pages    131    en_US
dc.contributor.supervisor    Μαραγκός Πέτρος    en_US
dc.department    Division of Signals, Control and Robotics    en_US
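The abstract above describes GRAPE's interleaved task and source reweighting only at a high level. As a purely illustrative aid, the sketch below shows one way such a loop could look in Python; it is not taken from the thesis, and every name, update rule, and hyperparameter in it (gdro_task_update, eta_task, the exponentiated-gradient form, the gradient-alignment scores) is an assumption made for illustration.

```python
# Illustrative sketch only (not the thesis implementation): one possible
# instantiation of interleaved task/source reweighting as described in the
# abstract. All names, update rules, and hyperparameters are assumptions.
import numpy as np

def gdro_task_update(task_weights, task_losses, eta_task=0.1):
    """Exponentiated-gradient step: up-weight the worst-served targets."""
    w = task_weights * np.exp(eta_task * task_losses)  # larger loss -> larger weight
    return w / w.sum()

def source_update(source_weights, usefulness, eta_src=0.1):
    """Shift sampling mass toward source domains whose updates best help the
    currently emphasized targets; `usefulness[j]` is a scalar score for source j
    (e.g. a task-weighted gradient-alignment estimate)."""
    s = source_weights * np.exp(eta_src * usefulness)
    return s / s.sum()

rng = np.random.default_rng(0)
n_tasks, n_sources = 6, 10
task_w = np.full(n_tasks, 1.0 / n_tasks)     # relative priority of target tasks
src_w = np.full(n_sources, 1.0 / n_sources)  # sampling proportions over source domains

for step in range(100):
    # Placeholder signals; in a real run these come from the training process.
    task_losses = rng.random(n_tasks)                          # per-target losses
    per_source_alignment = rng.random((n_sources, n_tasks))    # source-task usefulness
    task_w = gdro_task_update(task_w, task_losses)             # GDRO-style step
    src_w = source_update(src_w, per_source_alignment @ task_w)

print("task weights:", np.round(task_w, 3))
print("source sampling weights:", np.round(src_w, 3))
```

The multiplicative (exponentiated-gradient) form is shown here only because it is the standard way to keep such weights on the probability simplex; the update rules, step sizes, and usefulness signals actually used by GRAPE are specified in the thesis itself.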
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in this item:
File    Description    Size    Format
thesis.pdf        17.62 MB    Adobe PDF


All items in this repository are protected by copyright.