Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19792
Title: Study and optimization of distributed deep learning under the parameter server architecture
Συγγραφείς: Nikodimos, Provatas
Κοζύρης Νεκτάριος
Keywords: parameter server
deep learning
distributed learning
asynchronous learning
data management
big data
Issue Date: 8-Jul-2025
Abstract: Deep learning has transformed numerous fields by leveraging vast datasets and complex neural architectures, but the computational demands of modern models often exceed single-node capabilities, prompting distributed training solutions. This thesis investigates asynchronous training under the parameter server paradigm, focusing on enhancing both performance and stability. First, a thorough comparative analysis demonstrates that specialized distributed architectures deliver substantially higher throughput than general-purpose data-processing frameworks at large scales. A systematic literature review then identifies consistency control and the mitigation of stale gradients as pivotal challenges in asynchronous setups. To address these, a hybrid Strategy-Switch approach is introduced: training begins with synchronous communication to identify a promising solution region, then transitions to asynchronous updates based on an empirically derived switching criterion, achieving both rapid convergence and high model accuracy. Building on these insights, offline data-sharding techniques are proposed that preemptively balance sample distributions across workers, thereby reducing gradient variance and improving training consistency. Experimental results show that the proposed data distribution strategies decrease variability in training and validation metrics by up to eightfold and twofold, respectively, compared to random assignment. Collectively, these contributions advance asynchronous distributed deep learning by offering concrete methods to reconcile speed and stability, supporting more scalable and reliable large-scale neural network training.
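
To make the Strategy-Switch idea concrete, the sketch below shows a hybrid training loop that starts with synchronous, averaged updates and later switches to asynchronous ones. The toy objective, the gradient noise model, and the loss-plateau test standing in for the thesis's empirically derived switching criterion are all illustrative assumptions, not the thesis's actual implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    def loss_fn(w):
        # Toy quadratic objective standing in for the training loss.
        return 0.5 * float(w @ w)

    def grad_fn(w):
        # Noisy gradient, mimicking one worker's mini-batch gradient.
        return w + 0.05 * rng.standard_normal(w.shape)

    def strategy_switch(w, n_workers=4, lr=0.1, steps=200, window=10, tol=1e-3):
        losses, asynchronous = [], False
        for _ in range(steps):
            if not asynchronous:
                # Synchronous phase: the server averages all workers'
                # gradients, each computed on identical parameters.
                grads = [grad_fn(w) for _ in range(n_workers)]
                w = w - lr * np.mean(grads, axis=0)
            else:
                # Asynchronous phase (sequentialized here for clarity):
                # each gradient is applied on arrival, so later workers
                # read partially updated, possibly stale, parameters.
                for _ in range(n_workers):
                    w = w - lr * grad_fn(w)
            losses.append(loss_fn(w))
            # Assumed switching rule: leave the synchronous phase once the
            # relative loss improvement over a sliding window is small,
            # i.e. a promising solution region has been located.
            if not asynchronous and len(losses) > window:
                if losses[-window] - losses[-1] < tol * abs(losses[-window]):
                    asynchronous = True
        return w, losses

    w_final, history = strategy_switch(np.ones(10))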
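
The offline data-sharding idea can likewise be illustrated with a small sketch: instead of assigning samples to workers at random, samples are dealt out per class so every worker's shard sees roughly the same label distribution. This round-robin stratified split is only one simple instance of balanced sharding under assumed inputs; the concrete techniques and the reported variance reductions are those of the thesis, not of this snippet.

    from collections import defaultdict

    def stratified_shards(labels, n_workers):
        # Group sample indices by label.
        by_label = defaultdict(list)
        for idx, y in enumerate(labels):
            by_label[y].append(idx)
        # Deal each class's samples across workers round-robin, so all
        # shards end up with near-identical label proportions.
        shards = [[] for _ in range(n_workers)]
        for idxs in by_label.values():
            for i, idx in enumerate(idxs):
                shards[i % n_workers].append(idx)
        return shards

    # Example: 6 samples of class 0 and 4 of class 1 split over 2 workers;
    # each worker receives 3 of class 0 and 2 of class 1.
    print(stratified_shards([0, 0, 0, 0, 0, 0, 1, 1, 1, 1], 2))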
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19792
Appears in Collections: Ph.D. Theses

Files in This Item:

File                          Description    Size       Format
phd_thesis_final_Sep (3).pdf                 9.13 MB    Adobe PDF


All items in this repository are protected by copyright.