Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19792
Full metadata record (DC Field: Value [Language])
dc.contributor.author: Provatas, Nikodimos
dc.date.accessioned: 2025-10-14T08:19:46Z
dc.date.available: 2025-10-14T08:19:46Z
dc.date.issued: 2025-07-08
dc.identifier.uri: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19792
dc.description.abstract: Deep learning has transformed numerous fields by leveraging vast datasets and complex neural architectures, but the computational demands of modern models often exceed single-node capabilities, prompting distributed training solutions. This thesis investigates asynchronous training under the parameter server paradigm, focusing on enhancing both performance and stability. First, a thorough comparative analysis demonstrates that specialized distributed architectures deliver substantially higher throughput than general-purpose data-processing frameworks at large scales. A systematic literature review then identifies consistency control and the mitigation of stale gradients as pivotal challenges in asynchronous setups. To address these, a hybrid Strategy-Switch approach is introduced: training begins with synchronous communication to locate a promising solution region and then transitions to asynchronous updates according to an empirically derived switching criterion, achieving both rapid convergence and high model accuracy. Building on these insights, offline data-sharding techniques are proposed that preemptively balance sample distributions across workers, thereby reducing gradient variance and improving training consistency. Experimental results show that the proposed data-distribution strategies decrease variability in training and validation metrics by up to eightfold and twofold, respectively, compared to random assignment. Collectively, these contributions advance asynchronous distributed deep learning by offering concrete methods to reconcile speed and stability, supporting more scalable and reliable large-scale neural network training. [en_US] (An illustrative sketch of the data-sharding idea follows the metadata record below.)
dc.language: en [en_US]
dc.subject: parameter server [en_US]
dc.subject: εξυπηρετητής παραμέτρων [en_US]
dc.subject: deep learning [en_US]
dc.subject: βαθιά μηχανική μάθηση [en_US]
dc.subject: distributed learning [en_US]
dc.subject: κατανεμημένη εκπαίδευση [en_US]
dc.subject: asynchronous learning [en_US]
dc.subject: ασύγχρονη εκπαίδευση [en_US]
dc.subject: data management [en_US]
dc.subject: διαχείριση δεδομένων [en_US]
dc.subject: big data [en_US]
dc.subject: "μεγάλα" δεδομένα [en_US]
dc.title: Study and optimization of distributed deep learning under the parameter server architecture [en_US]
dc.description.pages: 188 [en_US]
dc.contributor.supervisor: Κοζύρης Νεκτάριος (Nectarios Koziris) [en_US]
dc.department: Τομέας Τεχνολογίας Πληροφορικής και Υπολογιστών (Division of Computer Science) [en_US]
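To make the offline data-sharding idea from the abstract concrete, the following is a minimal sketch of one way to pre-balance label distributions across workers before training. The function name stratified_shards, the per-class round-robin assignment, and the synthetic labels are illustrative assumptions; they are not the exact sharding algorithms or the Strategy-Switch criterion developed in the thesis itself.

```python
import random
from collections import defaultdict

def stratified_shards(labels, num_workers, seed=0):
    """Assign sample indices to workers so each worker's shard roughly
    mirrors the global label distribution (illustrative sketch only)."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class round-robin across workers, so no worker is starved
    # of any class and per-worker gradients stay closer to the global one.
    shards = [[] for _ in range(num_workers)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for offset, idx in enumerate(indices):
            shards[offset % num_workers].append(idx)

    # Shuffle within each shard so batches are not ordered by class.
    for shard in shards:
        rng.shuffle(shard)
    return shards

# Example with synthetic labels: 10 classes, 10 000 samples, 4 workers.
labels = random.Random(1).choices(range(10), k=10_000)
shards = stratified_shards(labels, num_workers=4)
print([len(s) for s in shards])  # roughly equal shard sizes
```

In such a scheme, each worker sees approximately the same class mix, which is the kind of pre-balancing the abstract credits with reducing gradient variance relative to random assignment.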
Appears in Collections: Διδακτορικές Διατριβές - Ph.D. Theses

Files in This Item:
File: phd_thesis_final_Sep (3).pdf (9.13 MB, Adobe PDF)


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.