Title: ML-driven Automated Framework for Tuning Spark Applications
Authors: Nikitopoulou, Dimitra
Σούντρης Δημήτριος
Keywords: Spark
parameter tuning
parameter impact
performance modeling
Issue Date: 20-Jul-2020
Abstract: Our era is characterized by an ever-increasing volume of data, which is easy and cheap to collect from the many devices connected to the Internet. Processing big data demands resources, which the cloud provides conveniently, as well as tools that speed up the process. To this end, distributing execution across machines and exploiting the speed advantage of memory over the hard disk are of utmost importance. Spark is a tool that builds on both ideas and can easily process vast volumes of data. Achieving optimal execution of its workloads, though, depends to a great extent on appropriately tuning a large number of parameters. In this thesis, we design a framework that automatically tunes Spark’s parameters according to the workload and the size of the input data. We identify the parameters with the greatest impact on application execution and devise a methodology for tuning them so as to minimize execution time. Next, we integrate our solution into Spark by means of a wrapper script, so that the user issues a simple spark-submit command while the command with the optimal configuration is the one actually executed. Finally, we present the speedup achieved both for the set of applications used to construct the framework and for other, unseen applications, in order to evaluate how well the framework’s optimizations generalize.
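The wrapper-script idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the lookup table `TUNED_CONFS`, the helpers `size_bucket` and `tuned_command`, the 1 GiB threshold, and every configuration value below are hypothetical placeholders. The only assumptions about Spark itself are that `spark-submit` accepts `--conf key=value` options before the application file, and that `spark.executor.memory` and `spark.executor.cores` are valid property names.

```python
import shlex

# Hypothetical lookup table: (workload, input-size bucket) -> Spark properties.
# The values are illustrative placeholders, not results from the thesis.
TUNED_CONFS = {
    ("wordcount", "small"): {"spark.executor.memory": "2g", "spark.executor.cores": "2"},
    ("wordcount", "large"): {"spark.executor.memory": "8g", "spark.executor.cores": "4"},
}


def size_bucket(input_bytes, threshold=1 << 30):
    """Classify the input as 'small' or 'large' (assumed 1 GiB threshold)."""
    return "small" if input_bytes < threshold else "large"


def tuned_command(user_command, workload, input_bytes):
    """Return the user's spark-submit command with tuned --conf flags injected."""
    args = shlex.split(user_command)
    if not args or args[0] != "spark-submit":
        raise ValueError("expected a spark-submit command")
    # Unknown (workload, size) pairs fall back to the user's command unchanged.
    conf = TUNED_CONFS.get((workload, size_bucket(input_bytes)), {})
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    # Options must precede the application file, so insert them right after
    # the program name and keep the user's own arguments in their order.
    return [args[0]] + flags + args[1:]


if __name__ == "__main__":
    cmd = tuned_command(
        "spark-submit --class WordCount app.jar input.txt",
        workload="wordcount",
        input_bytes=5 * (1 << 30),
    )
    print(shlex.join(cmd))
```

In this sketch the returned argument list could be passed to `subprocess.run` to launch the job; a real tuner would replace the static table with the output of a performance model.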
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File        Description   Size      Format
mixed.pdf                 9.41 MB   Adobe PDF

Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.