Modeling of big data applications and operators in cloud computing environments

Γιαννακόπουλος, Ιωάννης

National Technical University of Athens

School of Electrical and Computer Engineering

Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17592

Title:	Modeling of big data applications and operators in cloud computing environments
Authors:	Γιαννακόπουλος, Ιωάννης; Κοζύρης, Νεκτάριος
Keywords:	cloud computing; modeling; big data; machine learning
Issue date:	26-Nov-2018
Abstract:	The Big Data revolution has created new requirements for the design of applications and operators that can handle the volume of the data sources. The adoption of distributed architectures and the increasing popularity of the Cloud paradigm have complicated their structure, making the problem of modeling their behavior increasingly difficult. Moreover, the wide variety of existing datasets has complicated the problem of selecting the appropriate inputs for a given operator, since examining the utility of the data for a given workflow is a largely manual process that requires exhaustive execution over all of the available datasets. This thesis attempts to model the behavior of an arbitrary Big Data operator from two different viewpoints. First, we wish to model the operator’s performance when deployed under different resource configurations. We present an adaptive performance modeling methodology that relies on recursively partitioning the configuration space into disjoint regions, distributing a predefined number of samples to each region based on different region characteristics (i.e., size, modeling error), and deploying the given operator for the selected samples. The performance is then approximated for the entire space using a combination of linear models for each subregion. Second, in order to accelerate data analysis, we wish to model the operator’s output when deployed over different datasets. Based on the observation that similar datasets tend to affect the operators applied to them similarly, we propose a content-based methodology that models the output of a provided operator for all datasets. By measuring the similarity of the provided datasets in light of a handful of fundamental data properties, we construct a metric space, which is subsequently used by Machine Learning models that approximate the operator’s behavior for all datasets.
Our evaluation, conducted using several real-world operators applied to real and synthetic datasets, indicated that the introduced methodologies accurately model the operator’s behavior from both angles.
URI:	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17592
Appears in collections:	Ph.D. Theses
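
The adaptive performance modeling idea summarized in the abstract (recursive partitioning of the configuration space, a sample budget distributed by region size and modeling error, and one linear model per region) can be illustrated with a minimal Python sketch. This is not the thesis implementation. It assumes a one-dimensional configuration space, midpoint splits, and a simple weighted score; every name in it (`adaptive_model`, `predict`, `batch`, `min_points`, `weight`) is hypothetical and chosen only for illustration.

```python
import random


def fit_line(points):
    """Least-squares line y = a*x + b through a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:  # a single point or identical x values: constant model
        return 0.0, mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    return a, mean_y - a * mean_x


def region_error(points):
    """Mean squared error of the region's own linear fit."""
    a, b = fit_line(points)
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)


def adaptive_model(operator, lo, hi, budget, batch=8, min_points=4,
                   weight=0.5, seed=0):
    """Profile `operator` on [lo, hi] with `budget` runs.

    Returns a list of (lo, hi, slope, intercept) tuples, one per region.
    Assumed scoring: weight * relative size + (1 - weight) * relative error.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)

    def deploy(region, k):
        # "Deploying" here just means calling the operator on a sampled x.
        for _ in range(k):
            x = rng.uniform(region["lo"], region["hi"])
            region["points"].append((x, operator(x)))

    root = {"lo": lo, "hi": hi, "points": []}
    deploy(root, min(batch, budget))
    used = len(root["points"])
    regions = [root]

    while used < budget:
        # 1. Partition: split a region at its midpoint if both halves
        #    keep enough samples to fit a line.
        refined = []
        for r in regions:
            mid = (r["lo"] + r["hi"]) / 2
            left = [p for p in r["points"] if p[0] < mid]
            right = [p for p in r["points"] if p[0] >= mid]
            if len(left) >= min_points and len(right) >= min_points:
                refined.append({"lo": r["lo"], "hi": mid, "points": left})
                refined.append({"lo": mid, "hi": r["hi"], "points": right})
            else:
                refined.append(r)
        regions = refined

        # 2. Score each region by its relative size and relative error.
        errors = [region_error(r["points"]) for r in regions]
        total_err = sum(errors)
        n = len(regions)
        scores = []
        for r, err in zip(regions, errors):
            size_part = (r["hi"] - r["lo"]) / (hi - lo)
            err_part = err / total_err if total_err > 0 else 1 / n
            scores.append(weight * size_part + (1 - weight) * err_part)

        # 3. Distribute the next batch proportionally to the scores;
        #    rounding leftovers go to the highest-scoring regions.
        k = min(batch, budget - used)
        counts = [int(k * s) for s in scores]
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        for j in range(k - sum(counts)):
            counts[order[j % n]] += 1
        for r, c in zip(regions, counts):
            deploy(r, c)
        used += k

    return [(r["lo"], r["hi"], *fit_line(r["points"])) for r in regions]


def predict(models, x):
    """Evaluate the linear model of the region that contains x."""
    for r_lo, r_hi, a, b in models:
        if r_lo <= x <= r_hi:
            return a * x + b
    raise ValueError("x lies outside the modeled configuration space")


if __name__ == "__main__":
    # Hypothetical operator: runtime as a function of the number of cores.
    models = adaptive_model(lambda cores: 120 / cores + 5, 1.0, 16.0, budget=64)
    print(len(models), "regions; estimate at 4 cores:", predict(models, 4.0))
```

Regions whose linear fit is poor receive a larger share of each batch, so the sketch spends more of its budget where the operator's behavior is hardest to approximate. A multi-dimensional configuration space would need a different splitting rule, which is outside the scope of this sketch.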

Files in this item:

File	Description	Size	Format
thesis.pdf	revised thesis	2.81 MB	Adobe PDF

All items on this site are protected by copyright.