Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19623
Title: Data Lakehouse Performance Study
Authors: Σιδέρης, Κωνσταντίνος
Τσουμάκος Δημήτριος
Keywords: Big Data
Data Lakes
Batch Processing
Stream Processing
Delta Lake
Apache Hudi
Issue date: 23-Ιου-2025
Abstract: As organisations increasingly adopt lakehouse architectures to support big data analytics, understanding the performance trade-offs of utilising enhanced storage layers instead of standard data lake architectures is essential. This master's dissertation aims to present a comprehensive performance evaluation of two leading data lakehouse solutions, Delta Lake and Apache Hudi, focusing on both batch and stream processing workloads. Through the benchmarking process, we compare Delta Lake and Hudi against standard data lake implementations, which consist of a simple storage layer queried by an analytics engine, in this case, HDFS and Apache Spark. Being built on top of data lakes, lakehouses leverage their strengths while simultaneously introducing new features, such as ACID transactions, schema enforcement, schema evolution and data governance mechanisms, to address the issues data lakes face. Additionally, they introduce optimisations, such as indexing, data skipping, and partition pruning, to further improve them. Throughout this thesis, we present these features and, through benchmarks, evaluate how they improve performance and whether the added functionalities justify the use of lakehouses, even in cases where they may underperform.
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19623
Appears in collections: Διπλωματικές Εργασίες - Theses

Files in this item:
File                 Description  Size     Format
03118134_thesis.pdf               1.34 MB  Adobe PDF


All items on this site are protected by copyright.