Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19484
Title: Extending the Daphne Runtime: Lustre File System integration
Authors: Σταματής, Απόστολος
Τσουμάκος Δημήτριος
Keywords: Distributed file systems
Distributed systems
Daphne
Lustre file system
IDA pipelines
Data analytics
Issue Date: 21-Feb-2025
Abstract: Recently, there has been a trend toward Integrated Data Analysis (IDA) pipelines that combine various computational and data processing tasks within a unified framework. DAPHNE is an open and extensible system infrastructure for such IDA pipelines. This study focuses on integrating the DAPHNE runtime with the Lustre file system. Lustre is a POSIX-compliant, object-based distributed file system that is widely adopted in High-Performance Computing (HPC) because it handles parallel I/O operations efficiently. The integration is achieved by developing specialized C++ kernels that support read and write operations for CSV and DAPHNE Binary Data Format (dbdf) files. A single-file approach, in which workers share one file rather than each writing its own, is selected to reduce metadata overhead and improve scalability. Experiments were conducted on an AWS-based cluster to analyze read/write performance, scalability with an increasing number of worker nodes, and the impact of optimization techniques such as stripe size adjustment, file preallocation, and stripe alignment. Results indicate that the Lustre integration significantly improves the performance of DAPHNE's distributed runtime and enables better scalability for large datasets.
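The record itself contains no code. As a hedged illustration only, the following C++ sketch shows one way a single-file (shared-file) write path could combine the three optimizations named in the abstract: each worker writes its partition into one shared file at its own offset with `pwrite`, every partition starts on a stripe boundary (stripe alignment), and the file is preallocated once with `posix_fallocate`. The function names `align_up`, `stripe_aligned_offsets`, `preallocate` and `write_partition` are hypothetical and are not DAPHNE's kernel API. Padding each partition up to a stripe boundary is an assumed layout for illustration, not the dbdf format or the scheme used in the thesis. On Lustre the stripe size is normally set per file or directory (for example with `lfs setstripe -S`). Whether `posix_fallocate` is natively supported depends on the Lustre version; glibc otherwise falls back to emulating it.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Round n up to the next multiple of align (align must be > 0).
std::uint64_t align_up(std::uint64_t n, std::uint64_t align) {
    assert(align > 0);
    return (n + align - 1) / align * align;
}

// Hypothetical layout helper: assign each worker a byte offset in the shared
// file so that every partition starts on a stripe boundary. This way no two
// workers write into the same stripe, at the cost of padding gaps.
std::vector<std::uint64_t> stripe_aligned_offsets(
        const std::vector<std::uint64_t>& partition_bytes,
        std::uint64_t stripe_size) {
    std::vector<std::uint64_t> offsets;
    offsets.reserve(partition_bytes.size());
    std::uint64_t next = 0;
    for (std::uint64_t bytes : partition_bytes) {
        offsets.push_back(next);
        next = align_up(next + bytes, stripe_size);
    }
    return offsets;
}

// Preallocate the whole shared file once (e.g. by a coordinator) before
// workers start writing. posix_fallocate returns an error number directly.
void preallocate(const std::string& path, std::uint64_t total_bytes) {
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT, 0644);
    if (fd < 0) throw std::runtime_error(std::string("open: ") + std::strerror(errno));
    int rc = ::posix_fallocate(fd, 0, static_cast<off_t>(total_bytes));
    ::close(fd);
    if (rc != 0) throw std::runtime_error(std::string("posix_fallocate: ") + std::strerror(rc));
}

// Write one worker's partition at its offset. The file is opened without
// O_TRUNC so concurrent workers do not clobber each other's regions.
void write_partition(const std::string& path, std::uint64_t offset,
                     const std::vector<char>& data) {
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT, 0644);
    if (fd < 0) throw std::runtime_error(std::string("open: ") + std::strerror(errno));
    std::size_t done = 0;
    while (done < data.size()) {  // pwrite may write fewer bytes than requested
        ssize_t n = ::pwrite(fd, data.data() + done, data.size() - done,
                             static_cast<off_t>(offset + done));
        if (n < 0) {
            if (errno == EINTR) continue;  // retry if interrupted by a signal
            int err = errno;
            ::close(fd);
            throw std::runtime_error(std::string("pwrite: ") + std::strerror(err));
        }
        done += static_cast<std::size_t>(n);
    }
    ::close(fd);
}
```

In this sketch a coordinator would compute the offsets, call `preallocate` with the last offset plus the last partition's size, and then each worker would call `write_partition` with its own offset. Because of the padding gaps, a reader must know the offsets (for instance from a header). That is one trade-off of stripe-aligning partitions inside a single file.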
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19484
Appears in Collections: Διπλωματικές Εργασίες (Diploma Theses) - Theses

Files in This Item:
File | Size | Format
DAPHNE_Lustre_Integration_Apostolis_Stamatis.pdf | 1.87 MB | Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.