Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19891
Title: Experiences in Deploying Ephemeral Slurm as a Slurm Job and Co-execution Analysis using ¼-socket CPU Allocation
Authors: Φλώρος, Επαμεινώνδας
Γκούμας Γεώργιος
Keywords: HPC
SLURM
MPI
bash
scheduling
Resource Management System
co-location
colocation
coexecution
co-execution
benchmarking
Issue Date: 30-Oct-2025
Abstract: High Performance Computing (HPC) clusters contribute substantially to scientific and commercial software. The energy requirements of such systems, together with growing demand, drive the need for constant optimisation of resource utilisation and for expanding and improving the individual components of HPC systems. At present, the need for optimal utilisation of HPC resources is the main driver of change in current and future resource management systems. From the perspective of the system designer through to the end user, it is essential that resource management systems be flexibly configurable, with particular attention to the extensibility and adaptability of HPC systems to the multitude of workload types. Slurm, an open-source software that runs on Linux, is currently the most widely used and best-known resource management software for HPC systems. Slurm's basic functionality can be summarised in three functions: access and privilege management, providing a framework for the submission and execution of workloads, and mediating and resolving contention for the underlying system's resources. The aims of this thesis are to develop a tool that runs in user space, without the need for elevated administrator rights, for testing extensions and modifications to the Slurm resource management system, and to investigate the co-execution effects among MPI workloads that share common hardware resources, specifically at socket level. The tool's objective is to extend Slurm so that, on a real, functioning HPC system, it allows the evaluation of extensions and alternative implementations of Slurm's various components, as well as the extraction of insights and data on the operation of these components.
In addition, a hands-on evaluation study of the tool is presented, covering both the accuracy of its results and its reliability when integrated into production systems with existing Slurm installations. Finally, the performance of various MPI workloads is assessed using quantitative metrics to evaluate the viability and efficiency of co-executing computational tasks in HPC environments. These results provide a foundation for understanding the sustainability and potential benefits of workload co-execution in large-scale systems.
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19891
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: Floros_Thesis_05112025.pdf
Size: 2.15 MB
Format: Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.