Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19891
Title: Experiences in Deploying Ephemeral Slurm as a Slurm Job and Co-execution Analysis using ¼-socket CPU Allocation
Authors: Φλώρος, Επαμεινώνδας
Γκούμας Γεώργιος
Keywords: HPC
scheduling
Resource Management System
SLURM
co-location
colocation
coexecution
co-execution
performance measurements
benchmarking
MPI
bash
Issue date: 30-Oct-2025
Abstract: High Performance Computing (HPC) clusters contribute substantially to scientific and commercial software. The energy requirements of such systems, together with growing demand, drive the need for constant optimisation of resource utilisation and for expanding and improving the individual components of HPC systems. At present, the need for optimal utilisation of HPC resources is the main driver of change in current and future resource management systems. From the system designer to the end user, it is essential to focus on flexible configuration of resource management systems, with particular attention to the extensibility of HPC systems and their adaptability to a multitude of workload types. Slurm, an open-source workload manager for Linux systems, is currently the most widely used and best-known resource management software for HPC systems. Slurm’s basic functionality can be summarised in three functions: managing access and privileges, providing a framework for submitting and executing workloads, and arbitrating contention for the underlying system’s resources. This thesis aims to develop a user-space tool, requiring no elevated administrator rights, for testing extensions and modifications to the Slurm resource management system, and to investigate co-execution effects among MPI workloads that share hardware resources, specifically at the socket level. The tool’s objective is to extend Slurm so that, on a real, operational HPC system, extensions and alternative implementations of Slurm’s various components can be evaluated, and insights and data on the operation of these components can be extracted.
In addition, a hands-on evaluation study of the tool is presented, covering both the accuracy of its results and its reliability when integrated into production systems with existing Slurm installations. Finally, the performance of various MPI workloads is assessed using quantitative metrics to evaluate the viability and efficiency of co-executing computational tasks in HPC environments. These results provide a foundation for understanding the sustainability and potential benefits of workload co-execution in large-scale systems.
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19891
Appears in collections: Diploma Theses (Διπλωματικές Εργασίες - Theses)

Files in this item:
File: Floros_Thesis_05112025.pdf
Size: 2.15 MB
Format: Adobe PDF


All items on this website are protected by copyright.