Co-scheduling algorithms for HPC applications

Κελλάρη, Μυρσίνη

National Technical University of Athens

School of Electrical and Computer Engineering

Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19554

Title:	Co-scheduling algorithms for HPC applications
Authors:	Κελλάρη, Μυρσίνη; Γκούμας, Γεώργιος
Keywords:	High Performance Computing (HPC); co-scheduling; co-scheduling algorithms; simulation; performance metrics
Issue Date:	7-Mar-2025
Abstract:	This thesis develops and evaluates co-scheduling algorithms for High-Performance Computing (HPC) systems, aiming to optimize resource utilization while maintaining high system performance and user satisfaction. Growing demand for computational power in fields such as scientific research, artificial intelligence, and big data analytics has made HPC systems essential. However, these systems often suffer from resource underutilization, which increases energy consumption and operational costs, and traditional scheduling algorithms such as First Come First Serve (FCFS) and EASY do not resolve this problem. Co-scheduling is proposed to address it. Co-scheduling allows multiple jobs to share computational nodes, reducing resource contention and improving system efficiency. This is particularly beneficial when co-allocated jobs have different resource demands, such as memory-intensive and compute-intensive tasks, which can lead to improved system performance. Co-scheduling also introduces challenges, such as inter-job interference and fairness issues, which must be carefully managed. The research introduces several co-scheduling algorithms: EASY Co-schedule, Largest Area First Co-schedule (LAF-Co), Popularity, Shortest Job First Co-schedule (SJF-Co), Longest Job First Co-schedule (LJF-Co), Filler, and Two Factors. These algorithms are evaluated using the Efficient Lightweight Scheduling Estimator (ELiSE), a Python-based simulator that enables controlled testing of scheduling policies. The evaluation is based on two key metrics: makespan speedup (system performance) and mean job slowdown (user satisfaction). Experimental results demonstrate that co-scheduling algorithms, particularly SJF-Filler (a Two Factors variant), achieve significant improvements in makespan speedup and mean job speedup while maintaining low mean slowdown values. These algorithms balance system performance and user satisfaction, making them promising candidates for real-world HPC systems. However, co-scheduling can increase the execution times of individual jobs, highlighting the trade-off between system efficiency and user experience. The findings suggest that co-scheduling can enhance the performance and efficiency of HPC systems, but careful management is required to ensure fairness and user satisfaction. Future work includes testing the algorithms on real HPC systems, exploring alternative colocation strategies, and integrating machine learning techniques to further optimize scheduling decisions.
URI:	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19554
Appears in Collections:	Διπλωματικές Εργασίες - Theses
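
The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so the definitions here are common conventions assumed for illustration: makespan speedup is taken as the baseline makespan divided by the co-scheduled makespan, and a job's slowdown as its turnaround time divided by its runtime on exclusive nodes. The `JobRecord` layout, the function names, and the sample numbers are hypothetical and are not taken from the thesis or from ELiSE.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """One finished job, times in seconds (hypothetical record layout)."""
    submit: float             # when the job entered the queue
    finish: float             # when the job completed
    exclusive_runtime: float  # runtime the job would need on exclusive nodes


def makespan(jobs):
    """Time from the earliest submission to the last completion."""
    return max(j.finish for j in jobs) - min(j.submit for j in jobs)


def makespan_speedup(baseline_jobs, cosched_jobs):
    """Assumed definition: baseline makespan / co-scheduled makespan.

    A value above 1 means the co-scheduled run finished the workload sooner.
    """
    return makespan(baseline_jobs) / makespan(cosched_jobs)


def mean_slowdown(jobs):
    """Assumed definition: mean of turnaround time / exclusive runtime.

    Turnaround includes both queue waiting and any interference-induced
    stretch in execution time.
    """
    return sum((j.finish - j.submit) / j.exclusive_runtime for j in jobs) / len(jobs)


# Made-up workload: two 100-second jobs submitted at t=0.
# Baseline: they run one after the other on the same node.
baseline = [JobRecord(0, 100, 100), JobRecord(0, 200, 100)]
# Co-scheduled: they share the node from t=0 and each runs somewhat slower.
cosched = [JobRecord(0, 120, 100), JobRecord(0, 125, 100)]

print("makespan speedup:", makespan_speedup(baseline, cosched))
print("mean slowdown (baseline):", mean_slowdown(baseline))
print("mean slowdown (co-scheduled):", mean_slowdown(cosched))
```

In this made-up example each co-scheduled job takes longer to execute than it would alone, yet the workload finishes sooner and the mean slowdown drops because no job waits in the queue. This is the kind of trade-off between individual execution time and overall system efficiency that the abstract describes.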

Files in This Item:

File	Description	Size	Format
Kellari_Myrsini_Thesis_final.pdf	Diploma Thesis file	2.77 MB	Adobe PDF
