Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/9098
Full metadata record
DC FieldValueLanguage
dc.contributor.authorNikela Papadopoulou
dc.date.accessioned2018-07-22T22:51:54Z-
dc.date.available2018-07-22T22:51:54Z-
dc.date.issued2017-10-16
dc.date.submitted2017-7-20
dc.identifier.urihttp://artemis-new.cslab.ece.ntua.gr:8080/jspui/handle/123456789/9098-
dc.description.abstractOn the path to exascale, supercomputers will grow to host hundreds of millions of cores and various complex heterogeneous processing elements, yet even today, users fail to leverage the existing compute power of large-scale systems, as large classes of typical HPC applications are bound by non-scalable communication phases. The ability to predict the communication time of parallel applications can assist users, compilers, runtime systems and schedulers with decision-making for optimal resource utilization, performance optimizations, power saving and resilience.This thesis presents a methodology for predictive communication modeling of HPC applications. Communication time depends on a complex set of parameters, relevant to the application, the system architecture, the runtime configuration and runtime conditions. To handle this complexity, we follow an empirical modeling approach. We define features that can be extracted from the application, the process mapping and the allocation shape ahead of execution, deploy a single benchmark to sweep over the parameter space and develop predictive models for communication time on three large-scale computing systems, Vilje, Piz Daint and ARIS, using different subsets of our features, statistical and machine-learning methods and training sets. We compare the predictive performance of our models on various communication patterns and applications, for multiple problem sizes, executions and runtime configurations, ranging from a few dozen to a few thousand cores. Our methodology is successful across all tested communication patterns on all systems and exhibits high prediction accuracy and goodness-of-fit. Our models are applicable just-in-time ahead of the execution of an HPC application, and, as we demonstrate in this thesis, their high accuracy make them suitable for communication-aware decision making, towards the optimization of resource utilization on large-scale systems.
dc.languageEnglish
dc.subjectperformance modeling
dc.subjectpredictive modeling
dc.subjectcommunication time
dc.subjecthpc applications
dc.subjectmpi
dc.subjectsupercomputers
dc.subjectclusters
dc.subjectstatistical learning
dc.subjectmachine learning
dc.titleCommunication Performance Prediction On Large-scale Systems
dc.typePhD Thesis
dc.description.pages183
dc.departmentΤομέας Τεχνολογίας Πληροφορικής & Υπολογιστών
dc.organizationΕΜΠ, Τμήμα Ηλεκτρολόγων Μηχανικών & Μηχανικών Υπολογιστών
Appears in Collections:Διδακτορικές Διατριβές - Ph.D. Theses

Files in This Item:
File SizeFormat 
PD2017-0030.pdf11.32 MBAdobe PDFView/Open


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.