Please use this identifier to cite or link to this item:
|Title:||Optimization of GPU Workloads using Natural Language Processing based on Deep Learning techniques|
|Abstract:||Setting program parameters is challenging due to the abstract relationship between hardware and software. Accurate automatic optimization algorithms are required to cope with the complexity and variety of current hardware and software. Autotuning has traditionally relied on time-consuming trial-and-error approaches. Machine learning (ML) and Natural Language Processing (NLP) have flourished over the last decade, with research focusing on deep architectures. In this context, applying natural language processing techniques to source code in order to perform autotuning tasks is an emerging field of study. While previous research has successfully addressed a variety of autotuning tasks across several source languages, the majority of available source code data is CPU-centric, with relatively little GPU code. In this work, we make two contributions. First, we use the dataset of OpenCL kernels from the work of Cummins et al. to evaluate our six proposed deep neural networks and compare them to the state-of-the-art network. Our best model surpasses that of Cummins et al., providing a 2.65% improvement in prediction accuracy overall. In our second contribution, we extend our research to CUDA kernels and build an end-to-end pipeline that combines: a source-to-source compiler that applies thread and block coarsening transformations to CUDA kernels; a source rewriter that removes semantically irrelevant information from the kernels, producing train-ready sequences; a profiling tool that measures the performance of the transformed kernels and produces the prediction labels; and, finally, the best-performing model from our aforementioned neural network architecture research on OpenCL kernels. Hence, the pipeline receives a hand-written CUDA kernel and predicts its optimal configuration.
To the best of our knowledge, this is the first work that attempts to apply NLP techniques to CUDA-written applications for these specific optimizations. We evaluate our methodology on the LS-CAT dataset with five different coarsening factors on an NVIDIA V100S high-end GPU, identify its vulnerabilities, and examine the applicability of machine learning to three different prediction problems: thread coarsening binary classification, and thread and block coarsening five-class classification. Our model achieves 84% accuracy on binary classification, while it performs poorly on five-class classification.|
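To make the thread coarsening transformation mentioned in the abstract concrete, the following is a minimal sketch that simulates the semantics in plain Python: each "thread" of the coarsened kernel processes CF consecutive elements instead of one, so CF times fewer threads are launched. The function names and the simulation itself are illustrative assumptions, not the thesis's source-to-source compiler.

```python
def vector_add_original(a, b, n_threads):
    # Simulates the original CUDA kernel: one element per thread,
    # guarded against out-of-bounds indices.
    c = [0.0] * len(a)
    for tid in range(n_threads):
        if tid < len(a):
            c[tid] = a[tid] + b[tid]
    return c

def vector_add_coarsened(a, b, n_threads, cf):
    # Simulates the thread-coarsened kernel: each thread handles
    # cf consecutive elements, with the same bounds guard.
    c = [0.0] * len(a)
    for tid in range(n_threads):
        base = cf * tid
        for k in range(cf):
            i = base + k
            if i < len(a):
                c[i] = a[i] + b[i]
    return c

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
# Coarsening by factor 2 halves the thread count but must preserve the result.
assert vector_add_original(a, b, 4) == vector_add_coarsened(a, b, 2, 2)
```

Whether the coarsened variant is actually faster depends on the kernel and the GPU, which is exactly why the thesis trains a model to predict the best configuration instead of fixing one.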
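The source-rewriting step described in the abstract, which removes semantically irrelevant information from a kernel before it is fed to the model, can be sketched as follows. The function name and the exact normalization rules (stripping comments, collapsing whitespace) are assumptions for illustration; the thesis's rewriter may apply additional normalizations such as identifier renaming.

```python
import re

def rewrite_kernel(src: str) -> str:
    # Hypothetical source rewriter: drop line and block comments,
    # then collapse all whitespace so only code tokens remain.
    src = re.sub(r"//[^\n]*", " ", src)               # line comments
    src = re.sub(r"/\*.*?\*/", " ", src, flags=re.S)  # block comments
    src = re.sub(r"\s+", " ", src)                    # collapse whitespace
    return src.strip()

kernel = """
__global__ void add(float* a, float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread id
    if (i < n) c[i] = a[i] + b[i];  /* guarded store */
}
"""
sequence = rewrite_kernel(kernel)
assert "//" not in sequence and "/*" not in sequence
```

The resulting single-line string is one plausible form of the "train-ready sequence" the pipeline hands to the neural network.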
|Appears in Collections:||Διπλωματικές Εργασίες - Theses|
Files in This Item:
|Petros_Vavaroutsos_Diploma_Thesis.pdf||2.94 MB||Adobe PDF||View/Open|
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.