Design and implementation of an intelligent mechanism capable of sharing resources, in multicore systems, using Deep Reinforcement Learning

Mandilaras, Nikiforos

National Technical University of Athens

School of Electrical and Computer Engineering

Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17662

Title:	Design and implementation of an intelligent mechanism capable of sharing resources, in multicore systems, using Deep Reinforcement Learning
Authors:	Mandilaras, Nikiforos; Koziris, Nektarios
Keywords:	Multiprocessors, Shared cache, LLC, Cache partitioning, coexecution, Intel RDT, Reinforcement Learning, Neural Nets, Deep Reinforcement Learning, DQN
Issue Date:	31-Aug-2020
Abstract:	The average utilization of servers in modern data centers is very low, not exceeding 50%. The reason is the Service-Level Agreements (SLAs) that providers sign with their customers: to guarantee those agreements, providers prefer to run services in isolation. The need for isolation arises from contention for shared resources, such as the last-level cache (LLC). Contention between co-executed applications degrades the performance of the services and makes it difficult to guarantee their performance levels. To address this, modern processors now integrate technologies that support monitoring the usage of shared resources as well as partitioning them. In this thesis, we use these technologies together with deep reinforcement learning methods to implement an intelligent mechanism for partitioning the last-level cache of a multicore system. The goal is to maintain the performance of a latency-critical service when it is co-executed with other applications, while also increasing the utilization of system resources. Reinforcement learning allows such goals to be pursued automatically, using agents that explore a state space and use the knowledge they gather from the environment to make appropriate decisions. We evaluate our mechanism in co-executions of the Memcached service with machine learning workloads. We show that the mechanism can consistently protect the performance of the critical service and at the same time increase the throughput of low-priority applications. Finally, we show that training neural networks offers opportunities to generalize the acquired knowledge and apply it to new applications.
URI:	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17662
Appears in Collections:	Postgraduate Theses - M.Sc. Theses
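
The abstract describes an agent that explores a state space of cache allocations and learns to protect a latency-critical service while leaving capacity to other applications. The following is a minimal sketch of that idea, not the thesis's implementation: it swaps the deep Q-network (DQN) for plain tabular Q-learning, replaces the real hardware with a toy latency model, and makes no Intel RDT calls. All names and constants (`NUM_WAYS`, `LATENCY_TARGET`, `simulated_latency`, the reward shape) are assumptions made for illustration.

```python
import random

NUM_WAYS = 10          # hypothetical number of LLC ways to split
LATENCY_TARGET = 1.0   # hypothetical SLA latency target (arbitrary units)
ACTIONS = (-1, 0, 1)   # shrink, keep, or grow the critical service's share


def simulated_latency(critical_ways):
    """Toy model (assumption): latency falls as the service gets more ways."""
    return 4.0 / critical_ways


def reward(critical_ways):
    """Penalize SLA violations; otherwise reward ways left to other jobs."""
    if simulated_latency(critical_ways) > LATENCY_TARGET:
        return -1.0
    return (NUM_WAYS - critical_ways) / NUM_WAYS


def step(state, action):
    """Apply an action, always keeping at least one way for each side."""
    next_state = min(max(state + action, 1), NUM_WAYS - 1)
    return next_state, reward(next_state)


def train(episodes=500, steps=20, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy exploration policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(1, NUM_WAYS) for a in ACTIONS}
    for _ in range(episodes):
        state = rng.randint(1, NUM_WAYS - 1)
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, r = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Standard Q-learning update toward the bootstrapped target.
            q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_allocation(q, start, steps=20):
    """Follow the learned greedy policy and return the final allocation."""
    state = start
    for _ in range(steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _ = step(state, action)
    return state


if __name__ == "__main__":
    q_table = train()
    print("ways given to the critical service:", greedy_allocation(q_table, start=9))
```

In this toy model the smallest allocation that still meets the latency target is 4 ways, so a well-trained agent would be expected to settle near that point. A real system would replace `simulated_latency` with measurements from the hardware monitoring support and apply each action through the cache partitioning interface.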

Files in This Item:

File	Description	Size	Format
thesis_resource_allocation_reinforcement_learning_nikiforos_mandilaras.pdf		2.5 MB	Adobe PDF
