Deep Neural Network Compression Techniques in a Multicore Processing Environment

Παπαγεωργίου, Παναγιώτης

National Technical University of Athens

School of Electrical and Computer Engineering

Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17750

Title:	Deep Neural Network Compression Techniques in a Multicore Processing Environment
Authors:	Παπαγεωργίου, Παναγιώτης; Γκούμας Γεώργιος
Keywords:	Deep learning; Convolution; Quantization; k-means; Just-in-Time compilation
Issue Date:	22-Oct-2020
Abstract:	Deep neural networks (DNNs) deliver state-of-the-art accuracy across many machine learning domains, computer vision among them, and there is a growing need to deploy them on a wide range of embedded devices. Their main drawback is computational intensity: their demands far exceed the capabilities of edge devices in terms of memory, computational power and energy autonomy. Extensive research is therefore being conducted on techniques that make DNNs deployable on such devices. This thesis studies quantization by clustering as a DNN compression technique. We first study how, and to what extent, clustering the convolution layers compresses the models while retaining their accuracy. We then examine how clustering affects computational performance, and we optimize the quantized models with methods drawn from existing research as well as methods we propose. The first goal is to hide the latency of the irregular memory access patterns that quantization introduces; to that end we investigate loop optimization techniques and Just-in-Time (JIT) compilation. To increase performance further, we develop our own JIT compilation library, and we use it to propose a method that eliminates the irregular access patterns altogether. Finally, comparing our implementations with contemporary optimized convolutions, we observe that they achieve similar and sometimes better performance.
URI:	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17750
Appears in Collections:	Διπλωματικές Εργασίες - Theses
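
The clustering quantization summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that the clustered values are layer weights, uses a plain 1-D k-means with a linear initialization, and every name in it (kmeans_1d, quantized_dot, compressed_bits) is hypothetical. It shows how a codebook plus per-weight indices replaces the original values, and why reading weights through an index table produces the indirect memory accesses the abstract refers to.

```python
import math


def assign(values, codebook):
    """Index of the nearest centroid for every value."""
    return [min(range(len(codebook)), key=lambda c: abs(v - codebook[c]))
            for v in values]


def kmeans_1d(values, k, iters=20):
    """Lloyd's algorithm on scalars; returns (codebook, indices)."""
    lo, hi = min(values), max(values)
    # Assumption: centroids start evenly spaced over the value range.
    if k == 1:
        codebook = [(lo + hi) / 2]
    else:
        codebook = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        indices = assign(values, codebook)
        # Update step: each centroid becomes the mean of its members;
        # a centroid with no members keeps its previous value.
        for c in range(k):
            members = [v for v, i in zip(values, indices) if i == c]
            if members:
                codebook[c] = sum(members) / len(members)
    return codebook, assign(values, codebook)


def quantized_dot(codebook, indices, inputs):
    """Dot product where each weight is fetched as codebook[indices[j]]."""
    return sum(codebook[i] * x for i, x in zip(indices, inputs))


def compressed_bits(n_weights, k, value_bits=32):
    """Storage for n_weights indices plus a k-entry codebook, in bits."""
    index_bits = math.ceil(math.log2(k)) if k > 1 else 0
    return n_weights * index_bits + k * value_bits


# Toy data standing in for the weights of one convolution filter.
weights = [-0.9, -1.1, 0.05, -0.05, 1.0, 1.2, 0.9, -1.0]
inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

codebook, indices = kmeans_1d(weights, k=3)
print("codebook:", codebook)
print("indices:", indices)
print("quantized dot:", quantized_dot(codebook, indices, inputs))
print("bits before/after:", len(weights) * 32, compressed_bits(len(weights), 3))
```

On such a small example the codebook dominates the storage cost; the saving grows with the number of weights that share one codebook, since each weight then costs only ceil(log2(k)) bits.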

Files in This Item:

File	Description	Size	Format
papageorgiou_compression.pdf		6.67 MB	Adobe PDF
