Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18803
Title: Mapping Deep Learning Algorithms onto Graphcore IPU and NVIDIA GPU Hardware Platforms
Authors: Γερακάρης, Απόστολος
Πνευματικάτος Διονύσιος
Keywords: Computer Vision, Face Detection, Machine Learning, Conditioned Learning, Landmark Detection, Keras, TensorFlow, Eyeblink Conditioning
Issue Date: 10-Apr-2023
Abstract: This thesis investigates the performance of hardware accelerators, namely GPUs and IPUs, for Machine Learning and Deep Learning applications in both inference and training. We focus on two tasks: automating and accelerating eyeblink-response detection from video, and training an image-based CNN face detection model. For the former, we explore and compare algorithms and optimization techniques to achieve real-time processing speed. For the latter, we optimize the training pipeline by leveraging both CPUs and device accelerators. Eyeblink Conditioning is a widely used experiment in neuroscience for studying learning and memory processes in the brain. In the past, researchers used potentiometers or electromyography (EMG) to monitor eyelid movement during an experiment. In recent years, computer vision and image processing have greatly reduced the need for these methods, which require human intervention and are too slow for real-time processing. To fully automate eyelid tracking, we combine face detection and landmark detection algorithms and accelerate them to obtain a fast and accurate implementation. Several Deep Learning and Machine Learning algorithms for face detection and landmark detection (eyelid detection) are analyzed and compared in terms of speed and accuracy. Based on this study, two algorithms are identified as most suitable for our use case: the Ensemble of Regression Trees (ERT) approach for landmark detection and the CNN-based BlazeFace model for face detection. The BlazeFace model was accelerated on three hardware accelerators: a Tesla V100 GPU, an MK1 IPU and an MK2 IPU. The ERT algorithm was accelerated using multi-core CPUs.
A combined implementation is successfully deployed for a real neuroscientific use case, eyeblink response detection, achieving an overall runtime of 0.642 ms per frame with a Tesla V100 GPU and 32 CPU processes, 0.7116 ms per frame with an MK2 IPU and 64 CPU processes, and 0.761 ms per frame with an MK1 IPU and 32 CPU processes. Furthermore, an experimental open-source training implementation of the BlazeFace face detector was built from scratch to benchmark the performance of IPU and GPU hardware accelerators. Our results show that IPU-based systems outperform GPU-based systems in training the CNN-based face detector, especially for small batch sizes.
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18803
Appears in Collections: Diploma Theses

Files in This Item:
File: AG_thesis_final.pdf
Size: 4.54 MB
Format: Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.