Please use this identifier to cite or link to this item:
http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19247
Title: | Multi-task learning for Action and Gesture Recognition |
Authors: | Spathis, Konstantinos; Μαραγκός Πέτρος |
Keywords: | deep learning, computer vision, action recognition, gesture recognition, multi-task learning |
Issue Date: | 18-Jul-2024 |
Abstract: | Recent advances in deep learning have revolutionized the field of computer vision. Deep learning models have achieved state-of-the-art performance in various tasks, including action and gesture recognition. These two human-centric tasks involve recognizing human actions and gestures in videos, aiming to mathematically model the human perception of actions and gestures. Current state-of-the-art models for action and gesture recognition focus on applying novel deep learning architectures to achieve better performance, while handling each task separately. However, these tasks find application in various fields where the recognition of both actions and gestures is required, for example in robotic assistants, surveillance systems, or autonomous driving, where object/human detection and recognition are required simultaneously. These problems therefore overlap considerably and call for common algorithms that address both at the same time. Recently, alternative learning approaches have been proposed to improve the performance of deep learning models without requiring the development of novel architectures or the collection of more data. One of these approaches is "multi-task learning", where multiple tasks are learned jointly, sharing information between them. Multi-task learning has been successfully applied to various computer vision tasks. Action-related tasks, such as action recognition and pose estimation, have been shown to benefit from multi-task learning. In the field of gesture recognition, multi-task learning has also been applied to tasks such as hand gesture recognition and segmentation, achieving remarkable results. In this thesis, we aim to show that the tasks of action and gesture recognition can be learned jointly, benefiting from each other. We construct different multi-task learning models in which the tasks of action and gesture recognition are learned jointly. 
We evaluate the performance of the proposed models against the respective single-task learning models for action and gesture recognition. The results show that the proposed models achieve better performance than the single-task models, demonstrating the benefits of multi-task learning in action and gesture recognition. Moreover, we extend this method to develop a multimodal multi-task learning model, where different modalities, specifically RGB and depth data, can be learned jointly in the same framework to achieve better performance than single-task models and multimodal approaches. |
URI: | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19247 |
Appears in Collections: | Διπλωματικές Εργασίες - Theses |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
kspathis_thesis.pdf | | 8.16 MB | Adobe PDF | View/Open |
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.