Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19247
Title: Multi-task learning for Action and Gesture Recognition
Authors: Spathis, Konstantinos
Maragos, Petros
Keywords: deep learning, computer vision, action recognition, gesture recognition, multi-task learning
Issue Date: 18-Jul-2024
Abstract: Recent advances in deep learning have revolutionized the field of computer vision. Deep learning models have achieved state-of-the-art performance in a variety of tasks, including action and gesture recognition. These two human-centric tasks involve recognizing human actions and gestures in videos, with the aim of mathematically modeling the human perception of actions and gestures. Current state-of-the-art models for action and gesture recognition focus on applying novel deep learning architectures to improve performance while handling each task separately. However, these tasks find application in fields where both actions and gestures must be recognized, for example robotic assistants, surveillance systems, or autonomous driving, where object/human detection and recognition are required simultaneously. The two problems therefore overlap substantially, calling for common algorithms that address both at the same time.

Recently, alternative learning approaches have been proposed to improve the performance of deep learning models without requiring novel architectures or the collection of more data. One such approach is "multi-task learning", in which multiple tasks are learned jointly and share information with one another. Multi-task learning has been successfully applied to various computer vision tasks: action-related tasks such as action recognition and pose estimation have been shown to benefit from it, and in gesture recognition it has been applied to tasks such as hand gesture recognition and segmentation, achieving remarkable results.

In this thesis, we aim to show that action recognition and gesture recognition can be learned jointly and benefit from each other. We construct several multi-task learning models in which the two tasks are learned jointly, and we evaluate them against the respective single-task models for action and gesture recognition. The results show that the proposed models achieve better performance than the single-task models, demonstrating the benefits of multi-task learning for action and gesture recognition. Moreover, we extend this method to a multimodal multi-task learning model in which different modalities, specifically RGB and depth data, are learned jointly within the same framework, achieving better performance than both single-task models and multimodal approaches.
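To make the joint-learning idea concrete, below is a minimal sketch of hard parameter sharing in PyTorch: one shared backbone feeds two task-specific classification heads, trained with a weighted sum of per-task losses. The backbone, feature dimensions, class counts, and loss weights are illustrative placeholders, not the architecture actually used in the thesis.

    # Minimal sketch (illustrative, not the thesis's actual model):
    # hard parameter sharing for joint action/gesture recognition.
    import torch
    import torch.nn as nn

    class MultiTaskVideoModel(nn.Module):
        def __init__(self, num_actions=60, num_gestures=25, feat_dim=512):
            super().__init__()
            # Shared backbone: stand-in for any video feature extractor
            # (e.g. a 3D CNN); here a toy MLP over pooled clip features.
            self.backbone = nn.Sequential(
                nn.Linear(2048, feat_dim), nn.ReLU(),
                nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            )
            # Task-specific heads share the backbone representation.
            self.action_head = nn.Linear(feat_dim, num_actions)
            self.gesture_head = nn.Linear(feat_dim, num_gestures)

        def forward(self, x):
            shared = self.backbone(x)
            return self.action_head(shared), self.gesture_head(shared)

    model = MultiTaskVideoModel()
    criterion = nn.CrossEntropyLoss()
    clip_feats = torch.randn(8, 2048)           # batch of pooled clip features
    action_labels = torch.randint(0, 60, (8,))
    gesture_labels = torch.randint(0, 25, (8,))

    action_logits, gesture_logits = model(clip_feats)
    # Joint objective: weighted sum of per-task losses (weights are a choice).
    loss = 1.0 * criterion(action_logits, action_labels) \
         + 1.0 * criterion(gesture_logits, gesture_labels)
    loss.backward()

Under the same assumptions, the multimodal extension described in the abstract would correspond to adding a second input stream (e.g. depth features) fused with the RGB stream before the shared representation reaches the task heads.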
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/19247
Appears in Collections: Diploma Theses - Theses

Files in This Item:
File                 Size     Format
kspathis_thesis.pdf  8.16 MB  Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.