Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18751
Full metadata record
DC Field: Value (Language)
dc.contributor.author: Σβέζεντσεβ, Νταβίντ
dc.date.accessioned: 2023-07-20T06:47:02Z
dc.date.available: 2023-07-20T06:47:02Z
dc.date.issued: 2023-07-12
dc.identifier.uri: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18751
dc.description.abstract: In recent years, the computer vision community has shown growing interest in synthetic data. For the image modality, existing work has proposed learning visual representations by pre-training on synthetic samples produced by various generative processes instead of real data. This approach is advantageous because it resolves issues associated with real data: collection and labeling costs, copyright, privacy, and human bias. Desirable properties of synthetic images have been carefully investigated, and as a result the performance gap between real and synthetic images has been significantly narrowed. The present work extends this approach to the video domain and applies it to the task of action recognition. Because of the added temporal dimension, this modality is notably more complex than images. We therefore employ fractal geometry and other generative processes, and present methods to automatically produce large-scale datasets of short synthetic video clips. The approach is applicable to both supervised and self-supervised learning. To narrow the domain gap, we manually inspect real video samples and identify their key properties, such as periodic motion, random backgrounds, and camera displacement. These properties are then carefully emulated during pre-training. Through thorough ablations, we determine the properties that strengthen downstream results and offer general guidelines for pre-training with synthetic videos. The proposed approach is evaluated on the small-scale action recognition datasets HMDB51 and UCF101, as well as four other video benchmarks. Compared to standard Kinetics pre-training, our results come close and even surpass it on a subset of benchmarks. (en_US)
dc.language: en (en_US)
dc.subject: Computer Vision (en_US)
dc.subject: Deep Learning (en_US)
dc.subject: Action Recognition (en_US)
dc.subject: Synthetic Data (en_US)
dc.subject: Fractal Geometry (en_US)
dc.title: Pre-training for Video Action Recognition with Automatically Generated Datasets (en_US)
dc.description.pages: 146 (en_US)
dc.contributor.supervisor: Μαραγκός Πέτρος (en_US)
dc.department: Τομέας Σημάτων, Ελέγχου και Ρομποτικής (Division of Signals, Control and Robotics) (en_US)
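The abstract describes generating short synthetic video clips from fractal generative processes and emulating properties of real video such as periodic motion. As an illustration only, and not the thesis's actual pipeline, a minimal NumPy sketch of that idea could render an iterated function system (IFS) attractor with the chaos game and animate it by periodically perturbing the IFS parameters. All function names (`sample_ifs`, `render_frame`, `synthetic_clip`) and parameter choices here are assumptions for the sketch:

```python
import numpy as np

def sample_ifs(n_maps=3, rng=None):
    """Draw a random IFS: affine maps w_k(x) = A_k x + b_k (illustrative, not the thesis's generator)."""
    rng = rng or np.random.default_rng(0)
    # Entries bounded so every map stays contractive (singular values < 1).
    A = rng.uniform(-0.45, 0.45, size=(n_maps, 2, 2))
    b = rng.uniform(-1.0, 1.0, size=(n_maps, 2))
    return A, b

def render_frame(A, b, size=64, n_points=5000, seed=0):
    """Rasterize the IFS attractor via the chaos game into a binary occupancy image."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    pts = []
    for _ in range(n_points):
        k = rng.integers(len(A))      # pick one affine map uniformly at random
        x = A[k] @ x + b[k]           # apply it to the current point
        pts.append(x.copy())
    pts = np.array(pts[100:])         # discard burn-in iterations
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    ij = ((pts - lo) / (hi - lo + 1e-8) * (size - 1)).astype(int)
    img = np.zeros((size, size), dtype=np.float32)
    img[ij[:, 1], ij[:, 0]] = 1.0
    return img

def synthetic_clip(n_frames=16, size=64, seed=0):
    """Animate the attractor: a sinusoidal perturbation of A yields periodic motion."""
    A, b = sample_ifs(rng=np.random.default_rng(seed))
    frames = []
    for t in range(n_frames):
        phase = np.sin(2 * np.pi * t / n_frames)
        frames.append(render_frame(A * (1 + 0.05 * phase), b, size=size, seed=seed))
    return np.stack(frames)           # shape: (n_frames, size, size)
```

Datasets of such clips could then be labeled automatically (e.g. by IFS configuration) for supervised pre-training, or left unlabeled for self-supervised objectives; random backgrounds and camera displacement, mentioned in the abstract, would be further per-frame transforms on top of this.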
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: thesis_svyezhentsev.pdf (5.99 MB, Adobe PDF)


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.