Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17528
Full metadata record
DC Field | Value | Language
dc.contributor.author | Retsinas, Georgios | -
dc.date.accessioned | 2020-03-10T15:55:12Z | -
dc.date.available | 2020-03-10T15:55:12Z | -
dc.date.issued | 2020-02-27 | -
dc.identifier.uri | http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17528 | -
dc.description.abstract | Document Analysis and Recognition is a prominent research area that combines the fields of Computer Vision and Machine Learning and has a great impact on the humanities, by unraveling information stored in collections of historical documents all over the world. In this PhD thesis, we focus on extracting and learning visual representations capable of successfully detecting and recognizing text in handwritten documents. The main intention behind the developed methodologies presented in this thesis is the creation of efficient systems with minimal computational requirements, aiming towards real-time applications. Throughout the thesis, we tackle document-related problems of increasing difficulty, while the main goal is the development of an effective word detection approach by improving the extracted visual representation of text. Specifically, we explore feature extraction techniques along with possible modifications based on the specific characteristics of text images (e.g., possible text deformations). Typical handcrafted feature extraction methods are compared to visual representations generated either by manifold embedding techniques or by deep learning approaches, both of which show superior performance. An important part of this thesis is the study of Convolutional Neural Networks (CNNs) for the word detection problem along with their generalization capability, i.e., whether it is possible to generate transferable and discriminative deep features. To this end, we propose several modified architectures in order to create compact, yet well-performing, features. Furthermore, we present a novel deep learning approach that combines the spotting and recognition tasks, leading to superior performance, while we also tackle the problem of line-level spotting from a deep-features viewpoint. Finally, we address the more generic neural network compression problem, which is not limited to document-related tasks. Specifically, we design two different approaches for model compression, both achieving significant compression in terms of the size-accuracy trade-off on different datasets and settings, including image classification and keyword spotting tasks. | en_US
dc.language | en | en_US
dc.subject | Document Analysis and Recognition | en_US
dc.subject | Keyword Spotting | en_US
dc.subject | Deep learning | en_US
dc.subject | Neural Network Compression | en_US
dc.title | Visual Representation Learning for Document Image Recognition | en_US
dc.description.pages | 220 | en_US
dc.contributor.supervisor | Μαραγκός Πέτρος | en_US
dc.department | Division of Signals, Control and Robotics | en_US
Appears in Collections: Ph.D. Theses

Files in This Item:
File | Description | Size | Format
phd_thesis_gretsinas_final.pdf | | 16.94 MB | Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.