An Exploration of Deep Learning Architectures for Handwritten Text Recognition

Vasiliki, Tassopoulou

National Technical University of Athens

School of Electrical and Computer Engineering

Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17428

Full metadata record

DC Field	Value	Language
dc.contributor.author	Vasiliki, Tassopoulou	-
dc.date.accessioned	2019-11-08T10:56:23Z	-
dc.date.available	2019-11-08T10:56:23Z	-
dc.date.issued	2019-11-07	-
dc.identifier.uri	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17428	-
dc.description.abstract	The objective of this thesis is to study the Handwritten Text Recognition problem using deep learning models. We experiment with a variety of tasks that span the whole pipeline from which our final model is built. First, we implement the baseline architecture and then experiment with dynamic data augmentation. We implement two new augmentation techniques, the local affine transform and the local morphological transform. Our motivation is to implement transformations that augment individual letters rather than the whole text line. Overall, we find that dynamic data augmentation helps the model generalize and improves recognition rates. Next, we experiment with the CTC alignments that our model learns. We augment the target sequence with bigrams in addition to unigrams. We train such complex alignments to obtain a bigram-level visual language model, and we use it in two new CTC beam search decoding algorithms, extended to integrate the obtained bigram information, in order to improve recognition rates. We then experiment with multitask architectures with CTC, both hierarchical and block. These experiments culminate in a significant improvement in the recognition rate. With the multitask approach we exploit the language information (domain knowledge) in two ways: in the learning procedure, via the n-grams selected as target units, and in the decoding process, via statistical language models. Finally, we implement a fully convolutional architecture in which both the optical and sequential models are composed of convolutions. We show that the CTC layer can be successfully employed on top of a CNN. We also find that one-dimensional convolution can sufficiently model the temporal relationships among the features.
In addition, our fully convolutional model converges fast, has significantly lower training and inference time, and has fewer parameters than the aforementioned architectures.	en_US
dc.language	en	en_US
dc.subject	Handwritten Text Recognition	en_US
dc.subject	Multitask Learning	en_US
dc.subject	Sequence Modeling	en_US
dc.subject	Decoding Algorithms	en_US
dc.subject	Convolutional Networks	en_US
dc.subject	Dynamic Data Augmentation	en_US
dc.subject	Statistical Language Models	en_US
dc.title	An Exploration of Deep Learning Architectures for Handwritten Text Recognition	en_US
dc.description.pages	152	en_US
dc.contributor.supervisor	Μαραγκός Πέτρος	en_US
dc.department	Division of Signals, Control and Robotics	en_US
Appears in collections:	Διπλωματικές Εργασίες - Theses (Diploma Theses)
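
The abstract refers to CTC decoding and to target sequences made of unigrams and bigrams. As a rough illustration only, the Python sketch below shows standard best-path (greedy) CTC decoding and one possible way to list the unigram and bigram units of a transcription. It is not the thesis code: the function names, the blank index, and the toy alphabet are assumptions made for this example, and the beam search decoders described in the abstract are not reproduced here.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_best_path(frame_scores, alphabet):
    """Greedy CTC decoding: pick the top class per frame,
    merge consecutive repeats, then drop blanks."""
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    collapsed = [k for k, _ in groupby(best)]
    return "".join(alphabet[k] for k in collapsed if k != BLANK)


def unigram_and_bigram_targets(text):
    """Hypothetical helper: list the character unigrams and the
    overlapping character bigrams of a transcription."""
    unigrams = list(text)
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    return unigrams, bigrams


if __name__ == "__main__":
    alphabet = ["-", "a", "b"]  # toy alphabet, "-" stands for blank
    scores = [
        [0.1, 0.8, 0.1],    # a
        [0.1, 0.8, 0.1],    # a (repeat, merged)
        [0.9, 0.05, 0.05],  # blank (separates the two a's)
        [0.1, 0.8, 0.1],    # a
        [0.1, 0.1, 0.8],    # b
    ]
    print(ctc_best_path(scores, alphabet))
    print(unigram_and_bigram_targets("cat"))
```

The blank frame between the two runs of "a" is what lets CTC emit a doubled letter. How the thesis actually combines unigram and bigram targets during training is not stated in this record, so the helper above only enumerates the units.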

Files in this item:

File	Description	Size	Format
Thesis_TassopoulouVasiliki.pdf		5.57 MB	Adobe PDF	View/Open

All items on this site are protected by copyright.