|Title:||Transfer Learning and Attention-based Conditioning Methods for Natural Language Processing|
|Abstract:||In this work, we investigate methods to augment the inductive bias of deep neural models for natural language processing tasks. Our goal is to improve the performance of recurrent neural networks on a family of sentiment analysis tasks. Specifically, our research includes: (1) transferring knowledge from pretrained models in order to leverage different domains and tasks, and (2) integrating prior information from human experts into deep neural architectures. First, we propose a method for utilizing a pretrained sentiment analysis classification model to reduce the test error rate on an emotion recognition classification task. Transfer learning from pretrained classifiers exploits the representation learned in one supervised setting with plenty of data to obtain competitive results on a related task where only a smaller dataset is available. We aim to leverage the learned representation of the pretrained sentiment model to tackle the emotion classification task. Next, we utilize pretrained representations from language models to address the same emotion classification task. In this case, the learning algorithm uses information obtained during the unsupervised phase to perform better in the supervised learning stage. Pretrained word representations captured by language models are useful because they encode contextual information and model syntax and semantics. To leverage these representations, we propose a three-step transfer learning method: pretraining a language model, fine-tuning its weights on the target task, and transferring the model to a classifier. We show a 10% improvement over the baseline on the WASSA 2018 emotion recognition dataset. We achieve an F1-score of 70.3%, ranking in the top-3 positions of the shared task. Finally, we experiment with feature-wise conditioning methods to integrate prior knowledge into deep neural networks.
We propose the integration of lexicon features into the self-attention mechanism of RNN-based architectures. This form of conditioning on the attention distribution enforces the contribution of the most salient words for the task at hand. We introduce three methods, namely attentional concatenation, feature-based gating and affine transformation. Experiments on six benchmark datasets show the effectiveness of our methods; attentional feature-based gating yields consistent performance improvements across tasks. Our approach is implemented as a simple add-on module for RNN-based models with minimal computational overhead and can be adapted to any deep neural architecture. Overall, our work is divided into two main research areas: the first is transfer learning of pretrained representations for implicit emotion recognition, while the second is attention-based conditioning methods for integrating external knowledge into recurrent neural networks. Both lines of work culminated in research papers.|
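The attentional feature-based gating described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, assuming that per-word lexicon features are passed through a sigmoid gate that scales the projected RNN hidden states before the attention energies are computed; the weight names (`Wh`, `Wg`, `v`) and the exact gating formulation are illustrative assumptions, not the thesis' equations.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of energies."""
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def gated_attention(H, C, Wh, Wg, v):
    """Sketch of feature-based gating in self-attention.

    H: (T, d_h) RNN hidden states for a T-word sentence.
    C: (T, d_c) lexicon feature vectors (e.g. affective word scores).
    A sigmoid gate computed from the lexicon features scales each
    projected hidden state before the attention energy is taken,
    boosting or suppressing words the lexicon marks as salient.
    """
    gate = 1.0 / (1.0 + np.exp(-(C @ Wg)))  # (T, d_a), values in (0, 1)
    proj = np.tanh(H @ Wh)                  # (T, d_a) projected states
    energies = (gate * proj) @ v            # (T,) attention energies
    alphas = softmax(energies)              # attention distribution
    context = alphas @ H                    # (d_h,) weighted sentence summary
    return alphas, context

# Toy usage with random weights (hypothetical dimensions).
rng = np.random.default_rng(0)
T, d_h, d_c, d_a = 5, 8, 3, 6
H = rng.standard_normal((T, d_h))
C = rng.standard_normal((T, d_c))
alphas, context = gated_attention(
    H, C,
    rng.standard_normal((d_h, d_a)),
    rng.standard_normal((d_c, d_a)),
    rng.standard_normal(d_a),
)
```

In a full model, `gated_attention` would sit between the recurrent encoder and the output classifier, which is what makes it a simple add-on module: the encoder and classifier are unchanged, and only the attention energies see the lexicon features.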
|Appears in Collections:||Διπλωματικές Εργασίες - Theses|
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.