Please use this identifier to cite or link to this item:
|Title:||Empathetic Dialogue Generation using generation-based models|
natural language processing
|Abstract:||Among the various approaches to building conversational agents that can engage humans, open-domain generation-based chatbots are a significant field of research. However, beyond understanding what is being discussed, human communication requires awareness of how someone is feeling. From this perspective, in this diploma thesis we study dialogue generation and, specifically, the challenging task of building empathetic conversational agents, which can understand implied feelings and respond accordingly. First, we provide the reader with a brief theoretical background on machine learning (ML), deep learning (DL), and Natural Language Processing (NLP). We then study generation-based models for dialogue generation in depth. More specifically, we analyze the traditional vanilla seq2seq architecture, vanilla seq2seq with attention, and the Hierarchical Recurrent Encoder Decoder (HRED) architecture. Afterwards, we study transformer-based models that can be used in dialogue generation, such as the Transformer encoder-decoder, BERT, GPT-2, and T5. After presenting the theoretical background of these architectures, we analyze the decoding methods most commonly used in dialogue generation, providing typical examples for better understanding. Finally, we present the most common automatic and human evaluation metrics and methods used for ranking dialogue systems. With the goal of creating conversational agents that can understand the implied feelings in a conversation and respond accordingly, we focus on the Empathetic Dialogues task proposed by Facebook. After a brief introduction to the task and related work, we conduct several experiments and discuss the results. More specifically, we first analyze the datasets used in the experiments (Empathetic Dialogues and ConvAI2) and then present the baseline architectures used by other researchers on the task. 
Afterwards, we propose new ways to further improve results on the task. More specifically, we experiment with the BERT2BERT and BERT2GPT2 architectures, achieving results comparable to previously proposed models, though without reaching the state of the art. Furthermore, we experiment with three versions of the T5 model. In the first approach, we use the T5 model as is, fine-tuning it on the Empathetic Dialogues dataset. In the second and third approaches, we extend the T5 baseline architecture with multi-task learning. All of the T5-based approaches achieve state-of-the-art results in terms of average BLEU score, while their perplexity is close to that of the current state-of-the-art model. Moreover, after presenting the experimental results, we provide various examples to demonstrate the performance of the proposed models more qualitatively. Finally, we suggest promising extensions and modifications for future study.|
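The decoding methods the abstract refers to can be illustrated with a minimal sketch of two common choices, greedy decoding and top-k sampling, over a single step's logits. The function names and the toy logit values here are hypothetical, chosen only for illustration; real dialogue models apply these per generation step over a full vocabulary.

```python
import math
import random

def greedy(logits):
    """Greedy decoding: always pick the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k_sample(logits, k, rng=random):
    """Top-k sampling: keep the k highest-scoring token ids and
    sample among them with softmax-proportional weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = [math.exp(logits[i]) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

# Toy single-step logits over a 4-token vocabulary (illustrative values).
logits = [2.0, 1.0, 0.1, -1.0]
print(greedy(logits))              # → 0 (the argmax token)
print(top_k_sample(logits, k=2))   # one of {0, 1}, chosen stochastically
```

Greedy decoding is deterministic and tends to produce safe, repetitive responses, which is why sampling-based methods such as top-k are often preferred for open-domain dialogue.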
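Of the evaluation metrics mentioned above, perplexity has a simple closed form: the exponential of the average negative log-likelihood the model assigns to the tokens of a reference response. A minimal sketch, assuming per-token log-probabilities are already available (the function name and example values are hypothetical):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood) over the
    per-token log-probabilities of a reference response."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Hypothetical log-probabilities for a 4-token reference response.
logps = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
print(perplexity(logps))  # → ~2.828 (geometric mean of 2, 4, 2, 4)
```

Lower perplexity indicates the model finds the reference responses less "surprising"; a model that assigned every token probability 0.5 would score a perplexity of exactly 2.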
|Appears in Collections:||Διπλωματικές Εργασίες - Theses|
Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.