Figurative language detection using a combination of deep neural networks

Ποταμιάς, Ρολάνδος Αλέξανδρος

National Technical University of Athens

School of Electrical and Computer Engineering

Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17170

Title:	Figurative language detection using a combination of deep neural networks
Authors:	Ποταμιάς, Ρολάνδος Αλέξανδρος; Σταφυλοπάτης, Ανδρέας-Γεώργιος
Keywords:	sentiment analysis; figurative language; machine learning; deep learning; sarcasm; irony; natural language processing; artificial neural networks
Issue Date:	3-Oct-2018
Abstract:	This diploma thesis develops models for recognizing figurative language (FL) using deep learning techniques. Recognizing and classifying FL remains an open problem in sentiment analysis, within the broader field of natural language processing (NLP), because phrases with figurative content carry contradictory meaning. The problem comprises three interrelated recognition tasks: sarcasm, irony and metaphor. In the present work these are addressed with advanced deep learning techniques (recurrent neural networks, RNN; long short-term memory, LSTM) and with support vector machines (SVM). First, the state of the art in FL detection and recognition is explored through an extensive literature review, and the most important approaches are documented. The review emphasizes both the feature extraction methods and the machine learning algorithms used. The basic theoretical principles on which the proposed approach is based are then presented briefly. Next, the preprocessing framework for the relevant social media data (tweets) is presented; preprocessing aims to prepare the data as well as possible before they are fed to the deep learning models. In addition, features are extracted from the data that fall into four categories: syntactic, expressive, emotional and psychological. Each category captures aspects of the social network user's style of writing and expression.
Finally, a prototype model, the Deep Ensemble Soft Classifier (DESC), is created, which combines different deep learning techniques. Using four benchmark data sets from well-known conferences and related competitions (Semantic Evaluation, SemEval), and an exhaustive evaluation of recognition performance, we conclude that the DESC model achieves very good performance, worthy of comparison with relevant methodologies and state-of-the-art technologies in the challenging field of FL recognition.
URI:	http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/17170
Appears in Collections:	Διπλωματικές Εργασίες - Theses

Files in This Item:

File	Description	Size	Format
figurative language detection.pdf		4.37 MB	Adobe PDF
