Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18528
Title: Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis
Authors: Χλαπάνης, Σπυρίδων Οδυσσέας
Ποταμιάνος, Αλέξανδρος
Keywords: machine learning
neural networks
deep learning
artificial intelligence
multimodal data
sentiment analysis
adapters
fusion
BERT
multimodal
Issue Date: 2-Nov-2022
Abstract: Over the past few years, the abundance of multimedia data and progress in core machine learning algorithms have set the scene for multimodal machine learning as one of the frontiers of applied AI research. The use of social networks has exploded, making massive amounts of data available. In addition, the recent success of so-called Pretrained Language Models (PLMs) has encouraged the creation of many fascinating new applications. However, training these deep networks in multiple stages, as this trend suggests, comes at the cost of increased model parameters. In this work, we propose Adapted Multimodal BERT (AMB), a BERT-based architecture for multimodal tasks that uses a combination of adapter modules and intermediate fusion layers. Specifically, the task tackled is sentiment analysis on videos with textual, visual, and acoustic data. BERT is a deep pretrained neural network originally designed for processing language, consisting of a stack of neural network layers called transformer layers. The adapter is a neural module interleaved between the layers of BERT to adjust the pretrained language model to the task at hand. This enables transfer learning to the new task, but in contrast with fine-tuning, the prevalent method, adapters are parameter-efficient. The fusion layers are simple feedforward neural networks that perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations. During the adaptation process, the pretrained language model parameters remain frozen, allowing for fast, parameter-efficient training. Extensive ablation studies reveal that this approach leads to an efficient model. Empirically, adapters improve performance despite training far fewer parameters, because they avoid some of the issues of standard transfer-learning approaches; they can outperform the costly alternative of fine-tuning, which refines all of the model's weights to adapt it to the new task. The proposed model also shows signs of robustness to input noise, which is fundamental for real-life applications. Experiments on sentiment analysis with CMU-MOSEI show that AMB outperforms the current state of the art across metrics, with a 3.4% relative reduction in error and a 2.1% relative improvement in 7-class classification accuracy.
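To make the described architecture concrete, the following is a minimal PyTorch sketch of how bottleneck adapters and feedforward fusion layers can be interleaved with frozen transformer layers. All names (Adapter, FusionLayer, AdaptedMultimodalEncoder), the bottleneck size, and the audio/visual feature dimensions (chosen to resemble the word-aligned acoustic and visual features commonly used with CMU-MOSEI) are illustrative assumptions, not the exact implementation from the thesis.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus a
    residual connection. Sizes are illustrative assumptions."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

class FusionLayer(nn.Module):
    """Feedforward fusion of textual states with word-aligned audio-visual
    features; the residual keeps the text stream primary (assumed design)."""
    def __init__(self, dim=768, audio_dim=74, visual_dim=35):
        super().__init__()
        self.proj = nn.Linear(dim + audio_dim + visual_dim, dim)
        self.act = nn.ReLU()

    def forward(self, text, audio, visual):
        fused = torch.cat([text, audio, visual], dim=-1)
        return text + self.act(self.proj(fused))

class AdaptedMultimodalEncoder(nn.Module):
    """Frozen pretrained layers (standing in for BERT's transformer stack)
    interleaved with trainable adapters and fusion layers."""
    def __init__(self, layers, dim=768, audio_dim=74, visual_dim=35):
        super().__init__()
        self.layers = layers
        for p in self.layers.parameters():
            p.requires_grad = False  # only adapters and fusion layers train
        n = len(layers)
        self.adapters = nn.ModuleList([Adapter(dim) for _ in range(n)])
        self.fusions = nn.ModuleList(
            [FusionLayer(dim, audio_dim, visual_dim) for _ in range(n)])

    def forward(self, text, audio, visual):
        h = text
        for layer, adapter, fusion in zip(self.layers, self.adapters,
                                          self.fusions):
            h = layer(h)                  # frozen pretrained transformer layer
            h = adapter(h)                # task adaptation, few parameters
            h = fusion(h, audio, visual)  # layer-wise multimodal fusion
        return h

# Toy usage with stand-in transformer layers and random features.
dim = 768
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=dim, nhead=12, batch_first=True)
     for _ in range(2)])
model = AdaptedMultimodalEncoder(layers, dim)
text = torch.randn(2, 50, dim)    # token representations
audio = torch.randn(2, 50, 74)    # acoustic features, word-aligned
visual = torch.randn(2, 50, 35)   # visual features, word-aligned
out = model(text, audio, visual)  # shape (2, 50, 768)
```

Because the pretrained layers are frozen, only the adapter and fusion parameters receive gradients, which is what makes the adaptation fast and parameter-efficient.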
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18528
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File                               Size     Format
NTUA_ECE_Thesis_AMB_Chlapanis.pdf  5.52 MB  Adobe PDF


All items in this site are protected by copyright.