Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/15941
Title: Separation and Classification of Blog Posts (Διαχωρισμός Και Κατηγοριοποίηση Καταχωρήσεων Ιστολογίων)
Authors: Αναστασιαδης Αντωνιος
Σελλής Τιμολέων
Keywords: blogs
data mining
classification
Issue Date: 14-Mar-2011
Abstract: The scope of this thesis was the development of methods for automatically extracting the posts found in blog pages on the internet and classifying them by the opinion they express on a specific topic. These methods take advantage of the syntactic information in the HTML code of the blog web pages, as well as their feeds and the date strings they contain. We also use an algorithm based on Support Vector Machines to classify the extracted posts into two collections that represent the positive and negative opinions, respectively. Moreover, we developed a standalone Java application that, given a corpus of blogs, extracts their posts automatically and efficiently. We also developed tools that format the extracted data as feature vectors ready for classification, as well as tools that classify it. This work can serve as a basis for a more complex system that finds, separates, and classifies blogs using more advanced methods, such as linguistic analysis and machine learning, to extract and classify their posts.
URI: http://artemis-new.cslab.ece.ntua.gr:8080/jspui/handle/123456789/15941
Appears in Collections: Διπλωματικές Εργασίες - Theses

Files in This Item:
File: DT2011-0040.pdf
Size: 808.42 kB
Format: Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.