Please use this identifier to cite or link to this item: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18193
Title: Managing Evolution in Web Data through Complex Changes
Authors: Γαλάνη, Θεοδώρα
Βασιλείου Ιωάννης
Keywords: change modeling, change definition language, change detection, RDF(S), querying data evolution, XML, XPath
Issue Date: 2-Nov-2021
Abstract: The increasing amount of information published on the web poses new challenges for data management. A central issue is evolution management. Data published on the web change frequently, as errors are fixed or new knowledge is incorporated. Data consumers need to know what changed between versions, as well as how and why. Revisiting past data snapshots and versions may not be enough for tracking and understanding the semantics of data evolution. Such an activity may require a search that moves backwards and forwards in time, is spread across disparate parts of a database, and performs complex queries on the semantics of the changes that modified the data, a task that may be even more demanding for large datasets. In our view, understanding data evolution requires treating changes as first-class citizens. This means supporting human-readable, semantically rich changes, along with any relations between them. Treating changes as first-class citizens poses several challenges regarding modeling, defining, detecting and querying changes. In this thesis, we study these directions and work with two basic standards for web data: RDF and XML. First, we propose an approach for modeling, defining and detecting changes in the context of RDF(S) knowledge bases. Overall, the proposed approach offers expressiveness and flexibility in interpreting evolution. The proposed complex changes provide additional information for interpreting past data by capturing relations between changes and allowing evolution to be interpreted in multiple ways. Specifically, we proposed modeling and supporting simple and complex changes, as well as any relations among them, for interpreting evolution in RDF(S) knowledge bases. Simple changes are fine-grained and application/data-agnostic, while complex changes are coarse-grained and application/data-specific.
Furthermore, we formally defined an intuitive, user-friendly language, based on change semantics, for defining complex changes. We formally defined the language syntax, via an EBNF specification, as well as the language semantics. Moreover, we presented a detection algorithm for the proposed complex change definition language. We follow a dynamics model in which changes are detected between dataset versions. The ultimate goal of defining complex changes is therefore to identify complex change instances between dataset versions through the complex change detection process. We also present the correctness of the proposed implementation with respect to the language semantics. Finally, we extensively evaluated the proposed approach both qualitatively and experimentally. The qualitative evaluation showed the added value of our approach compared to related work. The experimental evaluation assessed the expressiveness of the complex change language and the performance of detection. The proposed language proved adequate for expressing useful changes and helping users analyze evolution. The response time of the detection process was examined as dataset size increases. The experimental evaluation was performed over both artificial and real data, demonstrating the effectiveness of our approach. Second, we propose a query language for querying both data versions and change structures in the context of semistructured XML data. This work builds upon evo-graph, a model that captures evolving data along with changes, and evoXML, an XML representation of evo-graph. Specifically, we formally defined evo-path, an XPath extension for performing time-aware and change-aware queries on evo-graph. Evo-path allows querying both data history and change structure in a uniform way, supporting temporal, evolution and causality queries.
We presented the evo-path syntax, defined its formal semantics, and presented an implementation based on a formal translation of evo-path into equivalent XPath expressions over evoXML. We also implemented and experimentally evaluated the basic concepts of evo-graph in the C2D framework, using XML technologies. We examined the space efficiency of evoXML for various configurations, as well as the performance of the reduction process, that is, the process of generating from evo-graph the snapshot that holds at a specific time instance.
URI: http://artemis.cslab.ece.ntua.gr:8080/jspui/handle/123456789/18193
Appears in Collections:Διδακτορικές Διατριβές - Ph.D. Theses

Files in This Item:
File                   Description  Size     Format
phdTheodoraGalani.pdf               1.73 MB  Adobe PDF


Items in Artemis are protected by copyright, with all rights reserved, unless otherwise indicated.