XEM: XML Evolution Management

Kramer, D.
Master's Thesis, Worcester Polytechnic Institute, 2001
Citations range: 10 - 49

As information on the World Wide Web continues to proliferate at an astounding rate, the Extensible Markup Language (XML) has been emerging as a standard format for data representation on the web. In many application domains, specific document type definitions (DTDs) are designed to enforce a semantically agreed-upon structure for XML documents. In the XML context, these structural definitions serve as schemata. However, both the data and the structure (schema) of XML documents tend to change over time for a multitude of reasons: to correct design errors in the DTD, to allow the application scope to expand, or to account for the merging of several businesses into one. Most current software tools that enable the use of XML do not provide explicit support for such data or schema changes. Using these tools in a changing environment entails manually editing DTDs and XML data and reloading them from scratch. To address this, we put forth the first solution framework, called XML Evolution Manager (XEM), to manage the evolution of DTDs and XML documents. XEM provides a minimal yet complete taxonomy of basic change primitives. These primitives, classified as either data or schema changes, are consistency-preserving. For a data change, they ensure that the modified XML document conforms to its DTD in both structure and constraints. For a schema change, they ensure that the new DTD is well-formed and that all existing XML documents are transformed to conform to the modified DTD. We prove both the completeness of our evolution taxonomy and its consistency-preserving nature.
To verify the feasibility of the XEM approach, we have implemented a working prototype system in Java, using the XML4J parser from IBM and PSE Pro as the backend storage system. We present an experimental study run on this system in which we compare the relative efficiencies of the primitive operations in terms of their execution times. We then contrast these execution times against the time needed to reload the data, which a manual system would require. Based on the results of these experiments, we conclude that our approach improves upon the previous method of making manual changes and reloading data from scratch by providing automated evolution management facilities for DTDs and XML documents.
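
The notion of a consistency-preserving schema change can be illustrated with a minimal sketch. This is not XEM's implementation (the prototype is written in Java), and it does not reproduce the thesis's taxonomy. It is a Python toy under loud assumptions: the schema is modelled as a dictionary mapping each element name to an ordered list of required children, which is far simpler than a real DTD content model, and the names `conforms` and `add_required_child` are hypothetical. The point it tries to show is that a schema change and the matching document transformation are applied together, so documents that conformed before the change still conform after it.

```python
import xml.etree.ElementTree as ET


def conforms(schema, elem):
    """Return True if elem and all its descendants match the toy schema.

    Assumption: schema maps an element name to the exact ordered list of
    child element names it must contain (no '*', '+', '?', or '|').
    """
    expected = schema.get(elem.tag)
    if expected is None:
        return False  # element is not declared in the schema
    if [child.tag for child in elem] != expected:
        return False
    return all(conforms(schema, child) for child in elem)


def add_required_child(schema, docs, parent, child, default_text):
    """Hypothetical schema-change primitive: append a required child.

    Updates the schema and, in the same step, adds a default-valued child
    to every existing `parent` element so the documents stay conformant.
    """
    if parent not in schema:
        raise ValueError(f"unknown parent element: {parent}")
    if child == parent or child in schema[parent]:
        raise ValueError(f"cannot add {child} to {parent}")
    if schema.get(child):
        # A child with required children of its own would need a richer
        # default than plain text; this sketch only handles leaf elements.
        raise ValueError(f"{child} must be a leaf element in this sketch")

    schema.setdefault(child, [])              # declare the new leaf element
    schema[parent] = schema[parent] + [child]  # schema change

    for root in docs:                          # matching data transformation
        # Snapshot the matches first, since the tree is modified in the loop.
        for node in list(root.iter(parent)):
            ET.SubElement(node, child).text = default_text


schema = {"book": ["title"], "title": []}
doc = ET.fromstring("<book><title>XEM</title></book>")
assert conforms(schema, doc)

add_required_child(schema, [doc], "book", "year", "unknown")
assert conforms(schema, doc)  # still consistent after the schema change
print(ET.tostring(doc, encoding="unicode"))
```

Changing only `schema["book"]` by hand, without touching `doc`, would make `conforms` return False. That is the inconsistency the manual edit-and-reload workflow has to repair and that a consistency-preserving primitive is meant to rule out.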