Defining the XML Schema Matching Problem for a Personal Schema Based Query Answering System

Authors: 
Smiljanic, M.; van Keulen, M.; Jonker, W.
Year: 
2004
URL: 
http://doc.utwente.nl/fid/6725
Citations: 
7
Citations range: 
1 - 9

XML brought several important qualities to data representation. Through the usage of tags, it
combined schema and data information. Tag nesting enabled a simple representation of hierarchical
relations. Such enrichments sparked off a new wave of research on how to improve querying
and searching of data within XML documents.
The Internet is practically an endless collection of data being used simultaneously by millions
of users. We expect that a large part of this information will become available in XML. As such,
it is a valuable source of information for other users, who need information finding services to
guide them through the wealth of information.
In this report, we investigate a specific information finding approach – personal schema querying.
The target environment for this approach is the XML-Web – an Internet-based collection of
XML data sources. Each data source in the XML-Web allows for querying of its data and access
to an XML schema of that data.
In personal schema querying, users need not know the structure of XML-Web data. For querying,
they use a self-defined XML schema, as their personal model of the ‘universe of discourse’.
This personal model is, in a sense, imposed on the XML-Web instead of the other way around.
Obviously, a personal XML schema is not likely to be the same as any of the schemas of the
XML-Web’s data sources. Therefore, finding data corresponding to the personal schema requires
it to be matched against the schemas of the data sources. This process is called XML schema
matching.
In this report, we analyze the problem of personal schema matching. We define the ingredients
of the XML schema matching problem using constraint logic programming. This allows us to
thoroughly investigate specific matching problems. We do not have the ambition to provide a
formalism that covers all kinds of schema matching problems. The target is specifically personal
schema matching using XML.
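
To make the idea of matching a personal schema against a data source schema concrete, the following is a minimal sketch and not the formalism of the report. It assumes that schemas can be simplified to maps from element name to parent element, that name similarity is judged with Python's difflib, and that a parent-child edge in the personal schema may map to any ancestor-descendant pair in the source schema. The element names, the 0.6 threshold, and the functions similar, is_ancestor and match are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical schemas, simplified to {element: parent}; the root has parent None.
personal = {"book": None, "title": "book", "author": "book"}
source = {
    "library": None,
    "book": "library",
    "info": "book",
    "booktitle": "info",
    "authorName": "info",
    "shelf": "library",
}


def similar(a, b, threshold=0.6):
    """Name constraint (assumed): case-insensitive string similarity above a threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def is_ancestor(schema, anc, node):
    """Return True if `anc` is a proper ancestor of `node` in `schema`."""
    parent = schema[node]
    while parent is not None:
        if parent == anc:
            return True
        parent = schema[parent]
    return False


def match(personal, source, threshold=0.6):
    """Enumerate mappings from personal elements to source elements that
    satisfy both the name constraint and the structural constraint."""
    names = list(personal)
    # Candidate source elements for each personal element (name constraint).
    candidates = [[s for s in source if similar(p, s, threshold)] for p in names]
    results = []
    for combo in product(*candidates):
        mapping = dict(zip(names, combo))
        # Structural constraint: each personal parent must map to an
        # ancestor of the element its child maps to.
        if all(
            parent is None or is_ancestor(source, mapping[parent], mapping[child])
            for child, parent in personal.items()
        ):
            results.append(mapping)
    return results


if __name__ == "__main__":
    for mapping in match(personal, source):
        print(mapping)
```

The exhaustive enumeration is only meant to show the two kinds of constraints side by side. A constraint logic programming system would prune candidate assignments during the search rather than test complete combinations.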