Interactive generation of integrated schemas

Authors: 
Chiticariu, L; Kolaitis, PG; Popa, L
Year: 
2008
Venue: 
Proc. 2008 ACM SIGMOD international ...
URL: 
http://portal.acm.org/citation.cfm?id=1376616.1376700
Citations: 
39
Attachment: 
Chiticariu2008Interactivegenerationofintegratedschemas.pdf (319.75 KB)

Schema integration is the problem of creating a unified target schema
based on a set of existing source schemas that relate to each other
via specified correspondences. The unified schema gives a standard
representation of the data, thus offering a way to deal with the
heterogeneity in the sources. In this paper, we develop a method
and a design tool that provide: 1) adaptive enumeration of multiple
interesting integrated schemas, and 2) easy-to-use capabilities for
refining the enumerated schemas via user interaction. Our method is
a departure from previous approaches to schema integration, which do
not offer a systematic exploration of the possible integrated schemas.

The method operates at a logical level, where we recast each source
schema into a graph of concepts with Has-A relationships. We then
identify matching concepts in different graphs by taking into account
the correspondences between their attributes. For every pair of
matching concepts, we have two choices: merge them into one integrated
concept or keep them as separate concepts. We develop an algorithm
that can systematically output, without duplication, all possible
integrated schemas resulting from these choices. For each integrated
schema, the algorithm also generates a mapping from the source schemas
to the integrated schema that has precise information-preserving
properties. Furthermore, we avoid a full enumeration by allowing users
to specify constraints on the merging process, based on the schemas
produced so far. These constraints are then incorporated in the
enumeration of the subsequent schemas. The result is an adaptive and
interactive enumeration method that significantly reduces the space of
alternative schemas and facilitates the selection of the final
integrated schema.
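
The merge-or-keep choice, and the reason duplicates can arise, can be
illustrated with a small Python sketch. This is not the algorithm from
the paper: it is a brute-force illustration that tries every
combination of choices and discards repeated results, whereas the
paper describes a systematic duplicate-free enumeration. The sketch
assumes concepts are plain strings and matches are pairs of concepts;
it ignores Has-A relationships and mapping generation. The names
enumerate_schemas, must_merge, and must_separate are hypothetical and
stand in for user-specified constraints.

```python
from itertools import product


def components(concepts, merged_edges):
    """Group concepts into connected components of the merged edges."""
    parent = {c: c for c in concepts}

    def find(c):
        # Follow parent links to the representative, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in merged_edges:
        parent[find(a)] = find(b)

    groups = {}
    for c in concepts:
        groups.setdefault(find(c), set()).add(c)
    # A canonical, hashable form: a set of sets of concept names.
    return frozenset(frozenset(g) for g in groups.values())


def enumerate_schemas(concepts, matches, must_merge=(), must_separate=()):
    """Yield each distinct grouping of concepts exactly once.

    Hypothetical illustration only: tries all 2^k merge/keep choices for
    the k matching pairs and removes duplicates with a 'seen' set.
    """
    seen = set()
    for choice in product((False, True), repeat=len(matches)):
        merged = [edge for edge, do_merge in zip(matches, choice) if do_merge]
        partition = components(concepts, merged)
        if partition in seen:
            continue  # a different set of choices gave the same schema
        seen.add(partition)

        def together(a, b):
            return any(a in group and b in group for group in partition)

        # Assumed constraint forms: pairs that must or must not be merged.
        if all(together(a, b) for a, b in must_merge) and not any(
            together(a, b) for a, b in must_separate
        ):
            yield partition


# Three concepts that all match each other pairwise (illustrative names).
concepts = ["A", "B", "C"]
matches = [("A", "B"), ("B", "C"), ("A", "C")]
all_schemas = list(enumerate_schemas(concepts, matches))
constrained = list(enumerate_schemas(concepts, matches, must_separate=[("A", "C")]))
```

With three mutually matching concepts there are 8 combinations of
merge/keep choices but only 5 distinct groupings, because merging is
transitive: merging A with B and B with C also places A with C.
Requiring A and C to stay separate leaves 3 groupings, which shows how
a constraint narrows the space of alternatives.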