Composing Schema Mappings: Second-Order Dependencies to the Rescue

Authors: 
Fagin, R.; Kolaitis, P.G.; Popa, L.; Tan, W.C.
Author: 
Fagin, R
Kolaitis, P
Popa, L
Tan, W
Year: 
2004
Venue: 
PODS 2004, TODS 2005
URL: 
http://www.almaden.ibm.com/cs/people/fagin/compose.pdf
DOI: 
http://doi.acm.org/10.1145/1114244.1114249
Citations: 
252
Citations range: 
100 - 499
AttachmentSize
Fagin2004ComposingSchemaMappings.pdf486.39 KB

A schema mapping is a specification that describes how data structured under one schema (the
source schema) is to be transformed into data structured under a different schema (the target
schema). A fundamental problem is composing schema mappings: given two successive schema
mappings, derive a schema mapping between the source schema of the first and the target schema
of the second that has the same effect as applying successively the two schema mappings.
In this paper, we give a rigorous semantics to the composition of schema mappings and investigate
the definability and computational complexity of the composition of two schema mappings.
We first study the important case of schema mappings in which the specification is given by a
finite set of source-to-target tuple-generating dependencies (source-to-target tgds). We show that
the composition of a finite set of full source-to-target tgds with a finite set of tgds is always
definable by a finite set of source-to-target tgds, but the composition of a finite set of source-to-target
tgds with a finite set of full source-to-target tgds may not be definable by any set (finite
or infinite) of source-to-target tgds; furthermore, it may not be definable by any formula of least
fixed-point logic, and the associated composition query may be NP-complete. After this, we introduce
a class of existential second-order formulas with function symbols and equalities, which
we call second-order tgds, and make a case that they are the “right” language for composing
schema mappings. Specifically, we show that second-order tgds form the smallest class (up to
logical equivalence) that contains every source-to-target tgd and is closed under conjunction and
composition. Allowing equalities in second-order tgds turns out to be of the essence, even though
the “obvious” way to define second-order tgds does not require equalities. We show that secondorder
tgds without equalities are not sufficiently expressive to define the composition of finite sets
of source-to-target tgds. Finally, we show that second-order tgds possess good properties for data
exchange and query answering: the chase procedure can be extended to second-order tgds so that
it produces polynomial-time computable universal solutions in data exchange settings specified by
second-order tgds.