On Schema Matching with Opaque Column Names and Data Values

Tue, 05/16/2006 - 19:27 — admin

Authors:

Kang, J.; Naughton, J. F.

Year:

2003

Venue:

SIGMOD, 2003

URL:

http://dit.unitn.it/~accord/RelatedWork/Matching/Kang03.pdf

Citations:

182

Citations range:

100 - 499

Attachment	Size
Kang2003OnSchemaMatchingwithOpaque.pdf	227.27 KB

Most previous solutions to the schema matching problem rely in some fashion on identifying "similar" column names in the schemas to be matched, or on recognizing common domains in the data stored in the schemas. While each of these approaches is valuable in many cases, neither is infallible, and there exist instances of the schema matching problem to which they do not even apply. Such problem instances typically arise when the column names in the schemas and the data in the columns are "opaque" or very difficult to interpret. In this paper we propose a two-step technique that works even in the presence of opaque column names and data values. In the first step, we measure the pair-wise attribute correlations in the tables to be matched and construct a dependency graph using mutual information as a measure of the dependency between attributes. In the second step, we find matching node pairs in the dependency graphs by running a graph matching algorithm. We validate our approach with an experimental study, the results of which suggest that such an approach can be a useful addition to a set of (semi-)automatic schema matching techniques.
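
The two steps described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes empirical mutual information over categorical values, a squared-difference distance between dependency graphs, and an exhaustive search over one-to-one column mappings, which is only feasible for a handful of columns. The function names (`mutual_information`, `dependency_graph`, `match_graphs`) and the toy tables are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter
from itertools import permutations


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equal-length columns."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def dependency_graph(columns):
    """Step 1: matrix of pair-wise mutual information between columns.

    The diagonal entry MI(X, X) equals the entropy of column X.
    """
    k = len(columns)
    return [[mutual_information(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]


def match_graphs(g1, g2):
    """Step 2 (assumed metric): try every one-to-one mapping of nodes and keep
    the one with the smallest sum of squared differences between the graphs."""
    k = len(g1)
    best_perm, best_cost = None, math.inf
    for perm in permutations(range(k)):
        cost = sum((g1[i][j] - g2[perm[i]][perm[j]]) ** 2
                   for i in range(k) for j in range(k))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


def recode(col, codes):
    """Replace each value with an opaque code, hiding its meaning."""
    return [codes[v] for v in col]


# Toy table A: col1 is a function of col0; col2 is independent of both.
col0 = [1, 1, 2, 2, 3, 3, 4, 4]
col1 = [0, 0, 0, 0, 1, 1, 1, 1]
col2 = [0, 1, 0, 1, 0, 1, 0, 1]
table_a = [col0, col1, col2]

# Toy table B: the same data with columns reordered and values made opaque.
table_b = [
    recode(col2, {0: "u", 1: "v"}),
    recode(col0, {1: "q7", 2: "zz", 3: "k1", 4: "x0"}),
    recode(col1, {0: "m", 1: "n"}),
]

mapping, cost = match_graphs(dependency_graph(table_a),
                             dependency_graph(table_b))
print(mapping, cost)
```

In this toy case `mapping[i]` gives the column of table B matched to column `i` of table A, and the recovered mapping is `(1, 2, 0)`: neither column names nor data values are compared, only the dependency structure. Real tables would rarely share identical distributions, so the distance would not reach zero, and a scalable graph matching algorithm would be needed in place of the exhaustive search.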
