Statistical Schema Matching across Web Query Interfaces

Tue, 05/16/2006 - 19:27 — admin

Authors:

He, B.; Chang, K.C.C.

Year:

2003

Venue:

SIGMOD, 2003

URL:

http://dit.unitn.it/~accord/RelatedWork/Matching/He03.pdf

Citations:

320

Attachment	Size
He2003StatisticalSchemaMatching.pdf	235.9 KB

Schema matching is a critical problem for integrating heterogeneous information sources. Traditionally, matching multiple schemas has relied on finding pairwise-attribute correspondences. This paper proposes a different approach, motivated by integrating large numbers of data sources on the Internet. On this "deep Web," we observe two distinguishing characteristics that offer a new view for considering schema matching: First, as the Web scales, there are ample sources that provide structured information in the same domains (e.g., books and automobiles). Second, while sources proliferate, their aggregate schema vocabulary tends to converge at a relatively small size. Motivated by these observations, we propose a new paradigm, statistical schema matching: Unlike traditional approaches using pairwise-attribute correspondence, we take a holistic approach to match all input schemas by finding an underlying generative schema model. We propose a general statistical framework, MGS, for such hidden model discovery, which consists of hypothesis modeling, generation, and selection. Further, we specialize the general framework to develop Algorithm MGSsd, which targets synonym discovery, a canonical problem of schema matching, by designing and discovering a model that specifically captures synonym attributes. We demonstrate our approach over hundreds of real Web sources in four domains and the results show good accuracy.
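
To make the holistic idea concrete, the sketch below looks at all input schemas at once instead of comparing them pair by pair. It is not Algorithm MGSsd and performs no statistical hypothesis testing; it is a toy heuristic resting on an assumption the abstract does not state, namely that synonym attributes appear often across sources but rarely together in the same query interface. The function name `candidate_synonym_pairs`, the `min_support` parameter, and the book-domain schemas are all hypothetical and made up for illustration.

```python
from collections import Counter
from itertools import combinations


def candidate_synonym_pairs(schemas, min_support=2):
    """Return pairs of frequent attributes that never co-occur in a schema.

    Assumption (not from the abstract): synonyms are each common across
    sources but are not used together in the same query interface.
    """
    freq = Counter()      # number of schemas containing each attribute
    cooccur = Counter()   # number of schemas containing both attributes
    for schema in schemas:
        attrs = sorted(set(schema))
        freq.update(attrs)
        cooccur.update(combinations(attrs, 2))

    # Keep only attributes seen in at least min_support schemas.
    frequent = sorted(a for a, n in freq.items() if n >= min_support)
    return [
        (a, b)
        for a, b in combinations(frequent, 2)
        if cooccur[(a, b)] == 0
    ]


# Hypothetical book-domain query interfaces (made-up data).
schemas = [
    {"author", "title", "isbn"},
    {"writer", "title", "isbn"},
    {"author", "title", "subject"},
    {"writer", "title", "category"},
    {"author", "isbn", "category"},
    {"writer", "isbn", "subject"},
]

pairs = candidate_synonym_pairs(schemas)
print(pairs)
```

On this toy input the heuristic proposes ("author", "writer") and ("category", "subject") as candidate synonym pairs. With real sources, rare attributes and chance non-co-occurrence would produce false candidates, which is the kind of problem a statistical selection step is meant to address.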
