Record Matching over Query Results from Multiple Web Databases

Mon, 04/11/2011 - 08:30 — cat

Authors:

Su, W; Wang, J; Lochovsky, F.H.

Author:

Su, Weifeng

Wang, Jiying

Lochovsky, Frederick H.

Year:

2010

Venue:

IEEE Transactions on Knowledge and Data Engineering

URL:

http://www.computer.org/portal/web/csdl/doi/10.1109/TKDE.2009.90

Citations:

Citations range:

10 - 49

Record matching, which identifies the records that represent the same real-world entity, is an important step for data
integration. Most state-of-the-art record matching methods are supervised, which requires the user to provide training data. These
methods are not applicable for the Web database scenario, where the records to match are query results dynamically generated
on-the-fly. Such records are query-dependent and a prelearned method using training examples from previous query results may fail on
the results of a new query. To address the problem of record matching in the Web database scenario, we present an unsupervised,
online record matching method, UDD, which, for a given query, can effectively identify duplicates from the query result records of
multiple Web databases. After removal of the same-source duplicates, the “presumed” nonduplicate records from the same source can
be used as training examples, alleviating the burden of users having to manually label training examples. Starting from the nonduplicate
set, we use two cooperating classifiers, a weighted component similarity summing classifier and an SVM classifier, to iteratively identify
duplicates in the query results from multiple Web databases. Experimental results show that UDD works well for the Web database
scenario where existing supervised methods do not apply.
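
To make the first of the two classifiers named in the abstract more concrete, here is a minimal sketch of a weighted component similarity summing step. It is not the paper's implementation: the field similarity measure (Python's difflib.SequenceMatcher), the fixed weights, the threshold, and the sample records are all assumptions chosen for illustration. The SVM classifier and the iterative cooperation between the two classifiers are left out.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_similarity(r1, r2, weights) -> float:
    """Sum the per-field similarities, each scaled by its field weight."""
    return sum(w * field_similarity(x, y) for x, y, w in zip(r1, r2, weights))


def find_duplicates(source_a, source_b, weights, threshold):
    """Return index pairs (i, j) whose weighted similarity reaches the threshold."""
    return [
        (i, j)
        for i, ra in enumerate(source_a)
        for j, rb in enumerate(source_b)
        if weighted_similarity(ra, rb, weights) >= threshold
    ]


# Hypothetical query results from two Web databases: (title, author).
db_a = [("Data Cleaning Basics", "J. Smith"), ("Web Mining", "A. Lee")]
db_b = [("data cleaning basics", "John Smith"), ("Graph Theory", "B. Chan")]

weights = [0.7, 0.3]  # assumed field weights, summing to 1
pairs = find_duplicates(db_a, db_b, weights, threshold=0.8)
print(pairs)
```

In this sketch the weights are fixed by hand. The abstract instead describes starting from the presumed nonduplicate records of a single source, so a fuller version would derive the weights from that set rather than hard-coding them.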

websearch
