Example-driven Design of Efficient Record Matching Queries

Wed, 03/19/2008 - 15:10 — koepcke

Authors:

Chaudhuri, Surajit;Chen, Bee-Chung;Ganti, Venkatesh;Kaushik, Raghav

Author:

Chaudhuri, S

Chen, B

Ganti, V

Kaushik, R

Year:

2007

Venue:

VLDB

URL:

http://portal.acm.org/citation.cfm?id=1325891&jmp=cit&coll=&dl=ACM

Citations:

Citations range:

10 - 49

Attachment	Size
Chaudhuri2007ExampledrivenDesignofEfficientRecordMatchingQueries.pdf	11.7 KB

Record matching is the task of identifying records that refer to the same real-world entity. This is a problem of great significance for a variety of business intelligence applications. Implementations of record matching rely on exact as well as approximate string matching (e.g., edit distances) and the use of external reference data sources. Record matching can be viewed as a query composed of a small set of primitive operators. However, formulating such record matching queries is difficult and depends on the specific application scenario. Specifically, the number of options, both in string matching operations and in the choice of external sources, can be daunting. In this paper, we exploit the availability of positive and negative examples to search through this space and suggest an initial record matching query. Such queries can be subsequently modified by the programmer as needed. We ensure that the record matching queries our approach produces are (1) efficient: these queries can be run on large datasets by leveraging operations that are well-supported by RDBMSs, and (2) explainable: the queries are easy to understand so that they may be modified by the programmer with relative ease. We demonstrate the effectiveness of our approach on several real-world datasets.
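To make the idea of example-driven design concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a single matching predicate (normalized edit-distance similarity above a threshold) and uses labeled positive and negative pairs to pick that threshold. All names (`edit_distance`, `similarity`, `pick_threshold`, `is_match`, `EXAMPLES`) and the example pairs are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# (left value, right value, is_match) -- hypothetical labeled examples.
Example = Tuple[str, str, bool]

EXAMPLES: List[Example] = [
    ("Microsoft Corp", "Microsoft Corp.", True),
    ("Jon Smith", "John Smith", True),
    ("Microsoft Corp", "Oracle Corp", False),
    ("Jon Smith", "Joan Smythe", False),
]


def pick_threshold(examples: List[Example]) -> float:
    """Return the observed similarity score that, used as a threshold,
    classifies the most examples correctly (lowest such score on ties)."""
    scored = [(similarity(a.lower(), b.lower()), label) for a, b, label in examples]
    best_t, best_correct = 1.0, -1
    for t in sorted({s for s, _ in scored}):
        correct = sum((s >= t) == label for s, label in scored)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def is_match(a: str, b: str, threshold: float) -> bool:
    """Apply the learned predicate to a new pair of strings."""
    return similarity(a.lower(), b.lower()) >= threshold


threshold = pick_threshold(EXAMPLES)
print(threshold, is_match("Jon Smith", "John Smith", threshold))
```

A single threshold on one similarity function is far simpler than the space of operators and reference sources the abstract describes, but it shows the basic loop: candidate predicates are scored against the labeled examples, and the best-scoring one is proposed as a starting point that a programmer can read and adjust.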
