Automatic Training Example Selection for Scalable Unsupervised Record Linkage

Tue, 05/20/2008 - 09:56 — koepcke

Authors:

Christen, Peter

Author:

Christen, P

Year:

2008

Venue:

PAKDD

URL:

http://datamining.anu.edu.au/publications/2008/pakdd2008automatic.pdf

Citations range:

10 - 49

Linking records from two or more databases is becoming increasingly
important in the data preparation step of many data mining projects, as
linked data can enable analysts to conduct studies that are not feasible
otherwise, or that would require expensive and time-consuming collection
of specific data. The aim of such linkages is to match all records that
refer to the same entity. One of the main challenges in record linkage is
the accurate classification of record pairs into matches and non-matches.
With traditional techniques, classification thresholds have to be set
either manually or using an EM-based approach. Many modern classification
techniques, on the other hand, are based on supervised machine learning
and thus require training data, which is often not available in real
world situations. A novel two-step approach to unsupervised record pair
classification is presented in this paper. In the first step, training
examples are selected automatically, and in the second step these
examples are used to train a binary classifier. An experimental
evaluation shows that this approach can outperform k-means clustering and
can also be much faster than other classification techniques.
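
The two-step idea in the abstract can be illustrated with a minimal sketch. The abstract does not state how examples are selected or which classifier is trained, so everything below is an assumption made for illustration: each record pair is represented as a vector of field similarities in [0, 1], the seed thresholds (0.8 and 0.2) are arbitrary, and a simple nearest-centroid rule stands in for the binary classifier. The function names are hypothetical and this is not the paper's implementation.

```python
from statistics import mean


def select_seeds(vectors, match_thresh=0.8, non_match_thresh=0.2):
    """Step 1 (assumed rule): take pairs whose similarities are all high as
    likely matches and pairs whose similarities are all low as likely
    non-matches. The thresholds are illustrative, not from the paper."""
    matches = [v for v in vectors if min(v) >= match_thresh]
    non_matches = [v for v in vectors if max(v) <= non_match_thresh]
    return matches, non_matches


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("no seed examples selected; adjust the thresholds")
    return [mean(column) for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_classifier(matches, non_matches):
    """Step 2 (stand-in): a nearest-centroid binary classifier trained on
    the automatically selected seeds. Returns a function mapping a
    similarity vector to True (match) or False (non-match)."""
    match_center = centroid(matches)
    non_match_center = centroid(non_matches)

    def classify(vector):
        return (squared_distance(vector, match_center)
                < squared_distance(vector, non_match_center))

    return classify


# Hypothetical similarity vectors for six record pairs (three compared fields).
vectors = [
    [1.0, 0.9, 1.0],
    [0.9, 0.85, 0.95],
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.2],
    [0.7, 0.9, 0.6],   # ambiguous pair, not selected as a seed
    [0.3, 0.1, 0.4],   # ambiguous pair, not selected as a seed
]

matches, non_matches = select_seeds(vectors)
classify = train_classifier(matches, non_matches)
labels = [classify(v) for v in vectors]
print(labels)
```

In this toy data only the first four pairs are clear enough to serve as seeds. The two ambiguous pairs are labeled by the classifier trained on those seeds, which is the point of the second step.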
