Record linkage: Current practice and future directions

Authors: 
Gu, L; Baxter, R; Vickers, D; Rainsford, C
Year: 
2003
Venue: 
CMIS Technical Report No. 03/83, CSIRO Mathematical and Information Sciences, http://datamining.csiro.au
URL: 
http://www.act.cmis.csiro.au/rohanb/PAPERS/record_linkage.pdf
Citations: 
137

Record linkage is the task of quickly and accurately identifying
records corresponding to the same entity from one or more data
sources. Record linkage is also known as data cleaning, entity reconciliation
or entity identification, and the merge/purge problem. This paper presents
the “standard” probabilistic record linkage model and the associated
algorithm. Recent work in information retrieval, federated database systems
and data mining has proposed alternatives to key components of
the standard algorithm. This paper assesses the impact of these alternatives
on the standard approach. The key question is whether and how these
new alternatives are better in terms of time, accuracy and degree of
automation for a particular record linkage application.
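The abstract does not spell out the "standard" model. In the record linkage literature it is usually the Fellegi-Sunter model: each field comparison between two records contributes a log likelihood-ratio weight, the weights are summed, and two thresholds split record pairs into matches, possible matches (for clerical review) and non-matches. The following is a minimal sketch under that assumption, not the paper's own algorithm. The field names, the m/u probabilities and the thresholds are hypothetical, and exact equality stands in for the approximate string comparators a real system would use.

```python
import math

# Hypothetical per-field (m, u) probabilities:
#   m = P(field agrees | the pair is a true match)
#   u = P(field agrees | the pair is a non-match)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
}


def match_weight(rec_a, rec_b, probs=FIELD_PROBS):
    """Sum Fellegi-Sunter style log2 likelihood ratios over all fields."""
    total = 0.0
    for field, (m, u) in probs.items():
        va, vb = rec_a.get(field), rec_b.get(field)
        # A field counts as agreeing only if both values are present and equal;
        # a missing value is treated as a disagreement in this sketch.
        if va is not None and va == vb:
            total += math.log2(m / u)                # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))    # disagreement weight
    return total


def classify(weight, lower=0.0, upper=10.0):
    """Three-way decision with two hypothetical thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


if __name__ == "__main__":
    a = {"surname": "smith", "given_name": "john", "birth_year": 1970}
    b = {"surname": "smith", "given_name": "jon", "birth_year": 1970}
    w = match_weight(a, b)
    print(f"{w:.2f}", classify(w))
```

Here the pair agrees on surname and birth year but not on given name, so the large agreement weights are partly offset by one disagreement weight and the pair lands between the thresholds as a possible match. Fields with a small u (rare chance agreement) carry the most evidence, which is why the thresholds, not a single cut-off, control the trade-off between automated decisions and clerical review.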