csiro.au

A comparison of fast blocking methods for record linkage

Authors: Baxter, R; Christen, P; Churches, T
Year: 2003
Venue: ACM SIGKDD

Record linkage of millions of individual health records for ethically-approved research purposes is a computationally expensive task. Blocking methods are used in record linkage systems to reduce the number of candidate record comparison pairs to a feasible number whilst still maintaining linkage accuracy. New blocking methods have been implemented recently using high-dimensional indexing or clustering algorithms.
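As a rough illustration of the idea in this abstract, the sketch below groups records by a blocking key and compares only records that share a key, rather than every possible pair. It is a generic key-based blocking example, not necessarily one of the methods compared in the paper; the field names (`surname`, `birth_year`), the key definition, and the sample records are all assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: first letter of the surname plus the year of birth.
    return (record["surname"][:1].upper(), record["birth_year"])


def candidate_pairs(records, key=blocking_key):
    # Group record indices by blocking key.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key(rec)].append(idx)
    # Only records that fall in the same block become candidate pairs.
    pairs = set()
    for members in blocks.values():
        pairs.update(combinations(members, 2))
    return pairs


# Invented sample data, for illustration only.
records = [
    {"surname": "Smith", "birth_year": 1970},
    {"surname": "Smyth", "birth_year": 1970},
    {"surname": "Jones", "birth_year": 1985},
    {"surname": "Jonas", "birth_year": 1985},
    {"surname": "Smith", "birth_year": 1992},
]

pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

The trade-off is the one the abstract names: a coarser key keeps more true matches in the same block but leaves more pairs to compare, while a finer key cuts the comparisons further at the risk of separating records that belong together.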

Record linkage: Current practice and future directions

Authors: Gu, L; Baxter, R; Vickers, D; Rainsford, C
Year: 2003
Venue: CMIS Technical Report No. 03/83, CSIRO Mathematical and Information Sciences, http://datamining.csiro.au

Record linkage is the task of quickly and accurately identifying
records corresponding to the same entity from one or more data
sources. Record linkage is also known as data cleaning, entity reconciliation
or identification and the merge/purge problem. This paper presents
the “standard” probabilistic record linkage model and the associated
algorithm. Recent work in information retrieval, federated database systems
and data mining has proposed alternatives to key components of
the standard algorithm. The impact of these alternatives on the standard
