Learning-based Entity Resolution with MapReduce

Authors: 
Kolb, L; Köpcke, H; Thor, A; Rahm, E

Entity resolution is a crucial step for data quality and data
integration. Learning-based approaches show high effectiveness
at the expense of poor efficiency. To reduce the typically high
execution times, we investigate how learning-based entity
resolution can be realized in a cloud infrastructure using
MapReduce. We propose and evaluate two efficient MapReduce-based
strategies for pair-wise similarity computation and classifier
application on the Cartesian product of two input sources. Our
evaluation is based on real-world datasets and shows the high
efficiency and effectiveness of the proposed approaches.
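
To make the idea of evaluating the Cartesian product of two sources in a MapReduce style more concrete, here is a minimal single-process sketch. It is not one of the two strategies proposed in the paper, which the abstract does not describe in detail. It assumes a simple grid partitioning: source R is split into m chunks and source S into n chunks, and each of the m x n reduce tasks compares one chunk of R with one chunk of S. All function names and parameters are hypothetical, the string similarity from Python's difflib stands in for the paper's similarity measures, and a fixed threshold stands in for a trained classifier.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def map_records(source_r, source_s, m, n):
    """Map phase: send each record to every reduce task that needs it.

    A record of R in chunk i goes to tasks (i, 0..n-1); a record of S in
    chunk j goes to tasks (0..m-1, j). Each (r, s) pair therefore meets
    in exactly one task.
    """
    for k, rec in enumerate(source_r):
        for j in range(n):
            yield (k % m, j), ("R", rec)
    for k, rec in enumerate(source_s):
        for i in range(m):
            yield (i, k % n), ("S", rec)


def shuffle(pairs):
    """Group map output by reduce key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(values, threshold):
    """Reduce phase: compare all R records with all S records of one task."""
    r_part = [rec for tag, rec in values if tag == "R"]
    s_part = [rec for tag, rec in values if tag == "S"]
    for r in r_part:
        for s in s_part:
            sim = SequenceMatcher(None, r.lower(), s.lower()).ratio()
            # Assumption: a threshold replaces the learned classifier here.
            if sim >= threshold:
                yield r, s, sim


def resolve(source_r, source_s, m=2, n=2, threshold=0.8):
    """Run map, shuffle, and reduce sequentially and collect the matches."""
    groups = shuffle(map_records(source_r, source_s, m, n))
    matches = []
    for key in sorted(groups):
        matches.extend(reduce_task(groups[key], threshold))
    return matches
```

Calling `resolve(["Apple iPhone 4"], ["apple iphone 4", "Nikon D90"])` returns the pairs whose similarity reaches the threshold. In a real deployment the reduce tasks would run in parallel on separate nodes, and the replication of records in the map phase is the price paid for that parallelism.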

Year: 
2011
Venue: 
CloudDB 2011
URL: 
http://dbs.uni-leipzig.de/de/publication/learning_based_er_with_mr
Citations: 
0
Citations range: 
n/a
Attachment: 
learning_based_er_with_mr.pdf (702.91 KB)