Exploiting relationships for domain-independent data cleaning

Authors: 
Kalashnikov, D.V.; Mehrotra, S.; Chen, Z.
Year: 
2005
Venue: 
SIAM Data Mining (SDM), 2005
URL: 
http://www.ics.uci.edu/~dvk/RelDC/TR/TR-RESCUE-04-20.pdf
Citations: 
111
Citations range: 
100 - 499
Attachment: 
Kalashnikov2005Exploitingrelationshipsfordomainindependentdatacleaning.pdf (1.07 MB)

In this paper we address the problem of reference disambiguation. Specifically, we consider a situation
where entities in the database are referred to using descriptions (e.g., a set of instantiated attributes).
The objective of reference disambiguation is to identify the unique entity to which each description
corresponds. The key difference between the approach we propose (called RelDC) and the traditional
techniques is that RelDC analyzes not only object features but also inter-object relationships to improve
the disambiguation quality. Our extensive experiments over two real datasets and over synthetic
datasets show that analyzing relationships significantly improves the quality of the result.
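
To make the idea concrete, the toy sketch below shows how relationships can break a tie that features alone cannot. It is not the RelDC algorithm from the paper: the entity names, the small graph, the helper functions, and the shortest-path scoring are all assumptions chosen purely for illustration.

```python
from collections import deque

# Hypothetical entities: id -> full name (illustrative data only).
entities = {
    "john_smith": "John Smith",
    "jane_smith": "Jane Smith",
    "alice": "Alice Jones",
    "bob": "Bob Lee",
}

# Hypothetical undirected relationship graph linking authors and papers.
graph = {
    "alice": ["p1"],
    "p1": ["alice", "john_smith"],
    "john_smith": ["p1"],
    "jane_smith": ["p2"],
    "p2": ["jane_smith", "bob"],
    "bob": ["p2"],
}


def abbreviate(full_name):
    """Turn 'John Smith' into 'J. Smith'."""
    parts = full_name.split()
    return f"{parts[0][0]}. {parts[-1]}"


def feature_candidates(description, entities):
    """Feature-based step: ids whose abbreviated name matches the description."""
    return sorted(eid for eid, name in entities.items()
                  if abbreviate(name) == description)


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def disambiguate(context_entity, candidates, graph):
    """Relationship-based step: prefer the candidate closest to the context."""
    def score(candidate):
        hops = shortest_path_length(graph, context_entity, candidate)
        return 0.0 if hops is None else 1.0 / (1 + hops)
    return max(candidates, key=score)


# A new paper lists the authors "J. Smith" and Alice Jones.
candidates = feature_candidates("J. Smith", entities)
resolved = disambiguate("alice", candidates, graph)
```

Here the description "J. Smith" matches both John Smith and Jane Smith on features alone. Only John Smith is connected to the co-author in the graph, so the relationship step selects him. A realistic system would use a richer connection-strength measure than hop count.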