A probabilistic model for entity disambiguation using relationships

Authors: 
Kalashnikov, DV; Mehrotra, S
Year: 
2005
Venue: 
SIAM International Conference on Data Mining (SDM). Newport
URL: 
http://www.ics.uci.edu/~dvk/RelDC/TR/TR-RESCUE-04-12.pdf
Citations: 
16
Citations range: 
10 - 49
Attachment: 
Kalashnikov2005Aprobabilisticmodelforentitydisambiguationusing.pdf (603.5 KB)

Graphs representing relationships among sets of entities are an increasing focus of interest in the
context of data analysis applications. These graphs are typically constructed from existing datasets
from which entities and relationships are extracted. For some entities, the values of certain attributes
refer to other entities; such references determine relationships. Often, in certain datasets, such
references are given in the form of (string) descriptions. Each such description alone may not uniquely
identify one entity as it is supposed to, but rather can match descriptions of multiple entities. Such
cases are especially common if the datasets are collected not from one but multiple heterogeneous
sources. Thus the correct linking of entities via relationships can be a nontrivial challenge which, if
done incorrectly, can in turn impede further graph-based analyses. To overcome this problem, standard
feature-based data cleaning approaches can be employed. In this paper we argue that a better solution exists,
one which analyzes not only features but also relationships.