Authors:
Chen, Z; Kalashnikov, DV; Mehrotra, S
Venue:
Proc. Conf. Digital Libraries, 2007
URL:
http://portal.acm.org/citation.cfm?id=1255215
Entity resolution is a very common Information Quality (IQ) problem with many different applications. In digital libraries, it is related to citation matching and author name disambiguation; in Natural Language Processing, it is related to coreference matching and object identity; in Web applications, it is related to Web page disambiguation. The problem of entity resolution arises because objects/entities in real-world datasets are often referred to by descriptions that might not uniquely identify these entities, which leads to ambiguity. The goal is to group all the entity descriptions that refer to the same real-world entity. In this paper we present a graphical approach for entity resolution. It complements the traditional methodology with an analysis of the entity-relationship graph constructed for the dataset being analyzed. The paper demonstrates that a technique measuring the degree of interconnectedness between various pairs of nodes in the graph can significantly improve the quality of entity resolution. Furthermore, the paper presents an algorithm for making that technique self-adaptive to the underlying data, thus minimizing the required participation of the domain analyst and potentially further improving the disambiguation quality.
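The abstract does not define how interconnectedness is measured. As a minimal sketch, assuming "connection strength" is approximated by summing a decay factor over all simple paths of bounded length between two nodes (a simplification chosen here for illustration, not necessarily the paper's measure), an ambiguous reference can be resolved to the candidate entity most strongly connected to the reference's context. The names `connection_strength`, `resolve`, `max_len`, `decay`, and the toy graph are all hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) node pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def connection_strength(graph, source, target, max_len=4, decay=0.5):
    """Hypothetical measure: sum decay**length over every simple path
    from source to target with at most max_len edges, so shorter
    paths contribute more than longer ones."""
    total = 0.0

    def dfs(node, visited, length):
        nonlocal total
        if length > max_len:
            return
        if node == target:
            total += decay ** length
            return
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                dfs(nxt, visited, length + 1)
                visited.remove(nxt)

    dfs(source, {source}, 0)
    return total


def resolve(graph, context, candidates, **kwargs):
    """Pick the candidate entity most strongly connected to the
    context node of the ambiguous reference."""
    scores = {c: connection_strength(graph, context, c, **kwargs)
              for c in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: paper P1 lists "J. Smith" together with
# Alice Lee; the reference could mean John Smith or Jane Smith.
edges = [
    ("Alice Lee", "P1"),
    ("Alice Lee", "P2"), ("John Smith", "P2"),
    ("Alice Lee", "UCI"), ("John Smith", "UCI"),
    ("Jane Smith", "MIT"), ("Jane Smith", "P3"), ("Bob", "P3"),
]
graph = build_graph(edges)
best, scores = resolve(graph, "P1", ["John Smith", "Jane Smith"])
print(best, scores)
```

Here P1 reaches John Smith through two 3-edge paths (via P2 and via UCI), giving 2 × 0.5³ = 0.25, while Jane Smith is unreachable and scores 0, so the reference resolves to John Smith. Exhaustive path enumeration grows quickly with `max_len`, which is one reason a bounded length is used in this sketch; the self-adaptive step described in the abstract would presumably tune how different paths are weighted rather than fixing a single `decay`.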