ics.uci.edu

Exploiting context analysis for combining multiple entity resolution systems

Authors: 
Chen, Zhaoqi; Kalashnikov, Dmitri V.; Mehrotra, Sharad
Year: 
2009
Venue: 
SIGMOD

Entity Resolution (ER) is an important real-world problem that has attracted significant research interest over the past few years. It deals with determining which object descriptions co-refer in a dataset. Due to its practical significance for data mining and data analysis tasks, many different ER approaches have been developed to address the ER challenge. This paper proposes a new ER Ensemble framework. The task of ER Ensemble is to combine the results of multiple base-level ER systems into a single solution with the goal of increasing the quality of ER.
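To make "combining the results of multiple base-level ER systems" concrete, here is a naive majority-vote baseline. It is not the paper's context-analysis method. The function names, the representation of each system's output as a list of clusters of record ids, and the strict-majority rule are all assumptions made for this sketch.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical representation: one ER system's output is a partition of record ids.
Clustering = List[Set[str]]


def coreferent_pairs(clusters: Clustering) -> Set[FrozenSet[str]]:
    """Return every unordered pair of record ids placed in the same cluster."""
    pairs: Set[FrozenSet[str]] = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            pairs.add(frozenset((a, b)))
    return pairs


def majority_vote_ensemble(outputs: List[Clustering], records: List[str]) -> Clustering:
    """Merge two records if a strict majority of base systems co-refer them,
    then take the transitive closure of the merged pairs with union-find."""
    votes: Counter = Counter()
    for clusters in outputs:
        votes.update(coreferent_pairs(clusters))
    needed = len(outputs) // 2 + 1  # strict majority of the base systems

    parent: Dict[str, str] = {r: r for r in records}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, count in votes.items():
        if count >= needed:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    # Group records by their union-find root; sort clusters for stable output.
    groups: Dict[str, Set[str]] = defaultdict(set)
    for r in records:
        groups[find(r)].add(r)
    return sorted(groups.values(), key=sorted)


# Usage: three hypothetical base systems disagree about four records.
records = ["r1", "r2", "r3", "r4"]
outputs = [
    [{"r1", "r2"}, {"r3"}, {"r4"}],
    [{"r1", "r2", "r3"}, {"r4"}],
    [{"r1"}, {"r2"}, {"r3", "r4"}],
]
result = majority_vote_ensemble(outputs, records)
```

Only the pair (r1, r2) receives two of three votes, so `result` is `[{"r1", "r2"}, {"r3"}, {"r4"}]`. Because the final clusters are a transitive closure, a plain vote can still join records that no majority paired directly, which is one reason a simple vote is only a weak baseline for an ER ensemble.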

Adaptive graphical approach to entity resolution

Authors: 
Chen, Z; Kalashnikov, DV; Mehrotra, S
Year: 
2007
Venue: 
Proc. Conf. Digital Libraries, 2007

Entity resolution is a very common Information Quality (IQ) problem with many different applications. In digital libraries, it is related to the problems of citation matching and author name disambiguation; in Natural Language Processing, it is related to coreference matching and object identity; in Web applications, it is related to Web page disambiguation. The problem of Entity Resolution arises because objects/entities in real-world datasets are often referred to by descriptions, which might not be unique identifiers of these entities, lead-

Efficient record linkage in large data sets

Authors: 
Jin, L.; Li, C.; Mehrotra, S.
Year: 
2003
Venue: 
Eighth International Conference on Database Systems for Advanced Applications, 2003

This paper describes an efficient approach to record linkage. Given two lists of records, the record-linkage problem consists of determining all pairs that are similar to each other, where the overall similarity between two records is defined based on domain-specific similarities over the individual attributes constituting the record. The record-linkage problem arises naturally in the context of data cleansing, which usually precedes data analysis and mining. We explore a novel approach to this problem. For each attribute of the records, we first map values to a multidimensional
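The problem definition above can be made concrete with a brute-force sketch that compares every pair across the two lists and combines per-attribute similarities into an overall score. The per-attribute measure (`difflib.SequenceMatcher.ratio`), the weighted average, the attribute weights, and the threshold are illustrative assumptions, and the function names are hypothetical. The paper's efficient approach, which starts by mapping each attribute's values into a multidimensional space, is not reproduced here.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]
Similarity = Callable[[str, str], float]


def string_sim(a: str, b: str) -> float:
    """Assumed default attribute similarity in [0, 1]: case-insensitive edit-based ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r: Record, s: Record, weights: Dict[str, float],
                      sims: Dict[str, Similarity]) -> float:
    """Overall similarity: weighted average of per-attribute similarities.
    `sims` may supply a domain-specific function per attribute."""
    total = sum(weights.values())
    score = sum(w * sims.get(attr, string_sim)(r[attr], s[attr])
                for attr, w in weights.items())
    return score / total


def link(left: List[Record], right: List[Record], weights: Dict[str, float],
         threshold: float, sims: Optional[Dict[str, Similarity]] = None
         ) -> List[Tuple[int, int, float]]:
    """Return (i, j, score) for every cross-list pair whose score meets the threshold.
    This nested loop does len(left) * len(right) comparisons."""
    sims = sims or {}
    matches = []
    for i, r in enumerate(left):
        for j, s in enumerate(right):
            score = record_similarity(r, s, weights, sims)
            if score >= threshold:
                matches.append((i, j, score))
    return matches


# Usage with made-up records, weights, and threshold.
left = [{"name": "John Smith", "city": "Irvine"},
        {"name": "Mary Jones", "city": "Boston"}]
right = [{"name": "Jon Smith", "city": "Irvine"},
         {"name": "Bob Lee", "city": "Austin"}]
result = link(left, right, weights={"name": 0.7, "city": 0.3}, threshold=0.8)
```

Here only ("John Smith", "Irvine") and ("Jon Smith", "Irvine") clear the threshold, with a score of about 0.96. The quadratic number of comparisons is exactly the cost that an efficient record-linkage method for large data sets has to avoid.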
