Adaptive sorted neighborhood methods for efficient record linkage

Yan, S; Lee, D; Kan, MY; Giles, CL
Proc. 2007 Conf. on Digital libraries

Traditionally, record linkage algorithms have played an important role in maintaining digital libraries - i.e., identifying matching citations or authors for consolidation in updating or integrating digital libraries. As such, a variety of record linkage algorithms have been developed and deployed successfully. Often, however, existing solutions have a set of parameters whose values are set by human experts off-line and are fixed during the execution.

Learning metadata from the evidence in an on-line citation matching scheme

Councill, Isaac G.; Li, Huajing; Zhuang, Ziming; Debnath, Sandip; Bolelli, Levent; Lee, Wang-Chien; Sivasubramaniam, Anand; Giles, C. Lee
Joint Conference on Digital Libraries 2006 (JCDL 2006): 276-285, 2006

Citation matching, or the automatic grouping of bibliographic
references that refer to the same document, is a data management
problem faced by automatic digital libraries for scientific
literature such as CiteSeer and Google Scholar. Although several
solutions have been offered for citation matching in large
bibliographic databases, these solutions typically require
expensive batch clustering operations that must be run offline.
Large digital libraries containing citation information can reduce
maintenance costs and provide new services through efficient

Group Linkage

On, Byung-Won; Koudas, Nick; Lee, Dongwon; Srivastava, Divesh

Poor quality data is prevalent in databases due to a variety
of reasons, including transcription errors, lack of standards
for recording database fields, etc. To be able to query
and integrate such data, considerable recent work has focused
on the record linkage problem, i.e., determining whether two
entities represented as relational records are approximately
the same. Often entities are represented as groups of relational
records, rather than individual relational records,
e.g., households in a census survey consist of a group of persons.

Improving Grouped-Entity Resolution using Quasi-Cliques

On, BW; Elmacioglu, E; Lee, D; Kang, J; Pei, J

The entity resolution (ER) problem, which identifies duplicate
entities that refer to the same real world entity, is
essential in many applications. In this paper, in particular,
we focus on resolving entities that contain a group of
related elements in them (e.g., an author entity with a list
of citations, a singer entity with a song list, or an intermediate
result of a GROUP BY SQL query). Such entities, called
grouped-entities, frequently occur in many applications.
The previous approaches toward grouped-entity resolution
often rely on textual similarity, and produce a large number
