Conference

Same, Same but Different: A Survey on Duplicate Detection Methods for Situation Awareness

Wed, 03/24/2010 - 15:16 — cat

Authors:

Baumgartner, N.; Gottesheim, W.; Mitsch, S.; Retschitzegger, W.; Schwinger, W.

Year:

2009

Venue:

Proc. OTM 2009, LNCS 5871

Systems supporting situation awareness typically deal with a vast stream of information about a large number of real-world objects anchored in time and space, provided by multiple sources. These sources are often characterized by frequent updates, heterogeneous formats and, most crucially, identical, incomplete and often even contradictory information. In this respect, duplicate detection methods are of paramount importance, as they make it possible to determine whether information with, e.g., different origins or different observation times concerns one and the same real-world object.


Tagging of name records for genealogical data browsing

Thu, 01/08/2009 - 10:42 — cat

Authors:

Perrow, Mike; Barber, David

Year:

2008

Venue:

Proc. 6th ACM/IEEE-CS joint conference on Digital libraries

In this paper we present a method of parsing unstructured textual records briefly describing a person and their direct relatives, which we use in the construction of a browsing tool for genealogical data. The records have been created by researchers who are currently digitising a collection of historical archives stored at the Abbaye de Saint-Maurice, Switzerland. The string 'Beatrix, daughter of Johannes Trona, of Saillon' is a typical example of a record. We wish to annotate every term (word and symbol) in our records with a label which describes whether the term is a name (e.g.

Automatic Training Example Selection for Scalable Unsupervised Record Linkage

Tue, 05/20/2008 - 09:56 — koepcke

Authors:

Christen, Peter

Year:

2008

Venue:

PAKDD

Linking records from two or more databases is becoming increasingly important in the data preparation step of many data mining projects, as linked data can enable analysts to conduct studies that are not feasible otherwise, or that would require expensive and time-consuming collection of specific data. The aim of such linkages is to match all records that refer to the same entity. One of the main challenges in record linkage is the accurate classification of record pairs into matches and non-matches. With traditional techniques, classification thresholds


Learning Blocking Schemes for Record Linkage

Wed, 03/19/2008 - 15:26 — koepcke

Authors:

Michelson, Matthew; Knoblock, Craig A.

Year:

2006

Venue:

AAAI

Record linkage is the process of matching records across data sets that refer to the same entity. One issue within record linkage is determining which record pairs to consider, since a detailed comparison between all of the records is impractical. Blocking addresses this issue by generating candidate matches as a preprocessing step for record linkage. For example, in a person matching problem, blocking might return all people with the same last name as candidate matches. Two main problems in blocking are the selection of attributes for
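
The last-name example in the abstract above can be illustrated with a minimal sketch. This is not the learned blocking scheme the paper proposes; it only shows the basic idea of blocking on a single, hand-picked attribute. The record layout, the `block_by_key` and `candidate_pairs` helpers, and the sample data are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def block_by_key(records, key):
    """Group record ids by the value of a blocking key (hypothetical helper)."""
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        blocks[key(rec)].append(rec_id)
    return blocks


def candidate_pairs(blocks):
    """Return every unordered pair of record ids that share a block."""
    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Illustrative data only; not taken from the paper.
records = {
    1: {"first": "John", "last": "Smith"},
    2: {"first": "Jon", "last": "Smith"},
    3: {"first": "Mary", "last": "Jones"},
    4: {"first": "J.", "last": "smith "},
}

# Block on the normalized last name (assumed normalization: strip + lowercase).
blocks = block_by_key(records, key=lambda r: r["last"].strip().lower())
pairs = candidate_pairs(blocks)

# Comparing all records would need n * (n - 1) / 2 detailed comparisons.
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

With this data, only the three records sharing the normalized last name are paired with each other, so the detailed comparison step sees fewer pairs than the exhaustive approach. The trade-off is that a true match with a misspelled last name would be missed, which is why the choice of blocking attributes matters.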


Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping

Thu, 02/28/2008 - 03:49 — mbilenko

Authors:

Bilenko, Mikhail; Basu, Sugato; Sahami, Mehran

Year:

2005

Venue:

Fifth IEEE International Conference on Data Mining (ICDM'05)

The problem of record linkage focuses on determining whether two object descriptions refer to the same underlying entity. Addressing this problem effectively has many practical applications, e.g., elimination of duplicate records in databases and citation matching for scholarly articles. In this paper, we consider a new domain where the record linkage problem is manifested: Internet comparison shopping. We address the resulting linkage setting that requires learning a similarity function between record pairs from streaming data.
