Automatic Training Example Selection for Scalable Unsupervised Record Linkage

Christen, Peter

Linking records from two or more databases is becoming increasingly important in the data preparation step of many data mining projects, as linked data can enable analysts to conduct studies that are not feasible otherwise, or that would require expensive and time-consuming collection of specific data. The aim of such linkages is to match all records that refer to the same entity. One of the main challenges in record linkage is the accurate classification of record pairs into matches and non-matches. With traditional techniques, classification thresholds
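For readers unfamiliar with threshold-based classification, the sketch below shows the general idea of the traditional approach the abstract refers to. It is an illustration only, not the method proposed in this paper. It assumes a summed per-field string similarity (here `difflib.SequenceMatcher.ratio`, an arbitrary choice) and two hand-set thresholds, `lower` and `upper`. The names `field_similarity` and `classify_pair` are hypothetical.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1] (difflib ratio, an assumed choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify_pair(rec_a: dict, rec_b: dict, fields: list,
                  lower: float, upper: float) -> str:
    """Classify a record pair by its summed field similarities.

    Hypothetical helper: scores at or above `upper` are matches, scores at
    or below `lower` are non-matches, anything in between is a possible match.
    """
    score = sum(field_similarity(rec_a[f], rec_b[f]) for f in fields)
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "possible match"


a = {"given": "peter", "surname": "christen", "suburb": "canberra"}
b = {"given": "pete", "surname": "christen", "suburb": "canberra"}
print(classify_pair(a, b, ["given", "surname", "suburb"], lower=1.5, upper=2.5))
# → match
```

The difficulty the abstract points to is visible here: `lower` and `upper` must be chosen by hand, and a poor choice misclassifies pairs.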

Febrl - A freely available record linkage system with a graphical user interface

Christen, Peter
Australasian Workshop on Health Data and Knowledge Management

Record or data linkage is an important enabling technology in the health sector, as linked data is a cost-effective resource that can help to improve research into health policies, detect adverse drug reactions, reduce costs, and uncover fraud within the health system. Significant advances, mostly originating from data mining and machine learning, have been made in recent years in many areas of record linkage techniques. Most of these new methods are not yet implemented in current record linkage systems, or are hidden within ‘black box’ commercial software. This

Learning object identification rules for information integration

Tejada, S; Knoblock, CA; Minton, S
Information Systems

When integrating information from multiple websites, the same data objects can exist in inconsistent text formats across sites, making it difficult to identify matching objects using exact text match. We have developed an object identification system called Active Atlas, which compares the objects’ shared attributes in order to identify matching objects. Certain attributes are more important for deciding if a mapping should exist between two objects. Previous methods of object identification have required manual construction of object identification rules or mapping rules for
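The idea that some shared attributes matter more than others can be illustrated with a weighted comparison. The sketch below is not Active Atlas: that system learns its identification rules, whereas here the weights are hand-set and hypothetical, and token-set Jaccard similarity is an assumed choice of attribute comparison. The function names `jaccard` and `weighted_score` are likewise illustrative.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two attribute values."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty values are treated as identical
    return len(ta & tb) / len(ta | tb)


def weighted_score(obj_a: dict, obj_b: dict, weights: dict) -> float:
    """Weighted mean of per-attribute similarities over the shared attributes.

    `weights` is a hypothetical, hand-set importance per attribute; a learned
    system would derive this from training examples instead.
    """
    shared = [k for k in weights if k in obj_a and k in obj_b]
    total = sum(weights[k] for k in shared)
    if total == 0:
        return 0.0
    return sum(weights[k] * jaccard(obj_a[k], obj_b[k]) for k in shared) / total


a = {"name": "joe's diner", "street": "12 main st", "phone": "555 0100"}
b = {"name": "joes diner", "street": "12 main street", "phone": "555 0100"}
w = {"name": 1.0, "street": 1.0, "phone": 3.0}  # hypothetical: phone weighs most
print(round(weighted_score(a, b, w), 3))  # → 0.767
```

Giving `phone` a larger weight lets an exact phone agreement outweigh the formatting differences in `name` and `street`, which is the kind of judgment the abstract says should not have to be encoded by hand.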

Example-driven Design of Efficient Record Matching Queries

Chaudhuri, Surajit; Chen, Bee-Chung; Ganti, Venkatesh; Kaushik, Raghav

Record matching is the task of identifying records that match the same real-world entity. This is a problem of great significance for a variety of business intelligence applications. Implementations of record matching rely on exact as well as approximate string matching (e.g., edit distances) and use of external reference data sources. Record matching can be viewed as a query composed of a small set of primitive operators. However, formulating such record matching queries is difficult and depends on the specific application scenario.
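To make "a query composed of a small set of primitive operators" concrete, the sketch below combines one exact comparison with one edit-distance comparison. It is an assumed example, not the query language or the example-driven design method of this paper: the attribute names `name` and `zip`, the threshold `max_name_dist`, and the functions `edit_distance` and `is_match` are all hypothetical.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance, computed with two rows of the DP table."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]


def is_match(r1: dict, r2: dict, max_name_dist: int = 2) -> bool:
    """Hypothetical matching query: exact zip AND approximately equal name."""
    return (r1["zip"] == r2["zip"]
            and edit_distance(r1["name"], r2["name"]) <= max_name_dist)


print(is_match({"name": "jon smith", "zip": "2600"},
               {"name": "john smith", "zip": "2600"}))  # → True
```

Even in this tiny query, the choice of operators, attributes, and `max_name_dist` depends on the data, which illustrates why the abstract calls formulating such queries difficult.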
