Source-aware entity matching: A compositional approach

Shen, W.; DeRose, P.; Vu, L.; Doan, A.; Ramakrishnan, R.
Proceedings of ICDE 2007
Citations range: 10-49

Entity matching (a.k.a. record linkage) plays a crucial
role in integrating multiple data sources, and numerous
matching solutions have been developed. However, the solutions
have largely exploited only information available in
the mentions and employed a single matching technique.
We show how to exploit information about data sources
to significantly improve matching accuracy. In particular,
we observe that different sources often vary substantially
in their level of semantic ambiguity, thus requiring different
matching techniques. In addition, it is often beneficial
to group and match mentions in related sources first, before
considering other sources. These observations lead
to a large space of matching strategies, analogous to the
space of query evaluation plans considered by a relational
optimizer. We propose viewing entity matching as a composition
of basic steps into a “match execution plan”. We
analyze formal properties of the plan space, and show how
to find a good match plan. To do so, we employ ideas from
social network analysis to infer the ambiguity and relatedness
of data sources. We conducted extensive experiments
on several real-world data sets on the Web and in the domain
of personal information management (PIM). The results
show that our solution significantly outperforms current
best matching methods.
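
To make the idea of a "match execution plan" concrete, the following is a minimal sketch in Python. It is not the authors' algorithm. The source names, the two matchers (`exact_match`, `initial_match`), and the nested-tuple plan encoding are hypothetical and chosen only for illustration, and the plan is written by hand rather than derived from inferred ambiguity or relatedness. It shows one thing only: matching related, low-ambiguity sources first with a loose matcher, then combining the result with a more ambiguous source under a strict matcher, can give a different clustering than applying a single matcher to all mentions at once.

```python
from itertools import combinations

def normalize(name):
    """Lowercase, drop periods and commas, collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").replace(",", " ").split())

def exact_match(a, b):
    """Strict matcher (hypothetical): normalized names must be identical."""
    return normalize(a) == normalize(b)

def initial_match(a, b):
    """Loose matcher (hypothetical): same last token, same first initial."""
    ta, tb = normalize(a).split(), normalize(b).split()
    return ta[-1] == tb[-1] and ta[0][0] == tb[0][0]

def match_step(clusters, matcher):
    """Merge clusters that share at least one matching pair of mentions."""
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(clusters)), 2):
        if any(matcher(a, b) for a in clusters[i] for b in clusters[j]):
            parent[find(i)] = find(j)

    merged = {}
    for i, cluster in enumerate(clusters):
        merged.setdefault(find(i), []).extend(cluster)
    return list(merged.values())

def run_plan(node, sources):
    """Evaluate a plan: a source name, or a (matcher, children) tuple."""
    if isinstance(node, str):
        return [[mention] for mention in sources[node]]
    matcher, children = node
    clusters = []
    for child in children:
        clusters.extend(run_plan(child, sources))
    return match_step(clusters, matcher)

# Hypothetical data: two related, low-ambiguity sources and one ambiguous one.
sources = {
    "group_page": ["A. Doan", "AnHai Doan"],
    "course_page": ["AnHai Doan"],
    "bibliography": ["A. Doan", "Anna Doan"],
}
all_sources = ["group_page", "course_page", "bibliography"]

# Composed plan: loose matching inside the related sources, then strict.
composed_plan = (exact_match, [(initial_match, ["group_page", "course_page"]),
                               "bibliography"])
composed = run_plan(composed_plan, sources)

# Single-technique baselines over all mentions at once.
all_strict = run_plan((exact_match, all_sources), sources)
all_loose = run_plan((initial_match, all_sources), sources)

for label, clusters in [("composed", composed), ("all strict", all_strict),
                        ("all loose", all_loose)]:
    print(label, clusters)
```

On this toy data the composed plan keeps "Anna Doan" separate while still grouping "A. Doan" with "AnHai Doan". The all-strict baseline splits those two forms apart, and the all-loose baseline merges every mention into one cluster.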