Learning object identification rules for information integration

Authors: 
Tejada, S; Knoblock, CA; Minton, S
Year: 
2001
Venue: 
Information Systems
URL: 
http://www.isi.edu/integration/papers/tejada01-is.pdf
DOI: 
http://dx.doi.org/10.1016/S0306-4379(01)00042-4
Citations: 
219
Citations range: 
100 - 499
Attachment:
Tejada2001Learningobjectidentificationrulesforinformation.pdf
Size:
335.16 KB

When integrating information from multiple websites, the same data object can appear in inconsistent text formats
across sites, making it difficult to identify matching objects by exact text matching. We have developed an object
identification system called Active Atlas, which compares the objects’ shared attributes in order to identify matching
objects. Certain attributes are more important than others for deciding whether a mapping should exist between two objects. Previous
methods of object identification have required manual construction of object identification rules or mapping rules for
determining the mappings between objects. This manual process is time-consuming and error-prone. In our approach,
Active Atlas learns to tailor mapping rules, through limited user input, to a specific application domain. The
experimental results demonstrate that we achieve higher accuracy and require less user involvement than previous
methods across various application domains.
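
The idea of comparing shared attributes, with some attributes counting more than others, can be illustrated with a minimal sketch. This is not the Active Atlas algorithm or its learned rules: the function names (`attribute_similarity`, `match_score`, `is_match`), the hand-set weights, the threshold, and the use of Python's `difflib.SequenceMatcher` as the string measure are all assumptions made for illustration. A learning system would derive such rules from user feedback rather than fix them by hand.

```python
from difflib import SequenceMatcher


def attribute_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two attribute values.

    Case and surrounding whitespace are ignored (an assumed normalization).
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(obj_a: dict, obj_b: dict, weights: dict) -> float:
    """Weighted average similarity over the attributes both objects share.

    `weights` is a hypothetical, hand-set stand-in for a mapping rule:
    a larger weight means the attribute matters more for the decision.
    """
    shared = [k for k in weights if k in obj_a and k in obj_b]
    if not shared:
        return 0.0
    total = sum(weights[k] for k in shared)
    weighted = sum(
        weights[k] * attribute_similarity(obj_a[k], obj_b[k]) for k in shared
    )
    return weighted / total


def is_match(obj_a: dict, obj_b: dict, weights: dict, threshold: float) -> bool:
    """Declare a mapping when the weighted score reaches the assumed threshold."""
    return match_score(obj_a, obj_b, weights) >= threshold


# Two records for the same restaurant as two sites might list it (made-up data).
site_a = {"name": "Art's Deli", "phone": "1111"}
site_b = {"name": "ART'S DELI", "phone": "9999"}

# If the name is treated as the important attribute, the records map to each other;
# if the phone number is, they do not.
name_heavy = {"name": 0.9, "phone": 0.1}
phone_heavy = {"name": 0.1, "phone": 0.9}
print(is_match(site_a, site_b, name_heavy, threshold=0.8))
print(is_match(site_a, site_b, phone_heavy, threshold=0.8))
```

The same pair of records is accepted or rejected depending only on the weights, which is why tailoring the rule to the application domain matters.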