Learning Object Identification Rules for Information Integration

Authors: 
Tejada, S
Year: 
2002
Venue: 
Ph.D. Thesis, University of Southern California's Information Sciences Institute, Los Angeles, 2002
URL: 
http://www.isi.edu/info-agents/papers/tejada02-thesis.pdf
Citations: 
219
Citations range: 
100 - 499
Attachment: 
Tejada2002LearningObjectIdentification.pdf (447.82 KB)

When integrating information from multiple websites, the same data objects can
exist in inconsistent text formats across sites, making it difficult to identify
matching objects using exact text match. We have developed an object
identification system called Active Atlas, which compares the objects' shared
attributes in order to identify matching objects. Certain attributes are more
important for deciding if a mapping should exist between two objects. Previous
methods of object identification have required manual construction of object
identification rules or mapping rules for determining the mappings between
objects, as well as domain-dependent transformations for recognizing format
inconsistencies. This manual process is time consuming and error-prone. In our
approach, Active Atlas learns to simultaneously tailor both mapping rules and a
set of general transformations to a specific application domain, through
limited user input. The experimental results demonstrate that we achieve higher
accuracy and require less user involvement than previous methods across various
application domains.
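
The idea of comparing shared attributes, with some attributes counting more than others, can be illustrated with a minimal sketch. This is not the Active Atlas implementation: the function names, the single normalization step standing in for a "general transformation", the Jaccard token overlap, the hand-set weights, and the threshold are all assumptions made for illustration. In the thesis, the mapping rules and transformation weights are learned from limited user input rather than fixed by hand.

```python
import re


def normalize(text):
    """Hypothetical general transformation: lowercase, drop punctuation, tokenize."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).split()


def token_similarity(a, b):
    """Jaccard overlap between the normalized token sets of two attribute values."""
    tokens_a, tokens_b = set(normalize(a)), set(normalize(b))
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def match_score(obj_a, obj_b, weights):
    """Weighted average of per-attribute similarities over the shared attributes."""
    shared = [k for k in weights if k in obj_a and k in obj_b]
    total = sum(weights[k] for k in shared)
    if total == 0:
        return 0.0
    weighted = sum(weights[k] * token_similarity(obj_a[k], obj_b[k]) for k in shared)
    return weighted / total


def is_match(obj_a, obj_b, weights, threshold=0.5):
    """Stand-in for a mapping rule: declare a match when the score clears a threshold."""
    return match_score(obj_a, obj_b, weights) >= threshold


# Illustrative records from two hypothetical sites; the weights are hand-set
# here, whereas a learning system would tune them from labeled examples.
site_a = {"name": "Art's Deli", "street": "12224 Ventura Blvd."}
site_b = {"name": "Arts Deli", "street": "12224 Ventura Blvd"}
weights = {"name": 0.6, "street": 0.4}
print(is_match(site_a, site_b, weights))
```

An exact text match would reject this pair because of the apostrophe and the trailing period, while the normalized comparison treats the values as identical. Handling abbreviations such as "Blvd" versus "Boulevard" would need further transformations, which this sketch leaves out.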