Venue:
Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, August, 1996
URL:
http://citeseer.ist.psu.edu/monge96field.html
To combine information from heterogeneous sources,
equivalent data in the multiple sources must be identified.
This task is the field matching problem. Specifically,
the task is to determine whether or not two syntactic
values are alternative designations of the same
semantic entity. For example, the addresses Dept. of
Comput. Sci. & Eng., University of California, San
Diego, 9500 Gilman Dr. Dept. 0111, La Jolla, CA
92093 and UCSD, Computer Science and Engineering
Department, CA 92093-0111 do designate the same
department. This paper describes three field matching
algorithms and evaluates their performance on
real-world datasets. One proposed method is the
well-known Smith-Waterman algorithm for comparing
DNA and protein sequences. Several applications of
field matching in knowledge discovery are described
briefly, including WEBFIND, a new software
tool that discovers scientific papers published on the
worldwide web. WEBFIND uses external information
sources to guide its search for authors and papers.
Like many other worldwide web tools, WEBFIND needs
to solve the field matching problem in order to navigate
between information sources.
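The abstract names the Smith-Waterman algorithm as one proposed field matching method but gives no scoring details. The following is a minimal sketch under stated assumptions. It uses a plain Smith-Waterman local alignment with a linear gap penalty. The match, mismatch, and gap scores (+2, -1, -1), the lowercasing, the normalization by the shorter field's length, and the 0.5 decision threshold are all illustrative assumptions, not the paper's parameters. The function names are hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between strings a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # skip a character of a
            left = H[i][j - 1] + gap  # skip a character of b
            # Flooring at 0 lets an alignment start anywhere; this makes it local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


def field_similarity(x: str, y: str, match: int = 2) -> float:
    """Hypothetical normalization: score divided by the best score the shorter field could reach."""
    x, y = x.lower(), y.lower()
    if not x or not y:
        return 0.0
    return smith_waterman(x, y, match=match) / (match * min(len(x), len(y)))


def fields_match(x: str, y: str, threshold: float = 0.5) -> bool:
    """Decide whether two field values designate the same entity (threshold is an assumption)."""
    return field_similarity(x, y) >= threshold


if __name__ == "__main__":
    a = "Dept. of Comput. Sci. & Eng., University of California, San Diego"
    b = "UCSD, Computer Science and Engineering Department"
    print(round(field_similarity(a, b), 3), fields_match(a, b))
```

Local alignment suits field matching because one value is often an abbreviated or reordered fragment of the other. Unmatched prefixes and suffixes simply fall outside the best-scoring local region instead of being penalized. A production matcher would likely tune the scores, use affine gap costs, or handle abbreviations explicitly.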