Methods for linking and mining massive heterogeneous databases

Authors: 
Pinheiro, J.C.; Sun, D.X.
Year: 
1998
Venue: 
Fourth International Conference on Knowledge Discovery and Data Mining, 1998
URL: 
http://cm.bell-labs.com/stat/dxsun/papers/pdf/kdd98.pdf
Citations: 
33
Citations range: 
10 - 49
Attachment: 
Pinheiro1998Methodsforlinkingandmining.pdf (143.98 KB)

Many real-world KDD expeditions involve investigation of relationships between variables
in different, heterogeneous databases. We present a dynamic programming technique for
linking records in multiple heterogeneous databases using loosely defined fields that
allow free-style verbatim entries. We develop an interestingness measure based on
non-parametric randomization tests, which can be used for mining potentially useful
relationships among variables. This measure uses distributional characteristics of
historical events, hence accommodating variable-length records in a natural way. As an
illustration, we include a successful application of the proposed methodology to a
real-world data mining problem at Lucent Technologies.
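
The abstract names two ingredients without giving details: dynamic programming for linking free-text fields, and a randomization test as an interestingness measure. The sketch below illustrates both ideas with generic stand-ins, not the authors' algorithms. It assumes plain Levenshtein edit distance as the dynamic program and a one-sided permutation test on a difference of means as the randomization test. The names `edit_distance`, `link_score`, and `permutation_p_value` are hypothetical and chosen only for this example.

```python
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming (assumed stand-in)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def link_score(a: str, b: str) -> float:
    """Similarity in [0, 1] after lowercasing and collapsing whitespace."""
    a = " ".join(a.lower().split())
    b = " ".join(b.lower().split())
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """One-sided randomization test: is mean(x) larger than mean(y)?"""
    rng = random.Random(seed)
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        px, py = pooled[:len(x)], pooled[len(x):]
        if sum(px) / len(px) - sum(py) / len(py) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

In this reading, two records would be treated as candidate links when `link_score` on their free-text fields exceeds a chosen threshold, and a small `permutation_p_value` would flag a relationship as potentially interesting. Both the threshold and the test statistic are assumptions of the sketch.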