Tuning Schema Matching Software using Synthetic Scenarios

Sayyadian, Mayssam; Lee, Yoonkyong; Doan, AnHai; Rosenthal, Arnon
VLDB 2005: 994-1005
Most recent schema matching systems assemble multiple components, each employing a
particular matching technique. The domain
user must then tune the system: select the
right components to be executed and correctly
adjust their numerous "knobs" (e.g., thresholds, formula coefficients). Tuning is skill- and
time-intensive, but (as we show) without it the
matching accuracy is significantly inferior.
We describe eTuner, an approach to automatically tune schema matching systems. Given
a schema S, we match S against synthetic
schemas, for which the ground truth mapping
is known, and find a tuning that demonstrably improves the performance of matching S
against real schemas. To efficiently search the
huge space of tuning configurations, eTuner
works sequentially, starting with tuning the
lowest level components. To increase the applicability of eTuner, we develop methods to
tune a broad range of matching components.
While the tuning process is completely automatic, eTuner can also exploit user assistance
(whenever available) to further improve the
tuning quality. We employed eTuner to tune
four recently developed matching systems on
several real-world domains. eTuner produced
tuned matching systems that achieve higher
accuracy than the same systems tuned with
currently possible methods, at virtually
no cost to the domain user.
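
The staged tuning loop described above can be pictured with a minimal Python sketch. It is not eTuner's implementation: the toy q-gram matcher, the name-perturbation rules, the knob names (`q`, `threshold`), and the use of F1 as the accuracy measure are all assumptions introduced for illustration. The sketch only shows the general shape of the idea: derive synthetic schemas from S so that the correct mapping is known by construction, then tune one knob at a time, lowest-level component first.

```python
import random

def perturb(name, rng):
    """Return a synthetically altered copy of an attribute name (hypothetical rules)."""
    rule = rng.choice(["drop_vowels", "truncate", "prefix"])
    if rule == "drop_vowels":
        return name[0] + "".join(c for c in name[1:] if c not in "aeiou")
    if rule == "truncate":
        return name[: max(3, len(name) // 2)]
    return "x_" + name

def synthesize(schema, rng):
    """Build one synthetic schema plus its ground-truth mapping, known by construction."""
    pairs = [(attr, perturb(attr, rng)) for attr in schema]
    rng.shuffle(pairs)
    return [synthetic for _, synthetic in pairs], set(pairs)

def qgrams(s, q):
    s = s.lower()
    return {s[i:i + q] for i in range(max(1, len(s) - q + 1))}

def similarity(a, b, q):
    """Low-level component: Jaccard similarity over q-grams (knob: q)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb)

def match(source, target, knobs):
    """Higher-level component: keep each attribute's best candidate if it clears the threshold."""
    predicted = set()
    for a in source:
        best = max(target, key=lambda b: similarity(a, b, knobs["q"]))
        if similarity(a, best, knobs["q"]) >= knobs["threshold"]:
            predicted.add((a, best))
    return predicted

def f1(predicted, truth):
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(truth)
    return 2 * precision * recall / (precision + recall)

def score(knobs, schema, workload):
    """Average F1 of the matcher over all synthetic schemas."""
    return sum(f1(match(schema, t, knobs), truth) for t, truth in workload) / len(workload)

def tune(schema, workload, stages, defaults):
    """Tune knobs one stage at a time; earlier (lower-level) choices stay fixed afterwards."""
    knobs = dict(defaults)
    for name, candidates in stages:
        knobs[name] = max(candidates,
                          key=lambda v: score({**knobs, name: v}, schema, workload))
    return knobs, score(knobs, schema, workload)

# Hypothetical schema S and knob ranges, chosen only for the example.
schema = ["customer_name", "street_address", "postal_code", "phone_number", "order_total"]
rng = random.Random(0)
workload = [synthesize(schema, rng) for _ in range(20)]
defaults = {"q": 3, "threshold": 0.7}
stages = [("q", [1, 2, 3]), ("threshold", [0.1, 0.3, 0.5, 0.7])]

tuned, tuned_score = tune(schema, workload, stages, defaults)
print("default:", defaults, round(score(defaults, schema, workload), 3))
print("tuned:  ", tuned, round(tuned_score, 3))
```

Because each stage includes the default value among its candidates, the tuned configuration in this sketch can never score below the defaults on the synthetic workload. Tuning stage by stage evaluates the sum of the candidate counts (3 + 4 here) rather than their product (3 x 4), which is the reason a sequential search scales to large configuration spaces, at the price of possibly missing knob interactions.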