Venue:
Proc. International Conf. on Very Large Data Bases (VLDB)
URL:
http://www.vldb.org/conf/2001/P381.pdf
Abstract:
Cleaning data of errors in structure and content is important
for data warehousing and integration. Current
solutions for data cleaning involve many iterations of
data “auditing” to find errors, and long-running transformations
to fix them. Users need to endure long
waits, and often write complex transformation scripts.
We present Potter’s Wheel, an interactive data cleaning
system that tightly integrates transformation and
discrepancy detection. Users gradually build transformations
to clean the data by adding or undoing
transforms on a spreadsheet-like interface; the effect
of a transform is shown at once on records visible on
screen. These transforms are specified either through
simple graphical operations, or by showing the desired
effects on example data values. In the background,
Potter’s Wheel automatically infers structures
for data values in terms of user-defined domains, and
accordingly checks for constraint violations. Thus
users can gradually build a transformation as discrepancies
are found, and clean the data without writing
complex programs or enduring long delays.
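
The workflow described above (adding or undoing transforms, seeing their effect only on visible records, and flagging values that do not fit an inferred structure) can be illustrated with a minimal sketch. This is not the Potter's Wheel implementation: the names CleaningSession, split_column, DOMAINS, infer_domain, and discrepancies are hypothetical, the two domains are invented for the example, and structure inference is approximated here by a simple majority vote chosen for brevity.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]
Transform = Callable[[Row], Row]


@dataclass
class CleaningSession:
    """Hypothetical session: source rows stay untouched, transforms are a stack."""
    rows: List[Row]
    transforms: List[Transform] = field(default_factory=list)

    def add(self, transform: Transform) -> None:
        self.transforms.append(transform)

    def undo(self) -> None:
        # Undo is cheap because the source rows were never modified.
        if self.transforms:
            self.transforms.pop()

    def view(self, start: int = 0, count: int = 10) -> List[Row]:
        # Apply the transform stack only to the rows currently "on screen".
        visible = []
        for row in self.rows[start:start + count]:
            current = dict(row)
            for transform in self.transforms:
                current = transform(current)
            visible.append(current)
        return visible


def split_column(column: str, sep: str, left: str, right: str) -> Transform:
    """Build a transform that splits one column into two at the first separator."""
    def apply(row: Row) -> Row:
        new_row = dict(row)
        value = new_row.pop(column, "")
        first, _, rest = value.partition(sep)
        new_row[left] = first.strip()
        new_row[right] = rest.strip()
        return new_row
    return apply


# Assumed stand-ins for user-defined domains: each is just a regular expression.
DOMAINS = {
    "integer": re.compile(r"\d+"),
    "word": re.compile(r"[A-Za-z]+"),
}


def infer_domain(values: List[str]) -> Optional[str]:
    """Pick the domain matching the most values (majority vote, an assumption)."""
    best, best_count = None, 0
    for name, pattern in DOMAINS.items():
        matched = sum(1 for value in values if pattern.fullmatch(value))
        if matched > best_count:
            best, best_count = name, matched
    return best


def discrepancies(values: List[str]) -> List[str]:
    """Return the values that violate the inferred domain."""
    domain = infer_domain(values)
    if domain is None:
        return []
    return [value for value in values if not DOMAINS[domain].fullmatch(value)]
```

In this sketch, calling add(split_column("name", ",", "last", "first")) on a session changes what view() returns without touching the stored rows, and undo() restores the previous view. Passing a column's values to discrepancies() returns the entries that fail the inferred domain, which is the kind of feedback that would prompt the next transform.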