Data Cleaning: Problems and Current Approaches

Authors: 
Rahm, Erhard; Do, Hong Hai
Year: 
2000
Venue: 
IEEE Data Engineering Bulletin
URL: 
http://www.acm.org/sigs/sigmod/disc/disc01/out/websites/deb_december/rahm.pdf
Citations: 
778
Citations range: 
500 - 999
Attachment: 
Rahm2000DataCleaningProblemsand.pdf
Size: 
109.63 KB

We classify data quality problems that are addressed by data cleaning and provide an overview of the main
solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and
should be addressed together with schema-related data transformations. In data warehouses, data cleaning is
a major part of the so-called ETL process. We also discuss current tool support for data cleaning.