Data Cleansing: Beyond Integrity Analysis

Authors: Maletic, J.I.; Marcus, A.
Year: 2000
Venue: Proceedings of the Conference on Information Quality
URL: http://www.sdml.info/papers/IQ2000.pdf
Citations: 138
Citations range: 100-499
Attachment: Maletic2000DataCleansingBeyond.pdf (42.93 KB)

The paper analyzes the problem of data cleansing and of automatically identifying
potential errors in data sets. It gives an overview of the limited existing literature
on data cleansing, then reviews and presents methods for error detection that go beyond
integrity analysis. The applicable methods include statistical outlier detection,
pattern matching, clustering, and data mining techniques. Some brief results supporting
the use of these methods are given. Finally, the future research directions needed to
address the data cleansing problem are discussed.
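As an illustration of the first method listed, statistical outlier detection, the sketch below flags numeric values that lie more than a chosen number of standard deviations from the field mean (a plain z-score test). This is a generic example, not necessarily the exact procedure used in the paper; the function name `zscore_outliers`, the default threshold, and the sample `ages` data are hypothetical.

```python
import statistics
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return indices of values whose distance from the mean exceeds
    `threshold` sample standard deviations (a simple z-score test).
    Hypothetical helper for illustration only."""
    if len(values) < 2:
        return []  # sample standard deviation is undefined for fewer than two values
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical "age" field containing one likely data-entry error (390).
ages = [34, 29, 41, 38, 33, 27, 45, 31, 36, 390]
print(zscore_outliers(ages, threshold=2.0))  # → [9]
```

Note that the extreme value itself inflates the standard deviation: with only ten records, no point can exceed about 2.85 sample standard deviations, so the common threshold of 3 would flag nothing here. Flagged values are candidate errors for review, not confirmed errors.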