Problems, Methods, and Challenges in Comprehensive Data Cleansing

Müller, Heiko; Freytag, Johann-Christoph
HUB-IB-164, Humboldt University Berlin

Cleansing data from impurities is an integral part of data processing and maintenance. This has led to the development of a broad range of methods that aim to enhance the accuracy, and thereby the usability, of existing data. This paper presents a survey of data cleansing problems, approaches, and methods. We classify the various types of anomalies occurring in data that have to be eliminated, and we define a set of quality criteria that comprehensively cleansed data has to satisfy.
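To make the notions of "anomalies" and "quality criteria" more concrete, here is a minimal sketch. It assumes records are plain tuples in which `None` marks a missing value. It flags two common anomaly types (missing values and exact duplicates) and computes two simple ratio-style metrics. The names `completeness`, `uniqueness`, and `find_anomalies`, and the metric definitions themselves, are hypothetical illustrations, not the formal anomaly classification or quality criteria defined in the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record shape: a tuple of attribute values, None = missing.
Record = Tuple[Optional[str], ...]


def completeness(records: Sequence[Record]) -> float:
    """Share of attribute values that are present (not None)."""
    values = [v for r in records for v in r]
    if not values:
        return 1.0
    return sum(v is not None for v in values) / len(values)


def uniqueness(records: Sequence[Record]) -> float:
    """Share of records left after removing exact duplicates."""
    if not records:
        return 1.0
    return len(set(records)) / len(records)


def find_anomalies(records: Sequence[Record]) -> Dict[int, List[str]]:
    """Map each record index to the illustrative anomaly types it exhibits."""
    seen = set()
    report: Dict[int, List[str]] = {}
    for i, r in enumerate(records):
        issues = []
        if any(v is None for v in r):
            issues.append("missing value")
        if r in seen:  # exact duplicate of an earlier record
            issues.append("duplicate")
        seen.add(r)
        if issues:
            report[i] = issues
    return report


rows = [("Ann", "Berlin"), ("Ann", "Berlin"), ("Bob", None)]
print(completeness(rows))    # 5 of 6 values present
print(uniqueness(rows))      # 2 distinct of 3 records
print(find_anomalies(rows))  # {1: ['duplicate'], 2: ['missing value']}
```

Real cleansing work also has to handle near-duplicates, domain and constraint violations, and inconsistent formats. Exact matching is used here only to keep the sketch short.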

Quality-driven Integration of Heterogeneous Information Systems

Naumann, F; Leser, U; Freytag, J
VLDB Conference

Integrated access to information that is spread over multiple, distributed, and heterogeneous sources is an important problem in many scientific and commercial domains. While much work has been done on query processing and choosing plans under cost criteria, very little is known about the important problem of incorporating the information quality aspect into query planning.
In this paper we describe a framework for multidatabase query processing that fully includes the quality of information in many facets, such as completeness, timeliness, accuracy,
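One generic way to let several quality criteria influence plan selection is to score each candidate plan with a weighted sum of normalized criterion scores (simple additive weighting) and rank the plans by that score. The sketch below assumes per-plan scores already normalized to [0, 1] and user-supplied weights. The plan names, scores, and weights are made up, and the paper's own scoring and plan-selection method may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical input: criterion -> normalized quality score in [0, 1], per plan.
PlanScores = Dict[str, Dict[str, float]]


def rank_plans(plans: PlanScores, weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank plans by a weighted sum of quality scores (simple additive weighting)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    ranked = []
    for name, scores in plans.items():
        # A criterion with no score for this plan contributes 0.
        score = sum(w * scores.get(c, 0.0) for c, w in weights.items()) / total_weight
        ranked.append((name, score))
    # Highest overall quality first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


plans = {
    "plan_A": {"completeness": 0.9, "timeliness": 0.5, "accuracy": 0.8},
    "plan_B": {"completeness": 0.6, "timeliness": 0.9, "accuracy": 0.9},
}
weights = {"completeness": 0.5, "timeliness": 0.2, "accuracy": 0.3}
print(rank_plans(plans, weights))  # plan_A (about 0.79) ranks ahead of plan_B (about 0.75)
```

Dividing by the total weight keeps scores comparable when the weights do not sum to 1. Changing the weights, for example favoring timeliness, can reverse the ranking, which is the point of letting users state their quality preferences.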
