Data Cleaning for Decision Support

Authors: 
Benedikt, M.; Bohannon, P.; Bruns, G.
Year: 
2006
Venue: 
Clean DB, 2006
URL: 
http://pike.psu.edu/cleandb06/papers/CameraReady_119.pdf
Citations: 
6
Citations range: 
1 - 9

Data cleaning may involve the acquisition, at
some effort or expense, of high-quality data.
Such data can serve not only to correct individual
errors, but also to improve the reliability
model for data sources. However, there
has been little research into this latter role for
acquired data. In this short paper we define
a new data cleaning model that allows a user
to estimate the value of further data acquisition
in the face of specific business decisions.
As data is acquired, the reliability model of
sources is updated using Bayesian techniques,
thus aiding the user both in developing reasonable
probability models for uncertain data
and in improving the quality of that data. Although
we do not deal here with the problem
of finding optimal methods for utilizing external
data sources, we do show how our formalization
reduces cleaning to a well-studied
optimization problem.
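
As a rough illustration of the Bayesian updating the abstract mentions, the sketch below treats a source's reliability as a Beta-distributed probability that a reported value is correct, and updates it as acquired high-quality data confirms or contradicts the source. The Beta-Bernoulli model, the uniform prior, the class name SourceReliability, and the sample counts are all assumptions made for this example; the abstract does not say which reliability model the paper uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceReliability:
    """Beta(alpha, beta) belief over the probability that a source reports a correct value.

    Hypothetical model for illustration only; not taken from the paper.
    """
    alpha: float = 1.0  # prior pseudo-count of correct reports (uniform prior assumed)
    beta: float = 1.0   # prior pseudo-count of incorrect reports

    def update(self, correct: int, incorrect: int) -> "SourceReliability":
        """Return the posterior after checking source values against acquired data."""
        return SourceReliability(self.alpha + correct, self.beta + incorrect)

    @property
    def mean(self) -> float:
        """Posterior mean reliability of the source."""
        return self.alpha / (self.alpha + self.beta)


# Made-up check: 8 of 10 sampled records agreed with the acquired high-quality data.
prior = SourceReliability()
posterior = prior.update(correct=8, incorrect=2)
print(round(posterior.mean, 3))  # (1 + 8) / (2 + 10) = 0.75
```

Each batch of acquired data shifts the estimate, so the same acquisition that corrects individual errors also refines the belief about how far the source can be trusted.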