Data cleaning in Microsoft SQL Server 2005

Wed, 09/13/2006 - 15:11 — cat

Authors:

Chaudhuri, S.; Ganjam, K.; Ganti, V.; Kapoor, R.; Narasayya, V.; Vassilakis, T.

Year:

2005

Venue:

Proc. ACM SIGMOD 2005 (Demo)

URL:

http://portal.acm.org/citation.cfm?id=1066287

Citations range:

10 - 49

Attachment	Size
Chaudhuri2005DatacleaninginmicrosoftSQL.pdf	320.31 KB

When collecting and combining data from various sources into a data warehouse, ensuring high data quality and consistency becomes a significant, often expensive, challenge. Common data quality problems include inconsistent data conventions among sources, such as different abbreviations or synonyms; data entry errors, such as spelling mistakes; and missing, incomplete, outdated, or otherwise incorrect attribute values. These data defects generally manifest themselves as foreign-key mismatches and approximately duplicate records, both of which make further data mining and decision support analyses either impossible or suspect. We demonstrate two new data cleansing operators, Fuzzy Lookup and Fuzzy Grouping, which address these problems in a scalable and domain-independent manner. These operators are implemented within Microsoft SQL Server 2005 Integration Services. Our demo will explain their functionality and highlight multiple real-world scenarios in which they can be used to achieve high data quality.
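To make the two problems named in the abstract concrete, here is a minimal Python sketch of the general idea: matching a dirty value against a clean reference list (the foreign-key mismatch case) and clustering near-duplicate records. It is not the algorithm used by the Fuzzy Lookup and Fuzzy Grouping operators in SQL Server Integration Services, which the abstract does not describe. The function names, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1], using difflib's ratio (an assumed stand-in metric)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def fuzzy_lookup(value, reference, threshold=0.8):
    """Return the closest entry in the reference list, or None if nothing clears the threshold."""
    best = max(reference, key=lambda ref: similarity(value, ref), default=None)
    if best is not None and similarity(value, best) >= threshold:
        return best
    return None


def fuzzy_group(records, threshold=0.8):
    """Greedy clustering: each record joins the first group whose representative is similar enough."""
    groups = []
    for record in records:
        for group in groups:
            # group[0] acts as the representative of the group (hypothetical choice).
            if similarity(record, group[0]) >= threshold:
                group.append(record)
                break
        else:
            groups.append([record])
    return groups
```

With these definitions, a misspelled value such as "Microsft Corp" resolves to "Microsoft Corp" when that entry is in the reference list, and "Jon Smith" and "John Smith" fall into the same group. This pairwise comparison is quadratic in the worst case, so it illustrates the matching problem only, not the scalability the abstract claims for the actual operators.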

microsoft.com

websearch
