Duplicate Detection in Biological Data using Association Rule Mining

Wed, 03/21/2007 - 17:32 — cat

Authors:

Koh, JLY; Lee, ML; Khan, AM; Tan, PTJ; Brusic, V

Year:

2004

Venue:

Proc. ECML/PKDD Workshop on Data Mining and Text Mining for Bioinformatics

URL:

http://informatik.hu-berlin.de/Forschung_Lehre/wm/ws04/5.pdf

Citations range:

10 - 49

Attachment	Size
Koh2004DuplicateDetectionin.pdf	445.64 KB

Recent advances in biotechnology have produced a massive
amount of raw biological data, which is accumulating at an
exponential rate. Errors, redundancy and discrepancies are
prevalent in the raw data, and there is a serious need for
systematic approaches to biological data cleaning. This
work examines the extent of redundancy in biological data and
proposes a method for detecting duplicates in biological data.
Duplicate relations in a real-world biological dataset are modeled
as association rules so that these duplicate relations, or
rules, can be induced from data with known duplicates using
association rule mining. Our approach of using association rule
induction to find duplicate relations is new. Evaluation of our
method on a real-world dataset shows that our duplicate
association rules can accurately identify up to 96.8% of the
duplicates in the dataset, with 0.3% false positives
and 0.0038% false negatives.
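
The idea of inducing duplicate rules from data with known duplicates can be illustrated with a minimal sketch. This is not the authors' implementation: the attribute names, the toy record pairs, the thresholds, and the function mine_duplicate_rules are all hypothetical, and a brute-force enumeration stands in for a real association rule miner. Each labeled pair is reduced to the set of attributes on which the two records agree, and the sketch keeps rules of the form "these attributes match, so the pair is a duplicate" that meet a minimum support and confidence.

```python
from itertools import combinations

# Hypothetical labeled record pairs (not from the paper): each pair is
# described by the set of attributes on which the two records agree,
# plus a flag saying whether the pair is a known duplicate.
labeled_pairs = [
    ({"sequence", "length", "species"}, True),
    ({"sequence", "length"}, True),
    ({"sequence", "length", "species"}, True),
    ({"length", "species"}, False),
    ({"species"}, False),
    ({"length"}, False),
]


def mine_duplicate_rules(pairs, min_support=0.3, min_confidence=0.9, max_size=2):
    """Return (antecedent, support, confidence) for rules of the form
    'these attributes match => the pair is a duplicate'."""
    n = len(pairs)
    attributes = sorted(set().union(*(matched for matched, _ in pairs)))
    rules = []
    for size in range(1, max_size + 1):
        for antecedent in combinations(attributes, size):
            items = set(antecedent)
            # Duplicate flags of all pairs that satisfy the antecedent.
            covered = [is_dup for matched, is_dup in pairs if items <= matched]
            if not covered:
                continue
            hits = sum(covered)  # pairs that satisfy the antecedent AND are duplicates
            support = hits / n
            confidence = hits / len(covered)
            if support >= min_support and confidence >= min_confidence:
                rules.append((antecedent, support, confidence))
    return rules


rules = mine_duplicate_rules(labeled_pairs)
for antecedent, support, confidence in rules:
    print(f"{' & '.join(antecedent)} match => duplicate "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

On this toy data only antecedents that include a matching sequence survive the confidence threshold. On a real dataset the exhaustive enumeration would be replaced by a frequent-itemset algorithm such as Apriori, and the rules would be checked against held-out pairs to estimate false positive and false negative rates.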
