Duplicate Detection in Biological Data using Association Rule Mining

Authors: 
Koh, JLY; Lee, ML; Khan, AM; Tan, PTJ; Brusic, V
Year: 
2004
Venue: 
Proc. ECML/PKDD Workshop on Data Mining and Text Mining for Bioinformatics

Recent advances in biotechnology have produced a massive
amount of raw biological data, which is accumulating at an
exponential rate. Errors, redundancy and discrepancies are
prevalent in the raw data, and there is a serious need for
systematic approaches towards biological data cleaning. This
work examines the extent of redundancy in biological data and
proposes a method for detecting duplicates in biological data.
Duplicate relations in a real-world biological dataset are modeled
into forms of association rules so that these duplicate relations or