An Entity Resolution Framework for Deduplicating Proteins

Fri, 02/27/2009 - 11:25 — cat

Authors:

Lochovsky, L; Topaloglou, T

Year:

2008

Venue:

Lecture Notes in Computer Science

URL:

http://www.springerlink.com/index/g8w144u643570581.pdf

Attachment	Size
Lochovsky2008AnEntityResolutionFrameworkforDeduplicatingProteins.pdf	1.13 MB

An important prerequisite to successfully integrating protein data is detecting duplicate records spread across different databases. In this paper, we describe a new framework for protein entity resolution, called PERF, which deduplicates protein mentions using a wide range of protein attributes. A mention refers to any recorded information about a protein, whether it is derived from a database, a high-throughput study, or literature text mining, among others. PERF can be easily extended to deduplicate protein-protein interactions (PPIs) as well. This framework translates mentions into instances of a reference schema to facilitate mention comparisons. PERF also uses "virtual attribute dependencies" to "enhance" mentions with additional attribute values. PERF computes a likelihood measure based upon the textual value similarity of mention attributes. A prototype implementation of the framework was tested, and these tests indicate that PERF can clearly separate duplicate mentions from non-duplicate mentions.
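The abstract's pipeline (translate mentions into a reference schema, then score pairs by the textual similarity of their attribute values) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the attribute names, the `to_reference` and `duplicate_score` helpers, the use of Python's `difflib.SequenceMatcher` as the string similarity, and the unweighted average. PERF's actual likelihood measure and its virtual attribute dependencies are not reproduced.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional

# Hypothetical reference schema; the paper's real attribute set is not given here.
REFERENCE_ATTRIBUTES = ("name", "organism", "sequence")


def to_reference(record: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename source-specific fields to reference-schema attributes.

    `mapping` maps a source field name to a reference attribute name.
    Fields without a mapping, or with empty values, are dropped.
    """
    return {mapping[k]: v for k, v in record.items() if k in mapping and v}


def attribute_similarity(a: str, b: str) -> float:
    """Case-insensitive textual similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def duplicate_score(m1: Dict[str, str], m2: Dict[str, str]) -> Optional[float]:
    """Average similarity over the attributes both mentions provide.

    Returns None when the mentions share no attribute, since nothing
    can be compared in that case.
    """
    shared = [k for k in REFERENCE_ATTRIBUTES if m1.get(k) and m2.get(k)]
    if not shared:
        return None
    total = sum(attribute_similarity(m1[k], m2[k]) for k in shared)
    return total / len(shared)


# Two mentions from differently structured (made-up) sources.
mention_a = to_reference(
    {"protein_name": "Tumor protein p53", "species": "Homo sapiens"},
    {"protein_name": "name", "species": "organism"},
)
mention_b = to_reference(
    {"desc": "tumor protein P53", "taxon": "homo sapiens"},
    {"desc": "name", "taxon": "organism"},
)
score = duplicate_score(mention_a, mention_b)
```

A pair would then be treated as a duplicate when its score exceeds a chosen threshold. The threshold, like the rest of this sketch, would need to be tuned on labeled data and is not specified by the abstract.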
