Identity uncertainty and citation matching

Wed, 04/11/2007 - 15:20 — cat

Authors:

Pasula, H; Marthi, B; Milch, B; Russell, S; Shpitser, I

Year:

2003

Venue:

Advances in Neural Information Processing Systems (NIPS)

URL:

http://people.csail.mit.edu/milch/papers/nipsnewer.pdf

Citations:

267

Attachment	Size
Pasula2003Identityuncertaintyand.pdf	97.5 KB

Identity uncertainty is a pervasive problem in real-world data analysis. It
arises whenever objects are not labeled with unique identifiers or when
those identifiers may not be perceived perfectly. In such cases, two observations
may or may not correspond to the same object. In this paper,
we consider the problem in the context of citation matching—the problem
of deciding which citations correspond to the same publication. Our
approach is based on the use of a relational probability model to define
a generative model for the domain, including models of author and title
corruption and a probabilistic citation grammar. Identity uncertainty is
handled by extending standard models to incorporate probabilities over
the possible mappings between terms in the language and objects in the
domain. Inference is based on Markov chain Monte Carlo, augmented
with specific methods for generating efficient proposals when the domain
contains many objects. Results on several citation data sets show that
the method outperforms current algorithms for citation matching. The
declarative, relational nature of the model also means that our algorithm
can determine object characteristics such as author names by combining
multiple citations of multiple papers.
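
The inference idea in the abstract can be illustrated with a small sketch. This is not the authors' relational probability model: the pairwise token-overlap score below is a toy stand-in for their generative model of author and title corruption, and the proposal is a plain single-citation move rather than their specialized proposals for large domains. The function names (`similarity`, `log_score`, `match_citations`) and the two "Doe" citation strings are hypothetical, introduced only for illustration.

```python
import math
import random
import re


def similarity(a, b):
    """Jaccard overlap of lowercase word tokens, clipped away from 0 and 1."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    jaccard = len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    return min(max(jaccard, 0.01), 0.99)


def log_score(assign, sim):
    """Toy unnormalized log-score of a citation-to-publication assignment."""
    total = 0.0
    n = len(assign)
    for i in range(n):
        for j in range(i + 1, n):
            p_same = sim[i][j]  # assumed chance that i and j co-refer
            total += math.log(p_same if assign[i] == assign[j] else 1.0 - p_same)
    return total


def match_citations(citations, steps=5000, seed=0):
    """Metropolis sampling over assignments; returns the best one seen."""
    rng = random.Random(seed)
    n = len(citations)
    sim = [[similarity(a, b) for b in citations] for a in citations]
    assign = list(range(n))  # start: every citation is its own publication
    current = log_score(assign, sim)
    best, best_score = assign[:], current
    for _ in range(steps):
        # Symmetric proposal: move one random citation to a random label.
        i, label = rng.randrange(n), rng.randrange(n)
        proposal = assign[:]
        proposal[i] = label
        proposed = log_score(proposal, sim)
        # Metropolis acceptance rule for a symmetric proposal.
        if proposed >= current or rng.random() < math.exp(proposed - current):
            assign, current = proposal, proposed
            if current > best_score:
                best, best_score = assign[:], current
    clusters = {}
    for idx, label in enumerate(best):
        clusters.setdefault(label, []).append(idx)
    return sorted(clusters.values())


# Two formatting variants of this record, plus two made-up placeholder citations.
citations = [
    "Pasula H, Marthi B, Milch B, Russell S, Shpitser I. "
    "Identity uncertainty and citation matching. NIPS 2003.",
    "H. Pasula, B. Marthi, B. Milch, S. Russell, I. Shpitser. "
    "Identity uncertainty and citation matching. In NIPS, 2003.",
    "Doe J. A survey of graph coloring heuristics. 1999.",
    "J. Doe. Survey of graph coloring heuristics, 1999",
]
print(match_citations(citations))
```

Each inner list in the result holds the indices of citations assigned to the same hypothesized publication. On this toy input the two variants of each citation should end up grouped together. A realistic system would replace `log_score` with a proper generative likelihood, as the abstract describes.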
