Unsupervised Personal Name Disambiguation

Authors: 
Mann, GS; Yarowsky, D
Year: 
2003
Venue: 
Proc. 7th Conf. on Natural Language Learning
URL: 
http://portal.acm.org/citation.cfm?id=1119181
Citations: 
283
Citations range: 
100 - 499
Attachment: 
Mann2003UnsupervisedPersonalName.pdf
Size: 
121.99 KB

This paper presents a set of algorithms for distinguishing personal names with multiple real referents in text, based on little or no supervision. The approach applies unsupervised clustering over a rich feature space of biographic facts, which are automatically extracted via a language-independent bootstrapping process. The induced clusters of named entities are then partitioned and linked to their real referents via the automatically extracted biographic data. Performance is evaluated both on a test set of hand-labeled multi-referent personal names and on automatically generated pseudonames.
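
The pseudoname evaluation and the clustering step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a pseudoname is built by replacing two distinct real names with one shared token, so the true referent of every mention is known. Plain bag-of-words vectors stand in for the extracted biographic features, and average-link agglomerative clustering with cosine similarity stands in for the clustering technique. The names `make_pseudoname`, `features`, `agglomerate`, `two_way_accuracy`, the `STOPWORDS` list, and the four example sentences are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical stop list; a real system would use richer biographic features.
STOPWORDS = {"is", "a", "the", "in", "of", "was"}
PSEUDO = "PSEUDONAME"


def make_pseudoname(docs_a, docs_b, name_a, name_b, pseudo=PSEUDO):
    """Conflate two real names into one token, keeping the true label (0 or 1)."""
    mixed = [(text.replace(name_a, pseudo), 0) for text in docs_a]
    mixed += [(text.replace(name_b, pseudo), 1) for text in docs_b]
    return mixed


def features(text, pseudo=PSEUDO):
    """Bag of context words, excluding stop words and the pseudoname itself."""
    skip = STOPWORDS | {pseudo.lower()}
    return Counter(w for w in text.lower().split() if w not in skip)


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(vectors, k=2):
    """Average-link agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            sims = [cosine(vectors[i], vectors[j])
                    for i in clusters[x] for j in clusters[y]]
            score = sum(sims) / len(sims)
            if best is None or score > best[0]:
                best = (score, x, y)
        _, x, y = best
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters


def two_way_accuracy(clusters, labels):
    """Accuracy for k=2 under the better of the two cluster-to-label mappings."""
    pred = [0] * len(labels)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            pred[i] = cluster_id
    agree = sum(p == t for p, t in zip(pred, labels))
    return max(agree, len(labels) - agree) / len(labels)


docs_a = ["Alice Smith is a painter born in Boston in 1950",
          "the painter Alice Smith opened a gallery in Boston"]
docs_b = ["Bob Jones is a senator born in Texas in 1962",
          "the senator Bob Jones won reelection in Texas"]

mixed = make_pseudoname(docs_a, docs_b, "Alice Smith", "Bob Jones")
vectors = [features(text) for text, _ in mixed]
labels = [label for _, label in mixed]

clusters = agglomerate(vectors, k=2)
accuracy = two_way_accuracy(clusters, labels)
print(clusters, accuracy)
```

Because the labels come from the original names, no hand annotation is needed to score the clustering, which is the appeal of pseudonames as a test set. On this toy input the two painter contexts and the two senator contexts end up in separate clusters.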