PPJoin+

MapDupReducer: Detecting Near Duplicates over Massive Datasets

Authors: 
Wang, Chaokun; Wang, Jianmin; Lin, Xuemin; Wang, Wei; Wang, Haixun; Li, Hongsong; Tian, Wanpeng; Xu, Jun; Li, Rui

Near-duplicate detection benefits many applications, e.g.,
online news selection over the Web by keyword search. The
purpose of this demo is to show the design and implementation
of MapDupReducer, a MapReduce-based system capable of
detecting near duplicates over massive datasets efficiently.
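To make "detecting near duplicates" concrete, here is a minimal single-process sketch. It is not MapDupReducer's implementation. It assumes whitespace tokenization, Jaccard similarity with a threshold t, and the prefix-filtering idea that PPJoin builds on. A "map" step emits each document under the rarest tokens of its prefix, and a "reduce" step verifies only the pairs that share a prefix token. PPJoin's positional filtering and PPJoin+'s suffix filtering are omitted, and the function names (`near_duplicates`, `prefix_length`) are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prefix_length(size: int, t: float) -> int:
    """Prefix length for a Jaccard threshold t: size - ceil(t * size) + 1.

    The small epsilon guards against float error pushing t * size just
    above an integer, which would shorten the prefix and miss pairs.
    """
    return size - math.ceil(t * size - 1e-9) + 1


def near_duplicates(docs: dict[str, str], t: float = 0.8) -> set[tuple[str, str]]:
    """Return pairs of document ids whose token sets have Jaccard >= t.

    Hypothetical helper: tokenization is lowercase whitespace splitting.
    """
    tokens = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}

    # Global token order: rarest tokens first, so prefixes are selective.
    freq = defaultdict(int)
    for toks in tokens.values():
        for tok in toks:
            freq[tok] += 1

    # "Map": emit (prefix token -> doc id) for every document.
    groups = defaultdict(list)
    for doc_id, toks in tokens.items():
        ordered = sorted(toks, key=lambda tok: (freq[tok], tok))
        for tok in ordered[:prefix_length(len(ordered), t)]:
            groups[tok].append(doc_id)

    # "Reduce": verify candidate pairs that share at least one prefix token.
    results = set()
    for ids in groups.values():
        for a, b in combinations(sorted(ids), 2):
            if jaccard(tokens[a], tokens[b]) >= t:
                results.add((a, b))
    return results


if __name__ == "__main__":
    docs = {
        "a": "stocks rally as markets close higher on friday",
        "b": "stocks rally as markets close higher on friday afternoon",
        "c": "local team wins championship game",
    }
    print(near_duplicates(docs, 0.8))
```

With t = 0.8, documents "a" and "b" share 8 of 9 distinct tokens (Jaccard of about 0.89), so they form the only reported pair. In a real MapReduce job the grouping by prefix token would be done by the framework's shuffle, and the same pair can be emitted by several reducers, so a deduplication step is needed; the `set` plays that role here.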

Year: 
2010