centralized (n=1)

Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping

Authors: Bilenko, Mikhail; Basu, Sugato; Sahami, Mehran
Year: 2005
Venue: Fifth IEEE International Conference on Data Mining (ICDM'05)

The problem of record linkage focuses on determining whether two object descriptions refer to the same underlying entity. Addressing this problem effectively has many practical applications, e.g., elimination of duplicate records in databases and citation matching for scholarly articles. In this paper, we consider a new domain where the record linkage problem is manifested: Internet comparison shopping. We address the resulting linkage setting that requires learning a similarity function between record pairs from streaming data.

Managing the Quality of Person Names in DBLP

Authors: Reuther, P; Walter, B; Ley, M; Weber, A; Klink, S
Year: 2006
Venue: Proc. ECDL, LNCS

Quality management is an important task, and not only for digital libraries, in which many dimensions and aspects have to be considered. This paper gives a short overview of DBLP and discusses its underlying data acquisition and maintenance process from a quality point of view. It closes with a new approach to identifying erroneous person names.

XML Duplicate Detection Using Sorted Neighborhoods

Authors: Puhlmann, Sven; Weis, Melanie; Naumann, Felix
Year: 2006
Venue: Conference on Extending Database Technology (EDBT) 2006

Detecting duplicates is a problem with a long tradition in many domains, such as customer relationship management and data warehousing. The problem is twofold: First define a suitable similarity measure, and second efficiently apply the measure to all pairs of objects. With the advent and pervasion of the XML data model, it is necessary to find new similarity measures and to develop efficient methods to detect duplicate elements in nested XML data.
