cse.psu.edu

CiteSeerX: an Architecture and Web Service Design for an Academic Document Search Engine

Tue, 02/06/2007 - 17:04 — Anonymous

Authors:

Li, Huajing; Councill, Isaac; Lee, Wang-Chien; Giles, C. Lee

Year:

2006

Venue:

15th International World Wide Web Conference (WWW 2006), poster, 2006

CiteSeer is a scientific literature digital library and search engine that automatically crawls and indexes scientific documents in the field of computer and information science. After serving as a public search engine for nearly ten years, CiteSeer is starting to have scaling problems in handling more documents, adding new features, and serving more users. Its monolithic architecture prevents it from effectively making use of new web technologies and providing new services. After analyzing the current system's problems, we propose a new architecture and data model, CiteSeerX.

Learning metadata from the evidence in an on-line citation matching scheme

Tue, 02/06/2007 - 17:03 — Anonymous

Authors:

Councill, Isaac G.; Li, Huajing; Zhuang, Ziming; Debnath, Sandip; Bolelli, Levent; Lee, Wang-Chien; Sivasubramaniam, Anand; Giles, C. Lee

Year:

2006

Venue:

Joint Conference on Digital Libraries 2006 (JCDL 2006): 276-285, 2006

Citation matching, or the automatic grouping of bibliographic
references that refer to the same document, is a data management
problem faced by automatic digital libraries for scientific
literature such as CiteSeer and Google Scholar. Although several
solutions have been offered for citation matching in large
bibliographic databases, these solutions typically require
expensive batch clustering operations that must be run offline.
Large digital libraries containing citation information can reduce
maintenance costs and provide new services through efficient

Clustering Scientific Literature Using Sparse Citation Graph Analysis

Tue, 02/06/2007 - 17:03 — Anonymous

Authors:

Bolelli, Levent; Ertekin, Seyda; Giles, C. Lee

Year:

2006

Venue:

10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2006): 30-41, 2006

Group Linkage

Tue, 02/06/2007 - 16:16 — Anonymous

Authors:

On, Byung-Won; Koudas, Nick; Lee, Dongwon; Srivastava, Divesh

Year:

2007

Venue:

ICDE

Poor-quality data is prevalent in databases for a variety
of reasons, including transcription errors and a lack of standards
for recording database fields. To be able to query
and integrate such data, considerable recent work has focused
on the record linkage problem, i.e., determining whether two
entities represented as relational records are approximately
the same. Often entities are represented as groups of relational
records, rather than individual relational records;
e.g., households in a census survey consist of a group of persons.

