We consider the coverage testing problem: given a document and a corpus that exposes only a limited query interface, determine whether the corpus contains a near-duplicate of the document. This problem arises in competitive coverage testing for search engines. To solve it, we propose approaches that work in three main steps: (1) generate a query signature from the document, (2) query the corpus with the signature and scrape the returned results, and (3) validate the similarity between the input document and the returned results.
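The three steps can be illustrated with a minimal sketch. It is not the authors' method: the signature heuristic (longest distinct words as a stand-in for rare terms), the word-shingle Jaccard similarity, the 0.8 threshold, and every name below (`make_signature`, `has_near_duplicate`, `toy_search`) are assumptions made for illustration only. The `search` argument stands in for the corpus's limited query interface.

```python
import re
from typing import Callable, Iterable, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word shingles (contiguous word windows)."""
    words = tokenize(text)
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_signature(document: str, n_terms: int = 3) -> List[str]:
    """Step 1 (assumed heuristic): pick the longest distinct words."""
    distinct = set(tokenize(document))
    return sorted(distinct, key=lambda w: (-len(w), w))[:n_terms]


def has_near_duplicate(
    document: str,
    search: Callable[[List[str]], Iterable[str]],
    n_terms: int = 3,
    threshold: float = 0.8,
) -> bool:
    """Run the three steps: signature, query, similarity validation."""
    signature = make_signature(document, n_terms)
    doc_shingles = shingles(document)
    # Step 2: query the corpus through its (hypothetical) interface.
    for result in search(signature):
        # Step 3: validate each returned result against the input document.
        if jaccard(doc_shingles, shingles(result)) >= threshold:
            return True
    return False


def toy_search(corpus: List[str]) -> Callable[[List[str]], List[str]]:
    """Hypothetical conjunctive keyword search over an in-memory corpus."""
    def search(terms: List[str]) -> List[str]:
        return [d for d in corpus if all(t in set(tokenize(d)) for t in terms)]
    return search


corpus = [
    "The quick brown fox jumps over the lazy dog near the riverbank.",
    "Completely unrelated text about database indexing strategies.",
]
search = toy_search(corpus)
found = has_near_duplicate(
    "The quick brown fox jumps over the lazy dog near the riverbank today.",
    search,
)
missing = has_near_duplicate(
    "An entirely different essay concerning medieval architecture.",
    search,
)
```

In this toy setup the first document differs from a corpus entry by one trailing word, so it passes the similarity check, while the second document's signature matches nothing. A real corpus interface would return result pages that need scraping before the validation step.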
MashMaker: Mashups for the Masses
Rob Ennals
Intel Research Berkeley
Minos Garofalakis minos@yahoo-inc.com
Yahoo Research
Categories and Subject Descriptors: H.4.3 [Information Systems Applications]: Information Browsers
General Terms: Management, Design, Human Factors, Languages
Keywords: Mashup, web, end-users
1. INTRODUCTION
MashMaker is an interactive tool for editing, querying, manipulating, and visualizing "live" semi-structured data.