Ad-hoc data processing has proven to be a critical paradigm
for Internet companies processing large volumes of unstructured
data. However, the emergence of cloud-based computing, where
storage and CPU are outsourced to multiple third parties across
the globe, implies large collections of highly distributed and
continuously evolving data. Our demonstration combines the power
and simplicity of the MapReduce abstraction with a wide-scale
distributed stream processor, Mortar. While our incremental
MapReduce operators avoid data re-processing, the stream
processor manages the placement and physical data flow of the
operators across the wide area. We demonstrate a distributed web
indexing engine against which users can submit and deploy
continuous MapReduce jobs. A visualization component illustrates
both the incremental indexing and index searches in real time.
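
The idea of an incremental MapReduce operator that avoids data re-processing can be illustrated with a small single-process sketch. It is not Mortar's API or the demonstrated system: the names `map_page`, `IncrementalIndex`, `update`, and `search` are hypothetical, and the whitespace tokenizer is an assumption made for brevity. The sketch maintains an inverted index and, when a page changes, applies only the difference between the page's old and new term sets instead of rebuilding the index from all pages.

```python
from collections import defaultdict


def map_page(url, text):
    """Map step (hypothetical): emit one (term, url) pair per distinct term."""
    return [(term, url) for term in set(text.lower().split())]


class IncrementalIndex:
    """Inverted index whose reduce state is updated in place per page change."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of urls containing it
        self.page_terms = {}              # url -> terms seen in its last version

    def update(self, url, text):
        """Apply a new or changed page by touching only the affected terms."""
        new_terms = {term for term, _ in map_page(url, text)}
        old_terms = self.page_terms.get(url, set())

        # Retract postings for terms that disappeared from the page.
        for term in old_terms - new_terms:
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]

        # Add postings for terms that are new to the page.
        for term in new_terms - old_terms:
            self.postings[term].add(url)

        self.page_terms[url] = new_terms

    def search(self, term):
        """Return the sorted list of urls whose latest version contains term."""
        return sorted(self.postings.get(term.lower(), set()))


index = IncrementalIndex()
index.update("a.html", "cloud data processing")
index.update("b.html", "stream processing")
index.update("a.html", "cloud stream")  # re-crawl: only the changed terms are touched
print(index.search("stream"))
```

In this sketch the cost of an update depends on the size of the changed page rather than on the size of the collection, which is the property the incremental operators are meant to provide. Placement of operators across sites, which the abstract assigns to the stream processor, is not modeled here.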
Attachment | Size |
---|---|
Logothetis2008Adhocdataprocessinginthecloud.pdf | 571.88 KB |