Performance and resource optimization is an important research problem in data-intensive distributed computing. We present a new batched stream processing model that captures query correlations to expose I/O and computation redundancies for optimization. The model is inspired by our empirical study of a trace from a production large-scale data processing cluster, which reveals significant redundancies caused by strong temporal and spatial correlations among queries.
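To make the I/O redundancy concrete, here is a toy sketch. It is not Comet's actual design. It assumes queries are simple per-record aggregations over named input segments; the `Query` tuple, the segment names, and the functions `run_naive` and `run_batched` are hypothetical. When several queries read the same segment (spatial correlation) and fall into the same batch (temporal correlation), grouping them lets one scan serve all of them instead of one scan per query.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical query: (name, input segment it reads, per-record value function).
Query = Tuple[str, str, Callable[[int], int]]


def run_naive(queries: List[Query], segments: Dict[str, List[int]]) -> Tuple[Dict[str, int], int]:
    """Run each query independently; every query re-reads its segment."""
    results: Dict[str, int] = {}
    scans = 0
    for name, seg, fn in queries:
        scans += 1  # one full scan per query, even if another query read the same segment
        results[name] = sum(fn(r) for r in segments[seg])
    return results, scans


def run_batched(queries: List[Query], segments: Dict[str, List[int]]) -> Tuple[Dict[str, int], int]:
    """Group a batch of queries by the segment they read and serve each group with one scan."""
    by_segment: Dict[str, List[Query]] = defaultdict(list)
    for q in queries:
        by_segment[q[1]].append(q)
    results: Dict[str, int] = {}
    scans = 0
    for seg, group in by_segment.items():
        scans += 1  # a single pass over the segment is shared by every query in the group
        totals = {name: 0 for name, _, _ in group}
        for r in segments[seg]:
            for name, _, fn in group:
                totals[name] += fn(r)
        results.update(totals)
    return results, scans


if __name__ == "__main__":
    segments = {"day1": [1, 2, 3, 4], "day2": [5, 6]}
    queries: List[Query] = [
        ("count_day1", "day1", lambda r: 1),
        ("sum_day1", "day1", lambda r: r),
        ("sum_day2", "day2", lambda r: r),
    ]
    print(run_naive(queries, segments))    # 3 scans
    print(run_batched(queries, segments))  # 2 scans, same results
```

Both functions return identical results; the batched version reads `day1` once instead of twice. A real system would also share computation (for example, common filters) across queries in a group, which this sketch does not attempt.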
We have developed Comet, a query processing
system that embraces the batched stream processing