Comet: Batched Stream Processing in Data Intensive Distributed Computing

Keyword search

Guided search

Click a term to initiate a search.

Comet: Batched Stream Processing in Data Intensive Distributed Computing

Mon, 04/26/2010 - 09:46 — admin

Authors:

He, B; Yang, M; Guo, Z; Chen, R; Su, B; Lin, W; Zhou, L

Author:

He, B

Yang, M

Guo, Z

Chen, R

Su, B

Lin, W

Zhou, L

Performance and resource optimization is an important
research problem in data intensive distributed comput-
ing. We present a new batched stream processing model
that captures query correlations to expose I/O and com-
putation redundancies for optimizations. The model is
inspired by our empirical study on a trace from a pro-
duction large-scale data processing cluster, which reveals
signiﬁcant redundancies caused by strong temporal and
spatial correlations among queries.
We have developed Comet, a query processing
system that embraces the batched stream processing
model for optimizations. We have integrated Comet with
DryadLINQ. With its roots in query optimizations for
database systems, Comet enables a set of new heuristics
and opportunities tailored for distributed computing in
DryadLINQ. Optimizations in Comet are effective. The
evaluation of a micro-benchmark on a 40-machine clus-
ter shows a 42% reduction in total machine time and over
40% reduction in total I/O. Our simulation on a real trace
covering over 19 million machine hours shows an esti-
mated I/O saving of over 50%.

Year:

2008

Venue:

SoCC'10

URL:

http://research.microsoft.com/pubs/117830/Comet.pdf

Citations:

Citations range:

n/a

Attachment	Size
He2008CometBatchedStreamProcessinginDataIntensiveDistributed.pdf	579.16 KB

websearch

Cloud Computing publication categorizer

Keyword search

Guided search

Author

Year

Topic

Tags

mailpart

Citations range

Comet: Batched Stream Processing in Data Intensive Distributed Computing

Navigation

Related categories

User login