In recent years, cloud computing has emerged as a promising
new approach for ad-hoc parallel data processing. Major
cloud computing companies have started to integrate frameworks
for parallel data processing into their product portfolios,
making it easy for customers to access these services and to
deploy their programs. However, the processing frameworks
which are currently used stem from the field of cluster computing
and disregard the particular nature of a cloud. As a
result, the allocated compute resources may be inadequate
for large parts of the submitted job and unnecessarily increase
processing time and cost. In this paper, we discuss the opportunities
and challenges for efficient parallel data processing
in clouds and present our ongoing research project Nephele.
Nephele is the first data processing framework to explicitly
exploit the dynamic resource allocation offered by today’s
compute clouds for both task scheduling and execution. It
allows the individual tasks of a processing job to be assigned to
different types of virtual machines and takes care of their instantiation
and termination during job execution. Based
on this new framework, we perform evaluations on a compute
cloud system and compare the results to the existing
data processing framework Hadoop.
Attachment | Size |
---|---|
Warneke2009NepheleEfcientParallelDataProcessingintheCloud.pdf | 250.08 KB |