Today, growing datasets require new technologies, because standard
technologies such as parallel DBMSs do not easily scale to that
level. On the one hand, there is the MapReduce paradigm, which allows
non-expert users to easily define large distributed jobs. On the
other hand, there is Cloud Computing, which provides a pay-as-you-go
infrastructure for such computations. This PhD project aims at
improving the combination of both technologies, especially for the
following issues: (i) predictability of performance, (ii) runtime
optimization and (iii) Cloud-aware scheduling. These issues can
result in significant runtime overhead or non-optimal use of
computing resources, which in a Cloud setting directly correlates to high
monetary cost. We present preliminary results that confirm a
significant improvement in performance when addressing some of these
issues. Further, we discuss research challenges and initial ideas for
the above-mentioned issues.
Attachment | Size |
---|---|
Schad2010FlyingYellowElephantPredictableandEfficientMapReducein.pdf | 473.57 KB |