Timely and cost-effective processing of large datasets has become a critical ingredient for the success of many academic, government, and industrial organizations. The combination of MapReduce frameworks and cloud computing is an attractive proposition for these organizations. However, even to run a single program in a MapReduce framework, users or system administrators have to set a number of tuning parameters. Users often run into performance problems because they don’t know how to set these parameters, or because they don’t even know that these parameters exist. With MapReduce being a relatively new technology, qualified administrators are not easy to find. In this position paper, we make a case for techniques that automate the setting of tuning parameters for MapReduce programs. The objective is to provide good out-of-the-box performance for ad hoc MapReduce programs run on large datasets. This feature can go a long way towards improving the productivity of users who lack the skills to optimize programs themselves, whether because they are unfamiliar with MapReduce or with the data being processed.
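To make "automating the setting of tuning parameters" concrete, the sketch below shows one generic approach, a cost-based search: enumerate candidate settings, score each with a cost model, and keep the cheapest. The three parameter names are real Hadoop 0.20-era configuration keys, but the candidate values, the function names, and the cost formula are hypothetical and chosen only for illustration. This is not meant to represent the method this paper proposes; a practical optimizer would derive its cost estimates from profiling the actual program and data.

```python
from itertools import product

# Hadoop 0.20-era parameter names; the candidate values are hypothetical.
SEARCH_SPACE = {
    "mapred.reduce.tasks": [1, 4, 16, 64],
    "io.sort.mb": [100, 200],
    "mapred.compress.map.output": [False, True],
}


def estimate_cost(config, input_gb, num_nodes):
    """Hypothetical toy cost model in arbitrary units (not from the paper)."""
    reducers = config["mapred.reduce.tasks"]
    compress = config["mapred.compress.map.output"]
    # Compressing map output shrinks the data shuffled to reducers.
    shuffle_gb = input_gb * (0.4 if compress else 1.0)
    # At most num_nodes reducers run concurrently in this toy model.
    reduce_time = shuffle_gb / min(reducers, num_nodes)
    # Each reduce task adds a fixed startup overhead.
    overhead = 0.05 * reducers
    # A smaller sort buffer causes more map-side spills to disk.
    spill_penalty = input_gb * (100 / config["io.sort.mb"]) * 0.1
    # Compression costs CPU time proportional to the input size.
    compress_cpu = 0.1 * input_gb if compress else 0.0
    return reduce_time + overhead + spill_penalty + compress_cpu


def choose_config(input_gb, num_nodes):
    """Exhaustively search SEARCH_SPACE; return (best_config, best_cost)."""
    names = list(SEARCH_SPACE)
    best = None
    for values in product(*(SEARCH_SPACE[n] for n in names)):
        config = dict(zip(names, values))
        cost = estimate_cost(config, input_gb, num_nodes)
        if best is None or cost < best[1]:
            best = (config, cost)
    return best


if __name__ == "__main__":
    for nodes in (16, 4):
        config, cost = choose_config(input_gb=50, num_nodes=nodes)
        print(nodes, config, round(cost, 3))
```

Under this toy model, a 50 GB job on 16 nodes is best served by 16 reducers without map-output compression, while the same job on 4 nodes is best served by 4 reducers with compression. The good setting shifts with the environment, which illustrates why users who set parameters by hand can easily get them wrong.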
Attachment | Size |
---|---|
Babu2010TowardsAutomaticOptimizationofMapReducePrograms.pdf | 154.34 KB |