Abstract
MapReduce is emerging as an important programming
model for large-scale data-parallel applications such as
web indexing, data mining, and scientific simulation.
Hadoop is an open-source implementation of MapRe-
duce enjoying wide adoption and is often used for short
jobs where low response time is critical. Hadoop’s per-
formance is closely tied to its task scheduler, which im-
plicitly assumes that cluster nodes are homogeneous and
tasks make progress linearly, and uses these assumptions
to decide when to speculatively re-execute tasks that ap-