Increasingly, organizations capture, transform and analyze
enormous data sets. Prominent examples include internet
companies and e-science. The Map-Reduce scalable dataflow
paradigm has become popular for these applications. Its
simple, explicit dataflow programming model is favored by
some over the traditional high-level declarative approach:
SQL. On the other hand, the extreme simplicity of
Map-Reduce leads to much low-level hacking to deal with the
many-step, branching dataflows that arise in practice.
Moreover, users must repeatedly code standard operations such