Increasingly, organizations capture, transform and analyze
enormous data sets. Prominent examples include internet
companies and e-science. The Map-Reduce scalable dataflow
paradigm has become popular for these applications. Its
simple, explicit dataflow programming model is favored by
some over the traditional high-level declarative approach:
SQL. On the other hand, the extreme simplicity of Map-Reduce
leads to much low-level hacking to deal with the many-step,
branching dataflows that arise in practice. Moreover, users
must repeatedly code standard operations such
The goal of this document is to illustrate the use of the DryadLINQ parallel computation framework through
a set of examples. For each program we present the essential source code and a brief description. This
document does not describe the installation or configuration of DryadLINQ, nor the configuration
parameters that can be used to influence compilation and execution. A non-commercial release of
the DryadLINQ research software is available for download at http://connect.microsoft.com/DryadLINQ.
DryadLINQ is a system and a set of language extensions
that enable a new programming model for large scale
distributed computing. It generalizes previous execution
environments such as SQL, MapReduce, and Dryad in two
ways: by adopting an expressive data model of strongly
typed .NET objects; and by supporting general-purpose
imperative and declarative operations on datasets within
a traditional high-level programming language.
A DryadLINQ program is a sequential program composed of
LINQ expressions performing arbitrary side-