Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience

Tue, 04/20/2010 - 11:53 — admin

Authors:

Gates, Alan F.; Natkovich, Olga; Chopra, Shubham; Kamath, Pradeep; Narayanamurthy, Shravan M.; Olston, Christopher; Reed, Benjamin; Srinivasan, Santhosh; Srivastava, Utkarsh

Increasingly, organizations capture, transform and analyze
enormous data sets. Prominent examples include internet
companies and e-science. The Map-Reduce scalable dataflow
paradigm has become popular for these applications. Its
simple, explicit dataflow programming model is favored by
some over the traditional high-level declarative approach:
SQL. On the other hand, the extreme simplicity of
Map-Reduce leads to much low-level hacking to deal with the
many-step, branching dataflows that arise in practice.
Moreover, users must repeatedly code standard operations
such as join by hand. These practices waste time, introduce
bugs, harm readability, and impede optimizations.

Pig is a high-level dataflow system that aims at a sweet
spot between SQL and Map-Reduce. Pig offers SQL-style
high-level data manipulation constructs, which can be
assembled in an explicit dataflow and interleaved with custom
Map- and Reduce-style functions or executables. Pig programs
are compiled into sequences of Map-Reduce jobs, and
executed in the Hadoop Map-Reduce environment. Both Pig
and Hadoop are open-source projects administered by the
Apache Software Foundation.

This paper describes the challenges we faced in developing
Pig, and reports performance comparisons between Pig
execution and raw Map-Reduce execution.
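
To make the abstract's remark about coding joins by hand more concrete, here is a minimal sketch of a reduce-side equi-join written in a Map-Reduce style. It is plain Python that only imitates the map, shuffle, and reduce phases in memory. It does not use the Hadoop API and is not taken from the paper. All function names and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(records, tag, key_index):
    """Emit (join_key, (tag, record)) pairs, as a mapper would."""
    for record in records:
        yield record[key_index], (tag, record)


def shuffle(pairs):
    """Group mapper output by key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    """Pair every left record with every right record sharing the key."""
    left = [rec for tag, rec in values if tag == "L"]
    right = [rec for tag, rec in values if tag == "R"]
    for l_rec, r_rec in product(left, right):
        yield l_rec + r_rec


def hand_coded_join(left, right, left_key=0, right_key=0):
    """Run the three phases in sequence on in-memory lists of tuples."""
    pairs = list(map_phase(left, "L", left_key))
    pairs += list(map_phase(right, "R", right_key))
    joined = []
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_join(key, values))
    return joined


# Hypothetical sample data: (user, url) visits and (user, age) profiles.
visits = [("amy", "cnn.com"), ("bob", "bbc.com"), ("amy", "nyt.com")]
users = [("amy", 31), ("cat", 27)]
result = hand_coded_join(visits, users)
```

Only "amy" appears in both inputs, so `result` holds two joined rows. Tagging records by source, grouping them by key, and taking the per-key cross product is the kind of boilerplate that a high-level join construct is meant to replace.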

Year:

2009

Venue:

VLDB 2009

URL:

http://www.vldb.org/pvldb/2/vldb09-1074.pdf

Citations range:

n/a

Attachment	Size
Reed2009BuildingaHighLevelDataowSystemontopofMapReduce.pdf	529.63 KB
