SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets

Wed, 04/14/2010 - 15:18 — kolb

Authors:

Chaiken, R; Jenkins, B; Larson, PÅ; Ramsey, B; Shakib, D; Weaver, S; Zhou, J

Companies providing cloud-scale services have an increasing need to store and analyze massive data sets such as search logs and click streams. For cost and performance reasons, processing is typically done on large clusters of shared-nothing commodity machines. It is imperative to develop a programming model that hides the complexity of the underlying system but provides flexibility by allowing users to extend functionality to meet a variety of requirements.

In this paper, we present a new declarative and extensible scripting language, SCOPE (Structured Computations Optimized for Parallel Execution), targeted for this type of massive data analysis. The language is designed for ease of use with no explicit parallelism, while being amenable to efficient parallel execution on large clusters. SCOPE borrows several features from SQL. Data is modeled as sets of rows composed of typed columns. The select statement is retained with inner joins, outer joins, and aggregation allowed. Users can easily define their own functions and implement their own versions of operators: extractors (parsing and constructing rows from a file), processors (row-wise processing), reducers (group-wise processing), and combiners (combining rows from two inputs). SCOPE supports nesting of expressions but also allows a computation to be specified as a series of steps, in a manner often preferred by programmers. We also describe how scripts are compiled into efficient, parallel execution plans and executed on large clusters.
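
The four user-definable operator roles named in the abstract can be illustrated with a small, single-machine sketch. This is not SCOPE syntax and not the paper's API: it is plain Python, and every name in it (`extract`, `process`, `reduce_by`, `combine`, the `LOG` sample, the `latency_ms` and `category` columns) is a hypothetical stand-in chosen for illustration. It only shows how the roles differ and how a computation can be written as a series of named steps.

```python
import csv
import io
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]  # a row is a mapping from column name to typed value


def extract(text: str) -> Iterator[Row]:
    """Extractor role: parse raw file text and construct typed rows."""
    for query, latency in csv.reader(io.StringIO(text)):
        yield {"query": query, "latency_ms": int(latency)}


def process(rows: Iterable[Row]) -> Iterator[Row]:
    """Processor role: row-wise processing (here, normalize the query text)."""
    for row in rows:
        yield {**row, "query": row["query"].strip().lower()}


def reduce_by(rows: Iterable[Row], key: str) -> Iterator[Row]:
    """Reducer role: group-wise processing (count and mean latency per group)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    for value, members in groups.items():
        total = sum(m["latency_ms"] for m in members)
        yield {key: value, "count": len(members),
               "avg_latency_ms": total / len(members)}


def combine(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """Combiner role: combine rows from two inputs (an inner join on `key`)."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    for l in left:
        for r in index.get(l[key], []):
            yield {**r, **l}


# Hypothetical sample inputs, invented for this sketch.
LOG = "Weather,120\nweather ,80\nnews,50\n"
CATEGORIES = [{"query": "weather", "category": "info"}]

# The computation written as a series of steps, each naming its result.
rows = extract(LOG)
cleaned = process(rows)
stats = list(reduce_by(cleaned, "query"))
joined = list(combine(stats, CATEGORIES, "query"))
```

The same pipeline could be written as one nested expression, `combine(reduce_by(process(extract(LOG)), "query"), CATEGORIES, "query")`; the step-by-step form mirrors the style the abstract says programmers often prefer. Nothing here is parallel: in the system the abstract describes, the script is compiled into a parallel execution plan rather than run as written.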

Year:

2008

Venue:

VLDB 2008

URL:

http://portal.acm.org/citation.cfm?id=1454159.1454166

Attachment	Size
Chaiken2008SCOPEEasyandEfficientParallelProcessingofMassiveData.pdf	434.2 KB

microsoft.com

websearch
