Processing theta-joins using MapReduce

Keyword search

Guided search

Click a term to initiate a search.

Processing theta-joins using MapReduce

Thu, 08/18/2011 - 11:22 — kolb

Authors:

Okcan, Alper; Riedewald, Mirek

Author:

Okcan, A

Riedewald, M

Joins are essential for many data analysis tasks, but are
not supported directly by the MapReduce paradigm. While
there has been progress on equi-joins, implementation of join
algorithms in MapReduce in general is not suﬃciently un-
derstood. We study the problem of how to map arbitrary
join conditions to Map and Reduce functions, i.e., a parallel
infrastructure that controls data ﬂow based on key-equality
only. Our proposed join model simpliﬁes creation of and
reasoning about joins in MapReduce. Using this model, we
derive a surprisingly simple randomized algorithm, called 1-
Bucket-Theta, for implementing arbitrary joins (theta-joins)
in a single MapReduce job. This algorithm only requires
minimal statistics (input cardinality) and we provide evi-
dence that for a variety of join problems, it is either close
to optimal or the best possible option. For some of the
problems where 1-Bucket-Theta is not the best choice, we
show how to achieve better performance by exploiting addi-
tional input statistics. All algorithms can be made ’memory-
aware’, and they do not require any modiﬁcations to the
MapReduce environment. Experiments show the eﬀective-
ness of our approach.

Year:

2011

Venue:

Sigmod 2011

URL:

http://www.ccs.neu.edu/home/okcan/pubs/sigmod11-okcan.pdf

Citations:

Citations range:

n/a

Attachment	Size
sigmod11-okcan.pdf	368.88 KB

websearch

Cloud Computing publication categorizer

Keyword search

Guided search

Author

Year

Topic

Tags

mailpart

Citations range

Processing theta-joins using MapReduce

Navigation

Related categories

User login