Map-reduce-merge: simplified relational data processing on large clusters

Authors: 
Yang, Hung-chih; Dasdan, Ali; Hsiao, Ruey-Lung; Parker, D. Stott
Author: 
Yang, H
Dasdan, A
Hsiao, R
Parker, D

Map-Reduce is a programming model that enables easy development
of scalable parallel applications to process vast amounts of
data on large clusters of commodity machines. Through a simple
interface with two functions, map and reduce, this model
facilitates parallel implementation of many real-world tasks
such as data processing for search engines and machine learning.

However, this model does not directly support processing
multiple related heterogeneous datasets. While processing
relational data is a common need, this limitation causes
difficulties and/or inefficiency when Map-Reduce is applied to
relational operations like joins.

We extend Map-Reduce into a new model called Map-Reduce-Merge.
It adds to Map-Reduce a Merge phase that can efficiently merge
data already partitioned and sorted (or hashed) by map and
reduce modules. We also demonstrate that this new model can
express relational algebra operators as well as implement
several join algorithms.
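
To make the idea of a Merge phase concrete, here is a minimal single-process sketch, not the paper's API. It assumes two small in-memory datasets, each reduced independently into key-sorted output, and then joins the two outputs with a sort-merge pass. The names map_reduce and merge_join and the sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one in-memory Map-Reduce pass and return output sorted by key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Sorting stands in for the partition-and-sort step of a real cluster.
    return [(key, reducer(key, values)) for key, values in sorted(groups.items())]


def merge_join(left, right, merger):
    """Sort-merge join of two key-sorted lists whose keys are unique."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, left_value = left[i]
        right_key, right_value = right[j]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            out.append(merger(left_key, left_value, right_value))
            i += 1
            j += 1
    return out


# Hypothetical heterogeneous datasets: (name, dept, bonus) and (dept, rate).
employees = [("alice", "d1", 100), ("bob", "d2", 80), ("carol", "d1", 50)]
departments = [("d1", 0.10), ("d2", 0.20), ("d3", 0.05)]

# Each dataset gets its own map and reduce functions.
emp_totals = map_reduce(employees, lambda r: [(r[1], r[2])], lambda k, vs: sum(vs))
dept_rates = map_reduce(departments, lambda r: [(r[0], r[1])], lambda k, vs: vs[0])

# The merge step joins the two reduced outputs on the department key.
joined = merge_join(emp_totals, dept_rates, lambda k, total, rate: (k, total, rate))
print(joined)
```

Because both reduced outputs are already sorted on the join key, the merge step needs only one linear pass over each. Swapping the comparison loop for a hash lookup would give a hash-join variant under the same assumptions.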

Year: 
2007
Venue: 
SIGMOD 2007
URL: 
http://portal.acm.org/ft_gateway.cfm?id=1247602&type=pdf
Citations: 
0
Citations range: 
n/a