index

The Performance of MapReduce: An in-depth Study

Tue, 12/21/2010 - 16:44 — kolb

Authors:

Jiang, D; Ooi, BC; Shi, L; Wu, S

Large-scale data analysis has become increasingly important
for many enterprises. Recently, a new distributed computing
paradigm, called MapReduce, and its open source
implementation Hadoop, has been widely adopted due to
its impressive scalability and flexibility to handle structured
as well as unstructured data. In this paper, we describe
our data warehouse system, called Cheetah, built on top of
MapReduce. Cheetah is designed specifically for our online
advertising application to allow various simplifications and
custom optimizations. First, we take a fresh look at the data

Year:

2010

Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)

Tue, 12/21/2010 - 16:31 — kolb

Authors:

Dittrich, J; Quiane-Ruiz, J; Jindal, A; Kargin, Y; Setty, V; Schad, J

MapReduce is a computing paradigm that has gained a lot of attention
in recent years from industry and research. Unlike parallel
DBMSs, MapReduce allows non-expert users to run complex
analytical tasks over very large data sets on very large clusters
and clouds. However, this comes at a price: MapReduce processes
tasks in a scan-oriented fashion. Hence, the performance of
Hadoop — an open-source implementation of MapReduce — often
does not match the one of a well-configured parallel DBMS. In this
paper we propose a new type of system named Hadoop++: it boosts

Year:

2010

Cloud Computing publication categorizer
