A comparison of approaches to large-scale data analysis

Mon, 10/19/2009 - 09:43 — admin

Authors:

Pavlo, Andrew; Paulson, Erik; Rasin, Alexander; Abadi, Daniel J.; DeWitt, David J.; Madden, Samuel; Stonebraker, Michael

There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system's performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.
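The abstract's claim that MR's basic control flow already existed in parallel DBMSs can be illustrated with a minimal Python sketch. It assumes a simple sum-by-key aggregation over hypothetical (source_ip, ad_revenue) records and simulates "nodes" as in-memory partitions; the record values, NUM_NODES, and all function names are illustrative assumptions, not code from the paper or from any MR or DBMS implementation. The map, hash-partition (shuffle), and reduce steps compute the same result as a GROUP BY with SUM.

```python
from collections import defaultdict

# Hypothetical input records: (source_ip, ad_revenue) pairs.
records = [("10.0.0.1", 2.5), ("10.0.0.2", 1.0), ("10.0.0.1", 0.5), ("10.0.0.3", 4.0)]
NUM_NODES = 2  # assumed number of simulated worker nodes


def map_phase(record):
    """Map: emit one (key, value) pair per input record."""
    ip, revenue = record
    yield ip, revenue


def shuffle(pairs, num_nodes):
    """Hash-partition pairs so every value for a key lands in one partition,
    the same repartitioning step a parallel DBMS performs before a GROUP BY."""
    partitions = [defaultdict(list) for _ in range(num_nodes)]
    for key, value in pairs:
        partitions[hash(key) % num_nodes][key].append(value)
    return partitions


def reduce_phase(key, values):
    """Reduce: aggregate all values for one key, like SUM(...) ... GROUP BY key."""
    return key, sum(values)


def map_reduce(records, num_nodes):
    pairs = (kv for r in records for kv in map_phase(r))
    result = {}
    # Each partition stands in for the work assigned to one node.
    for partition in shuffle(pairs, num_nodes):
        for key, values in partition.items():
            k, total = reduce_phase(key, values)
            result[k] = total
    return result


def group_by_sum(records):
    """Single-node equivalent of SELECT ip, SUM(revenue) FROM ... GROUP BY ip."""
    totals = defaultdict(float)
    for ip, revenue in records:
        totals[ip] += revenue
    return dict(totals)


print(map_reduce(records, NUM_NODES) == group_by_sum(records))  # True
```

Because the shuffle groups every value for a key into one partition, the result does not depend on how many simulated nodes are used; only the distribution of work changes.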

Year:

2009

Venue:

SIGMOD 2009

URL:

http://portal.acm.org/citation.cfm?id=1559865

Attachment	Size
p165-pavlo.pdf	470.62 KB
