csail.mit.edu

Column-Stores vs. Row-Stores: How different are they really?

Fri, 01/08/2010 - 16:07 — kolb

Authors:

Abadi, Daniel J.; Madden, Samuel R.; Hachem, Nabil

There has been a significant amount of excitement and recent work
on column-oriented database systems (“column-stores”). These
database systems have been shown to perform more than an order
of magnitude better than traditional row-oriented database systems
(“row-stores”) on analytical workloads such as those found in
data warehouses, decision support, and business intelligence
applications. The elevator pitch behind this performance difference is
straightforward: column-stores are more I/O efficient for read-only
queries since they only have to read from disk (or from memory)

Year:

2008

Column-stores for wide and sparse data

Fri, 01/08/2010 - 16:04 — kolb

Authors:

Abadi, Daniel J.

ABSTRACT
While it is generally accepted that data warehouses and
OLAP workloads are excellent applications for column-stores,
this paper speculates that column-stores may well be suited
for additional applications. In particular we observe that
column-stores do not see a performance degradation when
storing extremely wide tables, and column-stores handle sparse
data very well. These two properties lead us to conjecture
that column-stores may be good storage layers for Semantic
Web data, XML data, and data with GEM-style schemas.

Year:

2007

A comparison of approaches to large-scale data analysis

Mon, 10/19/2009 - 09:43 — admin

Authors:

Pavlo, Andrew; Paulson, Erik; Rasin, Alexander; Abadi, Daniel J.; DeWitt, David J.; Madden, Samuel; Stonebraker, Michael

There is currently considerable enthusiasm around the MapReduce
(MR) paradigm for large-scale data analysis [17]. Although the
basic control flow of this framework has existed in parallel SQL
database management systems (DBMS) for over 20 years, some
have called MR a dramatically new computing model [8, 17]. In
this paper, we describe and compare both paradigms. Furthermore,
we evaluate both kinds of systems in terms of performance and
development complexity. To this end, we define a benchmark
consisting of a collection of tasks that we have run on an open source

Year:

2009
