Cheetah: A High Performance, Custom Data Warehouse on Top of MapReduce

Tue, 12/21/2010 - 16:37 — kolb

Authors:

Chen, Songting

Large-scale data analysis has become increasingly important for many
enterprises. Recently, a new distributed computing paradigm, called
MapReduce, and its open source implementation Hadoop, has been widely
adopted due to its impressive scalability and flexibility to handle
structured as well as unstructured data. In this paper, we describe our
data warehouse system, called Cheetah, built on top of MapReduce.
Cheetah is designed specifically for our online advertising application
to allow various simplifications and custom optimizations. First, we
take a fresh look at the data warehouse schema design. In particular, we
define a virtual view on top of the common star or snowflake data
warehouse schema. This virtual view abstraction not only allows us to
design a SQL-like but much more succinct query language, but also makes
it easier to support many advanced query processing features. Next, we
describe a stack of optimization techniques ranging from data
compression and access method to multi-query optimization and exploiting
materialized views. In fact, each node with commodity hardware in our
cluster is able to process raw data at 1GBytes/s. Lastly, we show how to
seamlessly integrate Cheetah into any ad-hoc MapReduce jobs. This allows
MapReduce developers to fully leverage the power of both MapReduce and
data warehouse technologies.

Year:

2010

Venue:

VLDB 2010

URL:

http://www.turn.com/wp-content/uploads/2010/09/White_Paper_Cheetah.pdf

Attachment	Size
Chen2010CheetahAHighPerformanceCustomDataWarehouseonTopof.pdf	520.35 KB
