Facebook recently deployed Facebook Messages, its first ever
user-facing application built on the Apache Hadoop platform.
Apache HBase is a database-like layer built on Hadoop designed
to support billions of messages per day. This paper describes the
reasons why Facebook chose Hadoop and HBase over other
systems such as Apache Cassandra and Voldemort and discusses
the application’s requirements for consistency, availability,
partition tolerance, data model and scalability. We explore the
enhancements made to Hadoop to make it a more effective realtime system.
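For concreteness, a minimal sketch of what storing and fetching a message through the HBase client API of that era can look like; the "messages" table, "m" column family, and row-key scheme are illustrative assumptions, not the schema the paper describes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class MessageStoreSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical table and column family names, for illustration only.
        HTable table = new HTable(conf, "messages");

        // Row key: user id plus an inverted timestamp, a common trick so a
        // user's newest messages sort first in a scan.
        byte[] rowKey = Bytes.toBytes(
                "user42:" + (Long.MAX_VALUE - System.currentTimeMillis()));

        Put put = new Put(rowKey);
        put.add(Bytes.toBytes("m"), Bytes.toBytes("body"), Bytes.toBytes("hello"));
        table.put(put);

        Get get = new Get(rowKey);
        Result result = table.get(get);
        System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("m"), Bytes.toBytes("body"))));
        table.close();
    }
}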
Large-scale data analysis has become increasingly important for many enterprises. Recently, a new distributed computing paradigm, called MapReduce, and its open source implementation Hadoop, have been widely adopted due to their impressive scalability and flexibility to handle structured as well as unstructured data. In this paper, we describe our data warehouse system, called Cheetah, built on top of MapReduce. Cheetah is designed specifically for our online advertising application to allow various simplifications and custom optimizations. First, we take a fresh look at the data warehouse schema design.
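To ground the MapReduce paradigm the abstract refers to, here is a minimal Hadoop job in the spirit of the advertising workload: it sums impressions per ad id. The tab-separated input layout and field names are assumptions for illustration, not Cheetah's actual format:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ImpressionCount {
    // Assumed input: one log line per record, tab-separated as
    // adId \t impressions \t ...
    public static class ImpressionMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            ctx.write(new Text(fields[0]),
                      new LongWritable(Long.parseLong(fields[1])));
        }
    }

    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text adId, Iterable<LongWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable c : counts) sum += c.get();
            ctx.write(adId, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "impression-count");
        job.setJarByClass(ImpressionCount.class);
        job.setMapperClass(ImpressionMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}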
Replication is a widely used method for achieving high availability in database systems. Due to the nondeterminism inherent in traditional concurrency control schemes, however, special care must be taken to ensure that replicas don’t
diverge. Log shipping, eager commit protocols, and lazy synchronization protocols are well-understood methods for
safely replicating databases, but each comes with its own cost in availability, performance, or consistency.
In this paper, we propose a distributed database system which combines a simple deadlock avoidance technique with concurrency control schemes that guarantee equivalence to a predetermined serial ordering of transactions.
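A minimal sketch of the deadlock-avoidance idea in isolation, assuming every replica consumes the same totally ordered transaction log: acquiring per-record locks in a canonical (sorted) key order makes deadlock impossible, and applying the same log deterministically keeps replicas from diverging. This illustrates the general technique, not the paper's actual system:

import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class DeterministicExecutor {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Each replica calls this with the identical, pre-agreed log,
    // so all replicas make identical state transitions.
    public void execute(List<Txn> orderedLog) {
        for (Txn txn : orderedLog) {
            // Sorting the write set yields a global lock-acquisition
            // order: the classical deadlock-avoidance trick.
            TreeSet<String> keys = new TreeSet<>(txn.writes.keySet());
            for (String k : keys) lockFor(k).lock();
            try {
                store.putAll(txn.writes); // deterministic effect
            } finally {
                for (String k : keys) lockFor(k).unlock();
            }
        }
        // A real engine would hand transactions to worker threads as soon
        // as their locks are granted; they run inline here for brevity.
    }

    private ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    public static class Txn {
        final Map<String, String> writes;
        public Txn(Map<String, String> writes) { this.writes = writes; }
    }
}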
Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:
We have been using HBase for around a year in our development work and projects, from 0.17.x to 0.19.x. We, like the rest of the community, are well aware of the critical performance and reliability issues of these releases.
Now, the great news is that HBase-0.20.0 will be released soon. Jonathan Gray from Streamy, Ryan Rawson from StumbleUpon, Michael Stack from Powerset/Microsoft, Jean-Daniel Cryans from OpenPlaces, and other contributors have done a great job redesigning and rewriting many of HBase's core components.
As a document-oriented database for the web, CouchDB already differs fundamentally from classical relational databases. It relies consistently on the popular MapReduce algorithm and on Internet standards such as the JSON interchange format and the REST protocol. In this article we discuss the background of what a highly scalable data architecture for the web could look like today, and how we can realize one using CouchDB as an example.
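As a small illustration of the REST-plus-JSON interface described above: a CouchDB document lives at a URL, is created with an HTTP PUT, and is fetched with a GET. The host, database name, and document id below are placeholders (5984 is CouchDB's default port):

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CouchSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical database "demo" and document id "message-1".
        URL doc = new URL("http://localhost:5984/demo/message-1");

        // Create the document with a PUT of a JSON body.
        HttpURLConnection put = (HttpURLConnection) doc.openConnection();
        put.setRequestMethod("PUT");
        put.setDoOutput(true);
        put.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = put.getOutputStream()) {
            out.write("{\"body\": \"hello\"}".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("PUT status: " + put.getResponseCode());

        // Fetch it back with a GET; the response is the JSON document.
        HttpURLConnection get = (HttpURLConnection) doc.openConnection();
        try (InputStream in = get.getInputStream()) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}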
The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [3] is a popular open-source map-reduce implementation which is being used as an alternative to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs which are hard to maintain and reuse. In this paper, we present Hive, an open-source data warehousing solution built on top of Hadoop.
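To illustrate the level of abstraction Hive adds over hand-written map-reduce programs, here is a sketch of running a HiveQL aggregation through Hive's JDBC driver. The connection URL, table, and columns are illustrative, and the driver shown is the later HiveServer2 interface, not something the paper itself specifies:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveSketch {
    public static void main(String[] args) throws Exception {
        // Registers the HiveServer2 JDBC driver (auto-loaded on modern JVMs
        // when hive-jdbc is on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
        try (Statement stmt = conn.createStatement();
             // Hive compiles this declarative query into map-reduce jobs.
             ResultSet rs = stmt.executeQuery(
                 "SELECT ad_id, SUM(impressions) FROM ad_logs GROUP BY ad_id")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
        conn.close();
    }
}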
There has been a great deal of hype about Amazon's Simple Storage Service (S3). S3 promises infinite scalability and high availability at low cost. Currently, S3 is used mostly to store multi-media documents (videos, photos, audio) which are shared by a community of people and rarely updated. The purpose of this paper is to demonstrate the opportunities and limitations of using S3 as a storage system for general-purpose database applications which involve small objects and frequent updates. Read, write, and commit protocols are presented.
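A hedged sketch of the setting, using the AWS SDK for Java: each S3 object is treated as one database page, read and rewritten in full (the bucket and key names are placeholders). The protocol questions the paper studies start exactly here, since plain last-writer-wins puts give no transactional guarantees on their own:

import java.io.ByteArrayInputStream;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.S3Object;

public class S3PageStore {
    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    // Overwrites the whole page object; concurrent writers silently
    // race, which is why a commit protocol is needed on top.
    public void writePage(String key, byte[] page) {
        ObjectMetadata meta = new ObjectMetadata();
        meta.setContentLength(page.length);
        s3.putObject("my-db-bucket", key, new ByteArrayInputStream(page), meta);
    }

    // Fetches the full page object; a reader may observe a stale
    // version under S3's (then) eventual consistency.
    public byte[] readPage(String key) throws Exception {
        try (S3Object obj = s3.getObject("my-db-bucket", key)) {
            return obj.getObjectContent().readAllBytes();
        }
    }
}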