The traditional architecture for a DBMS engine has the recovery,
concurrency control and access method code tightly bound
together in a storage engine for records. We propose a different
approach, where the storage engine is factored into two layers
(each of which might have multiple heterogeneous instances). A
Transactional Component (TC) works at a logical level only: it
knows about transactions and their "logical" concurrency control
and undo/redo recovery, but it does not know about page layout,
B-trees, etc. A Data Component (DC) knows about the physical
storage structure: it supports a record-oriented interface with
atomic operations, but it knows nothing about transactions.
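
A minimal sketch of this factoring may help (Python, with
hypothetical interface names; the paper does not prescribe this
API): the TC owns transactions and logical undo, and issues only
record operations to a DC that has no notion of transactions.

    from abc import ABC, abstractmethod

    class DataComponent(ABC):
        """Physical layer: page layout, B-trees, atomic record
        operations. Knows nothing about transactions."""
        @abstractmethod
        def read(self, key): ...
        @abstractmethod
        def update(self, key, record): ...

    class InMemoryDC(DataComponent):
        def __init__(self): self.rows = {}
        def read(self, key): return self.rows.get(key)
        def update(self, key, record): self.rows[key] = record

    class TransactionalComponent:
        """Logical layer: transactions, logical concurrency control,
        undo/redo. Never sees pages or B-trees."""
        def __init__(self, dc):
            self.dc = dc
            self.undo = {}    # txn id -> compensating logical actions

        def begin(self, txn):
            self.undo[txn] = []

        def update(self, txn, key, record):
            before = self.dc.read(key)   # old value for logical undo
            self.undo[txn].append(lambda: self.dc.update(key, before))
            self.dc.update(key, record)  # DC applies this atomically

        def commit(self, txn):
            self.undo.pop(txn)           # redo logging elided here

        def abort(self, txn):
            for action in reversed(self.undo.pop(txn)):
                action()                 # undo at the record level

    tc = TransactionalComponent(InMemoryDC())
    tc.begin(1); tc.update(1, "k", "v1"); tc.abort(1)  # "k" rolled back

Because the TC sees only logical record operations, the DC beneath
it can be one of the multiple heterogeneous instances the abstract
envisions without the TC changing.
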
Companies providing cloud-scale services have an increasing
need to store and analyze massive data sets such as search logs
and click streams. For cost and performance reasons, processing is
typically done on large clusters of shared-nothing commodity
machines. It is imperative to develop a programming model that
hides the complexity of the underlying system but provides
flexibility by allowing users to extend functionality to meet a variety
of requirements.
In this paper, we present a new declarative and extensible
scripting language for this type of massive data analysis.
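
Purely to illustrate that design goal (this is not the language's
actual syntax; the operator names below are invented), the model
pairs declarative operators, which hide partitioning and
distribution, with ordinary user functions as extension points:

    def parse_click(line):                  # user-defined extractor
        user, url, ms = line.split("\t")
        return {"user": user, "url": url, "ms": int(ms)}

    def extract(lines, extractor):          # engine operator (runs per
        return (extractor(l) for l in lines)  # node in a real cluster)

    def where(rows, pred):                  # engine operator
        return (r for r in rows if pred(r))

    log = ["alice\t/home\t1200", "bob\t/search\t80"]
    slow = list(where(extract(log, parse_click),
                      lambda r: r["ms"] > 1000))
    # [{'user': 'alice', 'url': '/home', 'ms': 1200}]
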
There is currently considerable enthusiasm around the MapReduce
(MR) paradigm for large-scale data analysis [17]. Although the
basic control flow of this framework has existed in parallel SQL
database management systems (DBMS) for over 20 years, some
have called MR a dramatically new computing model [8, 17]. In
this paper, we describe and compare both paradigms. Furthermore,
we evaluate both kinds of systems in terms of performance and
development complexity. To this end, we define a benchmark
consisting of a collection of tasks that we have run on an
open-source version of MR as well as on two parallel DBMSs.
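
To make the comparison concrete, here is the canonical grouped
aggregation written once as map/shuffle/reduce and once as SQL.
This is a single-machine Python sketch for illustration, not code
from the paper's benchmark:

    from collections import defaultdict

    def map_fn(line):                 # map: emit (key, value) pairs
        for word in line.split():
            yield word, 1

    def shuffle(pairs):               # group by key: done by the MR
        groups = defaultdict(list)    # framework, or by hash/sort
        for k, v in pairs:            # repartitioning in a
            groups[k].append(v)       # shared-nothing parallel DBMS
        return groups

    def reduce_fn(key, values):       # reduce: aggregate each group
        return key, sum(values)

    lines = ["a b a", "b c"]
    pairs = (p for line in lines for p in map_fn(line))
    counts = dict(reduce_fn(k, v) for k, v in shuffle(pairs).items())
    # counts == {'a': 2, 'b': 2, 'c': 1}

    # The equivalent declarative statement in a parallel DBMS:
    #   SELECT word, COUNT(*) FROM words GROUP BY word;
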