A user-defined function (UDF) is a powerful database feature that
allows users to customize database functionality. Though useful,
present UDFs have numerous limitations, including install-time
specification of input and output schema and poor ability to
parallelize execution. We present a new approach to implementing a
UDF, which we call SQL/MapReduce (SQL/MR), that overcomes many of
these limitations. We leverage ideas from the MapReduce programming
paradigm to provide users with a straightforward API through which
they can implement a UDF in the language of their choice. Moreover,
our approach allows maximum flexibility as the output schema of the
UDF is specified by the function itself at query plan-time. This
means that a SQL/MR function is polymorphic. It can process arbitrary
input because its behavior as well as output schema are dynamically
determined by information available at query plan-time, such as the
function’s input schema and arbitrary user-provided parameters. This
also increases reusability as the same SQL/MR function can be used on
inputs with many different schemas or with different user-specified
parameters.
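
The plan-time behavior described above can be illustrated with a minimal sketch. This is not the SQL/MR API: the class name `ProjectColumns`, the `columns` parameter, and the `operate` method are all hypothetical names chosen for illustration. The sketch only shows the general idea of a function that inspects its input schema and user parameters when it is constructed (standing in for query plan-time) and derives its own output schema from them.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Schema = List[Tuple[str, str]]  # (column name, type name) pairs
Row = Tuple


class ProjectColumns:
    """Hypothetical polymorphic function that keeps only user-selected columns."""

    def __init__(self, input_schema: Schema, params: Dict[str, str]) -> None:
        # "Plan time": look at the input schema and the user-provided
        # parameters, then decide what the output schema will be.
        wanted = [c.strip() for c in params["columns"].split(",")]
        names = [name for name, _ in input_schema]
        missing = [c for c in wanted if c not in names]
        if missing:
            raise ValueError(f"unknown columns: {missing}")
        self._indexes = [names.index(c) for c in wanted]
        self.output_schema: Schema = [input_schema[i] for i in self._indexes]

    def operate(self, rows: Iterable[Row]) -> Iterator[Row]:
        # "Run time": apply the decision made at plan time to each row.
        for row in rows:
            yield tuple(row[i] for i in self._indexes)


# The same function works on any input schema that has the requested columns.
clicks_schema = [("user_id", "int"), ("url", "varchar"), ("ts", "timestamp")]
fn = ProjectColumns(clicks_schema, {"columns": "ts, user_id"})
print(fn.output_schema)
print(list(fn.operate([(7, "/home", "2009-01-01 00:00:00")])))
```

Because the output schema is computed rather than declared at install time, the same class could be reused on a table with entirely different columns simply by passing a different `columns` parameter.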

In this paper we describe the motivation for this new approach to
UDFs as well as the implementation within Aster Data Systems’
nCluster database. We demonstrate that in the context of massively
parallel, shared-nothing database systems, this model of computation
facilitates highly scalable computation within the database. We also
include examples of new applications that take advantage of this
novel UDF framework.
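
As a rough illustration of why a MapReduce-style model suits a shared-nothing system, the sketch below hash-partitions rows by a key and runs the same function on each partition independently. It is a single-process simulation under stated assumptions, not nCluster's execution engine: the helper names `partition_by`, `count_per_key`, and `run_partitioned` are hypothetical, and a thread pool stands in for separate worker nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by(rows, key_index, num_workers):
    # Send every row with the same key to the same (simulated) worker.
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key_index]) % num_workers].append(row)
    return [buckets[w] for w in range(num_workers)]


def count_per_key(rows, key_index):
    # Per-partition work: needs no data from any other partition.
    counts = defaultdict(int)
    for row in rows:
        counts[row[key_index]] += 1
    return list(counts.items())


def run_partitioned(rows, key_index, num_workers):
    parts = partition_by(rows, key_index, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(lambda part: count_per_key(part, key_index), parts)
        # Union the independent per-partition outputs.
        return sorted(pair for part in results for pair in part)


rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e"), (1, "f")]
print(run_partitioned(rows, key_index=0, num_workers=3))
```

Since all rows sharing a key land in one partition, each worker can produce its part of the answer without coordinating with the others, which is the property that lets this style of computation scale out.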