Online reorganization of databases

Sockut, Gary H
Iyer, Balakrishna R.
ACM Computing Surveys (CSUR)
Citations range: 10 - 49

In practice, any database management system sometimes needs reorganization, that is, a change in some
aspect of the logical and/or physical arrangement of a database. In traditional practice, many types of reorganization
have required denying access to a database (taking the database offline) during reorganization.
Taking a database offline can be unacceptable for a highly available (24-hour) database, for example, a
database serving electronic commerce or armed forces, or for a very large database. A solution is to reorganize
online (concurrently with usage of the database, incrementally during users’ activities, or interpretively).
This article is a tutorial and survey on requirements, issues, and strategies for online reorganization. It first analyzes
the issues and then presents the strategies, which build on those issues. The issues, most of which involve
design trade-offs, include use of partitions, the locus of control for the process that reorganizes (a background
process or users’ activities), reorganization by copying to newly allocated storage (as opposed to reorganizing
in place), use of differential files, references to data that has moved, performance, and activation of reorganization.
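Reorganization by copying to newly allocated storage, combined with a differential file, is one of the design options listed above. The following is a minimal single-threaded sketch of that combination, not the authors' algorithm. The class `CopyReorganizer` and its methods are hypothetical. The differential file is modeled as an in-memory list. User operations are interleaved between copy batches to stand in for true concurrency, and the final switch assumes users are briefly quiesced. The reorganization shown restores clustering by copying rows into key order.

```python
import bisect
from typing import Optional


class CopyReorganizer:
    """Hypothetical sketch: re-cluster a table by key by copying it to new storage.

    Users keep working on the old copy; while reorganization is active, each of
    their updates is also appended to a differential file (here, a plain list).
    """

    def __init__(self, old: dict[int, str]) -> None:
        self.old = old                       # old storage, arbitrary physical order
        self.diff: list[tuple[str, int, Optional[str]]] = []  # differential file
        self.active = False
        self.new_keys: list[int] = []        # newly allocated storage, clustered by key
        self.new_vals: dict[int, str] = {}
        self._pending: list[int] = []

    # User-side operations (interleaved with copy steps to mimic concurrency).
    def user_put(self, key: int, value: str) -> None:
        self.old[key] = value
        if self.active:
            self.diff.append(("put", key, value))

    def user_delete(self, key: int) -> None:
        self.old.pop(key, None)
        if self.active:
            self.diff.append(("delete", key, None))

    # Reorganizer-side operations.
    def start(self) -> None:
        self.active = True
        self.diff.clear()
        self._pending = sorted(self.old)     # copy order = target clustering order

    def copy_step(self, batch: int = 2) -> bool:
        """Copy the next batch of rows; return True once the copy pass is complete."""
        for key in self._pending[:batch]:
            if key in self.old:              # a user may have deleted it meanwhile
                self.new_keys.append(key)
                self.new_vals[key] = self.old[key]
        del self._pending[:batch]
        return not self._pending

    def switch(self) -> list[tuple[int, str]]:
        """Replay the differential file onto the new copy and return its rows.

        Assumes users are briefly quiesced here, so no update is missed.
        """
        for op, key, value in self.diff:
            if op == "put":
                assert value is not None
                if key not in self.new_vals:
                    bisect.insort(self.new_keys, key)  # keep the new copy clustered
                self.new_vals[key] = value
            elif key in self.new_vals:                 # op == "delete"
                del self.new_vals[key]
                self.new_keys.remove(key)
        self.active = False
        return [(k, self.new_vals[k]) for k in self.new_keys]
```

Replaying the whole differential file is safe even for rows whose latest values were already copied. Each logged put or delete fully determines a row's state, regardless of the value it replaces, so the last logged operation on a key always wins.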
The article surveys online strategies in three categories of reorganization. The first category,
maintenance, involves restoring the physical arrangement of data instances without changing the database
definition. This category includes restoration of clustering, reorganization of an index, rebalancing of parallel
or distributed data, garbage collection for persistent storage, and cleaning (reclamation of space) in a log-structured
file system. The second category involves changing the physical database definition; topics include
construction of indexes, conversion between B+-trees and linear hash files, and redefinition (e.g., splitting) of
partitions. The third category involves changing the logical database definition. Some examples are changing
a column’s data type, changing the inheritance hierarchy of object classes, and changing a relationship from
one-to-many to many-to-many. The survey encompasses both research and commercial implementations, and
this article points out several open research topics. As highly available or very large databases continue to
become more common and more important in the world economy, the importance of online reorganization is
likely to continue growing.
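Changing a column's data type, one of the logical-redefinition examples above, can also be done incrementally during users' activities. The following is a minimal sketch under stated assumptions, and it is one possible strategy, not the one the survey prescribes. It assumes a hypothetical `price` column stored as a decimal string in schema version 1 and as integer cents in version 2. A row is converted the first time a user reads it, and a background step converts the remaining rows. The names `Row` and `LazyTypeChange` are illustrative only.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Union


@dataclass
class Row:
    version: int            # schema version this row is stored in
    price: Union[str, int]  # v1: decimal string such as "12.50"; v2: integer cents


class LazyTypeChange:
    """Hypothetical online change of a column's type from text to integer cents."""

    NEW_VERSION = 2

    def __init__(self, rows: dict[int, Row]) -> None:
        self.rows = rows
        self._unconverted = [k for k, r in rows.items() if r.version < self.NEW_VERSION]

    def _upgrade(self, row: Row) -> None:
        # Convert one row in place. Decimal avoids float rounding; this assumes
        # stored prices have at most two decimal places.
        if row.version == 1:
            row.price = int(Decimal(row.price) * 100)
            row.version = self.NEW_VERSION

    def read_price(self, key: int) -> int:
        """User read: convert the row on first touch, so users never see the old type."""
        row = self.rows[key]
        self._upgrade(row)
        return int(row.price)

    def background_step(self, batch: int = 1) -> bool:
        """Background reorganizer: convert up to `batch` remaining rows; True when done."""
        while batch > 0 and self._unconverted:
            key = self._unconverted.pop()
            if key in self.rows:
                self._upgrade(self.rows[key])  # no-op if a user read already converted it
            batch -= 1
        return not self._unconverted
```

Because conversion is per row and idempotent, user reads and background steps can interleave freely. Assuming new rows are always written in version 2, no version-1 rows remain once the background step reports completion.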