|
Immortal DB: Built-in Transaction Time Database Support
People
1. Introduction
There has been research in temporal databases within the database community for around twenty years. Two forms of temporal functionality have been identified.
Frequently, the valid time and the transaction time are very close, perhaps identical. For example, the transaction time for an on-line purchase of a book at Amazon is also the valid time for the sale of the book. The basic idea of temporal databases is to maintain multiple versions of data in the database. Our focus in the Immortal DB project is transaction time databases, so we will describe the functionality in these terms. Each transaction, when it enters data into the database, will place with the data a timestamp T1 of when the transaction occurred, which indicates the beginning of the lifetime of the data. When that data is changed, a new version of the data is put into the database, also marked with a timestamp T2 indicating its start time and the old version is marked (perhaps implicitly) as having an end time of T2. There are interesting user scenarios in which temporal (and specifically transaction time) database support is useful.
Several database systems, including SQL Server, provide what is called snapshot support, essentially transaction time functionality limited to the very recent past. This provides a form of transaction isolation called snapshot isolation, in which reads are not blocked by concurrent updates because the reader reads a recent version instead of waiting for access to the current version. But there are no commercial database systems that support temporal functionality beyond snapshots. Part of the problem is that it is difficult to layer temporal support on top of an existing database system. A layered implementation is cumbersome and typically has bad performance. The Immortal DB project aims to provide transaction time database support built into SQL Server, not layered on top. Our intent is to demonstrate that we can support transaction time functionality in addition to snapshot functionality, and that we can do this with good performance. Immortal DB meshes well with another project of the Database Research Group called MT-Cache (MT = middle-tier). MT-Cache implements a database supported cache with user specified currency. A limitation is that if only part of a query can be answered from the cache, but another part requires going to the back end database source, then the entire query, in order to be transaction consistent, is executed at the backend database source. With Immortal DB, the cache contents are describable as a version with an as of time. One can query the backend database as of the same time for only the non-cached portion, producing a result that is transaction consistent with the cached data. This increases the utility of the cached data. There are two aspects to providing snapshot support that are of great significance in guiding transaction time database implementation and in determining its performance(1) nature of timestamps; and (2) how versions are stored. In Section 2, we outline how snapshots are typically provided. This will reveal why it is difficult to extend that approach to full transaction time support. In Section 3, we describe how Immortal DB implements these aspects to provide transaction time support, how it can also support snapshots, and why our performance for both should be excellent. We conclude with a brief summary in section 4. While important for full transaction time functionality, we do not discuss query processing considerations. What we describe here is our current focus, which is on as of queries, i.e. SQL select queries that are qualified with an as of time- hence querying an historical version of the database.
2. Conventional Snapshot Approach
2.1 Timestamps
With transaction time databases or snapshots, it is important that the timestamp used to delineate version begin and version end be consistent with transaction serialization order. Typically, snapshot implementers rely on the commit list technique, which represents the set of transactions that have committed prior to the start of a snapshot transaction, and hence should be visible to the snapshot transaction. Transactions are identified by a transaction sequence number (TSN) that is stored in the versions maintained by the system. A commit list could be implemented by keeping track of the TSN of each transaction that has committed thus far, hence producing a very long list. To reduce the commit list to a manageable size, the list of explicit transactions whose commit status is maintained is bounded both with an upper and lower bound.
For each snapshot transaction, the system maintains an explicit list of transactions with TSNs in (LB,UB), sometimes as a bit vector indexed by TSN (or TSN-LB). This list enumerates the transactions between LB and UB that have already committed, and hence whose updates should be visible to the snapshot transaction. Alternatively, the list can be of those transactions that were active (uncommitted) at the time the snapshot started and hence whose updates should not be visible. The system must maintain the list of transactions that are active at any time. The snapshot transaction X, to get its timestamp, scans this list of active transactions. All transactions with TSNs smaller than the smallest TSN among active transactions have terminated, and if committed, will have their TSNs posted to the record versions they updated. Thus this smallest TSN -1 becomes LB for X. All active transactions are not visible to X and are included in its list (or excluded, depending on the technique used). All transactions with TSNs larger than the TSN issued (at this time) for X are uncommitted, and hence this TSN+1 becomes Xs UB. The major advantage of the commit list approach is that it avoids having to return to records updated by a transaction to post a transaction time. It also avoids having to maintain a long term connection between TSN and real time, which can require the updating of a persistent table that maps each TSN to real (commit) time. The commit list approach is not, however, suitable for timestamping when versions are kept for extended periods of time and when general as of queries can be asked. Essentially, only snapshot queries are given timestamps (their commit lists). Thus, when an as of query is requested, and it encounters a version marked with a TSN, there is no way to answer whether this version is the one desired or not. Specifically, there is no way to map from the as of time to a commit list. To support as of queries, we need to be able to find the latest version of data that is earlier than the as of time. The TSN stored in the record, which indicates the transaction that updated it, only provides the order in which transactions were initiated, not transaction serialization order.
2.2 Maintaining Versions
The usual way of storing versions for snapshots is to provide a separate storage area for the historical version. Thus, the storage area for the current database is unchanged, and ordinary transactions (as opposed to snapshot transactions) need not even be aware of the historical versions. There is an added cost to separating versions in this way however. We discuss these costs below in terms of one common way of providing an historical-version store. A convenient way to provide an historical-version store is to define for each snapshot enabled table, a history table as well. When an update occurs, the existing committed version of an updated record is inserted into the history table. Then the version is updated by the updating transaction. Note that this requires that there be two database operations per update, one for the history table and one for the current table. This doubles the number of log records written. Further, these updates are in two different disk pages, so twice the number of pages is made dirty (in need of flushing) than is the case when snapshots are not being supported. When executing a snapshot query, we will be looking for the latest committed transaction earlier than the start time for our snapshot. When we visit the current table and examine the version of the record there, if that version is either uncommitted or is later than our snapshot time, we will need to visit the history table as well. However, we can never visit only the history table (a current table visit is always required) because we need to ensure that the current version is NOT the version we need. So, while maintaining separate tables for current and historical versions of data is a simple and clean implementation approach, it can substantially increase the overhead for both maintaining the versions and for accessing them during a snapshot transaction.
3. Immortal DB Approach
3.1. Timestamps Using Real Transaction Time
When each transaction commits, Immortal DB will give it a timestamp, i.e. a clock time, which is consistent with the transaction serialization order. Initially, this time will be the transactions commit time, but if a transaction requests to see its time during the transaction (a possibility with SQL and its CURRENT TIME function), then the time may have to be earlier [5]. (Our prototype does not currently support this, but we know how to do so.) Unlike commit lists, timestamps directly support querying versions at arbitrary times in the past. Further, it is possible to index on time, or on key and time, so as to be able to access the appropriate versions directly. We are working on an index structure (based on [1,2]) that indexes versioned data. The big advantage of commit lists is that the data records can continue to contain merely their TSNs, given when each transaction starts. But timestamps are not known until much later. This necessitates that we revisit updated records to timestamp them appropriately with the timestamp for the updating transaction. So we originally timestamp records with TSNs, exactly as with commit lists. But we eventually replace the TSNs with timestamps when they are known. Our Immortal DB prototype maintains a <TSN, timestamp> mapping in a Transaction-Timestamp table (TTT), which is a persistent table that is updated by each transaction as it commits. This stably records the mapping so that we can lazily revisit the updated records after the transaction has committed. Our prototype reference counts not yet timestamped records of a transaction. When a transactions reference count goes to zero, this permits us to eventually remove its entry from the TTT, hence keeping this table small.
3.2. Maintaining Versions with the Current Data
Rather than providing a separate store for the historical versions for tuples (records) of a table, we include historical versions in the current page so long as that is possible. When the current page fills up, we can either do a key split, as we would in a conventional B-tree, or a time split. The result of a key split is that we go from one to two pages that hold both current and historical versions of records. When we do a time split, all versions prior to the split time go to a historical page, while the versions that are valid after the split time remain in the current page. Importantly, all committed versions of data are in the current page. The approach of using the current page to store historical versions means that many updates can be done with a single page access. In addition, we only log the update once, as a change to the current version. The logging needed to keep of old versions is reflected in the operation that we associate with the logged update. No second log record with redundant information is needed in this approach. Occasionally, we need to do either a key or a time split. Splitting activity is a bit more frequent than is the case without version support, but the extra writes can be amortized over multiple updates. There is not an extra write and perhaps read per update. When we do a time split of a current node, we include in that node a pointer to the historical node being formed by the split. In addition, we include with that pointer the split time we used to divide the versions between current and history pages. When we access versions during a snapshot transaction, we can tell by looking at the split time in the current page whether we will find the version of interest in the current page or that we need to access the historical page. Most of the time, we expect to find the desired version, even if it is not the current version, in the current page. (Occasionally, we will need to access the historical page.) This greatly increases the access efficiency of snapshot queries.
3.3. Indexing Versions
As described above, were we to search for historical versions, we would need to always visit the current page, perhaps located by a B-tree key search, and then follow the time split chain of pages back until we encountered a page containing versions in the time range of interest. This is ok for snapshots versioning, but is completely unacceptable for persistent versions, where the time split linked chain of pages can become very long. In [1,2], we described an index structure that we call the time-split B-tree (TSB-tree). This structure indexes the collection of time split and key split data pages that we have been describing. This will permit Immortal DB to index directly to the data pages that are needed to satisfy an as of query. The key to making this index efficient is to store copies of a version of a record that lives across a time split in both of the pages resulting from the time split. This improves access performance while only modestly increasing the number of copies of versions.
4. Summary and Status
Immortal DB supports both persistent versions and snapshot versions. The requirements for tables supporting these versions differ. We summarize some of these differences and their implications here.
Our Immortal DB prototype currently supports as of transactions and snapshot transactions against both persistent versioned tables and snapshot enabled tables and mixes of the two. The prototype provides time and key splitting so that the number of records and historical versions can grow. The prototype does lazy, post commit timestamping. It garbage collects both the TTT and snapshot versions that are too old. Database concurrency control and recovery are fully integrated with and use the Sql Server mechanisms.
Bibliography
The following papers describe work related to transaction time databases. The papers in green were done within the Immortal DB project.
|