*
Quick Links|Home|Worldwide
Microsoft*
Search for



Immortal DB: Built-in Transaction Time Database Support

People

David Lomet

Roger Barga

1. Introduction

There has been research in temporal databases within the database community for around twenty years. Two forms of temporal functionality have been identified.

  1. Valid time functionality: valid time is the real world time at which some data recorded in the database became true (and at what point it was no longer true).  For example, John Smith started work at XYZ Corp. on April 1, 1994.  He may have left the company on June 27, 2001.
  2. Transaction time functionality: transaction time is the time at which information is posted to the database.  In our example above, John Smiths original record might not have been posted until the end of the quarter, e.g. June 30, 1994. 

Frequently, the valid time and the transaction time are very close, perhaps identical.  For example, the transaction time for an on-line purchase of a book at Amazon is also the valid time for the sale of the book.

The basic idea of temporal databases is to maintain multiple versions of data in the database.  Our focus in the Immortal DB project is transaction time databases, so we will describe the functionality in these terms.  Each transaction, when it enters data into the database, will place with the data a timestamp T1 of when the transaction occurred, which indicates the beginning of the lifetime of the data.  When that data is changed, a new version of the data is put into the database, also marked with a timestamp T2 indicating its start time and the old version is marked (perhaps implicitly) as having an end time of T2.

There are interesting user scenarios in which temporal (and specifically transaction time) database support is useful.

  1. For auditing purposes, a bank may find it useful to keep previous states of the database to facilitate checking that accounts balance correctly, and to provide customers with a detailed history of their account.
  2. A retailer may want to keep versions of the sales transaction data to answer queries about changes in inventory, and may want to mine that data for sales trends.
  3. A system admin, perhaps the user of a desktop system, or some tool working on behalf of the user, may find it useful to examine the historical states of the software and hardware configurations on the system.  This may aid in diagnosing problems, and could facilitate simultaneously supporting software applications that require different versions of DLLs or different settings of system parameters.

Several database systems, including SQL Server, provide what is called snapshot support, essentially transaction time functionality limited to the very recent past.  This provides a form of transaction isolation called snapshot isolation, in which reads are not blocked by concurrent updates because the reader reads a recent version instead of waiting for access to the current version.   But there are no commercial database systems that support temporal functionality beyond snapshots.  Part of the problem is that it is difficult to layer temporal support on top of an existing database system.  A layered implementation is cumbersome and typically has bad performance.

The Immortal DB project aims to provide transaction time database support built into SQL Server, not layered on top.   Our intent is to demonstrate that we can support transaction time functionality in addition to snapshot functionality, and that we can do this with good performance.

Immortal DB meshes well with another project of the Database Research Group called MT-Cache (MT = middle-tier).  MT-Cache implements a database supported cache with user specified currency.  A limitation is that if only part of a query can be answered from the cache, but another part requires going to the back end database source, then the entire query, in order to be transaction consistent, is executed at the backend database source.  With Immortal DB, the cache contents are describable as a version with an as of time.  One can query the backend database as of the same time for only the non-cached portion, producing a result that is transaction consistent with the cached data.  This increases the utility of the cached data.

There are two aspects to providing snapshot support that are of great significance in guiding transaction time database implementation and in determining its performance(1) nature of timestamps; and (2) how versions are stored.  In Section 2, we outline how snapshots are typically provided.  This will reveal why it is difficult to extend that approach to full transaction time support.  In Section 3, we describe how Immortal DB implements these aspects to provide transaction time support, how it can also support snapshots, and why our performance for both should be excellent.  We conclude with a brief summary in section 4.  While important for full transaction time functionality, we do not discuss query processing considerations. What we describe here is our current focus, which is on as of queries, i.e. SQL select queries that are qualified with an as of time- hence querying an historical version of the database.

2. Conventional Snapshot Approach

2.1 Timestamps

With transaction time databases or snapshots, it is important that the timestamp used to delineate version begin and version end be consistent with transaction serialization order.  Typically, snapshot implementers rely on the commit list technique, which represents the set of transactions that have committed prior to the start of a snapshot transaction, and hence should be visible to the snapshot transaction. Transactions are identified by a transaction sequence number (TSN) that is stored in the versions maintained by the system. A commit list could be implemented by keeping track of the TSN of each transaction that has committed thus far, hence producing a very long list. To reduce the commit list to a manageable size, the list of explicit transactions whose commit status is maintained is bounded both with an upper and lower bound.

  1. The lower bound LB is the largest TSN for which all transactions with smaller have terminated.  Hence the snapshot transaction can see all these earlier transaction updates.
  2. The upper bound UB is the smallest TSN of transactions all of which have not yet committed.  This can be one larger than the TSN for the snapshot transaction itself.

For each snapshot transaction, the system maintains an explicit list of transactions with TSNs in (LB,UB), sometimes as a bit vector indexed by TSN (or TSN-LB).  This list enumerates the transactions between LB and UB that have already committed, and hence whose updates should be visible to the snapshot transaction.  Alternatively, the list can be of those transactions that were active (uncommitted) at the time the snapshot started and hence whose updates should not be visible.

The system must maintain the list of transactions that are active at any time.  The snapshot transaction X, to get its timestamp, scans this list of active transactions.  All transactions with TSNs smaller than the smallest TSN among active transactions have terminated, and if committed, will have their TSNs posted to the record versions they updated.  Thus this smallest TSN -1 becomes LB for X.   All active transactions are not visible to X and are included in its list (or excluded, depending on the technique used).  All transactions with TSNs larger than the TSN issued (at this time) for X are uncommitted, and hence this TSN+1 becomes Xs UB.

The major advantage of the commit list approach is that it avoids having to return to records updated by a transaction to post a transaction time.  It also avoids having to maintain a long term connection between TSN and real time, which can require the updating of a persistent table that maps each TSN to real (commit) time.

The commit list approach is not, however, suitable for timestamping when versions are kept for extended periods of time and when general as of queries can be asked.  Essentially, only snapshot queries are given timestamps (their commit lists).  Thus, when an as of query is requested, and it encounters a version marked with a TSN, there is no way to answer whether this version is the one desired or not.  Specifically, there is no way to map from the as of time to a commit list. To support as of queries, we need to be able to find the latest version of data that is earlier than the as of time.  The TSN stored in the record, which indicates the transaction that updated it, only provides the order in which transactions were initiated, not transaction serialization order. 

2.2 Maintaining Versions

The usual way of storing versions for snapshots is to provide a separate storage area for the historical version.  Thus, the storage area for the current database is unchanged, and ordinary transactions (as opposed to snapshot transactions) need not even be aware of the historical versions.  There is an added cost to separating versions in this way however.  We discuss these costs below in terms of one common way of providing an historical-version store.

A convenient way to provide an historical-version store is to define for each snapshot enabled table, a history table as well.  When an update occurs, the existing committed version of an updated record is inserted into the history table.  Then the version is updated by the updating transaction.  Note that this requires that there be two database operations per update, one for the history table and one for the current table.  This doubles the number of log records written.  Further, these updates are in two different disk pages, so twice the number of pages is made dirty (in need of flushing) than is the case when snapshots are not being supported.

When executing a snapshot query, we will be looking for the latest committed transaction earlier than the start time for our snapshot.  When we visit the current table and examine the version of the record there, if that version is either uncommitted or is later than our snapshot time, we will need to visit the history table as well.  However, we can never visit only the history table (a current table visit is always required) because we need to ensure that the current version is NOT the version we need.

So, while maintaining separate tables for current and historical versions of data is a simple and clean implementation approach, it can substantially increase the overhead for both maintaining the versions and for accessing them during a snapshot transaction.

3. Immortal DB Approach

3.1. Timestamps Using Real Transaction Time

When each transaction commits, Immortal DB will give it a timestamp, i.e. a clock time, which is consistent with the transaction serialization order.  Initially, this time will be the transactions commit time, but if a transaction requests to see its time during the transaction (a possibility with SQL and its CURRENT TIME function), then the time may have to be earlier [5].  (Our prototype does not currently support this, but we know how to do so.)  Unlike commit lists, timestamps directly support querying versions at arbitrary times in the past.  Further, it is possible to index on time, or on key and time, so as to be able to access the appropriate versions directly.  We are working on an index structure (based on [1,2]) that indexes versioned data. 

The big advantage of commit lists is that the data records can continue to contain merely their TSNs, given when each transaction starts.  But timestamps are not known until much later.  This necessitates that we revisit updated records to timestamp them appropriately with the timestamp for the updating transaction.  So we originally timestamp records with TSNs, exactly as with commit lists.   But we eventually replace the TSNs with timestamps when they are known.  Our Immortal DB prototype maintains a <TSN, timestamp> mapping in a Transaction-Timestamp table (TTT), which is a persistent table that is updated by each transaction as it commits.  This stably records the mapping so that we can lazily revisit the updated records after the transaction has committed. 

Our prototype reference counts not yet timestamped records of a transaction.  When a transactions reference count goes to zero, this permits us to eventually remove its entry from the TTT, hence keeping this table small.

3.2. Maintaining Versions with the Current Data

Rather than providing a separate store for the historical versions for tuples (records) of a table, we include historical versions in the current page so long as that is possible.  When the current page fills up, we can either do a key split, as we would in a conventional B-tree, or a time split.  The result of a key split is that we go from one to two pages that hold both current and historical versions of records.  When we do a time split, all versions prior to the split time go to a historical page, while the versions that are valid after the split time remain in the current page.  Importantly, all committed versions of data are in the current page.

The approach of using the current page to store historical versions means that many updates can be done with a single page access.   In addition, we only log the update once, as a change to the current version.  The logging needed to keep of old versions is reflected in the operation that we associate with the logged update.  No second log record with redundant information is needed in this approach.  Occasionally, we need to do either a key or a time split.  Splitting activity is a bit more frequent than is the case without version support, but the extra writes can be amortized over multiple updates.  There is not an extra write and perhaps read per update.

When we do a time split of a current node, we include in that node a pointer to the historical node being formed by the split.  In addition, we include with that pointer the split time we used to divide the versions between current and history pages.  When we access versions during a snapshot transaction, we can tell by looking at the split time in the current page whether we will find the version of interest in the current page or that we need to access the historical page.  Most of the time, we expect to find the desired version, even if it is not the current version, in the current page.  (Occasionally, we will need to access the historical page.)  This greatly increases the access efficiency of snapshot queries.

3.3. Indexing Versions

As described above, were we to search for historical versions, we would need to always visit the current page, perhaps located by a B-tree key search, and then follow the time split chain of pages back until we encountered a page containing versions in the time range of interest.  This is ok for snapshots versioning, but is completely unacceptable for persistent versions, where the time split linked chain of pages can become very long.

In [1,2], we described an index structure that we call the time-split B-tree (TSB-tree).  This structure indexes the collection of time split and key split data pages that we have been describing.  This will permit Immortal DB to index directly to the data pages that are needed to satisfy an as of query.  The key to making this index efficient is to store copies of a version of a record that lives across a time split in both of the pages resulting from the time split.  This improves access performance while only modestly increasing the number of copies of versions.

4. Summary and Status

Immortal DB supports both persistent versions and snapshot versions.  The requirements for tables supporting these versions differ.  We summarize some of these differences and their implications here.

  1. Snapshot versions
    • Versions only exist for a limited time: We need to garbage collect old versions that can no longer satisfy a snapshot query. Access to historical versions by first accessing a current page is acceptable because the number of versions for any record is small.
    • Versions need not persist across system crashes: We can forget versions create prior to a crash. No snapshot transaction will persist across a crash.
  2. Persistent versions
    • Versions need to persist for an extended time: We might not have to garbage collect old versions. Because of the sizable number of historical pages, we need an index directly to the page to avoid the costly overhead of searching back along the time split list.
    • Versions need to persist across system crashes: We need to maintain the association between TSN and timestamp in the transaction timestamp table (TTT). To keep this table small, we reference count its entries.

Our Immortal DB prototype currently supports as of transactions and snapshot transactions against both persistent versioned tables and snapshot enabled tables and mixes of the two.  The prototype provides time and key splitting so that the number of records and historical versions can grow.   The prototype does lazy, post commit timestamping. It garbage collects both the TTT and snapshot versions that are too old.  Database concurrency control and recovery are fully integrated with and use the Sql Server mechanisms.

Bibliography

The following papers describe work related to transaction time databases. The papers in green were done within the Immortal DB project.

  1. Lomet, D., Vagena, Z., and Barga, R. Recovery from "Bad" User Transactions. SIGMOD Conference, Chicago, IL (June 2006) (to appear) pdf, .150MB
  2. Lomet, D., Barga, R., Mokbel, M., Shegalov, G., Wang, R., and Zhu, Y. Transaction Time Support Inside a Database Engine. ICDE Conference, Atlanta, GA (April 2006) PDF, .30MB
  3. Lomet, D., Snodgrass, R., and Jensen, C. Using the Lock Manager to Choose Timestamps. IDEAS Conference, Montreal, Canada (July 2005) 357-368. PDF, .27MB
  4. Lomet, D., Barga, R., Mokbel, M., Shegalov, G., Wang, R., and Zhu, Y. Immortal DB: Transaction Time Support for Sql Server. SIGMOD Conference, Baltimore, MD (June 2005) 939-941. pdf, .283MB
  5. Salzberg, B., Jiang, L., Lomet, D., Barrena Garca, M., Shan, J., Kanoulas, E. A Framework for Access Methods for Versioned Data. EDBT Conference Heraklion, Crete, Greece (March 2004) 730-747. PDF, .237MB
  6. Jiang, L., Salzberg, B., Lomet, D., and Barrena, M. The BTR-Tree: Path-Defined Version-Range Splitting in a Branched and Temporal Structure. SSTD Conference. Santorini, Greece (July 2003) PDF, .268MB
  7. Jiang, L., Salzberg, B., Lomet, D., and Barrena, M. The BT-Tree: A Branched and Temporal Access Method. VLDB Conference Cairo, Egypt (Sept. 2000) 451-460 postscript,.559MB
  8. Jensen, C., and Lomet, D. Transaction timestamping in (temporal) databases. VLDB Conference, Rome, Italy (Sept. 2001) 441-450 PDF,.12MB
  9. Lomet, D. and Salzberg, B. Transaction-Time Databases. Chapter in Temporal Databases: Theory, Design, and Implementation A Tansel et al eds., Benjamin/Cummings(1993) PDF, .14MB
  10. Lomet, D.B. and Salzberg, B. The Performance of a Multiversion Access Method. SIGMOD Conference, Atlantic City, NJ (May 1990) 353-363. PDF,.87MB
  11. Lomet, D. and Salzberg, B. Access methods for Multiversion Data. SIGMOD Conference, Portland, OR (May 1989) 315-324. PDF,.85MB

©2008 Microsoft Corporation. All rights reserved. Terms of Use |Trademarks |Privacy Statement