DRAFT -- August 12, 1988 11:41:43 am PDT -- DRAFT Internal Memo XEROX To From Distribution Bob Hagmann PARC/CSL Subject Date Yggdrasil Document Update August 12, 1988 Introduction This memo sketches the overall philosophy and design of the data structures and algorithms used to read, modify, and create documents. This memo reflects the code, not the desired design. The big picture Stable and volatile forms of the document Documents are stored on disk (magnetic or optical). This is called the stable form of the document. This is the last committed version of the object (except for modifications held in the log). The document is flattened into a byte stream and written to some number of files. Eventually, small documents will be written to slots of pages, but currently no objects share pages or use slots. A document (usually) has allocated secondary storage. Where this document is located is found via the DID Map. The location of the parts of a document are mostly unchanged during update, but (of course) are changed by creation, deletion, extension, or contraction of the contents or attributes of the document. The location information in the DID Map is for all versions and alternatives (not implemented yet) of the document. Think of it as only being for committed transactions. When a document is referenced by a transaction, a volatile form of it is created a (VDoc). This is a data structure where the stable form of the document has been unflattened (as needed). If the document has previously been referenced, the volatile form may have already been created and cached. Multiple volatile forms of the document may be current in the system. Caching Committed VDocs are cached under the DID and the null transaction identifier. Whenever a document is referenced by a transaction, the cache is examined. If the document is not found for the transaction, then all the parents of the transaction are searched in order. Finally, the null transaction is tried. If any of the parents or null transaction hit, then an entry in the cache is created for the DID and transaction identifier based on the found VDoc. Below, we will call the initial VDoc as the base form of the VDoc. If nothing hits, the DID Map is examined to find the document on secondary storage, and the disk is read and the data parsed to build a cache entry. Intentions As updates occur, they are added to the intentions lists of the VDoc (intentions lists are also called changes). Whenever a VDoc is examined, the base form of the VDoc is viewed through the intentions  that is, it appears that the base form always has the intentions applied before any question about the document is answered. As a subtransaction commits, its intentions are promoted to the parent (sub)transaction. If a subtransaction aborts, then the intentions are thrown away by removing the VDoc from the cache. Interaction with the Camelot transaction layer Yggdrasil provides a layer over the Camelot transaction layer. Any system that can communicate with Yggdrasil can start, abort, or commit transactions; and can use a transaction with other systems that use Camelot. However, we restrict the start, abort, and commit to always be via Yggdrasil. This is to give Yggdrasil a chance to apply its intentions before Camelot transaction commit. Subtransaction commit and abort are easy. Let's only consider top level transactions. As a transaction runs, it can start to finish in one of two ways: it can start to commit or start to abort. The abort may happen due to Camelot (e. g., via the detection of distributed deadlock) or by a client calling Yggdrasil requesting abort. This kind of abort is easy. Just destroy the VDocs for the transaction and do other cleanup. A commit is much more complicated. When Yggdrasil is told to commit a transaction, it first enters its internal PreCommit phase. The various parts of Yggdrasil that maintain intentions lists are called in a carefully determined order (the VDocs are not the only part of the system that are maintained using intention lists). Each part applies the intentions for the transaction. The intentions for the VDoc are applied and are then cached under the DID and null transaction. (Recall that the DID and null transaction are the key for the last committed value of a document.) Recoverable storage maintained by Camelot is now updated. Yggdrasil then calls Camelot to commit the transaction. We expect that Camelot will normally commit (why not? single site it should be obvious to commit). It Yggdrasil is asked to vote, it will vote commit. If Yggdrasil is asked to vote on a transaction that it has not PreCommited, it will vote no. If Camelot informs Yggdrasil that the transaction commits, then various clean up will have to be done (e. g., we may have latched the VDoc for the DID and null transaction to prevent a concurrent update (say for updating the inlinks while updating the contents)). If abort occurs, then the undo log is replayed and recoverable storage is returned to its proper state. Then, all modified VDocs for the transaction are destroyed. When needed, they can be rebuilt from the stable form of the database. This means that the intentions only have to be for "redo" operations, and cannot be undone. Some clean up will also have to be done.