DRAFT -- March 22, 1988 2:38:20 pm PST -- DRAFT
Internal Memo
ToFrom
DistributionBob Hagmann
 PARC/CSL
SubjectDate
Yggdrasil Implementation: Quick and Dirty March 22, 1988
Introduction
This memo addresses the issue of how to implement phase zero of Yggdrasil. In this plan, we attempt to get a minimum function server usable at an early date. To a large extent, we use existing software to provide the bulk of the functionality.
In particular, we would use the UNIX file system to store the data. We defer many of the more interesting aspects of the system, and try to get early users. We provide a basic hypertext system with few of the frills. This system should be useful to NoteCards.
Alternatively, the use the UNIX file system can be viewed as scaffolding to build the higher layers of the system. This decouples the storage layer from the basic upper layers proposed here. We can proceed down both paths at the same time.
Features needed in Phase 0
Below are the features of the system as taken from the overview memo. In strikeout are the features that are to be omitted in phase 0.
f large number of documents
but, no archival storage in Phase 0
f documents can be large
f documents can be small
but, they all are stored in files so there can be lots of fragmentation!
f documents ``types'' understood by the system include text, bitmaps, digitized audio, digitized video (?), graphics, ...
f documents ``types'' are extensible
f documents can be connected via links (hypertext)
there are no references and there is only historic and foreign links (no version links)
f documents can be named via a ``hierarchical'' name system
cycles are not allowed
f documents can have attributes and keywords
f documents are grouped into contexts called containers
cycles are not allowed in the container DAG
containment and children are identical
f keyword and other indices maintained per container
f support for versions and alternatives
f data compression and decompression
f support for archival storage
f alerters (send a message when a system event occurs)
f query language and query optimizer
f page level access within a document
f transactions
f robust
f good performance
f fast recovery
f good availability
f access control
f hooks for multi-server and foreign server support
Overall Implementation Plan
The goal of Phase 0 is to get up a running server with a restricted interface, modest performance, poor recovery time, modest availability, and poor fragmentation. Phase 0 of Yggdrasil will sacrifice many of the interesting systems features of the project. However, the server would support the basic hypertext, naming, and indexing that are the key features of the data model. When the server runs in this mode, then it is possible to put up a server for a few clients. By putting up a server early, we hope to be able to measure its use, obtain client feedback, and build a history of success in the project. In any case, this approach allows us to scaffold up the higher levels of the system on available pieces of software.
The essential idea is to get maximum leverage out of the UNIX file system. We would run under SUN OS and use the "Fast File System." Reserved files would be used for key system data.
The logging code of Yggdrasil would be the same. It would log to a UNIX file, using the fsync system call to force the log file.
The directory system of the UNIX file system is used extensively. A directory is a container. Sub-containers are directories in the container's directory. Specially named files in the directory have the contents ($contents), index ($index), links ($links), and properties ($properties) files. Objects directly contained in the container appear as file triplets: a unique name followed by one of three extensions. The extension contents contains the file's contents, the extension links contains the file's links, and extension properties contains the file's properties. Either or both of the links and properties files may be missing if they are empty.
Links are encoded using the file name of the "from" or "to" object. If the object moves, then all links to it must be updated.
It will not be possible to find out in which containers an object resides.
A typical object will take three (3) disk pages. If (when) this is a problem, we could merge the three files into one file for the simple cases (e. g., if each was < 1/3 the size of a file block, then use a fixed allocation scheme). Conversion to the general case would have to be automatic.
Large containers are a problem. The UNIX file system searchs a directory linearly, so finding a particular entry can take a long time.
Advantages and Disadvantages of this Approach Verses Previous Approach
Advantages
f early clients/client feedback
f apparent progress
f lots of shared code with final system
f implications for System 33 (indexing and storage)
f flexible about next stage (archival storage, better file layer, more of the data model, ...)
Disadvantages
f doing this may prevent us from later "doing it right"
f bad fragmentation (memory and disk)
f modest performance
using the directory system is a bad idea due to UNIX's linear search
only 64 file descriptors Ò "page" the open files
links as file names means lots of directory searches
multiple files per object (Ò? small objects encoded into one file)
if an object moves, then all links to it must be updated
f only basic hypertext data model
f backup, crash recovery, and other "systems" issues only partly satisfied
Parts of Cedar Needed (not already done?)
BTree
processes
RedBlackTree
File (FileStream or UXIO streams would be OK)
SafeStorage (Finalization)
Interfaces Needed by Cypress/Alpine (less those already ported, those with trivial use (e. g., wordsPerPage from PrincOps), or only needed by SkiPatrol, FTP, debugging, AlpineFSImpl, Cedar RPC generated files, directory implementations, or AlpineControlPanelImpl)
BTree
CountedVM
File
FileBackdoor (erase volume and root file finding)
GVNames (authentication)
PrincOpsUtils (LongCopy, LongMove, LongZero, BITAND)
Process (Pause, SecondsToTicks, Detach, Ticks, SetPriority, EnableAborts, SetTimeout, Seconds, MsecToTicks, GetCurrent, Milliseconds, DisableTimeout, Yield, CheckForAbort)
RedBlackTree (used in file page and log managers)
RefTab
RPC
RPCLupine
SymTab
UserCredentials (Get)
VM
Interfaces Needed by Cypress/Alpine (already ported)
Ascii
Atom
Basics
BasicTime
CardTab
DebuggerSwap
IO
List
Pup
RefTab
Rope
RuntimeError
SafeStorage
SymTab
Implementation factoring
Server Level 0
Disk.mesa and below
Use one big UNIX file for metadata storage. [Or use a few files.]
Low level communications support
SUN OS/Mach should provide this.
Runtime support
B-tree and other data structures from the SUN OS release or from Cedar.
Server Level 1
Æ Buffer manager
Use the UNIX file system cache!
Æ Log manager
Similar to Alpine. Use a UNIX file as the log. Open it with the "sync bit" set so that writes go directly to the file.
Æ Lock manager
Evaluate Alpine. Only need object (file) locks.
Æ Transaction manager
Evaluate Alpine.
Server Level 2
ÆÆ Object encoder/decoder
Similar to the page layer of Cypress. Only needed for metadata (links, properties, and indices). This is a major part of the effort. [To go to phase 1, the property and index code should be reuseable, while the link code will have to be changed to use DIDs instead of file names.]
Backup and backup recovery
Store everything in UNIX files and use UNIX backup. Live with the problems of inconsistencies. Bring the server down just before backup at night and back up after backup.
Container implementation
Build off of the UNIX file system hierarchical directories. A container is a directory. All directories found in this directory are children. No loops.
Crash recovery
Try to force the page cache out, but mostly loose here for data. Be careful for metadata and the log.
DID manager
Encode DID as prefix
Performance monitoring and historical/debugging logging
Fake it
Client
Client stub for Cedar and Interlisp (NoteCards)
Test code from Cedar client