DRAFT -- December 1, 1987 -- DRAFT
Internal Memo
ToFrom
DistributionBob Hagmann
 PARC/CSL
SubjectDate
Yggdrasil Overview December 1, 1987
Introduction
Xerox corporate strategy requires multimedia, large scale, unstructured, robust, and efficient servers for the storage and retrieval of documents. Yggdrasil is a project in Xerox's Palo Alto Research Center, Computer Sciences Laboratory to address many of the research issues needed to provide such a service. This research is needed for Xerox to understand how to implement a filing strategy.
Many organizations require the more or less permanent storage of information. Over the years, there have evolved three types of electronic services that have satisfied parts of the problem: databases, file systems, and file servers.
Each of these systems have their strengths and weaknesses. Over time, the problems and technologies change. What was appropriate at one time is not a good solution at a different time.
The cost of secondary storage has continued to decrease over the years. In addition, the range of size of individual objects that we wish to store has grown (one storage object in CSL is a 160 megabyte file). We are not always satisfied with the simple yet elegant hierarchal naming systems such as we have in file systems. Nor are the concepts of relational database systems a panacea: it is missing several features that make it difficult for it to model some applications (e.g., good solutions for repeating groups, hypertext, stored procedures, triggers, transitive closure, large objects, ...). Object-oriented databases are a current research topic, but significant problems arise in their performance, locking, security, evolution, archiving, execution model, robustness, as well as other problems.
Although much of the research in database systems is involved with increasing the semantics of the system, many of the potential clients of these systems simply want data storage. They want the database to ``store the bits and get out of the way.'' Semantics will be provided by the applications. The database must be fast, efficient, and robust.
There seems to be a good deal of overlap between the needs of several groups to store and retrieve documents. The needs include the storage of documents of vastly varying sizes, the naming and indexing of some of the documents, the interconnection of (some of) the documents, and notification of significant events of documents.
The following is a partial list of CRG projects, proposals, groups, and needs that could be served by a Document Server.
f FolioPub project at WRC
f ERSG proposal at WRC
f VLSI DA (CAD) group in CSL
f Voice project in CSL
f Software storage for the Cedar Programming Environment in CSL
f Storage of scanned images
f Distributed Notecards in ISL
f Large capacity and high performance file server
f Mail storage
f TIC support, both at WRC and PARC
Xerox also has the current focus of ``document processing.'' Central to most (if not all) document processing systems is the storage and retrieval of documents.
We are interested in the subclass of documents called electronic documents; these are ``documents'' that can be handled in some way via computers. Although there is no precise definition of ``electronic document,'' we believe that a stored electronic document should have (some of) the following features:
f Primitive elements include plain text, structured text (e.g., a Tioga or Viewpoint file), scanned image, mail message, digitized voice, digitized video, and compressed versions of the above. This should be an extensible list.
f A document is a primitive element with an optional ordered collection of documents. Note that is a recursive definition.
f Documents have attributes. Some attributes are interpreted and some are uninterpreted. Keywords may be distinguished attributes.
f Documents may be interconnected with links. Links have names. Links are also documents.
The basic idea is to build an ``Interim Document Storage and Retrieval System'' called Yggdrasil. This system will be the central code of several server systems. Yggdrasil could have front-ends that made it looks like some or all of an XNS File Server, an OSI File Server, an NFS File Server, or a native Yggdrasil server. A single server could export multiple interfaces over the same ``Document Base'' if they are compatible, possibly allowing only restricted or stylized access from the simpler protocols. Yggdrasil should be configurable with various secondary storage devices to meet a variety of requirements for response time, storage capacity, crash recovery, availability, archival storage, backup, and robustness.
We in CSL are interested in providing the basic storage and retrieval system, together with its associated administrative functions. We hope to cooperate with other groups to make this a useful system. Other groups could provide, for example, an automatic keyword generation, selective dissemination of information (SDI), inverse typesetting, document capture (e. g., scanning), and other facilities.
The name Yggdrasil was chosen because of its definition:
Ygg|dra|sil n. Also Yg|dra|sil. Norse Mythology. The great ash tree that holds together earth, heaven, and hell by its roots and branches. [Old Norse, probably the horse of Yggr'' : Yggr, name of Odin, from yggr, variant of uggr, frightful (see ugly) + drasill, horse.]
The rest of this memo outlines the ideas for the system. For compact presentation, an outline in bullet form is used.
Basic Features
f large number of documents
f documents can be large
f documents can be small
f documents ``types'' understood by the system include text, bitmaps, digitized audio, digitized video (?), graphics, ...
f documents ``types'' are extensible
f documents can be connected via links (hypertext)
f documents can be named via a ``hierarchical'' name system
f documents can have attributes and keywords
f documents are grouped into contexts called containers
f keyword and other indices maintained per container
f support for versions and alternatives
f data compression and decompression
f support for archival storage
f alerters (send a network message when a system event occurs)
f query language and query optimizer
f page level access within a document
f transactions
f robust
f good performance
f fast recovery
f good availability
f hooks for multi-server and foreign server support
Hardware Base
f > 3 MIP CPU (or much more)
f > 8 Megabytes of main memory possible
f e 2 Megabytes of main memory minimum
f e 32 Megabytes of main memory expected
f Good communications hardware
back-to-back packets
sufficient bandwidth (10 megabits/second or much more)
f SCSI bus (or other way to connect efficiently to a large amount of storage)
for WORM optical and magnetic hard disks (if WORM optical not on a separate server)
f Online archiving (optional)
1) WORM optical disks, most likely with a jukebox
might be accessible over the net as an ``archive server'' instead of connected by a bus
2) Other archival device with appropriate size and speed
f Magnetic hard disks
possibly in large numbers even if WORM optical disks are used!!
5-20+ Gigabytes
varying capacity and performance
mirrored hardware for the log
f Plan for read/write optical disks
will not perform as well as magnetic hard disks (at least at first)
f Plan for stable main memory
battery backed RAM
f Flexible design to accept new storage and communications technology
Operating System and Programming Environment
f Supports sharing and concurrency
f Good existing communications
supports protocol(s) needed for server
support likely for new protocols
f Reasonable programming environment support
Cedar seems to be a clear choice
f Server should be able to run under
1) SUN OS
2) Cedar on a Dorado
3) Cedar on a Dragon
High Level Protocols
f ``Native'' protocol with (eventual) full support for data model
f Build server so that a protocol veneer can be built later to handle (at least)
XNS Filing
ANSI/CCITT/OSI/... Filing and Retrieval
NFS
If the system talks Pup, then consider Pup FTP
Data Storage
f The system will implement a storage hierarchy including some or all of these levels
memory
stable main memory (battery backed up DRAM)
magnetic hard disk log
magnetic hard disk buffers
magnetic hard disk permanent storage
WORM optical disk jukebox
WORM optical disk backup or tape backup
f Log based recovery
f Use group commit and/or stable main memory for the log tail for high update load (if needed)
f All naming, indexing, buffering, linkages, etc. are saved and updated on magnetic hard disk (or other fast secondary storage)
f Delay processing (e.g., indexing)
accept ``image'' quickly; index at leisure
Ò stores quickly and avoids conflicts
f Large object support
f Hints for data storage (clustering)
f ``Real time'' support -- at least enough for voice
f Archival storage possibilities
a) write to archival device when not referenced recently (current design)
b) write to archival device ASAP but when convenient
c) write to archival device preserving time order of documents (possibly do to legal requirements)
f Plan for new technologies and changes in old technologies
XDM might replace WORM optical disk
optical tape
digital video tape
f Key is to build a framework for varying performance storage devices; the actual devices might change
Data Model
Objects
f An object is either a document or a link
f All objects (both documents and links) can have ``contents'' as well as attributes (a property list)
f Links have both source and destination objects
f Links are first class objects. They can have ``contents'', attributes, and be the source and destination of other links
f A link has a ``type''
the ``type'' is just a character string
any number of links of the same ``type'' on a node, but they are unordered
f Links point at objects. To get finer granularity, attributes may be used in an application specific way.
Naming of documents
f Hierarchical directories
a slash separated path name (``[]<>'' format accepted and converted)
loops allowed, but a server specific evaluation depth will be imposed
directories have as members
directories
hard links to other directories
symbolic links
f Ordered or unordered children links
document can have subdocuments (children); there is an order to the children
f Not all objects have names
Containers
f An object belongs to some object, its container
f The container does not have to have a name
f The containment graph describes the container relation
Versions and alternatives
f Objects can have versions and named alternatives
f Directories have numbered versions and named alternatives
f A partial name looks like objectName%alternative!version
``/Luther.alpine/Hagmann.pa/Buffering%FCFS!5/TechnicalReport.tioga!3'' is a valid file name
f Versions and alternatives are maintained for both named and unnamed objects
f A forward or backward ``delta'' technique is used to conserve storage
f Minor versions, called revisions, are kept until the next version
Attributes
f May be associated with an object (document or link)
f An attribute has a value
f A value is an unordered or ordered set of values or a primitive document element
f Interpreted attributes may be used in queries (e. g., keywords)
Keywords
f Similar to attributes, but are distinguished
f Keywords have a string name
Indices
f Maintain indices for attributes and keywords in the containment graph at the levels specified by the "owner" of the subtree
f An index at a level replicates data of indices at lower levels
Updates
f Primitive navigational updates (e.g., link A to B, add attribute C of type D value E to F, remove G, store document H)
f One server interaction can create an object, set its contents, set its attributes, and create links from/to it.
Alerters
f Simple triggers on insert, delete, read, or update
f Trigger queues a message for some (probably) external server
f Uses for alerters:
screen refresh due to update of a document being viewed
cache maintenance on client machines
triggering of services (e. g., automatic indexing of a new document)
Queries
This is only partly specified at this time.
f A query results in a set or an enumerator (a list of dynamically constructed handles) of the matching objects. Each object may be examined further by the code under the enumerator (e.g., the attributes may be examined or the document read).
f A subquery is specified by object-selection and a match-condition, which yields a new subquery.
f A primitive subquery is specified by name-pattern and a match-condition.
Transactions and concurrency control
Locking
f a document (object) is the natural unit of locking
f locks come in (at least) three flavors
browse lock: the weakest of all the locks. Can break when a stronger lock is granted or a transaction commits that holds a write lock. Useful for alerters.
read lock: locks the object so that it cannot be modified
write lock: locks the object for modification.
Transactions
f they are supposed to be short
Security
f Use physical security. Don't bother to encrypt the disk.
f Use ``secure'' RPC, or other ``secure'' protocols.
f Access control lists permit access.
f Restrict access via naming, indexing, and links.
Administration
f Implement historical logging and monitoring
f Have hooks for debugging
f Backup
f Restrict access via naming, indexing, and links.
f Garbage collection via delete
Real system
f Standards
authentication
protocols
secure RPC
character sets
f Replication and availability
all committed data always stored in two stable places
multiple CPU's per server for availability and load balancing
redundancy using mirrored buses, communications networks, controllers, and disks and/or dual porting of controllers and disks
parity disk trick to mask failure of a disk drive
mirror the log
f Robustness
f Fast recovery
recover the server in under two minutes after boot completes if
1) there is power
2) the net is up
3) there are not multiple severe failures
f Performance
performance, performance, performance