Introduction and Overview
This memo covers the conceptual design for the data model for a "document server". The name for the system is Yggdrasil. Most of the data model covered in this memo has been discussed in a working group that consisted of, at various times, Danny Bobrow, Randy Gobbel, Jack Kent, Stan Lanning, Mark Miller, Randy Trigg, and the author. This memo's contents is a summary written solely by the author, and should not be construed as the unanimous opinion of the group.
Xerox also has the current focus of "document processing." Central to most (if not all) document processing systems is the storage and retrieval of documents. This system will focus on the retrieval of documents.
Yggdrasil objects are electronic documents. Although there is no precise definition of "electronic document", a Yggdrasil electronic document will have (some of) the following features:
f Primitive elements of the content of a document includes plain text, structured text, scanned image, mail message, digitized voice, digitized video, and compressed versions of the above. This is an extensible list.
f A document is a primitive element with an (optional) ordered collection of documents. These documents are its children. Note that is a recursive definition.
f Documents have attributes; some attributes are interpreted and some are uninterpreted. Keywords are distinguished attributes.
f Documents may be interconnected with links; links have names (types). Links are first class objects and can have contents and attributes.
Yggdrasil is a basic storage and retrieval system, together with its associated administrative functions. Yggdrasil will depend on its clients to specify keywords, attributes, names, links and the like. The amount of client code run on the server will be severely limited. Yggdrasil objects are data, not active objects. An Yggdrasil server can be used to emulate a file server, to store the data for our programming environment, and to store hypertext.
Yggdrasil requirements include the storage of documents of varying sizes, versions and alternatives, compression, archival storage, the naming and indexing of (some of) the documents, the interconnection of (some of) the documents, and notification of significant events of documents. Yggdrasil will support three types of retrieval. First, the naming system allows for finding objects (as is performed in file systems). Second, Yggdrasil will index keywords (and other attributes?) and optimize queries on them. Third, Yggdrasil will have links (as in hypertext exemplified by Notecards) between objects.
One of the goals of Yggdrasil is to make the server handle "large" amounts of data. Yggdrasil will have a storage hierarchy that will span main memory, rotating magnetic, and jukebox optical disks. The initial system will store about 100 Gigabytes, but the system will be designed so that it can support dramatically more storage and a variety of secondary storage devices. For large objects, page level access will be provided.
Yggdrasil must be efficient. It should stress the CPU, memory, I/O, and communications of the server. It also must be robust, have good availability (likely by using replication), and have hooks for multi-server and foreign server support.
Yggdrasil will support a query language to aid in the retrieval of documents. Transaction support will be provided. Locking will be at the granularity of a document.
Yggdrasil will be the central code of several server systems. Yggdrasil could have front-ends that implement different protocols. Likely choices for protocols are for an File Server, an OSI File Server, an NFS, or a native Document Server. A single server could export multiple interfaces over the same "Document Base" if they are compatible. Yggdrasil should be configurable with various secondary storage devices to meet a variety of requirements for response time, storage capacity, archival storage, backup, and robustness.
We in CSL are interested in providing the basic storage and retrieval system, together with its associated administrative functions. We have cooperated and hope to continue to cooperate with other groups to make this a useful system. Other groups could provide an automatic keyword generation, selective dissemination of information (SDI), inverse typesetting, document capture (e. g., scanning), and other facilities.
Historical Context
Many organizations require permanent (more or less) storage of information. Over the years, there have evolved three types of electronic services that have satisfied parts of the problem: databases, file systems, and file servers.
Each of these systems have their strengths and weaknesses. Over time, the problems and technologies change. What was appropriate at one time is not a good solution later.
The cost of secondary storage has continued to decrease over the years. In addition, the range of size of individual objects that we wish to store has grown (one storage object in CSL is a 160 megabyte file). We are not always satisfied with the simple yet elegant hierarchal naming systems such as we have in file systems. Nor are the concepts of relational database systems a panacea: it is missing several features that make it difficult to use to model some applications (e.g., good solutions for repeating groups, hypertext, stored procedures, triggers, transitive closure, large objects, ...). Object-oriented databases are a current research topic, but significant problems arise in their performance, locking, security, evolution, archiving, execution model, robustness, as well as other problems.
There seems to be a good deal of overlap between the needs of several groups to store and retrieve documents. The needs include the storage of documents of varying sizes, the naming and indexing of some of the documents, the interconnection of (some of) the documents, and notification of significant events of documents.
Data Model
Objects
Links and objects
An Yggdrasil server contains a collection of objects. Each object has three parts: a contents, a set of attributes, and an ordered set of children. There is special kind of objects called links. They have a "link type" (a string that stands for a name) and "references" to source and destination objects (see below). Links encode some kind of semantic connection between two objects of the variety of the link type. We call a pointer to an object a reference (see below).
Contents
The contents of a object and the primitive value of an attribute are primitive elements. Primitive elements include references to objects, integers, infinite precision integers, floating point numbers, booleans, dates, user identifiers (e. g., Grapevine or Clearinghouse names), plain text, structured text of known formats, scanned images, mail messages, digitized voice, digitized video, uninterpreted byte arrays, and compressed versions of the above. This is an extensible list. Primitive elements are actually a pair: the type and the value. However, the system attaches no semantics to types that are not well known.
Children
Children form an ordered or unordered set of objects attached to another object. Whether the children are ordered or not is a property of the object. For ordered children, Yggdrasil will provide for processing this set in order and for indexed access to elements (e. g., access the fifth child).
Attributes
Any object can have attributes. This is a set of pairs. The first field of the pair is the "name" of the attribute and is called the attribute name. The attribute name is specified as a string. Yggdrasil does not guarantee that a particular attribute name is unique for an object (but higher level software may make this guarantee.)
The second field of the pair is the "value" of the attribute and is called the attribute value. An attribute value is a set of name-value pairs. The name is called the field name and is specified as a string. Null is used when there is no field name (a common occurance). Field names are not necessarily unique within an object. The value part of the name-value pair is a set of typed primitive elements. A typed primitive element is a type and primitive element pair (primitive elements were discussed above).
Some types are interpreted and some are uninterpreted. That is, some types are understood by the server. Type dependent operations can be performed on them (e. g., testing whether two dates are within a week of each other.) Uninterpreted types only support default operations. The only default operation may be byte comparison for equality.
Attribute names starting with "$" are reserved for system use, but are available for read-only access via normal attribute operations. $directoryContents, $contents, $children, $inlinks, $outlinks, $alternative, $keepRevisions, and $keywords are interesting reserved attributes.
Ordered and unordered sets
The notion of set occurs in several places in this data model. Often the set is unordered. Sometimes sets may be ordered or unordered, with selection of ordering is a property of the set (or object). The term "ordered set" is used instead of "list" because "list" has certain implementation connotations.
It is intended that unordered or ordered sets can be sorted by the server, producing a new ordered set that can be used (further) in query processing or stored on the server. However, this is beyond the scope of this memo.
Creating and deleting objects
Yggdrasil will provide means to create and set the contents, attributes, and links of objects. Explicit object delete will be provided. Yggdrasil will not support "garbage collection".
References to objects
References are a primitive element. They are pointers to objects. They do not guarantee that the object still exists. However, the address space for references is not reused. Hence, a reference to a deleted object will never be confused with a reference to an existing object. Deleting an object does not have the side effect of removing references to it.
References come in three types: historic, version, and foreign. Every object has an identity which it retains no matter how much it changes. A reference to it is called a historic reference. A reference to a particular version/alternative of an object is a version reference (see below for more discussion of versions and alternatives). A reference to a foreign system (another Yggdrasil, a different document base system, a read-only data storage system, or whatever) is a foreign reference.
Links between objects
Links are first class objects. In addition, they contain a link type (specified by a string) and references for the from and to objects. Links guarantee that both the from and to references always points at objects that still exist. [When the to object is stored on a different server this guarantee does not hold.] The from object is always a local object on the server (see Surrogates below). Deleting an object has the side effect that it deletes all the links from or to it. This is a recursive process.
Links can be thought of as properties that satisfy some extra invariants. In fact, the implementation will probably keep out links as a special kind of property. However, by describing and treating links specially, the model is easier to explain.
Containment
Any object may have other objects that belong to it. That is, one object is contained in another object. They are in the belongs to bag for the object. An object may belong to many other objects, or it may belong to no other object. Objects may be freely added and deleted from belongs to bags. Indexing side effects may occur during addition or deletion. However, children always are in the belongs to bag of their parent.
An object may have a primary container. This is used for a "hint" for data storage (see below).
An object's bag may be ordered. That is, there is an order to the objects in the belongs to bag for the object. Child order is the same as the order in the belongs to bag.
An object A is recursively contained in another object B if the transitive closure of the belongs to relation is computed starting with A. An object A is primarily recursively contained in another object B if the transitive closure of the belongs to relation, restricted to primary containers, is computed starting with A.
It is intended that containment can be viewed as a DAG. However, cycles will not be prevented and the server will not fail if a cycle is present.
There is no enforced relationship between containment and children. Applications may wish to enforce their own invariants.
Connections
Clients will communicate with Yggdrasil remotely via connections. A connection takes place between two authenticated individuals: the client and the Yggdrasil server. A client may have more than one connection to a server. Servers maintain a large number of concurrent connections.
Environment
A connection has some properties. One of them is called the environment. It contains certain default information about the connection. Some of this information is:
working node: the default bag where to create new objects and the default naming context (similar to the concept of working directory).
alternative/version selector: the selector to impose on an object when a historic reference is followed. It can be used to specify a named alternative or a time.
Naming of objects
Yggdrasil has a "hierarchical" directory naming system similar to names in file servers and file systems. Each name is the concatenation (with an inserted "/") of directory name parts. Consider a directory in the name path of some name. Some directory name parts have already been used to find this directory. The next directory name part in the name specifies an object that is (or should be) in the directory. The name part has a stem (a string of a limited alphabet) together with an alternative and version selector (see Versions and alternatives below). The next directory name part is looked up in the directory. If found, it binds to another object. Name lookup continues in this new object with the next name part. A directory always has a corresponding underlying object.
The directory naming system is not strictly hierarchical. Loops are allowed in the naming, but an Yggdrasil server can impose a limit as to the depth of any name evaluation. Loops are allowed since they occur naturally and are hard to prevent (particularly during complex version specifications).
Directories are attached to their underlying object via the $directoryContents attribute. [Yggdrasil should allow read-only access to this attribute in the same manner as access to any attribute.]
The use of naming can vary wildly. For an application like Notecards, names might only be used to name "Notefiles". There are few names compared to the number of objects. An application like the Cedar Programming Environment might have a name for every object. It might even have multiple names for many of the objects.
Indices
Indices are maintained for keywords and other attributes at some objects. Indices assist in queries since they are intended to keep efficient data structures about recursively contained attributes. Where to maintain what index is specified by the owner, administrator, or user of the system. Indices do not change any of the semantics of the system, but are performance accelerators for some query operations. An index at a object replicates data of indices at contained objects.
[The current implementation proposal is that an index is maintained for all objects primarily recursively contained in the indexed object. For containers that are recursively contained but not in the primarily recursively contained expansion, a search to entry is made in the index. Queries using the index must process the search to entries recorded in the index.]
Many applications will have indices only at named objects. Notecards might maintain indices at the "notefile" object and the Cedar programming environments might maintain indices at the module level. However, the system does not restrict indices to named objects.
Keywords
Keywords are a special kind of attribute. They are named by the $keywords attribute. Their value field names are always "word".
Updates
Yggdrasil will support at least an interface for primitive navigational updates. One server interaction can create an object (in some bag), set its contents, set its attributes, and create links from/to it. [Unresolved is the question as to whether bulk updates can be performed via the query language.]
Alerters
Alerters are a simple form of triggers. At some event on a Yggdrasil server, a "message" will be sent.
Events can include insert, delete, read, or update of some class of objects. The alerter will cause a message for some (probably) external entity to be queued. The alert will specify an object identifier for the object causing the alert, and optionally with a tag specified by the client. Alerts can either be one-shot or continuous. Alerts will be either for an ongoing conversation or for an external server.
If the message is for an ongoing conversation, then the message will be set via the conversation without acknowledgement. If the conversation terminates, then the messages queued for the conversation will be destroyed and no new alerts will occur for the conversation. This facility is meant to allow for screen refresh due to update of a document being viewed and cache maintenance on clients.
Alerts can also be for external servers. A set of servers will be registered at the Yggdrasil server for a service. Yggdrasil will distribute the alerts to the registered servers. Servers must acknowledge that they are done with the alert. If the server crashes, the alert will be delivered to another server.
Versions, revisions, and alternatives
An object has a current state: its contents, properties, links, containers to which it belongs, and its value as a containter. Objects can have also have versions, revisions and alternatives. An object, then, is really a tree of states. States can differ in any of contents, properties, links, containers to which it belongs, and its value as a containter. A new branch may be started from any state in the tree. This is called an alternative. Alternatives are distinguished by some attribute(s). These may include attribute values as well as time of updates. Major changes along a branch are called versions while minor changes are called revisions. Changes are applied at the leaves of the tree, which is called the current alternative. A new alternative at some state is viewed as a leaf (even though it has branches). A new version or revision is built whenever a transaction commits that changes some part of the object. The version or revision is added to the version chain for the current alternative.
A reference to an object is is either an object reference, a version reference or a revision reference. An object reference refers to the object as a whole, and must be evaluated in the current environment to get a particular version, revision, and alternative. A version reference is mostly evaluated and specifies a particular node in the state tree; it resolves to the most current revision of this node that is not a version. A revision reference is already evaluated and specifies a particular node in the state tree.
Versions, revisions, and alternatives apply to all objects, whether they are named or not and whether the object contains other objects. The versions and some alternatives are also visible in the name space. Each name part of a name looks like "objectName%alternative!version". (e. g., "/Luther.alpine/Hagmann.pa/Buffering%FCFS!5/TechnicalReport.tioga!3" is a valid file name.) The version specified by the "!" maps to a version reference. The "alternative" is found by evaluation the $alternative attribute. This attribute is system maintained and has the value of a string. Other alternatives are invisible to the naming system. Explicit revision references are not possible using the naming system.
Yggdrasil will use some "delta" and compression techniques. The system will be optimized for access to the current version.
Yggdrasil will not directly provide for the merging of alternatives. However, the system will keep alternative indices for containers (i. e., an index will be maintained as to which objects in a container have alternatives). Client level code can do the merge in an application specific manner for a container.
Not all versions are interesting. Yggdrasil will have methods to merge changes made by different transactions into one (apparent) update per container. Note that this destroys some serializability and referential guarantees that might otherwise be provided by the system. Also, objects normally keep the most recent revision. Objects may be flagged via the $keepRevisions attribute (set to TRUE) to also keep revisions.
Transactions and concurrency control
Transactions
Transactions are intended to be short. They start with a "Begin Transaction" that yields a transaction identifier. Any client can participate in any transaction, provided that the client has the proper authentication, rights, and the transaction identifier. Transactions end when the server aborts the transaction, or when a client issues a "Commit Transaction" or "Abort Transaction" primitive.
Locking
An object is the natural unit of locking. Clients and users of the system must understand this when they design the schema for a section of the document base. Although the unit of locking is an object, "page level access" will be provided for large objects (when the whole object is appropriately locked.)
Locks are granted to a transaction. Locks come in three types. A browse lock is the weakest of all the locks. It shares with any other lock. Alerters can be fired off of browse locks when another lock is granted and/or committed.
A read lock prevents modification of an object. Read locks share with other read locks and browse locks.
A write lock acquires exclusive use of the object for modification. It does not share with read locks nor with other write locks. It does share with browse locks, but commit of the transaction may (optionally) release the browse lock.
Locks are normally granted implicitly by access to objects. One lock option that will be supported is the wait/no-wait option: if a lock cannot be granted, the request can be queued waiting for its release or the lock request can immediately fail (without destroying the transaction).
Long term locking
Yggdrasil does (almost) no long term locking. Transactions are intended to lock a modest number of objects. A long lived transaction should only hold browse locks. Clients are encouraged to use attributes and the short term locking mechanisms to implement client level long term locking (e. g., check-in and check-out.)
Surrogates
One meaning of surrogate is "a thing that is substituted for another." One restriction of links is the the from reference must be local. To handle foreign objects somewhat more gracefully, a surrogate object can be constructed. The surrogate is treated as local but it is bound to a foreign object. The contents, children, and attributes of the surrogate may all be specified locally or may be a combination of remote and local values.
Access control
Access control will be provided by Yggdrasil. While processing the naming hierarchy, access control will be done like it is done for (some) file servers. Each directory will have a list, the access control list, that controls the operations groups of users can perform (read, create, write, create link to, create link from, examine index, ...). Yggdrasil will use existing authentication servers (e. g., Grapevine and Clearinghouse) for individuals and groups. However, the access control will allow set operations on groups (where an individual is viewed as a singleton set). In particular, Yggdrasil will have set intersection and difference, as well as the more popular set union.
In addition, access to individual objects may be controlled. A link may have an access control list that is examined whenever it is followed. Any object may have access control specified for it. Access control may be specified for the bag of an object. Individual attributes may also have access control specified for them.
Users may make private additions to public data. They accomplish this by adding links or attributes that are protected from some other users. The change would only be seen by some set of users via the access control mechanisms.
Hints
Yggdrasil will accept hints from the owner/administrator of objects. They will include a hint for data storage. This is used both for increased performance by clustering and for collocation of data for removable media. Normally, objects contained in another object are stored near each other.
Another hint will be for data compression. Users/administrators will be able to set default compression for contents of a given type. Individual instances of contents may also have a compression hint.