FROZEN DRAFT -- Aug 25, 1987 -- FROZEN DRAFT
Internal Memo
ToFrom
DistributionBob Hagmann
 PARC/CSL
SubjectDate
Interim Document Storage and Retrieval System Design Aug 25, 1987
[This memo is a snapshot of the ideas for IDSARS as of Aug 25, 1987. The document structure for the system has evolved away from this format. I have not edited this memo for a month. Hence, I've decided to freeze this draft version of the memo. It does not match the current thinking on several area, notably in naming, versions, and alternatives.]
Introduction
Many organizations require the more or less permanent storage of information. Over the years, there have evolved three types of electronic services that have satisfied parts of the problem: databases, file systems, and file servers.
Each of these systems have their strengths and weaknesses. Over time, the problems and technologies change. What was appropriate at one time is not a good solution later.
The cost of secondary storage has continued to decrease over the years. In addition, the range of size of individual objects that we wish to store has grown (one storage object in CSL is a 160 megabyte file). We are not always satisfied with the simple yet elegant hierarchal naming systems such as we have in file systems. Nor are the concepts of relational database systems a panacea: it is missing several features that make it difficult to use to model some applications (e.g., good solutions for repeating groups, hypertext, stored procedures, triggers, transitive closure, large objects, ...). Object-oriented databases are a current research topic, but significant problems arise in their performance, locking, security, evolution, archiving, execution model, robustness, as well as other problems.
There seems to be a good deal of overlap between the needs of several groups to store and retrieve documents. The needs include the storage of documents of varying sizes, the naming and indexing of some of the documents, the interconnection of (some of) the documents, and notification of significant events of documents.
The following is a partial list of CRG projects, proposals, groups, and needs that could be served by a Document Server.
f FolioPub project at WRC
f ERSG proposal at WRC
f VLSI DA (CAD) group in CSL
f Voice project in CSL
f Software storage for the Cedar Programming Environment in CSL
f Storage of scanned images
f Distributed Notecards in ISL
f Large capacity and high performance file server
f Mail storage
f TIC support, both at WRC and PARC
Xerox also has the current focus of "document processing." Central to most (if not all) document processing systems is the storage and retrieval of documents.
We are interested in the subclass of documents called electronic documents; these are "documents" that can be handled in some way via computers. Although there is no precise definition of "electronic document", we believe that a stored electronic document should have (some of) the following features:
f Primitive elements include plain text, structured text (e.g., a Tioga or Viewpoint file), scanned image, mail message, digitized voice, digitized video, and compressed versions of the above. This should be an extensible list.
f A document is a primitive element with an optional ordered collection of documents; note that is a recursive definition
f Documents have attributes; some attributes are interpreted and some are uninterpreted. Keywords may be distinguished attributes.
f Documents may be interconnected with links; links have names
The basic idea is to build an "Interim Document Storage and Retrieval System" (IDSARS) that will be the central code of several server systems. IDSARS could have front-ends that made it looks like an XNS File Server, an OSI File Server, an NFS, or a native Document Server. A single server could export multiple interfaces over the same "Document Base" if they are compatible. IDSARS should be configurable with various secondary storage devices to meet a variety of requirements for response time, storage capacity, archival storage, backup, and robustness.
We in CSL are interested in providing the basic storage and retrieval system, together with its associated administrative functions. We hope to cooperate with other groups to make this a useful system. Other groups could provide an automatic keyword generation, selective dissemination of information (SDI), inverse typesetting, document capture (e. g., scanning), and other facilities.
Basic Features
f large numbers of documents
f documents can be large
f documents "types" include text, bitmaps, digitized audio, digitized video (?), graphics, ...
f documents can be connected via links (hypertext)
f documents can be named via a hierarchical name system
f documents can have attributes and keywords
f keyword and other indices maintained
f support for versions and alternatives
f data compression
f support for version histories and archival storage
f alerters (send a message when a system event occurs)
f query language and query optimizer
f page level access and locking within a document (?)
f transactions
f robust
f good performance
f good availability (replication?)
f hooks for multi-server and foreign server support
Hardware Base
f > 1 MIP CPU (or much more)
f > 8 megabytes of main memory possible
f e 2 megabytes of main memory minimum
f Good communications hardware
back-to-back packets
sufficient bandwidth (10 megabits/second or much more)
f SCSI bus (or other way to connect efficiently to a large amount of storage)
must be of the right flavor for WORM optical and magnetic hard disks (if WORM optical not on a separate server)
clock speed and synchronous/asynchronous must match on bus
f Optional WORM optical disks, possibly with a jukebox
might be accessible over the net instead of connected by a bus
f Magnetic hard disks
possibly in large numbers even if WORM optical disks are used!!
2-10 Gigabytes
varying capacity and performance
mirrored hardware (?)
f Plan for read/write optical disks
will not perform as well as magnetic hard disks (at least at first)
f Plan for stable main memory
battery backed RAM
Issues
f Commercial hardware platform
f Hardware support
f Flexible to accept new storage and communications technology
Operating System and Programming Environment
f Supports sharing and concurrency; choose one of
1) multiple concurrent threads sharing the same address space
UNIX System V.3, SUN UNIX, and Cedar are examples
2) extremely fast IPC
do I/O in "helper" processes (e.g. V)
3) asynchronous I/O (e.g. UNIX 4.3 BSD)
use a single process that never blocks, except for when it want to
build/use a lightweight process package
f Good existing communications
supports protocol(s) needed for server
f Reasonable programming environment support
f Reference: papers on X implementation
Issues
f Internal programming environment verses commercially available "environment"
language features
Cedar is built for large systems
ease of porting to a different environment
f Performance for sharing of data/processing and communications
f Support likely for new protocols
f Speed of communications and new technology
Hardware, OS, and Programming Environment Options
f Cedar
Use a Dorado and Cedar
Talks Pup and XNS, plus TCP/IP should be started this summer
Lots of code available, notably Alpine
To talk SCSI, do either:
Get the Misc Board built (10-mHz ethernet and SCSI)
Use a Ethernet/SCSI interface (Siemens)
Use a Dicentra
Put a 10-mHz ethernet board on the Dorado (preferred, maybe required)
Talk 3 or 10 mHz ethernet to clients
Build software to allow remote access to SCSI on Dicentra
Use Multibus-SCSI adapter
Port to non-Dorado by either:
The Mimosa Cedar to C translator and run under other OS (e.g., UNIX)
Dragon
Mimosa retargeted to other instruction set
f SUN UNIX
Use a SUN-3, SUN-4, or ...
Higher performance CPU's coming (?)
Talks XNS and TCP/IP
We own little code, but there's a lot of it out there
Flexible I/O (SCSI and friends)
f System V UNIX (and close relations)
Use any machine that talks some protocol, and supports flexible I/O (SCSI).
e. g., use the new HP RISC workstation (?)
Communications seems to be a problem
High Level Protocols
f "Native" protocol with full support for data model (first version)
f Build server so that a protocol veneer can be built later to handle (at least)
XNS Filing
ANSI/CCITT/OSI/... Filing and Retrieval
NFS
If the system talks Pup, do Pup FTP too (?? Leaf and Chat)
Issues
f Low level protocol(s) supporting "native" protocol
Data Storage
f Storage hierarchy
memory, magnetic hard disk log, magnetic hard disk buffers, WORM optical disk, ...
f Log based recovery
"new file" trick used on Alpine: write new data to free pages and commit and log only small amounts of data; this avoids double writing - once to the log and once to the file
f Use group commit for high update load, if needed
f All naming, indexing, buffering, linkages, etc. are saved and updated on magnetic hard disk (or other fast secondary storage)
f WORM optical disks will be self identifying upon removal (assuming that the optical disks are directly connected to the server)
not self identifying when they are de-selected or after every update
includes naming, indexing, and linkages
(assumes we use the optical disks directly instead of through some other server)
f Storage hierarchy: data/naming information/indices/etc. can be in/on:
memory buffer
stable storage (?)
log
magnetic hard disk buffer
WORM optical disk
f Delay processing (e.g., indexing)
accept "image" quickly; index at leisure (?? via external server)
Ò stores quickly and avoids conflicts
f Large object support
f "Real time" support -- enough for voice (?? video later)
f Archival storage
a) write to archival device when not referenced recently
b) write to archival device ASAP but when convenient
c) write to archival device preserving time order of documents (legal requirement)
f Plan for new technologies and changes in old technologies
XDM might replace WORM optical disk
optical tape
f Key is to build a framework for varying performance storage devices; the actual devices might change
Issues
Where do we store the attribute/keyword indices (see below)? How about at the leaf directories for objects in directories. For all data in a service (top level directory), store all attributes/keywords for all objects.
Compression
200 spi Ò 3,740,000 bits per 8ý x 11 page; @ 10:1 compression, that's about 50K bytes per page image
600 spi Ò 33,660,000 bits per 8ý x 11 page; @ 10:1 compression, that's about 420K bytes per page image
Rumor has it that one hour of video can be stored in 300 megabytes, using some compression algorithm. That's 83,333 bytes/second or 2/3 of a megabit per second.
If compressed data via some algorithm is ever stored on the server, the server must forever retain the ability to decompress it itself! (Unless all copies of data using this compression are deleted.)
How do we merge compression and versions? "Big files" are bitmaps, DBMS files, logs, and big "ASCII" files. Requirements seem to be:
 short ascii bcd Big files
compressed what? file file pages
random access no no (?) yes
fast linear read yes yes yes
backup incremental whole file ok incremental
old versions deltas whole files deltas
Issues
What compression (standards?) should we support?
Data Model
An object in the system is either a document or a link. Documents and links are almost the same. All objects (both documents and links) can have "contents" as well as attributes (a property list). Documents and links both can have names. Links have both source and destination objects. A link also has a "type" unless it is a child link. The "type" is just a character string.
Naming of documents
f Hierarchical directories
a slash separated path name ("[]<>" format accepted and converted)
loops allowed, but a server specific depth will be imposed (??)
directories have as members
directories
hard links to other directories
symbolic links
f Ordered children links
document can have subdocuments (children); there is an order to the children
f Typed links
any number of links of the same name on a node, but they are unordered
f Not all objects have names
Containerization of documents
f An object belongs to some named object
f The named object does not have to be a leaf in the naming tree
Versions and alternatives
[we should do something like the layers Danny talks about]
f Directories and objects can have numbered versions and named alternatives
f A partial name looks like objectName%alternative!version
"/Luther.alpine/Hagmann.pa/Buffering%FCFS!5/TechnicalReport.tioga!3" is a valid file name
f versions and alternatives are maintained for both named and unnamed objects
Issues for versions and alternatives
f is "definition groups" the right model?
f how can we extend definition groups to hypertext and multimedia?
f which "delta" technique do we use?
Attributes
f May be associated with an object (document or link)
f value are primitive document element
f Interpreted attributes may be used in queries (e. g., keywords)
Issues for attributes
f an attribute is either
1) a triple: an attribute name (string), a type, and a value
2) an attribute name (string) and a list of triples. Each triple has a field name, a type, and a value
f can a "value" be a list of values?
f are attribute names unique for an object?
f for option 2, are the field names unique for an attribute?
Keywords (?? as distinct from attributes)
f Similar to attributes, but are distinguished (?)
f Keywords have a string name
Indices
f Maintain indices for keywords in the naming tree at the levels specified by the "owner" of the subtree
f An index at a level replicates data of indices at lower levels
Issues for indices
f index on attributes too?
Updates
f Primitive navigational updates (e.g., link A to B, add attribute C of type D value E to F, remove G, store document H)
f One server interaction can create an object, set its contents, set its attributes, and create links from/to it.
Issues for updates
f bulk updates?
Triggers (well, really Alerters)
f Simple triggers on insert, delete, read, or update
f Trigger queues a message for some (probably) external server
f Uses:
Screen refresh due to update of a document being viewed
Triggering of services (e. g., automatic indexing of a new document)
Queries
[Hector is working on this.]
f A query results in a set or an enumerator (a list of dynamically constructed handles) of the matching objects. Each object may be examined further by the code under the enumerator (e.g., the attributes may be examined or the document read).
f A subquery is specified by object-selection and a match-condition, which yields a new subquery.
f A primitive subquery is specified by name-pattern and a match-condition.
f A name-pattern is a name expression (possibly "*") or it is a previously computed set
f Simple patterns are primitive names
f Primitive names may be composed via set operations or link following operations into a name-pattern
a set operation is like (foo* * *bar*) ) *tioga
link following:
children: name-patternb
named link: name-patternb(link-name-pattern)
all children: name-patternb*
all children over a named link: name-patternb(link-name-pattern)*
f Match condition
using interpreted attributes and their operations, or equality in uninterpreted attributes, compute the match
can apply match to a link to to the object being linked to
match condition may be omitted
may specify boolean operations on keywords
partial match
exactly n of m
at most n of m
at least n of m
f Ò RDBMS not good enough for partial match
Issues for Queries
f Who knows what syntax to use for the queries? The "*" and "b*", and "%" stuff is just a place holder.
f Support for (some?) known methods of finding documents
a) full text search
b) inverted word index
c) multi-attribute hashing
d) clustering
e) signatures
f) naming
g) attributes and indexing
h) weighted boolean
f other retrieval methods (?)
Issues for Data Model
f "Client" activity on the server
specify very restricted programming language
watch out for
concurrency control
crash recovery
looping
accountablilty
monitoring
Trojan horses
performance
naming (my data type "time" is different from yours)
modification of a type's semantics over time
f Extensibility
f Cocitations (Ò back links and queries)
f Is this good enough to store electronic mail?
f Is this good enough to store voice?
f Hints for data storage (clustering)
f Limited (extensible?) data types understood by server to allow query/indexing operations to occur on the server (evolving to allow relational DBMS?).
f What kind of queries are done in "information retrieval?" Do we need metrics (word distances)? How do the folks in the TIC want to query their databases? This could force huge indices!!!!!
f What can be an attribute?
f When can an object be garbage collected?
f Cross server links?
Transactions and concurrency control
Locking
f a document (object) is the natural unit of locking
f locks come in three flavors
browse lock: this lock shares with any other lock. It is the weakest of all the locks. Alerters can be fired off of browse locks when a stronger lock is granted.
read lock: locks the object so that it cannot be modified
write lock: locks the object for modification. Does not share with read lock. Does share with browse lock, but commit of the transaction breaks the browse lock.
Transactions
f they are supposed to be short
f there will be "Begin Transaction", "Commit Transaction", and "Abort Transaction" primitives.
Issues
f Long term locking (e. g., check-in and check-out)
Issues (general)
f Containment -- what does it mean?
f Legal requirements for documents (write once in proper time order)
f Security
f Standards (character set, naming, ...)
f Administration
f Debugging, historical logging, and monitoring
f Backup
f Private changes to public data (e.g., private annotation or link)
f Layers or object versions ?
f Pruning deltas
f Merging deltas
f Are all versions always available?
f Page level access
f Replication, availability, and redundancy
f Robustness
f Performance
f Access control lists? How are they done for unnamed objects?
f Garbage collection