Systems Summit: Filing Infrastructure
April 12, 1988
Filing Infrastructure

Bob Hagmann
PARC CSL
Introduction
Database effort in PARC CSL
research
support our filing needs
basis for retrieval experiments
store images, programs, text, audio, mail, ...
document processing
Project Name
Ygg|dra|sil n. Also Yg|dra|sil. Norse Mythology. The great ash tree that holds together earth, heaven, and hell by its roots and branches. [Old Norse, probably ``the horse of Yggr'' : Yggr, name of Odin, from yggr, variant of uggr, frightful (see ugly) + drasill, horse.]
Picture of Yggdrasil
History
Database group in CSL looked for a focus
started talking with PARC/ISL Notecards group + friends
Roger Levien states that a ``core'' business for Xerox is ``filing'' (!!!)
PARC's OCM made a case to Allaire for ``Document Processing''
PARC OCM founded Six Committees
=> Document Data Base Foundation Layer
Needs
``Store the bits and get out of my way''
Three principal sources of needs
Distributed Notecards in ISL
Large capacity and high performance file server
Software storage for programming environments
Others
FolioPub project at WRC
ERSG proposal at WRC
VLSI DA (CAD) group in CSL
Voice project in CSL
Storage of scanned images
Mail storage
TIC support, both at WRC and PARC
...
What software is available?
File Servers: NFS, Alpine, IFS, and XNS
Relational databases
Other databases
Image storage systems
Full text search
...
Leverage
Basis for lots of other stuff
Distributed Notecards in ISL
Large capacity and high performance file server
Software storage for programming environments
``Beyond mail''
Intelligent retrieval
...
Technology Change
Optical disks
Optical disk jukeboxes
High capacity magnetic disks
Decreasing cost of main memory
Fast commercial microprocessors/workstations
High capacity optical/magnetic tapes
Scanners
FDDI communications
Fax
Electronic printers
CD ROM
Information services
=> deal with change and scale
Overview for Rest of Talk
System Summary
Objects and links
Containment and Indices
Naming of objects
Alerters
Versions and alternatives
Transactions and concurrency control
Access control and multi-server operations
System Features
Yggdrasil and IFS Comparison
Yggdrasil Phases
Yggdrasil Status
System Summary
Server architecture
Large number of documents of vastly varying sizes
Document ``types'' (extensible) - few interpreted at server
Hypertext: documents can be connected via links
Documents can be named
Documents can have attributes and keywords
Documents are grouped into contexts called containers
Keyword and other indices maintained per container
Versions and alternatives
Data compression and decompression
On-line archival storage
Alerters (send a message when an event occurs)
Page level access, access control, transactions, robust, performance, recovery, and availability
Hooks for multi-server and foreign server support
Objects and Links
Links are specialized objects: they have ``from'', ``to'', and type of link
type is just a string
Objects have contents
``Page level'' access provided for large objects
Objects may have attributes
named by string
list of fields
each field has a list of ``values''
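The object and link model above can be sketched as a few record types. This is a minimal illustration only: the class names, the integer object ids, the `read_page` helper, and the 512-byte page size are assumptions made for the sketch, not part of the Yggdrasil design.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """An attribute is named by a string and holds a list of fields."""
    name: str
    # Each field is a list of values (hypothetical in-memory representation).
    fields: list[list[object]] = field(default_factory=list)


@dataclass
class YObject:
    """An object has contents and may have attributes."""
    oid: int  # hypothetical object id
    contents: bytes = b""
    attributes: dict[str, Attribute] = field(default_factory=dict)

    def read_page(self, page_no: int, page_size: int = 512) -> bytes:
        """Page-level access: return one page of a large object's contents."""
        start = page_no * page_size
        return self.contents[start:start + page_size]


@dataclass
class Link(YObject):
    """A link is a specialized object with 'from', 'to', and a type string."""
    from_oid: int = 0
    to_oid: int = 0
    link_type: str = ""  # the type is just a string
```

Because `Link` derives from `YObject`, a link can itself carry contents and attributes, which is one way to read ``links are specialized objects''.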
Containment and Indices
Objects are contained in other objects
they are its children
Children may be ordered or unordered
An object may have many parents
Indices are built as specified by the clients
only certain attributes are indexed (e.g., ``keywords'')
Containment forms a graph. Indices are built, as required, up the graph.
The graph is not a DAG => index update is hard
can't build a full index if objects are in multiple containers
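A small sketch of why index update gets harder once containment is a general graph: walking ``up the graph'' must remember which containers it has already visited, or a cycle would make the walk loop forever. The class and method names are hypothetical, and the sketch assumes every container above an object wants the keyword index.

```python
from collections import defaultdict


class Containers:
    """Containment graph: an object may have many parents, and cycles can occur."""

    def __init__(self):
        self.parents = defaultdict(set)  # child id -> set of parent ids
        # container id -> keyword -> set of object ids
        self.index = defaultdict(lambda: defaultdict(set))

    def add_child(self, parent, child):
        self.parents[child].add(parent)

    def ancestors(self, obj):
        """All containers reachable by walking up; 'seen' guards against cycles."""
        seen, stack = set(), [obj]
        while stack:
            node = stack.pop()
            for parent in self.parents.get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def index_keyword(self, obj, keyword):
        """Record obj under keyword in every container above it."""
        for container in self.ancestors(obj):
            self.index[container][keyword].add(obj)
```

With multiple parents, one keyword update fans out to several containers, so the cost of an update depends on the shape of the graph rather than on the depth of a single path.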
Naming of objects
Some, maybe all, objects have names
``hierarchical'' directory naming system
Loops allowed
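To illustrate what ``loops allowed'' implies for a hierarchical naming system, here is a hedged sketch. Resolving a single name still terminates, because the name has a finite number of components, but anything that enumerates the name space needs a visited set. The `Directory` class, the `/` separator, and both function names are assumptions for this example.

```python
class Directory:
    """A directory maps name components to directories or other objects."""

    def __init__(self, label):
        self.label = label
        self.entries = {}


def resolve(root, path, sep="/"):
    """Follow a hierarchical name one component at a time."""
    node = root
    for component in path.split(sep):
        if not component:
            continue
        if not isinstance(node, Directory):
            raise NotADirectoryError(component)
        node = node.entries[component]
    return node


def reachable_directories(root):
    """Enumerate directories once each, even when the name graph has loops."""
    seen, stack, order = {id(root)}, [root], []
    while stack:
        node = stack.pop()
        order.append(node.label)
        for child in node.entries.values():
            if isinstance(child, Directory) and id(child) not in seen:
                seen.add(id(child))
                stack.append(child)
    return order
```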
Alerters
Simple triggers
On a system event, send a ``message''
Events can include insert, delete, read, or update
One use is for cache or screen maintenance
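The alerter idea can be sketched as a registry of simple triggers keyed by object and event. The names below (`Alerters`, `register`, `fire`) and the dictionary message format are hypothetical; the four event kinds are the ones listed above.

```python
from collections import defaultdict

EVENTS = {"insert", "delete", "read", "update"}


class Alerters:
    """Simple triggers: on a system event, send a message to each registrant."""

    def __init__(self):
        self._subs = defaultdict(list)  # (object id, event) -> list of callbacks

    def register(self, oid, event, send):
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")
        self._subs[(oid, event)].append(send)

    def fire(self, oid, event):
        """Called by the server when the event occurs on the object."""
        for send in self._subs.get((oid, event), []):
            send({"object": oid, "event": event})


# Cache maintenance: drop a cached copy when the object is updated.
cache = {7: b"old contents"}
alerters = Alerters()
alerters.register(7, "update", lambda msg: cache.pop(msg["object"], None))
alerters.fire(7, "update")
```

After the update event fires, the stale entry for object 7 is gone from `cache`, so the next read would refetch it from the server.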
Versions and alternatives
Alternative tree
Branches called alternatives
Extensions of branches called versions or revisions
versions are major
revisions are minor
Naming environment binds to some revision/version/alternative
Default is unnamed alternative with the highest committed version
Object, version, or revision reference to an object
Merging and compression of alternatives
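The default binding rule above (the unnamed alternative, highest committed version) can be written down directly. This is a sketch under assumptions: the record layout, the use of an empty string for the unnamed alternative, and the function name `default_binding` are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    number: int                # major version number
    committed: bool = False
    revisions: list[int] = field(default_factory=list)  # minor revisions


@dataclass
class Alternative:
    name: str                  # "" stands for the unnamed alternative
    versions: list[Version] = field(default_factory=list)


def default_binding(alternatives):
    """Return the highest committed version of the unnamed alternative, or None."""
    for alt in alternatives:
        if alt.name == "":
            committed = [v for v in alt.versions if v.committed]
            if committed:
                return max(committed, key=lambda v: v.number)
    return None
```

An uncommitted version 3 would be skipped in favor of a committed version 2, and versions on named alternatives are never chosen by default.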
Transactions and concurrency control
Transactions intended to be short
Object is the natural unit of locking
maybe pages too
Read, write, and browse locks
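A sketch of object-level locking with the three lock modes. The slide does not define browse locks, so the compatibility rule here is an assumption: a browse lock is treated as compatible with every other mode (the browser accepts possibly inconsistent data), read locks share with each other, and write locks exclude reads and writes. `LockTable` and its methods are hypothetical names.

```python
MODES = {"read", "write", "browse"}


def compatible(held, requested):
    """Assumed rule: browse conflicts with nothing; only read/read otherwise share."""
    if "browse" in (held, requested):
        return True
    return held == "read" and requested == "read"


class LockTable:
    """Locks are taken per object, the natural unit of locking."""

    def __init__(self):
        self.held = {}  # object id -> list of (transaction id, mode)

    def acquire(self, txn, oid, mode):
        """Grant the lock if compatible with other holders; else return False."""
        if mode not in MODES:
            raise ValueError(f"unknown lock mode: {mode}")
        holders = self.held.setdefault(oid, [])
        for other_txn, other_mode in holders:
            if other_txn != txn and not compatible(other_mode, mode):
                return False
        holders.append((txn, mode))
        return True

    def release_all(self, txn):
        """Drop every lock held by a transaction, e.g. at commit or abort."""
        for oid in list(self.held):
            self.held[oid] = [(t, m) for t, m in self.held[oid] if t != txn]
```

A real server would queue or abort a blocked request rather than return `False`; since transactions are intended to be short, either policy could be reasonable.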
Access control and multi-server operations
Access control lists
Surrogates stand for a foreign object
Links to foreign objects
Federated servers, not unified servers
System Features
High performance
Robust
Available
Multiple frontends for server
Fast crash recovery
Multiple protocols
e.g., XNS Filing, NFS, and native
Portable
Written in Cedar
use ``Cedar to C'' translation
...
Yggdrasil and IFS Comparison
topic                             IFS        Yggdrasil            ratio
size                              1 Gbyte    1 Tbyte              1000
CPU                               .2 MIP     8 MIP                40
memory                            128 KB     128 MB               1000
net read bandwidth (Kbytes/sec)   28         1000                 35
latency                           50 msec    3-30 msec - 1 min    16 - 20000
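The ratio column can be rechecked with a few lines of arithmetic. The sketch assumes decimal units (1 Gbyte = 10^9 bytes); binary units give the same ratios for size and memory. The reading of the latency row is also an assumption: 16 matches 50 msec divided by 3 msec, and 20000 matches 1 minute divided by 3 msec.

```python
# (IFS value, Yggdrasil value) taken from the comparison table
rows = {
    "size (bytes)": (1e9, 1e12),
    "CPU (MIPS)": (0.2, 8),
    "memory (bytes)": (128e3, 128e6),
    "net read bandwidth (Kbytes/sec)": (28, 1000),
}

# Ratio of the Yggdrasil figure to the IFS figure for each row
ratios = {topic: ygg / ifs for topic, (ifs, ygg) in rows.items()}

for topic, ratio in ratios.items():
    print(f"{topic}: {ratio:.1f}")

# Latency, in milliseconds (assumed interpretation of the table's last row)
ifs_latency_ms = 50
ygg_best_ms = 3
ygg_worst_ms = 60_000  # 1 minute
latency_low = ifs_latency_ms / ygg_best_ms   # IFS latency vs. best Yggdrasil case
latency_high = ygg_worst_ms / ygg_best_ms    # worst vs. best Yggdrasil case
```

The bandwidth ratio comes out near 35.7, which the slide gives as 35.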
Yggdrasil Phases
0
build hypertext, naming, indexing, and containers (mostly)
use UNIX File System
skip systems issues: performance, recovery, availability, access control, alerters, archival storage, and data compression
postpone versions and alternatives
1a
redo file system (performance and recovery)
1b
archival storage
1c
versions and alternatives
1d
availability, access control, alerters, and data compression
Yggdrasil Status
High level design done
Starting coding
mostly modifying Alpine at this point
Recruiting