Yggdrasil - Filing Symposium
April 25-28, 1989
Yggdrasil

A large scale hypertext database system
Bob Hagmann
Xerox PARC/CSL
Introduction
Problem
Store and retrieve large numbers of possibly large objects (electronic documents)
Motivations
research
support PARC's & CRG's filing/database needs
good job for PARC => good job for Xerox
Xerox's document processing strategy
Goals
Handle large number of objects of any size
Use an appropriate data model
Non-navigational access
Deal with scale
Interoperate
Have transactions, robustness, good performance, fast recovery, good availability, and access control
Have hooks for building on system
Project Name
Ygg|dra|sil n. Also Yg|dra|sil. Norse Mythology. The great ash tree that holds together earth, heaven, and hell by its roots and branches. [Old Norse, probably "the horse of Yggr": Yggr, name of Odin, from yggr, variant of uggr, frightful (see ugly) + drasill, horse.]
Computer Science Challenges of the 1990's
Ubiquitous computing
Parallelism - massive parallelism, neural nets, and multiprocessors
Large scale distributed systems
Your favorite project goes here
-> Using low cost, moderate access time, tertiary storage
optical tape (e. g., XDM and ICI's Digital Paper)
high density magnetic tape
transverse, helical, and longitudinal recording
???
For example: magnetic tape
4 mm, 8 mm, VHS, 19 mm and 1" tape under development
Ampex ACL
19 mm tape
81 GByte/tape
15 MByte/sec
library:
256 cassettes
load/unload 8 sec
search 30 sec
55 GByte/ft^3
366 GByte/ft^2
$60/GByte
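The figures above imply a few derived numbers worth having at hand. The following is a back-of-the-envelope sketch only: it assumes decimal units (1 GByte = 1000 MByte), that every cassette in the library is full, that the quoted $60/GByte applies to the whole library, and that load and search times simply add. None of those assumptions is stated on the slide.

```python
# Figures quoted on the slide for the Ampex ACL.
GBYTE_PER_TAPE = 81
MBYTE_PER_SEC = 15
CASSETTES = 256
LOAD_SEC = 8
SEARCH_SEC = 30
DOLLARS_PER_GBYTE = 60

# Assumption: decimal units, 1 GByte = 1000 MByte.
MBYTE_PER_GBYTE = 1000

# Total capacity of a fully populated library.
library_gbyte = GBYTE_PER_TAPE * CASSETTES

# Time to stream one whole tape at the quoted transfer rate.
full_tape_read_sec = GBYTE_PER_TAPE * MBYTE_PER_GBYTE / MBYTE_PER_SEC

# Assumption: time to first byte is load time plus search time.
time_to_first_byte_sec = LOAD_SEC + SEARCH_SEC

# Cost of the full library at the quoted price per GByte.
library_dollars = library_gbyte * DOLLARS_PER_GBYTE

print(library_gbyte, "GByte in the library")
print(full_tape_read_sec / 60, "minutes to read one full tape")
print(time_to_first_byte_sec, "seconds to reach data on an unloaded tape")
print(library_dollars, "dollars at the quoted price")
```

Under these assumptions one library holds about 20 terabytes, which is why tertiary storage of this kind matters for a terabyte-scale server.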
Documents
Document: stored information that can be perceived by humans
Examples: text files, object files, PDL files, books, newspapers, movies, audio tapes, music, statues, paintings, fingerprints, and frescos.
Electronic documents: the subset of documents that can be dealt with by computers
Historical Perspective on Stored Information (European view)
Middle ages (guilds; oral tradition; bards; manuscripts and libraries owned by religion)
Moors fall => Arabic libraries into Christian hands (7th C.) and monks translate to Latin
Paper arrives in Europe (11th to 12th C.)
Secular universities (12th C.)
Renaissance: growth of individual (14-16th C.)
Printing press (~1450)
Scholarly journal (17th C.)
Public libraries and public education (lending library 18th C.; free public library 19th C.)
Xerox 914 copier (1959)
Perspective on Computers (European view)
Boxes of cards
File systems
Access methods for files
Hierarchical and network databases
Relational databases
???
My Local Branch Library
Books
large print, children's
Magazines
Newspapers
Consumer information
Transportation and community information
VHS tapes
CD's
Audio tapes
Records
Technology Change
Optical disks
Optical disk jukeboxes
High capacity magnetic disks
Decreasing cost of main memory
Fast commercial microprocessors/workstations
-> High capacity optical/magnetic tapes
Scanners
FDDI communications
Fax
Electronic printers
CD ROM
Information services
=> deal with change and scale
Perspective Summary
Vast changes over 15 Centuries
"Information worker"
"Future Shock"
Amount of information growing
Speed of delivery increasing
Media and delivery systems evolving
Project
Build a large scale hypertext database server
No user interface -- this is a database
Terabyte in scale
ter|a|tol|o|gy n. The biological study of the production, development, anatomy, and classification of monsters.
RISC processors, Dragon, and beyond
Ethernet or FDDI
Lots of memory (100's of megabytes)
Lots of MIPS (10-100's)
Modest number of processors (1-16)
Magnetic disk cache (10's of gigabytes)
Lots of optical disk (a terabyte)
Mach and Camelot
TCP/IP and Xerox XNS protocols
Written in Cedar
Cedar, Mach and Camelot
Cedar
From Mesa: Algol family language with strong typing, monitors, threads, exceptions, ADT's
Cedar adds garbage collection, generic types, lists, atoms, ...
Language and runtime "up" under SunOS and partly Mach
Mach
Threads, message based, multiprocessor, external pagers
Camelot
Transaction facility on top of Mach
Logging, commit message protocols, recoverable storage management, media recovery, backup, name service, ...
Might be throw away
Seven Key Ideas
Objects (roughly, documents)
Typed links
Objects have properties (roughly, attributes)
Containers group objects
Set oriented non-navigational access
Objects can be named
Versions and alternatives for objects
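The seven ideas can be read as a small data model. The sketch below is illustrative only and is not Yggdrasil's design (the system itself is written in Cedar). The names StoredObject, Container, select, targets and by_name are hypothetical, and alternatives are not modelled at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class StoredObject:
    """An object: uninterpreted bytes plus properties (hypothetical layout)."""
    oid: int
    contents: bytes
    properties: Dict[str, str] = field(default_factory=dict)
    name: Optional[str] = None  # objects can be named
    old_versions: List[bytes] = field(default_factory=list)

    def update(self, new_contents: bytes) -> None:
        # Keep the previous contents as an earlier version.
        self.old_versions.append(self.contents)
        self.contents = new_contents


@dataclass
class Container:
    """Groups objects and the typed links between them."""
    objects: Dict[int, StoredObject] = field(default_factory=dict)
    links: Set[Tuple[str, int, int]] = field(default_factory=set)  # (type, from, to)

    def add(self, obj: StoredObject) -> None:
        self.objects[obj.oid] = obj

    def link(self, link_type: str, source: int, target: int) -> None:
        self.links.add((link_type, source, target))

    def select(self, **wanted: str) -> Set[int]:
        # Set-oriented, non-navigational access: match on properties
        # instead of following links from a known starting object.
        return {oid for oid, obj in self.objects.items()
                if all(obj.properties.get(k) == v for k, v in wanted.items())}

    def targets(self, source: int, link_type: str) -> Set[int]:
        # Navigational access, for contrast: follow links of one type.
        return {t for (lt, s, t) in self.links if lt == link_type and s == source}

    def by_name(self, name: str) -> Optional[StoredObject]:
        for obj in self.objects.values():
            if obj.name == name:
                return obj
        return None


library = Container()
library.add(StoredObject(1, b"design memo", {"author": "hagmann", "kind": "memo"}, name="design"))
library.add(StoredObject(2, b"talk slides", {"author": "hagmann", "kind": "talk"}))
library.link("cites", 2, 1)
library.objects[1].update(b"design memo, revised")

print(library.select(author="hagmann"))
print(library.targets(2, "cites"))
```

Here select returns a set of object ids without any starting point, while targets needs an object to start from; that contrast is the difference between non-navigational and navigational access.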
System Summary (part 1)
Server architecture
Large number of objects of vastly varying sizes
Document "types" (extensible) - few interpreted at server
Hypertext: documents can be connected via links
Documents can be named (=> file server interface)
Documents can have attributes and keywords
Documents are grouped into contexts called containers
Keyword and other indices maintained per container
Versions and alternatives
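One way to picture "keyword and other indices maintained per container" is an inverted index held separately for each container. This is a minimal sketch under that assumption; the KeywordIndex class, its methods, and the container names are hypothetical, and the slides do not say how the real indices are organized.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class KeywordIndex:
    """Inverted index for one container: keyword -> set of document ids."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, oid: int, keywords: Iterable[str]) -> None:
        for word in keywords:
            self._postings[word.lower()].add(oid)

    def lookup(self, *keywords: str) -> Set[int]:
        # Documents carrying every requested keyword (set intersection).
        if not keywords:
            return set()
        sets = [self._postings.get(word.lower(), set()) for word in keywords]
        return set.intersection(*sets)


# One index per container; a lookup never crosses container boundaries.
indices: Dict[str, KeywordIndex] = {"memos": KeywordIndex(), "talks": KeywordIndex()}
indices["memos"].add(1, ["hypertext", "storage"])
indices["memos"].add(2, ["hypertext", "locking"])
indices["talks"].add(3, ["hypertext"])

print(indices["memos"].lookup("hypertext"))
print(indices["memos"].lookup("hypertext", "locking"))
print(indices["talks"].lookup("locking"))
```

Keeping the index per container bounds the size of each index and gives a query a natural scope.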
System Summary (part 2)
Data compression and decompression
On-line archival storage
RPC marshalling code to encode contents
Alerters (send a message when an event occurs)
Page level access, access control, transactions, robust, performance, recovery, and availability
Hooks for multi-server and foreign server support
Not an OODBMS
"Store the bits and get out of my way"
Major execution on the server not a requirement
H RDBMS execution
Full blown execution is hard
Performance
Locking
Security
Evolution
Deadlock
Match of execution model to programming model
Query optimization
Looping
...
Simple execution is doable
Leave It To The Client
Find the set of objects non-navigationally
Let the client further filter objects
Let the client build appropriate data structures and use semantics for the current problem
Let the client worry about semantic changes
Add hooks to the database
Alerters (e.g., be informed when something changes)
Type system
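The division of labor described here can be sketched as a server that only stores bytes, answers a set query, and fires alerters, while the client does all filtering and interpretation. Everything in this sketch is an assumption made for illustration: the Server class, its methods, and the use of in-process callbacks where a real alerter would send a message.

```python
from typing import Callable, Dict, List, Set

# Hypothetical alerter signature: (object id, event name).
Alerter = Callable[[int, str], None]


class Server:
    """Stores bits and reports events; it does not interpret contents."""

    def __init__(self) -> None:
        self._store: Dict[int, bytes] = {}
        self._alerters: List[Alerter] = []

    def register_alerter(self, alerter: Alerter) -> None:
        self._alerters.append(alerter)

    def write(self, oid: int, contents: bytes) -> None:
        event = "changed" if oid in self._store else "created"
        self._store[oid] = contents
        for alerter in self._alerters:
            alerter(oid, event)  # stand-in for sending a message

    def find_all(self) -> Set[int]:
        # Stand-in for a non-navigational query returning a set of ids.
        return set(self._store)

    def read(self, oid: int) -> bytes:
        return self._store[oid]


server = Server()
events: List[tuple] = []
server.register_alerter(lambda oid, event: events.append((oid, event)))

server.write(1, b"short")
server.write(2, b"a much longer document")
server.write(1, b"short, revised")

# Client side: take the set from the server, then filter with client semantics.
candidates = server.find_all()
long_documents = {oid for oid in candidates if len(server.read(oid)) > 15}

print(long_documents)
print(events)
```

The server never learns what "long" means; that rule lives in the client and can change without touching the database.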
Yggdrasil Phases
0
build hypertext, naming, indexing, and containers
use Camelot/Mach
skip systems issues: performance, recovery, availability, access control, alerters, archival storage, and data compression
postpone versions and alternatives
1a
archival storage
1b
versions and alternatives
1c
alerters and better locking
1d
availability, access control, and data compression
2
non-navigational access
Yggdrasil Status
High level design done (phases 0 and 1)
Wildly coding and starting to test phase 0 on Dorados
Porting to SunOS
Getting Cedar up on Mach
Getting equipment
larger machines
optical disk jukebox
early use as on-line archival storage for System/33 and file servers without Yggdrasil
Relevance to Products
Short term: ideas and vaporware
Medium term: a prototype that
has a good data model for documents - hypertext
is a full-featured hypertext database system
merges file systems and databases
has an appropriate transaction and locking model
is set oriented
deals with complex objects
is large scale
uses non-navigational access to hypertext
supports other research
Long term: a prototype that also has
highly available, distributed computing
enhanced execution on the server
non-navigational access to hypertext