Yggdrasil - Filing Symposium
April 25-28, 1989
Yggdrasil

A large scale hypertext database system
Bob Hagmann
Xerox PARC/CSL
Introduction
Problem
Store and retrieve large numbers of possibly large objects (electronic documents)
Motivations
research
support PARC's & CRG's filing/database needs
good job for PARC => good job for Xerox
Xerox's document processing strategy
Goals
Handle large number of objects of any size
Use an appropriate data model
Non-navigational access
Deal with scale
Interoperate
Have transactions, robustness, good performance, fast recovery, good availability, and access control
Have hooks for building on system
Project Name
Ygg|dra|sil n. Also Yg|dra|sil. Norse Mythology. The great ash tree that holds together earth, heaven, and hell by its roots and branches. [Old Norse, probably ``the horse of Yggr'': Yggr, name of Odin, from yggr, variant of uggr, frightful (see ugly) + drasill, horse.]
Computer Science Challenges of the 1990's
Ubiquitous computing
Parallelism - massive parallelism, neural nets, and multiprocessors
Large scale distributed systems
Your favorite project goes here
-> Using low-cost, moderate-access-time tertiary storage
optical tape (e.g., XDM and ICI's Digital Paper)
high density magnetic tape
transverse and helical recording
???
For example: magnetic tape
4 mm, 8 mm, VHS, 19 mm and 1" tape under development
Ampex ACL
19 mm tape
28.75, 81.1, or 186 GByte/tape
15 MByte/sec
library:
7.1 TByte (uses small tapes)
256 cassettes
load/unload 8 sec
search 30 sec
55 GByte/ft³
366 GByte/ft²
$60/GByte
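To make the tape figures concrete, the Python sketch below does the arithmetic. It assumes decimal units (1 GByte = 1000 MByte), treats the 8 sec load and 30 sec search as a fixed cost per request to an unmounted tape, and reads $60/GByte as applying to the whole library capacity; the slide states none of these, and the function names are hypothetical.

```python
# Back-of-envelope figures for the Ampex ACL numbers quoted above.
# Assumption: decimal units (1 GByte = 1000 MByte, 1 TByte = 1000 GByte).

LARGEST_TAPE_GBYTE = 186.0     # largest cassette capacity quoted
TRANSFER_MBYTE_PER_SEC = 15.0  # transfer rate quoted
LIBRARY_TBYTE = 7.1            # library capacity (small tapes)
LOAD_SEC = 8.0                 # load/unload time
SEARCH_SEC = 30.0              # search time
DOLLARS_PER_GBYTE = 60.0       # quoted cost per GByte


def full_tape_read_hours(capacity_gbyte: float, rate_mbyte_s: float) -> float:
    """Hours needed to stream an entire cassette at the given rate."""
    return capacity_gbyte * 1000.0 / rate_mbyte_s / 3600.0


def fetch_seconds(object_mbyte: float) -> float:
    """Rough time to fetch one object from a tape that is not mounted:
    load + search + transfer (ignores robot travel and unloading the previous tape)."""
    return LOAD_SEC + SEARCH_SEC + object_mbyte / TRANSFER_MBYTE_PER_SEC


def library_cost_dollars(tbyte: float, dollars_per_gbyte: float) -> float:
    """Cost of the full library capacity at a flat per-GByte price."""
    return tbyte * 1000.0 * dollars_per_gbyte


if __name__ == "__main__":
    hours = full_tape_read_hours(LARGEST_TAPE_GBYTE, TRANSFER_MBYTE_PER_SEC)
    print(f"read a 186 GByte tape: {hours:.1f} h")
    print(f"fetch a 1 MByte document: {fetch_seconds(1.0):.1f} s")
    print(f"fill the library: ${library_cost_dollars(LIBRARY_TBYTE, DOLLARS_PER_GBYTE):,.0f}")
```

Under those assumptions, streaming a full 186 GByte cassette takes about 3.4 hours, fetching a 1 MByte document takes about 38 seconds (almost all of it load and search), and the 7.1 TByte library comes to roughly $426,000.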
Documents
Document: stored information that can be perceived by humans
Examples: text files, object files, PDL files, books, newspapers, movies, audio tapes, music, statues, paintings, fingerprints, and frescos.
Electronic documents: the subset of documents that can be dealt with by computers
Historical Perspective on Stored Information (European view)
Middle ages (guilds; oral tradition; bards; manuscripts and libraries owned by religion)
Moors fall => Arabic libraries pass into Christian hands (11th-12th C.) and monks translate them into Latin
Paper arrives in Europe (11th to 12th C.)
Secular universities (12th C.)
Renaissance: growth of the individual (14th-16th C.)
Printing press (~1450)
Scholarly journal (17th C.)
Public libraries and public education (lending library 18th C.; free public library 19th C.)
Xerox 914 copier (1959)
Perspective on Computers (European view)
Boxes of cards
File systems
Access methods for files
Hierarchical and network databases
Relational databases
???
My Local Branch Library
Books
large print, children's
Magazines
Newspapers
Consumer information
Transportation and community information
VHS tapes
CD's
Audio tapes
Records
Technology Change
Optical disks
Optical disk jukeboxes
High capacity magnetic disks
Decreasing cost of main memory
Fast commercial microprocessors/workstations
-> High capacity optical/magnetic tapes
Scanners
FDDI communications
Fax
Electronic printers
CD ROM
Information services
=> deal with change and scale
Perspective Summary
Vast changes over 15 centuries
"Information worker"
"Future Shock"
Amount of information growing
Speed of delivery increasing
observe the spread of the "cold fusion" papers
Media and delivery systems evolving
Project
Build a large scale hypertext database server
No user interface -- this is a database
Terabyte in scale
ter|a|tol|o|gy n. The biological study of the production, development, anatomy, and classification of monsters
RISC processors, Dragon, and beyond
Ethernet or FDDI
Lots of memory (100's of megabytes)
Lots of MIPS (10-100's)
Modest number of processors (1-16)
Magnetic disk cache (10's of gigabytes)
Lots of optical disk (a terabyte)
Mach and Camelot
TCP/IP and Xerox XNS protocols
Written in Cedar
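The hardware list implies a storage hierarchy: main memory, a magnetic disk cache, and the optical store behind it. Below is a minimal Python sketch of reads falling through those levels. The LRU eviction at each cache level, write-through to the optical store, and the dict standing in for the jukebox are all assumptions for illustration, not Yggdrasil's design; the class names are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Optional


class LRUTier:
    """One cache level with a byte capacity, evicting least recently used objects."""

    def __init__(self, name: str, capacity_bytes: int) -> None:
        self.name = name
        self.capacity = capacity_bytes
        self.used = 0
        self.items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        data = self.items.get(key)
        if data is not None:
            self.items.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key: str, data: bytes) -> None:
        if len(data) > self.capacity:
            return  # too large for this level; it stays only in lower levels
        if key in self.items:
            self.used -= len(self.items.pop(key))
        while self.used + len(data) > self.capacity:
            _, evicted = self.items.popitem(last=False)  # drop the LRU entry
            self.used -= len(evicted)
        self.items[key] = data
        self.used += len(data)


class StorageHierarchy:
    """Main memory -> magnetic disk cache -> optical store (backing store)."""

    def __init__(self, memory_bytes: int, disk_bytes: int) -> None:
        self.memory = LRUTier("memory", memory_bytes)
        self.disk = LRUTier("disk", disk_bytes)
        self.optical: Dict[str, bytes] = {}  # stands in for the optical jukebox
        self.last_hit = ""                   # which level served the last read

    def write(self, key: str, data: bytes) -> None:
        self.optical[key] = data  # write-through to the backing store
        self.disk.put(key, data)
        self.memory.put(key, data)

    def read(self, key: str) -> bytes:
        for tier in (self.memory, self.disk):
            data = tier.get(key)
            if data is not None:
                self.last_hit = tier.name
                break
        else:
            data = self.optical[key]  # slow path: go to the jukebox
            self.last_hit = "optical"
            self.disk.put(key, data)  # stage it on the disk cache
        self.memory.put(key, data)    # promote for the next read
        return data
```

Write-through keeps the optical copy authoritative, so evicting from either cache never loses data; a write-back design would trade that simplicity for fewer optical writes.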
Cedar, Mach and Camelot
Cedar
From Mesa: Algol family language with strong typing, monitors, threads, exceptions, ADT's
Cedar adds garbage collection, generic types, lists, atoms, ...
Language and runtime "up" under SunOS and partly Mach
Mach
Threads, message based, multiprocessor, external pagers
Camelot
Transaction facility on top of Mach
Logging, commit message protocols, recoverable storage management, media recovery, backup, name service, ...
Might be throwaway
Seven Key Ideas
Objects (≈ documents)
Typed links
Objects have properties (≈ attributes)
Containers group objects
Set oriented non-navigational access
Objects can be named
Versions and alternatives for objects
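One way the seven key ideas could fit together as data structures is sketched below in Python. Everything here is an illustrative assumption: the class and method names, the example link type, and modeling versions as a linear list of earlier contents (alternatives are not modeled). `select` is a plain scan that only shows the set-oriented style of access.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Link:
    link_type: str  # typed link, e.g. "annotates" (example type is hypothetical)
    target: int     # id of the object the link points to


@dataclass
class Obj:
    oid: int
    contents: bytes
    doc_type: str = "text"                                     # extensible document type
    properties: Dict[str, str] = field(default_factory=dict)   # attributes
    links: List[Link] = field(default_factory=list)
    versions: List[bytes] = field(default_factory=list)        # earlier contents, oldest first


class Database:
    def __init__(self) -> None:
        self.objects: Dict[int, Obj] = {}
        self.containers: Dict[str, Set[int]] = {}  # containers group objects
        self.names: Dict[str, int] = {}            # optional file-system-style names
        self._next = 1

    def create(self, contents: bytes, container: str,
               name: Optional[str] = None, **props: str) -> int:
        oid = self._next
        self._next += 1
        self.objects[oid] = Obj(oid, contents, properties=dict(props))
        self.containers.setdefault(container, set()).add(oid)
        if name is not None:
            self.names[name] = oid
        return oid

    def link(self, src: int, link_type: str, dst: int) -> None:
        self.objects[src].links.append(Link(link_type, dst))

    def update(self, oid: int, contents: bytes) -> None:
        obj = self.objects[oid]
        obj.versions.append(obj.contents)  # keep the old contents as a version
        obj.contents = contents

    def select(self, container: str, **props: str) -> Set[int]:
        """Set-oriented access: every object in a container whose properties match."""
        return {oid for oid in self.containers.get(container, set())
                if all(self.objects[oid].properties.get(k) == v for k, v in props.items())}
```

A query returns a set of object ids rather than a cursor the client must walk, which is the non-navigational style the list calls for; following links remains available when the client does want to navigate.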
System Summary (part 1)
Server architecture
Large number of objects of vastly varying sizes
Document ``types'' (extensible) - few interpreted at server
Hypertext: documents can be connected via links
Documents can be named (=> File server interface)
Documents can have attributes and keywords
Documents are grouped into contexts called containers
Keyword and other indices maintained per container
Versions and alternatives
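Keeping a keyword index per container can be sketched as an inverted index (keyword to object ids) stored inside each container, with a conjunctive keyword query answered by set intersection. This is a hypothetical Python illustration; the case folding and AND-only queries are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set


class Container:
    """A container with its own keyword index (inverted index: keyword -> object ids)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.members: Set[int] = set()
        self.keyword_index: DefaultDict[str, Set[int]] = defaultdict(set)

    def add(self, oid: int, keywords: Iterable[str]) -> None:
        self.members.add(oid)
        for kw in keywords:
            self.keyword_index[kw.lower()].add(oid)  # case-insensitive keywords

    def remove(self, oid: int) -> None:
        self.members.discard(oid)
        for kw in list(self.keyword_index):
            postings = self.keyword_index[kw]
            postings.discard(oid)
            if not postings:
                del self.keyword_index[kw]  # drop empty posting sets

    def query_all(self, *keywords: str) -> Set[int]:
        """Objects carrying every keyword (set intersection); no keywords -> all members."""
        result = set(self.members)
        for kw in keywords:
            result &= self.keyword_index.get(kw.lower(), set())
        return result
```

Because each container owns its index, an index only has to cover one context's objects, and a query scoped to a container never touches the rest of the database.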
System Summary (part 2)
Data compression and decompression
On-line archival storage
Use RPC marshalling code to encode contents
Alerters (send a message when an event occurs)
Page level access, access control, transactions, robust, performance, recovery, and availability
Hooks for multi-server and foreign server support
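An alerter can be sketched as a registry mapping (event, object) pairs to handlers. In a server the handler would send a message to the client; in this Python sketch a callable stands in for that message. The event names, the class name, and the point at which the server calls `notify` (for example, after a transaction commits) are assumptions.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

# An alert is delivered as (event kind, object id); event kinds here are hypothetical.
Alert = Tuple[str, int]
Handler = Callable[[Alert], None]


class AlerterRegistry:
    def __init__(self) -> None:
        self._subscribers: DefaultDict[Tuple[str, int], List[Handler]] = defaultdict(list)

    def register(self, event: str, oid: int, handler: Handler) -> None:
        """Ask to be told when `event` happens to object `oid`."""
        self._subscribers[(event, oid)].append(handler)

    def notify(self, event: str, oid: int) -> int:
        """Called by the server once the event has happened; returns the number of alerts sent."""
        handlers = self._subscribers.get((event, oid), [])
        for handler in handlers:
            handler((event, oid))  # in a real server: send a message to the client
        return len(handlers)
```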
Not an OODBMS
``Store the bits and get out of my way''
Major execution on the server not a requirement
H RDBMS execution
Full blown execution is hard
Performance or security - choose one
Locking and Deadlock
Evolution of types
Match of execution model to programming model
Query optimization
Looping
Longevity
...
Simple execution is doable
Leave It To The Client
Find the set of objects non-navigationally
Let the client further filter objects
Let the client build appropriate data structures and apply the semantics of the current problem
Let the client worry about semantic changes
Add hooks to the database
Alerters (e.g., be informed when something changes)
Type system
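This division of labor can be sketched as follows: the server answers a non-navigational set lookup, and the client fetches the results, filters them further, and builds whatever structure it needs. The in-memory tables, function names, and sample documents are hypothetical, and a real client would reach the server over RPC rather than by a local call.

```python
from typing import Any, Callable, Dict, List, Set

# Hypothetical server state: a keyword index and the stored objects.
SERVER_INDEX: Dict[str, Set[int]] = {
    "hypertext": {1, 2, 3},
    "optical": {2, 4},
}
SERVER_OBJECTS: Dict[int, Dict[str, Any]] = {
    1: {"title": "Links", "pages": 12},
    2: {"title": "Jukeboxes", "pages": 40},
    3: {"title": "Anchors", "pages": 3},
    4: {"title": "Media", "pages": 25},
}


def server_find(keyword: str) -> Set[int]:
    """Server side, non-navigational: return the whole matching set in one request."""
    return set(SERVER_INDEX.get(keyword, set()))


def server_fetch(oid: int) -> Dict[str, Any]:
    """Server side: hand back the stored object; no client code runs on the server."""
    return dict(SERVER_OBJECTS[oid])


def client_query(keyword: str, keep: Callable[[Dict[str, Any]], bool]) -> List[str]:
    """Client side: fetch the set, apply its own filter, and build its own structure."""
    docs = [server_fetch(oid) for oid in sorted(server_find(keyword))]
    return sorted(str(d["title"]) for d in docs if keep(d))
```

For example, `client_query("hypertext", lambda d: d["pages"] > 5)` keeps only the longer documents; the page-count predicate never reaches the server, so the server needs no execution model for it.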
Yggdrasil Phases
0
build hypertext, naming (hence NFS interface), indexing, and containers
use Camelot/Mach and/or Sun OS
skip systems issues: performance, recovery, availability, access control, alerters, archival storage, and data compression
postpone versions and alternatives
1a: archival storage
1b: versions and alternatives
1c: alerters and better locking and transactions
1d: availability, access control, and data compression
2
non-navigational access
object-oriented
Yggdrasil Status
Four people claim allegiance to project
High level design done (0 and 1)
Wildly coding and starting testing of phase 0 on Dorados
Porting to Sun OS
Getting Cedar up on Mach
Getting equipment
larger machines
optical disk jukebox
early use as on-line archival storage for System/33 and file servers without Yggdrasil
Relevance to Xerox
Short term: ideas and vaporware
Medium term: a prototype that
has a good data model for documents - hypertext
is a full-featured hypertext database system
merges file systems and databases
has an appropriate transaction and locking model
is set oriented
deals with complex objects
is large scale
supports other research
Long term: a prototype that also
is highly available
does distributed computing
performs enhanced execution on the server
uses non-navigational access to hypertext