Yggdrasil Overview
May 4, 1989
Yggdrasil

A large scale hypertext database system
Bob Hagmann
Xerox PARC
Introduction
Problem
Store and retrieve large numbers of possibly large nodes ("objects" or electronic documents)
Motivations
research
support PARC's filing/database needs
Xerox's document processing strategy
Goals
Handle large number of nodes of any size
Use an appropriate data model
Non-navigational access (non-procedural)
Deal with scale
Interoperate
Have transactions, robustness, good performance, fast recovery, good availability, and access control
Have hooks for building on system
Project Name
Ygg|dra|sil n. Also Yg|dra|sil. Norse Mythology. The great ash tree that holds together earth, heaven, and hell by its roots and branches. [Old Norse, probably ``the horse of Yggr'' : Yggr, name of Odin, from yggr, variant of uggr, frightful (see ugly) + drasill, horse.]
"Whatever the details, there is no doubt of the cosmic importance of Yggdrasil"
European Mythology, Jacqueline Simpson
Computer Science Challenges of the 1990's
Ubiquitous computing
Parallelism - massive parallelism, neural nets, and multiprocessors
Large scale distributed systems
-> Your favorite project goes here
-> Using low cost, moderate access time, tertiary storage
optical tape (e. g., ICI's Digital Paper)
high density magnetic tape
transverse and helical recording
???
For example: magnetic tape
4 mm, 8 mm, VHS, 19 mm and 1" tape under development
Ampex ACL
19 mm tape
28.75, 81.1, or 186 GByte/tape
15 MByte/sec
library:
7.1 TByte (uses small tapes)
256 cassettes
load/unload 8 sec
search 30 sec
55 GByte/ft^3
366 GByte/ft^2
$60/GByte
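A back-of-envelope check of the library figures above, as a sketch only. It assumes that "uses small tapes" means 256 cassettes of the 28.75 GByte size, and that the 15 MByte/sec figure is a sustained rate in decimal megabytes; the slide states neither. The function names are illustrative.

```python
# Rough arithmetic on the tape library numbers quoted on the slide.
# Assumptions (not stated in the source): the library holds 256 of the
# smallest cassettes, and transfer runs at a sustained 15 MByte/sec.

CASSETTES = 256
SMALL_TAPE_GBYTE = 28.75
TRANSFER_MBYTE_PER_SEC = 15.0


def library_capacity_gbyte(cassettes=CASSETTES, tape_gbyte=SMALL_TAPE_GBYTE):
    """Total capacity of a fully loaded library, in GByte."""
    return cassettes * tape_gbyte


def full_tape_read_minutes(tape_gbyte=SMALL_TAPE_GBYTE,
                           rate_mbyte_per_sec=TRANSFER_MBYTE_PER_SEC):
    """Minutes needed to stream one whole tape end to end."""
    seconds = tape_gbyte * 1000.0 / rate_mbyte_per_sec
    return seconds / 60.0


if __name__ == "__main__":
    capacity = library_capacity_gbyte()
    print("library capacity, GByte:", capacity)
    print("library capacity, TByte (1 TByte = 1024 GByte):", capacity / 1024)
    print("minutes to read one small tape:", round(full_tape_read_minutes(), 1))
```

Under these assumptions the total lands close to the quoted 7.1 figure when a TByte is counted as 1024 GByte, and streaming a single small cassette takes roughly half an hour, which is why the 8 sec load and 30 sec search times matter mostly for small retrievals.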
Documents
Document: stored information that can be perceived by humans
Examples: text files, object files, PDL files, books, newspapers, movies, audio tapes, music, statues, paintings, fingerprints, and frescos.
Electronic documents: the subset of documents that can be dealt with by computers
Historical Perspective on Stored Information (European view)
Middle ages (guilds; oral tradition; bards; manuscripts and libraries owned by religion)
Moors fall => Arabic libraries into Christian hands (7th C.) and monks translate to Latin
Paper arrives in Europe (11th to 12th C.)
Secular universities (12th C.)
Renaissance: growth of individual (14-16th C.)
Printing press (~1450)
Scholarly journal (17th C.)
Public libraries and public education (lending library 18th C.; free public library 19th C.)
Xerox 914 copier (1959)
Perspective on Computers (European view)
Boxes of cards
File systems
Access methods for files
Hierarchical and network databases
Relational databases
???
My Local Branch Library
Books
large print, children's
Magazines
Newspapers
Consumer information
Transportation and community information
VHS tapes
CD's
Audio tapes
Records
Technology Change
Optical disks
Optical disk jukeboxes
High capacity magnetic disks
Decreasing cost of main memory
Fast commercial microprocessors/workstations
-> High capacity optical/magnetic tapes
Scanners
FDDI communications
Fax
Electronic printers
CD ROM
Information services
=> deal with change and scale
Perspective Summary
Vast changes over 15 Centuries
Static -> dynamic literature
Illiterate -> literate public
Serf -> "post-industrial information worker"
Parchment -> CD ROM
"Future Shock"
Amount of information growing
Speed of delivery increasing
observe the spread of the "cold fusion" papers
Media and delivery systems evolving
True context of the project is to fundamentally change the notion of document
Project
Build a large scale hypertext database server
No user interface -- this is a database
Terabyte in scale
ter|a|tol|o|gy n. The biological study of the production, development, anatomy, and classification of monsters
RISC processors, Dragon, and beyond
Ethernet or FDDI
Lots of memory (100's of megabytes)
Lots of MIPS (10-100's)
Modest number of processors (1-16)
Magnetic disk cache (10's of gigabytes)
Lots of optical disk (a terabyte)
Mach and Camelot
TCP/IP and Xerox XNS protocols
Written in Cedar
Cedar, Mach and Camelot
Cedar
From Mesa: Algol family language with strong typing, monitors, threads, exceptions, ADT's
Cedar adds garbage collection, generic types, lists, atoms, ...
Mesa inspired Modula-2
Language and runtime "up" under SunOS and Mach
Mach
Threads, message based, multiprocessor, external pagers
Camelot
Transaction facility on top of Mach
Logging, commit message protocols, recoverable storage management, media recovery, backup, name service, ...
Might be throw away
Seven Key Ideas
Nodes (≈ documents or objects without semantics)
Typed links
Nodes have properties (≈ attributes)
Containers group nodes
Set oriented, non-navigational (non-procedural) access
Nodes can be named
Versions and alternatives for nodes
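The seven ideas above can be pictured as a small in-memory data model. This is a minimal sketch, not the Yggdrasil design: the class and method names (`Node`, `Link`, `Container`, `select`, `linked_from`) are hypothetical, versions are modelled as a simple history list, and alternatives are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Link:
    """A typed, directed link between two nodes (hypothetical shape)."""
    link_type: str
    source: int
    target: int


@dataclass
class Node:
    """Uninterpreted contents plus properties, an optional name, and history."""
    node_id: int
    contents: bytes = b""
    properties: dict = field(default_factory=dict)
    name: Optional[str] = None
    versions: list = field(default_factory=list)  # earlier contents, oldest first

    def update(self, new_contents: bytes) -> None:
        # Keep the old contents as a version before overwriting.
        self.versions.append(self.contents)
        self.contents = new_contents


@dataclass
class Container:
    """Groups nodes and the links among them."""
    label: str
    nodes: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def link(self, link_type: str, source: int, target: int) -> None:
        self.links.append(Link(link_type, source, target))

    def select(self, **wanted) -> set:
        """Set-oriented access: ids of all nodes whose properties match."""
        return {
            nid for nid, node in self.nodes.items()
            if all(node.properties.get(k) == v for k, v in wanted.items())
        }

    def linked_from(self, source: int, link_type: str) -> set:
        """Ids of nodes reachable from `source` over links of one type."""
        return {l.target for l in self.links
                if l.source == source and l.link_type == link_type}
```

The point of `select` is that the caller describes the set it wants by property values instead of walking links one hop at a time; `linked_from` is the navigational counterpart, kept for contrast.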
System Summary (part 1)
Server architecture
Large number of nodes of vastly varying sizes
Node ``types'' (extensible); few interpreted at server
Hypertext: nodes can be connected via links
Nodes can be named (=> File server interface)
Nodes can have attributes and keywords
Nodes are grouped into contexts called containers
Keyword and other indices maintained per container
Versions and alternatives
System Summary (part 2)
Data compression and decompression
On-line archival storage
Use RPC marshalling code to encode contents
Alerters (send a message when an event occurs)
Page level access, access control, transactions, robust, performance, recovery, and availability
Hooks for multi-server and foreign server support
Not an OODBMS
``Store the bits and get out of my way''
Major execution on the server not a requirement
≈ RDBMS execution
Full blown execution is hard
Performance or security - choose one
Locking and Deadlock
Evolution of types
Match of execution model to programming model
Query optimization
Looping
Longevity
...
Simple execution is doable
Leave It To The Client
Find the set of nodes non-navigationally
Let the client further filter nodes
Let the client build appropriate data structures and use semantics for the current problem
Add hooks to the database
Alerters (e.g., be informed when something changes)
Type system
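The division of labor described above can be sketched as follows: the server only selects a candidate set by property and notifies registered alerters, while the client fetches contents and applies its own semantics. `ToyServer`, `register_alerter`, `find`, `fetch`, and `client_filter` are hypothetical names for illustration, not Yggdrasil interfaces.

```python
from typing import Callable, Dict, List, Set


class ToyServer:
    """Stores uninterpreted bytes with properties; does no content filtering."""

    def __init__(self) -> None:
        self._props: Dict[int, Dict[str, str]] = {}
        self._contents: Dict[int, bytes] = {}
        self._alerters: List[Callable[[int], None]] = []

    def register_alerter(self, callback: Callable[[int], None]) -> None:
        """Hook: call `callback(node_id)` whenever a node is stored."""
        self._alerters.append(callback)

    def store(self, node_id: int, contents: bytes, **props: str) -> None:
        self._contents[node_id] = contents
        self._props[node_id] = dict(props)
        for notify in self._alerters:
            notify(node_id)

    def find(self, **wanted: str) -> Set[int]:
        """Non-navigational access: the set of node ids matching the properties."""
        return {
            nid for nid, props in self._props.items()
            if all(props.get(k) == v for k, v in wanted.items())
        }

    def fetch(self, node_id: int) -> bytes:
        return self._contents[node_id]


def client_filter(server: ToyServer, candidates: Set[int],
                  predicate: Callable[[bytes], bool]) -> List[int]:
    """Client side: interpret contents and keep only the nodes it cares about."""
    return sorted(nid for nid in candidates if predicate(server.fetch(nid)))
```

A client would typically call `server.find(kind="paper")` to get the coarse set and then pass it through `client_filter` with whatever test fits the problem at hand, so no client code ever runs inside the server.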
Yggdrasil and IFS Comparison
topic                            IFS       Yggdrasil                       ratio
size                             1 Gbyte   1 Tbyte                         1000
CPU                              .2 MIP    200 MIP                         1000
memory                           128 KB    128 MB                          1000
net read bandwidth (Kbytes/sec)  28        1000                            35
latency                          50 msec   3-30 msec - 1 min (to 20 sec)   16 - 20000
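The ratio column can be reproduced with simple division. The sketch below assumes decimal prefixes (1 Tbyte = 1000 Gbyte, 128 MB = 128000 KB), which is what the round value 1000 suggests. The latency reading is a guess: 16 matches 50 msec divided by 3 msec, and 20000 happens to equal 1 min divided by 3 msec, but the slide does not say how it was derived.

```python
# Reproducing the "ratio" column of the IFS / Yggdrasil comparison.
# Units are normalized per row; decimal prefixes are an assumption.

ROWS = {
    "size (Gbyte)": (1.0, 1000.0),
    "CPU (MIP)": (0.2, 200.0),
    "memory (KB)": (128.0, 128000.0),
    "net read bandwidth (Kbytes/sec)": (28.0, 1000.0),
}


def growth(ifs_value: float, yggdrasil_value: float) -> float:
    """How many times larger the Yggdrasil figure is than the IFS one."""
    return yggdrasil_value / ifs_value


def latency_ratios(ifs_msec: float = 50.0, fast_msec: float = 3.0,
                   slow_msec: float = 60_000.0):
    """Guessed reading of the latency row: best-case speedup and overall spread."""
    return ifs_msec / fast_msec, slow_msec / fast_msec


if __name__ == "__main__":
    for topic, (ifs, ygg) in ROWS.items():
        print(topic, int(growth(ifs, ygg)))
    print("latency", [int(r) for r in latency_ratios()])
```

The interesting row is the last one: capacity, CPU, and memory all grow by about three orders of magnitude, while best-case latency improves by little more than one and worst-case latency (a jukebox or tape fetch) gets far worse.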
Yggdrasil Phases
0
build hypertext, naming (hence NFS interface), indexing, and containers
use Camelot/Mach and/or SunOS
skip systems issues: performance, recovery, availability, access control, alerters, archival storage, and data compression
postpone versions and alternatives
1a: archival storage
1b: versions and alternatives
1c: alerters and better locking and transactions
1d: availability, access control, and data compression
2
non-navigational access
object-oriented
Yggdrasil Status
Four people claim allegiance to project
High level design done (phases 0 and 1)
Wildly coding and starting testing of phase 0 on Dorados
Porting to SunOS
Getting Cedar up on Mach
Getting equipment
larger machines
optical disk jukebox
early use as on-line archival storage for System/33 and file servers without Yggdrasil
Contributions
Selection of existing ideas to build an artifact
full featured hypertext
set oriented
Large scale
Non-navigational access to hypertext
Database and file server merger