[Indigo]<Yggdrasil>Documentation>YggdrasilFilingSymposium.slides!4

Yggdrasil - Filing Symposium

April 25-28, 1989

Yggdrasil

A large scale hypertext database system

Bob Hagmann

Xerox PARC/CSL

Introduction

Problem

Store and retrieve large numbers of possibly large objects (electronic documents)

Motivations

research

support PARC's & CRG's filing/database needs

good job for PARC => good job for Xerox

Xerox's document processing strategy

Goals

Handle large number of objects of any size

Use an appropriate data model

Non-navigational access

Deal with scale

Interoperate

Have transactions, robustness, good performance, fast recovery, good availability, and access control

Have hooks for building on system

Project Name

Ygg|dra|sil n. Also Yg|dra|sil. Norse Mythology. The great ash tree that holds together earth, heaven, and hell by its roots and branches. [Old Norse, probably ``the horse of Yggr'': Yggr, name of Odin, from yggr, variant of uggr, frightful (see ugly) + drasill, horse.]

Computer Science Challenges of the 1990's

Ubiquitous computing

Parallelism - massive parallelism, neural nets, and multiprocessors

Large scale distributed systems

Your favorite project goes here

-> Using low cost, moderate access time, tertiary storage

optical tape (e.g., XDM and ICI's Digital Paper)

high density magnetic tape

transverse, helical, and longitudinal recording

???

For example: magnetic tape

4 mm, 8 mm, VHS, 19 mm and 1" tape under development

Ampex ACL

19 mm tape

81 GByte/tape

15 MByte/sec

library:

256 cassettes

load/unload 8 sec

search 30 sec

55 GByte/ft^3

366 GByte/ft^2

$60/GByte
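As a rough check on what these figures imply for scale, the arithmetic can be sketched as below. The decimal unit conversion (1 GByte = 1000 MByte) and the idea that load and search times simply add up are assumptions made for illustration, not statements from the slide.

```python
# Figures quoted on the slide for the Ampex ACL tape library.
GBYTE_PER_TAPE = 81
CASSETTES = 256
MBYTE_PER_SEC = 15
LOAD_SEC = 8
SEARCH_SEC = 30

# Assumption: decimal units, 1 GByte = 1000 MByte.
MBYTE_PER_GBYTE = 1000


def library_capacity_gbyte():
    """Total capacity of a fully populated library, in GByte."""
    return GBYTE_PER_TAPE * CASSETTES


def full_tape_read_sec():
    """Seconds to stream one whole tape at the quoted transfer rate."""
    return GBYTE_PER_TAPE * MBYTE_PER_GBYTE / MBYTE_PER_SEC


def time_to_first_byte_sec():
    """Assumption: load and search happen one after the other."""
    return LOAD_SEC + SEARCH_SEC


print(library_capacity_gbyte())   # 20736 GByte, roughly 20 terabytes
print(full_tape_read_sec() / 60)  # 90.0 minutes per tape
print(time_to_first_byte_sec())   # 38 seconds
```

Under these assumptions a single library holds on the order of 20 terabytes, with access times measured in tens of seconds rather than milliseconds.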

Documents

Document: stored information that can be perceived by humans

Examples: text files, object files, PDL files, books, newspapers, movies, audio tapes, music, statues, paintings, fingerprints, and frescos.

Electronic documents: the subset of documents that can be dealt with by computers

Historical Perspective on Stored Information (European view)

Middle ages (guilds; oral tradition; bards; manuscripts and libraries owned by religion)

Moors fall => Arabic libraries into Christian hands (7th C.) and monks translate to Latin

Paper arrives in Europe (11th to 12th C.)

Secular universities (12th C.)

Renaissance: growth of individual (14-16th C.)

Printing press (~1450)

Scholarly journal (17th C.)

Public libraries and public education (lending library 18th C.; free public library 19th C.)

Xerox 914 copier (1959)

Perspective on Computers (European view)

Boxes of cards

File systems

Access methods for files

Hierarchical and network databases

Relational databases

???

My Local Branch Library

Books

large print, children's

Magazines

Newspapers

Consumer information

Transportation and community information

VHS tapes

CD's

Audio tapes

Records

Technology Change

Optical disks

Optical disk jukeboxes

High capacity magnetic disks

Decreasing cost of main memory

Fast commercial microprocessors/workstations

-> High capacity optical/magnetic tapes

Scanners

FDDI communications

Fax

Electronic printers

CD ROM

Information services

=> deal with change and scale

Perspective Summary

Vast changes over 15 centuries

"Information worker"

"Future Shock"

Amount of information growing

Speed of delivery increasing

Media and delivery systems evolving

Project

Build a large scale hypertext database server

No user interface -- this is a database

Terabyte in scale

ter|a|tol|o|gy n. The biological study of the production, development, anatomy, and classification of monsters.

RISC processors, Dragon, and beyond

Ethernet or FDDI

Lots of memory (100's of megabytes)

Lots of MIPS (10-100's)

Modest number of processors (1-16)

Magnetic disk cache (10's of gigabytes)

Lots of optical disk (a terabyte)

Mach and Camelot

TCP/IP and Xerox XNS protocols

Written in Cedar

Cedar, Mach and Camelot

Cedar

From Mesa: Algol family language with strong typing, monitors, threads, exceptions, ADT's

Cedar adds garbage collection, generic types, lists, atoms, ...

Language and runtime "up" under SunOS and partly Mach

Mach

Threads, message based, multiprocessor, external pagers

Camelot

Transaction facility on top of Mach

Logging, commit message protocols, recoverable storage management, media recovery, backup, name service, ...

Might be throw away

Seven Key Ideas

Objects (≈ documents)

Typed links

Objects have properties (≈ attributes)

Containers group objects

Set oriented non-navigational access

Objects can be named

Versions and alternatives for objects
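One way to picture how these ideas fit together is the minimal sketch below. It is not the Yggdrasil design: every class, field, and method name (`DocObject`, `Link`, `Container`, `select`, `linked_from`) is hypothetical, and alternatives are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Link:
    """A typed, directed link between two objects (hypothetical layout)."""
    link_type: str
    source_id: int
    target_id: int


@dataclass
class DocObject:
    """An object (roughly a document): opaque contents plus properties."""
    object_id: int
    contents: bytes = b""
    properties: dict = field(default_factory=dict)
    name: Optional[str] = None                     # naming is optional
    versions: list = field(default_factory=list)   # older contents, oldest first

    def update(self, new_contents):
        """Keep the current contents as a version, then store the new ones."""
        self.versions.append(self.contents)
        self.contents = new_contents


@dataclass
class Container:
    """A container groups objects and the links among them."""
    objects: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add(self, obj):
        self.objects[obj.object_id] = obj

    def link(self, link_type, source_id, target_id):
        self.links.append(Link(link_type, source_id, target_id))

    def select(self, **wanted):
        """Set-oriented access: ids of all objects whose properties match."""
        return {oid for oid, obj in self.objects.items()
                if all(obj.properties.get(k) == v for k, v in wanted.items())}

    def linked_from(self, source_id, link_type):
        """Navigational access: follow links of one type from one object."""
        return {l.target_id for l in self.links
                if l.source_id == source_id and l.link_type == link_type}
```

The sketch keeps `select` (a whole set at once, no starting object) apart from `linked_from` (follow links from a known object) to show the two access styles the slides distinguish.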

System Summary (part 1)

Server architecture

Large number of objects of vastly varying sizes

Document ``types'' (extensible) - few interpreted at server

Hypertext: documents can be connected via links

Documents can be named (=> File server interface)

Documents can have attributes and keywords

Documents are grouped into contexts called containers

Keyword and other indices maintained per container

Versions and alternatives
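A keyword index kept per container could look roughly like the inverted index below. This is only an illustrative sketch: the class `KeywordIndex`, its methods, and the case-insensitive matching are assumptions, not details from the slides.

```python
from collections import defaultdict


class KeywordIndex:
    """Inverted index for one container: keyword -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, keywords):
        """Record that doc_id carries each of the given keywords."""
        for kw in keywords:
            # Assumption: keywords are matched case-insensitively.
            self._postings[kw.lower()].add(doc_id)

    def lookup_all(self, *keywords):
        """Return ids of documents that carry every given keyword."""
        if not keywords:
            return set()
        sets = [self._postings.get(kw.lower(), set()) for kw in keywords]
        return set.intersection(*sets)
```

Because a lookup returns a set of ids without starting from any particular document, it is an instance of the set-oriented, non-navigational access named among the goals.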

System Summary (part 2)

Data compression and decompression

On-line archival storage

RPC marshalling code to encode contents

Alerters (send a message when an event occurs)

Page level access, access control, transactions, robust, performance, recovery, and availability

Hooks for multi-server and foreign server support
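The alerter idea (send a message when an event occurs) can be sketched as a small callback registry. The names `Alerters`, `register`, and `signal`, and the use of in-process callbacks in place of real messages, are assumptions made for illustration.

```python
from collections import defaultdict


class Alerters:
    """Registry mapping an event name to the callbacks interested in it."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def register(self, event, callback):
        """Ask to be told whenever `event` occurs."""
        self._subscribers[event].append(callback)

    def signal(self, event, object_id):
        """Deliver the event to every subscriber; return how many were told."""
        callbacks = self._subscribers.get(event, [])
        for callback in callbacks:
            callback(event, object_id)
        return len(callbacks)


# Usage: a client asks to hear about changes to any object.
seen = []
alerters = Alerters()
alerters.register("changed", lambda event, oid: seen.append((event, oid)))
alerters.signal("changed", 42)
print(seen)  # [('changed', 42)]
```

In a real server the callback would be replaced by a message to a remote client, which is outside the scope of this sketch.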

Not an OODBMS

``Store the bits and get out of my way''

Major execution on the server not a requirement

≈ RDBMS execution

Full blown execution is hard

Performance

Locking

Security

Evolution

Deadlock

Match of execution model to programming model

Query optimization

Looping

...

Simple execution is doable

Leave It To The Client

Find the set of objects non-navigationally

Let the client further filter objects

Let the client build appropriate data structures and use semantics for the current problem

Let the client worry about semantic changes

Add hooks to the database

Alerters (e.g., be informed when something changes)

Type system
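The division of labor described here can be sketched as two steps: the server returns a set of objects chosen by simple property matching, and the client applies whatever further semantics it needs. The function names `server_find` and `client_filter` and the dictionary representation of objects are hypothetical.

```python
def server_find(objects, **wanted):
    """Server side: ids of all objects whose properties match exactly."""
    return {oid for oid, props in objects.items()
            if all(props.get(k) == v for k, v in wanted.items())}


def client_filter(objects, candidate_ids, predicate):
    """Client side: apply arbitrary, application-specific semantics."""
    return sorted(oid for oid in candidate_ids if predicate(objects[oid]))


# Usage: the server narrows by a simple property, the client does the rest.
store = {
    1: {"type": "memo", "pages": 3},
    2: {"type": "memo", "pages": 40},
    3: {"type": "book", "pages": 400},
}
memos = server_find(store, type="memo")
long_memos = client_filter(store, memos, lambda props: props["pages"] > 10)
print(long_memos)  # [2]
```

The server only needs equality matching, which stays well short of the "full blown execution" that the preceding slide calls hard; the open-ended predicate runs on the client.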

Yggdrasil Phases

build hypertext, naming, indexing, and containers

use Camelot/Mach

skip systems issues: performance, recovery, availability, access control, alerters, archival storage, and data compression

postpone versions and alternatives

archival storage

versions and alternatives

alerters and better locking

availability, access control, and data compression

non-navigational access

Yggdrasil Status

High level design done (phases 0 and 1)

Wildly coding and starting to test phase 0 on Dorados

Porting to SunOS

Getting Cedar up on Mach

Getting equipment

larger machines

optical disk jukebox

early use as on-line archival storage for System/33 and file servers without Yggdrasil

Relevance to Products

Short term: ideas and vaporware

Medium term: a prototype that

has a good data model for documents - hypertext

is a featured hypertext database system

merges file systems and databases

has an appropriate transaction and locking model

is set oriented

deals with complex objects

is large scale

uses non-navigational access to hypertext

supports other research

Long term: a prototype that also has

highly available, distributed computing

enhanced execution on the server

non-navigational access to hypertext