Yggdrasil - Filing Symposium, April 25-28, 1989

Yggdrasil
A large scale hypertext database system
Bob Hagmann
Xerox PARC/CSL

Introduction
    Problem
        Store and retrieve large numbers of possibly large objects (electronic documents)
    Motivations
        research
        support PARC's & CRG's filing/database needs
        good job for PARC => good job for Xerox
        Xerox's document processing strategy

Goals
    Handle large number of objects of any size
    Use an appropriate data model
    Non-navigational access
    Deal with scale
    Interoperate
    Have transactions, robustness, good performance, fast recovery, good availability, and access control
    Have hooks for building on system

Project Name
    Ygg|dra|sil n. Also Yg|dra|sil. Norse Mythology. The great ash tree that holds together earth, heaven, and hell by its roots and branches. [Old Norse, probably ``the horse of Yggr'' : Yggr, name of Odin, from yggr, variant of uggr, frightful (see ugly) + drasill, horse.]

Computer Science Challenges of the 1990's
    Ubiquitous computing
    Parallelism - massive parallelism, neural nets, and multiprocessors
    Large scale distributed systems
    Your favorite project goes here
    Using low cost, moderate access time, tertiary storage
        optical tape (e.g., XDM and ICI's Digital Paper)
        high density magnetic tape
            transverse, helical, and longitudinal recording
        ???

For example: magnetic tape
    4 mm, 8 mm, VHS, 19 mm and 1" tape under development
    Ampex ACL
        19 mm tape
        81 GByte/tape
        15 MByte/sec
        library: 256 cassettes
        load/unload 8 sec
        search 30 sec
        55 GByte/ft^3
        366 GByte/ft^2
        $60/GByte

Documents
    Document: stored information that can be perceived by humans
    Examples: text files, object files, PDL files, books, newspapers, movies, audio tapes, music, statues, paintings, fingerprints, and frescos.
    Electronic documents: the subset of documents that can be dealt with by computers

Historical Perspective on Stored Information (European view)
    Middle ages (guilds; oral tradition; bards; manuscripts and libraries owned by religion)
    Moors fall => Arabic libraries into Christian hands (7th C.) and monks translate to Latin
    Paper arrives in Europe (11th to 12th C.)
    Secular universities (12th C.)
    Renaissance: growth of individual (14-16th C.)
    Printing press (~1450)
    Scholarly journal (17th C.)
    Public libraries and public education (lending library 18th C.; free public library 19th C.)
    Xerox 914 copier (1959)

Perspective on Computers (European view)
    Boxes of cards
    File systems
    Access methods for files
    Hierarchical and network databases
    Relational databases
    ???

My Local Branch Library
    Books
        large print, children's
    Magazines
    Newspapers
    Consumer information
    Transportation and community information
    VHS tapes
    CD's
    Audio tapes
    Records

Technology Change
    Optical disks
    Optical disk jukeboxes
    High capacity magnetic disks
    Decreasing cost of main memory
    Fast commercial microprocessors/workstations
    High capacity optical/magnetic tapes
    Scanners
    FDDI communications
    Fax
    Electronic printers
    CD ROM
    Information services
    => deal with change and scale

Perspective Summary
    Vast changes over 15 centuries
    "Information worker"
    "Future Shock"
    Amount of information growing
    Speed of delivery increasing
    Media and delivery systems evolving

Project
    Build a large scale hypertext database server
    No user interface -- this is a database
    Terabyte in scale
    ter|a|tol|o|gy n. The biological study of the production, development, anatomy, and classification of monsters.
    RISC processors, Dragon, and beyond
    Ethernet or FDDI
    Lots of memory (100's of megabytes)
    Lots of MIPS (10-100's)
    Modest number of processors (1-16)
    Magnetic disk cache (10's of gigabytes)
    Lots of optical disk (a terabyte)
    Mach and Camelot
    TCP/IP and Xerox XNS protocols
    Written in Cedar

Cedar, Mach and Camelot
    Cedar
        From Mesa: Algol family language with strong typing, monitors, threads, exceptions, ADT's
        Cedar adds garbage collection, generic types, lists, atoms, ...
        Language and runtime "up" under SunOS and partly Mach
    Mach
        Threads, message based, multiprocessor, external pagers
    Camelot
        Transaction facility on top of Mach
        Logging, commit message protocols, recoverable storage management, media recovery, backup, name service, ...
        Might be throw away

Seven Key Ideas
    Objects (≈ documents)
    Typed links
    Objects have properties (≈ attributes)
    Containers group objects
    Set oriented non-navigational access
    Objects can be named
    Versions and alternatives for objects

System Summary (part 1)
    Server architecture
    Large number of objects of vastly varying sizes
    Document ``types'' (extensible) - few interpreted at server
    Hypertext: documents can be connected via links
    Documents can be named (=> File server interface)
    Documents can have attributes and keywords
    Documents are grouped into contexts called containers
    Keyword and other indices maintained per container
    Versions and alternatives

System Summary (part 2)
    Data compression and decompression
    On-line archival storage
    RPC marshalling code to encode contents
    Alerters (send a message when an event occurs)
    Page level access, access control, transactions, robust, performance, recovery, and availability
    Hooks for multi-server and foreign server support

Not an OODBMS
    ``Store the bits and get out of my way''
    Major execution on the server not a requirement S RDBMS execution
    Full blown execution is hard
        Performance
        Locking
        Security
        Evolution
        Deadlock
        Match of execution model to programming model
        Query optimization
        Looping
        ...
    Simple execution is doable

Leave It To The Client
    Find the set of objects non-navigationally
    Let the client further filter objects
    Let the client build appropriate data structures and use semantics for the current problem
    Let the client worry about semantic changes
    Add hooks to the database
        Alerters (e.g., be informed when something changes)
        Type system

Yggdrasil Phases
    0   build hypertext, naming, indexing, and containers
        use Camelot/Mach
        skip systems issues: performance, recovery, availability, access control, alerters, archival storage, and data compression
        postpone versions and alternatives
    1a  archival storage
    1b  versions and alternatives
    1c  alerters and better locking
    1d  availability, access control, and data compression
    2   non-navigational access

Yggdrasil Status
    High level design done (0 and 1)
    Wildly coding and starting testing of 0 on Dorados
    Porting to SunOS
    Getting Cedar up on Mach
    Getting equipment
        larger machines
        optical disk jukebox
        early use as on-line archival storage for System/33 and file servers without Yggdrasil

Relevance to Products
    Short term: ideas and vaporware
    Medium term: a prototype that
        has a good data model for documents - hypertext
        is a featured hypertext database system
        merges file systems and databases
        has an appropriate transaction and locking model
        is set oriented
        deals with complex objects
        is large scale
        uses non-navigational access to hypertext
        supports other research
    Long term: a prototype that also has
        highly available, distributed computing
        enhanced execution on the server
        non-navigational access to hypertext
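
Illustrative sketch (not part of the original slides): the division of labor on the "Leave It To The Client" slide, where the server finds a set of objects non-navigationally and the client filters further, can be pictured with a small in-memory model. Everything here is an assumption made for illustration: the names `Obj`, `Container`, and `find`, the keyword-intersection query, and the sample data are hypothetical and do not describe Yggdrasil's actual interface or implementation, which was written in Cedar.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical stand-in for a stored object (document)."""
    oid: int
    properties: dict = field(default_factory=dict)   # attributes
    keywords: set = field(default_factory=set)
    links: list = field(default_factory=list)        # (link_type, target_oid) pairs


class Container:
    """Hypothetical container: groups objects and keeps its own keyword index."""

    def __init__(self, name):
        self.name = name
        self.objects = {}         # oid -> Obj
        self.keyword_index = {}   # keyword -> set of oids

    def add(self, obj):
        self.objects[obj.oid] = obj
        for kw in obj.keywords:
            self.keyword_index.setdefault(kw, set()).add(obj.oid)

    def find(self, *keywords):
        """Set-oriented lookup: return the oids that carry every given keyword."""
        if not keywords:
            return set(self.objects)
        sets = [self.keyword_index.get(kw, set()) for kw in keywords]
        return set.intersection(*sets)


# "Server" side: populate one container (sample data is made up).
reports = Container("reports")
reports.add(Obj(1, {"author": "A", "pages": 12}, {"filing", "hypertext"}))
reports.add(Obj(2, {"author": "B", "pages": 300}, {"filing"}))
reports.add(Obj(3, {"author": "A", "pages": 40}, {"hypertext"}, [("cites", 1)]))

# Non-navigational step: ask for a set by keyword, without following any links.
candidates = reports.find("hypertext")

# Client side: further filtering with problem-specific semantics.
short_ones = sorted(oid for oid in candidates
                    if reports.objects[oid].properties["pages"] < 20)
print(short_ones)
```

In this sketch the query returns a set of object identifiers rather than a cursor over links, and the page-count filter lives entirely in client code, which mirrors the slide's point that problem-specific semantics stay out of the server.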