DRAFT -- March 15, 1988 9:35:17 am PST -- DRAFT

XEROX Internal Memo

To:      Distribution
From:    Bob Hagmann, PARC/CSL
Subject: Yggdrasil Overall System Plan
Date:    March 15, 1988


Introduction

This memo addresses the issue of how to implement Yggdrasil. There are three sections (apart from this one). First, there is the implementation plan: the general structure of the system and how it is layered and installed on the execution environment. Second, there is the implementation staging, which sets the milestones for building the service. Finally, there is a section on future options. Yggdrasil should be the first of a series of projects to deal with document database requirements and research. By calling out options that we may want to pursue later, we hope that the implementation of the system will give us the flexibility we need in (at least) these directions.


Implementation plan

System components

We want to build a service that provides the data model, acts as a front end for the (optional) archival storage subsystem, and manages the short, medium, and long term storage. The archival storage will preferably be provided by a network service; one example is the OSAR/Index Server that FileNet plans to sell. Our fallback position is to integrate an optical disk jukebox into the server directly and to write all of the device management, storage allocation, error reporting, etc. code ourselves.

Language, hardware, operating system, communications, protocols, and programming environment

We propose to write the system in the Cedar language. We think this is the best choice: we are knowledgeable in the language, it provides many of the features we want, it is efficient, and it has a good programming environment.

We plan to run initially on Dorados. However, the system must be portable and must move to the server architecture that will result from the ``OCM Committee on Computational Base'' (Common HOWL). This implies that we must write our code in the evolving machine-independent Cedar style. We cannot depend on anything in the Cedar system that we cannot port (e.g., the semantics of processes). We plan to use the Mimosa ``Cedar to C'' translator as our eventual compiler.

To run, we will need runtime support and an appropriate OS. The system should be able to run on Cedar Dorados, on SUNs under SunOS using ``Cedar to C'' and the runtime, and on Cedar Dragons. Which ones we actually build will depend on the schedules and completion of the ``Cedar to C'' translator and runtime support, Dragons, and Cedar Port.

We will initially support a subset of our ``Native'' protocol. This means our servers will also have to support the lower level protocols that it needs (SUN RPC layered over TCP/IP and/or UDP; and/or Cedar RPC; and/or XNS Courier). Later, we will support the ``Native'' protocol with full support for the data model, plus some of SUN NFS, XNS Filing, and ANSI/CCITT/OSI/... ``Filing and Retrieval''. Which higher level protocols we support will depend on staffing and possibly on outside assistance.

The Front End Cluster

To provide for availability, there must be more than one physical machine capable of running a service. The set of machines that provides this is called a cluster. The cluster consists of at least one front end, but a typical system will have two or three front ends. The front ends should be interconnected with good communications (a private 10 MHz Ethernet or faster).
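We have not yet specified what traffic this private interconnect carries. Purely as an illustration (written in C, since Mimosa-translated C is our eventual target; the port number, intervals, and names are invented for the example), a front end might periodically announce that it is alive and judge a silent peer to be down roughly as follows. The failure handling this would support is described below.

    /* Illustrative sketch only: a periodic UDP heartbeat on the private
     * front end interconnect.  Nothing here is committed design.        */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    #define HEARTBEAT_PORT  5151       /* hypothetical cluster port       */
    #define HEARTBEAT_SECS  5          /* announcement interval           */
    #define DEAD_AFTER_SECS 15         /* declare a peer down after this  */

    struct peer {
        char   name[32];               /* front end name, e.g. "fe-2"     */
        time_t last_heard;             /* time of last heartbeat received */
    };

    /* A surviving front end would consult this before agreeing that a
     * peer is down and electing machines to take over its services.     */
    int peer_is_down(const struct peer *p, time_t now)
    {
        return (now - p->last_heard) > DEAD_AFTER_SECS;
    }

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);
        int on = 1;
        struct sockaddr_in all;
        char msg[64];

        /* Broadcast the announcement on the private interconnect. */
        setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof on);
        memset(&all, 0, sizeof all);
        all.sin_family      = AF_INET;
        all.sin_port        = htons(HEARTBEAT_PORT);
        all.sin_addr.s_addr = htonl(INADDR_BROADCAST);

        for (;;) {                     /* announce "I am alive" forever   */
            snprintf(msg, sizeof msg, "fe-1 alive %ld", (long)time(NULL));
            sendto(s, msg, strlen(msg), 0, (struct sockaddr *)&all, sizeof all);
            sleep(HEARTBEAT_SECS);
        }
    }

A real implementation would also have to deal with partitions of the private network, which this sketch ignores.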
The stable storage (e.g., magnetic disks) of a service is connected to more than one front end computer. The idea is that if one front end fails, another can still access the disks. Depending on what you think can fail and/or what a site can afford, this could mean double ported disks, double ported controllers, and/or replicated buses. The front ends must keep their state consistent on disk.

The front end running a service is found via a name server. On machine failure, the running front ends agree that the failed machine is down, and then elect machines that can access the disks of the failed service to take over the services of the failed machine. The elected front end machines then run crash recovery and come up as the service. Upon repair of the failed machine, the running cluster agrees that some services will be moved to the repaired machine. The changeover is crude: we just crash the service on one machine and bring it up on the other. (Maybe we can do something more user friendly, like send mail that we are about to crash, wait, and then crash.) One consequence of this strategy is that the services of a cluster are either all up or all down. This restriction can be relaxed slightly: a controlled shutdown of a service does not have to bring the cluster down.


Implementation staging

Staffing of the project is critical to any timetable we might propose. To give an idea of phasing, some milestones are included. With reservations, we propose the following milestones, measured from the start of coding:

10 months: Large capacity server (10-100 Gigabytes) speaking a subset of our ``Native'' protocol, with backup. This server has the basic hypertext data model with naming, attributes, and containers. This server will be put in daily public use at PARC. Missing from this implementation are:
    archival storage
    protocol front ends for file servers
    alerters
    versions and alternatives
    compression
    automatic indexing
    query language

14 months: add archival storage (~1 Terabyte).

18 months: add versions and alternatives, alerters, and automatic indexing.

24 months: add compression, other protocols, a query language, all other requirements, and other features that will become evident by that time.


Future options

Execution model on the server

Eventually, we would like to see the results of object-oriented and/or extensible database research on our server. Significant problems arise in performance, locking, security, evolution, archiving, the execution model, robustness, looping client code, and fairness, among other areas.

Plan for a memory hierarchy with variable performance

We need a structure that can accommodate main memory, stable main memory, {CCD, Magnetic Bubbles, ...}, drum or fixed head disk, low performance magnetic disks, high performance magnetic disks, mounted optical disks, jukebox optical disks, and a picker for a magnetic tape archive. (A rough sketch appears below, after the additional protocols item.)

Plan for read/write optical disks

At least at first, they will probably not perform as well as magnetic hard disks.

Plan for other archival servers, devices, and media

Various kinds of optical tape, microfilm, and magnetic tape may become available. Only unattended systems are of interest (e.g., systems that use mechanical loading of media, such as a jukebox).

Plan for stable main memory

Battery backed-up RAM is available. We should plan to eventually use it.

Plan for additional protocols

ANSI/CCITT/OSI/... Filing and Retrieval is an example.
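Returning to the memory hierarchy option above: purely as a naming sketch (again in C, and again representing no committed design), the storage classes could be fixed in an ordering that a migration policy consults when demoting cold objects.

    /* Illustrative sketch only: the storage classes listed under the
     * memory hierarchy option, in roughly fastest-to-slowest order.     */
    #include <stdio.h>

    enum storage_tier {
        TIER_MAIN_MEMORY,
        TIER_STABLE_MAIN_MEMORY,       /* e.g. battery backed-up RAM      */
        TIER_CCD_OR_BUBBLE_MEMORY,
        TIER_DRUM_OR_FIXED_HEAD_DISK,
        TIER_HIGH_PERF_MAGNETIC_DISK,
        TIER_LOW_PERF_MAGNETIC_DISK,
        TIER_MOUNTED_OPTICAL_DISK,
        TIER_JUKEBOX_OPTICAL_DISK,
        TIER_TAPE_PICKER_ARCHIVE
    };

    /* A migration policy might demote a cold object one tier at a time;
     * objects already in the tape archive stay where they are.          */
    enum storage_tier demote(enum storage_tier t)
    {
        return (t == TIER_TAPE_PICKER_ARCHIVE) ? t : (enum storage_tier)(t + 1);
    }

    int main(void)
    {
        enum storage_tier t = TIER_HIGH_PERF_MAGNETIC_DISK;
        printf("a cold object would move from tier %d to tier %d\n",
               t, demote(t));
        return 0;
    }

The point of the sketch is only that the hierarchy should be an explicit, ordered notion in the code, so that new classes of media can be added without rewriting the migration policy.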
Plan for interoperation with commercial database and information services

Plan for evolution of standards

Character sets, protocols, compression, PDLs, document descriptions, images, ...

Worry about parallel computer architectures

Maybe something in the Connection Machine world will become a good search engine.

Worry about ``processor per head'' architectures

TRW's Fast Finder Chip idea, modified to handle synonyms and word stemming, might be quite powerful as a search engine.

Plan for RDBMS

We may have to extend the server and/or integrate an RDBMS into it.