FROZEN DRAFT -- Aug 25, 1987 -- FROZEN DRAFT Internal Memo XEROX To From Distribution Bob Hagmann PARC/CSL Subject Date Interim Document Storage and Retrieval System Design Aug 25, 1987 [This memo is a snapshot of the ideas for IDSARS as of Aug 25, 1987. The document structure for the system has evolved away from this format. I have not edited this memo for a month. Hence, I've decided to freeze this draft version of the memo. It does not match the current thinking on several area, notably in naming, versions, and alternatives.] Introduction Many organizations require the more or less permanent storage of information. Over the years, there have evolved three types of electronic services that have satisfied parts of the problem: databases, file systems, and file servers. Each of these systems have their strengths and weaknesses. Over time, the problems and technologies change. What was appropriate at one time is not a good solution later. The cost of secondary storage has continued to decrease over the years. In addition, the range of size of individual objects that we wish to store has grown (one storage object in CSL is a 160 megabyte file). We are not always satisfied with the simple yet elegant hierarchal naming systems such as we have in file systems. Nor are the concepts of relational database systems a panacea: it is missing several features that make it difficult to use to model some applications (e.g., good solutions for repeating groups, hypertext, stored procedures, triggers, transitive closure, large objects, ...). Object-oriented databases are a current research topic, but significant problems arise in their performance, locking, security, evolution, archiving, execution model, robustness, as well as other problems. There seems to be a good deal of overlap between the needs of several groups to store and retrieve documents. The needs include the storage of documents of varying sizes, the naming and indexing of some of the documents, the interconnection of (some of) the documents, and notification of significant events of documents. The following is a partial list of CRG projects, proposals, groups, and needs that could be served by a Document Server. f FolioPub project at WRC f ERSG proposal at WRC f VLSI DA (CAD) group in CSL f Voice project in CSL f Software storage for the Cedar Programming Environment in CSL f Storage of scanned images f Distributed Notecards in ISL f Large capacity and high performance file server f Mail storage f TIC support, both at WRC and PARC Xerox also has the current focus of "document processing." Central to most (if not all) document processing systems is the storage and retrieval of documents. We are interested in the subclass of documents called electronic documents; these are "documents" that can be handled in some way via computers. Although there is no precise definition of "electronic document", we believe that a stored electronic document should have (some of) the following features: f Primitive elements include plain text, structured text (e.g., a Tioga or Viewpoint file), scanned image, mail message, digitized voice, digitized video, and compressed versions of the above. This should be an extensible list. f A document is a primitive element with an optional ordered collection of documents; note that is a recursive definition f Documents have attributes; some attributes are interpreted and some are uninterpreted. Keywords may be distinguished attributes. f Documents may be interconnected with links; links have names The basic idea is to build an "Interim Document Storage and Retrieval System" (IDSARS) that will be the central code of several server systems. IDSARS could have front-ends that made it looks like an XNS File Server, an OSI File Server, an NFS, or a native Document Server. A single server could export multiple interfaces over the same "Document Base" if they are compatible. IDSARS should be configurable with various secondary storage devices to meet a variety of requirements for response time, storage capacity, archival storage, backup, and robustness. We in CSL are interested in providing the basic storage and retrieval system, together with its associated administrative functions. We hope to cooperate with other groups to make this a useful system. Other groups could provide an automatic keyword generation, selective dissemination of information (SDI), inverse typesetting, document capture (e. g., scanning), and other facilities. Basic Features f large numbers of documents f documents can be large f documents "types" include text, bitmaps, digitized audio, digitized video (?), graphics, ... f documents can be connected via links (hypertext) f documents can be named via a hierarchical name system f documents can have attributes and keywords f keyword and other indices maintained f support for versions and alternatives f data compression f support for version histories and archival storage f alerters (send a message when a system event occurs) f query language and query optimizer f page level access and locking within a document (?) f transactions f robust f good performance f good availability (replication?) f hooks for multi-server and foreign server support Hardware Base f > 1 MIP CPU (or much more) f > 8 megabytes of main memory possible f > 2 megabytes of main memory minimum f Good communications hardware back-to-back packets sufficient bandwidth (10 megabits/second or much more) f SCSI bus (or other way to connect efficiently to a large amount of storage) must be of the right flavor for WORM optical and magnetic hard disks (if WORM optical not on a separate server) clock speed and synchronous/asynchronous must match on bus f Optional WORM optical disks, possibly with a jukebox might be accessible over the net instead of connected by a bus f Magnetic hard disks possibly in large numbers even if WORM optical disks are used!! 2-10 Gigabytes varying capacity and performance mirrored hardware (?) f Plan for read/write optical disks will not perform as well as magnetic hard disks (at least at first) f Plan for stable main memory battery backed RAM Issues f Commercial hardware platform f Hardware support f Flexible to accept new storage and communications technology Operating System and Programming Environment f Supports sharing and concurrency; choose one of 1) multiple concurrent threads sharing the same address space UNIX System V.3, SUN UNIX, and Cedar are examples 2) extremely fast IPC do I/O in "helper" processes (e.g. V) 3) asynchronous I/O (e.g. UNIX 4.3 BSD) use a single process that never blocks, except for when it want to build/use a lightweight process package f Good existing communications supports protocol(s) needed for server f Reasonable programming environment support f Reference: papers on X implementation Issues f Internal programming environment verses commercially available "environment" language features Cedar is built for large systems ease of porting to a different environment f Performance for sharing of data/processing and communications f Support likely for new protocols f Speed of communications and new technology Hardware, OS, and Programming Environment Options f Cedar Use a Dorado and Cedar Talks Pup and XNS, plus TCP/IP should be started this summer Lots of code available, notably Alpine To talk SCSI, do either: Get the Misc Board built (10-mHz ethernet and SCSI) Use a Ethernet/SCSI interface (Siemens) Use a Dicentra Put a 10-mHz ethernet board on the Dorado (preferred, maybe required) Talk 3 or 10 mHz ethernet to clients Build software to allow remote access to SCSI on Dicentra Use Multibus-SCSI adapter Port to non-Dorado by either: The Mimosa Cedar to C translator and run under other OS (e.g., UNIX) Dragon Mimosa retargeted to other instruction set f SUN UNIX Use a SUN-3, SUN-4, or ... Higher performance CPU's coming (?) Talks XNS and TCP/IP We own little code, but there's a lot of it out there Flexible I/O (SCSI and friends) f System V UNIX (and close relations) Use any machine that talks some protocol, and supports flexible I/O (SCSI). e. g., use the new HP RISC workstation (?) Communications seems to be a problem High Level Protocols f "Native" protocol with full support for data model (first version) f Build server so that a protocol veneer can be built later to handle (at least) XNS Filing ANSI/CCITT/OSI/... Filing and Retrieval NFS If the system talks Pup, do Pup FTP too (?? Leaf and Chat) Issues f Low level protocol(s) supporting "native" protocol Data Storage f Storage hierarchy memory, magnetic hard disk log, magnetic hard disk buffers, WORM optical disk, ... f Log based recovery "new file" trick used on Alpine: write new data to free pages and commit and log only small amounts of data; this avoids double writing - once to the log and once to the file f Use group commit for high update load, if needed f All naming, indexing, buffering, linkages, etc. are saved and updated on magnetic hard disk (or other fast secondary storage) f WORM optical disks will be self identifying upon removal (assuming that the optical disks are directly connected to the server) not self identifying when they are de-selected or after every update includes naming, indexing, and linkages (assumes we use the optical disks directly instead of through some other server) f Storage hierarchy: data/naming information/indices/etc. can be in/on: memory buffer stable storage (?) log magnetic hard disk buffer WORM optical disk f Delay processing (e.g., indexing) accept "image" quickly; index at leisure (?? via external server) g stores quickly and avoids conflicts f Large object support f "Real time" support -- enough for voice (?? video later) f Archival storage a) write to archival device when not referenced recently b) write to archival device ASAP but when convenient c) write to archival device preserving time order of documents (legal requirement) f Plan for new technologies and changes in old technologies XDM might replace WORM optical disk optical tape f Key is to build a framework for varying performance storage devices; the actual devices might change Issues Where do we store the attribute/keyword indices (see below)? How about at the leaf directories for objects in directories. For all data in a service (top level directory), store all attributes/keywords for all objects. Compression 200 spi g 3,740,000 bits per 8| x 11 page; @ 10:1 compression, that's about 50K bytes per page image 600 spi g 33,660,000 bits per 8| x 11 page; @ 10:1 compression, that's about 420K bytes per page image Rumor has it that one hour of video can be stored in 300 megabytes, using some compression algorithm. That's 83,333 bytes/second or 2/3 of a megabit per second. If compressed data via some algorithm is ever stored on the server, the server must forever retain the ability to decompress it itself! (Unless all copies of data using this compression are deleted.) How do we merge compression and versions? "Big files" are bitmaps, DBMS files, logs, and big "ASCII" files. Requirements seem to be: short ascii bcd Big files compressed what? file file pages random access no no (?) yes fast linear read yes yes yes backup incremental whole file ok incremental old versions deltas whole files deltas Issues What compression (standards?) should we support? Data Model An object in the system is either a document or a link. Documents and links are almost the same. All objects (both documents and links) can have "contents" as well as attributes (a property list). Documents and links both can have names. Links have both source and destination objects. A link also has a "type" unless it is a child link. The "type" is just a character string. Naming of documents f Hierarchical directories a slash separated path name ("[]<>" format accepted and converted) loops allowed, but a server specific depth will be imposed (??) directories have as members directories hard links to other directories symbolic links f Ordered children links document can have subdocuments (children); there is an order to the children f Typed links any number of links of the same name on a node, but they are unordered f Not all objects have names Containerization of documents f An object belongs to some named object f The named object does not have to be a leaf in the naming tree Versions and alternatives [we should do something like the layers Danny talks about] f Directories and objects can have numbered versions and named alternatives f A partial name looks like objectName%alternative!version "/Luther.alpine/Hagmann.pa/Buffering%FCFS!5/TechnicalReport.tioga!3" is a valid file name f versions and alternatives are maintained for both named and unnamed objects Issues for versions and alternatives f is "definition groups" the right model? f how can we extend definition groups to hypertext and multimedia? f which "delta" technique do we use? Attributes f May be associated with an object (document or link) f value are primitive document element f Interpreted attributes may be used in queries (e. g., keywords) Issues for attributes f an attribute is either 1) a triple: an attribute name (string), a type, and a value 2) an attribute name (string) and a list of triples. Each triple has a field name, a type, and a value f can a "value" be a list of values? f are attribute names unique for an object? f for option 2, are the field names unique for an attribute? Keywords (?? as distinct from attributes) f Similar to attributes, but are distinguished (?) f Keywords have a string name Indices f Maintain indices for keywords in the naming tree at the levels specified by the "owner" of the subtree f An index at a level replicates data of indices at lower levels Issues for indices f index on attributes too? Updates f Primitive navigational updates (e.g., link A to B, add attribute C of type D value E to F, remove G, store document H) f One server interaction can create an object, set its contents, set its attributes, and create links from/to it. Issues for updates f bulk updates? Triggers (well, really Alerters) f Simple triggers on insert, delete, read, or update f Trigger queues a message for some (probably) external server f Uses: Screen refresh due to update of a document being viewed Triggering of services (e. g., automatic indexing of a new document) Queries [Hector is working on this.] f A query results in a set or an enumerator (a list of dynamically constructed handles) of the matching objects. Each object may be examined further by the code under the enumerator (e.g., the attributes may be examined or the document read). f A subquery is specified by object-selection and a match-condition, which yields a new subquery. f A primitive subquery is specified by name-pattern and a match-condition. f A name-pattern is a name expression (possibly "*") or it is a previously computed set f Simple patterns are primitive names f Primitive names may be composed via set operations or link following operations into a name-pattern a set operation is like (foo* U *bar*) y *tioga link following: children: name-pattern_ named link: name-pattern_(link-name-pattern) all children: name-pattern_* all children over a named link: name-pattern_(link-name-pattern)* f Match condition using interpreted attributes and their operations, or equality in uninterpreted attributes, compute the match can apply match to a link to to the object being linked to match condition may be omitted may specify boolean operations on keywords partial match exactly n of m at most n of m at least n of m f g RDBMS not good enough for partial match Issues for Queries f Who knows what syntax to use for the queries? The "U" and "_*", and "%" stuff is just a place holder. f Support for (some?) known methods of finding documents a) full text search b) inverted word index c) multi-attribute hashing d) clustering e) signatures f) naming g) attributes and indexing h) weighted boolean f other retrieval methods (?) Issues for Data Model f "Client" activity on the server specify very restricted programming language watch out for concurrency control crash recovery looping accountablilty monitoring Trojan horses performance naming (my data type "time" is different from yours) modification of a type's semantics over time f Extensibility f Cocitations (g back links and queries) f Is this good enough to store electronic mail? f Is this good enough to store voice? f Hints for data storage (clustering) f Limited (extensible?) data types understood by server to allow query/indexing operations to occur on the server (evolving to allow relational DBMS?). f What kind of queries are done in "information retrieval?" Do we need metrics (word distances)? How do the folks in the TIC want to query their databases? This could force huge indices!!!!! f What can be an attribute? f When can an object be garbage collected? f Cross server links? Transactions and concurrency control Locking f a document (object) is the natural unit of locking f locks come in three flavors browse lock: this lock shares with any other lock. It is the weakest of all the locks. Alerters can be fired off of browse locks when a stronger lock is granted. read lock: locks the object so that it cannot be modified write lock: locks the object for modification. Does not share with read lock. Does share with browse lock, but commit of the transaction breaks the browse lock. Transactions f they are supposed to be short f there will be "Begin Transaction", "Commit Transaction", and "Abort Transaction" primitives. Issues f Long term locking (e. g., check-in and check-out) Issues (general) f Containment -- what does it mean? f Legal requirements for documents (write once in proper time order) f Security f Standards (character set, naming, ...) f Administration f Debugging, historical logging, and monitoring f Backup f Private changes to public data (e.g., private annotation or link) f Layers or object versions ? f Pruning deltas f Merging deltas f Are all versions always available? f Page level access f Replication, availability, and redundancy f Robustness f Performance f Access control lists? How are they done for unnamed objects? f Garbage collection Ê"Ï• WordlistsIDSARS.wordlist•StyleDefý BeginStyle (Cedar) AttachStyle (DefaultNest) "for print" {DoNest} PrintRule (look.w) "annotation looks" { "Cream" family bold face } ScreenRule (look.w) "annotation looks" { AlternateFontFamily bold+italic face } PrintRule (firstHeadersAfterPage) {0} .cvx .def (root) "format for root nodes" { (BasicSize) {14 bp } .cvx .def cedarRoot docStandard 36 pt topMargin 36 pt headerMargin 1.75 in leftMargin 1.5 in rightMargin 9.00 in lineLength 24 pt topIndent 24 pt topLeading 0 leftIndent 10 pt rightIndent } ScreenRule (root) "format for root nodes" { cedarRoot docStandard 36 pt topMargin 36 pt headerMargin 1.75 in leftMargin 1.5 in rightMargin 5.25 in lineLength 24 pt topIndent 24 pt topLeading 0 leftIndent 10 pt rightIndent } PrintRule (positionInternalMemoLogo) "Xerox logo: screen" { docStandard 1 pt leading 1 pt topLeading 1 pt bottomLeading } ScreenRule (positionInternalMemoLogo) "for Xerox logo" { docStandard 1 pt leading 1 pt topLeading -28 pt bottomLeading -1.5 in leftIndent } PrintRule (internalMemoLogo) "Xerox logo: screen" { "Logo" family 18 bp size 20 pt topLeading 20 pt bottomLeading } ScreenRule (internalMemoLogo) "for Xerox logo" { "Logo" family 18 pt size 12 pt leading -28 pt topLeading 36 pt bottomLeading -1.5 in leftIndent } PrintRule (memoHead) "for the To, From, Subject nodes at front of memos" { docStandard AlternateFontFamily 240 pt tabStops } StyleRule EndStyle˜Iunleaded•Mark centerFooteršÐosœ ˜,Iblock– insideHeaderšÑbox ˜ Ipositioninternalmemologo˜Iinternalmemologošœ˜memoheadšÏsœŸ˜Ošœ Ïtœ ˜Ošœ ˜ O˜O˜O˜O˜O˜—šŸœŸ˜ OšœA˜AO˜O˜Lšà˜à—head˜ Lšœé˜éLšœ¬˜¬L˜©LšœÂ˜Âšœx˜xL•CharSetsï• CharPropsPostfix XCPrintFontsšœ˜L–ï–Postfix XCPrintFontsšœ˜L–ï–Postfix XCPrintFontsšœ˜L–ï–Postfix XCPrintFontsšœ˜L–ï?–Postfix XCPrintFonts?šœ@˜@L–ï–Postfix XCPrintFontsšœ˜L–ï–Postfix XCPrintFontsšœ˜L–ï1–Postfix XCPrintFonts1šœ2˜2L–ï–Postfix XCPrintFontsšœ˜L–ï#–Postfix XCPrintFonts#šœ$˜$—LšœˆÏi œ˜Ÿšœ®˜®L–ïd–Postfix XCPrintFontsdšœå˜åL–ïy–Postfix XCPrintFontsyšœz˜zL–ï–Postfix XCPrintFontsšœ„˜„L–ï>–Postfix XCPrintFonts>šœ?˜?—LšœN¡œÛ˜±L˜„—˜L–ï–Postfix XCPrintFontsšœ˜L–ï–Postfix XCPrintFontsšœ˜L–ï^–Postfix XCPrintFonts^šœ_˜_L–ï2–Postfix XCPrintFonts2šœ3˜3L–ï7–Postfix XCPrintFonts7šœ8˜8L–ï,–Postfix XCPrintFonts,šœ-˜-L–ï&–Postfix XCPrintFonts&šœ'˜'L–ï'–Postfix XCPrintFonts'šœ(˜(L–ï–Postfix XCPrintFontsšœ˜L–ï4–Postfix XCPrintFonts4šœ5˜5L–ï6–Postfix XCPrintFonts6šœ7˜7L–ï$–Postfix XCPrintFonts$šœ%˜%L–ï5–Postfix XCPrintFonts5šœ6˜6L–ï–Postfix XCPrintFontsšœ˜L–ï–Postfix XCPrintFontsšœ ˜ L–ï–Postfix XCPrintFontsšœ˜L–ï"–Postfix XCPrintFonts"šœ#˜#L–ï3–Postfix XCPrintFonts3šœ4˜4—˜ L–ï–Postfix XCPrintFontsšœ˜L–ï'–Postfix XCPrintFonts'šœ(˜(L–ï&–Postfix XCPrintFonts&šœÏmœ#˜'–ï–Postfix XCPrintFontsšœ˜L˜L˜6—–ïM–Postfix XCPrintFontsMšœN˜NL˜oL˜:—–ï6–Postfix XCPrintFonts6šœ7˜7Lšœ>˜>—–ï–Postfix XCPrintFontsšœ˜˜?L˜—L˜ L˜—–ï#–Postfix XCPrintFonts#šœ$˜$L˜C—–ï–Postfix XCPrintFontsšœ˜L˜—˜L–ï–Postfix XCPrintFontsšœ˜L–ï–Postfix XCPrintFontsšœ˜L–ï>–Postfix XCPrintFonts>šœ?˜?——˜,–ï1–Postfix XCPrintFonts1šœ2˜2šœ=˜=L˜1—šœ¡ œ ˜L˜%—˜'L˜BL˜'——–ï–Postfix XCPrintFontsšœ˜L˜&—L–ï,–Postfix XCPrintFonts,šœ-˜-L–ï'–Postfix XCPrintFonts'šœ(˜(˜–ïN–Postfix XCPrintFontsNšœO˜OL˜L˜ L˜*—L–ï?–Postfix XCPrintFonts?šœ@˜@L–ï"–Postfix XCPrintFonts"šœ#˜#L–ï,–Postfix XCPrintFonts,šœ-˜-——˜1–ï–Postfix XCPrintFontsšœ˜L˜L˜–Postfix XCPrintFonts>šœ?˜?–ï–Postfix XCPrintFontsšœ˜L˜7L˜D——˜L˜L–ïs–Postfix XCPrintFontssšœô˜ôL–ïb–Postfix XCPrintFontsbšœc˜cL–ïK–Postfix XCPrintFontsKšœL˜LL–ïW–Postfix XCPrintFontsWšœX˜XL–ï%–Postfix XCPrintFonts%šœ&˜&–ïe–Postfix XCPrintFontsešœf˜fLšœ¢œ¢œ˜/šœ˜Lšœ¢˜Lšœ¢œ˜,Lšœ¢œ˜Lšœ,¢œ˜A——–ï–Postfix XCPrintFontsšœ˜Lšœm˜mL˜:Lšœ˜L˜*˜ L˜L˜L˜——L–ï+–Postfix XCPrintFonts+šœ¢œ(˜,˜L–ïh–Postfix XCPrintFontshšœ7¢œ¢œ)˜i–ï8–Postfix XCPrintFonts8šœ9˜9Lšœ˜Lšœ˜Lšœ˜Lšœ ˜ Lšœ ˜ L˜ L˜L˜—L–ï–Postfix XCPrintFontsšœ˜——˜–ï!–Postfix XCPrintFonts!šœ"˜"L˜,˜ L˜L˜L˜L˜L˜ L˜ L˜ L˜4L˜,——L–ï–Postfix XCPrintFontsšœ˜L–ï(–Postfix XCPrintFonts(šœ¢œ˜)L–ï/–Postfix XCPrintFonts/šœ0˜0L–ï%–Postfix XCPrintFonts%šœ&˜&L–ï%–Postfix XCPrintFonts%šœ&˜&L–ï–Postfix XCPrintFontsšœ˜˜˜L–ïA–Postfix XCPrintFontsAšœÂ˜ÂL–ï–Postfix XCPrintFontsšœ˜L–ï*–Postfix XCPrintFonts*šœ+˜+L–ï–Postfix XCPrintFontsšœ˜——˜$˜L–ï4–Postfix XCPrintFonts4šœ5˜5–ï–Postfix XCPrintFontsšœ˜L˜£L˜9L˜¢——šœ ˜ L–ï–Postfix XCPrintFontsšœ ˜ L–ï^–Postfix XCPrintFonts^šœ_˜_—˜L–ï3–Postfix XCPrintFonts3šœ4˜4——˜L–ï#–Postfix XCPrintFonts#šœ$˜$L–ïD–Postfix XCPrintFontsDšœE˜EL–ï –Postfix XCPrintFonts šœ ˜ L–ï(–Postfix XCPrintFonts(šœ)˜)L–ï–Postfix XCPrintFontsšœ˜L–ï/–Postfix XCPrintFonts/šœ0˜0L–ï–Postfix XCPrintFontsšœ ˜ L–ïD–Postfix XCPrintFontsDšœE˜EL–ï–Postfix XCPrintFontsšœ˜L–ï–Postfix XCPrintFontsšœ˜L–ï–Postfix XCPrintFontsšœ˜L–ï$–Postfix XCPrintFonts$šœ¡œ˜%L–ï–Postfix XCPrintFontsšœ˜L–ï+–Postfix XCPrintFonts+šœ,˜,L–ï –Postfix XCPrintFonts šœ ˜ L–ï –Postfix XCPrintFonts šœ˜L–ï?–Postfix XCPrintFonts?šœ@˜@L–ï–Postfix XCPrintFontsšœ˜——…—GÚj¯