IDSARS
Ideas for Document Data Base Committee
Aug 4, 1987

Interim Document Storage and Retrieval System
Robert Hagmann
PARC/CSL

Introduction
Project is to build a "document" storage and retrieval system:
• multi-media (text, graphics, images, ...)
• hypertext (hypermedia)
• naming
• versions and alternatives
• attributes and keywords
• alerters
Interim Document Storage and Retrieval System (IDSARS) is an interim name.

"Customers"
A "Document Server" could be used by:
• FolioPub project at WRC
• ERSG proposal at WRC
• VLSI DA (CAD) group in CSL
• Voice project in CSL
• Software storage for the Cedar Programming Environment in CSL
• Storage of scanned images
• Distributed Notecards in ISL
• Large-capacity, high-performance file server with archiving capability
• Mail storage
• TIC support, both at WRC and PARC

More Introduction
• Build a core system that could have front ends for OSI/ANSI Storage and Retrieval, XNS File Server, and Native Protocol.
• Do just the storage and retrieval part, and do it well; rely on other groups to supply data, indices, recognition, structure, semantics of execution, user interfaces, ...
• Data storage, not an active object system.
• Technology has evolved; maybe a window is open.

Implementation Features
• objects of varying sizes
• large total size
• compression
• archival storage
• storage hierarchy
• transactions
• object-level locking
• page-level access
• efficient hints
• robust
• available
• replication
• multi-server and foreign server support
• access control

Object Model (basic)
Primitive elements include plain text, structured text, scanned images, mail messages, digitized voice, digitized video, dates, floats, integers, and compressed versions of the above. The set of primitive elements is extensible.
An object has three parts:
• contents - a primitive element
• children - an optional (ordered?) collection of objects, its children. This concept is called containment.
• attributes - properties of the object
Some attributes are interpreted and some are uninterpreted.
Objects may be interconnected with links; links have names (types).
Links are objects.

Containment
• Objects belong to other objects
• An object can belong to many other objects, or to none
• Recursive containment
• Matches the way indices are built (specifies context)

Naming
• "Hierarchical" directories
• Children of a directory have names
• Directories are objects
• Navigate the name lattice via partial names
• Not all objects have names

Versions and alternatives
• Objects are composed of versions and alternatives
• The viewed version/alternative is selected via attributes and time
• Directories and named objects can have numbered versions and named alternatives; there is a partial mapping to the name space for named objects
• A partial name looks like objectName%alternative!version
• Historic and version links: a historic link links to an object; a version link links to a particular version/alternative of the object

Attributes
• May be associated with an object
• Named by string
• Value is a list of tuples. Each tuple is a triple . The value may be a tuple.
• Interpreted attributes may be used in queries (e.g., keywords)
• Keywords (?? as distinct from attributes)
• Attributes may be indexed

Triggers (well, really Alerters)
• Simple triggers on insert, delete, read, or update
• A trigger queues a message for some (probably) external server:
  - screen/cache maintenance on a workstation
  - a server for some activity (e.g., automatic keyword generation)

Queries
• A query results in a set of objects or an enumerator of the set
• Sets are piped through converters:
  - filter: removes some objects from the result set (string pattern matching, arithmetic operations, dates, etc.)
  - de-referencing: the result set is computed via links from elements of the source set
  - iteration: loop through a converter n or * times
• Indices are built as a performance accelerator

Issues
• Hardware/software/protocol base
• "Client" activity on the server
• Limited (extensible?) data types understood by the server
• Extensibility
• Legal requirements for documents
• Security and standards
• Long-term locking
• Administration, backup, debugging, historical logging, and monitoring
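The partial-name syntax given under Versions and alternatives above, objectName%alternative!version, can be parsed in a few lines. What follows is a minimal Python sketch, not part of the IDSARS design. It assumes that the alternative and the version are both optional, that a version is a decimal number (the outline says versions are numbered), and that object names contain neither '%' nor '!'. The names PartialName, parse_partial_name, and format_partial_name are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed grammar (hypothetical): name, then optional %alternative, then optional !version.
# Names and alternatives may not contain '%' or '!'; versions are decimal numbers.
_PARTIAL_NAME = re.compile(r"(?P<name>[^%!]+)(?:%(?P<alt>[^%!]+))?(?:!(?P<ver>\d+))?")


@dataclass(frozen=True)
class PartialName:
    name: str
    alternative: Optional[str] = None  # None: leave the choice to attributes and time
    version: Optional[int] = None      # None: leave the choice to attributes and time


def parse_partial_name(text: str) -> PartialName:
    """Split 'objectName%alternative!version' into its three parts."""
    m = _PARTIAL_NAME.fullmatch(text)
    if m is None:
        raise ValueError(f"not a partial name: {text!r}")
    ver = m.group("ver")
    return PartialName(m.group("name"), m.group("alt"), int(ver) if ver is not None else None)


def format_partial_name(p: PartialName) -> str:
    """Inverse of parse_partial_name: omitted parts are left out of the string."""
    out = p.name
    if p.alternative is not None:
        out += "%" + p.alternative
    if p.version is not None:
        out += "!" + str(p.version)
    return out
```

For example, "report%draft!3" parses to name "report", alternative "draft", version 3, while "report!2" leaves the alternative unspecified. Representing a missing part as None reflects the outline's point that the viewed version/alternative is otherwise selected via attributes and time; how that selection works is not specified here.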
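The query model under Queries above, where a result set is piped through filter, de-referencing, and iteration converters, can be sketched as follows. This is a minimal in-memory illustration under several assumptions, none of which come from the outline: objects live in a Python dict keyed by string ids, links are stored per object as a map from link name (type) to target ids, and "*" iteration is modeled as `times=None`, meaning "repeat until no new objects appear". The iteration result is assumed to be the union of the source set and every intermediate result, which keeps "*" finite even when links form cycles. The names Obj, filter_by, dereference, iterate, and query are hypothetical, and no indices are used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional

ObjId = str


@dataclass
class Obj:
    attributes: Dict[str, object] = field(default_factory=dict)
    links: Dict[str, List[ObjId]] = field(default_factory=dict)  # link name (type) -> targets


Store = Dict[ObjId, Obj]
ResultSet = FrozenSet[ObjId]
Converter = Callable[[Store, ResultSet], ResultSet]


def filter_by(pred: Callable[[Obj], bool]) -> Converter:
    """filter: keep only the objects for which pred is true."""
    return lambda store, s: frozenset(o for o in s if pred(store[o]))


def dereference(link_name: str) -> Converter:
    """de-referencing: follow links named link_name from each element of the source set."""
    return lambda store, s: frozenset(
        t for o in s for t in store[o].links.get(link_name, []))


def iterate(conv: Converter, times: Optional[int]) -> Converter:
    """iteration: apply conv n times, or until nothing new appears when times is None ('*').
    Assumed semantics: return the source set plus every intermediate result."""
    def run(store: Store, s: ResultSet) -> ResultSet:
        acc, frontier, i = set(s), s, 0
        while frontier and (times is None or i < times):
            step = conv(store, frontier)
            frontier = frozenset(step - acc)  # only newly reached objects go round again
            acc |= step
            i += 1
        return frozenset(acc)
    return run


def query(store: Store, source: ResultSet, *pipeline: Converter) -> ResultSet:
    """Pipe a result set through the converters, left to right."""
    result = source
    for conv in pipeline:
        result = conv(store, result)
    return result


# Hypothetical example data: three documents whose "cites" links form a cycle.
example_store: Store = {
    "memo": Obj({"type": "mail", "year": 1987}, {"cites": ["spec"]}),
    "spec": Obj({"type": "text", "year": 1986}, {"cites": ["notes"]}),
    "notes": Obj({"type": "text", "year": 1985}, {"cites": ["memo"]}),
    "photo": Obj({"type": "image", "year": 1987}),
}
```

With this data, following "cites" once from {"memo"} yields {"spec"}, while `iterate(dereference("cites"), None)` reaches {"memo", "spec", "notes"} and stops despite the cycle. Appending `filter_by(lambda o: o.attributes.get("type") == "text")` to that pipeline narrows the result to {"spec", "notes"}. Because every converter has the same set-in, set-out shape, converters compose freely, which is the point of the piping model.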