IDMS Ideas
June 1, 1987

Interim Document Management System "Project"
Robert Hagmann
PARC/CSL

Disclaimers
  - Ideas are in the formative stage
  - Informal presentation
  - Hardware/software/protocol base is open
  - No implementation has been done
  - Just Jack Kent and I right now
      my slides
  - No resources have been committed
  - We might not do it

Introduction
  Project is to build a "document" storage and retrieval system
    - multi-media (text, graphics, images, ...)
    - hypertext (hypermedia)
    - naming, with versions and alternatives
    - attributes and keywords
    - alerters
  Interim Document Management System is an Interim name

"Customers"
  A "Document Server" could be used by
    - FolioPub project at WRC
    - ERSG proposal at WRC
    - VLSI DA (CAD) group in CSL
    - Voice project in CSL
    - Software storage for the Cedar Programming Environment in CSL
    - Storage of scanned images
    - Distributed Notecards in ISL
    - Large capacity and high performance file server with archiving capability
    - Mail storage
    - TIC support, both at WRC and PARC

More Introduction
  - Build a core system that could have front ends for OSI/ANSI Storage and Retrieval, XNS File Server, and Native Protocol.
  - Just does the storage and retrieval part well; needs other groups to supply data, indices, recognition, structure, semantics of execution, user interfaces, ...
  - Xerox has the current focus of "document processing"
  - No product seems to match requirements, both functional and performance
  - Technology has evolved; maybe a window is open
  - Three systems that more or less permanently store information
      databases
      file systems
      file servers

Document Model (basic)
  - Primitive elements include plain text, structured text, scanned image, mail message, digitized voice, digitized video, and compressed versions of the above.
  - Extensible
  - A document is a primitive element with an optional (ordered?) collection of documents
  - Documents have attributes; some attributes are interpreted and some are uninterpreted
  - Documents may be interconnected with links; links have names

Naming
  - Hierarchical directories
  - Ordered children links
  - Named links (unordered)
  - Not all objects have names

Versions and alternatives
  - Directories and objects can have numbered versions and named alternatives
  - A partial name looks like objectName%alternative!version

Attributes
  - May be associated with a document
  - Named by string; value of primitive document element
  - Interpreted attributes may be used in queries (e.g., keywords)
  - Keywords (?? as distinct from attributes)
      Keywords have a string name, and a list of positions

Triggers (well, really Alerters)
  - Simple triggers on insert, delete, read, or update
  - Trigger queues a message for some (probably) external server

Queries
  - A query results in a set or an enumerator of the matching documents
  - A subquery is specified by document-selection and a match-condition
  - A primitive subquery is specified by name-pattern and a match-condition.
  - Primitive names may be composed via set operations or link following operations into a name-pattern
  - Match condition uses interpreted attributes and their operations, or equality in uninterpreted attributes
      may specify boolean operations on keywords and metrics (??)
      partial match
        exactly n of m
        at most n of m
        at least n of m

Issues
  - Hardware/software/protocol base
  - "Client" activity on the server
  - Limited (extensible?) data types understood by server
  - Extensibility
  - Is this good enough to store electronic mail?
  - Legal requirements for documents
  - Security and standards
  - Long term locking
  - Administration, backup, debugging, historical logging, and monitoring
  - Transactions and page level access
  - Replication, robustness, availability, and redundancy
  - Performance
  - Garbage collection
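
Illustrative sketch (not part of the original slides)

The partial-name form objectName%alternative!version and the "n of m" partial-match conditions on keywords are compact enough to try out in a few lines. The Python below is only a sketch under stated assumptions: the names PartialName, parse_partial_name, and keyword_match are hypothetical, both the %alternative and !version parts are assumed to be optional, and "m" is taken to be the number of distinct keywords listed in the query. None of this is specified by the slides, and no implementation exists.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class PartialName:
    """Hypothetical parsed form of objectName%alternative!version."""
    object_name: str
    alternative: Optional[str] = None  # named alternative, after '%'
    version: Optional[int] = None      # numbered version, after '!'


def parse_partial_name(text: str) -> PartialName:
    """Split a partial name into its parts.

    Assumption: both suffixes are optional, and '!version' (if present)
    is the last component of the name.
    """
    rest, version = text, None
    if "!" in rest:
        rest, version_text = rest.rsplit("!", 1)
        version = int(version_text)  # raises ValueError if not a number
    name, sep, alternative = rest.partition("%")
    if not name:
        raise ValueError("partial name has no object name: %r" % text)
    return PartialName(name, alternative if sep else None, version)


def keyword_match(doc_keywords: Set[str], wanted: Iterable[str],
                  mode: str, n: int) -> bool:
    """Partial match: do exactly / at most / at least n of the m
    wanted keywords appear on the document?

    Assumption: m is the number of distinct keywords in `wanted`.
    """
    hits = sum(1 for keyword in set(wanted) if keyword in doc_keywords)
    if mode == "exactly":
        return hits == n
    if mode == "at_most":
        return hits <= n
    if mode == "at_least":
        return hits >= n
    raise ValueError("unknown match mode: %r" % mode)
```

For example, parse_partial_name("report%draft!3") yields the object name "report", the alternative "draft", and version 3, while keyword_match({"cedar", "mail"}, ["cedar", "mail", "voice"], "at_least", 2) is true because two of the three wanted keywords are present.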