IDMS Ideas
June 1, 1987
Interim Document Management System "Project"
Robert Hagmann
PARC/CSL
Disclaimers
Ideas are in the formative stage
Informal presentation
Hardware/software/protocol base is open
No implementation has been done
Just Jack Kent and I right now
my slides
No resources have been committed
We might not do it
Introduction
Project is to build a "document" storage and retrieval system
multi-media (text, graphics, images, ...)
hypertext (hypermedia)
naming, with versions and alternatives
attributes and keywords
alerters
Interim Document Management System is an interim name
"Customers"
A "Document Server" could be used by
• FolioPub project at WRC
• ERSG proposal at WRC
• VLSI DA (CAD) group in CSL
• Voice project in CSL
• Software storage for the Cedar Programming Environment in CSL
• Storage of scanned images
• Distributed Notecards in ISL
• Large capacity and high performance file server with archiving capability
• Mail storage
• TIC support, both at WRC and PARC
More Introduction
Build a core system that could have front ends for OSI/ANSI Storage and Retrieval, XNS File Server, and Native Protocol.
Just does the storage and retrieval part well; needs other groups to supply data, indices, recognition, structure, semantics of execution, user interfaces, ...
Xerox's current focus is "document processing"
No product seems to match requirements, both functional and performance
Technology has evolved; maybe a window is open
Three systems that more or less permanently store information
databases
file systems
file servers
Document Model (basic)
Primitive elements include plain text, structured text, scanned image, mail message, digitized voice, digitized video, and compressed versions of the above.
Extensible
A document is a primitive element with an optional (ordered?) collection of documents
Documents have attributes; some attributes are interpreted and some are uninterpreted
Documents may be interconnected with links; links have names
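One way to picture the basic model is the rough Python sketch below. It is only an illustration, not a design: the class names (Primitive, Document), the field names, and the choice to hold attribute values as primitive elements are all assumptions, and the children are kept ordered even though the slide leaves that open.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Primitive:
    """A primitive element: a media kind plus its raw content (hypothetical)."""
    kind: str        # e.g. "plain text", "scanned image", "digitized voice"
    content: bytes


@dataclass
class Document:
    """A primitive element with an optional collection of documents."""
    element: Primitive
    # Assumption: the collection is ordered; the slide marks this as open.
    children: List["Document"] = field(default_factory=list)
    # Attributes are named by string; values are primitive elements.
    attributes: Dict[str, Primitive] = field(default_factory=dict)
    # Links interconnect documents, and each link has a name.
    links: Dict[str, "Document"] = field(default_factory=dict)


memo = Document(Primitive("plain text", b"IDMS ideas"))
scan = Document(Primitive("scanned image", b"\x00\x01"))
memo.children.append(scan)                  # scan is part of memo
memo.links["figure-1"] = scan               # and is also reachable by a named link
memo.attributes["author"] = Primitive("plain text", b"Hagmann")
```

Here the same scanned image is both a child of the memo and the target of a named link, which is the kind of sharing the link mechanism is meant to allow.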
Naming
• Hierarchical directories
• Ordered children links
• Named links (unordered)
• Not all objects have names
Versions and alternatives
• Directories and objects can have numbered versions and named alternatives
• A partial name looks like objectName%alternative!version
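To make the objectName%alternative!version form concrete, here is a small Python sketch of a parser. Nothing has been implemented, so the details are assumptions: that both the alternative and the version may be omitted, that versions are decimal integers, and that "%" and "!" cannot appear inside a name. The names PartialName and parse_partial_name are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class PartialName(NamedTuple):
    object_name: str
    alternative: Optional[str]   # named alternative, if given
    version: Optional[int]       # numbered version, if given


# Assumed grammar: objectName [ "%" alternative ] [ "!" version ]
_NAME = re.compile(r"^(?P<obj>[^%!]+)(?:%(?P<alt>[^%!]+))?(?:!(?P<ver>\d+))?$")


def parse_partial_name(text: str) -> PartialName:
    """Split a partial name into object name, alternative, and version."""
    m = _NAME.match(text)
    if m is None:
        raise ValueError(f"not a partial name: {text!r}")
    ver = m.group("ver")
    return PartialName(m.group("obj"), m.group("alt"),
                       int(ver) if ver is not None else None)


full = parse_partial_name("report%draft!3")   # ("report", "draft", 3)
bare = parse_partial_name("report")           # ("report", None, None)
```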
Attributes
• May be associated with a document
• Named by a string; the value is a primitive document element
• Interpreted attributes may be used in queries (e.g., keywords)
• Keywords (?? as distinct from attributes)
  Keywords have a string name and a list of positions
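A keyword with "a string name and a list of positions" might look like the Python sketch below. What a position is has not been decided; the sketch assumes word offsets within plain text, and the function name keyword_positions is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def keyword_positions(text: str, keywords: List[str]) -> Dict[str, List[int]]:
    """Map each keyword to the word offsets (assumed unit) where it occurs."""
    wanted = {k.lower() for k in keywords}
    positions: Dict[str, List[int]] = defaultdict(list)
    for offset, word in enumerate(text.lower().split()):
        if word in wanted:
            positions[word].append(offset)
    return dict(positions)


index = keyword_positions(
    "store the document then retrieve the document",
    ["document", "retrieve"],
)
# index == {"document": [2, 6], "retrieve": [4]}
```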
Triggers (well, really Alerters)
• Simple triggers on insert, delete, read, or update
• Trigger queues a message for some (probably) external server
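The alerter idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not the planned mechanism: the store is an in-memory dictionary, the queue is a local deque standing in for whatever would carry messages to an external server, and the names AlertingStore, add_alerter, and outbox are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

EVENTS = ("insert", "delete", "read", "update")


class AlertingStore:
    """Toy document store that queues a message when a watched event occurs."""

    def __init__(self) -> None:
        self._docs: Dict[str, bytes] = {}
        self._watchers: Dict[str, List[str]] = {e: [] for e in EVENTS}
        # Queued messages: (server, event, document name).
        self.outbox: Deque[Tuple[str, str, str]] = deque()

    def add_alerter(self, event: str, server: str) -> None:
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event!r}")
        self._watchers[event].append(server)

    def _fire(self, event: str, name: str) -> None:
        # The store only queues; delivery to the server happens elsewhere.
        for server in self._watchers[event]:
            self.outbox.append((server, event, name))

    def write(self, name: str, data: bytes) -> None:
        event = "update" if name in self._docs else "insert"
        self._docs[name] = data
        self._fire(event, name)

    def read(self, name: str) -> bytes:
        data = self._docs[name]
        self._fire("read", name)
        return data

    def delete(self, name: str) -> None:
        del self._docs[name]
        self._fire("delete", name)


store = AlertingStore()
store.add_alerter("insert", "indexer")
store.add_alerter("update", "indexer")
store.write("memo", b"v1")   # queues an insert message
store.write("memo", b"v2")   # queues an update message
store.read("memo")           # no alerter on read, nothing queued
```

Queuing rather than calling out keeps the server from waiting on the external party, which is the reason to call these alerters rather than triggers.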
Queries
• A query results in a set or an enumerator of the matching documents
• A subquery is specified by document-selection and a match-condition
• A primitive subquery is specified by name-pattern and a match-condition
• Primitive names may be composed via set operations or link following operations into a name-pattern
• Match condition
  uses interpreted attributes and their operations, or equality in uninterpreted attributes
  may specify boolean operations on keywords and metrics (??)
  partial match
    exactly n of m
    at most n of m
    at least n of m
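The partial-match conditions can be sketched in Python as predicates over a document's keywords, with the query returning an enumerator. Everything here is an assumption made for illustration: documents are reduced to keyword sets, name-patterns are left out, and the function names (exactly, at_most, at_least, query) are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Set

Condition = Callable[[Set[str]], bool]


def _hits(doc_keywords: Set[str], terms: List[str]) -> int:
    """Count how many of the m terms appear among the document's keywords."""
    return sum(1 for t in terms if t in doc_keywords)


def exactly(n: int, terms: List[str]) -> Condition:
    return lambda kws: _hits(kws, terms) == n


def at_most(n: int, terms: List[str]) -> Condition:
    return lambda kws: _hits(kws, terms) <= n


def at_least(n: int, terms: List[str]) -> Condition:
    return lambda kws: _hits(kws, terms) >= n


def query(docs: Dict[str, Set[str]], cond: Condition) -> Iterator[str]:
    """Enumerate the names of documents whose keywords satisfy cond."""
    for name, kws in docs.items():
        if cond(kws):
            yield name


docs = {
    "a": {"voice", "mail"},
    "b": {"mail"},
    "c": {"cad", "voice", "mail"},
}
terms = ["voice", "mail", "cad"]                    # m = 3
two_or_more = list(query(docs, at_least(2, terms)))  # ["a", "c"]
```

Because each condition is just a predicate, boolean combinations of conditions fall out as ordinary function composition.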
Issues
Hardware/software/protocol base
"Client" activity on the server
Limited (extensible?) data types understood by server
Extensibility
Is this good enough to store electronic mail?
Legal requirements for documents
Security and standards
Long term locking
Administration, backup, debugging, historical logging, and monitoring
Transactions and page level access
Replication, robustness, availability, and redundancy
Performance
Garbage collection