IDSARS Ideas for Document Data Base Committee
Aug 4, 1987
Interim Document Storage and Retrieval System
Robert Hagmann
PARC/CSL
Introduction
Project is to build a "document" storage and retrieval system
multi-media (text, graphics, images, ...)
hypertext (hypermedia)
naming
versions and alternatives
attributes and keywords
alerters
Interim Document Storage and Retrieval System (IDSARS) is an interim name
"Customers"
A "Document Server" could be used by
FolioPub project at WRC
ERSG proposal at WRC
VLSI DA (CAD) group in CSL
Voice project in CSL
Software storage for the Cedar Programming Environment in CSL
Storage of scanned images
Distributed Notecards in ISL
Large capacity and high performance file server with archiving capability
Mail storage
TIC support, both at WRC and PARC
More Introduction
Build a core system that could have front ends for OSI/ANSI Storage and Retrieval, XNS File Server, and Native Protocol.
Just does the storage and retrieval part well; needs other groups to supply data, indices, recognition, structure, semantics of execution, user interfaces, ...
Data storage, not an active object system
Technology has evolved; maybe a window is open
Implementation Features
objects of varying sizes
large total size
compression
archival storage
storage hierarchy
transactions
object level locking
page level access
efficient
hints
robust
available
replication
multi-server and foreign server support
access control
Object Model (basic)
Primitive elements include plain text, structured text, scanned image, mail message, digitized voice, digitized video, dates, floats, integers, and compressed versions of the above.
Extensible
An object has three parts
contents - a primitive element
children - an optional (ordered?) collection of objects. This concept is called containment.
attributes - properties of the object
Some attributes are interpreted and some are uninterpreted
Objects may be interconnected with links; links have names (types)
Links are objects
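The three-part object and the "links are objects" idea can be sketched as follows. This is a minimal illustration, not the IDSARS design: the class names `Obj` and `Link`, the field names, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Obj:
    """Hypothetical object with the three parts named above."""
    contents: Any = None                                   # a primitive element
    children: List["Obj"] = field(default_factory=list)    # containment
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Link(Obj):
    """A link is itself an object; it adds a name (type) and two endpoints."""
    name: str = ""
    source: Optional[Obj] = None
    target: Optional[Obj] = None


# Illustrative data only.
report = Obj(contents="plain text of a report", attributes={"kind": "plain text"})
figure = Obj(contents=b"\x00\x01", attributes={"kind": "scanned image"})
report.children.append(figure)

# Because a link is an object, it can carry attributes of its own.
cites = Link(name="cites", source=report, target=figure,
             attributes={"created": "1987-08-04"})
```

Making `Link` a subclass of `Obj` is one way to let links have attributes and be stored like any other object; a real system might model this differently.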
Containment
Objects belong to other objects
Objects can belong to many or no other objects
Recursive containment
Matches the way indices are built (specifies context)
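Since an object can belong to many parents or to none, containment forms a graph rather than a tree. The sketch below walks it under that assumption; the `Node` class and the `descendants` helper are hypothetical names, and the guard against revisiting shared children is a design choice of the example.

```python
class Node:
    """Hypothetical object reduced to a label and its contained children."""
    def __init__(self, label):
        self.label = label
        self.children = []


def descendants(root):
    """Return labels of everything contained in root, depth first.

    A shared child is reported once, at its first occurrence.
    """
    seen = set()
    order = []
    stack = list(reversed(root.children))
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue          # already visited through another parent
        seen.add(id(node))
        order.append(node.label)
        stack.extend(reversed(node.children))
    return order


shared = Node("figure")
ch1 = Node("chapter 1")
ch1.children.append(shared)
ch2 = Node("chapter 2")
ch2.children.append(shared)       # same object, second parent
book = Node("book")
book.children += [ch1, ch2]
orphan = Node("scratch note")     # belongs to no other object

print(descendants(book))          # ['chapter 1', 'figure', 'chapter 2']
```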
Naming
"Hierarchical" directories
Children of a directory have names
Directories are objects
Navigate the name lattice via partial names
Not all objects have names
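A sketch of navigating the name lattice, assuming a directory is an object that maps names to children. `Obj`, `Directory`, `entries`, and `resolve` are hypothetical names for this example; the outline does not specify an interface.

```python
class Obj:
    """Hypothetical stored object."""
    def __init__(self, contents=None):
        self.contents = contents


class Directory(Obj):
    """A directory is an object whose children have names."""
    def __init__(self):
        super().__init__()
        self.entries = {}     # name -> Obj


def resolve(start, partial_names):
    """Follow a sequence of partial names from start; None if a step fails."""
    node = start
    for name in partial_names:
        if not isinstance(node, Directory) or name not in node.entries:
            return None
        node = node.entries[name]
    return node


root, projects, mail = Directory(), Directory(), Directory()
memo = Obj("memo text")
unnamed = Obj("an object reachable only through links or containment")

root.entries["projects"] = projects
root.entries["mail"] = mail
projects.entries["memo"] = memo
mail.entries["from-bob"] = memo   # same object under a second name: a lattice, not a tree
```

Here `resolve(root, ["projects", "memo"])` and `resolve(root, ["mail", "from-bob"])` return the same object, while `unnamed` has no name at all.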
Versions and alternatives
Objects are composed of versions and alternatives
Viewed version/alternative is selected via attributes and time
Directories and named objects can have numbered versions and named alternatives
  there is a partial mapping to the name space for named objects
A partial name looks like objectName%alternative!version
Historic and version links
  link to an object (historic) or link to a particular version/alternative of the object (version)
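The partial-name form objectName%alternative!version can be parsed along these lines. The sketch assumes that both the alternative and the version are optional, that versions are decimal integers, and that `%` and `!` never appear inside a name; none of this is stated in the outline. `PartialName` and `parse_partial_name` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class PartialName(NamedTuple):
    name: str
    alternative: Optional[str]   # named alternative, if given
    version: Optional[int]       # numbered version, if given


# name, then optional %alternative, then optional !version
_PATTERN = re.compile(
    r"^(?P<name>[^%!]+)(?:%(?P<alt>[^%!]+))?(?:!(?P<ver>\d+))?$"
)


def parse_partial_name(text):
    """Split 'objectName%alternative!version' into its parts."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError("not a partial name: %r" % text)
    ver = match.group("ver")
    return PartialName(
        match.group("name"),
        match.group("alt"),
        int(ver) if ver is not None else None,
    )


print(parse_partial_name("design%draft!3"))
# PartialName(name='design', alternative='draft', version=3)
```

A missing alternative or version comes back as `None`, which a caller could treat as "select via attributes and time".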
Attributes
May be associated with an object
Named by string
Value is a list of tuples.
  Each tuple is a triple <attribute-name, value-type, value>.
  The value may be a tuple.
Interpreted attributes may be used in queries (e.g., keywords)
Keywords (?? as distinct from attributes)
Attributes may be indexed
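The triple form and a simple attribute index might look like this. It is a sketch under assumptions: the `Attr` type, the value-type strings, the object identifiers, and the `build_index` helper are invented for illustration, and the index is a plain in-memory inverted map.

```python
from collections import defaultdict
from typing import Any, NamedTuple


class Attr(NamedTuple):
    """Hypothetical triple <attribute-name, value-type, value>."""
    name: str
    value_type: str
    value: Any        # may itself be a tuple


# Illustrative data: object id -> list of attribute triples.
attributes_by_object = {
    "obj-1": [Attr("keyword", "string", "hypertext"),
              Attr("keyword", "string", "storage")],
    "obj-2": [Attr("keyword", "string", "storage"),
              Attr("created", "tuple", Attr("date", "date", "1987-08-04"))],
}


def build_index(attrs_by_object, attr_name):
    """Map each value of attr_name to the set of objects carrying it."""
    index = defaultdict(set)
    for obj_id, attrs in attrs_by_object.items():
        for attr in attrs:
            if attr.name == attr_name:
                index[attr.value].add(obj_id)
    return index


keyword_index = build_index(attributes_by_object, "keyword")
print(sorted(keyword_index["storage"]))   # ['obj-1', 'obj-2']
```

Values must be hashable for this kind of index, which nested tuples are as long as their own contents are.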
Triggers (well, really Alerters)
Simple triggers on insert, delete, read, or update
Trigger queues a message for some (probably) external server
  screen/cache maintenance on workstation
  server for some activity (e.g., automatic keyword generation)
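An alerter that only queues a message, leaving the work to some other server, could be sketched as below. The `Store` class, its method names, and the use of an in-process `deque` as the message queue are assumptions for the example; a real system would presumably deliver messages over a network.

```python
from collections import deque

EVENTS = ("insert", "delete", "read", "update")


class Store:
    """Hypothetical object store with simple alerters."""

    def __init__(self):
        self.objects = {}
        self.alerters = {event: [] for event in EVENTS}

    def register(self, event, queue):
        """Ask for a message on `queue` whenever `event` happens."""
        self.alerters[event].append(queue)

    def _alert(self, event, key):
        # The store does no work itself; it only queues a message.
        for queue in self.alerters[event]:
            queue.append((event, key))

    def insert(self, key, value):
        self.objects[key] = value
        self._alert("insert", key)

    def read(self, key):
        value = self.objects[key]
        self._alert("read", key)
        return value

    def update(self, key, value):
        self.objects[key] = value
        self._alert("update", key)

    def delete(self, key):
        del self.objects[key]
        self._alert("delete", key)


store = Store()
keyword_server_queue = deque()    # stands in for an external keyword generator
cache_queue = deque()             # stands in for a workstation cache

store.register("insert", keyword_server_queue)
store.register("update", keyword_server_queue)
store.register("update", cache_queue)
store.register("delete", cache_queue)

store.insert("memo", "first draft")
store.update("memo", "second draft")
store.delete("memo")
```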
Queries
A query results in a set of objects or an enumerator of the set
Sets are piped through converters.
  filter: removes some objects from the result set
    string pattern matching, arithmetic operations, dates, etc.
  de-referencing: result set computed via links from elements of the source set
  iteration: loop through a converter n or * times
Indices built as a performance accelerator
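The three converter kinds can be sketched as functions from a set to a set. All names here (`filter_by`, `dereference`, `iterate`, `pipe`) are hypothetical, links are reduced to (source, name, target) triples, and the reading of "*" as "repeat until nothing new appears, keeping the starting set" is an assumption of this example rather than something the outline defines.

```python
def filter_by(predicate):
    """filter: remove objects that fail the predicate."""
    return lambda objs: {o for o in objs if predicate(o)}


def dereference(links, link_name):
    """de-referencing: follow links of one name from the source set."""
    return lambda objs: {t for (s, n, t) in links if n == link_name and s in objs}


def iterate(converter, times=None):
    """iteration: apply a converter n times, or '*' times when times is None."""
    def run(objs):
        current = set(objs)
        if times is not None:
            for _ in range(times):
                current = converter(current)
            return current
        # '*' (assumed meaning): repeat until no new objects are found.
        result = set(current)
        frontier = current
        while frontier:
            frontier = converter(frontier) - result
            result |= frontier
        return result
    return run


def pipe(objs, *converters):
    """Pipe a set through converters, left to right."""
    for converter in converters:
        objs = converter(objs)
    return objs


# Illustrative link data only.
links = {("a", "cites", "b"), ("b", "cites", "c"), ("c", "cites", "a")}
cites = dereference(links, "cites")

print(pipe({"a"}, iterate(cites, 2)))            # {'c'}
print(sorted(pipe({"a"}, iterate(cites))))       # ['a', 'b', 'c']
```

The fixed-point loop also terminates on cyclic link structures, since it only follows objects it has not already collected.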
Issues
Hardware/software/protocol base
"Client" activity on the server
Limited (extensible?) data types understood by server
Extensibility
Legal requirements for documents
Security and standards
Long term locking
Administration, backup, debugging, historical logging, and monitoring