[Indigo]<Yggdrasil>Documentation>IDSARSforDocBaseCom.slides!1

IDSARS Ideas for Document Data Base Committee

Aug 4, 1987

Interim Document Storage and Retrieval System

Robert Hagmann

PARC/CSL

Introduction

Project is to build a "document" storage and retrieval system

multi-media (text, graphics, images, ...)

hypertext (hypermedia)

naming

versions and alternatives

attributes and keywords

alerters

Interim Document Storage and Retrieval System (IDSARS) is an interim name

"Customers"

A "Document Server" could be used by

• FolioPub project at WRC

• ERSG proposal at WRC

• VLSI DA (CAD) group in CSL

• Voice project in CSL

• Software storage for the Cedar Programming Environment in CSL

• Storage of scanned images

• Distributed Notecards in ISL

• Large capacity and high performance file server with archiving capability

• Mail storage

• TIC support, both at WRC and PARC

More Introduction

Build a core system that could have front ends for OSI/ANSI Storage and Retrieval, XNS File Server, and Native Protocol.

Just does the storage and retrieval part well; needs other groups to supply data, indices, recognition, structure, semantics of execution, user interfaces, ...

Data storage, not an active object system

Technology has evolved; maybe a window is open

Implementation Features

objects of varying sizes

large total size

compression

archival storage

storage hierarchy

transactions

object level locking

page level access

efficient

hints

robust

available

replication

multi-server and foreign server support

access control

Object Model (basic)

Primitive elements include plain text, structured text, scanned image, mail message, digitized voice, digitized video, dates, floats, integers, and compressed versions of the above.

Extensible

An object has three parts

contents - a primitive element

children - an optional (ordered?) collection of objects. This concept is called containment.

attributes - properties of the object

Some attributes are interpreted and some are uninterpreted

Objects may be interconnected with links; links have names (types)

Links are objects
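
The three-part object and the "links are objects" rule can be sketched as below. This is only an illustration, not the IDSARS design: the class names DocObject and Link, the field names, and the use of a Python list for children (which answers the open "ordered?" question one way) are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(eq=False)  # compare by identity: two objects with equal contents stay distinct
class DocObject:
    contents: Any = None                                        # a primitive element
    children: list[DocObject] = field(default_factory=list)     # containment
    attributes: dict[str, Any] = field(default_factory=dict)    # properties


@dataclass(eq=False)
class Link(DocObject):
    """A link is itself an object, so it can carry attributes too."""
    name: str = ""                        # the link's name (type)
    source: Optional[DocObject] = None
    target: Optional[DocObject] = None


# Hypothetical usage: one chapter contained in two different parents.
chapter = DocObject(contents="Chapter 1 text")
figure = DocObject(contents=b"\x00\x01")  # stand-in for a scanned image
report = DocObject(children=[chapter, figure], attributes={"title": "Report"})
anthology = DocObject(children=[chapter])
cites = Link(name="cites", source=chapter, target=figure)
```

Because children are held by reference, the same object can appear under many parents or under none.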

Containment

• Objects belong to other objects

• Objects can belong to many or no other objects

• Recursive containment

• Matches the way indices are built (specifies context)

Naming

• "Hierarchical" directories

• Children of a directory have names

• Directories are objects

• Navigate the name lattice via partial names

• Not all objects have names
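
A minimal sketch of name resolution over such a lattice follows. The Directory class, the resolve function, and the "/" separator are assumptions made for illustration; the slides do not specify a name syntax for directories.

```python
class Directory:
    """A directory is an object whose children are reachable by name."""

    def __init__(self):
        self.entries = {}  # name -> child object (may be another Directory)


def resolve(start, partial_name, sep="/"):
    """Walk from `start` along the components of `partial_name`.

    Returns the object reached, or None if any component is missing.
    """
    node = start
    for part in partial_name.split(sep):
        if not isinstance(node, Directory) or part not in node.entries:
            return None
        node = node.entries[part]
    return node


# Hypothetical usage: the same memo is named in two directories,
# so the names form a lattice rather than a strict tree.
root, projects, archive = Directory(), Directory(), Directory()
memo = object()
root.entries["projects"] = projects
root.entries["archive"] = archive
projects.entries["memo"] = memo
archive.entries["memo-1987"] = memo
```

Resolution can start at any directory, which is one reading of "partial names": resolve(projects, "memo") and resolve(root, "projects/memo") reach the same object.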

Versions and alternatives

• Objects are composed of versions and alternatives

• Viewed version/alternative is selected via attributes and time

• Directories and named objects can have numbered versions and named alternatives

there is a partial mapping to the name space for named objects

• A partial name looks like objectName%alternative!version

• Historic and version links

link to an object (historic) or link to a particular version/alternative of the object (version)
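
The objectName%alternative!version form can be split mechanically. The sketch below assumes that both the alternative and the version are optional and that versions are decimal integers; the function name parse_partial_name is hypothetical.

```python
def parse_partial_name(name):
    """Split 'objectName%alternative!version' into its three parts.

    Missing parts come back as None. Assumes versions are decimal
    integers; int() raises ValueError otherwise.
    """
    base, version = name, None
    if "!" in base:
        base, version_text = base.rsplit("!", 1)
        version = int(version_text)
    alternative = None
    if "%" in base:
        base, alternative = base.split("%", 1)
    return base, alternative, version
```

A name with no version would correspond to a historic link (the object as a whole), while a fully qualified name pins one version/alternative.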

Attributes

• May be associated with an object

• Named by string

• Value is a list of tuples.

Each tuple is a triple <attribute-name, value-type, value>.

The value may be a tuple.

• Interpreted attributes may be used in queries (e.g., keywords)

• Keywords (?? as distinct from attributes)

• Attributes may be indexed
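
One possible encoding of the attribute triples, with a simple inverted index over one attribute, is sketched here. The AttrTuple type, the value-type strings, the object ids, and build_index are illustrative assumptions rather than anything specified by the slides.

```python
from collections import defaultdict
from typing import Any, NamedTuple


class AttrTuple(NamedTuple):
    name: str        # attribute-name
    value_type: str  # value-type
    value: Any       # value; may itself be an AttrTuple


# Hypothetical store: object id -> {attribute name -> list of tuples}
store = {
    "doc1": {
        "keywords": [AttrTuple("keywords", "string", "hypertext")],
        "created": [AttrTuple("created", "tuple",
                              AttrTuple("date", "date", "1987-08-04"))],
    },
    "doc2": {
        "keywords": [AttrTuple("keywords", "string", "hypertext"),
                     AttrTuple("keywords", "string", "naming")],
    },
}


def build_index(store, attr_name):
    """Map each value of `attr_name` to the set of object ids carrying it."""
    index = defaultdict(set)
    for obj_id, attrs in store.items():
        for t in attrs.get(attr_name, []):
            index[t.value].add(obj_id)
    return index


keyword_index = build_index(store, "keywords")
```

With the index in place, a keyword query becomes a dictionary lookup instead of a scan over every object.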

Triggers (well, really Alerters)

• Simple triggers on insert, delete, read, or update

• Trigger queues a message for some (probably) external server

screen/cache maintenance on workstation

server for some activity (e.g., automatic keyword generation)
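
The alerter idea (the store only queues a message, and some other server acts on it) might look like the following sketch. The Alerters class, its method names, and the message shape (event, object_id) are assumptions; an in-process queue.Queue stands in for whatever transport would reach an external server.

```python
import queue

EVENTS = {"insert", "delete", "read", "update"}


class Alerters:
    """Registry that queues a message when a watched event occurs."""

    def __init__(self):
        self._subscribers = {event: [] for event in EVENTS}

    def register(self, event, message_queue):
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")
        self._subscribers[event].append(message_queue)

    def fire(self, event, object_id):
        # The store does no work itself; it only enqueues a notification.
        for message_queue in self._subscribers[event]:
            message_queue.put((event, object_id))


# Hypothetical usage: a keyword-generation server watches updates.
alerters = Alerters()
keyword_server_inbox = queue.Queue()
alerters.register("update", keyword_server_inbox)
alerters.fire("read", "doc1")    # nobody is watching reads
alerters.fire("update", "doc1")  # queued for the keyword server
```

Keeping the trigger to "queue a message" matches the slide's position that this is data storage, not an active object system.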

Queries

• A query results in a set of objects or an enumerator of the set

• Sets are piped through converters.

filter: removes some objects from the result set

string pattern matching, arithmetic operations, dates, etc.

de-referencing: result set computed via links from elements of the source set

iteration: loop through a converter n or * times

• Indices built as a performance accelerator
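
The converter pipeline can be sketched with plain functions from set to set. All names here (filter_by, dereference, iterate, pipe) and the link table layout are assumptions. In particular, "*" is read as "repeat until nothing new is reached" and returns everything seen, including the source set; the slides do not settle that semantics.

```python
def filter_by(predicate):
    """Converter that removes objects failing `predicate`."""
    return lambda objs: {o for o in objs if predicate(o)}


def dereference(links, link_name):
    """Converter that follows links named `link_name` from each object.

    `links` maps (source, link_name) -> set of targets.
    """
    def convert(objs):
        result = set()
        for o in objs:
            result |= links.get((o, link_name), set())
        return result
    return convert


def iterate(converter, n=None):
    """Apply `converter` n times; n=None plays the role of '*'."""
    def convert(objs):
        current = set(objs)
        if n is not None:
            for _ in range(n):
                current = converter(current)
            return current
        seen, frontier = set(current), current
        while frontier:  # stops once no new objects appear, even on cycles
            frontier = converter(frontier) - seen
            seen |= frontier
        return seen
    return convert


def pipe(source, *converters):
    """Pipe a source set through converters, left to right."""
    result = set(source)
    for convert in converters:
        result = convert(result)
    return result


# Hypothetical link table: a cites b, b cites c, c cites a.
links = {("a", "cites"): {"b"}, ("b", "cites"): {"c"}, ("c", "cites"): {"a"}}
follow_cites = dereference(links, "cites")
```

For example, pipe({"a"}, iterate(follow_cites, 2)) follows two citation hops, and adding a filter_by stage afterwards narrows the result without changing the earlier stages.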

Issues

Hardware/software/protocol base

"Client" activity on the server

Limited (extensible?) data types understood by server

Extensibility

Legal requirements for documents

Security and standards

Long term locking

Administration, backup, debugging, historical logging, and monitoring