-- /ivy/binding/hickory/hickoryDoc.tioga
-- documentation and descr of Hickory
-- Last edited by: Binding, August 24, 1984 4:22:39 pm PDT
Hickory:
A data base for a calendar and reminder system
Carl Binding (August 31, 1984)
Filed on: [Ivy]<binding>hickory>hickoryDoc.tioga

©  Copyright 1984 Xerox Corporation.  All rights reserved.

Abstract: This document is a last minute effort to document the general organization and some of the internals of Hickory. Hickory 
is a Cypress client, providing operations on event tuples. Operations provided are storing/retrieving of event describing tuples, 
grouping events  together, associating properties with events. Hickory also provides client notification in case of write 
modification to the data base.


XEROX            Xerox Corporation
                Palo Alto Research Center
                3333 Coyote Hill Road
                Palo Alto, California 94304

For Internal Xerox Use Only
Introduction
    Hickory is a database for storing, retrieving and generally dealing with events. Events are considered entities as they appear in a 
    calendar, for example. Thus the main attributes of an event are a time of occurrence, the nature of the event, where the event 
    takes place etc. As a particularity of Hickory, it deals with repeated events. Repetition is described by a subset of event 
    attributes and upon an appropriate query, Hickory computes all occurrences of a repeated event within the specified time 
    interval. Special care was given to implement that aspect of Hickory efficiently.
    Events come with a fixed number of attributes. Those are more or less the attributes stored with an event in the main relation of 
    Hickory. However, Hickory provides the possibility of associating properties of types Boolean, Integer, Rope or Time with an 
    event. Thus, greater flexibility is achieved.
    Hickory also provides a mechanism allowing for grouping events. An event can be a member of none, one or several groups. Queries 
    allow to retrieve events of a given group. Group membership of an event is registered in a separate relation. This means that 
    deleting a particular group does not destroy the events within the group.
    Hickory being a stable storage facility for events, it must provide means to notify clients whenever a write change to the database 
    has occurred. For this purpose Hickory provides facilities for registering notification procedures for any Hickory client. 
    Whenever a write modification is done to the data base, Hickory signals all its clients via the registered notification 
    procedures. The arguments to those procedures allow the client to update its internally cached data appropriately.
    The purpose of this document is to describe the internal organization of Hickory and also to give a general overview. Thus, reading 
    this document will help in maintaining Hickory and fix remaining bugs. First, we discuss the externals of hickory, i.e. the 
    interface and commander commands that Hickory provides. Then we shall describe the general organization of the program as well 
    as some implementation details.
External aspects of Hickory
    Hickory provides a set of operations on event tuples. Those operations are defined in the interface
    Hickory.mesa. The semantics of the operations should be self documenting, therefore we shall not discuss the interface in 
    greater detail here.
    Hickory registers three Commander commands. These are the following: 

    CreateHickory <fileName>: This command allows to read the specified text file to produce an initial data base. The format of the 
    text file is compatible with the format that the old Remember package produced. Thus, saying "CreateHickory RememberEvents.txt" 
    to the Commander will create a data base that can then be used by any Hickory client. The default for <fileName> is 
    RememberEvents.txt.

    DumpHickory <fileName>: This is the inverse from the above. All the event tuples contained in Hickory are written onto the 
    specified file. However the produced file might contain more information than Remember used to generate. Thus, you will not be 
    able to use a file produced by DumpHickory with the old Remember! The default for <fileName> is RememberEvents.txt.

    DestroyHickory: This erases the data base segment on which the Hickory data base is stored. Only if you have saved the contents 
    onto a text file can you regenerate the data base, through the CreateHickory command.

        Hickory also uses the user profile. The following user profile entries are used:

    Hickory.HickorySegmentFile: This is the Alpine file name of the used segment to store the Hickory data. The segment name is 
    defaulted to "[Luther.alpine]<userName>Hickory.segment".

    Hickory.HickoryGroups: When creating a new Hickory data base it might be of interest to specify an initial set of groups into which 
    events can be grouped. You must specify a set of Cedar IDs, separated by ',. The used default for groups is no groups at all. 
    Hickory clients, of course, can declare new groups at will.
General organization of the program
    Hickory is organized as a multiple module monitor. The interface visible to Hickory clients is defined in Hickory.mesa.  
    Implementation of this public interface is scattered over a set of modules. Some of these modules have private interfaces in 
    order to allow cross module communication. The following interfaces are private to Hickory and not visible to a regular Hickor 
    client.
HickoryStorage.mesa: This interface defines all the relations used in Hickory. The only exported operation allows to initialize 
those relations with Cypress.  All the used relations are grouped into a monitored record, which provides the monitor lock used 
throughout Hickory. Additionally to the data base relations, this record contains some global variables, used throughout the 
program. Hence, HickoryStorage exports all global, private variables of Hickory. In all other modules of the monitor, 
HickoryStorage.protData is therefore imported and for greater convenience the record is opened....!

HickorySupport.mesa: This interface defines some frequently used operations. These are mainly casting routines between different 
data types, routines allowing to display messages into the Hickory Log viewer and more importantly two operations that deal with 
starting and restarting transactions.  The interface also provides routines used in generating occurrences of repeated events.

HickoryGroup.mesa: Provides some of the Hickory internal operations dealing with grouped events. Additionally, the interface also 
exports routines dealing with the cache of groups maintained in HickoryGroupImpl.mesa.

HickoryCache.mesa: This interface exports operations useful for dealing with the caches that Hickory maintains in order to speed up 
generation of occurrences of repeated events. It also provides a set of operations on the lookaside cache of event tuples, which is 
used for retrieval of an event, with a known key.

HickoryProp.mesa: Here we define operations dealing with properties of events. For now only operations on the cache of property 
names as it is maintained by HickoryPropImpl.mesa are exported through this interface. 

HickoryOwner.mesa: To implement access protection to Hickory. When the data base segment is first opened, the user name and 
password of the user are stored in a relation. Further access  to Hickory is then matched against those stored user credentials. It 
is only the owner that can do write modifications to the data base! The interface also contains a routine dealing with changes to 
the owner's password. However, I am not sure if I got that right and its usefulness might be limited anyway.

HickoryDestroy.mesa: This interface exports a routine dealing with destroying events whose KeepUntil attribute indicates that it is 
time to purge them from the data base.

HickoryNotify.mesa: Provides the operations used when dealing with client notification. Provides operations for registering client 
provided routines and activating those routines, whenever a write update was done to Hickory.
    Implementation of those interfaces and the public Hickory interface is provided by the set of modules described next. As a 
    convention all module names are the interface name with the suffix "Impl".
Implementation of it all
    Some of the implementation details of Hickory are very straightforward, others are maybe a little harder to grasp at first glance. 
    In the following we shall discuss some of the things that we think to be of a slightly higher complexity. We neither claim that 
    the implementation is always optimal nor that the  justification for doing things a given way is always correct. Furthermore, 
    needless to say that there are probably still some hidden bugs within Hickory. Rather than discussing each implementation 
    module in detail, we shall discuss them grouped by the functionality they provide to the public Hickory interface.
General utilities
    The operations provided here are dealing with the two main data types provided by Hickory, i.e. EventTuple and EventList. 
    Implementation of those routines can be found in HickoryStoreImpl.mesa and HickorySupportImpl.mesa.
Storing and altering events
    Operations that deal with storing and changing event tuples are implemented in HickoryStoreImpl.mesa. The code of those routines is 
    relatively straightforward, so no in depth discussion is provided here. The only thing to worry about is to maintain the cache 
    of entered events correctly. When changing attributes of an event we also must deal with caches of occurrence times in the case 
    of a repeated event. 
Destroying events, forgetting them and destroying the data base.
    All the routines dealing with those operations are implemented in HickoryDestroyImpl.mesa. The code should be self documenting. 
    Again the thing to worry about is to maintain cache consistency in the case of a deletion of an event. Forgetting an event does 
    not destroy the tuple, it merely changes the Forgotten attribute of the tuple. As a result, not all queries might retrieve that 
    event. ( see below)
Querying the data base
    Querying the data base is probably the most tricky thing in Hickory. First, there are a couple of query options provided. Let us 
    explain them briefly:

    NoOptions: when supplying this value to a query, all the events that are not forgotten are returned. For repeated events only the 
    first occurrence is returned. Events are returned regardless of them being a reminder or not.

    All: when setting the All bit in the options parameter, Hickory returns the events that were previously forgotten as well as the 
    non forgotten events of course.

    ExpandRepetitions: Hickory generates an EventTuple for each occurrence of a repeated event that falls into the given time interval. 
    Thus the client sees a set of events, in which each occurrence of an event corresponds to a tuple in the set. Expansion of 
    repeated events is costly so be warned. For more details on how things are done, see below.

    Reminders: If the Reminder bit in options is set, Hickory only returns the event tuples that have their Remind attribute set.  If 
    the bit is  not set, the Remind attribute is ignored in a query.

    Further parameters to a query specify the time interval into which events must fall in order to be considered. By providing the key 
    of an EventTuple, it is possible to retrieve the associated tuple.
    Note that in thise case the group parameter is ignored. If group is non NIL otherwise, the query ranges over all the events 
    that are contained in the specified group.
    The implementation of queries can be found in HickoryQueryImpl.mesa. Again, the code should be self documenting. There is a number 
    of optimizations that are attempted when retrieving events. In particular, when searching for the tuple with a supplied key, we 
    first look into the lookaside cache of entered events. Only if the event is not found there, do we access the data base. 
    Queries are done in two steps when accessing the data base. First all the events with a single occurrence are retrieved. In 
    another query, all the repeated events are retrieved and if the ExpandRepetition bit in options is set, all occurrences of that 
    event falling in the given time inteval are generated. The rationale behind doing two queries to the database, was the belief 
    that the list of repeated events would be much shorter than the one of single events. Hence, we don't have to traverse a long 
    list of events in order to find repeated events, expand those and then reorder the entire list. Rather we deal with a larger 
    set of short lists, which is more suitable to most of the algorithms which are O(n) in the length of the lists we are dealing 
    with.
<<Grouping events and manipulating groups>>
    All  the operations relevant to groups are implemented by HickoryGroupImpl.mesa. Events can be grouped together by using routines 
    exported by either Hickory.mesa or HickoryGroup.mesa. A group in this context is a set of events. This means that an event can 
    be member of several groups at once. Removing it from one group, does not automatically remove it from another group. 
    The association between an event and the groups it belongs to is registered in a particular relation. In this relation there is an 
    index on the events, thus making retrieval of the groups of a particular event relatively efficient.
    HickoryGroupImpl maintains a cache of all groups that are declared within Hickory. This makes the ListGroup[] operations 
    particularly fast, but as inconvenience we have to maintain cache consistency correctly.
    There are certainly other possibilities of providing grouping semantics on events within the Cypress interface. I have used a 
    simple and straightforward approach, which had the big advantage to be operative in a relatively short time! I felt that using 
    the subdomain facilities was rather akward and might indeed not have provided the same semantics.
Declaring properties and querying them
    With each event one can associate any number of properties. Properties are characterized by their name and their value. The value 
    implicitly declares the type of the property. Currently values can only be REF INT, REF BOOLEAN, REF Rope.ROPE or REF 
    BasicTime.GMT. This is a limitation of Cypress. Once the type of a property is declared, all furhter values of the named 
    property must be of the correct type, otherwise an error is raised. 
    Properties are implemented as relations between events, property descriptors and the value. A property descriptor describes a 
    property by associating a key with a property name. As a result of this schema, there is one relation per data type, since 
    Cypress does not really allow for storing entities of any domain type.
    The reason to implement properties through a set of relations rather than using the property operations of Cyprress, was the fact 
    that Cypress does not allow to store attributes of domain AttributeDomain. Despite what the documentation claims.  One could 
    have gotten around that by redeclaring a property with version = OldOnly upon a query. But this seems to be as akward as the 
    way it is done now. As far as efficiency is concerned, it is hard to make a statement. Having an index on the events stored in 
    a property relation, is certainly helpful in retrieving a property of a given event.
Client Notification
    Hickory serving as shared data base to several clients ( currently Calendar and Remember) it is mandatory that clients can be 
    informed when the data base is modified. The HickoryNotifyImpl.mesa monitor provides the needed primitives. Note that this 
    monitor is having a separate monitor lock and does not use the lock provided by HickoryStorage.protData. A client can register 
    a procedure of type Hickory.NotifyProc. Whenever a write change to Hickory occurs, HickoryNotifyImpl forks off a process for 
    each registered notification procedure. The parameters of  the notification always include the reason of the notification. If 
    the change is specific to one event, the key of this event is also passed to the client. Occasionally, further information is 
    passed through the data parameter. 
    There are a couple of things of importance in this context. First, Hickory must guarantee that clients are notified when changes 
    occur. This implies the right calls being made to HickoryNotify from within Hickory. Second, the client must deal with a 
    notification in an appropriate way. This means to analyze the reason argument and then take the appoprite actions. As a further 
    caveat,  the client must be able to handle notifications that occur asynchronously. This probably means that the Hickory data 
    being cached within the client must be protected through a monitor!?

Other modules
    The implementation of Hickory includes some additional modules. Let's enumerate them and briefly describe what they do.

    HickoryStorageImpl.mesa: implementing the relation declaration and providing the global, private variables of Hickory.

    HickorySupportImpl.mesa: providing utility routines as well as doing the initialization of all the modules of Hickory.  The module 
    also contains two processes. One is monitoring the data base activity and shuts down any transaction that was idle for more 
    than five minutes. The other process wakes up every morning in the wee hours and maintains the data base. This consists of 
    storing the contents of all the caches in HickoryCacheImpl and destroying all the events whose KeepUntil attribute indicates 
    them as garbage. Throughout all that, cache consistency of the different caches within Hickory must of course be maintained.

    HickoryUtilImpl.mesa: registers and implements the three Commander procedure provided by Hickory. Hence it is a client of Hickory, 
    not exporting any routines to Hickory.mesa. Some of the code was plagiarized from the old Remember.
Some more details
    In this section we shall discuss some aspects of Hickory that have not yet been described.
Repeated events
    One of the more original aspecs of Hickory is the possibilty of dealing with repeated events. Quite obviously those kind of events 
    occur quite frequently in every days life, thus giving an incentive to deal with such things. We shall now describe to some 
    larger extent how Hickory deals with such repeated events.
    An event is marked as being a repetitive event through the RepeatType, RepeateTime and RepeatUntil attributes of a tuple. The first 
    of those attributes indicates the kind of repetition. Except the Complicated type, they are obvious. When the repetition type 
    of an event is complicated, RepeatTime is used to specify the repetition interval. RepeatTime must be Rope.ROPE that 
    Tempus.Parse can understand. For instance, "60 minutes" or "2 hours" are legal values. RepeatUntil gives the time when the 
    repeated event happens for a last time. As start time, we use the EventTime attribute. For each repeated event, we only store 
    the tuple representing the very first occurrence and the repetition descriptor.
    When we discover a repeated event ( RepeatType # None) we then must query the repetition descriptor in order to compute occurrences 
    of the repeated event in the given time interval. Computing repeated occurrences of the event is straightforward in principle, 
    but computationally expensive. In fact, we compute Ti+1 = Ti +     calendar system. A month can have 28, 29, 30 or 31 days, so how many seconds do separate an event that happens every 2 months? 
    The same is true for other repetition intervals. Hickory uses Tempus.Adjust to deal with that particularity.
    Another aspect to the problem is to find out the events happening in the interval [ t1, t2], knowing that the first occurrence was 
    at t0 and also knowing     approach is  computationally expensive. In order to attempt reducing the costs, Hickory provides a cache of time points on 
    which a repeated event has happened.  When attempting to compute an occurrence of a repeated event, we first look into the 
    cache if there is some time ts, s.t. ts <= t1. Doing this reduces the length of the generated sequence of time points. Only if 
    no suitable time can be found in the cache, do we have to start generating the sequence from the first occurrence of an event. 
    The caches used in that scheme are implemented in HickoryCacheImpl.mesa. Two constants are of relevance: the cache size and the 
    cost factor. The latter decides how often we insert newly generated time pointes into the cache. A cache itself is an 
    increasingly sorted array of time points, bound to a particular repeated event. Once the cache is full, elements are deleted 
    more or less randomly upon insertion of a newly computed time. Having to maintain the order of the cache makes insertion of 
    newly computed time points relatively expensive. Given the rather large cache size and the assumption that time intervals of 
    successive queries are not spread out too far  apart on the time axis, the overall efficiency should be satisfactory.
    Caches of time points are periodically saved onto stable storage. The aim again is to keep track of time as it goes by. Query 
    intervals are indeed assumed  to be progressing on the time axis. In other terms, if Query1 specifies a time interval [ t1, 
    t2], we think it is very likely that a subsequent query, Query2 uses a time interval [ t3, t4], s.t. t3 > t1.
    A further point of interest when dealing with repeated events is the LastOccurred attribute with a repetition descriptor tuple. As 
    real time goes by, we keep this attribute up to date; i.e. for a daily event we generate the occurrence of it on every day, 
    although we might not necessarily do a query on that day. If however we then make a query for today, we know the occurrence of 
    the event of yesterday and can rapidly generate the  occurrence time, without having to use the very first occurrence ever. 
    Again, this is a heuristic approach and might only be of limited usefulness.
Dealing with transactions
    One of the rather nast aspects of transactions, are aborts. How does Hickory deal with that?
    For one, all client calls to Hickory are implemented as one single transaction. The very first thing we do, is to make certain that 
    everything looks ok. This means a transaction must be open. Once this condition is met, we proceed with the transaction proper. 
    At the end of it we mark the transaction through the appropriate call to Cypress.
    In case of an abort, we handle the raised exception as follows. The transaction is closed and we reinitialize the data base scheme. 
    Then a new transaction is opened and we reexecute the entire transaction. When opening a new transaction after an abort, we 
    wait until no other transaction is open on  the data base segment. This is aimed to reduce further conflicts. When restarting 
    transactions, we only try once. A repeated failure will generate an error, not caught by Hickory.
    Currently transactions are started over until they finally succeed. This might not be a totally  good idea, since could bring us 
    into a kind of infinit loop..... Currently however the implementation seems to be sufficiently stable for moderate, careful use.
Conclusions ( if any)
    It's time to come to a stop. We shall now briefly assess some aspects of Hickory. What are the good things and the bad things of 
    Hickory? I will provide my personal view point, which is of course totally subjective to the matter.
    First, the positive things. Hickory works, it has two clients and given the short amount of time in which it was created, this is 
    obviously a good point. Hickory provides repeated events, which is an interesting feature of it. The interface seems to be 
    solid enough for some further clients, although some of the implementation might not be overly efficient.
    And now the bad points. Having been under time pressure, some of the code might be lousy and not take full advantage of the Cypress 
    interface. Hickory does not deal very well with protection issues to the data base. For now it is a simple personal data base. 
    It might be interesting to investigate into a muliperson calendar system. Possibly this might still use Hickory as it stands 
    now, but I expect that Hickory must be altered at least slightly.  .