-- /ivy/binding/hickory/hickoryDoc.tioga
-- documentation and descr of Hickory
-- Last edited by: Binding, August 24, 1984 4:22:39 pm PDT
Hickory:
A data base for a calendar and reminder system
Carl Binding (August 31, 1984)
Filed on: [Ivy]<binding>hickory>hickoryDoc.tioga

© Copyright 1984 Xerox Corporation. All rights reserved.
Abstract: This document is a last minute effort to document the general organization and some of the internals of Hickory. Hickory is a Cypress client, providing operations on event tuples. Operations provided are storing/retrieving of event describing tuples, grouping events together, associating properties with events. Hickory also provides client notification in case of write modification to the data base.
XEROX  Xerox Corporation
   Palo Alto Research Center
    3333 Coyote Hill Road
   Palo Alto, California 94304

For Internal Xerox Use Only
Introduction
Hickory is a database for storing, retrieving and generally dealing with events. Events are considered entities as they appear in a calendar, for example. Thus the main attributes of an event are a time of occurrence, the nature of the event, where the event takes place etc. As a particularity of Hickory, it deals with repeated events. Repetition is described by a subset of event attributes and upon an appropriate query, Hickory computes all occurrences of a repeated event within the specified time interval. Special care was given to implement that aspect of Hickory efficiently.
Events come with a fixed number of attributes. Those are more or less the attributes stored with an event in the main relation of Hickory. However, Hickory provides the possibility of associating properties of types Boolean, Integer, Rope or Time with an event. Thus, greater flexibility is achieved.
Hickory also provides a mechanism allowing for grouping events. An event can be a member of none, one or several groups. Queries allow to retrieve events of a given group. Group membership of an event is registered in a separate relation. This means that deleting a particular group does not destroy the events within the group.
Hickory being a stable storage facility for events, it must provide means to notify clients whenever a write change to the database has occurred. For this purpose Hickory provides facilities for registering notification procedures for any Hickory client. Whenever a write modification is done to the data base, Hickory signals all its clients via the registered notification procedures. The arguments to those procedures allow the client to update its internally cached data appropriately.
The purpose of this document is to describe the internal organization of Hickory and also to give a general overview. Thus, reading this document will help in maintaining Hickory and fix remaining bugs. First, we discuss the externals of hickory, i.e. the interface and commander commands that Hickory provides. Then we shall describe the general organization of the program as well as some implementation details.
External aspects of Hickory
Hickory provides a set of operations on event tuples. Those operations are defined in the interface
Hickory.mesa. The semantics of the operations should be self documenting, therefore we shall not discuss the interface in greater detail here.
Hickory registers three Commander commands. These are the following:
CreateHickory <fileName>: This command allows to read the specified text file to produce an initial data base. The format of the text file is compatible with the format that the old Remember package produced. Thus, saying "CreateHickory RememberEvents.txt" to the Commander will create a data base that can then be used by any Hickory client. The default for <fileName> is RememberEvents.txt.
DumpHickory <fileName>: This is the inverse from the above. All the event tuples contained in Hickory are written onto the specified file. However the produced file might contain more information than Remember used to generate. Thus, you will not be able to use a file produced by DumpHickory with the old Remember! The default for <fileName> is RememberEvents.txt.
DestroyHickory: This erases the data base segment on which the Hickory data base is stored. Only if you have saved the contents onto a text file can you regenerate the data base, through the CreateHickory command.
Hickory also uses the user profile. The following user profile entries are used:
Hickory.HickorySegmentFile: This is the Alpine file name of the used segment to store the Hickory data. The segment name is defaulted to "[Luther.alpine]<userName>Hickory.segment".
Hickory.HickoryGroups: When creating a new Hickory data base it might be of interest to specify an initial set of groups into which events can be grouped. You must specify a set of Cedar IDs, separated by ',. The used default for groups is no groups at all. Hickory clients, of course, can declare new groups at will.
General organization of the program
Hickory is organized as a multiple module monitor. The interface visible to Hickory clients is defined in Hickory.mesa. Implementation of this public interface is scattered over a set of modules. Some of these modules have private interfaces in order to allow cross module communication. The following interfaces are private to Hickory and not visible to a regular Hickor client.
HickoryStorage.mesa: This interface defines all the relations used in Hickory. The only exported operation allows to initialize those relations with Cypress. All the used relations are grouped into a monitored record, which provides the monitor lock used throughout Hickory. Additionally to the data base relations, this record contains some global variables, used throughout the program. Hence, HickoryStorage exports all global, private variables of Hickory. In all other modules of the monitor, HickoryStorage.protData is therefore imported and for greater convenience the record is opened....!
HickorySupport.mesa: This interface defines some frequently used operations. These are mainly casting routines between different data types, routines allowing to display messages into the Hickory Log viewer and more importantly two operations that deal with starting and restarting transactions. The interface also provides routines used in generating occurrences of repeated events.
HickoryGroup.mesa: Provides some of the Hickory internal operations dealing with grouped events. Additionally, the interface also exports routines dealing with the cache of groups maintained in HickoryGroupImpl.mesa.
HickoryCache.mesa: This interface exports operations useful for dealing with the caches that Hickory maintains in order to speed up generation of occurrences of repeated events. It also provides a set of operations on the lookaside cache of event tuples, which is used for retrieval of an event, with a known key.
HickoryProp.mesa: Here we define operations dealing with properties of events. For now only operations on the cache of property names as it is maintained by HickoryPropImpl.mesa are exported through this interface.
HickoryOwner.mesa: To implement access protection to Hickory. When the data base segment is first opened, the user name and password of the user are stored in a relation. Further access to Hickory is then matched against those stored user credentials. It is only the owner that can do write modifications to the data base! The interface also contains a routine dealing with changes to the owner's password. However, I am not sure if I got that right and its usefulness might be limited anyway.
HickoryDestroy.mesa: This interface exports a routine dealing with destroying events whose KeepUntil attribute indicates that it is time to purge them from the data base.
HickoryNotify.mesa: Provides the operations used when dealing with client notification. Provides operations for registering client provided routines and activating those routines, whenever a write update was done to Hickory.
Implementation of those interfaces and the public Hickory interface is provided by the set of modules described next. As a convention all module names are the interface name with the suffix "Impl".
Implementation of it all
Some of the implementation details of Hickory are very straightforward, others are maybe a little harder to grasp at first glance. In the following we shall discuss some of the things that we think to be of a slightly higher complexity. We neither claim that the implementation is always optimal nor that the justification for doing things a given way is always correct. Furthermore, needless to say that there are probably still some hidden bugs within Hickory. Rather than discussing each implementation module in detail, we shall discuss them grouped by the functionality they provide to the public Hickory interface.
General utilities
The operations provided here are dealing with the two main data types provided by Hickory, i.e. EventTuple and EventList. Implementation of those routines can be found in HickoryStoreImpl.mesa and HickorySupportImpl.mesa.
Storing and altering events
Operations that deal with storing and changing event tuples are implemented in HickoryStoreImpl.mesa. The code of those routines is relatively straightforward, so no in depth discussion is provided here. The only thing to worry about is to maintain the cache of entered events correctly. When changing attributes of an event we also must deal with caches of occurrence times in the case of a repeated event.
Destroying events, forgetting them and destroying the data base.
All the routines dealing with those operations are implemented in HickoryDestroyImpl.mesa. The code should be self documenting. Again the thing to worry about is to maintain cache consistency in the case of a deletion of an event. Forgetting an event does not destroy the tuple, it merely changes the Forgotten attribute of the tuple. As a result, not all queries might retrieve that event. ( see below)
Querying the data base
Querying the data base is probably the most tricky thing in Hickory. First, there are a couple of query options provided. Let us explain them briefly:
NoOptions: when supplying this value to a query, all the events that are not forgotten are returned. For repeated events only the first occurrence is returned. Events are returned regardless of them being a reminder or not.
All: when setting the All bit in the options parameter, Hickory returns the events that were previously forgotten as well as the non forgotten events of course.
ExpandRepetitions: Hickory generates an EventTuple for each occurrence of a repeated event that falls into the given time interval. Thus the client sees a set of events, in which each occurrence of an event corresponds to a tuple in the set. Expansion of repeated events is costly so be warned. For more details on how things are done, see below.
Reminders: If the Reminder bit in options is set, Hickory only returns the event tuples that have their Remind attribute set. If the bit is not set, the Remind attribute is ignored in a query.
Further parameters to a query specify the time interval into which events must fall in order to be considered. By providing the key of an EventTuple, it is possible to retrieve the associated tuple.
Note that in thise case the group parameter is ignored. If group is non NIL otherwise, the query ranges over all the events that are contained in the specified group.
The implementation of queries can be found in HickoryQueryImpl.mesa. Again, the code should be self documenting. There is a number of optimizations that are attempted when retrieving events. In particular, when searching for the tuple with a supplied key, we first look into the lookaside cache of entered events. Only if the event is not found there, do we access the data base. Queries are done in two steps when accessing the data base. First all the events with a single occurrence are retrieved. In another query, all the repeated events are retrieved and if the ExpandRepetition bit in options is set, all occurrences of that event falling in the given time inteval are generated. The rationale behind doing two queries to the database, was the belief that the list of repeated events would be much shorter than the one of single events. Hence, we don't have to traverse a long list of events in order to find repeated events, expand those and then reorder the entire list. Rather we deal with a larger set of short lists, which is more suitable to most of the algorithms which are O(n) in the length of the lists we are dealing with.
Grouping events and manipulating groups
All the operations relevant to groups are implemented by HickoryGroupImpl.mesa. Events can be grouped together by using routines exported by either Hickory.mesa or HickoryGroup.mesa. A group in this context is a set of events. This means that an event can be member of several groups at once. Removing it from one group, does not automatically remove it from another group.
The association between an event and the groups it belongs to is registered in a particular relation. In this relation there is an index on the events, thus making retrieval of the groups of a particular event relatively efficient.
HickoryGroupImpl maintains a cache of all groups that are declared within Hickory. This makes the ListGroup[] operations particularly fast, but as inconvenience we have to maintain cache consistency correctly.
There are certainly other possibilities of providing grouping semantics on events within the Cypress interface. I have used a simple and straightforward approach, which had the big advantage to be operative in a relatively short time! I felt that using the subdomain facilities was rather akward and might indeed not have provided the same semantics.
Declaring properties and querying them
With each event one can associate any number of properties. Properties are characterized by their name and their value. The value implicitly declares the type of the property. Currently values can only be REF INT, REF BOOLEAN, REF Rope.ROPE or REF BasicTime.GMT. This is a limitation of Cypress. Once the type of a property is declared, all furhter values of the named property must be of the correct type, otherwise an error is raised.
Properties are implemented as relations between events, property descriptors and the value. A property descriptor describes a property by associating a key with a property name. As a result of this schema, there is one relation per data type, since Cypress does not really allow for storing entities of any domain type.
The reason to implement properties through a set of relations rather than using the property operations of Cyprress, was the fact that Cypress does not allow to store attributes of domain AttributeDomain. Despite what the documentation claims. One could have gotten around that by redeclaring a property with version = OldOnly upon a query. But this seems to be as akward as the way it is done now. As far as efficiency is concerned, it is hard to make a statement. Having an index on the events stored in a property relation, is certainly helpful in retrieving a property of a given event.
Client Notification
Hickory serving as shared data base to several clients ( currently Calendar and Remember) it is mandatory that clients can be informed when the data base is modified. The HickoryNotifyImpl.mesa monitor provides the needed primitives. Note that this monitor is having a separate monitor lock and does not use the lock provided by HickoryStorage.protData. A client can register a procedure of type Hickory.NotifyProc. Whenever a write change to Hickory occurs, HickoryNotifyImpl forks off a process for each registered notification procedure. The parameters of the notification always include the reason of the notification. If the change is specific to one event, the key of this event is also passed to the client. Occasionally, further information is passed through the data parameter.
There are a couple of things of importance in this context. First, Hickory must guarantee that clients are notified when changes occur. This implies the right calls being made to HickoryNotify from within Hickory. Second, the client must deal with a notification in an appropriate way. This means to analyze the reason argument and then take the appoprite actions. As a further caveat, the client must be able to handle notifications that occur asynchronously. This probably means that the Hickory data being cached within the client must be protected through a monitor!?
Other modules
The implementation of Hickory includes some additional modules. Let's enumerate them and briefly describe what they do.
HickoryStorageImpl.mesa: implementing the relation declaration and providing the global, private variables of Hickory.
HickorySupportImpl.mesa: providing utility routines as well as doing the initialization of all the modules of Hickory. The module also contains two processes. One is monitoring the data base activity and shuts down any transaction that was idle for more than five minutes. The other process wakes up every morning in the wee hours and maintains the data base. This consists of storing the contents of all the caches in HickoryCacheImpl and destroying all the events whose KeepUntil attribute indicates them as garbage. Throughout all that, cache consistency of the different caches within Hickory must of course be maintained.
HickoryUtilImpl.mesa: registers and implements the three Commander procedure provided by Hickory. Hence it is a client of Hickory, not exporting any routines to Hickory.mesa. Some of the code was plagiarized from the old Remember.
Some more details
In this section we shall discuss some aspects of Hickory that have not yet been described.
Repeated events
One of the more original aspecs of Hickory is the possibilty of dealing with repeated events. Quite obviously those kind of events occur quite frequently in every days life, thus giving an incentive to deal with such things. We shall now describe to some larger extent how Hickory deals with such repeated events.
An event is marked as being a repetitive event through the RepeatType, RepeateTime and RepeatUntil attributes of a tuple. The first of those attributes indicates the kind of repetition. Except the Complicated type, they are obvious. When the repetition type of an event is complicated, RepeatTime is used to specify the repetition interval. RepeatTime must be Rope.ROPE that Tempus.Parse can understand. For instance, "60 minutes" or "2 hours" are legal values. RepeatUntil gives the time when the repeated event happens for a last time. As start time, we use the EventTime attribute. For each repeated event, we only store the tuple representing the very first occurrence and the repetition descriptor.
When we discover a repeated event ( RepeatType # None) we then must query the repetition descriptor in order to compute occurrences of the repeated event in the given time interval. Computing repeated occurrences of the event is straightforward in principle, but computationally expensive. In fact, we compute Ti+1 = Ti + Dt. One problem is obviously the non linearity of the gregorian calendar system. A month can have 28, 29, 30 or 31 days, so how many seconds do separate an event that happens every 2 months? The same is true for other repetition intervals. Hickory uses Tempus.Adjust to deal with that particularity.
Another aspect to the problem is to find out the events happening in the interval [ t1, t2], knowing that the first occurrence was at t0 and also knowing Dt. The straightforward approach is to start generating time points ti until t1 <= ti <= t2. This approach is computationally expensive. In order to attempt reducing the costs, Hickory provides a cache of time points on which a repeated event has happened. When attempting to compute an occurrence of a repeated event, we first look into the cache if there is some time ts, s.t. ts <= t1. Doing this reduces the length of the generated sequence of time points. Only if no suitable time can be found in the cache, do we have to start generating the sequence from the first occurrence of an event.
The caches used in that scheme are implemented in HickoryCacheImpl.mesa. Two constants are of relevance: the cache size and the cost factor. The latter decides how often we insert newly generated time pointes into the cache. A cache itself is an increasingly sorted array of time points, bound to a particular repeated event. Once the cache is full, elements are deleted more or less randomly upon insertion of a newly computed time. Having to maintain the order of the cache makes insertion of newly computed time points relatively expensive. Given the rather large cache size and the assumption that time intervals of successive queries are not spread out too far apart on the time axis, the overall efficiency should be satisfactory.
Caches of time points are periodically saved onto stable storage. The aim again is to keep track of time as it goes by. Query intervals are indeed assumed to be progressing on the time axis. In other terms, if Query1 specifies a time interval [ t1, t2], we think it is very likely that a subsequent query, Query2 uses a time interval [ t3, t4], s.t. t3 > t1.
A further point of interest when dealing with repeated events is the LastOccurred attribute with a repetition descriptor tuple. As real time goes by, we keep this attribute up to date; i.e. for a daily event we generate the occurrence of it on every day, although we might not necessarily do a query on that day. If however we then make a query for today, we know the occurrence of the event of yesterday and can rapidly generate the occurrence time, without having to use the very first occurrence ever. Again, this is a heuristic approach and might only be of limited usefulness.
Dealing with transactions
One of the rather nast aspects of transactions, are aborts. How does Hickory deal with that?
For one, all client calls to Hickory are implemented as one single transaction. The very first thing we do, is to make certain that everything looks ok. This means a transaction must be open. Once this condition is met, we proceed with the transaction proper. At the end of it we mark the transaction through the appropriate call to Cypress.
In case of an abort, we handle the raised exception as follows. The transaction is closed and we reinitialize the data base scheme. Then a new transaction is opened and we reexecute the entire transaction. When opening a new transaction after an abort, we wait until no other transaction is open on the data base segment. This is aimed to reduce further conflicts. When restarting transactions, we only try once. A repeated failure will generate an error, not caught by Hickory.
Currently transactions are started over until they finally succeed. This might not be a totally good idea, since could bring us into a kind of infinit loop..... Currently however the implementation seems to be sufficiently stable for moderate, careful use.
Conclusions ( if any)
It's time to come to a stop. We shall now briefly assess some aspects of Hickory. What are the good things and the bad things of Hickory? I will provide my personal view point, which is of course totally subjective to the matter.
First, the positive things. Hickory works, it has two clients and given the short amount of time in which it was created, this is obviously a good point. Hickory provides repeated events, which is an interesting feature of it. The interface seems to be solid enough for some further clients, although some of the implementation might not be overly efficient.
And now the bad points. Having been under time pressure, some of the code might be lousy and not take full advantage of the Cypress interface. Hickory does not deal very well with protection issues to the data base. For now it is a simple personal data base. It might be interesting to investigate into a muliperson calendar system. Possibly this might still use Hickory as it stands now, but I expect that Hickory must be altered at least slightly. .