-- 9-13-82.minutes -- design discussions between John and Rick on augments On Monday, September 13th and Tuesday, September 14th we met to thrash out a design for multiple data bases. This note is an attempt to reconstruct our conclusions and motivations. Some reasons for multiple data bases: 1) To be able to segment data for recovery and privacy. For instance, Walnut needs a separate DB to facilitate dumping and reloading its data. 2) To be able to combine private and public data bases without physical dependencies. (Provide the user with the illusion that he can add data to a public DB.) We want to be able to change both DBs independently, although not at the same time. 3) To be able to implement versions. (Provide the user with the illusion that he can subtract data from a public DB). This would allow us to make a new DB that was a slight variation on an old DB without having to copy all of the data. We came up with the following four implementations of DBs, in order of increasing functionality: 1) SEGMENTS. Each segment is physically and logically separate from all other segments. Entities, Domains, and Relships are only valid for the segment in which they are defined. No possibility of sharing data except at the level at which the user interprets it. 2) ADDITIVE AUGMENTS. Augments share data by relying on the name-equivalence of objects. If two domains or two relations have the same name, they are equivalent. If two entities belong to the same domain and have the same name, they are equivalent. We call equivalent objects in different augments "namesakes" of each other. If an object is created in an augment just for the purpose of refering to its namesake in a different augment, it is called a "shadow" of its namesake. If an augment is "additive" it means that the sum of two augments is the sum of all of the objects in the augments with duplicates allowed. Thus there may be more than one key relationship (although still at most one per augment), and some tuples may be duplicated. 3) SUBTRACTIVE AUGMENTS. Subtractive augments are just like additive augments with "anti-tuples" defined. An "anti-tuple" is a special tuple which cancels its namesake in another augment. Anti-tuples are automatically created for DestroyDomain, DestroyRelation, DestroyRelship, DestroyEntity, and SetF. (SetF is implemented by copying the old tuple, "destroying" it, and changing the data in the new tuple.) There is no way to explictly create or delete an anti-tuple. Subtractive augments don't guarantee the uniqueness of key tuples. 4) FULL AUGMENT. A "full" augment is an augment that maintains all the behavior of a single augment, no matter what operations are performed. This boils down to two requirements: a) maintaining the uniqueness of key tuples (what we call the "key invariant"), and b) making sure that when a tuple is deleted, it remains deleted (requires anti-tuples plus . . . ?). We played with a number of schemes for maintaining the key invariant in a full augment. One was to compare keys at the time of a RelationSubset and discard corresponding tuples. This had two problems: 1) it was slow, taking NlogN time instead of N, and 2) it didn't help us at all with deleting non-key tuples. We discarded this scheme in favor of anti-tuples. When a full augment is first opened, the system scans the key index looking for corresponding tuples in the other augments. If any are found, anti-tuples are created to mask them. When subsequent augments are opened, the process is repeated. This seems fairly expensive but workable. As a compromise, we invented the notion of a subtractive augment which ignores this inititialization. (Surprisingly, subtractive augments also have nicer semantics when the underlying data change.) Another problem is: What happens when the underlying augment is changed by someone else when you aren't looking at it? One solution is to say that you can't change underlying data bases, but that seems too restrictive. We would prefer to make the system tolerant of changes, if at all possible. The possible changes are: 1) renaming an entity. This is disastrous since we depend on the correspondence of names to determine the correspondence of entities. If an entity is renamed in one augment, the correspondence is broken. We discussed trying to automatically update augments, but it is difficult if more than one augment depends on the correspondence or if the new name conflicts with an existing name in a different augment. This problem may require an administrative solution. 2) changing the schema. This is also disastrous. We may be able to handle some changes (like adding an attribute to a relation) but the general problem is very hard. This problem may require an administrative solution. 3) creating new data. OK. 4) deleting a tuple. OK. 5) deleting an entity. This is OK because shadow entities still remain in the other augments. 6) changing the key attribute of a relation. OK. This has the same semantics as deleting a tuple and creating a new one. 7) changing the non-key attribute of a relation. OK for all of the augments but a full augment. Imagine that the "age" relation was a key attribute of the Person, and Smith's age was "30" in the public data base. Jones knows that Smith is now 31, and so he modifies his DB accordingly. A year later, Smith turns 32 and the public DB is updated to reflect this. Jones' DB will still say "31" with no hint that the public DB is different. Contrast this with what happens with subtractive augments. When Jones updated his DB, an anti-tuple is created saying in effect "ignore the tuple that says that Smith is 30". As long as the public DB remains the same, Jones only sees one tuple. But, when the Smith's age is updated to "32", the anti-tuple no longer applys and Jones will see both tuples. We concluded that a reasonable implementation strategy was to implement additive augments and then later implement subtractive augments. Both types of augments could be mixed freely. Subtractive augments require about twice as much work as additive augments, but you don't have to undo anything in the process. Full augments would only be implemented if we were convinced that they were useful. After some thought, we also decided that each DB operation should explicitly give the augment or list of augments that it acts on. Although this is an inconvenience for the programmer, we couldn't find an algorithm for implicitly determining the correct augments that we were confident would work in all cases. Since applications such as Walnut want to be able to specify the augment it writes to, we must have the option of giving the augment. Other application may want to write to either the private or the public version of its augment. It is possible to guess the correct augment for new relationships, but not for new entities. Future applications may also want to constrain the augments that are read from. We decided that if a write operation defaulted the augment, the write would go to the default augment (perhaps $[Local]MiscDB). If a read operation defaults the augment list, the system would read from all of the currently open augments. A separate proposal we discussed was to use atoms to represent Domains, Relations and Attributes rather than augment-specific data structures. These would remain valid even when augments were aborted or closed. This would make it simpler for applications to recover from an abort.