The main decision to be made was the schema to be used in the data base. The information that we need to store for each person is basically the following:
- Last Name,
- First Name and other initials,
- RName,
- all associated phone numbers, of different or similar functions.
One early decision was to represent people as entities (that sounds like a reasonable one), and thus we need a unique identifier for each person. An obvious candidate is the person's RName, which by definition is unique. At first I did that, but soon discovered that RNames alone were not good enough: with people like Greg Nelson, if you didn't know him as GNelson it would be hopeless to find him. Also, we would like to store people who don't work at Xerox at all (me, for example, soon), and without them the private data base especially would be greatly lacking. Why not the last name? One quick look at any phone list will dismiss any illusions about the uniqueness of last names (example: Brown; here at Parc alone we have Allen, Darrah, John, Kerry, Mark and Ron). What we came up with is the format lastname,first<rName>, which guarantees uniqueness and at the same time gives access to people by their last names, if possible.
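As a rough illustration of that naming format, here is a minimal Python sketch. The function names person_key and keys_for_last_name are hypothetical and not part of the actual implementation; leaving the angle brackets empty for someone with no RName is also only an assumption, since the text does not say how that case is written.

```python
def person_key(last: str, first: str, rname: str = "") -> str:
    """Build an entity name in the lastname,first<rName> format.

    Assumption: a person with no RName gets empty angle brackets.
    """
    return f"{last},{first}<{rname}>"


def keys_for_last_name(keys: list[str], last: str) -> list[str]:
    """Return every key whose last-name part matches, ignoring case."""
    prefix = last.lower() + ","
    return [k for k in keys if k.lower().startswith(prefix)]


keys = [
    person_key("Nelson", "Greg", "GNelson"),
    person_key("Brown", "Mark"),
    person_key("Brown", "Allen"),
]
print(keys_for_last_name(keys, "brown"))
```

Because the last name comes first in the key, a lookup by last name is just a prefix match, which is what lets you find Greg Nelson without knowing that his RName is GNelson.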
Another interesting point, brought up by Mark Brown, was the use of Soundex codes as a way of detecting misspelt names. Basically each name is associated with a code that represents it, and different names with similar-sounding pronunciations are mapped to the same code, so ElAbbadi and ElApetty have the same Soundex code (see Knuth Vol. 3). This adds three more pieces of information to the data base per person (Soundex codes for the last name, the first name and the RName).
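To make the idea concrete, here is a small Python sketch of the standard Soundex rules (keep the first letter, code the remaining consonants as digits, collapse adjacent equal codes, pad or cut to four characters). It is an illustration only, not the code used in the system, and the names CODES and soundex are chosen here for the example.

```python
# Letter groups for codes 1..6; vowels, H, W and Y carry no code.
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return the four-character Soundex code of name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts
    return (result + "000")[:4]


print(soundex("ElAbbadi"), soundex("ElApetty"))  # E413 E413
```

A misspelt query such as ElApetty can then be matched against the stored code for ElAbbadi, instead of being compared letter by letter.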
Now for the data base design. One can always design one gigantic universal relation that encompasses all the required information. This has the disadvantage that anomalies occur fairly easily, and such a relation is difficult to manipulate. We chose the following design, which keeps logically dependent information together in separate relations:
A domain of persons, with "lastname,first<rName>" as the unique name of each entity.
A phone relation [personname: Person,
                  phone: ROPE,
                  phonekind: ROPE],
A name relation [personname: Person,
                 lastname: ROPE,
                 lastnameSoundex: ROPE,
                 firstname: ROPE,
                 firstnameSoundex: ROPE],
and an rName relation [personname: Person,
                       rName: ROPE,
                       rNameSoundex: ROPE]
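For readers who do not know the Cypress notation, the three relations can be mirrored roughly in Python as below. This is only a sketch: the class names PhoneRow, NameRow and RNameRow, the helper phones_of, the phone numbers and the phonekind values "office" and "home" are all made up for the example, and ROPE is simply treated as a string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhoneRow:
    personname: str  # entity name, e.g. "Nelson,Greg<GNelson>"
    phone: str
    phonekind: str


@dataclass(frozen=True)
class NameRow:
    personname: str
    lastname: str
    lastnameSoundex: str
    firstname: str
    firstnameSoundex: str


@dataclass(frozen=True)
class RNameRow:
    personname: str
    rName: str
    rNameSoundex: str


def phones_of(phone_rel: list[PhoneRow], personname: str,
              phonekind: Optional[str] = None) -> list[str]:
    """All phone numbers for one person, optionally limited to one kind."""
    return [r.phone for r in phone_rel
            if r.personname == personname
            and (phonekind is None or r.phonekind == phonekind)]


key = "Nelson,Greg<GNelson>"
name_rel = [NameRow(key, "Nelson", "N425", "Greg", "G620")]
rname_rel = [RNameRow(key, "GNelson", "G542")]
phone_rel = [PhoneRow(key, "555-0100", "office"),   # placeholder numbers
             PhoneRow(key, "555-0101", "home")]
print(phones_of(phone_rel, key))
```

Since a person may have any number of phones, each phone is its own row in the phone relation, while the name and rName relations hold one row per person.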
The next point was to develop the two-layered data bases, a private one and a public one, that have the same structure (same domains and relations), but with one on the private disk and the other on a public database server. Unfortunately Cypress at the moment doesn't have the facilities to switch between segments (although that is included in its basic design, it hasn't been implemented yet). So we had to develop a layer on top of Cypress to deal with this.
Two points have to be dealt with here:
1- Who has write access to a given database? The option we took was that only one person has the right to write to each database, i.e. a master in charge of the public data base, and each person for his own private data base. This means that a given client can write to one database only. This has the undesirable consequence that one person must maintain the public data base, but this is needed to preserve the integrity of this database and of the information in it.
2- Who has read access? In this case we want the information to be easily obtainable by anybody, without any effort spent switching between the private and the public databases: if you want somebody's phone number, you get all that is available in both the public and private data bases.
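The two rules above can be sketched in Python as follows. This is not the layer that was built on Cypress, only an in-memory illustration of the policy; the classes PhoneStore and LayeredPhoneBook, the user names "master" and "alice", and the phone numbers are all hypothetical.

```python
class PhoneStore:
    """One database (public or private) with a single permitted writer."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self._phones = {}  # personname -> list of (phone, phonekind)

    def add(self, user: str, person: str, phone: str, kind: str) -> None:
        if user != self.owner:
            raise PermissionError(f"{user} may not write to {self.owner}'s database")
        self._phones.setdefault(person, []).append((phone, kind))

    def lookup(self, person: str) -> list:
        return list(self._phones.get(person, []))


class LayeredPhoneBook:
    """What one client sees: reads merge both layers, writes go to one."""

    def __init__(self, user: str, private: PhoneStore, public: PhoneStore) -> None:
        self.user, self.private, self.public = user, private, public

    def lookup(self, person: str) -> list:
        merged = []
        for entry in self.private.lookup(person) + self.public.lookup(person):
            if entry not in merged:  # drop entries present in both layers
                merged.append(entry)
        return merged

    def add(self, person: str, phone: str, kind: str) -> None:
        # The master writes to the public database, everyone else to their own.
        target = self.public if self.user == self.public.owner else self.private
        target.add(self.user, person, phone, kind)


public, private = PhoneStore("master"), PhoneStore("alice")
book = LayeredPhoneBook("alice", private, public)
public.add("master", "Nelson,Greg<GNelson>", "555-0100", "office")
book.add("Nelson,Greg<GNelson>", "555-0101", "home")
print(book.lookup("Nelson,Greg<GNelson>"))
```

The client never names a database: a lookup returns whatever both layers hold, and an update lands in the single database that client is allowed to write.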