--CSL-Whitepages
-- Last Edited by: Elabbadi, August 4, 1983 10:29 am
          CSL Notebook Topic
To Recipients  Date August 1, 1983 3:13 pm
From Amr El Abbadi  Location PARC
Subject A Phone Directory  Organization CSL
Release as TargetFileName
Came from
 WorkingFileName (this form is on [Indigo]<CedarDocs>Style>Memo.form)
Last edited by Amr El Abbadi, August 1, 1983 3:15 pm
Abstract CSL now has a public phone database on Alpine; using it with Finch and Squirrel, you can make a phone call with two mouse clicks. This memo describes my summer project at CSL to create this database, together with the tools needed to handle both private and public databases.

Definition Of Project

To design and develop tools to build and access a data base of phone numbers, using Cypress schemas and design tools.

We built a public PARC-wide whitepages directory on Alpine from the most accurate information that we had, the WhitePagesCNF.pa file. But that is not enough for a useful phone database: we also need a private one, where each individual may store his/her own private information but still access the entire database as one entity. We therefore built a layer on top of Cypress that allows easy access to both a private and the public database.

The database tools include facilities to handle spelling mistakes, using the Soundex Code.

Finally, the database is to be integrated with Finch and the Voice Project, so that phone calls can be placed through it.


Design
The main decision to be made was the schema to be used in the database. The information that we need to store for each person is basically the following:

- Last Name,
- First Name and other initials,
- RName,
- all associated phone numbers, of different or similar functions.

One early decision was to represent people as entities (that sounds like a reasonable one), and thus we need a unique identifier for each person. An obvious candidate is the person's RName, which by definition is unique. At first I did that, but soon discovered that RNames were not good enough: take Greg Nelson, if you didn't know him as GNelson it would be hopeless to find him. We would also like to store people who don't work at Xerox at all (me, for example, soon); without them the private database especially would be greatly lacking. Why not the last name? One quick look at any phone list will dismiss any illusions about the uniqueness of last names (example: Brown; here at PARC alone we have Allen, Darrah, John, Kerry, Mark and Ron). What we came up with is the format lastname,first<rName>, which guarantees uniqueness and at the same time gives access to people by their last names where possible.
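The identifier format can be sketched as follows. This is only an illustration in Python, not the Cedar code of the project: the helper names (make_person_key, parse_person_key, keys_with_last_name) are hypothetical, and only the Greg Nelson / GNelson pairing is taken from the text above.

```python
import re

# Hypothetical helpers illustrating the "lastname,first<rName>" identifier.
# The pattern assumes names contain no ',', '<' or '>' characters.
_KEY_PATTERN = re.compile(
    r"^(?P<last>[^,<>]+),(?P<first>[^<>]*)<(?P<rname>[^<>]*)>$"
)

def make_person_key(last, first, rname):
    """Build the unique person identifier from its three parts."""
    return f"{last},{first}<{rname}>"

def parse_person_key(key):
    """Split an identifier back into (last, first, rname)."""
    match = _KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a lastname,first<rName> key: {key!r}")
    return match.group("last"), match.group("first"), match.group("rname")

def keys_with_last_name(keys, last):
    """Because the key starts with the last name, a prefix match finds people."""
    prefix = last + ","
    return sorted(k for k in keys if k.startswith(prefix))

key = make_person_key("Nelson", "Greg", "GNelson")
print(key)                                  # Nelson,Greg<GNelson>
print(parse_person_key(key))                # ('Nelson', 'Greg', 'GNelson')
print(keys_with_last_name([key], "Nelson")) # ['Nelson,Greg<GNelson>']
```

Putting the last name first is what makes the prefix search work, so someone who knows only "Nelson" can still reach the entry without knowing the RName.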

Another interesting point, brought up by Mark Brown, was the use of Soundex codes as a way of detecting misspelt names. Basically each name is associated with a code that represents it; different names with similar-sounding pronunciations are mapped to the same code, so ElAbbadi and ElApetty have the same Soundex code (see Knuth Vol. 3). This adds three more pieces of information to the database per person (Soundex codes for the last name, first name and RName).
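To make the idea concrete, here is a minimal sketch of Soundex in Python. It follows the commonly used variant (first letter kept, three digits, h and w not separating equal codes); the procedure actually used in the project may differ in such details, and the function name soundex is just an illustrative choice.

```python
# Letter groups of the common Soundex variant; vowels, h, w and y get no code.
_CODES = {}
for _digit, _letters in enumerate(
    ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1
):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)

def soundex(name):
    """Return the four-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")   # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue               # h and w do not break a run of equal codes
        code = _CODES.get(ch, "")  # vowels and y map to "", which resets prev
        if code and code != prev:
            digits.append(code)
        prev = code
    return (first.upper() + "".join(digits) + "000")[:4]

print(soundex("ElAbbadi"))  # E413
print(soundex("ElApetty"))  # E413
```

Both spellings reduce to E413, so a lookup keyed on the Soundex code finds the entry even when the name is misspelt.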

Now for the database design. One can always design one gigantic universal relation that encompasses all the required information. This has the disadvantage that anomalies occur fairly easily, and such a relation is difficult to manipulate. We chose the following design, which splits the information into several relations while keeping logically dependent information together:


A domain of persons, with "last,first<rName>" as the unique name of each person.
A phone relation [personname: Person,
  phone: ROPE,
  phonekind: ROPE],
A name relation [personname: Person,
  lastname: ROPE,
  lastnameSoundex: ROPE,
  firstname: ROPE,
  firstnameSoundex: ROPE],
and an rName relation [personname: Person,
  rName: ROPE,
  rNameSoundex: ROPE]
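As an illustration of how one person is spread over these relations, here is a small Python sketch. It is not Cypress code: ROPE fields are modelled as plain strings, the phone numbers and phone kinds are made up, and the Soundex values assume the common Soundex variant.

```python
from typing import NamedTuple

# One tuple type per relation; "personname" holds the unique person identifier.
class Phone(NamedTuple):
    personname: str
    phone: str
    phonekind: str

class Name(NamedTuple):
    personname: str
    lastname: str
    lastname_soundex: str
    firstname: str
    firstname_soundex: str

class RName(NamedTuple):
    personname: str
    rname: str
    rname_soundex: str

person = "Nelson,Greg<GNelson>"

# Name and RName information is stored once per person...
names = [Name(person, "Nelson", "N425", "Greg", "G620")]
rnames = [RName(person, "GNelson", "G542")]
# ...while the phone relation holds one tuple per number (values are made up).
phones = [
    Phone(person, "x0000", "office"),
    Phone(person, "555-0100", "home"),
]

def phones_of(personname):
    """All (kind, number) pairs recorded for a person."""
    return [(p.phonekind, p.phone) for p in phones if p.personname == personname]

print(phones_of(person))  # [('office', 'x0000'), ('home', '555-0100')]
```

Adding or deleting a phone number touches only the phone relation; the name information is never duplicated, which is the kind of anomaly a single universal relation would invite.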


The next point was to develop the two-layered databases: a private one and a public one that have similar structure (the same domains and relations), but with one on the private disk and the other on a public database server. Unfortunately Cypress at the moment doesn't have the facilities to switch between segments (although that is included in its basic design, it hasn't been implemented yet). So we had to develop a layer on top of Cypress to deal with this.

Two points have to be dealt with here:

1- Who has write access to a given database? The option we took was that only one person has that right, i.e. a master in charge of the public database, and each person in his/her own private database. This means that a given client can write in one database only. It also has the undesirable consequence that one person must maintain the public database, but this is needed to maintain the integrity of that database and of the information in it.

2- Who has read access? In this case we want the information to be easily obtainable by anybody, without any effort spent switching between the private and the public databases: if you want somebody's phone number, you get all that is available in both the public and private databases.
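The two rules above can be sketched in a few lines of Python. This is not the Cypress layer itself, only an illustration under simplifying assumptions: each database is a plain dictionary from person identifier to a list of (phone, kind) pairs, and the class name LayeredPhoneBook and the sample numbers are hypothetical.

```python
class LayeredPhoneBook:
    """Illustrative two-layer view: writes go to one database, reads see both."""

    def __init__(self, public, private, is_public_master=False):
        self.public = public
        self.private = private
        # A given client can write in one database only: the master writes
        # the public database, everybody else writes their private one.
        self._writable = public if is_public_master else private

    def add_phone(self, person, phone, kind):
        self._writable.setdefault(person, []).append((phone, kind))

    def get_phones(self, person):
        # Reads never require switching: private entries first, then public.
        return self.private.get(person, []) + self.public.get(person, [])

public_db = {"Nelson,Greg<GNelson>": [("x0000", "office")]}   # made-up number
my_private_db = {}

book = LayeredPhoneBook(public_db, my_private_db)
book.add_phone("Nelson,Greg<GNelson>", "555-0100", "home")    # made-up number

print(book.get_phones("Nelson,Greg<GNelson>"))
# [('555-0100', 'home'), ('x0000', 'office')]
print(public_db)  # unchanged: {'Nelson,Greg<GNelson>': [('x0000', 'office')]}
```

An ordinary client's additions land only in the private database, yet a single lookup returns everything known about the person from both.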

Implementation Issues

There are two main parts of relevance to the phone config programs:

1- A layer on top of Cypress, developed by Rick Cattell and myself, which provides all the necessary mechanism for working across the two databases: it has new versions of relation and entity declaration and of subsets, and a GetF to retrieve specific fields from entries in both databases.

2- The main phone registry tools, which may be summarized as follows:

- A procedure to initialize and build a phone database.

- Procedures to register a person in a database, with a given name/rname/phone/phonekind.

- Procedures to retrieve the name/phone of a person in both databases.

- A procedure to parse a file with a specified format (WhitePagesCNF.pa format) and add the persons in it to the database.

- A procedure to form a list of individuals from the database, with their names and phone numbers.




Pitfalls

I spent some very interesting time finding out about monitors, monitored records, and the associated concurrency issues in Cedar, and as a matter of fact implemented procedures that use Cypress to perform searches in parallel in both databases. Soon I found out from Rick that this was all an illusion and that Cypress would serialize my searches at any rate, and thus searches might take even longer than if I had no "concurrency"!! This was a surprise, and it really would be wonderful if Cypress were able to get around this problem.
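The effect can be imitated with a toy model in Python. How Cypress actually serializes requests is not described here, so the single monitor lock below is purely an assumption made for illustration, and SerializedEngine and parallel_lookup are hypothetical names.

```python
import threading

class SerializedEngine:
    """Toy stand-in for a database engine guarded by one monitor lock
    (an assumption; the real mechanism inside Cypress may differ)."""

    def __init__(self):
        self._monitor = threading.Lock()
        self.trace = []   # records when each search enters and leaves the engine

    def search(self, db_name, table, key):
        with self._monitor:               # only one search runs at a time
            self.trace.append(("enter", db_name))
            result = table.get(key, [])
            self.trace.append(("exit", db_name))
        return result

def parallel_lookup(engine, databases, key):
    """Start one thread per database, hoping the searches overlap."""
    results = {}

    def worker(name, table):
        results[name] = engine.search(name, table, key)

    threads = [
        threading.Thread(target=worker, args=(name, table))
        for name, table in databases.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

engine = SerializedEngine()
dbs = {
    "private": {"Nelson,Greg<GNelson>": ["555-0100"]},   # made-up numbers
    "public": {"Nelson,Greg<GNelson>": ["x0000"]},
}
print(parallel_lookup(engine, dbs, "Nelson,Greg<GNelson>"))
print(engine.trace)   # every "enter" is immediately followed by its own "exit"
```

Whichever thread gets in first, the trace never shows two searches inside the engine at once, so the threads add scheduling overhead without buying any overlap.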



What Is Missing?

The main missing feature is the integration of whitepages with Finch and the Voice Project; this will provide all the facilities to make phone calls using the database. Dan Swinehart and the Voice Project team are getting this ready for use soon. For now we have Squirrel, which can easily handle one database at a time and helps us browse through it and also make phone calls with two clicks of a mouse button (a feature developed just recently).

One other feature that I would like to see in a phone directory like this is tools to maintain sets of phone lists a la Walnut, e.g. a CedarImplementors' list, a CSL list, a friends list, etc. This is a useful feature and poses some interesting questions, like: is there a relation between public phone lists and private ones, and if there is, what is it? This question generalizes to the relation between the private and public databases. As it stands there are no consistency constraints between them.

Conclusion
This was a wonderful chance to get into the exciting world of Cedar and the Dorado. I also had a great time using Cypress and falling into its bugs (I believe it is now very bug proof?! Keep your fingers crossed) and then jumping out of them on Squirrel! I enjoyed interacting with Alpine as a public database server, which gave me my first chance to get a direct feeling for the uses of such a server and for the situations that arise as a result, like transaction aborts or read-write option alternatives.

My interfaces are ready for use by Finch and the Voice Project. The public database is on Alpine and is available for general use.