[_CD8_]<cedardocs>database>C+F4.memo!1

Heading:

Cypress database concepts and facilities, October 1982

Page Numbers: Yes X: 530 Y: 10.5" First Page: 31

5. A sample program

As a simple example, we will consider the following concocted program, that defines a small schema and database of persons with their names, friends, and phone numbers. It also defines two other domains, "frogs" and "things", and does a number of queries on the database. It illustrates the use of most of the procedures defined in this section.

VLTest1Impl: PROGRAM
IMPORTS DBView, IOStream =

BEGIN OPEN IOStream, DBView;

tty: IOStream.Handle← CreateTTYStreams["VLTest1Impl.log"].out;

Person, Frog, Thing: Domain;
Friend, Phone: Relation;
ageProp, heightProp: Attribute;
friendOf, friendIs: --Friend-- Attribute;
wsh, harry, mark, nori: --Person-- Entity;
george: --Frog-- Entity;
rock, pebble: --Thing-- Entity;
myName: ROPE ← "Rick Cattell";
myExt: REF INT← NEW[INT ← 4466];

The following routine defines the domains and relations and through calls to the database system. Since the version parameter to DeclareDomain and DeclareRelation, and DeclareAttribute defaults to NewOnly, an ERROR would be raised if this program were executed a second time.

Initialize: PROC =
BEGIN
tty.Put[rope["Defining data dictionary..."], char[CR]];
-- Declare domains and make Persons and Frogs be subtypes of Things:
Person← DeclareDomain["Person"];
Thing← DeclareDomain["Thing"];
Frog← DeclareDomain["Frog"];
DeclareSubType[of: Thing, is: Person];
DeclareSubType[of: Thing, is: Frog];
-- Declare phone age and height property of Things
ageProp← DeclareProperty["Age", Thing, IntType];
heightProp← DeclareProperty["Height", Thing, StringType];
-- Declare Friend relation between Persons
Friend← DeclareRelation["Friend"];
friendOf← DeclareAttribute[Friend, "of", Person];
friendIs← DeclareAttribute[Friend, "is", Person];
-- Declare Phone relation between Persons and integers
Phone← DeclareRelation["Phone"];
phoneOf← DeclareAttribute[Phone, "of", Person];
phoneIs← DeclareAttribute[Phone, "is", IntType];
END;

The following routines creates various relationships and entities. Several methods are used to assign the attributes to illustrate the alternatives.

InsertData: PROC =
BEGIN t: Relship; rick: Entity;
tty.Put[rope["Creating data..."], char[CR]];
harry← DeclareEntity[Person, "Harry Q. Bovik"];
mark← DeclareEntity[Person, "Mark Brown"];
rick← DeclareEntity[Person, myName];
nori← DeclareEntity[Person]; SetName[nori, "Nori Suzuki"];
-- Data can be assigned with SetP, SetF, or DeclareRelship’s initialization list:
SetP[harry, phoneIs, I2V[4999]];
SetP[mark, phoneIs, I2V[4464]];
SetP[nori, phoneIs, I2V[4425]];
SetP[rick, phoneIs, I2V[4466]]
SetP[rick, ageProp, I2V[29]]
t← DeclareRelship[Friend]; SetF[t, friendOf, rick]; SetF[t, friendIs, harry];
[]← DeclareRelship[Friend, LIST[[friendOf, nori], [friendIs, harry]]];
END;

InsertMoreData: PROCEDURE =
BEGIN t: Relship; rick: Entity; ok: BOOL← FALSE;
tty.Put[rope["Creating more data..."], char[CR]];
-- Create some new entities and fetch old one (rick)
wsh← DeclareEntity[Person];
rick← DeclareEntity[Person, "Rick Cattell", OldOnly];
rock← DeclareEntity[Thing, "Harry the rock"];
pebble← DeclareEntity[Thing, "Fred the pebble"];
george← DeclareEntity[Frog, "George the frog"];
[]← DeclareEntity[Frog, "Larry the frog"];
-- Use SetP to assign names and heights
SetName[wsh, "Willie-Sue H"];
SetP[wsh, heightProp, S2V["medium"]];
SetP[rick, heightProp, S2V["medium"]];
SetP[rock, heightProp, S2V["big"]];
SetP[pebble, heightProp, S2V["little"]];
SetP[george, heightProp, S2V["little"]];
-- Check that Person can’t be friend of frog
t← DeclareRelship[Friend];
SetF[t, fOf, rick];
SetF[t, fIs, george ! WrongAttributeValueType => ok ← TRUE];
IF NOT ok THEN ERROR;
END;

The following routine illustrates the destruction of entities and domains.

DestroySomeData: PROCEDURE =
-- Destroy one person entity and all frog entities
BEGIN flag: BOOL← FALSE;
tty.Put[char[CR], rope["Deleting some data: Rick and all the frogs..."], char[CR]];
DestroyEntity[DeclareEntity[Person, "Rick Cattell", OldOnly]];
DestroyDomain[Frog];
END;

The following routine prints all of the people in the Person domain, by doing a DomainSubset retrieval with no constraints.

PrintPeople: PROC =
-- Use DomainSubset with no constraints to enumerate all Persons
BEGIN
person: -- PersonDomain -- Entity;
es: EntitySet;
lint: LONG INTEGER;
tty.Put[char[CR], rope["PersonDomain:"], char[CR], char[CR]];
tty.Put[rope["NamePhone"], char[CR]];
es← DomainSubset[Person];
WHILE (person← NextEntity[es])#NIL DO
tty.Put[rope[V2S[GetP[person, NameProp]]], char[TAB]];
lint← V2I[GetP[person, phoneProp]]↑;
tty.Put[int[lint], char[CR]];
ENDLOOP;
ReleaseEntitySet[es];
END;

The following routine prints all elements of the Thing domain using DomainSubset, which automatically enumerates sub-domains as well.

PrintEverything: PROCEDURE =
-- Use DomainSubset to enumerate all Things (includes Frogs and Persons).
-- Print "unknown" if height field is null string.
BEGIN
thing: -- Thing -- Entity;
es: EntitySet;
r: ROPE;
tty.Put[char[CR], rope["Things:"], char[CR], char[CR]];
tty.Put[rope["TypeNameHeight"], char[CR]];
es← DomainSubset[Thing];
WHILE (thing← NextEntity[es])#NIL DO
tty.Put[rope[GetName[DomainOf[thing]]], char[TAB]];
tty.Put[rope[V2S[GetP[thing, NameProp]]], char[TAB]];
r← V2S[GetP[thing, heightProp]];
tty.Put[rope[IF r.Length[]=0 THEN "[unknown]" ELSE r], char[CR]];
ENDLOOP;
ReleaseEntitySet[es];
END;

The following routine prints the people in the Person domain with a phone number in the range 4400 to 4499.

PrintCSLPhones: PROC =
-- Use RelationSubset to enumerate phones between 4400 and 4500
BEGIN
rs: RelshipSet; r: Relship;
tty.PutF["\nPhone numbers between 4400 and 4499:\n"];
tty.PutF["NamePhone\n"];
rs← RelationSubset[Phone, LIST[[phoneIs, I2V[4400], I2V[4499]]]];
UNTIL (r← NextRelship[rs])=NIL DO
tty.PutF["%g%g\n", rope[GetFS[r, phoneOf]], rope[GetFS[r, phoneIs]] ];
ENDLOOP;
ReleaseRelshipSet[rs];
END;

The following routine prints all of the tuples in the Person domain that are related through the Friend relation to the entity in the Person domain named harry.

PrintFriends: PROC =
-- Use RelationSubset to enumerate friends of Harry
BEGIN
friendRS: --Friend-- Relship;
friendName: --Person-- ROPE;
rs: RelshipSet;
tty.Put[char[CR], rope["Harry’s friends:"], char[CR]];
rs← RelationSubset[Friend, LIST[[friendIs, harry]]];
WHILE (friendRS← NextRelship[rs])#NIL DO
friendName← GetFS[friendRS, friendOf];
tty.Put[rope[friendName], rope[""], char[CR]];
ENDLOOP;
ReleaseRelshipSet[rs];
END;

The main program opens the default database, calls the above routines, and calls CloseDatabase to cause the updates to be made.

tty.Put[rope["Creating database..."], char[CR]];
[]← OpenDatabase[];
[]← OpenTransaction[];
[]← OpenSegment[];
Initialize[];
InsertData[];
PrintPeople[];
PrintCSLPhones[];
PrintFriends[];
PrintEverything[];
InsertMoreData[];
DestroySomeData[];
PrintEverything[];
CloseDatabase[];
tty.Close[];

END.

6. Database design

This section is a discussion of database design: the process of representing abstractions of real-world information structures in a database. This is an important, poorly understod topic, just as the design of programs is; don’t expect any startling revelations in these pages. To some extent the discussion here is specialized to the data structures available in the Cedar database package.

What are the properties of a well-designed database? To a large extent these properties follow from the general properties of databases, as discussed in Section 2. For instance, we would like our databases to extend gracefully as new types of information are added, since the existing data and programs are likely to be quite valuable.

It may be useful to consider the following point. The distinguishing aspect of information stored in a database system is that at least some of it is stored in a form that can be interpreted by the system itself, rather than only by some application-specific program. Hence one important dimension of variation among different database designs is in the amount of the database that is system-interpretable, i.e. the kinds of queries that can be answered by the system.

As an example of variation in this dimension, consider the problem of designing a database for organizing a collection of Mesa modules. In the present Mesa environment, this database would need to include at least the names of all the definitions modules, program modules, configuration descriptions, and current .Bcd files. A database containing only this information is little more than a file directory, and therefore the system’s power to answer queries about information in this database is very limited. A somewhat richer database might represent the DIRECTORY and IMPORTS sections of each module as collections of tuples, so that queries such as "which modules import interface Y?" can be answered by the system. This might be elaborated further to deal with the use of individual types and procedures from a definitions module, and so on. There may be a limit beyond which it is useless to represent smaller objects in the database; if we aren’t interested in answering queries like "what procedures in this module contain IF statements?", it may be attractive to represent the body of a procedure (or some smaller naming scope) as a Mesa data structure that is not interpretable by the database system, even though it is stored in a database.

We shall illustrate a few more database design ideas with the following problem: design a database of information about a collection of documents. Our current facilities, which again are simply file directories, leave much to be desired. The title of a document on the printed page does not tell the reader where the document is stored or how to print a copy. Relationships between different versions of the same basic document are not explicit. Retrievals by content are impossible. Our goal here is not to solve all of these problems, but to start a design that has the potential of dealing with some of them.

Each document has a title and a set of authors. Hence we might represent a collection of documents with a domain whose title is the name of the document, and an author property specifying the authors:

Document: Domain = DeclareDomain["Domain"];
dAuthors: Property = DeclareProperty["author", Document, StringType];

Here the authors’ names are concatenated into a single string, using some punctuation scheme to allow the string to be decoded into the list of authors. This is a very poor database design because it does not allow the system to respond easily to queries involving authors; the system cannot parse the encoded author list.

Another property of this declaration of authors may be viewed as an advantage or a drawback: authors are strings, so anything is acceptable as an author. This weak typing gives some flexibility: the database will never complain that it doesn’t know the author you just attached to a certain document. The corresponding drawback, as you might expect, is that the system is not helpful in catching errors when a new document is added to the database. If "Mark R. Brown" is mistakenly spelled "Mark R. Browne", then one of Mark’s papers will not be properly retrieved by a later search. A step in the direction of stronger type checking is to provide a separate domain for authors.

To represent authors as entities, and allow a variable number number of authors for document, a better design would be:

Person: Domain = DeclareDomain["Person"];
author: Property = DeclareProperty["author", Document, Person];

Incidentally, we define a property rather than relation for brevity here. Instead of the author property declaration we could have written:

author: Relation = DeclareRelation["author"];
authorOf: Attribute = DeclareAttribute["of", author, Document];
authorIs: Attribute = DeclareAttribute["is", author, Person];

The property declaration has exactly the same effect on the database as the relation declaration, since it automatically declares an author relation with an "of" and "is" attribute. However, the relation is not available in a Cedar Mesa variable in the property case, so operations such as RelationSubset cannot be used (thus most applications will not use DeclareProperty; but will use operations such as SetP).

In either case, we have one Document entity per document, plus, for each document, one author relationship per author of that document. Each author relationship points to its Document tuple via the entity-valued attribute of the author relation. Now a search of the author relation can be used to locate all documents authored by a given person.

Documents have other interesting properties. Some of these, for example the date on which the document was produced, are in one-to-one correspondence with documents. Such properties can be defined by specifying a relation or property as being keyed on the document:

publDate: DeclareProperty["publ-date", Document, StringType, Key];

We are using the convention that domain names are capitalized and relation, attribute, and property names are not capitalized, both for the Cedar Mesa variable names and in the names used in the database system itself. When the database system is integrated with Mesa, the Mesa and database names will be one and the same. Other conventions are desirable, for example preceeding all relation names with the name of the database application in which they are used to avoid confusion when databases overlap.

A few entities and relationships from a database of documents, defined according to these two tuplesets, are shown graphically in Figure 4.

<==<cffig4.press<

We might wish to include additional information for particular kinds of documents, for example conference papers. Conference papers may participate in the same relations as other documents. For example, they have authors. In addition, we may want to define relations in which only conference papers may participate, for example a presentation relation which defines who presented the paper, and where. We can define a conference paper to be a sub-domain of documents, and define relations which pertain specifically to conference papers:

ConferencePaper: Domain = DeclareDomain["ConferencePaper"];
... DeclareSubType[of: Document, is: ConferencePaper]; ...
Conference: Domain = DeclareDomain["Conference"];
presentation: Relation = DeclareRelation["presentation"];
presentationOf: Attribute = DeclareAttribute["of", presentation, ConferencePaper];
presentationAt: Attribute = DeclareAttribute["at", presentation, Conference];
presentationBy: Attribute = DeclareAttribute["by", presentation, Person];

Figure 5 illustrates a fragment of a database using this extended design.

The reader will note that we have defined our database schema in the functionally irreducible form described in Section 2: i.e., the relations have as few attributes as possible so as to represent atomic facts. This normalization is not necessary in the design of a schema, but often makes the database easier to understand and use, and simplifies the development of applications (which need not be concerned with anomalies in updates to data that is a result of redundantly storing the same information).

Note that the presentation relation is an example of a functionally irreducible relation that is not binary. It cannot be decomposed into smaller relations without losing information or introducing artificial entities to represent the presentations themselves.

What other information should be present in our document database? Subject keywords would certainly be useful. Since one document will generally have many associated keywords, we would introduce another relation, say docKeyword, to represent the new information. Should keywords be entities? Again there is a tradeoff, but the argument for entities seems persuasive: limiting the range of keywords increases the value of the database for retrieval. The keyword entities could also participate in relationships with a dictionary of synonyms, Computing Reviews categories, etc.

This is certainly not a complete design, and the reader is encouraged to fit his or her own ideas into the database framework.

7. References

Brown, M., Cattell, R., Suzuki, N. The Cedar Database Management System: A Preliminary Report, Proceedings ACM SIGMOD Conference 1981, Ann Arbor, 1981.

Brown, M. "How to Use Walnut", Internal memorandum, 1982.

Cattell, R. G. G. "An entity-based database user interface", Proceedings ACM SIGMOD Conference 1981, Santa Monica, 1980.

Cattell, R. G. G. "Design of an Relationship-Entity-Datum data model", to appear as CSL report, 1982.

Date, C. J. An Introduction to Database Systems, Addison-Wesley, 1977.

Donahue, J. ?. documentation of Squirrel and how to build database applications, promised to write October 1982.

Hoare, C. A. R. "Data reliability", Proc. International Conference on Reliable Software, Los Angeles, April 1975, p 528-533.

Israel, J. E., Mitchell, J. G., Sturgis, H. E. "Separating data from function in a distributed file system", CSL-78-5, September 1978.