CYPRESS DOCUMENTATION

4. Application Example

This section provides a simple example of the use of Cypress. Section 4.1 introduces the example, a database of documents. Section 4.2 discusses database design: the process of representing abstractions of real-world information structures in a database, somewhat specialized to the data structures available in Cypress. Section 4.3 presents a working program.

Our example is necessarily short; don’t expect any startling revelations on these pages. We will try to consider some of the most common cases, however.

4.1 A database application

What are the properties of a well-designed database? To a large extent these properties follow from the general properties of databases. For instance, we would like our databases to extend gracefully as new types of information are added, since the existing data and programs are likely to be quite valuable.

It may be useful to consider the following point. The distinguishing aspect of information stored in a database system is that at least some of it is stored in a form that can be interpreted by the system itself, rather than only by some application-specific program. Hence, one important dimension of variation among different database designs is in the amount of the database that is system-interpretable, i.e. the kinds of queries that can be answered by the system.

As an example of variation in this dimension, consider the problem of designing a database for organizing a collection of Mesa modules. In the present Mesa environment, this database would need to include at least the names of all the definitions modules, program modules, configuration descriptions, and current .Bcd files. A database containing only this information is little more than a file directory, and therefore the system’s power to answer queries about information in this database is very limited. A somewhat richer database might represent the DIRECTORY and IMPORTS sections of each module as relationships, so that queries such as "which modules import interface Y?" can be answered by the system. This might be elaborated further to deal with the use of individual types and procedures from a definitions module, and so on. There may be a limit beyond which it is useless to represent smaller objects in the database; if we aren’t interested in answering queries like "what procedures in this module contain IF statements?", it may be attractive to represent the body of a procedure (or some smaller naming scope) as a text string that is not interpretable by the database system, even though it is stored in a database.
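
The difference between the two designs can be sketched in a few lines of Python. This is only an illustrative model, not Cypress code; the module names and the contents of the imports relation are invented for the example.

```python
# Hypothetical data: the module names and (importer, interface) pairs are made up.
modules = ["DocTestImpl", "DBImpl", "IOImpl"]   # directory-style database: names only

imports = {                                     # richer database: an explicit relation
    ("DocTestImpl", "DB"),
    ("DocTestImpl", "IO"),
    ("DBImpl", "IO"),
}

def importers_of(interface):
    """Answer 'which modules import interface Y?' directly from the relation."""
    return sorted(m for (m, i) in imports if i == interface)

print(importers_of("IO"))
```

With only the list of names, the same question could be answered only by an application program that reads and parses each module's text.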

We shall illustrate design ideas with a database of information about documents. Our current facilities, which again are simply file directories, leave much to be desired. The title of a document on the printed page does not tell the reader where the document is stored or how to print a copy. Relationships between different versions of the same basic document are not explicit. Retrievals by content are impossible. Our goal here is not to solve all of these problems, but to start a design that has the potential of dealing with some of them.

Each document in our example database has a title and a set of authors. Hence we might represent a collection of documents with a domain of entities whose name is the title of the document, and an author property specifying the authors:

Document: Domain = DeclareDomain["Document"];
dAuthors: Property = DeclareProperty["author", Document, StringType];

Here the authors’ names are concatenated into a single string, using some punctuation scheme to allow the string to be decoded into the list of authors. This is a very poor database design because it does not allow the system to respond easily to queries involving authors; the system cannot parse the encoded author list.

Note that in the above definition authors are strings, so anything is acceptable as an author. This weak typing has some flexibility: the database will never complain that it doesn’t know the author you just attached to a certain document. However, the system is not helpful in catching errors when a new document is added to the database. If "Mark R. Brown" is mistakenly spelled "Mark R. Browne", then one of Mark’s papers will not be properly retrieved by a later search. A step in the direction of stronger type checking is to provide a separate domain for authors.
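
The tradeoff can be made concrete with a small Python model. It is a sketch only and does not use the Cypress interface; the dictionaries, the function names, and the choice of KeyError are assumptions made for the illustration.

```python
# Weak typing: an author is any string, so a misspelling is stored silently.
docs_by_string = {"An Analysis of Priority Queues": "Mark R. Browne"}

def find_by_author_string(name):
    return [title for title, author in docs_by_string.items() if author == name]

# Stronger typing: an author must already exist in a (hypothetical) Person domain.
persons = {"Mark R. Brown"}
docs_by_entity = {}

def set_author(title, name):
    if name not in persons:
        raise KeyError(f"unknown person: {name}")
    docs_by_entity.setdefault(title, []).append(name)

print(find_by_author_string("Mark R. Brown"))   # the misspelled entry is not found
set_author("An Analysis of Priority Queues", "Mark R. Brown")
```

In the second design the misspelled name is rejected when the document is entered, instead of surfacing later as a failed retrieval.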

To represent authors as entities, and to allow a variable number of authors for a document, a better design would be:

Document: Domain = DeclareDomain["Document"];
Person: Domain = DeclareDomain["Person"];
author: Property = DeclareProperty["author", Document, Person];

Incidentally, in the last line above we define a property rather than a relation, for brevity. Instead of the author property declaration we could have written:

author: Relation = DeclareRelation["author"];
authorOf: Attribute = DeclareAttribute[author, "of", Document];
authorIs: Attribute = DeclareAttribute[author, "is", Person];

The property declaration has exactly the same effect on the database as the relation declaration, since it automatically declares an author relation with "of" and "is" attributes. However, in the property case the relation is not available in a Cedar Mesa variable, so operations such as RelationSubset cannot be used. Therefore most non-trivial applications will not use DeclareProperty, but will still use operations such as SetP by supplying one of the relation’s attributes. For example, if joe is a person entity and book is a document entity, one can write:

SetP[joe, authorOf, book]

or, equivalently,

SetP[book, authorIs, joe]

We now have one Document entity per document, plus, for each document, one author relationship per author of that document. Conversely, we have one Person entity per person, and one author relationship per document the person authored. Of course, these are one and the same author relationships referencing the Person and Document entities. Figure 4-1 illustrates a few such entities and relationships. Each author relationship points to its Document entity via the authorOf attribute, and to its Person entity via the authorIs attribute.
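
The structure just described can be modeled in Python as a list of relationship records, each with an "of" and an "is" attribute. This is a rough sketch rather than Cypress code; the record type and the two lookup functions are hypothetical names, and the sample titles and names are illustrative data.

```python
from collections import namedtuple

# One record per author relationship. "of" and "is_" mirror the two attributes
# in the text ("is" is a Python keyword, hence the trailing underscore).
AuthorRelship = namedtuple("AuthorRelship", ["of", "is_"])

author = [
    AuthorRelship(of="The Cedar DBMS", is_="Rick Cattell"),
    AuthorRelship(of="The Cedar DBMS", is_="Mark Brown"),
    AuthorRelship(of="Cypress DB Concepts & Facilities", is_="Rick Cattell"),
]

def authors_of(document):
    """Follow the relationships from a Document to its Persons."""
    return [r.is_ for r in author if r.of == document]

def documents_by(person):
    """Follow the same relationships from a Person to the Documents."""
    return [r.of for r in author if r.is_ == person]

print(authors_of("The Cedar DBMS"))
print(documents_by("Rick Cattell"))
```

Both lookups read the same records, which is the sense in which the author relationships seen from the Document side and from the Person side are one and the same.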

Frequently a database application requires some representation of sets or lists, for example to represent the people in an organization or steps in a procedure. Sets and lists are not primitives of the data model per se; sets and lists are normally represented as relations. For example, in our database the set of authors of a particular document is stored via a set of author relationships referencing the document. The operation

GetPList[book, authorIs]

could be used to retrieve this list for some particular book. If we wish to maintain an ordering on this set, e.g. so that the authors of each book are kept in some particular order, we need to use some list representation. In our Cypress implementation GetPList (and RelationSubset) return relationships in the order in which they were created by SetP, SetPList, or DeclareRelship, so a client may maintain an ordering through the order of its calls. A variant of SetF and SetP is under consideration that would allow the client to specify where new relationships should be placed in the ordering of relationships referencing a particular entity. Another alternative, the one conventionally used in the relational model, is to define another attribute of the relation to specify position in the ordering:

authorOrder: Attribute = DeclareAttribute[author, "order", IntType];

Using an ordering attribute is usually a better solution than depending on the semantics of the Cypress implementation’s ordering, as it makes the ordering explicit in the relation. In databases where a large number of relationships are expected to refer to the same entity, it is also more space efficient in our implementation.
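
As an illustration of an explicit ordering attribute, the following Python sketch stores the position in each relationship and sorts on it at retrieval time. It is a model only, not the Cypress interface; the dictionary keys and the function name are assumptions.

```python
# Each author relationship carries an explicit "order" value, so the result
# does not depend on the order in which the relationships were created.
author = [
    {"of": "The Cedar DBMS", "is": "Mark Brown", "order": 2},
    {"of": "The Cedar DBMS", "is": "Rick Cattell", "order": 1},
    {"of": "The Cedar DBMS", "is": "Nori Suzuki", "order": 3},
]

def ordered_authors(document):
    rows = [r for r in author if r["of"] == document]
    return [r["is"] for r in sorted(rows, key=lambda r: r["order"])]

print(ordered_authors("The Cedar DBMS"))
```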

If an authorOrder attribute is defined, the client may wish to redefine the authorOf attribute so that links (pointers) are not maintained between the Document entities and author relationships, instead defining a more space-efficient B-tree index on the [authorOf, authorOrder] pair:

authorOf: Attribute = DeclareAttribute[
relation: author, name: "of", type: Document, link: FALSE];

authorIndex: Index = DeclareIndex[author, LIST[[authorOf, authorOrder]]];

The Cypress implementation will use this index to process any call of the form

RelationSubset[author, LIST[[authorOf, x]]]

This call to RelationSubset will therefore enumerate the authors of document x sorted by authorOrder. Cypress will also use the index in processing GetPList calls that enumerate the authors of a document, since GetPList is implemented with RelationSubset.
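
Why an index on the [authorOf, authorOrder] pair yields authors in order can be seen in a small Python model, in which a sorted list of keys stands in for the B-tree. This is only a sketch under that assumption; the function names are hypothetical, and nothing here reflects how Cypress implements its indices internally.

```python
import bisect

# Sorted list of (document, order, person) keys standing in for a B-tree index.
author_index = []

def add_author(document, order, person):
    bisect.insort(author_index, (document, order, person))

def authors_of(document):
    """Scan the contiguous run of keys for one document, in authorOrder."""
    start = bisect.bisect_left(author_index, (document,))
    result = []
    for doc, _order, person in author_index[start:]:
        if doc != document:
            break
        result.append(person)
    return result

# Insertion order is deliberately scrambled; the index restores it.
add_author("The Cedar DBMS", 3, "Nori Suzuki")
add_author("Cypress DB Concepts & Facilities", 1, "Rick Cattell")
add_author("The Cedar DBMS", 1, "Rick Cattell")
add_author("The Cedar DBMS", 2, "Mark Brown")
print(authors_of("The Cedar DBMS"))
```

Because the keys are kept sorted on the pair, all entries for one document are adjacent and already in authorOrder, so no links from the Document entity are needed to find them.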

This solution is also somewhat less than perfect, as it depends upon the fact that the Cypress implementation orders relationships when an index exists; indices are intended only to improve performance, not to change the semantics of the operations. Probably the best solution, if the ordering is important to the semantics of a database application, is to represent a list by a binary "next" relation connecting the entities in an ordering.
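
A list represented by a binary "next" relation might look like the following Python sketch. It is an illustrative model, not Cypress code: the relation is a dictionary from each element to its successor, the step names are invented, and the error handling is an assumption of the sketch.

```python
def in_order(next_rel):
    """Recover the list from a binary 'next' relation (element -> successor)."""
    successors = set(next_rel.values())
    heads = [e for e in next_rel if e not in successors]
    if len(heads) != 1:
        raise ValueError("expected exactly one first element")
    ordered = [heads[0]]
    while ordered[-1] in next_rel:
        nxt = next_rel[ordered[-1]]
        if nxt in ordered:
            raise ValueError("cycle in next relation")
        ordered.append(nxt)
    return ordered

# Hypothetical steps of a procedure, stored in no particular order.
next_step = {
    "insert data": "close transaction",
    "open segment": "open transaction",
    "open transaction": "insert data",
}
print(in_order(next_step))
```

Here the ordering is part of the stored data itself, so it does not depend on creation order or on the presence of an index.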

Documents have other interesting properties. Some of these, for example the date on which the document was produced, are in one-to-one correspondence with documents. Such properties can be defined by specifying a relation or property as being keyed on the document:

publDate: Property = DeclareProperty["publDate", Document, StringType, Key];
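
The effect of keying a property on the document can be modeled in Python as a mapping with at most one value per document. This is a sketch only; in particular, rejecting a second, different value is an assumption made for the illustration and is not a statement about what Cypress does in that case.

```python
publ_date = {}   # document title -> date; at most one entry per document

def set_publ_date(document, date):
    # Assumption of this sketch: a conflicting second value is rejected.
    if document in publ_date and publ_date[document] != date:
        raise ValueError(f"{document} already has a publDate")
    publ_date[document] = date

set_publ_date("Cypress DB Concepts & Facilities", "1982")
print(publ_date)
```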

We are using the convention that domain names are capitalized and relation, attribute, and property names are not capitalized, both for the Cedar Mesa variable names and in the names used in the database system itself. If and when the database system is better integrated with Cedar Mesa, the Cedar and database names will be one and the same.

We might wish to include additional information for particular kinds of documents, for example conference papers. Conference papers may participate in the same relations as other documents. For example, they have authors. In addition, we may want to define relations in which only conference papers may participate, for example a presentation relation which defines who presented the paper, and where. We can define a conference paper to be a sub-domain of documents, and define relations which pertain specifically to conference papers:

ConferencePaper: Domain = DeclareDomain["ConferencePaper"];
... DeclareSubType[of: Document, is: ConferencePaper]; ...
Conference: Domain = DeclareDomain["Conference"];
presentation: Relation = DeclareRelation["presentation"];
presentationOf: Attribute = DeclareAttribute[presentation, "of", ConferencePaper];
presentationAt: Attribute = DeclareAttribute[presentation, "at", Conference];
presentationBy: Attribute = DeclareAttribute[presentation, "by", Person];

Figure 4-2 illustrates a fragment of a database using this extended design.
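
The type checking implied by sub-domains can be sketched in Python as follows. This is an illustrative model rather than the Cypress interface; the tables and function names are hypothetical, and TypeError merely stands in for whatever error the database system would raise.

```python
# Hypothetical subtype table: each domain maps to its direct supertype.
supertype = {"ConferencePaper": "Document"}

def is_subtype(domain, target):
    """True if domain equals target or is a (transitive) sub-domain of it."""
    while domain is not None:
        if domain == target:
            return True
        domain = supertype.get(domain)
    return False

# Domain each attribute accepts, following the declarations in the text.
attribute_domain = {"authorOf": "Document", "presentationOf": "ConferencePaper"}

def check_value(attribute, entity_domain):
    if not is_subtype(entity_domain, attribute_domain[attribute]):
        raise TypeError(f"{entity_domain} entity not allowed in {attribute}")

check_value("authorOf", "ConferencePaper")        # accepted: a paper is a Document
check_value("presentationOf", "ConferencePaper")  # accepted
```

A plain Document passed as presentationOf would be rejected, while a ConferencePaper remains usable wherever a Document is expected.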

The reader will note that we have defined our database schema in the functionally irreducible form described in Section 2.7: i.e., the relations have as few attributes as possible so as to represent atomic facts. This normalization is not necessary in the design of a schema, but often makes the database easier to understand and use, and avoids anomalies in data updates as a result of redundantly storing the same information. Note that the presentation relation is an example of a functionally irreducible relation that is not binary. It cannot be decomposed into smaller relations without losing information or introducing artificial entities to represent the presentations themselves.
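
The claim that the presentation relation cannot be decomposed without losing information can be checked on a small example. The Python sketch below uses invented data (a paper presented at two conferences by different people); it splits the ternary relation into two binary ones and joins them back.

```python
# Hypothetical presentation relationships: (paper, conference, presenter).
presentation = {
    ("Paper A", "Conf X", "Ann"),
    ("Paper A", "Conf Y", "Bob"),
}

# Decompose into two binary relations on the paper.
of_at = {(paper, conf) for paper, conf, _ in presentation}
of_by = {(paper, who) for paper, _, who in presentation}

# Rejoin on the paper.
rejoined = {
    (paper, conf, who)
    for (paper, conf) in of_at
    for (paper2, who) in of_by
    if paper == paper2
}

print(len(presentation), len(rejoined))
print(sorted(rejoined - presentation))   # spurious tuples introduced by the join
```

The rejoined relation contains combinations, such as Ann presenting at Conf Y, that were never stored, so the binary relations alone do not record who presented where.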

What other information should be present in our document database? Subject keywords would certainly be useful. Since one document will generally have many associated keywords, we would introduce another relation, say docKeyword, to represent the new information. Should keywords be entities? Again there is a tradeoff, but the argument for entities seems persuasive: limiting the range of keywords increases the value of the database for retrieval. The keyword entities could also participate in relationships with a dictionary of synonyms, Computing Reviews categories, etc.

This is certainly not a complete design, and the reader is encouraged to fit his or her own ideas into the database framework.

The following program defines a small schema and database of persons and documents. It illustrates the use of most of the procedures defined in Section 3.

DocTestImpl: PROGRAM
IMPORTS DB, IO, Rope =

BEGIN OPEN IO, DB;

tty: IO.Handle← CreateViewerStreams["VLTest1Impl.log"].out;

Person, Conference: Domain;
Thesis, ConferencePaper, Document: Domain;
author: Relation;
authorOf, authorIs: Attribute;
presentation: Relation;
presentationOf, presentationBy, presentationAt: Attribute;
publDate: Attribute;
rick, mark, nori: --Person-- Entity;
cedarPaper, cypressDoc, thesis: --Document-- Entity;
sigmod: --Conference-- Entity;

Initialize: PROC =
BEGIN
tty.PutF["Defining data dictionary...\n"];
-- Declare domains and make ConferencePapers and Theses be subtypes of Document:
Person← DeclareDomain["Person"];
Conference← DeclareDomain["Conference"];
Document← DeclareDomain["Document"];
ConferencePaper← DeclareDomain["ConferencePaper"];
Thesis← DeclareDomain["Thesis"];
DeclareSubType[of: Document, is: ConferencePaper];
DeclareSubType[of: Document, is: Thesis];
-- Declare publDate property of Document
publDate← DeclareProperty["publDate", Document, IntType];
-- Declare author relation between Persons and Documents
author← DeclareRelation["author"];
authorOf← DeclareAttribute[author, "of", Document];
authorIs← DeclareAttribute[author, "is", Person];
-- Declare presentation relation
presentation← DeclareRelation["presentation"];
presentationOf← DeclareAttribute[presentation, "of", ConferencePaper];
presentationBy← DeclareAttribute[presentation, "by", Person];
presentationAt← DeclareAttribute[presentation, "at", Conference];
END;

InsertData: PROC =
BEGIN t: Relship; ok: BOOL← FALSE;
tty.PutF["Inserting data...\n"];
cedarPaper← DeclareEntity[ConferencePaper, "The Cedar DBMS"];
cypressDoc← DeclareEntity[Document, "Cypress DB Concepts & Facilities"];
thesis← DeclareEntity[Thesis, "An Analysis of Priority Queues"];
sigmod← DeclareEntity[Conference, "SIGMOD 81"];
rick← DeclareEntity[Person, "Rick Cattell"];
mark← DeclareEntity[Person, "Mark Brown"];
-- Note we can create entity and then set name...
nori← DeclareEntity[Person];
ChangeName[nori, "Nori Suzuki"];
-- Data can be assigned with SetP, SetF, or DeclareRelship’s initialization list:
t← DeclareRelship[presentation,, NewOnly];
SetF[t, presentationOf, cedarPaper];
SetF[t, presentationBy, mark];
SetF[t, presentationAt, sigmod];
[]← SetPList[cypressDoc, authorIs, LIST[rick, mark]];
-- the Cedar notation LIST[ ... ] defines a list
[]← SetPList[cedarPaper, authorIs, LIST[rick, mark, nori]];
[]← DeclareRelship[author, LIST[[authorOf, thesis], [authorIs, mark]]];
[]← SetP[cypressDoc, publDate, I2V[1982]];
[]← SetP[thesis, publDate, I2V[1977]];
-- the I2V[...] calls needed because Cedar Mesa does not yet coerce INT to REF ANY
-- Check that thesis can’t be presented at conference:
ok← FALSE;
t← DeclareRelship[presentation];
SetF[t, presentationOf, thesis
! MismatchedAttributeValueType => {ok ← TRUE; CONTINUE}];
IF NOT ok THEN ERROR;
END;

DestroySomeData: PROCEDURE =
-- Destroy one person entity
BEGIN
tty.Put[char[CR], rope["Deleting Rick from database..."], char[CR]];
DestroyEntity[DeclareEntity[Person, "Rick Cattell", OldOnly]];
END;

PrintDocuments: PROC =
-- Use DomainSubset with no constraints to enumerate all Documents
BEGIN
doc: -- Document -- Entity;
authors: LIST OF Value;
es: EntitySet;
tty.PutF["Documents:\n\n"];
tty.PutF["Title  authors\n"];
es← DomainSubset[Document];
WHILE (doc← NextEntity[es])#NIL DO
tty.PutF["%g  ", rope[GetName[doc]]];
authors← GetPList[doc, authorIs];
FOR al: LIST OF Entity← NARROW[authors], al.rest UNTIL al=NIL DO
tty.PutF["%g ", rope[GetName[al.first]]] ENDLOOP;
tty.PutF["\n"];
ENDLOOP;
ReleaseEntitySet[es];
END;

PrintPersonsPublications: PROC [pName: ROPE] =
-- Use RelationSubset to enumerate publications written by person
BEGIN
p: --Person-- Entity← DeclareEntity[Person, pName, OldOnly];
authorT: --author-- Relship;
rs: RelshipSet;
first: BOOL← TRUE;
IF p=NIL THEN
{tty.PutF["%g is not a person!", rope[pName]]; RETURN};
tty.PutF["Papers written by %g are:\n", rope[pName]];
rs← RelationSubset[author, LIST[[authorIs, p]]];
WHILE (authorT← NextRelship[rs])#NIL DO
IF first THEN first← FALSE ELSE tty.Put[rope[", "]];
tty.Put[rope[GetFS[authorT, authorOf]]];
ENDLOOP;
tty.PutF["\n"];
ReleaseRelshipSet[rs];
END;

tty.Put[rope["Creating database..."], char[CR]];
-- Initialize the database system itself; the local Initialize above defines the schema
DB.Initialize[];
DeclareSegment["[Local]Test", $Test, 1,, NewOnly];
OpenTransaction[$Test];
Initialize[];
InsertData[];
PrintDocuments[];
PrintPersonsPublications["Mark Brown"];
DestroySomeData[];
PrintDocuments[];
CloseTransaction[TransactionOf[$Test]];
tty.Close[];

END.