[Indigo]<PostCedar5.2>Documentation>CypressDoc.tioga!1

CYPRESS DOCUMENTATION

Cypress Database System Documentation

R. G. G. Cattell

Release as [Indigo]<CedarŽ>Documentation>CypressDoc.Tioga
Written by R. G. G. Cattell, June 13, 1983
Last edited by Beach, February 9, 1984 5:40:26 pm PST
by Jim Donahue, February 10, 1984 12:16:55 pm PST

Abstract: This document describes the Cypress system, and is aimed at potential writers of database applications in the Cedar programming environment. It should be accurate as of the date above, and is recommended as better documentation than CSL Report 83-4, from which it was derived. Suggestions are welcomed on both the design of Cypress and its exposition in this document. We will assume little knowledge of database systems, and little knowledge of Cedar. We will not explain the motivation for this particular design of Cypress; see the CSL Report for that. You should also consult the documentation for database tools not described here; see a database wizard for details.

XEROX Xerox Corporation
Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, California 94304

DRAFT — For Internal Xerox Use Only — DRAFT

TABLE OF CONTENTS

1. Introduction 1

2. Cypress data model concepts 3

2.1 Data independence
2.2 Basic primitives
2.3 Names and keys
2.4 Basic operations
2.5 Aggregate operations
2.6 Convenience operations
2.7 Normalization
2.8 Segments

3. Model level interface 19

3.1 Types
3.2 Transactions and segments
3.3 Data schema definition
3.4 Basic operations
3.5 Query operations
3.6 System domains and relations
3.7 Errors

4. Application example 39

4.1 A database application
4.2 Schema design
4.3 Example program

1. Introduction

INTRODUCTION

Cypress was motivated by the need for database management facilities in Cedar. Its goals are (1) convenience, making it easy to build new database applications in Cedar, and (2) reasonable performance for a wide class of applications. If you are unsure whether an application you have in mind is appropriate, talk to someone who has built one. You should also consult others to see what auxiliary documentation and folklore is available.

The conceptual data structure provided by a database system, analogous to the Mesa type system, is called a data model. The data model specifies how data may be structured and accessed; thus the data model is what you need to know in order to use the system. The Cypress data model described herein includes features that are accepted in some form in a number of models in the literature, particularly those that seemed most useful for database applications we envision in CSL.

The data model is described conceptually in Section 2. In Section 3 we describe the Cedar interface to Cypress: DB.mesa. Finally, Section 4 provides a simple example of Cypress use. We cover only the abstract model and the DB interface, not its implementation or rationale. The reader interested in these should consult CSL Report 83-4. A set of formal axioms for the model can also be found there.

Clients interested in tools for examining or modifying databases, either as an end-user or for debugging an application program, should consult the database tools documentation: [Indigo]<CedarŽ>Documentation>SquirrelDoc.tioga and .press, where Ž refers to the current cedar release version number such as 5.1.

2. Cypress data model concepts

CYPRESS DATA MODEL CONCEPTS

In this section, we give an informal description of the Cypress data model. We describe the particulars of the Cedar interface in Section 3.

2.1 Data independence

We deal here with the conceptual data model: the logical primitives for data access and data type definition. This should be carefully distinguished from the physical data storage and access mechansisms. The physical representation of data is hidden as much as possible from the database client to facilitate data independence, the guarantee that a user's program will continue to work (perhaps with a change in efficiency) even though the physical data representation is redesigned.

For any particular database using our conceptual data model, the actual specification of the types of data in the database, using the primitives the model provides, is termed the data schema. Note that a mapping must be provided between the conceptual data model and the physical representation, either automatically or with further instruction from the client; we will do some of both. The logical to physical mapping is intimately associated with the performance of the database system as viewed by the user performing operations at the conceptual level.

2.2 Basic primitives

Three basic primitives are defined in the model: an entity, datum, and relationship.

An entity represents an abstract or concrete object in the world: a person, an organization, a document, a product, an event. In programming languages and knowledge representation entities have variously been referred to as atoms, symbols, and nodes. A datum, unlike an entity, represents literal information such as times, weights, part names, or phone numbers. Character strings and integers are possible datum types.

It is a policy decision whether something is represented as an entity or merely a datum: e.g., an employee's spouse may be represented in a database system as a datum (the spouse's name), or the spouse may be an entity in itself. The database system provides a higher level of logical integrity checking for entities than for datum values, as we will see later: unique entity identifiers, checks on entity types, and removal of dependent data upon entity deletion. We shall discuss the entity/datum choice further in Section 4.2.

We will use the term value to refer to something that can be either a datum or an entity. In many programming languages, there is no reason to distinguish entity values from datum values. Indeed, most of the Cypress operations deal with any kind of value, and some make it transparent to the caller whether an entity or datum value is involved. The transparent case makes relational operations possible in our model, as we will see in Section 2.5.

A relationship is a tuple whose elements are [entity or datum] values. We refer to the elements (fields) of relationships by name instead of position. These names for positions are called attributes.

Note that we have separated the representatives of unique objects (entities) from the representation of information about objects (relationships), unlike some object-oriented programming languages and data models. Therefore an entity is not an "object" (or "record") in the programming language sense, although entities are representatives of real-world objects.

We also define entity types, datum types, and relationship types. These are called domains, datatypes, and relations, respectively. We make use of these three types through one fundamental type constraint: every relationship in a relation has the same attributes, and the values associated with each attribute must be from a pre-specified domain or datatype. One might think of a relation as a "record type" in a programming language, although relations permit more powerful operations than record types.

As an example, consider a member relation that specifies that a given person is a member of a given organization with a given job title, as in the following figure. The person and organization might be entities, while the title might be a string datum. We relax the fundamental type constraint somewhat in allowing a lattice of types of domains: a particular value may then belong to the pre-specified domain or one of its sub-domains. For example, one could be a member of a University, a Company, or any other type of Organization one chooses to define. Other relations, e.g. an "offers-course" relation, might apply only to a University.

CypressFig1.press leftMargin: 1 in, topMargin: 1 in, width: 2.125 in, height: .75 in

Relationships represent facts about the world. A relation (a set of relationships) represents a kind of relationship in which entities and values can participate. We will schematically represent that John Smith is a member of Acme company with title "manager" by drawing lines for the relationship (labelled with the relation and its attributes), a circle for each entity, and a box for the datum:

CypressFig2.press leftMargin: 1 in, topMargin: 1 in, width: 2.125 in, height: .75 in

One should normally think of relations as types, not sets. A relation can be defined as the set of all relationships of its type, however, and thus can be treated as a relation in the normal mathematical sense. Note that we differ from mathematical convention in another minor point: the use of attributes to refer to tuple elements by name instead of position. Reference by position is therefore not necessary and is in fact not permitted. We can often omit attribute names without ambiguity since the types of participating entities imply their role in a relationship. However, they are necessary in the general case; e.g., a boss relation between a person and a person requires the directionality to define its semantics.

We can summarize the six basic primitives of the data model in tabular form. Familiarity with these six terms is essential to understanding the remainder of this document:

Type Instance Instance Represents

Domain Entity physical or abstract object

Datatype Datum numerical measurement or symbolic tag

Relation Relationship logical correspondence among objects and values

Our terminology might be more consistent if we called a domain an "entity type," and a relation a "relationship type." Instead we have compromised on the terms most widely used in the literature for all six of the basic concepts. The reader will find the remainder of this document much more understandable if these six terms are commited to memory before proceeding.

2.3 Names and keys

The data model also provides primitives to specify uniqueness of relationships and entities. For relationships, this is done with keys, and for entities, with names. A key for a relation is an attribute or set of attributes whose value uniquely identifies a relationship in the relation. No two relationships may have the same values for a key attribute or attribute set (a relation may have more than one key, in which case relationships must have unique values for all keys). A name acts similarly for a domain. The name of an entity must uniquely specify it in its domain.

Consider a spouse relation, a binary relation between a person and a person. Both of its attributes are keys, because each person may have only one spouse (if we choose to define spouse this way!). For an author relation, neither the person nor the document are keys, since a person may author more than one document and a document may have more than one author. The person and document attributes together would comprise a key, however: there should be only one tuple for a particular person-document pair.

CypressFig3.press leftMargin: 1 in, topMargin: 1 in, width: 2.125 in, height: .75 in

<==<MLDFig2.3.press<

We have labelled entities with names in the figures. Names are most useful when they are human-sensible tags for the entities, e.g. the title for a document, or the name of a person. However, their primary function is as a unique entity identifier, so non-unique human-sensible names must be represented as relations. If entities of a domain have more than one unique identifier, e.g. social security numbers and employee numbers, then one identifier must be chosen as the domain's entity names and the other represented as a relation connecting the entities with the unique alternate identifier (a key of that relation).

We require that every entity have a unique name, although the name may automatically be generated by the database system. Thus every entity may be uniquely identified by the pair:

[domain name, entity name].

Some authors in the database literature use the term entity to refer to a real-world object rather than its representation in the database system. When we use the term entity, we refer to the internal entity, the entity "handle" returned by database operations and stored in entity-valued variables, not the external entity that the internal entity represents or the entity identifier [domain, name] that may be used to uniquely identify an internal entity. The three are interchangeable, however, since they must always be in one-to-one correspondence.

The reader may find it simple to think of entity-valued attributes of relationships as pointers to the entities, in fact bi-directional pointers, since the operations we provide allow access in either direction. This is a useful analogy. However, there is no constraint that the model be implemented with pointers, and the relationships of a relation could equally well be conceptualized as rows of a table whose columns are the attributes of the relation and whose entries for entity-valued attributes are the string names of the entities. For example, the author relationships in the previous figure could also be displayed in the form:

Author:

Person Book

George Backgammon for Beginners

John Backgammon for Beginners

John Advanced Backgammon

Tom Advanced Backgammon

Thus our introduction of entities to the Relational model does not entail a different representation than, say, a Network model might imply, but simply additional integrity checks on entity names and types, and new operations upon entities. This compatibility with the Relational data model is important, as it allows the application of the powerful Relational calculus or algebra as a query language. We return to query languages in Section 2.5.

Note that the only information about an entity associated directly with the entity is the name; this contrasts with most other data models. A person's age or spouse, for example, would be represented in the Cypress data model via age or spouse relations. Thus the relationships in a database are the information-carrying elements: entities do not have attributes. However the model provide an abbreviation, properties, to access information such as age or spouse in a single operation. We will discuss properties later. In addition, the physical data modelling algorithms can store these data directly with the person entity as a result of the relation key information (since a person can have only one age, a field can be allocated for that field in the stored object representing a person.)

2.4 Basic operations

The data model provides the capability to define and examine the data schema, and perform operations on entities, relationships, and aggregates of entities and relationships. In this section we discuss the basic operations on entities and relationships. In Section 2.5, we discuss the operations on aggregate types, i.e. domains and relations. We defer to Section 2.6 the discussion of "convenience" operations built upon the basic and aggregate operations.

Four basic operations are defined for entities:

1. DeclareEntity[domain, name]: Returns a new or existing entity in a domain. An entity name must be specified.

2. DestroyEntity[entity]: Destroys an entity; this also destroys all relationships that refer to it.

3. DomainOf[entity]: Returns the domain of an entity (its type).

4. NameOf[entity]: Returns the string name of an entity.

Five basic operations are defined for relationships:

1. DeclareRelationship[relation, list of attribute values]: Returns a relationship with the given attribute values in the given relation.

2. DestroyRelationship[relationship]: Destroys a relationship.

3. RelationOf[relationship]: Returns a relationship's relation.

4. GetF[relationship, attribute]: Returns the value associated with the given attribute of the given relationship.

5. SetF[relationship, attribute, value]: Sets the value of the given relationship attribute.

The operations upon relationships recognize a specially-distinguished undefined value for an attribute. Unassigned attributes of a newly-created relationship have this value. A client of the data model may retrieve a value with GetF and test whether it equals the distinguished undefined value, and may set a previously defined value to be the distinguished undefined value with SetF.

Other "convenience" operations are built on top of the basic operations on entities and relationships: properties and translucent attributes. They are described in Section 2.6. Although these operations are not essential to the basis of the Cypress model, they do furnish a fundamentally different perspective on the model. They provide a mechanism to associate information directly with entities (instead of through relationships) and to write programs largely independent of attribute types.

The reader will also note that we have ignored issues of concurrent access and protection in the basic operations. We will see later that an underlying transaction, file, and protection is associated with the relation and domain handles used in the basic operations. This convenience allows us to treat concurrency, protection, and data location orthogonally.

2.5 Aggregate operations

There are two kinds of operations upon domains and relations, the aggregate types in our model: the definition of domains and relations, and queries on domains and relations. We first discuss their definitions.

Schema definition

As in other database models and a few programming languages, the Cypress model is self-representing: the data schema is stored and accessible as data. Thus application-independent tools can be written without coded-in knowledge of the types of data and their relationships.

Client-defined domains, relations, and attributes are represented by entities. These entities belong to special built-in domains, called the system domains:

the Domain domain, with one element (entity) per domain

the Attribute domain, with one element per attribute

the Relation domain, with one element per relation

There is also a predefined Datatype domain, with pre-defined elements RopeType, BoolType, and IntType, called built-in types. We do not allow client-defined datum types at present.

Information about domains, relations, and attributes are represented by system relations in which the system entities participate. The pre-defined SubType relation is a binary relation between domains and their subdomains. There are also predefined binary relations that map attributes to information about the attributes:

aRelation: maps an attribute entity to its relation entity.

aType: maps an attribute to its type entity (a domain or a built-in type)

aUniqueness: maps an attribute entity to {TRUE, FALSE}, depending whether it is part of a key of its relation. We are assuming only one key per relation, here; our implementation relaxes this assumption in the case of single-attribute keys.

The following diagram graphically illustrates a segment of a data schema describing the member relation and several domains. The left side of the figure shows two subdomains of Organization, (Company and University), and the right shows the types and uniqueness properties of the member relation's attributes memberOf, memberIs, and memberAs.

CypressFig4.press leftMargin: 1 in, topMargin: 1 in, width: 2.125 in, height: .75 in

New domains, relations, and attributes are defined by creating entities and relationships in these pre-defined system domains and relations. However, our implementation provides special operations to define the data schema, to simplify error checking. These operations are:

DeclareDomain[name], and for each subtype relationship:

DeclareSubType[superdomain, subdomain]

DeclareRelation[name], and for each attribute:

DeclareAttribute[name, relation, type, uniqueness].

Queries

The operation RelationSubset[relation, attribute value list] enumerates relationships in a relation satisfying specified equality constraints on entity-valued attributes and/or range constraints on datum-valued attributes. For example, RelationSubset might enumerate all of the relationships that reference a particular entity in one attribute and have an integer in the range 23 to 52 in another.

The operation DomainSubset[domain, name range] enumerates entities in a domain. The enumeration may optionally be sorted by entity name, or restricted to a subset of the entities with a name in a given range.

More complex queries can be implemented in terms of DomainSubset and RelationSubset. A future implementation will provide a MultiRelationSubset operation to efficiently enumerate single queries spanning more than one relation. MultiRelationSubset operates upon a parsed representation of the query language, and produces the same kind of enumeration as RelationSubset. See CSL-83-4 for more details.

2.6 Convenience operations

Some more convenient specialized operations are built upon the basic operations described in the previous two sections. They implement what we call properties and translucent attributes. Although theoretically speaking these operations add no power to the model, they permit a significantly different perspective on the data access and so should be thought of as part of the model.

Properties

Properties allow the client to treat entities as if they, like relationships, had "attributes." They provide the convenience of treating attributes of relationships that reference an entity as if they were attributes (or properties) of the entity itself. The property operations are:

1. GetPList[entity, attribute1, attribute2]: Attribute1 and attribute2 must be from the same relation. Returns the values of attribute1 for all relationships in the relation that reference the entity via attribute2. Attribute2 may be omitted, in which case it is assumed to be the only other entity-valued attribute of the relation.

2. GetP[entity, attribute1, attribute2]: this is identical to GetPList except exactly one relationship must reference the entity via attribute2; otherwise an error is generated. GetP always returns one value.

3. SetPList[entity, attribute1, value list, attribute2]: Attribute1 and attribute2 must be from the same relation. Destroys any existing relationships whose attribute2 equals the entity, and creates new ones for each value in the list, with attribute1 equal to the value, and attribute2 equal to the entity. Attribute2 can be defaulted as in GetPList.

4. SetP[entity, attribute1, value, attribute2]: this is identical to SetPList except it simply adds a new relationship referencing the entity instead of destroying any existing ones (unless attribute1 is a key of its relation, in which case the existing one must be replaced).

Thus the property operations allow information specified through relationships to be treated as properties of the entity itself, in single operations. The property operations and the operations defined in earlier sections may be used interchangeably, as there is only one underlying representation of information: the relationships. As an example of the use of properties, consider the following database:

CypressFig5.press leftMargin: 1 in, topMargin: 1 in, width: 2.125 in, height: .75 in

The figure shows the entity John Smith, and three relationships in which he participates: an age relationship and two member relationships, The member relationships are ternary, the age relationship binary. On this database, the property operations work as follows:

GetPList[John Smith, memberOf] would return the set {Acme Company, State University}.

GetP[John Smith, ageIs] (or GetPList) would return 34.

SetP[John Smith, memberOf, Foo Family] would create a new member relationship specifying John to be a member of the Foo Family. SetPList would do the same, but would destroy the two existing member relationships referencing John. In either case, the memberAs attribute would be left undefined in the new relationship.

SetP[John Smith, ageIs, 35], where ageOf is a key of the age relation, would delete the relationship specifying John's age to be 34, and insert a new one specifying John's age to be 35. Note that SetP acted differently than on the member relation because memberIs is not a key.

Again, the property operations are simply a convenience, although they provide a different perspective on the data model by allowing an entity-based view of a database.

Translucent attributes

Some database application programs may not wish to be concerned with whether an attribute is entity-valued, string-valued, or integer-valued. They might prefer to have all values mapped to some common denominator, e.g. a string. An example would be a program that is simply displaying tuples on the screen.

Another class of applications would like to be independent of whether a particular attribute is represented as an entity or datum value. Consider the member relation in the previous figure. If we choose to define an Organization domain, then the memberOf attribute is entity-valued; but instead we might choose to make the memberOf attribute be string-valued, merely giving the name of the organization without defining organizations as entities. This might be appropriate, for example, if we did not wish to invoke the type checking on uniqueness of names and the correctness of entity types. We would like to write programs that are independent of whether an attribute is string-valued or entity-valued (as in the Relational data model).

We introduce translucent attributes to avoid dependence on attribute types. Any attribute may be treated as a translucent attribute, by using the GetFS and SetFS operations to retrieve or assign its value.

GetFS[relationship, attribute] is identical to the GetF operation, except it returns a string regardless of the attribute's type. If the attribute is datum-valued, e.g. an integer or boolean, it is converted to a string equivalent. If the attribute is entity-valued, the name of the entity is returned.

SetFS[relationship, attribute, value] performs the inverse mapping. If the attribute is datum-valued, e.g. an integer or boolean, a string equivalent is accepted. If the attribute is entity-valued, the name of the entity is passed to SetFS. If an entity with the given name does not exist in the domain that is the attribute's type, then one is automatically created.

Changing entity names

Another convenience operation is provided on entities to change an entity's name: ChangeName[entity, new name]. This operation is semantically equivalent to destroying the given entity and creating a new one with the new name, participating in the same relationships that the old one did. See the description of ChangeName in Section 3.4 for precise semantics in our implementation, however.

2.7 Normalization

A few comments on relational normalization are included here for users who must do their own data schema design. Others may skip to the next section.

A relation is normalized by breaking it into two or more relations of lower order (fewer attributes) to eliminate undesirable dependencies between the attributes. For example, one could define a "publication" relation with three attributes:

Publication:

Person Book Date

George Backgammon for Beginners 1978

John Backgammon for Beginners 1978

Mary How to Play Chess 1981

Mary How to Cheat at Chess 1982

This relation represents the fact that John and George wrote a book together entitled "Backgammon for Beginners," published in 1978, and Mary wrote two books on the subject of chess, in 1981 and 1982. Alternatively, we could encode the same information in two relations, an author relation and a publication-date relation:

Author:

Person Book

George Backgammon for Beginners

John Backgammon for Beginners

Mary How to Play Chess

Mary How to Cheat at Chess

Publication-date:

Book Date

Backgammon for Beginners 1978

How to Play Chess 1981

How to Cheat at Chess 1982

Although the second two relations may seem more verbose than the first one, they are actually representationally better in some sense, because the publication dates of books are not represented redundantly. If one wants to change the publication date of "Backgrammon for Beginners" to 1979, for example, it need only be changed in one place in the publication-date relation but in two places in the publication relation. If the date were changed in only one place in the publication relation, the database would become inconsistent. This kind of behavior is called an update anomaly. The second two relations are said to be a normalized form (as it happens, third normal form) of the first relation, and thereby avoid this particular kind of update anomaly.

Relational normalization is not strictly part of the Cypress data model. However the model's operations (and the tools we will develop in the implementation) encourage what we will call functionally irreducible form, in which relations are of the smallest order that is naturally meaningful.

A relation is in irreducible form if it is of the smallest order possible without introducing new artificial domain(s) not otherwise desired (all relations can be reduced to binary by introducing artificial domains). We will allow a slight weakening of irreducible form, functionally irreducible form, which permits combining two or more irreducible relations only when their semantics are mutually dependent (and therefore all present or absent in our world representation). For example, a birthday relation between a person, month, day, and year can be combined instead of using three relations. Another example would be an address relation between a person, street, city, and zip code. Combining an age and phone relation would not result in functionally irreducible form, however, as their semantics are not mutually dependent.

The functionally irreducible relations seen by the user are independent of the physical representation chosen by the system for efficiency, so we are concerned only with the logical data access. Note that in addition to avoiding update anomalies, functionally irreducible form provides a one-to-one correspondence between the relationships in the database and the atomic facts they represent, a canonical form that is in some sense more natural than any other form.

2.8 Segments

We would like a mechanism to divide up large databases, to provide different perspectives or subsets of the data to different users or application programs. In this section we discuss a mechanism to provide this separation: segments. A segment is a set of entities and relationships that a database client chooses to treat as one logical and physical part of a database.

In introducing segments, we will slightly change the definition of an entity, previously defined to be uniquely determined by its domain and name. We will treat entities with the same name and domain in different segments as different entities, although they may represent the same external entity. The unique identifier of an internal entity is now the triple

[segment, domain, name].

A consequence of this redefinition of entities is that relations and domains do not span segments, either. Application programs must maintain any desired correspondence between entities, domains, or relations with the same name in different segments. We will return to this later. In the next section, we will discuss a more powerful but more complex and expensive mechanism, augments, in which the database system itself maintains the correspondence.

We introduce three new operations to the data model in conjunction with segments:

DeclareSegment[segment, file]: opens a segment with the given name, whose data is stored in the given file.

GetSegments[] returns a list of all the segments which have been opened.

SegmentOf[entity or relationship] returns the segment in which a given entity or relationship exists. It may also be applied to relations or domains, since they are entities.

With the addition of segments to the data model, we redefine the semantics of the basic access operations as follows:

1. DeclareDomain and DeclareRelation take an additional argument, namely the segment in which the defined domain or relation will reside. The entity representing a domain or relation now represents data in a particular segment.

2. DeclareEntity and DeclareRelationship are unaffected: they implicitly refer to the segment in which the respective domain or relation was defined. By associating a segment (and therefore a transaction and underlying file) with each relation or domain entity returned to the database client, we conveniently obviate the need for additional arguments to every invocation of the basic operations in the data model.

3. DestroyEntity, DestroyRelationship, GetF, SetF, DomainOf, RelationOf, and Eq are similarly unaffected: they deal with entities and relationships in whatever segment they are defined. Note that by our definition, entities in different segments are never Eq. Also note that nothing in our definition makes a SetF across a segment boundary illegal (i.e. SetF[relationship, attribute, entity] where the relationship and entity are in different segments). Our current implementation requires that special procedures GetFR and SetFR be used on attributes that can cross segment boundaries, see Section 3.

4. DomainSubset and RelationSubset are unchanged when applied to client-defined domains or relations, i.e., they enumerate only in the segment in which the relation or domain was declared. However an optional argument may be used when applied to one of the system domains or relations (e.g. the Domain domain), allowing enumeration over a specific segment or all segments. RelationSubset's attribute-value-list arguments implictly indicate the appropriate segment even for system relations, so a segment is not normally needed unless the entire relation is enumerated.

Note that the data in a segment is stored in an underlying file physically independent from other segments, perhaps on another machine. Introducing a file system into the conceptual data model may seem like an odd transgression at this point. From a practical point of view, however, we believe it better to view certain problems at the level of file systems. This point of view allows segments to be used for the following purposes:

1. Physical independence: Different database applications typically define their data in separate segments. As a result one application can continue to operate although the data for another has been logically or physically damaged. One application can entirely rebuild its database without affecting another, or an application can continue to operate in a degraded mode missing data in an unavailable segment.

2. Logical independence: Different database applications may have information which pertains to the same external entity, e.g. a person with a particular social security number. When one application performs a DestroyEntity operation, however, we would like the entity to disappear only from that application's point of view. Information maintained by other applications should remain unchanged.

3. Protection: Clients can trust the protection provided by a file system more easily than a complex logical protection mechanism provided by the database system. An even higher assurance of protection can be achieved by physical isolation of the segment at a particular computer site. A more complex logical protection mechanism would be desirable for some purposes, but was deemed beyond the scope of Cypress.

4. Performance: Data may be distributed to sites where they are most frequently used. For example, personal data may reside on a client's machine while publicly accessed data reside on a file server. If the file system provides replication, it can be used to improve performance for commonly accessed data.

Concurrency control is handled by the file system.

As noted earlier, information about an external entity may be distributed over multiple segments. One or more database applications may cooperate in maintaining the illusion that entities, domains, and relations span segment boundaries. This illusion may be used in at least two ways:

1. Private additions may be added to a public segment by adding entities or relationships in a private segment. The new relationships may reference entities in the public segment by creating representative entities with the same name in the private segment. An example would be personal phone numbers and addresses added to a public database of phone numbers and addresses: an application program would make the two segments appear to the user as one database.

2. If two applications use separate segments A and B, they may safely reference each other's data yet remain physically independent. One of the applications may destroy and reconstruct its segment if it uses the same unique names for its entities. If both applications have relationships referencing an entity e, and application A does a DestroyEntity operation on e, the entity and relationships referencing it disappear from application A's point of view, but application B's representative entity and relationships remain.

3. Model Level Interface

MODEL LEVEL INTERFACE

We now describe the Cedar interface to the implementation of the Cypress data model. We assume that the reader is familiar with the basic conceptual data model, i.e., has read the previous section. Our presentation is therefore slightly different in this section: we describe the procedures in the database interface in roughly the order that a client will want to use them in a program. We present types and initialization, schema definition, the basic operations, and then queries.

It should be emphasized that the interface we are about to describe is only one possible implementation of the abstract data model described in Section 2. For example, we have chosen to implement a procedural interface called by Cedar programs, and to do type checking at run-time.

3.1 Types

In this subsection we describe the most important types in the interface. Less pervasive types are treated at the point where they are first used.

Entity: TYPE;
Relship: TYPE;

An Entity or Relship is not the actual database entity or relationship; they are handles for the actual database objects. All accesses to database objects are performed by calling interface procedures with the handles as parameters. Even comparisons of two entities for equality must be done in this way. The Entity and Relship handles are allocated from storage and automatically freed by the garbage collector when no longer needed.

Value: TYPE = REF ANY;
ValueType: TYPE;
Datatype: TYPE;
RopeType, IntType, BoolType, AnyDomainType: DataType;

Storing Cedar data values in tuples presents several problems. First, since we would like to define a single operation to store a new value into a specified attribute of a Relship (for instance), there must be a single type for all values that pass through this "store-value" procedure. This is the type Value above, represented as untyped REFs in Cedar. The DataTypes will be discussed in the next section. Entities, strings, integers, and booleans are the types of values the system currently recognizes and allows as attribute values. More precisely, these four types are Entity, ROPE, REF INT, and REF BOOL. In the case of an entity-valued attribute, an attribute's type may be AnyDomainType or a specific domain may be specified. The latter is highly preferred, as AnyDomainType is a loophole in the type mechanism and limits the kinds of operations that can be performed automatically by the database system or associated tools. We currently provide no mechanism to store compound Cedar data structures such as arrays, lists, or records in a database; the database system's data structuring mechanisms should be used instead. Cypress query operations such as RelationSubset cannot be composed upon data that appears as uninterpreted bits in the database.

Note that a Value may be either an Entity or a Datum. Some operations accept any Value, e.g. SetF; others require an Entity, e.g. NameOf. Others may require an Entity from a particular client-defined domain, e.g. a Person. We might think of the hierarchy of built-in and client defined types and instances of values like this:

Value type hierarchy Database representative of type

Value (REF ANY) ValueType
Datum DatumType
ROPE RopeType
INT IntType
BOOL BoolType
Entity AnyDomainType
person Entity Person domain
employee Entity Employee domain
... other client-defined entities ... ... other client-defined domains ...

As Cedar doesn't have a good mechanism for defining type hierarchies or new types for client-defined domains, most Cypress operations simply take a REF ANY or an Entity as argument, performing further type checking at run-time.

3.2 Transactions and segments

In this section we describe the basic operations to start up a database application's interaction with Cypress. The client application's data is stored in one or more segments, accessed under transactions. The Cypress system currently runs on the same machine as the client program, however transactions are implemented by the underlying file system which may reside on another machine. Data in remote segments may therefore be concurrently accessed by other instances of Cypress on other client machines.

A transaction is a sequence of read and write commands. The system supports the property that the entire sequence of commands executes atomically with respect to all other data retrieval and updates, that is, the transaction executes as if no other transactions were in progress at the same time. Because there may in fact be other transactions accessing the same data at the same time, it is possible that two transactions may deadlock, in which case one of them must be aborted. So the price paid for concurrent access is that programs be prepared to retry aborted transactions.

The database system provides the capability of accessing a database stored on the same machine as the database client, using the Pilot file system or on Alpine file servers. We currently permit only one transaction per segment per instance of the database software on a client machine. That is, data in remote segments may concurrently be updated by application programs under separate transactions, but on the same machine transactions are used simply to make application transactions on their respective segments independent. This transaction-per-segment scheme is a major simplification of the Cypress package. In addition, as we shall see presently, nearly all Cypress procedures can automatically infer the appropriate segment and transaction from the procedure arguments, avoiding the need to pass the transaction or segment for every database operation.

Calls to Initialize, DeclareSegment, and OpenTransaction start the database session. A transaction is either passed in by the client, or created by the database package (the latter is just a convenience feature). The operation MarkTransaction below forms the end of a database transaction and the start of a new one. The operation AbortTransaction may be used to abort a transaction. Data in a database segment may not be read or updated until the segment and transaction have been opened. Clients must decide when to tell the system that a transaction is complete (with CloseTransaction), and must be prepared to deal with unsolicited notification that the current transaction has been aborted because of system failure or lock conflict.

The client's interaction with the database system begins with a call to Initialize:

Initialize: PROC[
nCachePages: CARDINAL← 256,
nFreeTuples: CARDINAL← 32,
cacheFileName: ROPE← NIL ];

Initialize initializes the database system and sets various system parameters: nCachePages tells the system how many pages of database to keep in virtual memory on the client's machine, nFreeTuples specifies the size to use for the internal free list of Entity and Relship handles, and cacheFileName is the name of the disk file used for the cache backing store. Any or all of these may be omitted in the call; they will be given default values. Initialize should be called before any other operation; the schema declaration operations generate the error DatabaseNotInitialized if this is violated.

Before database operations may be invoked, the client must open the segment(s) in which the data are stored. The location of the segment is specified by using the full path name of the file, e.g. "[MachineName]<Directory>SubDirectory>SegmentName.segment". Each segment has a unique name, the name of a Cedar ATOM which is used to refer to it in Cypress operation. The name of the Cedar ATOM is normally, though not necessarily, the same as that of the file in which it is stored, except the extension ".segment" and the prefix specifying the location of the file is omitted in the ATOM. If the file is on the local file system, its name is preceded by "[Local]". For example, "[Local]Foo" refers to a segment file on the local disk named Foo.database; "[Alpine]<CedarDB>Baz" refers to a segment named Baz.segment on the <CedarDB> directory on the Alpine server. It is generally a bad idea to access database segments other than through the database interface. However, because segments are physically independent and contain no references to other files by file identifier or explicit addresses within files, the segment files may be moved from machine to machine or renamed without effect on their contents. If a segment file in a set of segments comprising a client database is deleted, the others may still be opened to produce a database missing only that segment's entities and relationships. A segment is defined by the operation DeclareSegment:

DeclareSegment: PROC[
filePath: ROPE, segment: Segment, number: INT← 0,
readOnly: BOOL← FALSE, version: Version← OldOnly,
nBytesInitial, nBytesPerExtent: LONG CARDINAL← 32768]
RETURNS [Segment];

Segment: TYPE = ATOM;
Version: TYPE = {NewOnly, OldOnly, NewOrOld};

The version parameter to DeclareSegment defaults to OldOnly to open an existing file. The signal IllegalFileName is generated if the directory or machine name is missing from fileName, and FileNotFound is generated at the time a transaction is opened on the segment if the file does not exist. If version NewOnly is passed, a new segment file will be created, erasing any existing one. In this case, a number assigned to the segment by the database administrator must also be passed. This number hack is necessitated by our current implementation of segments (it specifies the section of the database address space in which to map this segment). Please bear with us. Finally, the client program can pass version=NewOrOld to open a new or existing segment file; in this case the segment number must also be passed, of course.

The other parameters to DeclareSegment specify properties of the segment. If readOnly=TRUE, then writes are not permitted on the segment; any attempt to invoke a procedure which modifies data will generate the error ProtectionViolation. nBytesInitial is the initial size to assign to the segment, and nBytesPerExtent is the incremental increase in segment size used when more space is required for data in the file.

For convenience, a call is available to return the list of segments that have been declared in the current Cypress session:

GetSegments: PROC RETURNS[LIST OF Segment ];

A transaction is associated with a segment by using OpenTransaction:

OpenTransaction: PROC[
segment: Segment,
userName, password: ROPE← NIL,
useTrans: Transaction← NIL ];

If useTrans is NIL then OpenTransaction establishes a new connection and transaction with the corresponding (local or remote) file system. Otherwise it uses the supplied transaction. The same transaction may be associated with more than one segment by calling OpenTransaction with the same useTrans argument for each. The given user name and password, or by default the logged in user, will be used if a new connection must be established.

Any database operations upon data in a segment before a transaction is opened or after a transaction abort will invoke the Aborted signal. The client should catch this signal on a transaction abort, block any further database operations and wait for completion of any existing ones. Then the client may re-open the aborted transaction by calling OpenTransaction. When the remote transaction is successfully re-opened, the client's database operations may resume.

Note that operations on data in segments under different transactions are independent. Normally there will be one transaction (and one or more segments) per database application program. A client may find what transaction has been associated with a particular segment by calling

TransactionOf: PROC [segment: Segment] RETURNS [Transaction];

Transactions may be manipulated by the following procedures:

MarkTransaction: PROC[trans: Transaction];

AbortTransaction: PROC [trans: Transaction];

CloseTransaction: PROC [trans: Transaction];

MarkTransaction commits the current database transaction, and immediately starts a new one. User variables which reference database entities or relationships are still valid.

AbortTransaction aborts the current database transaction. The effect on the data in segments associated with the segment is as if the transactions had never been started, the state is as it was just after the OpenTransaction call or the most recent MarkTransaction call. Any attempts to use variables referencing data fetched under the transaction will invoke the NullifiedArgument error. A call to OpenTransaction is necessary to do more database operations, and all user variables referencing database items created or retrieved under the corresponding transaction must be re-initialized (they may reference entities or relationships that no longer exist, and in any case they are marked invalid by the database system).

A simple client program using the database system might have the form, then:

Initialize[];
DeclareSegment["[Local]Test", $Test];
OpenTransaction[$Test];
...
... database operations, including zero or more MarkTransaction calls ...
...
CloseTransaction[TransactionOf[$Test]];

3.3 Data schema definition

The definition of the client's data schema is done through calls to procedures defined in this section. The data schema is represented in a database as entities and relationships, and although updates to the schema must go through these procedures to check for illegal or inconsistent definitions, the schema can be read via the normal data operations described in the next section. Each domain, relation, etc., has an entity representative that is used in data operations which refer to that schema item. For example, we pass the domain entity when creating a new entity in the domain. The types of schema items are:

Domain, Relation, Attribute, Datatype, Index, IndexFactor: TYPE = Entity;

Of course, since the schema items are entities, they must also belong to domains; there are pre-defined domains, which we call system domains, in the interface for each type of schema entity:

DomainDomain, RelationDomain, AttributeDomain, DatatypeDomain, IndexDomain: Domain;

There are also pre-defined system relations, which contain information about sub-domains, attributes, and indices. Since these are not required by the typical (application-specific) database client, we defer the description of the system relations to Section 3.6.

In general, any of the data schema may be extended or changed at any time; i.e., data operations and data schema definition may be intermixed. However, there are a few specific ordering constraints on schema definition we will note shortly. Also, the database system optimizes for better performance if the entire schema is defined before any data are entered. The interactive schema editing tool described in the database tools documentation allows the schema to be changed regardless of ordering constraints and existing data, by recreating schema items and copying data invisibly to the user when necessary.

All the data schema definition operations take a Version parameter which specifies whether the schema element is a new or existing one. The version defaults to allowing either (NewOrOld): i.e., the existing entity is returned if it exists, otherwise it is created. This feature avoids separate application code for creating the database schema the first time the application program is run.

DeclareDomain: PROC [name: ROPE, segment: Segment,
version: Version← NewOrOld, estRelations: INT← 5] RETURNS [d: Domain];

DeclareSubType: PROC[sub, super: Domain];

DeclareDomain defines a domain with the given name in the given segment and returns its representative entity. If the domain already exists and version=NewOnly, the signal AlreadyExists is generated. If the domain does not already exist and version=OldOnly, then NIL is returned. The parameter estRelations is used to estimate the largest number of relations in which entities of this domain are expected to participate.

The client may define one domain to be a subtype of another by calling DeclareSubType. This permits entities of the subdomain to participate in any relations in which entities of the superdomains may participate. All client DeclareSubType calls should be done before declaring relations on the superdomains (to allow some optimizations). The error MismatchedSegment is generated if the sub-domain and super-domain are not in the same segment.

DeclareRelation: PROC [
name: ROPE, segment: Segment, version: Version← NewOrOld] RETURNS [r: Relation];

DeclareAttribute: PROC [
r: Relation, name: ROPE, type: ValueType← NIL,
uniqueness: Uniqueness ← None, length: INT← 0,
link: {Linked, Unlinked, Colocated, Remote}← yes, version: Version← NewOrOld]
RETURNS[a: Attribute];

Uniqueness: TYPE = {NonKey, Key, KeyPart, OptionalKey};

DeclareRelation defines a new or existing relation with the given name in the given segment and returns its representative entity. If the relation already exists and version=NewOnly, the signal AlreadyExists is generated. If the relation does not already exist and version=OldOnly, then NIL is returned.

DeclareAttribute is called once for each attribute of the relation, to define their names, types, and uniqueness. If version=NewOrOld and the attribute already exists, Cypress checks that the new type, uniqueness, etc. match the existing attribute. The error MismatchedExistingAttribute is generated if there is a discrepancy. The attribute name need only be unique in the context of its relation, not over all attributes. Note this is the only exception to the data model's rule that names be unique in a domain. Also note that we could dispense with DeclareAttribute altogether by passing a list into the DeclareRelation operation; we define a separate procedure for programming convenience.

The attribute type should be a ValueType, i.e. it may be one of the pre-defined types (IntType, RopeType, BoolType, AnyDomainType) or the entity representative for a domain. For pre-defined types, the actual values assigned to attributes of the relationship instances of the relation must have the corresponding type: REF INT, ROPE, REF BOOL, or Entity. If the attribute has a domain as type, the attribute values in relationships must be entities of that domain or some sub-domain thereof. The type is permitted to be one of the pre-defined system domains such as the DomainDomain, thereby allowing client-defined extensions to the data schema (for example, a comment for each domain describing its purpose).

The attribute uniqueness indicates whether the attribute is a key of the relation. If its uniqueness is NonKey, then the attribute is not a key of the relation. If its uniqueness is OptionalKey, then the system will ensure that no two relationships in r have the same value for this attribute (if a value has been assigned). The error NonUniqueKeyValue is generated if a non-unique key value results from a call to the SetP, SetF, SetFS, or CreateRelship procedures we define later. Key acts the same as OptionalKey, except that in addition to requiring that no two relationships in r have the same value for the attribute, it requires that every entity in the domain referenced by this attribute must be referenced by a relationship in the relation: the relationships in the relation and the entities in the domain are in one-to-one correspondence. Finally, if an attribute's uniqueness is KeyPart, then the system will ensure that no two relationships in r have the same value for all key attributes of r, though two may have the same values for some subset of them.

The length and link arguments to DeclareAttribute have no functional effect on the attribute, but are hints to the database system implementation. For RopeType fields, length characters will be allocated for the string within the space allocated for a relationship in the database. There is no upper limit on the size of a string-valued attribute; if it is longer than length, it will be stored separately from the relationship with no visible effect except for the performance of database applications. The link field is used only for entity-valued fields; it suggests whether the database system should link together relationships which reference an entity in this attribute. In addition, it can suggest that the relationships referencing an entity in this attribute be physically co-located as well as linked. Again, its logical effect is only upon performance, not upon the legal operations.

DestroyRelation: PROC[r: Relation];

DestroyDomain: PROC[d: Domain];

DestroySubType: PROC[sub, super: Domain];

Relations, domains, and subdomain relationships may be destroyed by calls to the above procedures. Destroying a relation destroys all of it relationships. Destroying a domain destroys all of its entities and also any relationships which reference those entities. Destroying a sub-domain relationship has no effect on existing domains or their entities; it simply makes entities of domain sub no longer eligible to participate in the relations in which entities of domain super can participate. Existing relationships violating the new type structure are allowed to remain. Existing relations and domains may only be modified by destroying them with the procedures above, with one exception: the operation ChangeName (described in Section 3.4) may be used to change the name of a relation or domain.

DeclareIndex: PROC [
relation: Relation, indexedAttributes: AttributeList, version: Version];

DeclareIndex has no logical effect on the database; it is a performance hint, telling the database system to create a B-Tree index on the given relation for the given indexedAttributes. The index will be used to process queries more efficiently. Each index key consists of the concatenated values of the indexedAttributes in the relationship the index key references. For entity-valued attributes, the value used in the key is the string name of the entity. The version parameter may be used as in other schema definition procedures, to indicate a new or existing index. If any of the attributes are not attributes of the given relation then the signal IllegalIndex is generated.

The optimal use of indices, links, and colocation, as defined by DeclareIndex and DeclareAttribute, is complex. It may be necessary to do some space and time analysis of a database application to choose the best trade-off, and a better trade-off may later be found as a result of unanticipated access patterns. Note, however, that a database may be rebuilt with different links, colocation, or indices, and thanks to the data independence our interface provides, existing programs will continue to work without change.

If a relation is expected to be very small (less than 100 relationships), then it might reasonably be defined with neither links nor indices on its attributes. In the typical case of a larger relation, one should examine the typical access paths: links are most appropriate if relationships that pertain to particular entities are involved, indices are more useful if sorting or range queries are desired.

B-tree indices are always maintained for domains; that is, an index contains entries for all of the entities in a domain, keyed by their name, so that sorting or lookup by entity name is quick. String comparisons are performed in the usual lexicographic fashion.

DeclareProperty: PROC [
relationName: ROPE, of: Domain, type: ValueType,
uniqueness: Uniqueness← None, version: Version← NewOrOld]
RETURNS [property: Attribute];

DeclareProperty provides a shorthand for definition of a binary relation between entities of the domain "of" and values of the specified type. The definitions of type and uniqueness are the same as for DeclareAttribute. A new relation relationName is created, and its attributes are given the names "of" and "is". The "is" attribute is returned, so that it can be used to represent the property in GetP and SetP defined in the next section.

3.4 Basic operations on entities and relationships

In this section, we describe the basic operations on entities and relationships; we defer the operations on domains and relations to the next section.

A number of error conditions are common to all of the procedures in this section. Since values are represented as REF ANYs, all type checking must currently be done at run-time. The procedures in this section indicate illegal arguments by generating the errors IllegalAttribute, IllegalDomain, IllegalRelation, IllegalValue, IllegalEntity, and IllegalRelship, according to the type of argument expected. The error NILArgument is generated if NIL is passed to any procedure that cannot accept NIL for that argument. The error NullifiedArgument is generated if an entity or relationship is passed in after it has been deleted or rendered invalid by transaction abort or close.

DeclareEntity: PROC[
d: Domain, name: ROPE← NIL, version: Version← NewOrOld]
RETURNS [e: Entity];

DeclareEntity finds or creates an entity in domain d with the given name. The name may be omitted if desired, in which case an entity with a unique name is automatically created. If version is OldOnly and an entity with the given name does not exist, NIL is returned. If version is NewOnly and an entity with the given name already exists, the signal NonUniqueEntityName is generated.

DeclareRelship: PROC [
r: Relation, avl: AttributeValueList← NIL, version: Version← NewOrOld]
RETURNS [Relship];

DeclareRelship finds or creates a relship in r with the given attribute values. If version is NewOnly, a new relship with the given attribute values is generated. If version is OldOnly, the relship in r with the given attribute values is returned if it exists, otherwise NIL is returned. If version is NewOrOld, the relship with the given attribute values is returned if it exists, otherwise one is created. If the creation of a new relship violates the key constraints specified by DeclareAttribute, the signal NonUniqueAttributeValue is generated.

DestroyEntity: PROC[e: Entity];

DestroyEntity removes e from its domain, destroys all relationships referencing it, and destroys the entity representative itself. Any client variables that reference the entity automatically take on the null value (Null[e] returns TRUE), and cause error NullifiedArgument if passed to database system procedures. After an entity is destroyed, its old name may be re-used in creating a new one.

DestroyRelship: PROC[t: Relship];

DestroyRelship removes t from its relation, and destroys it. Any client variables that reference the relationship automatically take on the null value, and will cause error NullifiedArgument if subsequently passed to database system procedures.

SetF: PROC[t: Relship, a: Attribute, v: Value];

SetF assigns the value v to attribute a of relationship t. If the value is not of the same type as the attribute (or a subtype thereof if the attribute is entity-valued), then the error MismatchedAttributeValueType is generated. If a is not an attribute of t's relation, IllegalAttribute is generated.

GetF: PROC[t: Relship, a: Attribute] RETURNS [Value];

GetF retrieves the value of attribute a of relationship t. If a is not an attribute of t's relation, error IllegalAttribute is generated. The client should use the V2x routines described in the next section to coerce the value into the expected type.

SetFS: PROC [t: Relship, a: Attribute, v: ROPE];
GetFS: PROC[t: Relship, a: Attribute] RETURNS [ROPE];

GetFS and SetFS provide a convenient veneer on top of GetF and SetF that provide the illusion that all relation attributes are string-valued. The effect is something like the Relational data model, and is useful for applications such as a relation displayer and editor that deal only with strings. The semantics of GetFS and SetFS depend upon the actual type of the value v of attribute a:

type GetFS returns SetFS assigns attribute to be

RopeType the string v the string v

IntType v as a decimal string the string as a decimal integer

BoolType "TRUE" or "FALSE" true if "TRUE", false if "FALSE"

a domain D the entity name the entity with name v

("" if v = NIL) (or NIL if v is null string)

AnyDomainType <domain-name>:<entity-name> the entity with given domain, name

(or NIL if v is null string)

The same signals generated by GetF and SetF, such as IllegalAttribute, can also be generated by these procedures. The string NIL represents the undefined value. The signal NotFound is generated in the last case above if no entity with the given name is found.

NameOf: PROC [e: Entity] RETURNS [s: ROPE];
ChangeName: PROC [e: Entity, s: ROPE];

NameOf and ChangeName retrieve or change the name of an entity, respectively. They generate the signal IllegalEntity if e is not an entity.

ChangeName should be used with caution. It is not quite equivalent to destroying and re-creating an entity with the new name but the same existing relationships referencing it. ChangeName is considerably faster than that, and furthermore entity-valued variables which reference the entity are not nullified by ChangeName, though they would be by DestroyEntity. These features should be a help, not a hindrance. However, changing an entity name may invalidate references to the entity from outside the segment, e.g. in another segment or in some application-maintained file such as a log of updates.

DomainOf: PROC[e: Entity] RETURNS [Domain];
RelationOf: PROC[t: Relship] RETURNS [Relation];

DomainOf and RelationOf can be used to find the entity representative of an entity's domain or a relationship's relation, respectively. The signal IllegalEntity is generated if e is not an entity. The signal IllegalRelship is generated if r is not a relationship.

SegmentOf: PROC[e: Entity] RETURNS [Segment];

SegmentOf returns the segment in which an entity is stored. It can be applied to domain, relation, or attribute entities.

Eq: PROC [e1: Entity, e2: Entity] RETURNS [BOOL];

Eq returns TRUE iff the same database entity is referenced by e1 and e2. This is not equivalent to the Cedar expression "e1 = e2", which computes Cedar REF equality. If e1 and e2 are in different segments, Eq returns true iff they have the same name and their domains have the same name.

Null: PROC [x: EntityOrRelship] RETURNS [BOOL];

Null returns TRUE iff its argument has been destroyed, is NIL, or has been invalidated by abortion of the transaction under which it was created.

GetP: PROC [e: Entity, aIs: Attribute, aOf: Attribute← NIL] RETURNS [Value];
SetP: PROC [e: Entity, aIs: Attribute, v: Value, aOf: Attribute← NIL] RETURNS[Relship];

GetP and SetP are convenience routines for a common use of relationships, to represent "properties" of entities. Properties allow the client to think of values stored in relationships referencing an entity as if they are directly accessible fields (or "properties") of the entity itself. See the figure on page 15 illustrating properties. GetP finds a relationship whose from attribute is equal to e, and returns that relationship's to attribute. The from attribute may be defaulted if the relation is binary, it is assumed to be the other attribute of to's relation. If it is not binary, the current implementation will find the "first" other attribute, where "first" is defined by the order of the original calls calls to DeclareAttribute. SetP defaults the from attribute similarly to GetP, but operates differently depending on whether from is a key of the relation. Whether it is a key or not, any previous relationship that referenced e in the from attribute is automatically deleted. In either case, a new relationship is created whose from attribute is e and whose to attribute is v. SetP returns the relationship it creates for the convenience of the client. GetP and SetP can generate the same errors as SetF and GetF, e.g. if e is not an entity or to is not an attribute. In addition, GetP and SetP can generate the error IllegalProperty if to and from are from different relations. GetP generates the error MismatchedPropertyCardinality if more than one relationship references e in the from attribute; if no such relationships exist, it returns a null value of the type of the to attribute. SetP allows any number of existing relationships referencing e; it simply adds another one (when from is a key, of course, there will always be one relationship).

GetPList: PROC [e: Entity, to: Attribute, from: Attribute← NIL] RETURNS [LIST OF Value];
SetPList: PROC [e: Entity, to: Attribute, vl: LIST OF Value, from: Attribute← NIL];

GetPList and SetPList are similar to GetP and SetP, but they assume that any number of relationships may reference the entity e with their from attribute. They generate the signal MismatchedPropertyCardinality if this is not true, i.e. the from attribute is a key. GetPList returns the list of values of the to attributes of the relationships that reference e with their from attribute. Cedar has LISP-like list manipulation facilities. SetPList destroys any existing relationships that reference e with their from attribute, and creates Length[vl] new ones, whose from attributes reference e and whose to attributes are the elements of vl. GetPList and SetPList may generate any of the errors that GetF and SetF may generate, and the error IllegalProperty if to and from are from different relations.

Note that the semantics of SetPList are not quite consistent with the semantics of SetP. SetPList replaces the current values associated with a "property" with the new values (i.e., destroys and re-creates relationships); SetP adds a new property value, unless the aOf attribute is a key, in which case it replaces the current value. The semantics are defined in this way because this has proven the most convenient in our application programs.

Examples of the use of the property procedures for data access can be found in Section 4.3. Properties are also useful for obtaining information about the data schema. For example, GetP[a, aRelationIs] will return the attribute a's relation, and GetPList[d, aTypeOf] will return all the attributes that can reference domain d.

E2V: PROC[e: Entity] RETURNS[v: Value];
B2V: PROC[b: BOOLEAN] RETURNS[v: Value];
I2V: PROC[i: LONG INTEGER] RETURNS[v: Value];
S2V: PROC[s: ROPE] RETURNS[v: Value];

The x2V routines convert the various Cedar types to Values. The conversion is not normally required for ropes and entities since the compiler will widen these into the REF ANY type Value.

V2E: PROC[v: Value] RETURNS[Entity];
V2B: PROC[v: Value] RETURNS[BOOLEAN];
V2I: PROC [v: Value] RETURNS[LONG INTEGER];
V2S: PROC [v: Value] RETURNS[ROPE];

The V2x routines convert Values to the various Cedar types. The MismatchedValueType error is raised if the value is of the wrong type. It is recommended that these routines be used rather than user-written NARROWs, as the representation of Values may change. Also, NARROWs of opaque types don't yet work in the Cedar compiler.

3.5 Query operations on domains and relations

In this section we describe queries upon domains and relations: operations that enumerate entities or relationships satisfying some constraint.

RelationSubset: PROC[
r: Relation, constraint: AttributeValueList← NIL]
RETURNS [RelshipSet];

NextRelship: PROC[rs: RelshipSet] RETURNS [Relship];

PrevRelship: PROC[rs: RelshipSet] RETURNS [Relship];

ReleaseRelshipSet: PROC [rs: RelshipSet];

AttributeValueList: TYPE = LIST OF AttributeValue;
AttributeValue: TYPE = RECORD [
attribute: Attribute,
low: Value,
high: Value← NIL -- omitted where same as low or not applicable --];

The basic query operation is RelationSubset. It returns a generator of all the relationships in relation r which satisfy a constraint list of attribute values. The relationships are enumerated by calling NextRelship repeatedly; it returns NIL when there are no more relationships. PrevRelship may similarly be called repeatedly to back the enumeration up, returning the previous relationship; it returns NIL if the enumeration is at the beginning. ReleaseRelshipSet should be called when the client is finished with the query.

The constraint list may be NIL, in which case all of the relationships in r will be enumerated. Otherwise, relationships which satisfy the concatenation of constraints on attributes in the list will be enumerated. If an index exists on some subset of the attributes, the relationships will be enumerated sorted on the concatenated values of those attributes. For a RopeType, IntType, or TimeType attribute a of r, the contraint list may contain a record of the form [a, b, c] where the attribute value must be greater or equal to b and less than or equal to c to satisfy the constraint. For any type of attribute, the list may contain a record of the form [a, b] where the value of the attribute must exactly equal b. The Cedar ROPE literals "" and "\377" may be used in queries as an infinitely large and infinitely small string, respectively. The signal MismatchedAttributeValueType is generated by RelationSubset if one of the low or high values in the list is of a different type than its corresponding attribute.

DomainSubset: PROC[
d: Domain,
lowName, highName: ROPE← NIL,
searchSubDomains: BOOL← TRUE,
searchSegment: Segment← NIL]
RETURNS [EntitySet];

NextEntity: PROC[EntitySet] RETURNS [Entity];

PrevEntity: PROC[EntitySet] RETURNS [Entity];

ReleaseEntitySet: PROC[EntitySet];

DomainSubset enumerates all the entities in a domain. If lowName and highName are NIL, the entire domain is enumerated, in no particular order. Otherwise, only those entities whose names are lexicographically greater than lowName and less than highName are enumerated, in lexicographic order. If searchSubDomains is TRUE, subdomains of d are also enumerated. Each subdomain is sorted separately. The searchSegment argument is currently only used if d is one of the system domains, e.g. the Domain domain. It is used to specify which segment to search.

Analogously to relation enumeration, NextEntity and PrevEntity may be used to enumerate the entities returned by DomainSubset, and ReleaseEntitySet should be called upon completion.

GetDomainRefAttributes: PROC [d: Domain] RETURNS [AttributeList];

This procedure returns a list of all attributes, of any relation defined in d's segment, which reference domain d or one of its superdomains. The list does not include AnyDomainType attributes, which can reference any domain. GetDomainRefAttributes is implemented via queries on the data schema. GetDomainRefAttributes is useful for application-independent tools; most specific applications can code-in the relevant attributes.

GetEntityRefAttributes: PROC [e: Entity] RETURNS [AttributeList];

This procedure returns a list of all attributes in which some existing relationship actually references e, including AnyDomainType attributes.

3.6 System domains and relations

In this section we describe what one might call the schema schema, the pre-defined system domains and relations which constitute the data schema for client-defined domains and relations. The typical database application writer may skip this section, since the schema declaration operations defined in Section 3.3 are adequate when the data schema is completely defined and known at the time a program is written. The system domains and relations we describe in this section are most useful for general-purpose tools (e.g. for displaying, querying, or dumping any database), where the tools must examine the data schema "on the fly".

As noted earlier, the permanent repository for data describing user-defined data in a database is the database's data schema, represented by schema entities and relationships. Schema entities are members of one of the pre-defined system domains: DomainDomain, RelationDomain, DatatypeDomain, and so on. Every client-defined domain, relation, or attribute contains a representative entity in these domains. Client-defined datatypes are not currently permitted, so the only entities in the DataType domain are the pre-defined IntType, RopeType, and BoolType.

The information about the client-defined domains and attributes are encoded by relationships in the database. Domains participate in the system relation dSubType, which encodes a domain type hierarchy:

dSubType: Relation;
dSubTypeOf: Attribute; -- the domain in this attribute is a super-type of
dSubTypeIs: Attribute; -- the domain in this attribute

The dSubType has one element per direct domain-subdomain relationship, it does not contain the transitive closure of that relation. However, it is guaranteed to contain no cycles. That is, the database system checks that there is no set of domains d1, d2, ... dN, N>1, such that d1 is a subtype of d2, d2 is a subtype of d3, and so on to dN, and d1=dN. The dSubType may define a lattice as opposed to a tree, i.e. the sSubType attribute is not a key of the relation.

The information about attributes is encoded as binary relations, one relation for each argument to the DeclareAttribute procedure defining properties of the attribute. The names are easy to remember; for each argument, e.g. Foo, we define the aFoo relation, with attributes aFooOf and aFooIs. The aFooIs attribute is the value of that argument, and the aFooOf attribute is [the entity representative of] the attribute it pertains to. Thus we have the following relations:

aRelation: PUBLIC READONLY Relation; -- Specifies attribute - relation correspondence:
-- [aRelationOf: KEY Attribute, aRelationIs: Relation]
aRelationOf: PUBLIC READONLY Attribute; -- attribute whose relation we are specifying
aRelationIs: PUBLIC READONLY Attribute; -- the relation of that attribute

aType: PUBLIC READONLY Relation; -- Specifies types of relation attributes:
-- [aTypeOf: KEY Attribute, aTypeIs: ValueType]
aTypeOf: PUBLIC READONLY Attribute; -- the attribute
aTypeIs: PUBLIC READONLY Attribute; -- domain or datatype of the attribute

aUniqueness: PUBLIC READONLY Relation; -- Specifies attribute value uniqueness:
-- [aUniquenessOf: KEY Attribute, aUniquenessIs: INT LOOPHOLE[Uniqueness]]
aUniquenessOf: PUBLIC READONLY Attribute; -- the attribute
aUniquenessIs: PUBLIC READONLY Attribute; -- INT for Uniqueness: 0=None, 1=Key, etc.

aLength: PUBLIC READONLY Relation; -- Specifies length of attributes:
-- [aLengthOf: KEY Attribute, aLengthIs: INT]
aLengthOf: PUBLIC READONLY Attribute; -- the attribute
aLengthIs: PUBLIC READONLY Attribute; -- INT corresponding to attribute's length

aLink: PUBLIC READONLY Relation; -- Specifies whether attribute is linked:
-- [aLinkOf: KEY Attribute, aLinkIs: INT]
aLinkOf: PUBLIC READONLY Attribute; -- the attribute
aLinkIs: PUBLIC READONLY Attribute; -- 0=unlinked, 1=linked, 2 =colocated

The final set of system relations pertain to index factors. Each index on a relation is defined to include one or more attributes of a relation. For each attribute in the index, there is an index factor entity. For each index, there is an index entity. Each index factor is associated with exactly one index and exactly one attribute. Indices may have many index factors, however, and an attribute may be associated with more than one index factor, since attributes may participate in multiple indices. The two relations pertaining to indices map indices on to their index factors, and index factors to the attributes they index:

ifIndex: PUBLIC READONLY Relation; -- Specifies the index factors for each index
-- [ifIndexOf: KEY IndexFactor, ifIndexIs: Index]
ifIndexOf: PUBLIC READONLY Attribute; -- the index factor
ifIndexIs: PUBLIC READONLY Attribute; -- index of the factor

ifAttribute: PUBLIC READONLY Relation; -- Specifies attribute index factor corresponds to
-- [ifAttributeOf: KEY IndexFactor, ifAttributeIs: Attribute]
ifAttributeOf: PUBLIC READONLY Attribute; -- the index factor
ifAttributeIs: PUBLIC READONLY Attribute; -- the attribute this factor represents

The relations on attributes, index factors, and domains can be queried with the RelationSubset or GetPList operations. For example, GetP[a, aRelationIs] returns the attribute a's relation. GetPList[r, aRelationOf] returns the relation r's attributes. RelationSubset[dSubType, LIST[[dSubTypeIs, d]]] will enumerate all the dSubType relationships in which d is the subtype.

As noted earlier, the data schema (attributes, relations, domains, indices, index factors, and relations pertaining to these) may only be read, not written by the database client. In order to ensure the consistency of the schema, it must be written indirectly through the schema definition procedures: DeclareDomain, DeclareRelation, DeclareAttribute, and DeclareSubType. Attempts to perform updates through operations such as SetP result in the error ImplicitSchemaUpdate.

3.7 Errors

When a database system operation invokes an error, the SIGNAL Error is generated, with an error code indicating the type of error that occured. The error code is a Cedar enumerated type:

Error: SIGNAL [code: ErrorCode];
ErrorCode: TYPE = {

AlreadyExists, -- Entity already exists and client said version=NewOnly
BadUserPassword, -- On an OpenTransaction
DatabaseNotInitialized, -- Attempt to do operation without calling Initialize
FileNotFound, -- No existing segment found with given name
IllegalAttribute, -- Attribute not of the given relship's Relation or not an attribute
IllegalValueType, -- Type passed DeclareAttribute is not datatype or domain
IllegalDomain, -- Argument is not actually a domain
IllegalFileName, -- No directory or machine given for segment
IllegalEntity, -- Argument to GetP, or etc., is not an Entity
IllegalRelship, -- Argument to GetF, or etc., is not a Relship
IllegalRelation, -- Argument is not a relation
IllegalSegment, -- Segment passed to DeclareDomain, or etc., not yet declared
IllegalString, -- Nulls not allowed in ROPEs passed to the database system
IllegalSuperType, -- Can't define subtype of domain that already has entities
IllegalValue, -- Value is not REF INT, ROPE, REF BOOL, or Entity
IllegalValueType, -- Type passed DeclareAttribute is not datatype or domain
ImplicitSchemaUpdate, -- Attempt to modify schema with SetP, DeclareEntity, etc.
InternalError, -- Impossible internal state (possibly bug or bad database)
MismatchedProperty, -- aOf and aIs attribute not from the same relation
MismatchedAttributeValueType, -- Value not same type as required (SetF)
MismatchedExistingAttribute, -- Existing attribute is different (DeclareAttribute)
MismatchedExistingSegment, -- Existing segment is different (DeclareSegment)
MismatchedPropertyCardinality, -- Did GetP with aOf that is not a Key
MismatchedSegment, -- Attempt to create ref across segment boundary (SetF)
MismatchedValueType, -- value passed V2E, V2I, etc. not of expected type
MultipleMatch, -- More than one relationship satisfied avl on DeclareRelship.
NonUniqueEntityName, -- Entity in domain with that name already exists
NonUniqueKeyValue, -- Relship already exists with that value
NotFound, -- Version is OldOnly but no such Entity, Relation, or etc found
NotImplemented, -- Action requested is not yet implemented
NILArgument, -- Attempt to perform operation on NIL argument
NullifiedArgument, -- Entity or relationship has been deleted or invalidated
ProtectionViolation, -- Read or write to segment not permitted this user.
SegmentNotDeclared, -- Attempt to open transaction w/o DeclareSegment
ServerNotFound -- File server does not exist or does not respond
};

In this report, the expression "generates the error X" means that the SIGNAL Error is generated with code=X. Unless otherwise specified, the client may CONTINUE from the signal, aborting the operation in question. Signals should not be RESUMEd except by a wizard who knows the result of proceeding with an illegal operation.

Two special signals are associated with the file system level:

Aborted: SIGNAL [trans: Transaction];
Failure: SIGNAL [why: ATOM, server: ROPE];

The Aborted signal can be generated by any database operation, and indicates that the transaction has been aborted. The client must call AbortTransaction and may then call OpenTransaction to proceed. The Failure signal is generated when a transaction cannot be open due to server failure or communication difficulties.

4. Application Example

APPLICATION EXAMPLE

This section provides a simple example of the use of Cypress. Section 4.1 introduces the example, a database of documents. Section 4.2 is a discussion of database design: the process of representing abstractions of real-world information structures in a database, somewhat specialized to the data structures available in Cypress. In Section 4.3, a working program is illustrated.

Our example is necessarily short; don't expect any startling revelations on these pages. We will try to consider some of the most common cases, however.

4.1 A database application

What are the properties of a well-designed database? To a large extent these properties follow from the general properties of databases. For instance, we would like our databases to extend gracefully as new types of information are added, since the existing data and programs are likely to be quite valuable.

It may be useful to consider the following point. The distinguishing aspect of information stored in a database system is that at least some of it is stored in a form that can be interpreted by the system itself, rather than only by some application-specific program. Hence, one important dimension of variation among different database designs is in the amount of the database that is system-interpretable, i.e. the kinds of queries that can be answered by the system.

As an example of variation in this dimension, consider the problem of designing a database for organizing a collection of Mesa modules. In the present Mesa environment, this database would need to include at least the names of all the definitions modules, program modules, configuration descriptions, and current .Bcd files. A database containing only this information is little more than a file directory, and therefore the system's power to answer queries about information in this database is very limited. A somewhat richer database might represent the DIRECTORY and IMPORTS sections of each module as relationships, so that queries such as "which modules import interface Y?" can be answered by the system. This might be elaborated further to deal with the use of individual types and procedures from a definitions module, and so on. There may be a limit beyond which it is useless to represent smaller objects in the database; if we aren't interested in answering queries like "what procedures in this module contain IF statements?", it may be attractive to represent the body of a procedure (or some smaller naming scope) as a text string that is not interpretable by the database system, even though it is stored in a database.

We shall illustrate design ideas with a database of information about documents. Our current facilities, which again are simply file directories, leave much to be desired. The title of a document on the printed page does not tell the reader where the document is stored or how to print a copy. Relationships between different versions of the same basic document are not explicit. Retrievals by content are impossible. Our goal here is not to solve all of these problems, but to start a design that has the potential of dealing with some of them.

4.2 Schema design

Each document in our example database has a title and a set of authors. Hence we might represent a collection of documents with a domain of entities whose name is the title of the document, and an author property specifying the authors:

Document: Domain = DeclareDomain["Domain"];
dAuthors: Property = DeclareProperty["author", Document, RopeType];

Here the authors' names are concatenated into a single string, using some punctuation scheme to allow the string to be decoded into the list of authors. This is a very poor database design because it does not allow the system to respond easily to queries involving authors; the system cannot parse the encoded author list.

Note that in the above definition authors are strings, so anything is acceptable as an author. This weak typing has some flexibility: the database will never complain that it doesn't know the author you just attached to a certain document. However, the system is not helpful in catching errors when a new document is added to the database. If "Mark R. Brown" is mistakenly spelled "Mark R. Browne", then one of Mark's papers will not be properly retrieved by a later search. A step in the direction of stronger type checking is to provide a separate domain for authors.

To represent authors as entities, and to allow a variable number number of authors for a document, a better design would be:

Document: Domain = DeclareDomain["Domain"];
Person: Domain = DeclareDomain["Person"];
author: Property = DeclareProperty["author", Document, Person];

Incidentally, in the last line above we define a property rather than relation for brevity. Instead of the author property declaration we could have written:

author: Relation = DeclareRelation["author"];
authorOf: Attribute = DeclareAttribute[author, "of", Document];
authorIs: Attribute = DeclareAttribute[author, "is", Person];

The property declaration has exactly the same effect on the database as the relation declaration, since it automatically declares an author relation with an "of" and "is" attribute. However, the relation is not available in a Cedar Mesa variable in the property case, so operations such as RelationSubset cannot be used. Therefore most non-trivial applications will generally not use DeclareProperty, but will still use operations such as SetP by using one of the relation's attributes. For example, if joe is a person entity and book is a document entity, one can write:

SetP[joe, authorOf, book]

or, equivalently,

SetP[book, authorIs, joe]

We now have one Document entity per document, plus, for each document, one author relationship per author of that document. Conversely, we have one Person entity per person, and one author relationship per document the person authored. Of course, these are one and the same author relationships referencing the Person and Document entities. Figure 4-1 illustrates a few such entities and relationships. Each author relationship points to its Document entity via the authorOf attribute, and to its Person entity via the authorIs attribute.

Frequently a database application requires some representation of sets or lists, for example to represent the people in an organization or steps in a procedure. Sets and lists are not primitives of the data model per se; sets and lists are normally represented as relations. For example, in our database the set of authors of a particular document is stored via a set of author relationships referencing the document. The operation

GetPList[book, authorOf]

could be used to retrieve this list for some particular book. If we wish to maintain an ordering on this set, e.g. so that the authors of a book are kept in some particular order for each book, we need to use some list representation. In our Cypress implementation GetPList (and RelationSubset) return relationships in the same order they were created by SetP, SetPList, or DeclareRelship, so that a client may maintain an order by the order of calls. A variant of SetF and SetP is under consideration that allows the client to specify where new relationships should be placed in the ordering of relationships referencing a particular entity. Another alternative, the one conventionally used in the Relational model, is to define another attribute to the relation specify position in the ordering:

authorOrder: Attribute = DeclareAttribute[author, "order", IntType];

Using an ordering attribute is usually a better solution than depending on the semantics of the Cypress implementation's ordering, as it makes the ordering explicit in the relation. In databases where a large number of relationships are expected to refer to the same entity, it is also more space efficient in our implementation.

If an authorOrder attribute is defined, the client may wish to redefine the authorOf attribute so that links (pointers) are not maintained between the Document entities and author relationships, instead defining a more space-efficient B-tree index on the [authorOf, authorOrder] pair:

authorOf: Attribute = DeclareAttribute[
relation: author, name: "of", type: Document, link: FALSE];

authorIndex: Index = DeclareIndex[author, LIST[[authorOf, authorOrder]]];

The Cypress implementation will use this index to process any call of the form

RelationSubset[author, LIST[[authorOf, x]]

This call to RelationSubset will therefore enumerate authors of document x sorted by authorOrder. Cypress will also use the index in processing for GetPList[..., authorOf] as GetPList uses RelationSubset.

This solution is also somewhat less than perfect, as it depends upon the fact that the Cypress implementation orders relationships when an index exist; but indices are not intended to change the semantics of the operations, only to improve performance. Probably the best solution, if the ordering is important to the semantics of a database application, is to represent a list by a binary "next" relation connecting the entities in an ordering.

Documents have other interesting properties. Some of these, for example the date on which the document was produced, are in one-to-one correspondence with documents. Such properties can be defined by specifying a relation or property as being keyed on the document:

publDate: Property = DeclareProperty["publDate", Document, RopeType, Key];

We are using the convention that domain names are capitalized and relation, attribute, and property names are not capitalized, both for the Cedar Mesa variable names and in the names used in the database system itself. If and when the database system is better integrated with Cedar Mesa, the Cedar and database names will be one and the same.

We might wish to include additional information for particular kinds of documents, for example conference papers. Conference papers may participate in the same relations as other documents. For example, they have authors. In addition, we may want to define relations in which only conference papers may participate, for example a presentation relation which defines who presented the paper, and where. We can define a conference paper to be a sub-domain of documents, and define relations which pertain specifically to conference papers:

ConferencePaper: Domain = DeclareDomain["ConferencePaper"];
... DeclareSubType[of: Document, is: ConferencePaper]; ...
Conference: Domain = DeclareDomain["Conference"];
presentation: Relation = DeclareRelation["presentation"];
presentationOf: Attribute = DeclareAttribute["of", presentation, ConferencePaper];
presentationAt: Attribute = DeclareAttribute["at", presentation, Conference];
presentationBy: Attribute = DeclareAttribute["by", presentation, Person];

Figure 4-2 illustrates a fragment of a database using this extended design.

The reader will note that we have defined our database schema in the functionally irreducible form described in Section 2.7: i.e., the relations have as few attributes as possible so as to represent atomic facts. This normalization is not necessary in the design of a schema, but often makes the database easier to understand and use, and avoids anomalies in data updates as a result of redundantly storing the same information. Note that the presentation relation is an example of a functionally irreducible relation that is not binary. It cannot be decomposed into smaller relations without losing information or introducing artificial entities to represent the presentations themselves.

What other information should be present in our document database? Subject keywords would certainly be useful. Since one document will generally have many associated keywords, we would introduce another relation, say docKeyword, to represent the new information. Should keywords be entities? Again there is a tradeoff, but the argument for entities seems persuasive: limiting the range of keywords increases the value of the database for retrieval. The keyword entities could also participate in relationships with a dictionary of synonyms, Computing Reviews categories, etc.

This is certainly not a complete design, and the reader is encouraged to fit his or her own ideas into the database framework.

CypressFig6.press leftMargin: 1 in, topMargin: 1 in, width: 2.125 in, height: .75 in

4.3 Example

The following program defines a small schema and database of persons and documents. It illustrates the use of most of the procedures defined in Section 3.

DocTestImpl: PROGRAM
IMPORTS DB, IO, Rope =

BEGIN OPEN IO, DB

tty: IO.Handle← CreateViewerStreams["VLTest1Impl.log"].out;

Person, Conference: Domain;
Thesis, ConferencePaper, Document: Domain;
author: Relation;
authorOf, authorIs: Attribute;
presentation: Relation;
presentationOf, presentationBy, presentationDate: Attribute;
publDate: Attribute;
rick, mark, nori: --Person-- Entity;
cedarPaper, cypressDoc, thesis: --Document-- Entity;

Initialize: PROC =
BEGIN
tty.PutF["Defining data dictionary...\n"];
-- Declare domains and make ConferencePapers and Theses be subtypes of Document:
Person← DeclareDomain["Person"];
Conference← DeclareDomain["Conference"];
Document← DeclareDomain["Document"];
ConferencePaper← DeclareDomain["ConferencePaper"];
Thesis← DeclareDomain["Thesis"];
DeclareSubType[of: Document, is: ConferencePaper];
DeclareSubType[of: Document, is: Thesis];
-- Declare publDate property of Document
publDate← DeclareProperty["publDate", Document, IntType];
-- Declare author relation between Persons and Documents
author← DeclareRelation["author"];
authorOf← DeclareAttribute[author, "of", Document];
authorIs← DeclareAttribute[author, "is", Person];
-- Declare presentation relation
presentation← DeclareRelation["presentation"];
presentationOf← DeclareAttribute[presentation, "of", Document];
presentationBy← DeclareAttribute[presentation, "by", Person];
presentationAt← DeclareAttribute[presentation, "at", Conference];
END;

InsertData: PROC =
BEGIN t: Relship;
tty.PutF["Inserting data...\n"];
cedarPaper← DeclareEntity[ConferencePaper, "The Cedar DBMS"];
cypressDoc← DeclareEntity[Document, "Cypress DB Concepts & Facilities"];
thesis← DeclareEntity[Thesis, "An Analysis of Priority Queues"];
sigmod← DeclareEntity[Conference, "SIGMOD 81"];
rick← DeclareEntity[Person, "Rick Cattell"];
mark← DeclareEntity[Person, "Mark Brown"];
-- Note we can create entity and then set name...
nori← DeclareEntity[Person];
ChangeName[nori, "Nori Suzuki"];
-- Data can be assigned with SetP, SetF, or DeclareRelship's initialization list:
t← DeclareRelship[presentation,, NewOnly];
SetF[t, presentationtOf, cedarPaper];
SetF[t, presentationBy, mark];
SetF[t, presentationAt, sigmod];
[]← SetPList[cypressDoc, authorIs, LIST[rick, mark]];
-- the Cedar notation LIST[ ... ] defines a list
[]← SetPList[cedarPaper, authorIs, LIST[rick, mark, nori]];
[]← DeclareRelship[author, LIST[[authorOf, thesis], [authorIs, mark]]];
[]← SetP[cypressDoc, publDate, I2V[1982]];
[]← SetP[thesis, publDate, I2V[1977]];
-- the I2V[...] calls needed because Cedar Mesa does not yet coerce INT to REF ANY
-- Check that thesis can't be presented at conference:
ok← FALSE;
t← DeclareRelship[presentation];
SetF[t, presentationOf, thesis
! MismatchedAttributeValueType => {ok ← TRUE; CONTINUE}];
IF NOT ok THEN ERROR;
END;

DestroySomeData: PROCEDURE =
-- Destroy one person entity and all frog entities
BEGIN flag: BOOL← FALSE;
tty.Put[char[CR], rope["Deleting Rick from database..."], char[CR]];
DestroyEntity[DeclareEntity[Person, "Frank Baz", OldOnly]];
DestroyDomain[Frog];
END;

PrintDocuments: PROC =
-- Use DomainSubset with no constraints to enumerate all Documents
BEGIN
doc: -- Document -- Entity;
authors: LIST OF Value;
es: EntitySet;
tty.PutF["Documents:\n\n"];
tty.PutF["Title authors\n"];
es← DomainSubset[Document];
WHILE (doc← NextEntity[es])#NIL DO
tty.PutF["%g ", rope[GetName[doc]]];
authors← GetPList[doc, authorIs];
FOR al: LIST OF Entity← NARROW[authors], al.rest UNTIL al=NIL DO
tty.PutF["%g ", rope[GetName[al.first]]]] ENDLOOP;
ENDLOOP;
ReleaseEntitySet[es];
END;

PrintPersonsPublications: PROC [pName: ROPE] =
-- Use RelationSubset to enumerate publications written by person
BEGIN
p: Person← DeclareEntity[Person, pName, OldOnly];
authorT: --author-- Relship;
rs: RelshipSet;
first: BOOL← TRUE;
IF p=NIL THEN
{tty.PutF["%g is not a person!", rope[pName]]; RETURN};
tty.PutF["Papers written by %g are:\n", rope[pName]];
rs← RelationSubset[author, LIST[[authorIs, p]]];
WHILE (authorT← NextRelship[rs])#NIL DO
IF first THEN first← FALSE ELSE tty.Put[rope[", "]];
tty.Put[rope[GetFS[authorRS, authorOf]]];
ENDLOOP;
tty.PutF["\n"];
ReleaseRelshipSet[rs];
END;

tty.Put[rope["Creating database..."], char[CR]];
Initialize[];
DeclareSegment["[Local]Test", $Test, 1,, NewOnly];
OpenTransaction[$Test];
Initialize[];
InsertData[];
PrintDocuments[];
PrintPersonsPublications["Mark Brown"];
DestroySomeData[];
PrintDocuments[];
CloseTransaction[TransactionOf[$Test]];
tty.Close[];

END.