CYPRESS DOCUMENTATION CYPRESS DOCUMENTATION CYPRESS DOCUMENTATION Cypress Database System Documentation R. G. G. Cattell Release as [Indigo]Documentation>CypressDoc.Tioga Written by R. G. G. Cattell, June 13, 1983 Last edited by Beach, February 9, 1984 5:40:26 pm PST by Jim Donahue, February 10, 1984 12:16:55 pm PST Abstract: This document describes the Cypress system, and is aimed at potential writers of database applications in the Cedar programming environment. It should be accurate as of the date above, and is recommended as better documentation than CSL Report 83-4, from which it was derived. Suggestions are welcomed on both the design of Cypress and its exposition in this document. We will assume little knowledge of database systems, and little knowledge of Cedar. We will not explain the motivation for this particular design of Cypress; see the CSL Report for that. You should also consult the documentation for database tools not described here; see a database wizard for details. XEROX Xerox Corporation Palo Alto Research Center 3333 Coyote Hill Road Palo Alto, California 94304 DRAFT  For Internal Xerox Use Only  DRAFT TABLE OF CONTENTS 1. Introduction 1 2. Cypress data model concepts 3 2.1 Data independence 2.2 Basic primitives 2.3 Names and keys 2.4 Basic operations 2.5 Aggregate operations 2.6 Convenience operations 2.7 Normalization 2.8 Segments 3. Model level interface 19 3.1 Types 3.2 Transactions and segments 3.3 Data schema definition 3.4 Basic operations 3.5 Query operations 3.6 System domains and relations 3.7 Errors 4. Application example 39 4.1 A database application 4.2 Schema design 4.3 Example program 1. Introduction INTRODUCTION INTRODUCTION Cypress was motivated by the need for database management facilities in Cedar. Its goals are (1) convenience, making it easy to build new database applications in Cedar, and (2) reasonable performance for a wide class of applications. If you are unsure whether an application you have in mind is appropriate, talk to someone who has built one. You should also consult others to see what auxiliary documentation and folklore is available. The conceptual data structure provided by a database system, analogous to the Mesa type system, is called a data model. The data model specifies how data may be structured and accessed; thus the data model is what you need to know in order to use the system. The Cypress data model described herein includes features that are accepted in some form in a number of models in the literature, particularly those that seemed most useful for database applications we envision in CSL. The data model is described conceptually in Section 2. In Section 3 we describe the Cedar interface to Cypress: DB.mesa. Finally, Section 4 provides a simple example of Cypress use. We cover only the abstract model and the DB interface, not its implementation or rationale. The reader interested in these should consult CSL Report 83-4. A set of formal axioms for the model can also be found there. Clients interested in tools for examining or modifying databases, either as an end-user or for debugging an application program, should consult the database tools documentation: [Indigo]Documentation>SquirrelDoc.tioga and .press, where ® refers to the current cedar release version number such as 5.1. 2. Cypress data model concepts CYPRESS DATA MODEL CONCEPTS CYPRESS DATA MODEL CONCEPTS In this section, we give an informal description of the Cypress data model. We describe the particulars of the Cedar interface in Section 3. 2.1 Data independence We deal here with the conceptual data model: the logical primitives for data access and data type definition. This should be carefully distinguished from the physical data storage and access mechansisms. The physical representation of data is hidden as much as possible from the database client to facilitate data independence, the guarantee that a user's program will continue to work (perhaps with a change in efficiency) even though the physical data representation is redesigned. For any particular database using our conceptual data model, the actual specification of the types of data in the database, using the primitives the model provides, is termed the data schema. Note that a mapping must be provided between the conceptual data model and the physical representation, either automatically or with further instruction from the client; we will do some of both. The logical to physical mapping is intimately associated with the performance of the database system as viewed by the user performing operations at the conceptual level. 2.2 Basic primitives Three basic primitives are defined in the model: an entity, datum, and relationship. An entity represents an abstract or concrete object in the world: a person, an organization, a document, a product, an event. In programming languages and knowledge representation entities have variously been referred to as atoms, symbols, and nodes. A datum, unlike an entity, represents literal information such as times, weights, part names, or phone numbers. Character strings and integers are possible datum types. It is a policy decision whether something is represented as an entity or merely a datum: e.g., an employee's spouse may be represented in a database system as a datum (the spouse's name), or the spouse may be an entity in itself. The database system provides a higher level of logical integrity checking for entities than for datum values, as we will see later: unique entity identifiers, checks on entity types, and removal of dependent data upon entity deletion. We shall discuss the entity/datum choice further in Section 4.2. We will use the term value to refer to something that can be either a datum or an entity. In many programming languages, there is no reason to distinguish entity values from datum values. Indeed, most of the Cypress operations deal with any kind of value, and some make it transparent to the caller whether an entity or datum value is involved. The transparent case makes relational operations possible in our model, as we will see in Section 2.5. A relationship is a tuple whose elements are [entity or datum] values. We refer to the elements (fields) of relationships by name instead of position. These names for positions are called attributes. Note that we have separated the representatives of unique objects (entities) from the representation of information about objects (relationships), unlike some object-oriented programming languages and data models. Therefore an entity is not an "object" (or "record") in the programming language sense, although entities are representatives of real-world objects. We also define entity types, datum types, and relationship types. These are called domains, datatypes, and relations, respectively. We make use of these three types through one fundamental type constraint: every relationship in a relation has the same attributes, and the values associated with each attribute must be from a pre-specified domain or datatype. One might think of a relation as a "record type" in a programming language, although relations permit more powerful operations than record types. As an example, consider a member relation that specifies that a given person is a member of a given organization with a given job title, as in the following figure. The person and organization might be entities, while the title might be a string datum. We relax the fundamental type constraint somewhat in allowing a lattice of types of domains: a particular value may then belong to the pre-specified domain or one of its sub-domains. For example, one could be a member of a University, a Company, or any other type of Organization one chooses to define. Other relations, e.g. an "offers-course" relation, might apply only to a University. CypressFig1.press leftMargin: 1 in, topMargin: 1 in, width: 2.125 in, height: .75 in Relationships represent facts about the world. A relation (a set of relationships) represents a kind of relationship in which entities and values can participate. We will schematically represent that John Smith is a member of Acme company with title "manager" by drawing lines for the relationship (labelled with the relation and its attributes), a circle for each entity, and a box for the datum: CypressFig2.press leftMargin: 1 in, topMargin: 1 in, width: 2.125 in, height: .75 in One should normally think of relations as types, not sets. A relation can be defined as the set of all relationships of its type, however, and thus can be treated as a relation in the normal mathematical sense. Note that we differ from mathematical convention in another minor point: the use of attributes to refer to tuple elements by name instead of position. Reference by position is therefore not necessary and is in fact not permitted. We can often omit attribute names without ambiguity since the types of participating entities imply their role in a relationship. However, they are necessary in the general case; e.g., a boss relation between a person and a person requires the directionality to define its semantics. We can summarize the six basic primitives of the data model in tabular form. Familiarity with these six terms is essential to understanding the remainder of this document: Type Instance Instance Represents Domain Entity physical or abstract object Datatype Datum numerical measurement or symbolic tag Relation Relationship logical correspondence among objects and values Our terminology might be more consistent if we called a domain an "entity type," and a relation a "relationship type." Instead we have compromised on the terms most widely used in the literature for all six of the basic concepts. The reader will find the remainder of this document much more understandable if these six terms are commited to memory before proceeding. 2.3 Names and keys The data model also provides primitives to specify uniqueness of relationships and entities. For relationships, this is done with keys, and for entities, with names. A key for a relation is an attribute or set of attributes whose value uniquely identifies a relationship in the relation. No two relationships may have the same values for a key attribute or attribute set (a relation may have more than one key, in which case relationships must have unique values for all keys). A name acts similarly for a domain. The name of an entity must uniquely specify it in its domain. Consider a spouse relation, a binary relation between a person and a person. Both of its attributes are keys, because each person may have only one spouse (if we choose to define spouse this way!). For an author relation, neither the person nor the document are keys, since a person may author more than one document and a document may have more than one author. The person and document attributes together would comprise a key, however: there should be only one tuple for a particular person-document pair. CypressFig3.press leftMargin: 1 in, topMargin: 1 in, width: 2.125 in, height: .75 in <==SubDirectory>SegmentName.segment". Each segment has a unique name, the name of a Cedar ATOM which is used to refer to it in Cypress operation. The name of the Cedar ATOM is normally, though not necessarily, the same as that of the file in which it is stored, except the extension ".segment" and the prefix specifying the location of the file is omitted in the ATOM. If the file is on the local file system, its name is preceded by "[Local]". For example, "[Local]Foo" refers to a segment file on the local disk named Foo.database; "[Alpine]Baz" refers to a segment named Baz.segment on the directory on the Alpine server. It is generally a bad idea to access database segments other than through the database interface. However, because segments are physically independent and contain no references to other files by file identifier or explicit addresses within files, the segment files may be moved from machine to machine or renamed without effect on their contents. If a segment file in a set of segments comprising a client database is deleted, the others may still be opened to produce a database missing only that segment's entities and relationships. A segment is defined by the operation DeclareSegment: DeclareSegment: PROC[ filePath: ROPE, segment: Segment, number: INT_ 0, readOnly: BOOL_ FALSE, version: Version_ OldOnly, nBytesInitial, nBytesPerExtent: LONG CARDINAL_ 32768] RETURNS [Segment]; Segment: TYPE = ATOM; Version: TYPE = {NewOnly, OldOnly, NewOrOld}; The version parameter to DeclareSegment defaults to OldOnly to open an existing file. The signal IllegalFileName is generated if the directory or machine name is missing from fileName, and FileNotFound is generated at the time a transaction is opened on the segment if the file does not exist. If version NewOnly is passed, a new segment file will be created, erasing any existing one. In this case, a number assigned to the segment by the database administrator must also be passed. This number hack is necessitated by our current implementation of segments (it specifies the section of the database address space in which to map this segment). Please bear with us. Finally, the client program can pass version=NewOrOld to open a new or existing segment file; in this case the segment number must also be passed, of course. The other parameters to DeclareSegment specify properties of the segment. If readOnly=TRUE, then writes are not permitted on the segment; any attempt to invoke a procedure which modifies data will generate the error ProtectionViolation. nBytesInitial is the initial size to assign to the segment, and nBytesPerExtent is the incremental increase in segment size used when more space is required for data in the file. For convenience, a call is available to return the list of segments that have been declared in the current Cypress session: GetSegments: PROC RETURNS[LIST OF Segment ]; A transaction is associated with a segment by using OpenTransaction: OpenTransaction: PROC[ segment: Segment, userName, password: ROPE_ NIL, useTrans: Transaction_ NIL ]; If useTrans is NIL then OpenTransaction establishes a new connection and transaction with the corresponding (local or remote) file system. Otherwise it uses the supplied transaction. The same transaction may be associated with more than one segment by calling OpenTransaction with the same useTrans argument for each. The given user name and password, or by default the logged in user, will be used if a new connection must be established. Any database operations upon data in a segment before a transaction is opened or after a transaction abort will invoke the Aborted signal. The client should catch this signal on a transaction abort, block any further database operations and wait for completion of any existing ones. Then the client may re-open the aborted transaction by calling OpenTransaction. When the remote transaction is successfully re-opened, the client's database operations may resume. Note that operations on data in segments under different transactions are independent. Normally there will be one transaction (and one or more segments) per database application program. A client may find what transaction has been associated with a particular segment by calling TransactionOf: PROC [segment: Segment] RETURNS [Transaction]; Transactions may be manipulated by the following procedures: MarkTransaction: PROC[trans: Transaction]; AbortTransaction: PROC [trans: Transaction]; CloseTransaction: PROC [trans: Transaction]; MarkTransaction commits the current database transaction, and immediately starts a new one. User variables which reference database entities or relationships are still valid. AbortTransaction aborts the current database transaction. The effect on the data in segments associated with the segment is as if the transactions had never been started, the state is as it was just after the OpenTransaction call or the most recent MarkTransaction call. Any attempts to use variables referencing data fetched under the transaction will invoke the NullifiedArgument error. A call to OpenTransaction is necessary to do more database operations, and all user variables referencing database items created or retrieved under the corresponding transaction must be re-initialized (they may reference entities or relationships that no longer exist, and in any case they are marked invalid by the database system). A simple client program using the database system might have the form, then: Initialize[]; DeclareSegment["[Local]Test", $Test]; OpenTransaction[$Test]; ... ... database operations, including zero or more MarkTransaction calls ... ... CloseTransaction[TransactionOf[$Test]]; 3.3 Data schema definition The definition of the client's data schema is done through calls to procedures defined in this section. The data schema is represented in a database as entities and relationships, and although updates to the schema must go through these procedures to check for illegal or inconsistent definitions, the schema can be read via the normal data operations described in the next section. Each domain, relation, etc., has an entity representative that is used in data operations which refer to that schema item. For example, we pass the domain entity when creating a new entity in the domain. The types of schema items are: Domain, Relation, Attribute, Datatype, Index, IndexFactor: TYPE = Entity; Of course, since the schema items are entities, they must also belong to domains; there are pre-defined domains, which we call system domains, in the interface for each type of schema entity: DomainDomain, RelationDomain, AttributeDomain, DatatypeDomain, IndexDomain: Domain; There are also pre-defined system relations, which contain information about sub-domains, attributes, and indices. Since these are not required by the typical (application-specific) database client, we defer the description of the system relations to Section 3.6. In general, any of the data schema may be extended or changed at any time; i.e., data operations and data schema definition may be intermixed. However, there are a few specific ordering constraints on schema definition we will note shortly. Also, the database system optimizes for better performance if the entire schema is defined before any data are entered. The interactive schema editing tool described in the database tools documentation allows the schema to be changed regardless of ordering constraints and existing data, by recreating schema items and copying data invisibly to the user when necessary. All the data schema definition operations take a Version parameter which specifies whether the schema element is a new or existing one. The version defaults to allowing either (NewOrOld): i.e., the existing entity is returned if it exists, otherwise it is created. This feature avoids separate application code for creating the database schema the first time the application program is run. DeclareDomain: PROC [name: ROPE, segment: Segment, version: Version_ NewOrOld, estRelations: INT_ 5] RETURNS [d: Domain]; DeclareSubType: PROC[sub, super: Domain]; DeclareDomain defines a domain with the given name in the given segment and returns its representative entity. If the domain already exists and version=NewOnly, the signal AlreadyExists is generated. If the domain does not already exist and version=OldOnly, then NIL is returned. The parameter estRelations is used to estimate the largest number of relations in which entities of this domain are expected to participate. The client may define one domain to be a subtype of another by calling DeclareSubType. This permits entities of the subdomain to participate in any relations in which entities of the superdomains may participate. All client DeclareSubType calls should be done before declaring relations on the superdomains (to allow some optimizations). The error MismatchedSegment is generated if the sub-domain and super-domain are not in the same segment. DeclareRelation: PROC [ name: ROPE, segment: Segment, version: Version_ NewOrOld] RETURNS [r: Relation]; DeclareAttribute: PROC [ r: Relation, name: ROPE, type: ValueType_ NIL, uniqueness: Uniqueness _ None, length: INT_ 0, link: {Linked, Unlinked, Colocated, Remote}_ yes, version: Version_ NewOrOld] RETURNS[a: Attribute]; Uniqueness: TYPE = {NonKey, Key, KeyPart, OptionalKey}; DeclareRelation defines a new or existing relation with the given name in the given segment and returns its representative entity. If the relation already exists and version=NewOnly, the signal AlreadyExists is generated. If the relation does not already exist and version=OldOnly, then NIL is returned. DeclareAttribute is called once for each attribute of the relation, to define their names, types, and uniqueness. If version=NewOrOld and the attribute already exists, Cypress checks that the new type, uniqueness, etc. match the existing attribute. The error MismatchedExistingAttribute is generated if there is a discrepancy. The attribute name need only be unique in the context of its relation, not over all attributes. Note this is the only exception to the data model's rule that names be unique in a domain. Also note that we could dispense with DeclareAttribute altogether by passing a list into the DeclareRelation operation; we define a separate procedure for programming convenience. The attribute type should be a ValueType, i.e. it may be one of the pre-defined types (IntType, RopeType, BoolType, AnyDomainType) or the entity representative for a domain. For pre-defined types, the actual values assigned to attributes of the relationship instances of the relation must have the corresponding type: REF INT, ROPE, REF BOOL, or Entity. If the attribute has a domain as type, the attribute values in relationships must be entities of that domain or some sub-domain thereof. The type is permitted to be one of the pre-defined system domains such as the DomainDomain, thereby allowing client-defined extensions to the data schema (for example, a comment for each domain describing its purpose). The attribute uniqueness indicates whether the attribute is a key of the relation. If its uniqueness is NonKey, then the attribute is not a key of the relation. If its uniqueness is OptionalKey, then the system will ensure that no two relationships in r have the same value for this attribute (if a value has been assigned). The error NonUniqueKeyValue is generated if a non-unique key value results from a call to the SetP, SetF, SetFS, or CreateRelship procedures we define later. Key acts the same as OptionalKey, except that in addition to requiring that no two relationships in r have the same value for the attribute, it requires that every entity in the domain referenced by this attribute must be referenced by a relationship in the relation: the relationships in the relation and the entities in the domain are in one-to-one correspondence. Finally, if an attribute's uniqueness is KeyPart, then the system will ensure that no two relationships in r have the same value for all key attributes of r, though two may have the same values for some subset of them. The length and link arguments to DeclareAttribute have no functional effect on the attribute, but are hints to the database system implementation. For RopeType fields, length characters will be allocated for the string within the space allocated for a relationship in the database. There is no upper limit on the size of a string-valued attribute; if it is longer than length, it will be stored separately from the relationship with no visible effect except for the performance of database applications. The link field is used only for entity-valued fields; it suggests whether the database system should link together relationships which reference an entity in this attribute. In addition, it can suggest that the relationships referencing an entity in this attribute be physically co-located as well as linked. Again, its logical effect is only upon performance, not upon the legal operations. DestroyRelation: PROC[r: Relation]; DestroyDomain: PROC[d: Domain]; DestroySubType: PROC[sub, super: Domain]; Relations, domains, and subdomain relationships may be destroyed by calls to the above procedures. Destroying a relation destroys all of it relationships. Destroying a domain destroys all of its entities and also any relationships which reference those entities. Destroying a sub-domain relationship has no effect on existing domains or their entities; it simply makes entities of domain sub no longer eligible to participate in the relations in which entities of domain super can participate. Existing relationships violating the new type structure are allowed to remain. Existing relations and domains may only be modified by destroying them with the procedures above, with one exception: the operation ChangeName (described in Section 3.4) may be used to change the name of a relation or domain. DeclareIndex: PROC [ relation: Relation, indexedAttributes: AttributeList, version: Version]; DeclareIndex has no logical effect on the database; it is a performance hint, telling the database system to create a B-Tree index on the given relation for the given indexedAttributes. The index will be used to process queries more efficiently. Each index key consists of the concatenated values of the indexedAttributes in the relationship the index key references. For entity-valued attributes, the value used in the key is the string name of the entity. The version parameter may be used as in other schema definition procedures, to indicate a new or existing index. If any of the attributes are not attributes of the given relation then the signal IllegalIndex is generated. The optimal use of indices, links, and colocation, as defined by DeclareIndex and DeclareAttribute, is complex. It may be necessary to do some space and time analysis of a database application to choose the best trade-off, and a better trade-off may later be found as a result of unanticipated access patterns. Note, however, that a database may be rebuilt with different links, colocation, or indices, and thanks to the data independence our interface provides, existing programs will continue to work without change. If a relation is expected to be very small (less than 100 relationships), then it might reasonably be defined with neither links nor indices on its attributes. In the typical case of a larger relation, one should examine the typical access paths: links are most appropriate if relationships that pertain to particular entities are involved, indices are more useful if sorting or range queries are desired. B-tree indices are always maintained for domains; that is, an index contains entries for all of the entities in a domain, keyed by their name, so that sorting or lookup by entity name is quick. String comparisons are performed in the usual lexicographic fashion. DeclareProperty: PROC [ relationName: ROPE, of: Domain, type: ValueType, uniqueness: Uniqueness_ None, version: Version_ NewOrOld] RETURNS [property: Attribute]; DeclareProperty provides a shorthand for definition of a binary relation between entities of the domain "of" and values of the specified type. The definitions of type and uniqueness are the same as for DeclareAttribute. A new relation relationName is created, and its attributes are given the names "of" and "is". The "is" attribute is returned, so that it can be used to represent the property in GetP and SetP defined in the next section. 3.4 Basic operations on entities and relationships In this section, we describe the basic operations on entities and relationships; we defer the operations on domains and relations to the next section. A number of error conditions are common to all of the procedures in this section. Since values are represented as REF ANYs, all type checking must currently be done at run-time. The procedures in this section indicate illegal arguments by generating the errors IllegalAttribute, IllegalDomain, IllegalRelation, IllegalValue, IllegalEntity, and IllegalRelship, according to the type of argument expected. The error NILArgument is generated if NIL is passed to any procedure that cannot accept NIL for that argument. The error NullifiedArgument is generated if an entity or relationship is passed in after it has been deleted or rendered invalid by transaction abort or close. DeclareEntity: PROC[ d: Domain, name: ROPE_ NIL, version: Version_ NewOrOld] RETURNS [e: Entity]; DeclareEntity finds or creates an entity in domain d with the given name. The name may be omitted if desired, in which case an entity with a unique name is automatically created. If version is OldOnly and an entity with the given name does not exist, NIL is returned. If version is NewOnly and an entity with the given name already exists, the signal NonUniqueEntityName is generated. DeclareRelship: PROC [ r: Relation, avl: AttributeValueList_ NIL, version: Version_ NewOrOld] RETURNS [Relship]; DeclareRelship finds or creates a relship in r with the given attribute values. If version is NewOnly, a new relship with the given attribute values is generated. If version is OldOnly, the relship in r with the given attribute values is returned if it exists, otherwise NIL is returned. If version is NewOrOld, the relship with the given attribute values is returned if it exists, otherwise one is created. If the creation of a new relship violates the key constraints specified by DeclareAttribute, the signal NonUniqueAttributeValue is generated. DestroyEntity: PROC[e: Entity]; DestroyEntity removes e from its domain, destroys all relationships referencing it, and destroys the entity representative itself. Any client variables that reference the entity automatically take on the null value (Null[e] returns TRUE), and cause error NullifiedArgument if passed to database system procedures. After an entity is destroyed, its old name may be re-used in creating a new one. DestroyRelship: PROC[t: Relship]; DestroyRelship removes t from its relation, and destroys it. Any client variables that reference the relationship automatically take on the null value, and will cause error NullifiedArgument if subsequently passed to database system procedures. SetF: PROC[t: Relship, a: Attribute, v: Value]; SetF assigns the value v to attribute a of relationship t. If the value is not of the same type as the attribute (or a subtype thereof if the attribute is entity-valued), then the error MismatchedAttributeValueType is generated. If a is not an attribute of t's relation, IllegalAttribute is generated. GetF: PROC[t: Relship, a: Attribute] RETURNS [Value]; GetF retrieves the value of attribute a of relationship t. If a is not an attribute of t's relation, error IllegalAttribute is generated. The client should use the V2x routines described in the next section to coerce the value into the expected type. SetFS: PROC [t: Relship, a: Attribute, v: ROPE]; GetFS: PROC[t: Relship, a: Attribute] RETURNS [ROPE]; GetFS and SetFS provide a convenient veneer on top of GetF and SetF that provide the illusion that all relation attributes are string-valued. The effect is something like the Relational data model, and is useful for applications such as a relation displayer and editor that deal only with strings. The semantics of GetFS and SetFS depend upon the actual type of the value v of attribute a: type GetFS returns SetFS assigns attribute to be RopeType the string v the string v IntType v as a decimal string the string as a decimal integer BoolType "TRUE" or "FALSE" true if "TRUE", false if "FALSE" a domain D the entity name the entity with name v ("" if v = NIL) (or NIL if v is null string) AnyDomainType : the entity with given domain, name (or NIL if v is null string) The same signals generated by GetF and SetF, such as IllegalAttribute, can also be generated by these procedures. The string NIL represents the undefined value. The signal NotFound is generated in the last case above if no entity with the given name is found. NameOf: PROC [e: Entity] RETURNS [s: ROPE]; ChangeName: PROC [e: Entity, s: ROPE]; NameOf and ChangeName retrieve or change the name of an entity, respectively. They generate the signal IllegalEntity if e is not an entity. ChangeName should be used with caution. It is not quite equivalent to destroying and re-creating an entity with the new name but the same existing relationships referencing it. ChangeName is considerably faster than that, and furthermore entity-valued variables which reference the entity are not nullified by ChangeName, though they would be by DestroyEntity. These features should be a help, not a hindrance. However, changing an entity name may invalidate references to the entity from outside the segment, e.g. in another segment or in some application-maintained file such as a log of updates. DomainOf: PROC[e: Entity] RETURNS [Domain]; RelationOf: PROC[t: Relship] RETURNS [Relation]; DomainOf and RelationOf can be used to find the entity representative of an entity's domain or a relationship's relation, respectively. The signal IllegalEntity is generated if e is not an entity. The signal IllegalRelship is generated if r is not a relationship. SegmentOf: PROC[e: Entity] RETURNS [Segment]; SegmentOf returns the segment in which an entity is stored. It can be applied to domain, relation, or attribute entities. Eq: PROC [e1: Entity, e2: Entity] RETURNS [BOOL]; Eq returns TRUE iff the same database entity is referenced by e1 and e2. This is not equivalent to the Cedar expression "e1 = e2", which computes Cedar REF equality. If e1 and e2 are in different segments, Eq returns true iff they have the same name and their domains have the same name. Null: PROC [x: EntityOrRelship] RETURNS [BOOL]; Null returns TRUE iff its argument has been destroyed, is NIL, or has been invalidated by abortion of the transaction under which it was created. GetP: PROC [e: Entity, aIs: Attribute, aOf: Attribute_ NIL] RETURNS [Value]; SetP: PROC [e: Entity, aIs: Attribute, v: Value, aOf: Attribute_ NIL] RETURNS[Relship]; GetP and SetP are convenience routines for a common use of relationships, to represent "properties" of entities. Properties allow the client to think of values stored in relationships referencing an entity as if they are directly accessible fields (or "properties") of the entity itself. See the figure on page 15 illustrating properties. GetP finds a relationship whose from attribute is equal to e, and returns that relationship's to attribute. The from attribute may be defaulted if the relation is binary, it is assumed to be the other attribute of to's relation. If it is not binary, the current implementation will find the "first" other attribute, where "first" is defined by the order of the original calls calls to DeclareAttribute. SetP defaults the from attribute similarly to GetP, but operates differently depending on whether from is a key of the relation. Whether it is a key or not, any previous relationship that referenced e in the from attribute is automatically deleted. In either case, a new relationship is created whose from attribute is e and whose to attribute is v. SetP returns the relationship it creates for the convenience of the client. GetP and SetP can generate the same errors as SetF and GetF, e.g. if e is not an entity or to is not an attribute. In addition, GetP and SetP can generate the error IllegalProperty if to and from are from different relations. GetP generates the error MismatchedPropertyCardinality if more than one relationship references e in the from attribute; if no such relationships exist, it returns a null value of the type of the to attribute. SetP allows any number of existing relationships referencing e; it simply adds another one (when from is a key, of course, there will always be one relationship). GetPList: PROC [e: Entity, to: Attribute, from: Attribute_ NIL] RETURNS [LIST OF Value]; SetPList: PROC [e: Entity, to: Attribute, vl: LIST OF Value, from: Attribute_ NIL]; GetPList and SetPList are similar to GetP and SetP, but they assume that any number of relationships may reference the entity e with their from attribute. They generate the signal MismatchedPropertyCardinality if this is not true, i.e. the from attribute is a key. GetPList returns the list of values of the to attributes of the relationships that reference e with their from attribute. Cedar has LISP-like list manipulation facilities. SetPList destroys any existing relationships that reference e with their from attribute, and creates Length[vl] new ones, whose from attributes reference e and whose to attributes are the elements of vl. GetPList and SetPList may generate any of the errors that GetF and SetF may generate, and the error IllegalProperty if to and from are from different relations. Note that the semantics of SetPList are not quite consistent with the semantics of SetP. SetPList replaces the current values associated with a "property" with the new values (i.e., destroys and re-creates relationships); SetP adds a new property value, unless the aOf attribute is a key, in which case it replaces the current value. The semantics are defined in this way because this has proven the most convenient in our application programs. Examples of the use of the property procedures for data access can be found in Section 4.3. Properties are also useful for obtaining information about the data schema. For example, GetP[a, aRelationIs] will return the attribute a's relation, and GetPList[d, aTypeOf] will return all the attributes that can reference domain d. E2V: PROC[e: Entity] RETURNS[v: Value]; B2V: PROC[b: BOOLEAN] RETURNS[v: Value]; I2V: PROC[i: LONG INTEGER] RETURNS[v: Value]; S2V: PROC[s: ROPE] RETURNS[v: Value]; The x2V routines convert the various Cedar types to Values. The conversion is not normally required for ropes and entities since the compiler will widen these into the REF ANY type Value. V2E: PROC[v: Value] RETURNS[Entity]; V2B: PROC[v: Value] RETURNS[BOOLEAN]; V2I: PROC [v: Value] RETURNS[LONG INTEGER]; V2S: PROC [v: Value] RETURNS[ROPE]; The V2x routines convert Values to the various Cedar types. The MismatchedValueType error is raised if the value is of the wrong type. It is recommended that these routines be used rather than user-written NARROWs, as the representation of Values may change. Also, NARROWs of opaque types don't yet work in the Cedar compiler. 3.5 Query operations on domains and relations In this section we describe queries upon domains and relations: operations that enumerate entities or relationships satisfying some constraint. RelationSubset: PROC[ r: Relation, constraint: AttributeValueList_ NIL] RETURNS [RelshipSet]; NextRelship: PROC[rs: RelshipSet] RETURNS [Relship]; PrevRelship: PROC[rs: RelshipSet] RETURNS [Relship]; ReleaseRelshipSet: PROC [rs: RelshipSet]; AttributeValueList: TYPE = LIST OF AttributeValue; AttributeValue: TYPE = RECORD [ attribute: Attribute, low: Value, high: Value_ NIL -- omitted where same as low or not applicable --]; The basic query operation is RelationSubset. It returns a generator of all the relationships in relation r which satisfy a constraint list of attribute values. The relationships are enumerated by calling NextRelship repeatedly; it returns NIL when there are no more relationships. PrevRelship may similarly be called repeatedly to back the enumeration up, returning the previous relationship; it returns NIL if the enumeration is at the beginning. ReleaseRelshipSet should be called when the client is finished with the query. The constraint list may be NIL, in which case all of the relationships in r will be enumerated. Otherwise, relationships which satisfy the concatenation of constraints on attributes in the list will be enumerated. If an index exists on some subset of the attributes, the relationships will be enumerated sorted on the concatenated values of those attributes. For a RopeType, IntType, or TimeType attribute a of r, the contraint list may contain a record of the form [a, b, c] where the attribute value must be greater or equal to b and less than or equal to c to satisfy the constraint. For any type of attribute, the list may contain a record of the form [a, b] where the value of the attribute must exactly equal b. The Cedar ROPE literals "" and "\377" may be used in queries as an infinitely large and infinitely small string, respectively. The signal MismatchedAttributeValueType is generated by RelationSubset if one of the low or high values in the list is of a different type than its corresponding attribute. DomainSubset: PROC[ d: Domain, lowName, highName: ROPE_ NIL, searchSubDomains: BOOL_ TRUE, searchSegment: Segment_ NIL] RETURNS [EntitySet]; NextEntity: PROC[EntitySet] RETURNS [Entity]; PrevEntity: PROC[EntitySet] RETURNS [Entity]; ReleaseEntitySet: PROC[EntitySet]; DomainSubset enumerates all the entities in a domain. If lowName and highName are NIL, the entire domain is enumerated, in no particular order. Otherwise, only those entities whose names are lexicographically greater than lowName and less than highName are enumerated, in lexicographic order. If searchSubDomains is TRUE, subdomains of d are also enumerated. Each subdomain is sorted separately. The searchSegment argument is currently only used if d is one of the system domains, e.g. the Domain domain. It is used to specify which segment to search. Analogously to relation enumeration, NextEntity and PrevEntity may be used to enumerate the entities returned by DomainSubset, and ReleaseEntitySet should be called upon completion. GetDomainRefAttributes: PROC [d: Domain] RETURNS [AttributeList]; This procedure returns a list of all attributes, of any relation defined in d's segment, which reference domain d or one of its superdomains. The list does not include AnyDomainType attributes, which can reference any domain. GetDomainRefAttributes is implemented via queries on the data schema. GetDomainRefAttributes is useful for application-independent tools; most specific applications can code-in the relevant attributes. GetEntityRefAttributes: PROC [e: Entity] RETURNS [AttributeList]; This procedure returns a list of all attributes in which some existing relationship actually references e, including AnyDomainType attributes. 3.6 System domains and relations In this section we describe what one might call the schema schema, the pre-defined system domains and relations which constitute the data schema for client-defined domains and relations. The typical database application writer may skip this section, since the schema declaration operations defined in Section 3.3 are adequate when the data schema is completely defined and known at the time a program is written. The system domains and relations we describe in this section are most useful for general-purpose tools (e.g. for displaying, querying, or dumping any database), where the tools must examine the data schema "on the fly". As noted earlier, the permanent repository for data describing user-defined data in a database is the database's data schema, represented by schema entities and relationships. Schema entities are members of one of the pre-defined system domains: DomainDomain, RelationDomain, DatatypeDomain, and so on. Every client-defined domain, relation, or attribute contains a representative entity in these domains. Client-defined datatypes are not currently permitted, so the only entities in the DataType domain are the pre-defined IntType, RopeType, and BoolType. The information about the client-defined domains and attributes are encoded by relationships in the database. Domains participate in the system relation dSubType, which encodes a domain type hierarchy: dSubType: Relation; dSubTypeOf: Attribute; -- the domain in this attribute is a super-type of dSubTypeIs: Attribute; -- the domain in this attribute The dSubType has one element per direct domain-subdomain relationship, it does not contain the transitive closure of that relation. However, it is guaranteed to contain no cycles. That is, the database system checks that there is no set of domains d1, d2, ... dN, N>1, such that d1 is a subtype of d2, d2 is a subtype of d3, and so on to dN, and d1=dN. The dSubType may define a lattice as opposed to a tree, i.e. the sSubType attribute is not a key of the relation. The information about attributes is encoded as binary relations, one relation for each argument to the DeclareAttribute procedure defining properties of the attribute. The names are easy to remember; for each argument, e.g. Foo, we define the aFoo relation, with attributes aFooOf and aFooIs. The aFooIs attribute is the value of that argument, and the aFooOf attribute is [the entity representative of] the attribute it pertains to. Thus we have the following relations: aRelation: PUBLIC READONLY Relation; -- Specifies attribute - relation correspondence: -- [aRelationOf: KEY Attribute, aRelationIs: Relation] aRelationOf: PUBLIC READONLY Attribute; -- attribute whose relation we are specifying aRelationIs: PUBLIC READONLY Attribute; -- the relation of that attribute aType: PUBLIC READONLY Relation; -- Specifies types of relation attributes: -- [aTypeOf: KEY Attribute, aTypeIs: ValueType] aTypeOf: PUBLIC READONLY Attribute; -- the attribute aTypeIs: PUBLIC READONLY Attribute; -- domain or datatype of the attribute aUniqueness: PUBLIC READONLY Relation; -- Specifies attribute value uniqueness: -- [aUniquenessOf: KEY Attribute, aUniquenessIs: INT LOOPHOLE[Uniqueness]] aUniquenessOf: PUBLIC READONLY Attribute; -- the attribute aUniquenessIs: PUBLIC READONLY Attribute; -- INT for Uniqueness: 0=None, 1=Key, etc. aLength: PUBLIC READONLY Relation; -- Specifies length of attributes: -- [aLengthOf: KEY Attribute, aLengthIs: INT] aLengthOf: PUBLIC READONLY Attribute; -- the attribute aLengthIs: PUBLIC READONLY Attribute; -- INT corresponding to attribute's length aLink: PUBLIC READONLY Relation; -- Specifies whether attribute is linked: -- [aLinkOf: KEY Attribute, aLinkIs: INT] aLinkOf: PUBLIC READONLY Attribute; -- the attribute aLinkIs: PUBLIC READONLY Attribute; -- 0=unlinked, 1=linked, 2 =colocated The final set of system relations pertain to index factors. Each index on a relation is defined to include one or more attributes of a relation. For each attribute in the index, there is an index factor entity. For each index, there is an index entity. Each index factor is associated with exactly one index and exactly one attribute. Indices may have many index factors, however, and an attribute may be associated with more than one index factor, since attributes may participate in multiple indices. The two relations pertaining to indices map indices on to their index factors, and index factors to the attributes they index: ifIndex: PUBLIC READONLY Relation; -- Specifies the index factors for each index -- [ifIndexOf: KEY IndexFactor, ifIndexIs: Index] ifIndexOf: PUBLIC READONLY Attribute; -- the index factor ifIndexIs: PUBLIC READONLY Attribute; -- index of the factor ifAttribute: PUBLIC READONLY Relation; -- Specifies attribute index factor corresponds to -- [ifAttributeOf: KEY IndexFactor, ifAttributeIs: Attribute] ifAttributeOf: PUBLIC READONLY Attribute; -- the index factor ifAttributeIs: PUBLIC READONLY Attribute; -- the attribute this factor represents The relations on attributes, index factors, and domains can be queried with the RelationSubset or GetPList operations. For example, GetP[a, aRelationIs] returns the attribute a's relation. GetPList[r, aRelationOf] returns the relation r's attributes. RelationSubset[dSubType, LIST[[dSubTypeIs, d]]] will enumerate all the dSubType relationships in which d is the subtype. As noted earlier, the data schema (attributes, relations, domains, indices, index factors, and relations pertaining to these) may only be read, not written by the database client. In order to ensure the consistency of the schema, it must be written indirectly through the schema definition procedures: DeclareDomain, DeclareRelation, DeclareAttribute, and DeclareSubType. Attempts to perform updates through operations such as SetP result in the error ImplicitSchemaUpdate. 3.7 Errors When a database system operation invokes an error, the SIGNAL Error is generated, with an error code indicating the type of error that occured. The error code is a Cedar enumerated type: Error: SIGNAL [code: ErrorCode]; ErrorCode: TYPE = { AlreadyExists, -- Entity already exists and client said version=NewOnly BadUserPassword, -- On an OpenTransaction DatabaseNotInitialized, -- Attempt to do operation without calling Initialize FileNotFound, -- No existing segment found with given name IllegalAttribute, -- Attribute not of the given relship's Relation or not an attribute IllegalValueType, -- Type passed DeclareAttribute is not datatype or domain IllegalDomain, -- Argument is not actually a domain IllegalFileName, -- No directory or machine given for segment IllegalEntity, -- Argument to GetP, or etc., is not an Entity IllegalRelship, -- Argument to GetF, or etc., is not a Relship IllegalRelation, -- Argument is not a relation IllegalSegment, -- Segment passed to DeclareDomain, or etc., not yet declared IllegalString, -- Nulls not allowed in ROPEs passed to the database system IllegalSuperType, -- Can't define subtype of domain that already has entities IllegalValue, -- Value is not REF INT, ROPE, REF BOOL, or Entity IllegalValueType, -- Type passed DeclareAttribute is not datatype or domain ImplicitSchemaUpdate, -- Attempt to modify schema with SetP, DeclareEntity, etc. InternalError, -- Impossible internal state (possibly bug or bad database) MismatchedProperty, -- aOf and aIs attribute not from the same relation MismatchedAttributeValueType, -- Value not same type as required (SetF) MismatchedExistingAttribute, -- Existing attribute is different (DeclareAttribute) MismatchedExistingSegment, -- Existing segment is different (DeclareSegment) MismatchedPropertyCardinality, -- Did GetP with aOf that is not a Key MismatchedSegment, -- Attempt to create ref across segment boundary (SetF) MismatchedValueType, -- value passed V2E, V2I, etc. not of expected type MultipleMatch, -- More than one relationship satisfied avl on DeclareRelship. NonUniqueEntityName, -- Entity in domain with that name already exists NonUniqueKeyValue, -- Relship already exists with that value NotFound, -- Version is OldOnly but no such Entity, Relation, or etc found NotImplemented, -- Action requested is not yet implemented NILArgument, -- Attempt to perform operation on NIL argument NullifiedArgument, -- Entity or relationship has been deleted or invalidated ProtectionViolation, -- Read or write to segment not permitted this user. SegmentNotDeclared, -- Attempt to open transaction w/o DeclareSegment ServerNotFound -- File server does not exist or does not respond }; In this report, the expression "generates the error X" means that the SIGNAL Error is generated with code=X. Unless otherwise specified, the client may CONTINUE from the signal, aborting the operation in question. Signals should not be RESUMEd except by a wizard who knows the result of proceeding with an illegal operation. Two special signals are associated with the file system level: Aborted: SIGNAL [trans: Transaction]; Failure: SIGNAL [why: ATOM, server: ROPE]; The Aborted signal can be generated by any database operation, and indicates that the transaction has been aborted. The client must call AbortTransaction and may then call OpenTransaction to proceed. The Failure signal is generated when a transaction cannot be open due to server failure or communication difficulties. 4. Application Example APPLICATION EXAMPLE APPLICATION EXAMPLE This section provides a simple example of the use of Cypress. Section 4.1 introduces the example, a database of documents. Section 4.2 is a discussion of database design: the process of representing abstractions of real-world information structures in a database, somewhat specialized to the data structures available in Cypress. In Section 4.3, a working program is illustrated. Our example is necessarily short; don't expect any startling revelations on these pages. We will try to consider some of the most common cases, however. 4.1 A database application What are the properties of a well-designed database? To a large extent these properties follow from the general properties of databases. For instance, we would like our databases to extend gracefully as new types of information are added, since the existing data and programs are likely to be quite valuable. It may be useful to consider the following point. The distinguishing aspect of information stored in a database system is that at least some of it is stored in a form that can be interpreted by the system itself, rather than only by some application-specific program. Hence, one important dimension of variation among different database designs is in the amount of the database that is system-interpretable, i.e. the kinds of queries that can be answered by the system. As an example of variation in this dimension, consider the problem of designing a database for organizing a collection of Mesa modules. In the present Mesa environment, this database would need to include at least the names of all the definitions modules, program modules, configuration descriptions, and current .Bcd files. A database containing only this information is little more than a file directory, and therefore the system's power to answer queries about information in this database is very limited. A somewhat richer database might represent the DIRECTORY and IMPORTS sections of each module as relationships, so that queries such as "which modules import interface Y?" can be answered by the system. This might be elaborated further to deal with the use of individual types and procedures from a definitions module, and so on. There may be a limit beyond which it is useless to represent smaller objects in the database; if we aren't interested in answering queries like "what procedures in this module contain IF statements?", it may be attractive to represent the body of a procedure (or some smaller naming scope) as a text string that is not interpretable by the database system, even though it is stored in a database. We shall illustrate design ideas with a database of information about documents. Our current facilities, which again are simply file directories, leave much to be desired. The title of a document on the printed page does not tell the reader where the document is stored or how to print a copy. Relationships between different versions of the same basic document are not explicit. Retrievals by content are impossible. Our goal here is not to solve all of these problems, but to start a design that has the potential of dealing with some of them. 4.2 Schema design Each document in our example database has a title and a set of authors. Hence we might represent a collection of documents with a domain of entities whose name is the title of the document, and an author property specifying the authors: Document: Domain = DeclareDomain["Domain"]; dAuthors: Property = DeclareProperty["author", Document, RopeType]; Here the authors' names are concatenated into a single string, using some punctuation scheme to allow the string to be decoded into the list of authors. This is a very poor database design because it does not allow the system to respond easily to queries involving authors; the system cannot parse the encoded author list. Note that in the above definition authors are strings, so anything is acceptable as an author. This weak typing has some flexibility: the database will never complain that it doesn't know the author you just attached to a certain document. However, the system is not helpful in catching errors when a new document is added to the database. If "Mark R. Brown" is mistakenly spelled "Mark R. Browne", then one of Mark's papers will not be properly retrieved by a later search. A step in the direction of stronger type checking is to provide a separate domain for authors. To represent authors as entities, and to allow a variable number number of authors for a document, a better design would be: Document: Domain = DeclareDomain["Domain"]; Person: Domain = DeclareDomain["Person"]; author: Property = DeclareProperty["author", Document, Person]; Incidentally, in the last line above we define a property rather than relation for brevity. Instead of the author property declaration we could have written: author: Relation = DeclareRelation["author"]; authorOf: Attribute = DeclareAttribute[author, "of", Document]; authorIs: Attribute = DeclareAttribute[author, "is", Person]; The property declaration has exactly the same effect on the database as the relation declaration, since it automatically declares an author relation with an "of" and "is" attribute. However, the relation is not available in a Cedar Mesa variable in the property case, so operations such as RelationSubset cannot be used. Therefore most non-trivial applications will generally not use DeclareProperty, but will still use operations such as SetP by using one of the relation's attributes. For example, if joe is a person entity and book is a document entity, one can write: SetP[joe, authorOf, book] or, equivalently, SetP[book, authorIs, joe] We now have one Document entity per document, plus, for each document, one author relationship per author of that document. Conversely, we have one Person entity per person, and one author relationship per document the person authored. Of course, these are one and the same author relationships referencing the Person and Document entities. Figure 4-1 illustrates a few such entities and relationships. Each author relationship points to its Document entity via the authorOf attribute, and to its Person entity via the authorIs attribute. Frequently a database application requires some representation of sets or lists, for example to represent the people in an organization or steps in a procedure. Sets and lists are not primitives of the data model per se; sets and lists are normally represented as relations. For example, in our database the set of authors of a particular document is stored via a set of author relationships referencing the document. The operation GetPList[book, authorOf] could be used to retrieve this list for some particular book. If we wish to maintain an ordering on this set, e.g. so that the authors of a book are kept in some particular order for each book, we need to use some list representation. In our Cypress implementation GetPList (and RelationSubset) return relationships in the same order they were created by SetP, SetPList, or DeclareRelship, so that a client may maintain an order by the order of calls. A variant of SetF and SetP is under consideration that allows the client to specify where new relationships should be placed in the ordering of relationships referencing a particular entity. Another alternative, the one conventionally used in the Relational model, is to define another attribute to the relation specify position in the ordering: authorOrder: Attribute = DeclareAttribute[author, "order", IntType]; Using an ordering attribute is usually a better solution than depending on the semantics of the Cypress implementation's ordering, as it makes the ordering explicit in the relation. In databases where a large number of relationships are expected to refer to the same entity, it is also more space efficient in our implementation. If an authorOrder attribute is defined, the client may wish to redefine the authorOf attribute so that links (pointers) are not maintained between the Document entities and author relationships, instead defining a more space-efficient B-tree index on the [authorOf, authorOrder] pair: authorOf: Attribute = DeclareAttribute[ relation: author, name: "of", type: Document, link: FALSE]; authorIndex: Index = DeclareIndex[author, LIST[[authorOf, authorOrder]]]; The Cypress implementation will use this index to process any call of the form RelationSubset[author, LIST[[authorOf, x]] This call to RelationSubset will therefore enumerate authors of document x sorted by authorOrder. Cypress will also use the index in processing for GetPList[..., authorOf] as GetPList uses RelationSubset. This solution is also somewhat less than perfect, as it depends upon the fact that the Cypress implementation orders relationships when an index exist; but indices are not intended to change the semantics of the operations, only to improve performance. Probably the best solution, if the ordering is important to the semantics of a database application, is to represent a list by a binary "next" relation connecting the entities in an ordering. Documents have other interesting properties. Some of these, for example the date on which the document was produced, are in one-to-one correspondence with documents. Such properties can be defined by specifying a relation or property as being keyed on the document: publDate: Property = DeclareProperty["publDate", Document, RopeType, Key]; We are using the convention that domain names are capitalized and relation, attribute, and property names are not capitalized, both for the Cedar Mesa variable names and in the names used in the database system itself. If and when the database system is better integrated with Cedar Mesa, the Cedar and database names will be one and the same. We might wish to include additional information for particular kinds of documents, for example conference papers. Conference papers may participate in the same relations as other documents. For example, they have authors. In addition, we may want to define relations in which only conference papers may participate, for example a presentation relation which defines who presented the paper, and where. We can define a conference paper to be a sub-domain of documents, and define relations which pertain specifically to conference papers: ConferencePaper: Domain = DeclareDomain["ConferencePaper"]; ... DeclareSubType[of: Document, is: ConferencePaper]; ... Conference: Domain = DeclareDomain["Conference"]; presentation: Relation = DeclareRelation["presentation"]; presentationOf: Attribute = DeclareAttribute["of", presentation, ConferencePaper]; presentationAt: Attribute = DeclareAttribute["at", presentation, Conference]; presentationBy: Attribute = DeclareAttribute["by", presentation, Person]; Figure 4-2 illustrates a fragment of a database using this extended design. The reader will note that we have defined our database schema in the functionally irreducible form described in Section 2.7: i.e., the relations have as few attributes as possible so as to represent atomic facts. This normalization is not necessary in the design of a schema, but often makes the database easier to understand and use, and avoids anomalies in data updates as a result of redundantly storing the same information. Note that the presentation relation is an example of a functionally irreducible relation that is not binary. It cannot be decomposed into smaller relations without losing information or introducing artificial entities to represent the presentations themselves. What other information should be present in our document database? Subject keywords would certainly be useful. Since one document will generally have many associated keywords, we would introduce another relation, say docKeyword, to represent the new information. Should keywords be entities? Again there is a tradeoff, but the argument for entities seems persuasive: limiting the range of keywords increases the value of the database for retrieval. The keyword entities could also participate in relationships with a dictionary of synonyms, Computing Reviews categories, etc. This is certainly not a complete design, and the reader is encouraged to fit his or her own ideas into the database framework. CypressFig6.press leftMargin: 1 in, topMargin: 1 in, width: 2.125 in, height: .75 in 4.3 Example The following program defines a small schema and database of persons and documents. It illustrates the use of most of the procedures defined in Section 3. DocTestImpl: PROGRAM IMPORTS DB, IO, Rope = BEGIN OPEN IO, DB tty: IO.Handle_ CreateViewerStreams["VLTest1Impl.log"].out; Person, Conference: Domain; Thesis, ConferencePaper, Document: Domain; author: Relation; authorOf, authorIs: Attribute; presentation: Relation; presentationOf, presentationBy, presentationDate: Attribute; publDate: Attribute; rick, mark, nori: --Person-- Entity; cedarPaper, cypressDoc, thesis: --Document-- Entity; Initialize: PROC = BEGIN tty.PutF["Defining data dictionary...\n"]; -- Declare domains and make ConferencePapers and Theses be subtypes of Document: Person_ DeclareDomain["Person"]; Conference_ DeclareDomain["Conference"]; Document_ DeclareDomain["Document"]; ConferencePaper_ DeclareDomain["ConferencePaper"]; Thesis_ DeclareDomain["Thesis"]; DeclareSubType[of: Document, is: ConferencePaper]; DeclareSubType[of: Document, is: Thesis]; -- Declare publDate property of Document publDate_ DeclareProperty["publDate", Document, IntType]; -- Declare author relation between Persons and Documents author_ DeclareRelation["author"]; authorOf_ DeclareAttribute[author, "of", Document]; authorIs_ DeclareAttribute[author, "is", Person]; -- Declare presentation relation presentation_ DeclareRelation["presentation"]; presentationOf_ DeclareAttribute[presentation, "of", Document]; presentationBy_ DeclareAttribute[presentation, "by", Person]; presentationAt_ DeclareAttribute[presentation, "at", Conference]; END; InsertData: PROC = BEGIN t: Relship; tty.PutF["Inserting data...\n"]; cedarPaper_ DeclareEntity[ConferencePaper, "The Cedar DBMS"]; cypressDoc_ DeclareEntity[Document, "Cypress DB Concepts & Facilities"]; thesis_ DeclareEntity[Thesis, "An Analysis of Priority Queues"]; sigmod_ DeclareEntity[Conference, "SIGMOD 81"]; rick_ DeclareEntity[Person, "Rick Cattell"]; mark_ DeclareEntity[Person, "Mark Brown"]; -- Note we can create entity and then set name... nori_ DeclareEntity[Person]; ChangeName[nori, "Nori Suzuki"]; -- Data can be assigned with SetP, SetF, or DeclareRelship's initialization list: t_ DeclareRelship[presentation,, NewOnly]; SetF[t, presentationtOf, cedarPaper]; SetF[t, presentationBy, mark]; SetF[t, presentationAt, sigmod]; []_ SetPList[cypressDoc, authorIs, LIST[rick, mark]]; -- the Cedar notation LIST[ ... ] defines a list []_ SetPList[cedarPaper, authorIs, LIST[rick, mark, nori]]; []_ DeclareRelship[author, LIST[[authorOf, thesis], [authorIs, mark]]]; []_ SetP[cypressDoc, publDate, I2V[1982]]; []_ SetP[thesis, publDate, I2V[1977]]; -- the I2V[...] calls needed because Cedar Mesa does not yet coerce INT to REF ANY -- Check that thesis can't be presented at conference: ok_ FALSE; t_ DeclareRelship[presentation]; SetF[t, presentationOf, thesis ! MismatchedAttributeValueType => {ok _ TRUE; CONTINUE}]; IF NOT ok THEN ERROR; END; DestroySomeData: PROCEDURE = -- Destroy one person entity and all frog entities BEGIN flag: BOOL_ FALSE; tty.Put[char[CR], rope["Deleting Rick from database..."], char[CR]]; DestroyEntity[DeclareEntity[Person, "Frank Baz", OldOnly]]; DestroyDomain[Frog]; END; PrintDocuments: PROC = -- Use DomainSubset with no constraints to enumerate all Documents BEGIN doc: -- Document -- Entity; authors: LIST OF Value; es: EntitySet; tty.PutF["Documents:\n\n"]; tty.PutF["Title authors\n"]; es_ DomainSubset[Document]; WHILE (doc_ NextEntity[es])#NIL DO tty.PutF["%g ", rope[GetName[doc]]]; authors_ GetPList[doc, authorIs]; FOR al: LIST OF Entity_ NARROW[authors], al.rest UNTIL al=NIL DO tty.PutF["%g ", rope[GetName[al.first]]]] ENDLOOP; ENDLOOP; ReleaseEntitySet[es]; END; PrintPersonsPublications: PROC [pName: ROPE] = -- Use RelationSubset to enumerate publications written by person BEGIN p: Person_ DeclareEntity[Person, pName, OldOnly]; authorT: --author-- Relship; rs: RelshipSet; first: BOOL_ TRUE; IF p=NIL THEN {tty.PutF["%g is not a person!", rope[pName]]; RETURN}; tty.PutF["Papers written by %g are:\n", rope[pName]]; rs_ RelationSubset[author, LIST[[authorIs, p]]]; WHILE (authorT_ NextRelship[rs])#NIL DO IF first THEN first_ FALSE ELSE tty.Put[rope[", "]]; tty.Put[rope[GetFS[authorRS, authorOf]]]; ENDLOOP; tty.PutF["\n"]; ReleaseRelshipSet[rs]; END; tty.Put[rope["Creating database..."], char[CR]]; Initialize[]; DeclareSegment["[Local]Test", $Test, 1,, NewOnly]; OpenTransaction[$Test]; Initialize[]; InsertData[]; PrintDocuments[]; PrintPersonsPublications["Mark Brown"]; DestroySomeData[]; PrintDocuments[]; CloseTransaction[TransactionOf[$Test]]; tty.Close[]; END.