Design and Implementation of aRelationship-Entity-Datum Data ModelR. G. G. CattellCSL-83-4 April 1983 [P83-00004]c Copyright Xerox Corporation 1983. All rights reserved.Abstract: The Model level of the Cypress Database Management System, built upon theearlier Cedar DBMS, provides data description and access capabilities at a higher level ofabstraction than the existing system and other conventional DBMS's. In this report wedescribe the design of the Cypress data model and discuss issues in the efficientimplementation of such a model. Cypress incorporates features motivated by experience withlocal database applications. It may be viewed as an integration of a number of existing datamodels; we present the criteria that led to this choice. The Cypress primitives include simpledata values such as strings or integers, entities representing real or abstract objects, andrelationships among entities and/or simple data values. We also provide mechanisms for ahierarchy of types, relational keys, and segmentation of databases into independent files.Cypress allows a conventional relational query language. We argue that our extensions tosimpler data models allow a more powerful and efficient implementation, and we describe theoptimizations Cypress performs. We also discuss some preliminary experience with usertools and applications developed in conjunction with Cypress.CR categories: H.2.1, H.2.2, H.4.1Key words and phrases: Cedar, Cypress, database systems, semantic model, data modelXEROXXerox CorporationPalo Alto Research Center3333 Coyote Hill RoadPalo Alto, California 94304 ^ep [$ V!qX QqrF*s Lts8 E q uF B!9 @.( >m%&7 <8V :1, 7$; 51+ 3g; 12S .9 ,N *L (`= q u quv6 w*+(uX*+*+ *+V T?QR"*`^a TABLE OF CONTENTS1.Introduction11.1Data modelling1.2Cypress and Cedar1.3Design criteria and motivation2.Cypress data model concepts52.1Data independence2.2Basic primitives2.3Names and keys2.4Basic operations2.5Aggregate operations2.6Convenience operations2.7Normalization2.8Segments2.9Augments2.10Views2.11Summary3.Model level interface273.1Types3.2Transactions and segments3.3Data schema definition3.4Basic operations3.5Query operations3.6System domains and relations3.7Errors4. Application example474.1A database application4.2Schema design4.3Example program5. Data model design issues575.1Relations and attributes5.2Entities and domains5.3Entities as relationships5.4Relationships as entities5.5Lists and sets5.6Entity names5.7Keys, normalization, and dependencies5.8Generalization and type hierarchies5.9Access primitives5.10Views, segments, and augments5.11Summary!B^yiZx 8@X! X W! U! S<8@P! O`! M! LX! J! IP! G! FH! D! C@! A! ?d8@= ! ;! :! 8! 6! 5x! 3! 18@/D `- ` ,< `)8@'! & ! $! #! !}! ! t! %! #l! ! d! j Jk6.Data model implementation issues736.1Storage level structures6.2Entities and relationships6.3Values6.4Caching the data schema6.5Links and colocation6.6Surrogate relations6.7Basic operations6.8Aggregate operations6.9Summary7.Database environment and applications877.1Database environment7.2Database tools7.3Database applications7.4Summary8.Status and conclusions1018.1Summary8.2Some results8.3Status and plans8.4Summary9.Annotated bibliography10510.Appendix and index113_xX8@\! [:! Y! X2! V! U*! S! R"! P! NF%8@K! Jj! H! Gb! E 8@B! A.! ?! >&! ;8@9w8@90!'11. Introduction1.1 Data modellingA data model is a scheme for describing the types of data that may be stored in a database: howthese data may be structured, and how they may be accessed. Any particular database consists ofthe data themselves plus a data schema that describes the types of data in terms of the datadescription primitives of the data model. The three data models most widely used in the past arethe network, hierarchical, and relational models. A good summary of these models can be found inComputing Surveys, March 1976.In recent years, data models with more sophisticated representation schemes have been proposed,often called semantic data models. Unfortunately, this more recent work is voluminous and difficultto understand: their terminology is mutually conflicting, the models are complex, and related workwas done concurrently by a number of authors. Furthermore, almost none of the models wereactually implemented. Related problems of data structure specification have also been addressedextensively in the programming language and artificial intelligence literature, as abstract data typemechanisms and knowledge representation languages. A survey of the literature is beyond thescope of this document. An annotated bibliography is included in the last section, however, andTsichritzis & Lochovsky [1982] summarize much of the recent data modelling work.The data model described herein is the result of analyzing the strengths and weakness of a numberof proposed models, and integrates a variety of viewpoints. It includes only those features that areaccepted in some form in a number of models, or proved particularly useful for our databaseapplications. We will call our model the Cypress data model. It might alternately be called theRelationship-Entity-Datum model, after its three basic primitives.The Cypress data model is described in Section 2. Sections 3 and 4 provide a description of theprogrammer's interface to our implementation, and an example of its use. Section 5 contrasts theCypress data model to others in the literature, and describes how we arrived at this particularintegration of data modelling ideas. Section 6 describes issues in the implementation of the Cypressdata model; as none of the models we reference have actually been implemented, this is animportant result of the work. To complete our description of the Cypress work, Section 7 coversexperience with database access tools and applications built upon the system. Sections 8 and 9provide a summary and annotated bibliography. An appendix of formal axioms for the model andan index of important terms are also included.For the sake of clean exposition, Section 2 covers only the abstract model, not its implementation orHYfp \Tq Ur Qps pP OO Mrs p6 K>a I X Fsp BpO @~ sp2 >JR <!9 9/1 7[ 5xD 3C2. 0P ,pH *\ ( P &O)8 #B pY <% C [Y 'Y !? Y S . 3pK =^DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL2rationale. The reader interested only in the basic ideas may read about the data model in Section 2and about the applications it enables in Section 7, without loss of continuity. A potential client ofCypress should read Sections 2, 3, and 4.1.2 Cypress and CedarThe motivation for design and implementation of a database system in the Computer ScienceLaboratory arose from the needs of anticipated and existing database applications running in theCedar Programming Environment. The design of a data model was deferred to the second phase ofthe earlier Cedar Database Managment System (DBMS) project to allow a choice of model aftersome experience with our needs. With the second phase of development, the name of the CedarDBMS was changed to Cypress, to distinguish it from the Cedar DBMS which it replaced, theCedar Programming Environment in which it provides database facilities, and the Cedar MesaProgramming Language in which it is written.The original Cedar DBMS, described by Brown, Cattell, and Suzuki [1980], implemented tupleswhose elements are integers, strings, and references to other tuples. It provides a large virtualmemory of tuples which can be moved or deleted safely, optional B-trees for indexing of tuples, andconcurrent transaction-based access to data over a network. It was built in three levels: the Cache,Storage, and Tuple Levels. The latter (top) level provides primitive query mechanisms, and doeslittle or no checking of the integrity of data type, form, uniqueness, or referents; the addition of theModel Level introduces types to the essentially typeless underlying system. Initial clients of the Cedar DBMS included:1.The CSL Notebook, a database of memoranda from laboratory members,2.PDB, a personal database including notes, bibliographies, addresses, phone numbers, etc.,3.A database of components of large Mesa (Mitchell et al [1976]) systems and their inter-dependencies,4.A database of Mesa source programs decomposed to the level of procedures, types, andvariables,5.A database of data and events in an automated office system.Most of these applications built their own data model on top of the existing system, to enable thedata structures and access mechanisms they required. In some cases, building on top as opposed tovfptF ptpt ptptpXtFpt Mp _!C \R Zf) Sr Op1( MQ K> IP= G%7 D= BO @7, <\p"9 :'b 7M 5+; 3;% 1U44 .K=Op *+ &p`B "sp`Y p`?` Ap` J` p`< pT T  & M L>Q\INTRODUCTION3integrating the data model with the existing system (an option not available to them) led todifficulties with the performance or integrity of the total system. This is not surprising, even thoughthe original system was designed to allow several data modelling choices at a future date; the choiceof data model must influence the choices at all levels of the system, if adequate performance is to beobtained.The addition of a data model to the database system makes possible the development of general-purpose tools. The more data semantics encoded in the database, the more the database system andits associated tools may provide without specific knowledge of an application's semantics. In orderto print out data in a meaningful form, for example, a tool must know which data comprise namesfor objects. In order to abbreviate and/or check the type of input data, a tool must know whattypes of data may be related to what others in what ways. In order to efficiently representpotentially circular pointer structures linearly, a tool needs a specification of allowed datarelationships. And so on.The introduction of a data model also simplifies sharing a single database among multipleapplications. A hierarchy of types allows different applications to have differing perspectives on thesame objects. The logical integrity checks help protect the applications from one another. Moresophisticated data modelling mechanisms allow applications different independent views of the dataor different physical environments (files) for their data. Sharing a database between applications isimportant to avoid redundant, inconsistent representation of the same information, to mutuallybenefit from multiple data input sources, and, most importantly, to present one simple data accessand manipulation mechanism to computer users. This contrasts, for instance, to three separatepersonnel databases (and access tools) for applications dealing with phone numbers, electronic mail,and laboratory bibliographies. Finally, the introduction of the Model Level data model is important to future plans for the nextlevel of the Cypress DBMS, the Query Level. At the Query Level, a database query language willbe implemented to allow access and/or updates to individual data items or data aggregates, in aconcise form amenable to optimization and decoupled operation in a database server on a computernetwork. The Model Level provides the functions to enumerate the database objects satisfying thequeries parsed by the Query Level.In summary, the development of the Model Level of the Cypress DBMS is a follow-on to the Cache,Storage, and Tuple Levels, a prerequisite of the Query Level, and motivated by:1.the perceived needs of, or shortcomings for, future and existing database applications,2.facilitation of shared databases for partial or complete integration of multiple applications,%fpt HYp _B \L ZK Xx9- U R"pR O': M;) KN IP S G\ DQ Bl >pP  <\D# :'8) 7] 5C# 3U 1UK /!O ,?% *r$p &T $aE "--2 W ] I" mp_ O p`W p`^ N g>Q[DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL43.the need for more data semantics to enable useful database utilities, e.g. a browser, and4.the need for a query language for database access. 1.3 Design criteria and motivationHow do we choose a data model? Why should one data model be better than another? In mostcases we can find a representational isomorphism between two models, such that we canautomatically map between the two equivalent representation schemes. However, the models maystill differ in the data access primitives they provide and the ease with which a user can understandand operate within the model. The arguments concerning ease of understanding are generally quitesubjective, however, and will be avoided here except where these arguments are generally well-accepted in the literature.The choices in the development of the Cypress Data Model were made on the basis of two criteria:simplicity and utility. Simplicity means ease of understanding and also representational parsimony,i.e., avoiding two or more mechanisms to represent the same semantics. Utility means the degree towhich the data model avoids writing application-specific code for some function or integrity check;every feature need not be of use to all applications, though the feature should be of negligible costto those that do not require it.Simplicity and utility can of course be mutually conflicting; the result must be a balanceincorporating the features deemed important for most applications, in the simplest form discovered.To make the presentation less confusing, we separate the discussion of the model itself in thesections to follow from the analysis of the simplicity and utility of its features in Section 5 and thediscussion of its implementation in Section 6.vfptF ptpt ptptpXtFpt Mp ^`Y Zp`2.qp Tr" P4pC M< KD I?& Gb<% E-N B >p%; 3 /pA -zE +E^ )P &. M&O?QA52. Cypress data model conceptsIn this section, we give an informal description of the Cypress data model. The evaluation andjustification of the model have been deferred to Section 5. A more formal description of the modelhas been deferred to a future paper (some axioms can be found in the appendix). 2.1 Data independenceWe deal here with the conceptual data model, the logical primitives for data access and data typedefinition. This should be carefully distinguished from the physical data model, the mechanisms weare given for actual storage and access of data. The physical model in the Cypress implementation correspondsto the Storage level. The physical data model is hidden as much as possible from the database client tofacilitate data independence, the guarantee that a user's program will continue to work (perhaps witha change in efficiency) even though the physical data representation is redesigned. For any particular database using the given conceptual and physical models, the actual specificationsof this database using the primitives the models provide are termed the conceptual data schema andphysical data schema. Note that a mapping must be provided between the conceptual and physicallevels, either automatically or with further instruction from the client; we will do some of both.The logical to physical mapping is intimately associated with the performance of the databasesystem as viewed by the user performing operations at the conceptual level. Performance isgenerally not a criteria for choosing between conceptual data models, unless there is no known wayto map the differing conceptual views into the same or equally efficient implementations.2.2 Basic primitivesThree basic primitives are defined in the model: an entity, datum, and relationship.An entity represents an abstract or concrete object in the world: a person, an organization, adocument, a product, an event. In programming languages and knowledge representation entitieshave variously been referred to as atoms, symbols, and nodes. A datum, unlike an entity, representsliteral information such as times, weights, part names, or phone numbers. Character strings andintegers are possible datum types.It is a policy decision whether something is represented as an entity or merely a datum: e.g., anemployee's spouse may be represented in a database system as a datum (the spouse's name), or thespouse may be an entity in itself. The database system provides a higher level of logical integritychecking for entities than for datum values, as we will see later: unique entity identifiers, checks on<\pTsqPpPNc-6KO4pEKrAopsp )?:=sp ='t>:pP8 sp,6"S4p2F6 sp0%#sp-s pK+b)t8%'@L% X "Yrp5s ps ppA<"); XE "pspSD Zdh . \>]DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL6entity types, and removal of dependent data upon entity deletion. We shall discuss theentity/datum choice further in Section 4.2.We will use the term value to refer to something that can be either a datum or an entity. In manyprogramming languages, there is no reason to distinguish entity values from datum values. Indeed,most of the Cypress operations deal with any kind of value, and some make it transparent to thecaller whether an entity or datum value is involved. The transparent case makes Relationaloperations possible in our model, as we will see in Section 2.5.A relationship is a tuple whose elements are [entity or datum] values. We refer to the elements(fields) of relationships by name instead of position. These names for positions are called attributes.Note that we have separated the representatives of unique objects (entities) from the representationof information about objects (relationships), unlike some object-oriented programming languagesand data models. Therefore an entity is not an "object" (or "record") in the programming languagesense, although entities are representatives of real-world objects. We discuss the advantages of theentity-relationship distinction along with other data model design issues in Section 5. We also define entity types, datum types, and relationship types. These are called domains,datatypes, and relations, respectively. We make use of these three types through one fundamentaltype constraint: every relationship in a relation has the same attributes, and the values associatedwith each attribute must be from a pre-specified domain or datatype. One might think of a relationas a "record type" in a programming language, although relations permit more powerful operationsthan record types.As an example, consider a member relation that specifies that a given person is a member of a givenorganization with a given job title, as in the following figure. The person and organization might beentities, while the title might be a string datum. We relax the fundamental type constraintsomewhat in allowing a lattice of types of domains: a particular value may then belong to the pre-specified domain or one of its sub-domains. For example, one could be a member of a University,a Company, or any other type of Organization one chooses to define. Other relations, e.g. an"offers-course" relation, might apply only to a University.<==BBpB-.C@CtlJeC)+zRD@:p Cǿ`3dDs?Cc`EufdDpC0tCCC6@C?9C|+5޽ yBDrCtlGbpEd\Dm"CPмa@B@CBÐueCClDhC,B"BP;>%}%~Dq#Cڂ%|-d CZBCW2D@D@CQ2@/ CEDrClC -RdC`CQ2+ACh"ClCCBCK27AD@D`-d CZ,B"3>BmCڂ%~@|CڂCڃCHB";>B] Cl@DbC -cCHBhpBClB CECClG,CZBPB%}CڂDq"CZ%~%|-cBBDD@D@C`0/BCCDrl -cw+ABD $DbCBCZ7ABDD@D`-c:,BmB%}%~Dq#Cڂ%|CHCH*B]BClBCCg ClG-co BBm CEDrClC CZB BB]CڂCڂCZ%~Cڃ-b,BBAD@D`C`A_BBACClDhC-RbBBB Cl@DbC CZ,BBB/D@D@-bBBpB%}CڂDq"CZ%~%|CH*BpBCh"ClCC,cCHCBBm3?CDrl BBB];@Cڂ%~@|CڂCڃ,c:,CZCW4AD@D`A_CQ3AClB CECClGJohn Smith]EϕEu Bk@BBpB-.CCtlLeC)+zRD@:p Cǿ`3dDs?C`#a4EufdDpC0tCCC6@C?9C|+5޽ yBDrCtl"bEd\Dm"CPмa@B@CBÐueCClDhC,B"BP;>%}%~Dq#Cڂ%|&ud)CZBCW2D@D@CQ2@/ CEDrClC &d7C`CQ2+ACh"ClCCBCK27AD@D`&d)CZ,B"3>BmCڂ%~@|CڂCڃCHB";>B] Cl@DbC '!dCHBhpBClB CECClG,CZBPB%}CڂDq"CZ%~%|'DcBBDD@D@C`0/BCCDrl 'Rc+ABD $DbCBCZ7ABDD@D`'DcZ,BmB%}%~Dq#Cڂ%|CHCH*B]BClBCCg ClG'!c(o BBm CEDrClC CZB BB]CڂCڂCZ%~Cڃ&c,BBAD@D`C`A_BBACClDhC&bBBB Cl@DbC CZ,BBB/D@D@&ucBBpB%}CڂDq"CZ%~%|CH*BpBCh"ClCC&Cc(CHCBBm3?CDrl BBB];@Cڂ%~@|CڂCڃ& cZ,CZCW4AD@D`A_CQ3AClB CECClG0aD`D`@@D`@CCCD`C@0aC`BCQ4Cl@CCg ClG,BCK4/D@D@0bBCZBpB 3@CڃCڂCZ%~CڃBpB ;@ "DbC0bFCHCHCBBh3>CClDhC,B$BP;>%}%~Dq#Cڂ%|0biCZBCW2D@D@CQ2@/ CEDrClC 12bwC`CQ2+ACh"ClCCBCK27AD@D`1obiCZ,B$3>BmCڂ%~݀|CڂCڃCHB$;>B] Cl@DbC 1bFCHBhpBClD CECClG,CZBPB%~CڂDq"CZ%~%|1bBBDD@D@C`@/BCCDrl 1a+ABD "DbCBCZ7ABDD@D`1a,BmB%}%~Dq#Cڂ%|CHCH*B]BCl@CCg ClG1ahn BBm CEDrClC CZB BB]CڃCڂCZ%~Cڃ1oaE,BBAD@D`C`A_BBACClDhC12a7BBB Cl@DbC CZ,BBB/D@D@0aEBBpB%~CڂDq"CZ%~%|CH*BpBCh"ClCC0ahCH CBBm3?CDrl B BB];@Cڂ%~݀|CڂCڃ0a,CZCW4AD@D`A_CQ3AClD CECClGJohnMaryJohnTom#[D`D`@@D`@CCCD`C@#[C`BCQ4ClBCCg ClG,BCK4/D@D@#\4BCZBrB 3@CڂCڂCZ%} CڃBrB ;@ $DbC#\fCHCHCBBh3=CClDgC,B$BP;=%}%~}Dq#Cڂ%|$\CZBCW2D@D@CQ2@/ CEDrClB $R\C`CQ2+ACh"ClBCBCK27AD@D`$\CZ,B$3=BmCڂ%~|CڂCڃCHB$;=B] Cl@DbC $\fCHBhpBClB CECClG,CZBPB%}CڂDq#@CZ%} %|$\4BBDD@D@C`0/BCCDrl $[+ABD $DbCBCZ7ABDD@D`$[,BmB%}%~}Dq#Cڂ%|CHCH*B]BClBCCg ClG$[n BBm CEDrClB CZB BB]CڂCڂCZ%} Cڃ$[e,BBAD@D`C`A_BBACClDgC$R[WBBB Cl@DbC CZ,BBB/D@D@$[eBBpB%}CڂDq#@CZ%} %|CH*BpBCh"ClBC#[CHCBBm3?CDrl BBB];@Cڂ%~|CڂCڃ#[,CZCW4AD@D`A_CQ3AClB CECClG-r\D`D`@@D`@CCCD`C@-R\C`BCQ4ClBCCg ClG,BCK4/D@D@-`\TBCZBtB 3@CڂCڂCZ%~CڃBtB ;@ $DbC-\CHCHCBBh3>CClDhC,B"BP;>%}%~Dq#Cڂ%|-\CZBCW2D@D@CQ2@/ CEDrClC -\C`CQ2+ACh"ClCCBCK27AD@D`./\CZ,B"3>BmCڂ%~@|CڂCڃCHB";>B] Cl@DbC .a\CHBhpBClB CECClG,CZBPB%}CڂDq"CZ%~%|.\TBBDD@D@C`0/BCCDrl .\+ABD $DbCBCZ7ABDD@D`.[,BmB%}%~Dq#Cڂ%|CHCH*B]BClBCCg ClG.a[o BBm CEDrClC CZB BB]CڂCڂCZ%~Cڃ./[,BBAD@D`C`A_BBACClDhC-[wBBB Cl@DbC CZ,BBB/D@D@-[BBpB%}CڂDq"CZ%~%|CH*BpBCh"ClCC-[CHCBBm3?CDrl BBB];@Cڂ%~@|CڂCڃ-`[,CZCW4AD@D`A_CQ3AClB CECClG7[D`D`@@D`@CCCD`C@7[C`BCQ4Cl@CCg ClG,BCK4/D@D@7\4BCZBpB 3@CڃCڂCZ%~CڃBpB ;@ "DbC8\fCHCHCBBh3>CClDhC,B$BP;>%}%~Dq#Cڂ%|85\CZBCW2D@D@CQ2@/ CEDrClC 8r\C`CQ2+ACh"ClCCBCK27AD@D`8\CZ,B$3>BmCڂ%~݀|CڂCڃCHB$;>B] Cl@DbC 8\fCHBhpBClD CECClG,CZBPB%~CڂDq"CZ%~%|9\4BBDD@D@C`@/BCCDrl 9[+ABD "DbCBCZ7ABDD@D`9[,BmB%}%~Dq#Cڂ%|CHCH*B]BCl@CCg ClG8[n BBm CEDrClC CZB BB]CڃCڂCZ%~Cڃ8[e,BBAD@D`C`A_BBACClDhC8r[WBBB Cl@DbC CZ,BBB/D@D@85[eBBpB%~CڂDq"CZ%~%|CH*BpBCh"ClCC8[CH CBBm3?CDrl B BB];@Cڂ%~݀|CڂCڃ7[,CZCW4AD@D`A_CQ3AClD CECClG&pc+޻OH@0>Ba .Dd^CKDmD잒Ba$B*ɀC]C&DAZC]%aiA/+B@CSPH|FD]C}DYDCރd\B7Cs@CLC Cs@$_ڲ=ĪCC%Au34/-DNCV`-C^PD+BWAu M0 CZCDC+@2ECZ$^C޻B~tAw!0 ,DCC,BNGdDx BmDhC>DEHCR(tBG CqCC|R@Cp/N_G"PDCX=༠kBOXD\`CDzEVLAB×CJ~xPCACC뀻CC1aEQ滃:By@ 0ߨC>@CCD:UJA{@CVVbC[D]DBြc4B`rDS߾=ؽBxPCy)HC\C{#cDԩ ,' ?Ȗ,CDh8C>C.5_DDo!໚hN7p4A`C!ြ CCDn벾0Mp.AU@!C\DpCn)!6^lD'H bBdBsC̀C_C$D|NA /ǰ)`*CDrCC!*)]"ptF ptpt ptptpXtFptpjUrjQ$p*7jN!spspspjL$BjJ,4jHSYspjEVjAp sp-&j?aj=sp@j;^ Gj9*<%j6j&p3j"+5j VspjnJj:X j?!j_jV$jzp9j\` p8pj+7jEjspC jdsp,sp d M >^< [w$x 77777777777777777x7777777777777777777777_x77777777777777777x77777777777777777rcr\W/[w9[Wx 77777777777777777x77777777777777777x777777777777777777777777 _%^/r^w6_7#####%[W'd2a 777777 d< tuCYPRESS DATA MODEL CONCEPTS9name] that may be used to uniquely identify an internal entity. The three are interchangeable,however, since they must always be in one-to-one correspondence. The reader may find it simple to think of entity-valued attributes of relationships as pointers to theentities, in fact bi-directional pointers, since the operations we provide allow access in eitherdirection. This is a useful analogy. However, there is no constraint that the model be implementedwith pointers, and the relationships of a relation could equally well be conceptualized as rows of atable whose columns are the attributes of the relation and whose entries for entity-valued attributesare the string names of the entities. For example, the author relationships in the previous figurecould also be displayed in the form:Author:PersonBookGeorgeBackgammon for BeginnersJohnBackgammon for BeginnersJohnAdvanced BackgammonTomAdvanced BackgammonThus our introduction of entities to the Relational model does not entail a different representationthan, say, a Network model might imply, but simply additional integrity checks on entity names andtypes, and new operations upon entities. This compatibility with the Relational data model isimportant, as it allows the application of the powerful Relational calculus or algebra as a querylanguage. We return to query languages in Section 2.5.Note that the only information about an entity associated directly with the entity is the name; thiscontrasts with most other data models. A person's age or spouse, for example, would berepresented in the Cypress data model via age or spouse relations. Thus the relationships in adatabase are the information-carrying elements: entities do not have attributes. However the modelprovide an abbreviation, properties, to access information such as age or spouse in a singleoperation. We will discuss properties later. In addition, the physical data modelling algorithms canstore these data directly with the person entity as a result of the relation key information (since aperson can have only one age, a field can be allocated for that field in the stored objectrepresenting a person.)2.4 Basic operationsThe data model provides the capability to define and examine the data schema, and performoperations on entities, relationships, and aggregates of entities and relationships. In this section wediscuss the basic operations on entities and relationships. In Section 2.5, we discuss the operations+\ptpXtptpt 8 M =[{CYPRESS DATA MODEL CONCEPTS112.5 Aggregate operationsThere are two kinds of operations upon domains and relations, the aggregate types in our model:the definition of domains and relations, and queries on domains and relations. We first discusstheir definitions.Schema definitionAs in other database models and a few programming languages, the Cypress model is self-representing: the data schema is stored and accessible as data. Thus application-independent toolscan be written without coded-in knowledge of the types of data and their relationships.Client-defined domains, relations, and attributes are represented by entities. These entities belong tospecial built-in domains, called the system domains:the Domain domain, with one element (entity) per domainthe Attribute domain, with one element per attributethe Relation domain, with one element per relationThere is also a predefined Datatype domain, with pre-defined elements StringType, BoolType, andIntType, called built-in types. An implementation may also allow client-defined datum types, but theimplementation described in Section 3 currently does not.There may be other system domains, depending upon the implementation of the Relationship-Entity-Datum model, for example to represent indices on relations. The Domain domain, Attributedomain, and all other domains are members of the Domain domain.Information about domains, relations, and attributes are represented by system relations in which thesystem entities participate. The pre-defined SubType relation is a binary relation between domainsand their subdomains. There are also predefined binary relations that map attributes to informationabout the attributes:aRelation: maps an attribute entity to its relation entity.aType: maps an attribute to its type entity (a domain or a built-in type)aUniqueness: maps an attribute entity to {TRUE, FALSE}, depending whether it is partof a key of its relation. We are assuming only one key per relation, here; our implementation relaxes thisassumption in the case of single-attribute keys. The following diagram graphically illustrates a segment of a data schema describing the memberrelation and several domains. The left side of the figure shows two subdomains of Organization,(Company and University), and the right shows the types and uniqueness properties of the memberrelation's attributes memberOf, memberIs, and memberAs.+YptpXtptpt;pQrNpBspK4,IEsBp<@M>gW;_"F9+%sp6#u p&3up0up- uvupu susp*up s ptD(9%p`#e%4!0?(Hs p .up "dup3{upEsu pE >t2 0tpXup3u  p?udpupus d >Z DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL12<==CClDhC,B$BP;>%}%~Dq#Cڂ%|CYiCZBCW2D@D@CQ2@/ CEDrClC CRYwC`CQ2+ACh"ClCCBCK27AD@D`CYiCZ,B$3>BmCڂ%~݀|CڂCڃCHB$;>B] Cl@DbC CYFCHBhpBClD CECClG,CZBPB%~CڂDq"CZ%~%|CYBBDD@D@C`@/BCCDrl CX+ABD "DbCBCZ7ABDD@D`CX,BmB%}%~Dq#Cڂ%|CHCH*B]BCl@CCg ClGCXhn BBm CEDrClC CZB BB]CڃCڂCZ%~CڃCXE,BBAD@D`C`A_BBACClDhCCRX7BBB Cl@DbC CZ,BBB/D@D@CXEBBpB%~CڂDq"CZ%~%|CH*BpBCh"ClCCBXhCH CBBm3?CDrl B BB];@Cڂ%~݀|CڂCڃBX,CZCW4AD@D`A_CQ3AClD CECClG+dWD`D`@@D`@CCCD`C@+dWC`BCQ4ClBCCg ClG,BCK4/D@D@+dBCZBtB 3@CڂCڂCZ%~CڃBtB ;@ $DbC,dCHCHCBBh3>CClDhC,B"BP;>%}%~Dq#Cڂ%|,5dCZBCW2D@D@CQ2@/ CEDrClC ,rdC`CQ2+ACh"ClCCBCK27AD@D`,dCZ,B"3>BmCڂ%~@|CڂCڃCHB";>B] Cl@DbC ,dCHBhpBClB CECClG,CZBPB%}CڂDq"CZ%~%|-dBBDD@D@C`0/BCCDrl -dW+ABD $DbCBCZ7ABDD@D`-d,BmB%}%~Dq#Cڂ%|CHCH*B]BClBCCg ClG,co BBm CEDrClC CZB BB]CڂCڂCZ%~Cڃ,c,BBAD@D`C`A_BBACClDhC,rcBBB Cl@DbC CZ,BBB/D@D@,5cBBpB%}CڂDq"CZ%~%|CH*BpBCh"ClCC,cCHCBBm3?CDrl BBB];@Cڂ%~@|CڂCڃ+d,CZCW4AD@D`A_CQ3AClB CECClGmember,RUD`D`@@D`@CCCD`C@,2UC`BCQ4ClBCCg ClG,BCK4/D@D@,@V4BCZBtB 3@CڂCڂCZ݀%~CڂBtB ;@ $DbC,cVfCHCHCBBh3>CClDhC,B"BP;>%}%~Dq"Cڂ%},VCZBCW2D@D@CQ2@/ CEDrClC ,VC`CQ2+ACh"ClCCBCK27AD@D`-VCZ,B"3>BmCڂ%}@CڂCځCHB";>B] Cl@DbC -AVfCHBhpBClB CECClG,CZBPB%}CڃDq"CZ%~%~-dV4BBDD@D@C`0/BCCDrl -rU+ABD $DbCBCZ7ABDD@D`-dU,BmB%}%~Dq"Cڂ%}CHCH*B]BClBCCg ClG-AUo BBm CEDrClC CZB BB]CڂCڂCZ݀%~Cڂ-Ue,BBAD@D`C`A_BBACClDhC,UWBBB Cl@DbC CZ,BBB/D@D@,UeBBpB%}CڃDq"CZ%~%~CH*BpBCh"ClCC,cUCHCBBm3?CDrl BBB];@Cڂ%}@CڂCځ,@U,CZCW4AD@D`A_CQ3AClB CECClG*r]7D`D`@@D`@CCCD`C@*R]7C`BCQ4ClBCCg ClG,BCK4/D@D@*`]tBCZBtB 3@CڂCڂCZ%~CڃBtB ;@ $DbC*]CHCHCBBh3>CClDhC,B"BP;>%}%~Dq#Cڂ%|*]CZBCW2D@D@CQ2@/ CEDrClC *]C`CQ2+ACh"ClCCBCK27AD@D`+/]CZ,B"3>BmCڂ%~@|CڂCڃCHB";>B] Cl@DbC +a]CHBhpBClB CECClG,CZBPB%}CڂDq"CZ%~%|+]tBBDD@D@C`0/BCCDrl +]7+ABD $DbCBCZ7ABDD@D`+\,BmB%}%~Dq#Cڂ%|CHCH*B]BClBCCg ClG+a\o BBm CEDrClC CZB BB]CڂCڂCZ%~Cڃ+/\,BBAD@D`C`A_BBACClDhC*\BBB Cl@DbC CZ,BBB/D@D@*\BBpB%}CڂDq"CZ%~%|CH*BpBCh"ClCC*\CHCBBm3?CDrl BBB];@Cڂ%~@|CڂCڃ*`\,CZCW4AD@D`A_CQ3AClB CECClG2`D`D`@@D`@CCCD`C@`C`BCQ4ClBCCg ClG,BCK4/D@D@ `BCZBrB 3@CڂCڂCZ%} CڃBrB ;@ $DbCCa&CHCHCBBh3=CClDgC,B$BP;=%}%~}Dq#Cڂ%|uaICZBCW2D@D@CQ2@/ CEDrClB aWC`CQ2+ACh"ClBCBCK27AD@D`aICZ,B$3=BmCڂ%~|CڂCڃCHB$;=B] Cl@DbC !a&CHBhpBClB CECClG,CZBPB%}CڂDq#@CZ%} %|D`BBDD@D@C`0/BCCDrl R`+ABD $DbCBCZ7ABDD@D`D`z,BmB%}%~}Dq#Cڂ%|CHCH*B]BClBCCg ClG!`Hn BBm CEDrClB CZB BB]CڂCڂCZ%} Cڃ`%,BBAD@D`C`A_BBACClDgC`BBB Cl@DbC CZ,BBB/D@D@u`%BBpB%}CڂDq#@CZ%} %|CH*BpBCh"ClBCC`HCHCBBm3?CDrl BBB];@Cڂ%~|CڂCڃ `z,CZCW4AD@D`A_CQ3AClB CECClG;r`7D`D`@@D`@CCCD`C@;R`7C`BCQ4Cl@CCg ClG,BCK4/D@D@;``tBCZBpB 3@CڃCڂCZ%~CڃBpB ;@ "DbC;`CHCHCBBh3>CClDhC,B$BP;>%}%~Dq#Cڂ%|;`CZBCW2D@D@CQ2@/ CEDrClC ;`C`CQ2+ACh"ClCCBCK27AD@D`BmCڂ%~݀|CڂCڃCHB$;>B] Cl@DbC 00BDC,DMB@+1aaC0^)`1O@`dC*C@@&7EI9B@hABs)_@BވCDRa3@Bކ+ ^AQ=n`AҤ5\"ZAv2@CjCAvD8 @̀@n1CD`@1aRelation+dW5НA@܀@!)-CD`@pEɡ:B^BR8pjA޶C@R@RD@&dW>TxB<B`=e8C7DlBEu|DtC8ؼ!PC}/7C@Ҁ]-C#c1Ҽ2nBJн;.@p  CIDqCj/@ DǪrDAV6BX?CCB1t4CD!c+'CKu1W*M4xPDsDCˈ DZDnΗB> 𽡦PCxCj0WCx bj#,CZ?p7!8Bgx"ր(8DqC"ԀDBfD{5BIsAAߓ CVCˈBO4w`CTaRelationDiB/^DGslA%17`?mC=B7CB[A+p3(@W.A DED\u-0]D9Acs9𑽡BoCvtPJCVCwZDP%>7DhECH>$cgC CC@=Cۀ ` ^mDYC `CMCDHECM ^t ZtkdDHCqt ^zB_CpDYpSB_ Z UB"@CnCB sZoCDl3` dSubType R[[D@D@ R[D@D``~]>/H)DrCз`CWCиB"/H`CX ]BZ*_@ DoC$`*`]lCՠC$Cr,6 ۠CՠZYDg3C"ZCC"C2@`C)cm `mCؐ@#h`$CjCؐcA'oCjDqCP#'p`n^Cݓ@(CVpCݔ`@"lCVDpJ`Cj("l^.\FCw;=BXCECw=^CDDr€ 0\CXC` vC\0CC`\+iCDq ` uh)]"ptF ptpt ptptpXtFptpjB3j?25j=L2*j;Qj8`5up" V2u&`/up V,u3j(sj% p u.p&j"Dj 'u p)jnJj: t.Aj^p u!p j)Kjj4u pu pj  + up j !upj |=jGu pj_jd up5F M >Q^< 2Q$x 77777777777777777x77777777777777777-cx77777777777777777x77777777777777777x77777777777777777x77777777777777777x77777777777777777x77777777777777777RWcW ####5Z ####DRX7 777=2b Ia773bw 7777+R_ 77777!bw 77?r\W 777772] 77,2X ####2Vw 77Z Z__+[=_W##########x77777777777777777rT #### 2\########FVatuCYPRESS DATA MODEL CONCEPTS13languages (and MultiRelationSubset implementations) could be built on top of the operations wehave defined thus far. For the purposes of this report, however, we will assume that our querylanguage is the Relational algebra (Codd[1970]), providing the join, projection, and selectionoperators on relations. We choose the relational algebra due to its wide acceptance in the literature,and now commercially. Our choice does not preclude an entity-centric query language in futurework, however. We and others have designed but have not implemented such languages. To demonstrate that the relational algebra can be used with our system, we will define a mappingof the relational operations onto the Cypress model. Some augmentation of the relational algebra isnecessary to include the operations our model provides upon domains. The important aspects ofthe mapping are:1.The Cypress relations exist in our relational mapping essentially unchanged, except thatsome type constraints exist on attribute values; these constraints simply appear asexceptional conditions when violated, they do not effect the form of the relationaloperations.2.For each domain in our schema, we define a unary relation in our mapping with the entityname as its only attribute. The sole purpose of the unary domain relations is to specify theexistence of entities in the domain. For example, a Person relation would contain onetuple per person entity in the database. The unary domain relations allow DomainSubsetto be performed in the relational algebra as a select operation on the domain.3.Entity-valued attributes of relations appear to have the name of the corresponding entitystored in them. There are no "atomic" entity values, so the ChangeName operation isexpensive in this model. All references to an entity in the database appear as the stringname of the entity, and the system generates an error if an entity-valued attribute isinitiaized to an entity which does not exist in the domain.Our representation of the data schema is as before: schema entities and relationships in the domaindomain, relation domain, attribute domain, subtype relation, and attribute relations.The join, project, and select operations in the relational algebra can now be performed byMultiRelationSubset, or by the appropriate calls to RelationSubset. Note that we must typicallydeclare new relations for the result and intermediate results of computing such queries. A selectoperation on a domain may be used to determine whether a particular entity exists, or to enumeratethe entities of a domain. A select operation on a relation may be used to find relationships thatreference a particular entity. Query operations on the schema can be used to determine whatrelations might reference an entity of a particular domain.The unary relations representing domains are used only to determine the existence of entities. An+\ptpXtptpt;pTup"RDPQ?sps psNp@'KIIT6E`CY AoD?:;_X9*S6S4 0@.V,}) up*H;u (p/sp$89sp"4u p (2I f;(<VUzWFupu p 5- b-5tB?;dT  n \=] DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL14implementation could automatically create entities in these domains when an entity-valued attributeof a relationship is assigned a previously non-existent entity name. Note that domain relations mayin fact be ignored altogether by the user if this is done. Domain relations are purely an extension tothe Relational model. We chose not to automatically create entities in our mapping, as we wouldlose the logical integrity check the entities provide. 2.6 Convenience operationsSome more convenient specialized operations are built upon the basic operations described in theprevious two sections. They implement what we call properties and translucent attributes. Althoughtheoretically speaking these operations add no power to the model, they permit a significantlydifferent perspective on the data access and so should be thought of as part of the model.PropertiesProperties allow the client to treat entities as if they, like relationships, had "attributes." Theyprovide the convenience of treating attributes of relationships that reference an entity as if theywere attributes (or properties) of the entity itself. The property operations are: 1.GetPList[entity, attribute1, attribute2]: Attribute1 and attribute2 must be from the samerelation. Returns the values of attribute1 for all relationships in the relation that referencethe entity via attribute2. Attribute2 may be omitted, in which case it is assumed to be theonly other entity-valued attribute of the relation.2.GetP[entity, attribute1, attribute2]: this is identical to GetPList except exactly onerelationship must reference the entity via attribute2; otherwise an error is generated. GetPalways returns one value.3.SetPList[entity, attribute1, value list, attribute2]: Attribute1 and attribute2 must be fromthe same relation. Destroys any existing relationships whose attribute2 equals the entity,and creates new ones for each value in the list, with attribute1 equal to the value, andattribute2 equal to the entity. Attribute2 can be defaulted as in GetPList.4.SetP[entity, attribute1, value, attribute2]: this is identical to SetPList except it simplyadds a new relationship referencing the entity instead of destroying any existing ones(unless attribute1 is a key of its relation, in which case the existing one must be replaced).Thus the property operations allow information specified through relationships to be treated asproperties of the entity itself, in single operations. The property operations and the operationsdefined in earlier sections may be used interchangeably, as there is only one underlyingrepresentation of information: the relationships. As an example of the use of properties, consider)\ptF ptpt ptptpXtFptpjT;(jR7 sjPQ:p jNWjK6 jErjApOj?4s ps p j=LP j;Zj7uMtp`"uCClDhC,B"BP;>%}%~Dq#Cڂ%|-[)CZBCW2D@D@CQ2@/ CEDrClC -[7C`CQ2+ACh"ClCCBCK27AD@D`.[)CZ,B"3>BmCڂ%~@|CڂCڃCHB";>B] Cl@DbC .A[CHBhpBClB CECClG,CZBPB%}CڂDq"CZ%~%|.dZBBDD@D@C`0/BCCDrl .rZ+ABD $DbCBCZ7ABDD@D`.dZZ,BmB%}%~Dq#Cڂ%|CHCH*B]BClBCCg ClG.AZ(o BBm CEDrClC CZB BB]CڂCڂCZ%~Cڃ.Z,BBAD@D`C`A_BBACClDhC-YBBB Cl@DbC CZ,BBB/D@D@-ZBBpB%}CڂDq"CZ%~%|CH*BpBCh"ClCC-cZ(CHCBBm3?CDrl BBB];@Cڂ%~@|CڂCڃ-@ZZ,CZCW4AD@D`A_CQ3AClB CECClGB`D`D`@@D`@CCCD`C@A`C`BCQ4Cl@CCg ClG,BCK4/D@D@Ba4BCZBpB 3@CڃCڂCZ%~CڃBpB ;@ "DbCB#afCHCHCBBh3>CClDhC,B$BP;>%}%~Dq#Cڂ%|BUaCZBCW2D@D@CQ2@/ CEDrClC BaC`CQ2+ACh"ClCCBCK27AD@D`BaCZ,B$3>BmCڂ%~݀|CڂCڃCHB$;>B] Cl@DbC CafCHBhpBClD CECClG,CZBPB%~CڂDq"CZ%~%|C$a4BBDD@D@C`@/BCCDrl C2`+ABD "DbCBCZ7ABDD@D`C$`,BmB%}%~Dq#Cڂ%|CHCH*B]BCl@CCg ClGC`n BBm CEDrClC CZB BB]CڃCڂCZ%~CڃB`e,BBAD@D`C`A_BBACClDhCB`WBBB Cl@DbC CZ,BBB/D@D@BU`eBBpB%~CڂDq"CZ%~%|CH*BpBCh"ClCCB#`CH CBBm3?CDrl B BB];@Cڂ%~݀|CڂCڃB`,CZCW4AD@D`A_CQ3AClD CECClGJohn Smith.[1EϕEu BkDBBpB-.CCtlHeC)+zRD@:pנCǿ`3lDs?C3C^EufdDpC0tCCC6C?9C|+5޽ yBDrCtl7'`Ed\Dm"CPмa@B@CBÐte@CJİBRCDs@8\CdPm'~B hAS# `C.x=CъCLC~:DBpA|)C޶DrCD!H@)9F\PDnj,zAACPZxBGȽC+hCwC.DB# B BfB8<UCшDn}C|.vR:.[EE?6GCZC\@3CCMtC3DaA BBHCwDepCeCi[)=6pB\ZpЀCDnCLGЀER DfqCJm輧5@<CrB_0gwCs^}^BPl4 B68 y\C\DqC` xEZ,DɆA*AWAMO&)C/CLD?K@C0$^! E{BW\`9ڨ`l\C[D]DB@[EнI,&ȴC[dCC2: @%*CC1]benB@ Pn ?ϸCWD]@DDJYCm/@ĀPp?G@CC[C'by_DN0# Ax@B`DM DU:2xEtP-P-e6]@5CCXCW@CFcWPCT)@AJM ApDB D^6TE_щ,bB`C(`@C2:B\CcB{k"y@ёAHClD`@EH{AA|߀cWKCg໡AoCr]r[[]r]][D@D@R]D@D`r[[D@D@r[wD@D`[]D@D`[D@D@]wr]wD@D`]D@D@professormembermemberOfmemberIsmemberAs-Z =ucgý?s`C =BODsDCV@EE]5D|%C=1TPCCCM_A(&C)#VN3&0@HTAP@BҸDrCzXDA\C##@lu@AT @ CxCVH=@Cw(V;\0sCllpBDA^JC}DnCKEQ'Ddr4C`8A@A@CpCzX-HCp&0U6VhTPCB՝%C&DeBfoEmDAC⏼@C-AtACFCK>]HCG"TdG3ý}PCuFBxӠHADDD]*'E` RCe8AfA],C[BfpـC\Tw"CBu_D`쿟?'c@B<DO DS'Eo0@`B;@C~;Cှ'ACUwWwWwUwUwUwWwD@D`2UwD@D@WWWWD@D`WD@D@WwUwD@D@WwD@D`UUD@D@UWD@D`34ageageIsageOf$S&SD@D@$SWD@D`2r`7r`D@D@2r`WD@D`$_)_D@D@$_WD@D`+]"ptpXtptpt;pUGA3>  u p8u;pupupu9ptp:5u4pu53 p/upupu*+p+upup ) upupu pup' upup%Rup;1!vupup,Bup upup up upupup gZM1ts p5' F9t\d) up% =^< rR$Dr` x 77777777777777777x77777777777777777.rX 7777777777####22`c02[\j51upup j2j.upupj,$Dj*Ispj&u&p%j$Vj"J$up(j Gj:sj2pKjup3j05jt&Bpj >rj 6p8s pj>%jejHjd- ( M?QOCYPRESS DATA MODEL CONCEPTS17Publication:PersonBookDateGeorgeBackgammon for Beginners1978JohnBackgammon for Beginners1978MaryHow to Play Chess1981MaryHow to Cheat at Chess1982This relation represents the fact that John and George wrote a book together entitled "Backgammonfor Beginners," published in 1978, and Mary wrote two books on the subject of chess, in 1981 and1982. Alternatively, we could encode the same information in two relations, an author relation and apublication-date relation:Author:PersonBookGeorgeBackgammon for BeginnersJohnBackgammon for BeginnersMaryHow to Play ChessMaryHow to Cheat at ChessPublication-date:BookDateBackgammon for Beginners1978How to Play Chess1981How to Cheat at Chess1982Although the second two relations may seem more verbose than the first one, they are actuallyrepresentationally better in some sense, because the publication dates of books are not representedredundantly. If one wants to change the publication date of "Backgrammon for Beginners" to 1979,for example, it need only be changed in one place in the publication-date relation but in two placesin the publication relation. If the date were changed in only one place in the publication relation,the database would become inconsistent. This kind of behavior is called an update anomaly. Thesecond two relations are said to be a normalized form (as it happens, third normal form) of the firstrelation, and thereby avoid this particular kind of update anomaly. A variety of successively stricter criteria for normalization have been developed and studied, basedon different kinds of real-world dependencies between attributes. Work on normalization willalmost certainly continue through the forseeable future.Relational normalization is not strictly part of the Cypress data model. However the model'soperations (and the tools we will develop in the implementation) encourage what we will callfunctionally irreducible form, in which relations are of the smallest order that is naturally meaningful.This form is in some sense the most fully normalized canonical form that could be defined. +\ptpXtptpt;p T-s QQpQQ%IQQ O%I M%I K[%I I&%IF<%C>"A%@? ;s 99qp99q 7< 5 2 0 ,s *8*pi*8* (Zi &&i #i JFaLK9,:spHzC)Tpr/5 >X 8pHJsp8d@4 ?Q]"6DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL18Functionally irreducible normal form cannot be defined simply syntactically, but rather requiresresort to the semantics of relations. Specifically, the presence or absence of a relationship in arelation represents the truth of some predicate on the state of the represented world (Kent [1979]).For example, a "member" relationship represents the fact that a particular person is a member of aparticular organization. (The absence of such a relationship may mean falsity of that predicate or lack of knowledge;we fortunately need not concern ourselves with this distinction here). A relation is in irreducible form if it is ofthe smallest order possible without introducing new artificial domain(s) not otherwise desired (allrelations can be reduced to binary by introducing artificial domains). Biller [1979] provides a moreprecise definition of irreducible form. We will allow a slight weakening of irreducible form,functionally irreducible form, which permits combining two or more irreducible relations only whentheir semantics are mutually dependent (and therefore all present or absent in our worldrepresentation). For example, a birthday relation between a person, month, day, and year can becombined instead of using three relations. Another example would be an address relation between aperson, street, city, and zip code. Combining an age and phone relation would not result infunctionally irreducible form, however, as their semantics are not mutually dependent. The functionally irreducible relations seen by the user are independent of the physical representationchosen by the system for efficiency, so we are concerned only with the logical data access. Note thatin addition to avoiding update anomalies, functionally irreducible form provides a one-to-onecorrespondence between the relationships in the database and the atomic facts they represent, acanonical form that is in some sense more natural than any other form. 2.8 SegmentsWe would like a mechanism to divide up large databases, to provide different perspectives or subsetsof the data to different users or application programs. In this section we discuss a mechanism toprovide this separation: segments. A segment is a set of entities and relationships that a databaseclient chooses to treat as one logical and physical part of a database.In introducing segments, we will slightly change the definition of an entity, previously defined to beuniquely determined by its domain and name. We will treat entities with the same name anddomain in different segments as different entities, although they may represent the same externalentity. The unique identifier of an internal entity is now the triple[segment, domain, name]. A consequence of this redefinition of entities is that relations and domains do not span segments,either. Application programs must maintain any desired correspondence between entities, domains,or relations with the same name in different segments. We will return to this later. In the nextsection, we will discuss a more powerful but more complex and expensive mechanism, augments, inwhich the database system itself maintains the correspondence.)[{ptF ptpt ptptpXtFptpjSWjQkDjO7sp5jMV jJtwt((jH.ps psp jFeHjD1JjAIj?Jj=Xj;_!sp'j9*3spj6.spspj4V:Fpj1Vj/aj-PR j+(7j(F2pj"r jp:*jTbj spCjGjpNjL jzEjFF` >p6pj 66,jNj7rp jSspjd=s M v?Q\>CYPRESS DATA MODEL CONCEPTS19We introduce three new operations to the data model in conjunction with segments:DeclareSegment[segment, file]: opens a segment with the given name, whose data isstored in the given file.GetSegments[] returns a list of all the segments which have been opened.SegmentOf[entity or relationship] returns the segment in which a given entity orrelationship exists. It may also be applied to relations or domains, since they are entities.With the addition of segments to the data model, we redefine the semantics of the basic accessoperations as follows:1.DeclareDomain and DeclareRelation take an additional argument, namely the segment inwhich the defined domain or relation will reside. The entity representing a domain orrelation now represents data in a particular segment.2.DeclareEntity and DeclareRelationship are unaffected: they implicitly refer to thesegment in which the respective domain or relation was defined. By associating a segment(and therefore a transaction and underlying file) with each relation or domain entityreturned to the database client, we conveniently obviate the need for additional argumentsto every invocation of the basic operations in the data model.3.DestroyEntity, DestroyRelationship, GetF, SetF, DomainOf, RelationOf, and Eq aresimilarly unaffected: they deal with entities and relationships in whatever segment they aredefined. Note that by our definition, entities in different segments are never Eq. Also notethat nothing in our definition makes a SetF across a segment boundary illegal (i.e.SetF[relationship, attribute, entity] where the relationship and entity are in differentsegments). Our current implementation requires that special procedures GetFR and SetFR be used onattributes that can cross segment boundaries, see Section 3.4.DomainSubset and RelationSubset are unchanged when applied to client-defined domainsor relations, i.e., they enumerate only in the segment in which the relation or domain wasdeclared. However an optional argument may be used when applied to one of the systemdomains or relations (e.g. the Domain domain), allowing enumeration over a specificsegment or all segments. RelationSubset's attribute-value-list arguments implictly indicate the appropriatesegment even for system relations, so a segment is not normally needed unless the entire relation isenumerated.Note that the data in a segment is stored in an underlying file physically independent from othersegments, perhaps on another machine. Introducing a file system into the conceptual data model+[5ptpXtptpt;pSZQO~up!MIInu p:Eu"p #C]T >M#u9 pup!wBC=up @u&p2 tAq<pu pu p5a?,@ J tT X p?"d_  >Q\ODESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL20may seem like an odd transgression at this point. From a practical point of view, however, webelieve it better to view certain problems at the level of file systems. This point of view allowssegments to be used for the following purposes:1.Physical independence: Different database applications typically define their data inseparate segments. As a result one application can continue to operate although the datafor another has been logically or physically damaged. One application can entirely rebuildits database without affecting another, or an application can continue to operate in adegraded mode missing data in an unavailable segment.2.Logical independence: Different database applications may have information which pertainsto the same external entity, e.g. a person with a particular social security number. Whenone application performs a DestroyEntity operation, however, we would like the entity todisappear only from that application's point of view. Information maintained by otherapplications should remain unchanged. 3.Protection: Clients can trust the protection provided by a file system more easily than acomplex logical protection mechanism provided by the database system. An even higherassurance of protection can be achieved by physical isolation of the segment at a particularcomputer site. A more complex logical protection mechanism would be desirable for somepurposes, but was deemed beyond the scope of Cypress.4.Reliability: Because segment files are physically as well as logically separate, the probabilityof simultaneous physical failure is lower than for data in the same file. An even higherdegree of independence can be achieved using segments at different sites. Replication ofdata can be provided at the level of segments to provide recovery from failure. Ourimplementation's file systems do not do this, however.5.Performance: Data may be distributed to sites where they are most frequently used. Forexample, personal data may reside on a client's machine while publicly accessed data resideon a file server. If the file system provides replication, it can be used to improveperformance for commonly accessed data.Concurrency control may also be handled by the file system, although locks should be at agranularity finer than whole segments (e.g., file pages). As noted earlier, information about an external entity may be distributed over multiple segments.One or more database applications may cooperate in maintaining the illusion that entities, domains,and relations span segment boundaries. This illusion may be used in at least two ways:1.Private additions may be added to a public segment by adding entities or relationships in aprivate segment. The new relationships may reference entities in the public segment by)\ptF ptpt ptptpXtFptpjT7'jRcjPQ/Lu`sp@`JAF`H J`EI `C5?`sp"$`=L `;_s p`9*&0`6%.3`s p1`0F`.O `,} I`*H5&l`s p>`$8#7`"&3`8t`6p`s pH`/,`VE`!'jQj 9)j Nj!BjtW`*1`dF " M \?Q]3CYPRESS DATA MODEL CONCEPTS21creating representative entities with the same name in the private segment. An examplewould be personal phone numbers and addresses added to a public database of phonenumbers and addresses: an application program would make the two segments appear tothe user as one database.2.If two applications use separate segments A and B, they may safely reference each other'sdata yet remain physically independent. One of the applications may destroy andreconstruct its segment if it uses the same unique names for its entities. If both applicationshave relationships referencing an entity e, and application A does a DestroyEntity operationon e, the entity and relationships referencing it disappear from application A's point ofview, but application B's representative entity and relationships remain.2.9 AugmentsIn this section, we discuss a more elaborate mechanism for segmenting databases, with morepowerful properties. We call these augments, because they are "additive" segments as we will see.Entities, domains, and relations are defined independently of augments. An entity with the samename and domain appears as the same entity regardless of particular augments in which data arestored. However, the entity may or may not be defined in a particular augment. Also, relationshipscontinue to be associated with a particular segment.NOTE: The ideas in the remainder of Section 2 have not been implemented, they are included forfuture interest. These features are not discussed again in this report except for Section 5.10, incontrasting data models. Augments are intended for two main purposes:1.Additive databases: Information about an entity may be distributed among multipleaugments. The database system will make all the augment boundaries invisible, i.e. all thedata will appear as if it is one augment from the application program's point of view.2.Subtractive databases (versions): The relations in an augment encode some state of theworld the database represents. Updates, representing changes to that state, can be made byadding them in a separate augment. The database may then be viewed with and withoutthe changes (or a number of alternate changes) by adding and removing the augment(s) ontop.The operations on augments themselves appear very similar to those on segments. However, anordering on the open augment list is maintained by the database system. The DeclareAugment call+[ptpXtptpt;pSBQ6O~4MIIn*spsp'G9 PE@B)spsp@sp-sp >gsp28r 3apZ1-$sp&.@,C*d([4#spsB!wpsQCp,psp,=NVsps!p$ >E H WpYdsp@ /?Q\DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL22may specify where in the current list the new augment should be opened:DeclareAugment[augment, file, previous augment] opens an augment with the given name, whosedata is stored in the given file. It is defined to appear before the given previous augment in theordering, or at the end if none is given. GetAugments[] returns a list of all the augments whichhave been opened in order. AugmentOf[relationship] may be applied to a relationship todetermine in which augment it is stored.With the addition of augments to the data model, we redefine the semantics of the basic accessoperations as follows:1.DeclareRelation and DeclareDomain return a handle representing the relation or domainin all augments. The relation or domain is declared to exist in the augment passed as anargument; the declare procedure may be called additional times to define the same relationor domain to extend into other augments.2.DeclareEntity and DeclareRelationship create an entity or relationship, respectively, in thetop-most augment in the augment list in which the respective domain or relation is defined.3.GetF, DomainOf, RelationOf, and Eq are unchanged, defined in the obvious way. Anyentities returned by GetF, DomainOf, or RelationOf are of course augment-independent.Eq returns TRUE iff the two entities have the same name and their domains have the samename, regardless of whether the entities are in the same augment.4.SetF is defined as before, with two exceptions. Consider the callSetF[r, a, e]where r is a relationship in augment A, a is an attribute of RelationOf[r] with type T, and eis an entity value in domain D. This call will cause e to exist in A, if it does not already.D must already exist in A. Otherwise, the call fails and nothing is created in A. (The clientmay create D and retry the operation).5.DestroyEntity and DestroyRelationship destroy the given entity or relationship if there isno namesake of the domain or relation higher in the current augment list. Otherwise, theycreate an anti-entity or anti-relationship, respectively, in the top-most client-declarednamesake of the domain or relationship. The semantics of SetF is, in effect, that of aDestroyRelationship followed by a CreateRelationship with the attribute changed.6.DomainSubset searches its domain in all open augments, and has the property that anentity in an augment will not appear in the enumeration if an anti-entity with the samename exists in the entity's domain higher in the current augment list. DomainSubset never)\OptF ptpt ptptpXtFptpjTt< jR?u0p%jP cjM*u p(jKup jIm(jDpAjB>p`upu p4`u `p%spspu p sps`pspsp sp`spsp!sp `V sp{`u pup5`F;`  s psp.` $up`upup`u pD`N`d =u p  M ?Q]iICYPRESS DATA MODEL CONCEPTS23returns the same entity twice, even though it exists in more than one augment.7.RelationSubset similarly searches its relation in all open augments, and has the propertythat a relationship in an augment will not appear in the enumeration if an anti-relationshipwith the same attribute values exists in the relationship's relation higher in the currentaugment list.Augments act specially upon relations upon which keys have been defined: 1.If a CreateRelationship would create a relationship with the same key value as an existingrelationship in the relation in some open augment, then an anti-relationship is automaticallycreated for the existing relationship (in the same augment as the new relationship, the top-most one possible) before creating the new one.2.If an augment is opened containing relationships whose key values match existing ones inopen augments, then anti-relationships for the matching existing relationships areautomatically created in the newly opened augment at that time.The net effect of our definition of augments is that a user or program may make arbitrarymodifications to data in an underlying augment, in a fashion which appears exactly as if a singleaugment contains all the data. The underlying augment, however, is completely unchanged. Theunmodified data in the underlying augment may concurrently be examined by another user, orappear to be modified through another user's augment. Furthermore, the modifications made by thesame user over time are separated and can later be removed.Note that two augments can be merged in a straightforward way to produce an augment thatbehaves as the two in the same order in the same place in the current augment list. The augmentsare merged by combining the elements of the relations and domains in the two augments, discardingentities and anti-entities that match and relationships and anti-relationships that match.Augments are not related in definition or implementation to the atomic transactions that animplementation of the model also provides. However, the reader might find it informative to thinkof a transaction as an augment defined on top of the current data, which is automatically mergedwith the data when the transaction is committed. Transactions thus serve as "short-term" augments.The two mechanisms are useful for entirely different purposes in practice, however.+RptpXtptpt;pJNFu p#(DFBR@U ;H17up45>3`2*1,/-P"6+ENF(?$8 K"Cs pAZf,52;ONS D Z 5&5B-3"AdS V H?QSDESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL242.10 ViewsA view is a relation whose relationships are not actually stored. Instead, the primitive retrieval andstorage operations on a view invoke procedures defined by a database client who defines the view.Arbitrary procedures may be defined, and views can therefore be used for a variety of purposes:1.Defining a pseudo-relation in terms of existing relations for convenience: e.g., a fatherview could be implemented in terms of parent and sex relations that are actually stored.2.Allowing changes to the logical representation of data (e.g., actually changing the databaseto store father and mother relations instead) without changing existing programs written interms of the older form.3.Implementing operations on entities in an object-based style by storing tuples in a view.E.g., a Send operation on a Message entity might be invoked by storing a tuple in a sendrelation with two attributes, the message and the recipient. Any result of such a sendoperation would be stored in the database (e.g. as a third attribute of the send relationshipreturned). Thus the view mechanism provides a basis for encoding of proceduralinformation in a database.Views are so-called because they provide a database client a different view of the database than whatwas originally stored. View definitions and implementations are stored in a database as any otherdata, being automatically retrieved when required by the database system.The implementor of a view must define the same operations as needed for any relation. Unlikeview mechanisms in some database systems, views may be defined directly in the underlyingprogramming language. In our case, this is Cedar, and the operations the implementation mustexport are defined as a Cedar interface exported by the view. The view can be dynamically loadedwhen required at run-time. The operations the view provides are:1.RelationSubset and CreateRelationship on the view2.DestroyRelationship, RelationOf, GetF and SetF on its relationshipsA view may also be defined at a higher level than the underlying programming language, in a querylanguage; this simplifies the definition of views as well as allowing some kinds of optimizations thatneed no longer treat a view as a "black box" implementation of an access definition.2.11 SummaryWe have introduced the three basic primitives of the Cypress data model: entity values, datumvalues, and relationships. The basic type checking provided by the model insures that the attributesof relationships are of the proper entity or datum types. Relationship types are called relations.)]"ptF ptpt ptptpXtFptpjUGr jR?psp;&jP JjM2-I`Su`Gp&upup$C`L`Aupup6 `?;`Y`9qupup1u`79% #<CV[d5 5>Q2273. Model Level InterfaceWe now describe the Cedar interface to the implementation of the Cypress data model. Aknowledge of the Cedar or Mesa programming language (Mitchell et al [1979]) and the Cedarprogramming environment is not essential to understanding this section. We will explain Cedarfeatures as they are encountered.We do assume that the reader is familiar with the basic conceptual data model, i.e., has read theprevious section. Our presentation is therefore slightly different in this section: we describe theprocedures in the database interface in roughly the order that a client will want to use them in aprogram. We present types and initialization, schema definition, the basic operations, and thenqueries. It should be emphasized that the interface we are about to describe is only one possibleimplementation of the abstract data model described in Section 2. For example, we have chosen toimplement a procedural interface called by Cedar programs, and to do type checking at run-time.We will discuss some of the trade-offs in our choice of interface in Section 6. The introduction ofnew interfaces, such as a Query level with a compiled access language, will provide a differentperspective on the Cypress data model.3.1 TypesIn this subsection we describe the most important types in the interface. Less pervasive types aretreated at the point where they are first used.Entity: TYPE;Relship: TYPE;An Entity or Relship is not the actual database entity or relationship; they are handles for the actualdatabase objects. All accesses to database objects are performed by calling interface procedures withthe handles as parameters. Even comparisons of two entities for equality must be done in this way.The Entity and Relship handles are allocated from storage and automatically freed by the garbagecollector when no longer needed.Value: TYPE = REF ANY;ValueType: TYPE;Datatype: TYPE;StringType, IntType, BoolType, AnyDomainType: DataType;Gfp ^q ZpH X= VX TV! Pz(9 NFK L'; IF G C  < A># ?d*5 =/L :F 8& 2pr .pY ,_/ (sX &O "spspsp:tp ?T  30 spsp'#  sX  ] )7 ?QY)DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL28Storing Cedar data values in tuples presents several problems. First, since we would like to define asingle operation to store a new value into a specified attribute of a Relship (for instance), there mustbe a single type for all values that pass through this "store-value" procedure. This is the type Valueabove, represented as untyped REFs in Cedar (a REF is a garbage-collectable Cedar pointer). TheDataTypes will be discussed in the next section. Entities, strings, integers, and booleans are thetypes of values the system currently recognizes and allows as attribute values. More precisely, thesefour types are Entity, ROPE (the Cedar name for "heavy-duty" strings), REF INT, and REF BOOL.In the case of an entity-valued attribute, an attribute's type may be AnyDomainType or a specificdomain may be specified. The latter is highly preferred, as AnyDomainType is a loophole in thetype mechanism and limits the kinds of operations that can be performed automatically by thedatabase system or associated tools. We currently provide no mechanism to store compound Cedardata structures such as arrays, lists, or records in a database; the database system's data structuringmechanisms should be used instead. (Cypress query operations such as RelationSubset cannot becomposed upon data that appears as uninterpreted bits in the database. We return to this issue inSection 4.)Note that a Value may be either an Entity or a Datum. Some operations accept any Value, e.g.SetF; others require an Entity, e.g. NameOf. Others may require an Entity from a particular client-defined domain, e.g. a Person. We might think of the hierarchy of built-in and client defined typesand instances of values like this:Value type hierarchyDatabase representative of typeValue (REF ANY)ValueType Datum DatumType ROPE StringType INT IntType BOOL BoolType Entity AnyDomainType person Entity Person domain employee Entity Employee domain ... other client-defined entities ... ... other client-defined domains ...As Cedar doesn't have a good mechanism for defining type hierarchies or new types for client-defined domains, most Cypress operations simply take a REF ANY or an Entity as argument,performing further type checking at run-time.vfpuF pupu pupupXuFpu Mp _L \Fsp Z^s Xxpsp sp) VDspH TW Qspsp'spsp O ;s p Mr=s p K>&6 I %: FM DE Bl_ @7 <\ spspspsp :'spspspsp 7sp6 5"/ Y0;r%/0;r+`/0;.s%+% )% 'i% %5% #% p s%psps%pspc)%( A S(spsp - $ M?QT3YMODEL LEVEL INTERFACE293.2 Transactions and segmentsIn this section we describe the basic operations to start up a database application's interaction withCypress. The client application's data is stored in one or more segments, accessed undertransactions. The Cypress system currently runs on the same machine as the client program,however transactions are implemented by the underlying file system which may reside on anothermachine. Data in remote segments may therefore be concurrently accessed by other instances ofCypress on other client machines.A transaction is a sequence of read and write commands. The system supports the property that theentire sequence of commands executes atomically with respect to all other data retrieval and updates,that is, the transaction executes as if no other transactions were in progress at the same time.Because there may in fact be other transactions accessing the same data at the same time, it ispossible that two transactions may deadlock, in which case one of them must be aborted. So theprice paid for concurrent access is that programs be prepared to retry aborted transactions.The database system provides the capability of accessing a database stored on the same machine asthe database client, using the Pilot file system (Redell et al [1979]), or on Alpine file servers (Brownet al [1983]). We currently permit only one transaction per segment per instance of the databasesoftware on a client machine. That is, data in remote segments may concurrently be updated byapplication programs under separate transactions, but on the same machine transactions are usedsimply to make application transactions on their respective segments independent. This transaction-per-segment scheme is a major simplification of the Cypress package. In addition, as we shall seepresently, nearly all Cypress procedures can automatically infer the appropriate segment andtransaction from the procedure arguments, avoiding the need to pass the transaction or segment forevery database operation. Calls to Initialize, DeclareSegment, and OpenTransaction start the database session. Atransaction is either passed in by the client, or created by the database package (the latter is just aconvenience feature). The operation MarkTransaction below forms the end of a databasetransaction and the start of a new one. The operation AbortTransaction may be used to abort atransaction. Data in a database segment may not be read or updated until the segment andtransaction have been opened. Clients must decide when to tell the system that a transaction iscomplete (with CloseTransaction), and must be prepared to deal with unsolicited notification thatthe current transaction has been aborted because of system failure or lock conflict.The client's interaction with the database system begins with a call to Initialize:"SfpuFpupuGp _r [:pf Y S  VI TF RhK P4! LXD J#%t p1 G*6 EF CF AR\ =v^ ;AJ 9 <% 6^ 4E 2p V 0;,6 .X +V ) %!spsp" #` !Y  sp" %7sp 3& ?! sp 4 ST wHs  & 0?QUDESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL30Initialize: PROC[ nCachePages: CARDINAL_ 256, nFreeTuples: CARDINAL_ 32, cacheFileName: ROPE_ NIL ];Initialize initializes the database system and sets various system parameters: nCachePages tells thesystem how many pages of database to keep in virtual memory on the client's machine,nFreeTuples specifies the size to use for the internal free list of Entity and Relship handles, andcacheFileName is the name of the disk file used for the cache backing store. Any or all of thesemay be omitted in the call; they will be given default values. Initialize should be called before anyother operation; the schema declaration operations generate the error DatabaseNotInitialized if thisis violated.Before database operations may be invoked, the client must open the segment(s) in which the dataare stored. The location of the segment is specified by using the full path name of the file, e.g."[MachineName]SubDirectory>SegmentName.segment". Each segment has a uniquename, the name of a Cedar ATOM which is used to refer to it in Cypress operation. The name ofthe Cedar ATOM is normally, though not necessarily, the same as that of the file in which it isstored, except the extension ".segment" and the prefix specifying the location of the file is omitted inthe ATOM. If the file is on the local file system, its name is preceded by "[Local]". For example,"[Local]Foo" refers to a segment file on the local disk named Foo.database;"[Alpine]Baz" refers to a segment named Baz.segment on the directory onthe Alpine server. It is generally a bad idea to access database segments other than through thedatabase interface. However, because segments are physically independent and contain no referencesto other files by file identifier or explicit addresses within files, the segment files may be movedfrom machine to machine or renamed without effect on their contents. If a segment file in a set ofsegments comprising a client database is deleted, the others may still be opened to produce adatabase missing only that segment's entities and relationships. A segment is defined by theoperation DeclareSegment:DeclareSegment: PROC[ filePath: ROPE, segment: Segment, number: INT_ 0, readOnly: BOOL_ FALSE, version: Version_ OldOnly, nBytesInitial, nBytesPerExtent: LONG CARDINAL_ 32768] RETURNS [Segment];Segment: TYPE = ATOM;Version: TYPE = {NewOnly, OldOnly, NewOrOld};The version parameter to DeclareSegment defaults to OldOnly to open an existing file. The signalvfpuF pupu pupupXuFpu Mp _sX \ Z Xx T p&s p Rh J P4s p#spsp Ms pK K.s p IFsp Gb CN AR8+ ?O <sp4 : sp%, 8+= 6Ksp N 47 1Z /K -zM +E V )*9 &<! $] "s s p sX c3 /3 7   - psp s p sp M ?Q\xMODEL LEVEL INTERFACE31IllegalFileName is generated if the directory or machine name is missing from fileName, andFileNotFound is generated at the time a transaction is opened on the segment if the file does notexist. If version NewOnly is passed, a new segment file will be created, erasing any existing one. Inthis case, a number assigned to the segment by the database administrator must also be passed. Thisnumber is necessitated by our current implementation of segments (it specifies the section of the database address space inwhich to map this segment). Finally, the client program can pass version=NewOrOld to open a new orexisting segment file; in this case the segment number must also be passed, of course.The other parameters to DeclareSegment specify properties of the segment. If readOnly=TRUE,then writes are not permitted on the segment; any attempt to invoke a procedure which modifiesdata will generate the error ProtectionViolation. nBytesInitial is the initial size to assign to thesegment, and nBytesPerExtent is the incremental increase in segment size used when more space isrequired for data in the file.For convenience, a call is available to return the list of segments that have been declared in thecurrent Cypress session: GetSegments: PROC RETURNS[LIST OF Segment ];A transaction is associated with a segment by using OpenTransaction: OpenTransaction: PROC[ segment: Segment, userName, password: ROPE_ NIL, useTrans: Transaction_ NIL ];If useTrans is NIL then OpenTransaction establishes a new connection and transaction with thecorresponding (local or remote) file system. Otherwise it uses the supplied transaction. The sametransaction may be associated with more than one segment by calling OpenTransaction with thesame useTrans argument for each. The given user name and password, or by default the logged inuser, will be used if a new connection must be established.Any database operations upon data in a segment before a transaction is opened or after a transactionabort will invoke the Aborted signal. The client should catch this signal on a transaction abort,block any further database operations and wait for completion of any existing ones. Then the clientmay re-open the aborted transaction by calling OpenTransaction. When the remote transaction issuccessfully re-opened, the client's database operations may resume.Note that operations on data in segments under different transactions are independent. Normallythere will be one transaction (and one or more segments) per database application program. A clientmay find what transaction has been associated with a particular segment by calling"SfpuFpupuGp _sp.sp \s p1$ ZspM Xx sp6u Vge TVp sp R"V NFs p s p L9% Isps p G spA Et A.4 ?d ;sX, 7p4sp 3sX 1 /h -3 )Wpspwpsp '#7, $5sp "spR ; X uspsp sp0 Ad  /sp D E "B R ^ L?Q\DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL32TransactionOf: PROC [segment: Segment] RETURNS [Transaction];Transactions may be manipulated by the following procedures:MarkTransaction: PROC[trans: Transaction];AbortTransaction: PROC [trans: Transaction];CloseTransaction: PROC [trans: Transaction];MarkTransaction commits the current database transaction, and immediately starts a new one. Uservariables which reference database entities or relationships are still valid.AbortTransaction aborts the current database transaction. The effect on the data in segmentsassociated with the segment is as if the transactions had never been started, the state is as it was justafter the OpenTransaction call or the most recent MarkTransaction call. Any attempts to usevariables referencing data fetched under the transaction will invoke the NullifiedArgument error. Acall to OpenTransaction is necessary to do more database operations, and all user variablesreferencing database items created or retrieved under the corresponding transaction must be re-initialized (they may reference entities or relationships that no longer exist, and in any case they aremarked invalid by the database system).A simple client program using the database system might have the form, then:Initialize[];DeclareSegment["[Local]Test", $Test];OpenTransaction[$Test];...... database operations, including zero or more MarkTransaction calls ......CloseTransaction[TransactionOf[$Test]];3.3 Data schema definitionThe definition of the client's data schema is done through calls to procedures defined in this section.The data schema is represented in a database as entities and relationships, and although updates tothe schema must go through these procedures to check for illegal or inconsistent definitions, theschema can be read via the normal data operations described in the next section. Each domain,relation, etc., has an entity representative that is used in data operations which refer to that schemaitem. For example, we pass the domain entity when creating a new entity in the domain. The typesvfpuF pupu pupupXuFpu Mp _sX= [:p< W^sX* S, O, KpL IM EspM CW AR spsp ?,sp <spD :5* 8O 6K' 2pL.s ,_%*+'%pI#s!Y' r 'pV ,-tp ]  tpL Ug b4 M ?Q\2MODEL LEVEL INTERFACE33of schema items are:Domain, Relation, Attribute, Datatype, Index, IndexFactor: TYPE = Entity;Of course, since the schema items are entities, they must also belong to domains; there are pre-defined domains, which we call system domains, in the interface for each type of schema entity:DomainDomain, RelationDomain, AttributeDomain, DatatypeDomain, IndexDomain: Domain;There are also pre-defined system relations, which contain information about sub-domains, attributes,and indices. Since these are not required by the typical (application-specific) database client, wedefer the description of the system relations to Section 3.6. In general, any of the data schema may be extended or changed at any time; i.e., data operationsand data schema definition may be intermixed. However, there are a few specific orderingconstraints on schema definition we will note shortly. Also, the database system optimizes for betterperformance if the entire schema is defined before any data are entered. The interactive schema editingtool described in Section 7 allows the schema to be changed regardless of ordering constraints and existing data, byrecreating schema items and copying data invisibly to the user when necessary.All the data schema definition operations take a Version parameter which specifies whether theschema element is a new or existing one. The version defaults to allowing either (NewOrOld): i.e.,the existing entity is returned if it exists, otherwise it is created. This feature avoids separateapplication code for creating the database schema the first time the application program is run. DeclareDomain: PROC [name: ROPE, segment: Segment, version: Version_ NewOrOld, estRelations: INT_ 5] RETURNS [d: Domain];DeclareSubType: PROC[sub, super: Domain];DeclareDomain defines a domain with the given name in the given segment and returns itsrepresentative entity. If the domain already exists and version=NewOnly, the signal AlreadyExistsis generated. If the domain does not already exist and version=OldOnly, then NIL is returned.The parameter estRelations is used to estimate the largest number of relations in which entities ofthis domain are expected to participate.The client may define one domain to be a subtype of another by calling DeclareSubType. Thispermits entities of the subdomain to participate in any relations in which entities of thesuperdomains may participate. All client DeclareSubType calls should be done before declaringrelations on the superdomains (to allow some optimizations). The error MismatchedSegment isgenerated if the sub-domain and super-domain are not in the same segment."SfpuFpupuGp _ [:sXI W^pE U*t p2 QNsXS MrpY K>G I =u E-pN BP @60 >Ju <c :nN 6p.sp& 4^M 2)Q /P ,sX2 )H & ) "- psp sp up 0s p7spspsp  s pI [( >s p KJ s p& up x sp I ^ g@Q[ DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL34DeclareRelation: PROC [ name: ROPE, segment: Segment, version: Version_ NewOrOld] RETURNS [r: Relation];DeclareAttribute: PROC [ r: Relation, name: ROPE, type: ValueType_ NIL, uniqueness: Uniqueness _ None, length: INT_ 0, link: {Linked, Unlinked, Colocated, Remote}_ yes, version: Version_ NewOrOld] RETURNS[a: Attribute];Uniqueness: TYPE = {NonKey, Key, KeyPart, OptionalKey};DeclareRelation defines a new or existing relation with the given name in the given segment andreturns its representative entity. If the relation already exists and version=NewOnly, the signalAlreadyExists is generated. If the relation does not already exist and version=OldOnly, then NILis returned.DeclareAttribute is called once for each attribute of the relation, to define their names, types, anduniqueness. If version=NewOrOld and the attribute already exists, Cypress checks that the newtype, uniqueness, etc. match the existing attribute. The error MismatchedExistingAttribute isgenerated if there is a discrepancy. The attribute name need only be unique in the context of itsrelation, not over all attributes. Note this is the only exception to the data model's rule that names be unique in adomain. Also note that we could dispense with DeclareAttribute altogether by passing a list into the DeclareRelationoperation; we define a separate procedure for programming convenience.The attribute type should be a ValueType, i.e. it may be one of the pre-defined types (IntType,StringType, BoolType, AnyDomainType) or the entity representative for a domain. For pre-defined types, the actual values assigned to attributes of the relationship instances of the relationmust have the corresponding type: REF INT, ROPE, REF BOOL, or Entity. If the attribute has adomain as type, the attribute values in relationships must be entities of that domain or some sub-domain thereof. The type is permitted to be one of the pre-defined system domains such as theDomainDomain, thereby allowing client-defined extensions to the data schema (for example, acomment for each domain describing its purpose). The attribute uniqueness indicates whether the attribute is a key of the relation. If its uniquenessis NonKey, then the attribute is not a key of the relation. If its uniqueness is OptionalKey, then thesystem will ensure that no two relationships in r have the same value for this attribute (if a value hasbeen assigned). The error NonUniqueKeyValue is generated if a non-unique key value results froma call to the SetP, SetF, SetFS, or CreateRelship procedures we define later. Key acts the same asOptionalKey, except that in addition to requiring that no two relationships in r have the same valuefor the attribute, it requires that every entity in the domain referenced by this attribute must bereferenced by a relationship in the relation: the relationships in the relation and the entities in thevfpuF pupu pupupXuFpu Mp _sX \R Y V0 T1 RhO P4 LX7 H|p"sp sp FHW Ds p;spsps Ap >spU ;sp$ 95 sp 7fP 51$uN 3 U 1F -3p sp sp&s *#p (>' &#xpxp $aE "-spE s pM 1  s p D sp3s p 0sp Ksp  sps psp s pDsp L y26 M 2>Q];MODEL LEVEL INTERFACE35domain are in one-to-one correspondence. Finally, if an attribute's uniqueness is KeyPart, then thesystem will ensure that no two relationships in r have the same value for all key attributes of r,though two may have the same values for some subset of them.The length and link arguments to DeclareAttribute have no functional effect on the attribute, butare hints to the database system implementation. For StringType fields, length characters will beallocated for the string within the space allocated for a relationship in the database. There is noupper limit on the size of a string-valued attribute; if it is longer than length, it will be storedseparately from the relationship with no visible effect except for the performance of databaseapplications. The link field is used only for entity-valued fields; it suggests whether the databasesystem should link together relationships which reference an entity in this attribute. In addition, itcan suggest that the relationships referencing an entity in this attribute be physically co-located aswell as linked. Again, its logical effect is only upon performance, not upon the legal operations.DestroyRelation: PROC[r: Relation];DestroyDomain: PROC[d: Domain];DestroySubType: PROC[sub, super: Domain];Relations, domains, and subdomain relationships may be destroyed by calls to the above procedures.Destroying a relation destroys all of it relationships. Destroying a domain destroys all of its entitiesand also any relationships which reference those entities. Destroying a sub-domain relationship hasno effect on existing domains or their entities; it simply makes entities of domain sub no longereligible to participate in the relations in which entities of domain super can participate. Existingrelationships violating the new type structure are allowed to remain. Existing relations and domains may only bemodified by destroying them with the procedures above, with one exception: the operationChangeName (described in Section 3.4) may be used to change the name of a relation or domain.DeclareIndex: PROC [ relation: Relation, indexedAttributes: AttributeList, version: Version];DeclareIndex has no logical effect on the database; it is a performance hint, telling the databasesystem to create a B-Tree index on the given relation for the given indexedAttributes. The indexwill be used to process queries more efficiently. Each index key consists of the concatenated valuesof the indexedAttributes in the relationship the index key references. For entity-valued attributes,the value used in the key is the string name of the entity. The version parameter may be used as inother schema definition procedures, to indicate a new or existing index. If any of the attributes arenot attributes of the given relation then the signal IllegalIndex is generated."SfpuFpupuGp _(+sp \ %sptpsp Z< Vspsp sp0 T6s psp RhX P4 =sp MK Ksp9 IU GbR E-M ARsX# =v 9) 5pL 3E$ 1U X /!Tsp ,Espu *Ep) (O &Os p1" "ssX ?J c p-) /-spsp Q sp"+ Asp ]H )5s p  " ?QY)DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL36The optimal use of indices, links, and colocation, as defined by DeclareIndex and DeclareAttribute,is complex. It may be necessary to do some space and time analysis of a database application tochoose the best trade-off, and a better trade-off may later be found as a result of unanticipatedaccess patterns. Note, however, that a database may be rebuilt with different links, colocation, orindices, and thanks to the data independence our interface provides, existing programs will continueto work without change.If a relation is expected to be very small (less than 100 relationships), then it might reasonably bedefined with neither links nor indices on its attributes. In the typical case of a larger relation, oneshould examine the typical access paths: links are most appropriate if relationships that pertain toparticular entities are involved, indices are more useful if sorting or range queries are desired.B-tree indices are always maintained for domains; that is, an index contains entries for all of theentities in a domain, keyed by their name, so that sorting or lookup by entity name is quick. Stringcomparisons are performed in the usual lexicographic fashion.DeclareProperty: PROC [ relationName: ROPE, of: Domain, type: ValueType, uniqueness: Uniqueness_ None, version: Version_ NewOrOld] RETURNS [property: Attribute];DeclareProperty provides a shorthand for definition of a binary relation between entities of thedomain "of" and values of the specified type. The definitions of type and uniqueness are the sameas for DeclareAttribute. A new relation relationName is created, and its attributes are given thenames "of" and "is". The "is" attribute is returned, so that it can be used to represent the propertyin GetP and SetP defined in the next section.3.4 Basic operations on entities and relationshipsIn this section, we describe the basic operations on entities and relationships; we defer the operationson domains and relations to the next section.A number of error conditions are common to all of the procedures in this section. Since values arerepresented as REF ANYs, all type checking must currently be done at run-time. The procedures inthis section indicate illegal arguments by generating the errors IllegalAttribute, IllegalDomain,IllegalRelation, IllegalValue, IllegalEntity, and IllegalRelship, according to the type of argumentexpected. The error NILArgument is generated if NIL is passed to any procedure that cannot acceptNIL for that argument. The error NullifiedArgument is generated if an entity or relationship ispassed in after it has been deleted or rendered invalid by transaction abort or close. vfpuF pupu pupupXuFpu Mp _As psp \H ZP Xx04 VD X T P4U M:. KU IY E_ CG AR= =vsX ;A2 9 ; 6 2pH 0spsp5 .sps p ,_ X *+spsp #r2 pY - 12  s p6 5 s K,ps p#  s psp. spsp* W M g>Q[MODEL LEVEL INTERFACE37DeclareEntity: PROC[ d: Domain, name: ROPE_ NIL, version: Version_ NewOrOld] RETURNS [e: Entity];DeclareEntity finds or creates an entity in domain d with the given name. The name may beomitted if desired, in which case an entity with a unique name is automatically created. If versionis OldOnly and an entity with the given name does not exist, NIL is returned. If version is NewOnlyand an entity with the given name already exists, the signal NonUniqueEntityName is generated.DeclareRelship: PROC [ r: Relation, avl: AttributeValueList_ NIL, version: Version_ NewOrOld] RETURNS [Relship];DeclareRelship finds or creates a relship in r with the given attribute values. If version isNewOnly, a new relship with the given attribute values is generated. If version is OldOnly, therelship in r with the given attribute values is returned if it exists, otherwise NIL is returned. Ifversion is NewOrOld, the relship with the given attribute values is returned if it exists, otherwiseone is created. If the creation of a new relship violates the key constraints specified byDeclareAttribute, the signal NonUniqueAttributeValue is generated.DestroyEntity: PROC[e: Entity];DestroyEntity removes e from its domain, destroys all relationships referencing it, and destroys theentity representative itself. Any client variables that reference the entity automatically take on thenull value (Null[e] returns TRUE), and cause error NullifiedArgument if passed to database systemprocedures. After an entity is destroyed, its old name may be re-used in creating a new one.DestroyRelship: PROC[t: Relship];DestroyRelship removes t from its relation, and destroys it. Any client variables that reference therelationship automatically take on the null value, and will cause error NullifiedArgument ifsubsequently passed to database system procedures.SetF: PROC[t: Relship, a: Attribute, v: Value];SetF assigns the value v to attribute a of relationship t. If the value is not of the same type as theattribute (or a subtype thereof if the attribute is entity-valued), then the errorMismatchedAttributeValueType is generated. If a is not an attribute of t's relation, IllegalAttributeis generated."SfpuFpupuGp _sX \9 Z V p&sp TAs Rhpsp3sps P4p=sp LXsX J#H G D psp&sp Asp3spsp ? spEsp =vspsp D ;A K 9 sp sp 51sX 1U psp,! /!F! , spspsp  **3 &sX! # pspH &!sp 2 sX/ psp spsp. C wspspsp s Cp  >QXDESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL38GetF: PROC[t: Relship, a: Attribute] RETURNS [Value];GetF retrieves the value of attribute a of relationship t. If a is not an attribute of t's relation, errorIllegalAttribute is generated. The client should use the V2x routines described in the next section tocoerce the value into the expected type.If GetF is performed upon an attribute of a relationship whose value has never been assigned, aspecial undefined value is returned. The distinguished undefined value depends upon the type ofthe attribute. For example, for entity-valued attributes, it is NIL. For IntType attributes, it is the largest negativenumber. Clients of the database system should use the following procedures to deal with undefinedvalues: IsUndefined: PROC [v: Value] RETURNS [BOOL];MakeUndefined: PROC[d: DataType] RETURNS [Value];IsUndefined takes a value and indicates whether it is undefined, and MakeUndefined produces anundefined value of a particular type, suitable for use with SetF to make a previously definedattribute become undefined. SetFS: PROC [t: Relship, a: Attribute, v: ROPE];GetFS: PROC[t: Relship, a: Attribute] RETURNS [ROPE];GetFS and SetFS provide a convenient veneer on top of GetF and SetF that provide the illusionthat all relation attributes are string-valued. The effect is something like the Relational data model,and is useful for applications such as a relation displayer and editor that deal only with strings. Thesemantics of GetFS and SetFS depend upon the actual type of the value v of attribute a:typeGetFS returnsSetFS assigns attribute to beStringTypethe string vthe string vIntTypev converted to decimal stringthe string converted to decimal integerBoolType"TRUE" or "FALSE"true if "TRUE", false if "FALSE"a domain Dthe name of the ref'd entitythe entity with name v(or null string if v is NIL)(or NIL if v is null string)AnyDomainTypesame, but includes domain:the entity with the given domain and name:(or NIL if v is null string)The same signals generated by GetF and SetF, such as IllegalAttribute, can also be generated bythese procedures. The string NIL represents the undefined value. The signal NotFound is generatedin the last case above if no entity with the given name is found.vfpuF pupu pupupXuFpu Mp _sX5 [:p spspspsp Ysp-* V( RspF PO NuT LXp)0 J# FHsX, D1 @7 p9s p >6sp ; 7sX0 55 1pspspsp /a -zJ +E spsp*sp s '.'iv`' 'iX -''i %5s ` - #`-' `-  `-`c- / `-)`- pspsp sp  sp-sp A  Mo>QX3MODEL LEVEL INTERFACE39NameOf: PROC [e: Entity] RETURNS [s: ROPE];ChangeName: PROC [e: Entity, s: ROPE];NameOf and ChangeName retrieve or change the name of an entity, respectively. They generatethe signal IllegalEntity if e is not an entity.ChangeName should be used with caution. It is not quite equivalent to destroying and re-creating an entity with the newname but the same existing relationships referencing it. ChangeName is considerably faster than that, and furthermoreentity-valued variables which reference the entity are not nullified by ChangeName, though they would be byDestroyEntity. These features should be a help, not a hindrance. However, changing an entity name may invalidatereferences to the entity from outside the segment, e.g. in another segment or in some application-maintained file such as alog of updates. DomainOf: PROC[e: Entity] RETURNS [Domain];RelationOf: PROC[t: Relship] RETURNS [Relation];DomainOf and RelationOf can be used to find the entity representative of an entity's domain or arelationship's relation, respectively. The signal IllegalEntity is generated if e is not an entity. Thesignal IllegalRelship is generated if r is not a relationship.SegmentOf: PROC[e: Entity] RETURNS [Segment];SegmentOf returns the segment in which an entity is stored. It can be applied to domain, relation,or attribute entities.Eq: PROC [e1: Entity, e2: Entity] RETURNS [BOOL];Eq returns TRUE iff the same database entity is referenced by e1 and e2. This is not equivalent tothe Cedar expression "e1 = e2", which computes Cedar REF equality. If e1 and e2 are indifferent segments, Eq returns true iff they have the same name and their domains have the samename.Null: PROC [x: EntityOrRelship] RETURNS [BOOL];Null returns TRUE iff its argument has been destroyed, is NIL, or has been invalidated by abortionof the transaction under which it was created.GetP: PROC [e: Entity, aIs: Attribute, aOf: Attribute_ NIL] RETURNS [Value];SetP: PROC [e: Entity, aIs: Attribute, v: Value, aOf: Attribute_ NIL] RETURNS[Relship];GetP and SetP are convenience routines for a common use of relationships, to represent"properties" of entities. Properties allow the client to think of values stored in relationships"SfpuFpupuGp _sX+ \& Yps p( V s psp Sut Q+G/ O=7yu+ MOB0 Ka2I Is EsX+ C0 ?ps p. =v3s p) ;Aspsp 7fsX- 3pE 1U -zsX1 (psp/spsptp &sp*spsp $asp3 "- QsX/ psp)sp% m. sXL ]W pspI y%< 2>Q]DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL40referencing an entity as if they are directly accessible fields (or "properties") of the entity itself. Seethe figure on page 15 illustrating properties. GetP finds a relationship whose from attribute is equal to e,and returns that relationship's to attribute. The from attribute may be defaulted if the relation isbinary, it is assumed to be the other attribute of to's relation. If it is not binary, the current implementationwill find the "first" other attribute, where "first" is defined by the order of the original calls calls to DeclareAttribute.SetP defaults the from attribute similarly to GetP, but operates differently depending on whetherfrom is a key of the relation. Whether it is a key or not, any previous relationship that referenced ein the from attribute is automatically deleted. In either case, a new relationship is created whosefrom attribute is e and whose to attribute is v. SetP returns the relationship it creates for theconvenience of the client. GetP and SetP can generate the same errors as SetF and GetF, e.g. if eis not an entity or to is not an attribute. In addition, GetP and SetP can generate the errorIllegalProperty if to and from are from different relations. GetP generates the errorMismatchedPropertyCardinality if more than one relationship references e in the from attribute; ifno such relationships exist, it returns a null value of the type of the to attribute. SetP allows anynumber of existing relationships referencing e; it simply adds another one (when from is a key, ofcourse, there will always be one relationship). GetPList: PROC [e: Entity, to: Attribute, from: Attribute_ NIL] RETURNS [LIST OF Value];SetPList: PROC [e: Entity, to: Attribute, vl: LIST OF Value, from: Attribute_ NIL];GetPList and SetPList are similar to GetP and SetP, but they assume that any number ofrelationships may reference the entity e with their from attribute. They generate the signalMismatchedPropertyCardinality if this is not true, i.e. the from attribute is a key. GetPList returnsthe list of values of the to attributes of the relationships that reference e with their from attribute.Cedar has LISP-like list manipulation facilities. SetPList destroys any existing relationships that reference ewith their from attribute, and creates Length[vl] new ones, whose from attributes reference e andwhose to attributes are the elements of vl. GetPList and SetPList may generate any of the errorsthat GetF and SetF may generate, and the error IllegalProperty if to and from are from differentrelations.Note that the semantics of SetPList are not quite consistent with the semantics of SetP. SetPListreplaces the current values associated with a "property" with the new values (i.e., destroys and re-creates relationships); SetP adds a new property value, unless the aOf attribute is a key, in whichcase it replaces the current value. The semantics are defined in this way because this has proven themost convenient in our application programs.Examples of the use of the property procedures for data access can be found in Section 4.3.Properties are also useful for obtaining information about the data schema. For example, GetP[a,aRelationIs] will return the attribute a's relation, and GetPList[d, aTypeOf] will return all thevfpuF pupu pupupXuFpu Mp _Qu \pspsps Zpspsp. Xx3sp u/ VD,Q Tsp spsp. QspDs Opsp$5 Mrsp sp sp spsp, K> spsp!spsp s I psp$spsp Fsp@spspspsp Dsp*spsp Bltp,sp sp @7-sp#sp >1 :'sXX 7S 3Cpspspsp$ 1'sp sp% .spspsp ,sp0sp sp *ru% psp3s (=p spsp spsp & spspspsp #spspspspspsp !  sp0sps tp#9 sptp"sp S#C , oK ;>s  pspsp  M ?Q]LMODEL LEVEL INTERFACE41attributes that can reference domain d.E2V: PROC[e: Entity] RETURNS[v: Value];B2V: PROC[b: BOOLEAN] RETURNS[v: Value];I2V: PROC[i: LONG INTEGER] RETURNS[v: Value];S2V: PROC[s: ROPE] RETURNS[v: Value];The x2V routines convert the various Cedar types to Values. The conversion is not normallyrequired for ropes and entities since the compiler will widen these into the REF ANY type Value.V2E: PROC[v: Value] RETURNS[Entity];V2B: PROC[v: Value] RETURNS[BOOLEAN];V2I: PROC [v: Value] RETURNS[LONG INTEGER];V2S: PROC [v: Value] RETURNS[ROPE];The V2x routines convert Values to the various Cedar types. The MismatchedValueType error israised if the value is of the wrong type. It is recommended that these routines be used rather thanuser-written NARROWs, as the representation of Values may change. Also, NARROWs of opaque typesdon't yet work in the Cedar compiler.3.5 Query operations on domains and relationsIn this section we describe queries upon domains and relations: operations that enumerate entitiesor relationships satisfying some constraint.RelationSubset: PROC[ r: Relation, constraint: AttributeValueList_ NIL] RETURNS [RelshipSet];NextRelship: PROC[rs: RelshipSet] RETURNS [Relship];PrevRelship: PROC[rs: RelshipSet] RETURNS [Relship];ReleaseRelshipSet: PROC [rs: RelshipSet];"SfpuFpupuGp _%sp XsX' V( Ty- RE% Mpsp-sp! Ka)7 GsX$ EQ% C+ @# <8pspsp"sp :); 7 wpsp uwu 5% /Dr- *p#@ (`, $sX "P3  @4 d4 )$ A>QSDESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL42AttributeValueList: TYPE = LIST OF AttributeValue;AttributeValue: TYPE = RECORD [ attribute: Attribute, low: Value, high: Value_ NIL -- omitted where same as low or not applicable --];The basic query operation is RelationSubset. It returns a generator of all the relationships inrelation r which satisfy a constraint list of attribute values. The relationships are enumerated bycalling NextRelship repeatedly; it returns NIL when there are no more relationships. PrevRelshipmay similarly be called repeatedly to back the enumeration up, returning the previous relationship; itreturns NIL if the enumeration is at the beginning. ReleaseRelshipSet should be called when theclient is finished with the query.The constraint list may be NIL, in which case all of the relationships in r will be enumerated.Otherwise, relationships which satisfy the concatenation of constraints on attributes in the list will beenumerated. If an index exists on some subset of the attributes, the relationships will be enumeratedsorted on the concatenated values of those attributes. For a StringType, IntType, or TimeTypeattribute a of r, the contraint list may contain a record of the form [a, b, c] where the attributevalue must be greater or equal to b and less than or equal to c to satisfy the constraint. For anytype of attribute, the list may contain a record of the form [a, b] where the value of the attributemust exactly equal b. The Cedar ROPE literals "" and "\377" may be used in queries as aninfinitely large and infinitely small string, respectively. The signal MismatchedAttributeValueTypeis generated by RelationSubset if one of the low or high values in the list is of a different typethan its corresponding attribute.DomainSubset: PROC[ d: Domain, lowName, highName: ROPE_ NIL, searchSubDomains: BOOL_ TRUE, searchSegment: Segment_ NIL] RETURNS [EntitySet];NextEntity: PROC[EntitySet] RETURNS [Entity];PrevEntity: PROC[EntitySet] RETURNS [Entity];ReleaseEntitySet: PROC[EntitySet];vfpuF pupu pupupXuFpu Mp _sX2 \ Z Xx VDG Qp s p5 O`spZ M,s psp(s JpU Hsp*sp F" Asp,sp ?&C =v51 ;A$s psps 9 p spsp'sp 6 spsp$ 4<sp! 2psp sp 0;8s .ps pspsp  +! 'sX % # !Y %  - 9- ]" M?QVMODEL LEVEL INTERFACE43DomainSubset enumerates all the entities in a domain. If lowName and highName are NIL, theentire domain is enumerated, in no particular order. Otherwise, only those entities whose names arelexicographically greater than lowName and less than highName are enumerated, in lexicographicorder. If searchSubDomains is TRUE, subdomains of d are also enumerated. Each subdomain issorted separately. The searchSegment argument is currently only used if d is one of the systemdomains, e.g. the Domain domain. It is used to specify which segment to search.Analogously to relation enumeration, NextEntity and PrevEntity may be used to enumerate theentities returned by DomainSubset, and ReleaseEntitySet should be called upon completion.GetDomainRefAttributes: PROC [d: Domain] RETURNS [AttributeList];This procedure returns a list of all attributes, of any relation defined in d's segment, which referencedomain d or one of its superdomains. The list does not include AnyDomainType attributes, whichcan reference any domain. GetDomainRefAttributes is implemented via queries on the dataschema. GetDomainRefAttributes is useful for application-independent tools; most specificapplications can code-in the relevant attributes.GetEntityRefAttributes: PROC [e: Entity] RETURNS [AttributeList];This procedure returns a list of all attributes in which some existing relationship actually referencese, including AnyDomainType attributes.3.6 System domains and relationsIn this section we describe what one might call the schema schema, the pre-defined system domainsand relations which constitute the data schema for client-defined domains and relations. The typicaldatabase application writer may skip this section, since the schema declaration operations defined inSection 3.3 are adequate when the data schema is completely defined and known at the time aprogram is written. The system domains and relations we describe in this section are most useful forgeneral-purpose tools (e.g. for displaying, querying, or dumping any database), where the tools mustexamine the data schema "on the fly".As noted earlier, the permanent repository for data describing user-defined data in a database is thedatabase's data schema, represented by schema entities and relationships. Schema entities aremembers of one of the pre-defined system domains: DomainDomain, RelationDomain,DatatypeDomain, and so on. Every client-defined domain, relation, or attribute contains arepresentative entity in these domains. Client-defined datatypes are not currently permitted, so theonly entities in the DataType domain are the pre-defined IntType, StringType, and BoolType."SfpuFpupuGp _s pspspsp \H Zspsp Xx spspu VDps p$sp  TP O` s ps p M,s psp  IPsXA Dp4sp Blsp/s p @7sp" >sp; ;1 7sXA 3Cp/8 1sp s p s *r &p4t p $'> "s)< ?G  U Atp % P -tp $ ]?(tp s @ )pD e spsps psp n y>Q]+DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL44The information about the client-defined domains and attributes are encoded by relationships in thedatabase. Domains participate in the system relation dSubType, which encodes a domain typehierarchy:dSubType: Relation; dSubTypeOf: Attribute; -- the domain in this attribute is a super-type of dSubTypeIs: Attribute; -- the domain in this attributeThe dSubType has one element per direct domain-subdomain relationship, it does not contain thetransitive closure of that relation. However, it is guaranteed to contain no cycles. That is, thedatabase system checks that there is no set of domains d1, d2, ... dN, N>1, such that d1 is a subtypeof d2, d2 is a subtype of d3, and so on to dN, and d1=dN. The dSubType may define a lattice asopposed to a tree, i.e. the sSubType attribute is not a key of the relation.The information about attributes is encoded as binary relations, one relation for each argument tothe DeclareAttribute procedure defining properties of the attribute. The names are easy toremember; for each argument, e.g. Foo, we define the aFoo relation, with attributes aFooOf andaFooIs. The aFooIs attribute is the value of that argument, and the aFooOf attribute is [the entityrepresentative of] the attribute it pertains to. Thus we have the following relations: aRelation: PUBLIC READONLY Relation; -- Specifies attribute - relation correspondence: -- [aRelationOf: KEY Attribute, aRelationIs: Relation] aRelationOf: PUBLIC READONLY Attribute; -- attribute whose relation we are specifying aRelationIs: PUBLIC READONLY Attribute; -- the relation of that attribute aType: PUBLIC READONLY Relation; -- Specifies types of relation attributes: -- [aTypeOf: KEY Attribute, aTypeIs: ValueType] aTypeOf: PUBLIC READONLY Attribute; -- the attribute aTypeIs: PUBLIC READONLY Attribute; -- domain or datatype of the attribute aUniqueness: PUBLIC READONLY Relation; -- Specifies attribute value uniqueness: -- [aUniquenessOf: KEY Attribute, aUniquenessIs: INT LOOPHOLE[Uniqueness]] aUniquenessOf: PUBLIC READONLY Attribute; -- the attribute aUniquenessIs: PUBLIC READONLY Attribute; -- INT for Uniqueness: 0=None, 1=Key, etc. aLength: PUBLIC READONLY Relation; -- Specifies length of attributes: -- [aLengthOf: KEY Attribute, aLengthIs: INT] aLengthOf: PUBLIC READONLY Attribute; -- the attribute aLengthIs: PUBLIC READONLY Attribute; -- INT corresponding to attribute's lengthvfpuF pupu pupupXuFpu Mp _N \%sp Z VsX TK Rh8 NpspF LX12 J#$IJ#IJ#IJ#IJ# GGbGGbGGbGGbGGbGGbGsp Esp( A+7 ?sp? =v!spspsp ;Aspsp2sp 9 W 51sXX 2: 0Y .M *+M '3 %8 #N %Q M > X G 1 : U M >Q] MODEL LEVEL INTERFACE45 aLink: PUBLIC READONLY Relation; -- Specifies whether attribute is linked: -- [aLinkOf: KEY Attribute, aLinkIs: INT] aLinkOf: PUBLIC READONLY Attribute; -- the attribute aLinkIs: PUBLIC READONLY Attribute; -- 0=unlinked, 1=linked, 2 =colocated The final set of system relations pertain to index factors. Each index on a relation is defined toinclude one or more attributes of a relation. For each attribute in the index, there is an index factorentity. For each index, there is an index entity. Each index factor is associated with exactly oneindex and exactly one attribute. Indices may have many index factors, however, and an attributemay be associated with more than one index factor, since attributes may participate in multipleindices. The two relations pertaining to indices map indices on to their index factors, and indexfactors to the attributes they index: ifIndex: PUBLIC READONLY Relation; -- Specifies the index factors for each index -- [ifIndexOf: KEY IndexFactor, ifIndexIs: Index] ifIndexOf: PUBLIC READONLY Attribute; -- the index factor ifIndexIs: PUBLIC READONLY Attribute; -- index of the factor ifAttribute: PUBLIC READONLY Relation; -- Specifies attribute index factor corresponds to -- [ifAttributeOf: KEY IndexFactor, ifAttributeIs: Attribute] ifAttributeOf: PUBLIC READONLY Attribute; -- the index factor ifAttributeIs: PUBLIC READONLY Attribute; -- the attribute this factor representsThe relations on attributes, index factors, and domains can be queried with the RelationSubset orGetPList operations. For example, GetP[a, aRelationIs] returns the attribute a's relation.GetPList[r, aRelationOf] returns the relation r's attributes. RelationSubset[dSubType,LIST[[dSubTypeIs, d]]] will enumerate all the dSubType relationships in which d is the subtype.As noted earlier, the data schema (attributes, relations, domains, indices, index factors, and relationspertaining to these) may only be read, not written by the database client. In order to ensure theconsistency of the schema, it must be written indirectly through the schema definition procedures:DeclareDomain, DeclareRelation, DeclareAttribute, and DeclareSubType. Attempts to performupdates through operations such as SetP result in the error ImplicitSchemaUpdate.3.7 ErrorsThe Cedar language provides a SIGNAL mechanism for returning control to the caller of aprocedure when an exceptional condition is identified. When a database system operation invokesan error, the SIGNAL Error is generated, with an error code indicating the type of error that"SfpuFpupuGp _sXL \- Z8 XxO Tp-t p& Rh#E P4d M@ KV I5- Gb% CsXR AR5 ?> D.CMARJ?M=@;K:'P8]J6G4G2R12L/hE-J+H*M(=F&s<$J":!<HL}IE@ Bp3s p  )sp sp+ % \ M ^>QZ474. Application ExampleThis section provides a simple example of the use of Cypress. Section 4.1 introduces the example, adatabase of documents. Section 4.2 is a discussion of database design: the process of representingabstractions of real-world information structures in a database, somewhat specialized to the datastructures available in Cypress. In Section 4.3, a working program is illustrated.Our example is necessarily short; don't expect any startling revelations on these pages. We will tryto consider some of the most common cases, however.4.1 A database applicationWhat are the properties of a well-designed database? To a large extent these properties follow fromthe general properties of databases. For instance, we would like our databases to extend gracefullyas new types of information are added, since the existing data and programs are likely to be quitevaluable.It may be useful to consider the following point. The distinguishing aspect of information stored ina database system is that at least some of it is stored in a form that can be interpreted by the systemitself, rather than only by some application-specific program. Hence, one important dimension ofvariation among different database designs is in the amount of the database that is system-interpretable, i.e. the kinds of queries that can be answered by the system.As an example of variation in this dimension, consider the problem of designing a database fororganizing a collection of Mesa modules. In the present Mesa environment, this database wouldneed to include at least the names of all the definitions modules, program modules, configurationdescriptions, and current .Bcd files. A database containing only this information is little more than afile directory, and therefore the system's power to answer queries about information in this databaseis very limited. A somewhat richer database might represent the DIRECTORY and IMPORTS sectionsof each module as relationships, so that queries such as "which modules import interface Y?" can beanswered by the system. This might be elaborated further to deal with the use of individual typesand procedures from a definitions module, and so on. There may be a limit beyond which it isuseless to represent smaller objects in the database; if we aren't interested in answering queries like"what procedures in this module contain IF statements?", it may be attractive to represent the bodyof a procedure (or some smaller naming scope) as a text string that is not interpretable by thedatabase system, even though it is stored in a database. We shall illustrate design ideas with a database of information about documents. Our currentfacilities, which again are simply file directories, leave much to be desired. The title of a document;[{pSYqO}pbMIMKGHSDpeBC3;r8p\5 Y3 V1s-pb+c#sp2 ).a&"9$L p6(G-4L\ 322tptp8!tpzJF5(  ] (tp9Xt8)Tp=d"E v>Q\DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL48on the printed page does not tell the reader where the document is stored or how to print a copy.Relationships between different versions of the same basic document are not explicit. Retrievals bycontent are impossible. Our goal here is not to solve all of these problems, but to start a design thathas the potential of dealing with some of them.4.2 Schema designEach document in our example database has a title and a set of authors. Hence we might representa collection of documents with a domain of entities whose name is the title of the document, and anauthor property specifying the authors:Document: Domain = DeclareDomain["Domain"];dAuthors: Property = DeclareProperty["author", Document, StringType];Here the authors' names are concatenated into a single string, using some punctuation scheme toallow the string to be decoded into the list of authors. This is a very poor database design because itdoes not allow the system to respond easily to queries involving authors; the system cannot parse theencoded author list.Note that in the above definition authors are strings, so anything is acceptable as an author. Thisweak typing has some flexibility: the database will never complain that it doesn't know the authoryou just attached to a certain document. However, the system is not helpful in catching errors whena new document is added to the database. If "Mark R. Brown" is mistakenly spelled "Mark R.Browne", then one of Mark's papers will not be properly retrieved by a later search. A step in thedirection of stronger type checking is to provide a separate domain for authors.To represent authors as entities, and to allow a variable number number of authors for a document,a better design would be:Document: Domain = DeclareDomain["Domain"];Person: Domain = DeclareDomain["Person"];author: Property = DeclareProperty["author", Document, Person];Incidentally, in the last line above we define a property rather than relation for brevity. Instead ofthe author property declaration we could have written:author: Relation = DeclareRelation["author"]; authorOf: Attribute = DeclareAttribute[author, "of", Document]; authorIs: Attribute = DeclareAttribute[author, "is", Person];)YptF ptpt ptptpXtFptpjQLjO}KjMI\ jK/jDrj@p;&j>\jQ\O&DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL50where a large number of relationships are expected to refer to the same entity, it is also more spaceefficient in our implementation.If an authorOrder attribute is defined, the client may wish to redefine the authorOf attribute sothat links (pointers) are not maintained between the Document entities and author relationships,instead defining a more space-efficient B-tree index on the [authorOf, authorOrder] pair:authorOf: Attribute = DeclareAttribute[ relation: author, name: "of", type: Document, link: FALSE];authorIndex: Index = DeclareIndex[author, LIST[[authorOf, authorOrder]]];The Cypress implementation will use this index to process any call of the formRelationSubset[author, LIST[[authorOf, x]]This call to RelationSubset will therefore enumerate authors of document x sorted by authorOrder.Cypress will also use the index in processing for GetPList[..., authorOf] as GetPList usesRelationSubset.This solution is also somewhat less than perfect, as it depends upon the fact that the Cypressimplementation orders relationships when an index exist; but indices are not intended to change thesemantics of the operations, only to improve performance. Probably the best solution, if theordering is important to the semantics of a database application, is to represent a list by a binary"next" relation connecting the entities in an ordering.Documents have other interesting properties. Some of these, for example the date on which thedocument was produced, are in one-to-one correspondence with documents. Such properties can bedefined by specifying a relation or property as being keyed on the document:publDate: Property = DeclareProperty["publDate", Document, StringType, Key];We are using the convention that domain names are capitalized and relation, attribute, and propertynames are not capitalized, both for the Cedar Mesa variable names and in the names used in thedatabase system itself. If and when the database system is better integrated with Cedar Mesa, the Cedar anddatabase names will be one and the same.We might wish to include additional information for particular kinds of documents, for exampleconference papers. Conference papers may participate in the same relations as other documents.For example, they have authors. In addition, we may want to define relations in which onlyconference papers may participate, for example a presentation relation which defines who presentedthe paper, and where. We can define a conference paper to be a sub-domain of documents, and)]"ptF ptpt ptptpXtFptpjUG_jSjO6pu p- up jMup upjJY{FuX'{D@{@TIjd #>QDTDESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL52<==2UD`D`@@D`@CCCD`C@>UC`BCQ4Cl@CCg ClG,BCK4/D@D@> UBCZBpB 3@CڃCڂCZ݀%~CڂBpB ;@ "DbC>CVCHCHCBBh3>CClDhC,B$BP;>%}%~Dq"Cڂ%}>uV)CZBCW2D@D@CQ2@/ CEDrClC >V7C`CQ2+ACh"ClCCBCK27AD@D`>V)CZ,B$3>BmCڂ%}݀CڂCځCHB$;>B] Cl@DbC ?!VCHBhpBClD CECClG,CZBPB%~CڃDq"CZ%~%~?DUBBDD@D@C`@/BCCDrl ?RU+ABD "DbCBCZ7ABDD@D`?DUZ,BmB%}%~Dq"Cڂ%}CHCH*B]BCl@CCg ClG?!U(n BBm CEDrClC CZB BB]CڃCڂCZ݀%~Cڂ>U,BBAD@D`C`A_BBACClDhC>TBBB Cl@DbC CZ,BBB/D@D@>uUBBpB%~CڃDq"CZ%~%~CH*BpBCh"ClCC>CU(CH CBBm3?CDrl B BB];@Cڂ%}݀CڂCځ> UZ,CZCW4AD@D`A_CQ3AClD CECClG>rLwD`D`@@D`@CCCD`C@>RLwC`BCQ4Cl@CCg ClG,BCK4/D@D@>`LBCZBpB 3@CڃCڂCZ݀%~CڂBpB ;@ "DbC>LCHCHCBBh3>CClDhC,B$BP;>%}%~Dq"Cڂ%}>M CZBCW2D@D@CQ2@/ CEDrClC >MC`CQ2+ACh"ClCCBCK27AD@D`?/M CZ,B$3>BmCڂ%}݀CڂCځCHB$;>B] Cl@DbC ?aLCHBhpBClD CECClG,CZBPB%~CڃDq"CZ%~%~?LBBDD@D@C`@/BCCDrl ?Lw+ABD "DbCBCZ7ABDD@D`?L:,BmB%}%~Dq"Cڂ%}CHCH*B]BCl@CCg ClG?aLn BBm CEDrClC CZB BB]CڃCڂCZ݀%~Cڂ?/K,BBAD@D`C`A_BBACClDhC>KBBB Cl@DbC CZ,BBB/D@D@>KBBpB%~CڃDq"CZ%~%~CH*BpBCh"ClCC>LCH CBBm3?CDrl B BB];@Cڂ%}݀CڂCځ>`L:,CZCW4AD@D`A_CQ3AClD CECClGRick CattellMark Brown rT7 rQw2Qw2T7 rT7 T7 QwD@D@ RT7D@D` rQ2QD@D@ rQWD@D`QwT7D@D`RQwD@D@2T rTD@D`2TWD@D@T7QD@D@T7D@D` I GGI I I GD@D@ ID@D` G7G7D@D@ FD@D`rGrID@D`GD@D@I ID@D`ID@D@RIRGWD@D@ID@D`,V,T5T5V,V-V-TD@D@,VD@D`,T75T7D@D@,SD@D`5T5VD@D`5TD@D@5V,VD@D`5VD@D@1rV1rTWD@D@12VD@D`-RJ-RH6H6J-RJ-rJ-rHD@D@-2JD@D`-RH76H7D@D@-RGD@D`5H5JD@D`62HD@D@6J-RJD@D`6JD@D@1J1HWD@D@1JD@D`Cypress DatabaseAn Analysis ofPriority Queues-2P-2M5M5P-2P-RP-RMD@D@-PD@D`-2M5MD@D@-2MD@D`5M5PD@D`6MD@D@5Pw-2PwD@D`5PD@D@1P1ND@D@1rPD@D`ofofofofofisisis198219774IwEsCr,6;2CrC!CRwAC!˲89 zD[Cb C>pݻBM0Am`@ 6S CYlDq2Cb 4OWE(A B4@"@3;&C' OBC 8QDeڼ)sBܩC C9D`B@B`9OEI|Wʌ(?CPCcCDoa8:A@BDODSn'';NsD?K,@Q A~Cr<!UC@Cr6D[@[:`~A CcD`UCO =#MD[oӽB@B@C1A{7`Cο@C)D[B@ %@CDi8Cޫ<4SUEhVqAIB# D@?T@D@0ҿ 95Ce AW<|A DCD]}19RUE*@"=1C[^AR.CAπ0C4AFBz CD`?T.:U8?\B??&߀耽pCDlBEoDBALO@ C9B2fC8*:T8746ZW ཆCZ)D8$CDhBDE̒DVө6PC B]PCB2<sC%S7p;,D{C~'AQ@DAD^FAِCECB88#C.NpdDCAAo>1LƀBpDJ(DXdĀEpc h*AD€A@C䀽<@9B(C*OWDW0HCspCBDU5@DMEmT#jC@D{n̼n8C9@BpC;#P8DpCDk. CoDc@C Es뾶CP LCfCڂC>5B C>-Rxq*CĪ@i, CpDc@C L Ddc5V?C^?Ϋ?FCx@ KCoCSHEjˀC-&CB(߀=܀CһRڻDQCm`'A DCD]݀=Z#HE;DLN`;GDF̗Cc\`C>BoXC?6 CKC@@մfC=C DgxB#(oIE:DLQ:D@軍*NC:Cր<@ZA`CՀ!vCQ9Bz@@?B`3bC_DfdBoPRERQB׬FBRШC@A @0Cـ<׬7,BA/WC)Da@AQR0SEC\G@C]3BQ+]tCBWPC.|b(zB~2&0C}DcA&HE2`CAg?BGW@+CA߀:C.dCX 3D<`B @CDeBa怼IEʈCM]CCƐ B=CiAD4C7eDB]AA漀2:C_Db AisisrwD`D`@@D`@CCCD`C@RwC`BCQ2ClBCC@h ClB,BCK2/D@D@`BCZBqB$3=CڂCڂCZ%} CڂBrB$;= $DbCCHCHCBBh3=CClBgC,B%BP;=%}%}}Dq#@Cڂ%}  CZBCW1D@D@CQ10/ @CEDrClB C`CQ1+ACh`$ClBCBCK17AD@D`/ CZ,B%3=BmCڂ%}}CڂCڂCHB%;=B] @ClBDbC aCHBhnBClB CECClB,CZBPB%}CڂDq#@CZ%} %} BBBD@D@C`0/BACDrg w+ABB $DbCBCZ7ABBD@D`:,BmB%}%}}Dq#@Cڂ%} CHCH*B]BClBCCh ClBamBBm @CEDrClB CZBBB]CڂCڂCZ%} Cڂ/,΀BBAD@D`C`΀A_BBACClBgC΀BBB @ClBDbC CZ,΀BBB/D@D@BBrB%}CڂDq#@CZ%}@%} CH*BrBCh`$ClBCCHCBBm3<CDr g BB#B];=Cڂ%}}CڂCڂ`:,CZCW2AD@D`A_CQ1AClB @CECClB,,::,,,D@D@,rD@D`,:D@D@,D@D`99D@D`:2D@D@:,D@D`:7D@D@00D@D@0D@D`55D@D@5rD@D`R.D`D`@@D`@CCCD`C@2.C`BCQ2ClBCCh ClC,BCK2/D@D@@.BCZBrB$3>CڂCڂCZ݀%} CڂBrB$;> $DbCc/&CHCHCBBh3=CClDgC,B$BP;=%}%~}Dq"Cڂ%~/ICZBCW2D@D@CQ2@/ CEDrClB /WC`CQ2+ACh"ClBCBCK27AD@D`/ICZ,B$3=BmCڂ%}CڂCڂCHB$;=B] Cl@DbC A/&CHBhnBClB CECClC,CZBPB%}CڃDq#@CZ%} %~d.BBBD@D@C`0/BACDrh r.+ABB $DbCBCZ7ABBD@D`d.z,BmB%}%~}Dq"Cڂ%~CHCH*B]BClBCCh ClCA.Hn BBm CEDrClB CZB BB]CڂCڂCZ݀%} Cڂ.%,BBAD@D`C`A_BBACClDgC.BBB Cl@DbC CZ,BBB/D@D@.%BBpB%}CڃDq#@CZ%} %~CH*BpBCh"ClBCc.HCHCBBm3=CDrh BB#B];>Cڂ%}CڂCڂ@.z,CZCW2AD@D`A_CQ1AClB CECClC?1D`D`@@D`@CCCD`C@>1C`BCQ2Cl@CCh ClC,BCK2/D@D@?1TBCZBpB$3>CڃCڂCZ݀%~CڂBpB$;> "DbC?#1CHCHCBBh3>CClDhC,B$BP;>%}%~Dq"Cڂ%~?U1CZBCW2D@D@CQ2@/ CEDrClC ?1C`CQ2+ACh"ClCCBCK27AD@D`?1CZ,B$3>BmCڂ%}݀CڂCڂCHB$;>B] Cl@DbC @1CHBhnBClD CECClC,CZBPB%~CڃDq"CZ%~%~@$1TBBBD@D@C`@/BACDrh @21+ABB "DbCBCZ7ABBD@D`@$0,BmB%}%~Dq"Cڂ%~CHCH*B]BCl@CCh ClC@0n BBm CEDrClC CZB BB]CڃCڂCZ݀%~Cڂ?0,BBAD@D`C`A_BBACClDhC?0wBBB Cl@DbC CZ,BBB/D@D@?U0BBpB%~CڃDq"CZ%~%~CH*BpBCh"ClCC?#0CH CBBm3=CDrh B B#B];>Cڂ%}݀CڂCڂ?0,CZCW2AD@D`A_CQ1AClD CECClC?R'D`D`@@D`@CCCD`C@?2'C`BCQ2Cl@CCh ClC,BCK2/D@D@?@(4BCZBpB"3>CڃCڂCZ@%~CڂBpB";> "DbC?c(fCHCHCBBh3>CClBhC,B$BP;>%}%}Dq"Cڂ%~?(CZBCW2D@D@CQ20/ CEDrClC ?(C`CQ2+ACh$ClCCBCK27AD@D`@(CZ,B$3>BmCڂ%}݀CڂCڂCHB$;>B] ClBDbC @A(fCHBhoBClD CECClC,CZBPB%~CڂDq"CZ%~%~@d(4BBBD@D@C`@/BACDrh @r'+ABB "DbCBCZ7ABBD@D`@d',BmB%}%}Dq"Cڂ%~CHCH*B]BCl@CCh ClC@A'nBBm CEDrClC CZBBB]CڃCڂCZ@%~Cڂ@'e,BBAD@D`C`A_BBACClBhC?'WBBB ClBDbC CZ,BBB/D@D@?'eBBtB%~CڂDq"CZ%~%~CH*BtBCh$ClCC?c'CH CBBm3=CDrh B B!B];>Cڂ%}݀CڂCڂ?@',CZCW2AD@D`A_CQ1AClD CECClCRick CattellMark Brown-2W-/6/62W-2W-2W-/D@D@-2WD@D`-/6/D@D@-/wD@D`6r/6r2WD@D`6/D@D@627-27D@D`62wD@D@2R2W2R/D@D@22WD@D`.,.)W6)W6,.,.2,.2)WD@D@-,D@D`.)w6)wD@D@.)7D@D`6)W6,D@D`6)WD@D@6+.+D@D`6,7D@D@2+2)WD@D@2r+D@D`ofofisis5*E(A B4@"@3;C'$O@BC8QDeڼ)s9BܩC ZC9D`B@B`:*EI|Wƌ(@@~CPCcC@Doa8:A@BDODSo''@<)D?K,@Q A~xCr<!SC@Cr6D[@[;`~A CcD`UCO >)RD[oӽB@B@C1A{7bCο@C)D[B@ %@CDi8Cެ</0?\B??&耽tCDlÀBEoDBALo@C9B|2fC8+/746ZW ཆCZ)E8(CDhBDE̒DVө6PC B]PCB2<sC&v.p;,D{C~'AQ@DAD^FAِCECB88#C.*WpdDCAAo1DƀBpDJ(DXcEpc h*AD€A @C䀽<@9B(C**DW0HCspC|B DU5@DMʀEmT#jC@D{n n8C9@BpC:$+DpCD˽. CoDc@C @Es뾶CP LCfCC>5@B C> -q*CĪ@h., CpDc@C L Ddc5V?I^?Ϋ?CCx@ JCoC@ConferencePapersub-domain of Documentwhich can participate in a "presentation" relation.AuthorrelationPresentationrelation8REpXN CbCB03@C h뻣PD]C`AD@D_@;D{u:C~rBbsLBg,pBAC:C{0l C: Ub[NBb BqAqHCKDl@B0~<sBR Bo--DrC=VDIWDivBRPBMcBN+C]CDAG@C]T6>F HA(B6$,"Dk}Cڀ6`=DsBEIVмCҶC@dJB4CDRTB3꺬 8Cf 6D^C6>s!]DϢEB{xCQ7pBCDSB B; {G?(8gTDQC̀.`Do*rn@\@X )C@$CπCC@$& E޼֘B\BG?CDn:i Q(Dκ;TA CݺCCCu"/ZBCCu!EzDuDJd,_4CDl40 $Ce^ABπAc`CȽCR@CȾ2DihfBNpBxAe @I`CЀDr&.`!,$ D«wCRCW@5C(CC'XDӲB:CKB`b7BCRDs\AyP7B"b"EI@CC-TCm7j@*qCȖCma/D%% CWPCSAf$# CDpy@Cf(#$ EZiA`RAs)"!@Dp(C@)!CuC~pAcPApCϺC#Cz"CϺ1J?AٳBYBB5DsSC&DjDǨQABEBI-XCCClL@!C03ѻ B}hBjA٘ CMDqCg @D8DB伸$BlB(BF0Aw0CUC'A``5؀CU`/eм.CE3BJAHC܀DkBDBC奚3hvCBDVDK @vEgD~=#Aw@Bw0C(!BC# LDUBɜ!'ؽ=!RBDTDO R`E». hBaA`C@CBC+$DZ;' ѐA&BNB:DR] DPEfr2b-A@9`C BCACM SIGMOD 1982The Cedar DBMS: APreliminary Report.87.5w65w687.87.287.25wD@D@-87D@D`.565D@D@.5WD@D`65w687D@D`65wD@D@68.8D@D`68WD@D@28725D@D@2R87D@D`?77D`D`@@D`@CCCD`C@>77C`BCQ2Cl@CCh ClC,BCK2/D@D@?7tBCZBpB$3>CڃCڂCZ݀%~CڂBpB$;> "DbC?#7CHCHCBBh3>CClDhC,B$BP;>%}%~Dq"Cڂ%~?U7CZBCW2D@D@CQ2@/ CEDrClC ?7C`CQ2+ACh"ClCCBCK27AD@D`?7CZ,B$3>BmCڂ%}݀CڂCڂCHB$;>B] Cl@DbC @7CHBhnBClD CECClC,CZBPB%~CڃDq"CZ%~%~@$7tBBBD@D@C`@/BACDrh @277+ABB "DbCBCZ7ABBD@D`@$6,BmB%}%~Dq"Cڂ%~CHCH*B]BCl@CCh ClC@6n BBm CEDrClC CZB BB]CڃCڂCZ݀%~Cڂ?6,BBAD@D`C`A_BBACClDhC?6BBB Cl@DbC CZ,BBB/D@D@?U6BBpB%~CڃDq"CZ%~%~CH*BpBCh"ClCC?#6CH CBBm3=CDrh B B#B];>Cڂ%}݀CڂCڂ?6,CZCW2AD@D`A_CQ1AClD CECClC56E?~BA3A4CAڰB%\C<~A3@BgAFCDacAZ`9Q6EC` CYBL9@CBMpoC$xZ'ByV| %PC`DbAڤ 41ECr8@PAECpC@'C.:H`DUcfBc@"BvDJDW90EЃBuDRBb^CB7(C@0A޻NW@A#CpClD`@@Nori Suzuki/^6Y02BaGA@AS*`gB<Ds C}cg@EDD03Cgs@6̽* B-CCC\P CC@-"5DCd6DC67Ds[CEu.-.$+c4'&$9,"nR "*&D4y"4+*O;:$ 6 $4 Z"0B@/Ddr >Q\ODESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL54InsertData: PROC = BEGIN t: Relship; tty.PutF["Inserting data...\n"]; cedarPaper_ DeclareEntity[ConferencePaper, "The Cedar DBMS"]; cypressDoc_ DeclareEntity[Document, "Cypress DB Concepts & Facilities"]; thesis_ DeclareEntity[Thesis, "An Analysis of Priority Queues"]; sigmod_ DeclareEntity[Conference, "SIGMOD 81"]; rick_ DeclareEntity[Person, "Rick Cattell"]; mark_ DeclareEntity[Person, "Mark Brown"]; -- Note we can create entity and then set name... nori_ DeclareEntity[Person]; ChangeName[nori, "Nori Suzuki"]; -- Data can be assigned with SetP, SetF, or DeclareRelship's initialization list: t_ DeclareRelship[presentation,, NewOnly]; SetF[t, presentationtOf, cedarPaper]; SetF[t, presentationBy, mark]; SetF[t, presentationAt, sigmod]; []_ SetPList[cypressDoc, authorIs, LIST[rick, mark]]; -- the Cedar notation LIST[ ... ] defines a list []_ SetPList[cedarPaper, authorIs, LIST[rick, mark, nori]]; []_ DeclareRelship[author, LIST[[authorOf, thesis], [authorIs, mark]]]; []_ SetP[cypressDoc, publDate, I2V[1982]]; []_ SetP[thesis, publDate, I2V[1977]]; -- the I2V[...] calls needed because Cedar Mesa does not yet coerce INT to REF ANY -- Check that thesis can't be presented at conference: ok_ FALSE; t_ DeclareRelship[presentation]; SetF[t, presentationOf, thesis ! MismatchedAttributeValueType => {ok _ TRUE; CONTINUE}]; IF NOT ok THEN ERROR; END;DestroySomeData: PROCEDURE = -- Destroy one person entity and all frog entities BEGIN flag: BOOL_ FALSE; tty.Put[char[CR], rope["Deleting Rick from database..."], char[CR]]; DestroyEntity[DeclareEntity[Person, "Frank Baz", OldOnly]]; DestroyDomain[Frog]; END;PrintDocuments: PROC = -- Use DomainSubset with no constraints to enumerate all Documents BEGIN doc: -- Document -- Entity; authors: LIST OF Value; es: EntitySet; tty.PutF["Documents:\n\n"]; tty.PutF["Titleauthors\n"];)\rptF ptpt ptptpXtFptp{TuX{R{Q"{O7?{MlJ{KB{I1{H .{FB,{Dw3{B{@"{?S{=M,{;){9"{7${6"7{4X4{2={0I{.,{--({+cV{)8{' {&"{$8{"n={ {{C{y4{{F{={N{{ { $D{ Y{{{{/{d6  M 6]APPLICATION EXAMPLE55 es_ DomainSubset[Document]; WHILE (doc_ NextEntity[es])#NIL DO tty.PutF["%g", rope[GetName[doc]]]; authors_ GetPList[doc, authorIs]; FOR al: LIST OF Entity_ NARROW[authors], al.rest UNTIL al=NIL DO tty.PutF["%g ", rope[GetName[al.first]]]] ENDLOOP; ENDLOOP; ReleaseEntitySet[es]; END;PrintPersonsPublications: PROC [pName: ROPE] = -- Use RelationSubset to enumerate publications written by person BEGIN p: Person_ DeclareEntity[Person, pName, OldOnly]; authorT: --author-- Relship; rs: RelshipSet; first: BOOL_ TRUE; IF p=NIL THEN {tty.PutF["%g is not a person!", rope[pName]]; RETURN}; tty.PutF["Papers written by %g are:\n", rope[pName]]; rs_ RelationSubset[author, LIST[[authorIs, p]]]; WHILE (authorT_ NextRelship[rs])#NIL DO IF first THEN first_ FALSE ELSE tty.Put[rope[", "]]; tty.Put[rope[GetFS[authorRS, authorOf]]]; ENDLOOP; tty.PutF["\n"]; ReleaseRelshipSet[rs]; END;tty.Put[rope["Creating database..."], char[CR]];Initialize[];DeclareSegment["[Local]Test", $Test, 1,, NewOnly];OpenTransaction[$Test];Initialize[];InsertData[];PrintDocuments[];PrintPersonsPublications["Mark Brown"];DestroySomeData[];PrintDocuments[];CloseTransaction[TransactionOf[$Test]];tty.Close[];END.gWptF pt9pO7uXMl$KwI%H DFB8Dw B@=M.;C9736"4X20.;--7+b2))'8&-$8 "m x0 2N   ' # Y' d +oX,575. Data model design issuesA number of decisions were necessary in the design of the Cypress data model described in thispaper. A few of these decisions were made arbitrarily because no solution was obviously best. Forthe most part, however, the criteria of simplicity and utility discussed in Section 1, as we interpretthem for the applications we envision, clearly point towards the model we have developed. Thedecisions that led to the Cypress model chosen are discussed in this section.5.1 Relations and attributesThe Relational model, as described by Codd[1970], was the obvious starting point for the Cypressmodel. The paper by Codd is easily the most referenced paper in the database literature and theRelational model is widely acknowledged as simple yet reasonably powerful. There have actuallybeen a number of interpretations of "Relational model" in various implementations (such as SystemR by Astrahan et al [1976] and INGRES by Stonebraker et al [1976]). However, allimplementations share the most basic idea of a database as a set of relations whose columns areattributes and whose rows are tuples, and most of them share the idea that the tuple attribute valuesmay be strings, numbers, and booleans.The fundamental type constraint, which states that all the relationships in a relation have the samenumber and types of attributes, is almost the same in all data models defined since the Relationalmodel. The wording changes somewhat with the introduction of entities or type hierarchies, andsome additional constraints are added with the introduction of keys, names, and normalization. Wewill discuss these minor differences as we contrast the various data model features.5.2 Entities and domainsCodd [1979] introduced the idea of referential integrity: that attribute values come from domainssuch as "People" or "Parts." However, few implementations or theoretical studies of the Relationalmodel carry this idea through to enforcing that persons or parts with unique names actually existwhen referenced and that all references to these domains in a database are consistent. Thus, weintroduce the concept of an entity to provide the important function of recognizing the objects weare trying to represent in the database. The entity idea frees applications from the integritychecking necessary in the Relational model on strings and numbers that are being used as uniqueidentifiers for objects. We are used to the idea of objects as distinguished from their name,attributes, and relationships to other objects; the introduction of entities thus fulfills the simplicity(understandability) as well as utility criteria.Gfp ^q Zp S X%> Vf TVN QM K>r Gbp9' E- U BQ @Q >,Q <\,3 :'@% 7& 3p-7 19) /hN -3S *T $r ?p#sp  :) [ 00 mD 9Q P M L 0 X =\2DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL58The introduction of entities has the disadvantage that it tends to "cast in stone" the decision todefine a particular real-world object as an entity as opposed to a datum value (or vice versa). Forexample, if it is decided that the publisher attribute of the author relation is a string datum (thename of the publisher), and later it is changed to be an entity instead, then programs referring tothe author relation must be modified. If a client uses the SetFS / GetFS operations, however, it ispossible to obtain or even change the publisher without knowing whether it is an entity or datumvalue. This feature allows an application additional data independence at the expense of ignoringknowledge of entity existence and entity types.The introduction of entities encodes more knowledge about the structure of the underlyinginformation than the Relational data model. We can use this knowledge to provide betterperformance, and we do so in the Cypress Database System. For example: (1) entities indicatewhere physical links are desirable, (2) it is not necessary to store or use artificial numbers to identifyobjects because unique entity names provide this, (3) entities (atoms) may be stored andmanipulated more efficiently than strings or integers (e.g., string matching person-names is moreexpensive than checking entity equality).Note that the Cypress data model includes the Relational model as a special case in two senses:1.If no domains are defined (i.e., all relationship fields are datum values), then we haveexactly the Relational model: there is no domain type checking (e.g., that a value is aperson's name as opposed to just a string). Any correspondence between strings in onerelation and strings in another, and their correspondence to objects, is left to the applicationprograms. 2.The Relational model can be exercised on top of the Cypress data model even in thepresence of entities by using only the translucent attribute operations. These operationstreat all fields as strings as in the Relational model, but entities and any underlying physicalstructures required by the implementation are automatically maintained as well. Forexample, a new entity may automatically be created by using SetFS on a relation tuple.Thus different users may examine and manipulate the same database as datum values(Relational model), links (graph model), or both (our model).The introduction of entities to the Relational model is now fairly widely accepted as a good idea,and Chen [1976] is probably most responsible for spreading this notion. Unfortunately, Chenintroduces entities in a framework which is more complicated than it needs to be. Entities extendthe representational capability of the Relational model in one basic way: they make it possible todistinguish between database items that represent objects and those that represent properties of (orrelationships between) objects. Chen uses entities for this basic function, as unique "atoms."vfptF ptpt ptptpXtFpt Mp _R \@$ ZY XxM VDQ T?! Qb O`/ Kp= IPF G(6 D'C BT @~a >) 9p_U6pS3;1O/hI,pU)>&O $E"sP ? L A= pO H K(: c H g_ M =[DATA MODEL DESIGN ISSUES59However, Chen and others have combined this function with two other data model features:1.Treating entities as relationships, by allowing entities to have attributes just as relationshipsdo. Chen does this, and most others have followed his lead.2.Treating relationships as entities, by allowing attributes of relationships to reference otherrelationships as well as entities. Chen does not do this, but others (McLeod [1978], Codd[1979], Smith [1977b]) do, as do the older hierarchical and network models.Combining the basic idea of an entity as an atom and a relationship as a data record can lead tosome confusion. In the next two subsections we specifically discuss the merits of (1) and (2),respectively. 5.3 Entities as relationships: "attributes" of entitiesChen [1976] and others allow entities to have attributes. That is, they define an entity to have thefeatures of a relationship: entity-valued or datum-valued attributes. This definition divides theinformation about an entity into two classes: the information that is given by its attributes, and theinformation that is given by relationships. For example, age might be an attribute of a person,while the projects he works on might be represented by member relationships between persons andprojects. In many cases, the choice is not clear: should spouse or boss be attributes of persons orrelationships between persons? Should birthday be a quaternary relation between a person, month,date, and year, or should the birthmonth, birthdate, and birthyear be attributes of a person? Thesedecisions must be made at data definition time and changes later would necessitate changingapplication programs or providing some kind of translation mechanism. Applying the simplicity and utility criteria to the entity attribute question suggests that:1.[Utility] Entity attributes provide a short-hand making application programs and userinterfaces simpler: the age of a person can be obtained in one operation instead of two, theextra operation being fetching the relationship. There could also be a performance issuehere, but the database system can store relationships physically co-located with the entityabout as easily as it can co-locate attributes, so this is not an issue in practice.2.[Simplicity] Entity attributes introduce additional complexity, both in understandability andthe parsimony of the representation. For example, there are now two different mechanismsto represent most facts about entities. There are also indirect, but perhaps more annoying,effects: the syntax and semantics of the query language will be more complicated and/orverbose, if we must handle both cases.!dfptFptptptGp ^XUZp/2Xx<UTp^Rh1)OK LpE IV Gb p @r8 ! Osp Kp#: I^ Gb01 E-'< B] @J >6' <\ Z :'C 7 p 37% 1s psp&s p /hs ps psp -3T *rp0 (sp @ &)6 $ag "- U 4 rD 9pe W sp7 16 ' M >\2#DATA MODEL DESIGN ISSUES61The single object type has the attractiveness of simplicity. One still needs the same operationspresent before on relationships and on entities, but there is only one type to deal with.Unfortunately, it falls back a bit on the utility criterion, as we lose some features that theentity/relationship distinction was providing us. In addition to the obvious fact that the user can nolonger tell which data items are thought of as real-world objects, the system cannot tell thedifference and so cannot:1.automatically check that objects intended only as relationships are never referenced 2.perform optimizations that depend on knowing whether a database object can ever bereferenced elsewhere, or3.print the database in a simple linear or human-readable way, as it can when only entitiescan be referenced and all entities have names (or names can be invented).Consider the motivation for dropping the entity/relationship distinction. Some relationships, say apurchase relationship between a customer, part, and quantity of the part ordered, may act as anentity in and of itself. For example, the purchasing may take part in a commission relationshipbetween the purchase, a salesman, and a commission in dollars. If we drop the entity/relationshipdistinction, this kind of relationship can be added with no change to the existing data schema, asthe previously unreferenced purchase relationships can just as easily be treated as entities. Keepingthe entity/relationship distinction, such a change would require modifications to databaseapplication programs or at least the use of some intermediate interface to shield them from suchchanges (such as translucent attributes or views).On the other hand, we can argue that when we choose to treat a relationship as representing anabstract object in and of itself, we should then and only then introduce a domain to represent thatkind of event or transaction. Quite simply, that's what an entity is for: it says that we are thinkingof this database item as an object, that can participate in relationships.In a more practical and perhaps more convincing vein, not allowing explicit references to therelationships in a database permits a much simpler query language, since it need not deal withnesting of relations. Relationships referencing relationships provide a potential spaghetti ofinterconnections. In particular, as noted in earlier sections, we may still use a relational querylanguage in our data model, despite the introduction of the entities, names, domains, subtypes, andso on. This compatibility is possible exactly because of the dichotomy we introduced betweenrelationships (objects that may reference other objects but may not themselves be referenced), andentities (objects that may be referenced but may not reference others). We have a simple, two-levelstructure. !dfptFptptptGp _01 \#6 ZG XxR sp VD+sp SUOpTDpUK$.IPUEtpBBI ?p<( [DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL62In applying the simplicity and utility criteria here, the choice is difficult. Our choice, to keep theentity/relationship distinction, was motivated by the observations that (1) dropping the distinctionresults in no fewer operations either in the database interface or the application program, it merelycondenses two types into one; (2) there is a loss of some integrity checking and data semantics bydropping the distinction; (3) a linear and human-readable notation of entities and relationships,using entity names, is possible; and (4) the query language is simplified by permitting only oneinformation-carrying element, relations.There is a variation on dropping the entity/relationship distinction that allows references torelationships only in a non-cyclic hierarchical fashion; this middle ground seems to be morecomplex and/or less powerful than either of the extreme positions, however.5.5 Lists and setsA few data models permit set values for relationship attributes. We discussed this in the previoussubsection in the dual form of multi-valued attributes of entities. Both have the same drawback inmaintaining consistency of the database. Both violate Codd's [1970] first normal form designed toavoid such inconsistencies.On the other hand, we have found it useful in database applications to have some mechanism torepresent the concept of a list, an ordering upon database entities or relationships. Almost none ofthe data models of which the author is aware provide any help in representing the concept of a list.The standard way to represent ordering in the Relational model is by adding another attribute tothe relation, specifying the ordinal position of the tuple. A list thus constructed can only beenumerated efficiently if an index is built upon the attribute, and it is quite awkward to insert newtuples in the middle of the list.We do not claim to have solved the list representation problem in the Cypress data model, either.However, in practice we have used the list representation our implementation provides: the ordermaintained on references to an entity. When a RelationSubset is executed with an entity-valuedattribute-value pair [a, e] as the constraint, the tuples that reference the entity are returned in aparticular order. This ordering is guaranteed by the group lists we will describe in Section 6. It isthe same order as the entities were specified in the call to SetPList, or in the order they werecreated if SetP or DeclareRelship calls were used. We are experimenting with an optional extraargument to SetF which specifies where in the list the new tuple should be inserted when a newreference to an entity is established:vfptF ptpt ptptpXtFpt Mp _52 \B" Ze XxC VD&; T$< Q( Mp T KQ I K Blr >pZ <\H :'D 7 3p6' 1sp 9 /h&> -3A *-3 (I &O! "spO ?"s p  u p" upJ I m#up 9 upu p-  upN & MB>WDATA MODEL DESIGN ISSUES63SetF: PROC[t: Relship, a: Attribute, v: Value, after: Relship_ NIL];Using this mechanism, our applications can define binary member relations as connectors to entitiesthat represent ordered (or unordered) sets. Elements can easily be added and removed from a setwith SetF or SetP, and the sets can easily be enumerated with RelationSubset.However, we have not advertised this feature in the description of the data model or the DBModelinterface because it is only a tentative solution. Not only does this solution depend upon theimplementation of the entity references via linked lists, but the feature is not accessible at all from ahigher-level Relational query language. More work is badly needed, to extend data models andquery languages to cover the concept of ordering in a clean and integrated way.5.6 Entity namesThe introduction of entity names to the data model is a comparatively easy choice. They easilysatisfy the criteria of simplicity and utility. What is less clear is the form in which they should beintroduced.For the applications we have encountered, nearly all real-world objects have human-meaningfulnames that are unique within their domain or can easily be made to be so: people, organizations,articles, programs, calendar years, or electronic messages. The natural real-world name is preferredto an internally generated identifier for the object, if the real-world name is guaranteed to beunique, because the name is the way in which a user will typically identify an object to thecomputer and represent the object outside the database.A number of applications use names that logically have multiple parts. The name of a person, forexample, might consist of the concatenation of a first, middle, and last name. Some applicationshave dependent names. A dependent name is a multiple-part name one of whose parts is anotherentity. The name for a file system subdirectory, for example, might be a two-part name consistingof the entity for its super-directory plus the string name of the subdirectory. The name of a wine,e.g. a 1979 Sebastiani Zinfandel, has three parts: the vintage year, the winery entity, and thevariety.The ubiquity of multi-part and dependent names easily satisfies our utility criteria. They probablyalso satisfy our simplicity criteria, but this is less obvious. Closer examination reveals that theimplementation of names must be reasonably complex in order to satisfy typical applications' needs.Different separators for the name parts and different conventions for their combination and sortorder may be required: e.g., "Lastname, FirstName Initial" for persons, or"SubDirectory>" for file system directories. Also, the data schema mechanism!dfptFptptptGp _uXD [:p /sp$ Y` Vupup-u p Rp3- Pz3, NFi LR IO Br ?p? <42 :n 6p0- 4^"? 2)e /)7 -.. +E7 'ip V %5 R #s p0 I .6 c U   p5/ ); L oK ;; ] >]LDESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL64must define the name in terms of independent existing relations. That is, one probably does notwant to think of the name components as "attributes" of an entity in the sense that relationshipshave attributes. The relation between wines and wineries or the relation between organizations andtheir parent organizations are queried independently but are also used in derivation of wine andorganization names.Our conclusion is that the data model should allow for a procedural specification of the namederivation so that names can automatically be updated when data change, rather than allowing somefixed name derivation primitives. This allows an arbitrary definition of names, just as views mayallow arbitrary relation definitions. It also removes the need for the database system to understandnames at all, since it treats the name function as a "black box." Until storage of proceduralinformation is easier in the Cedar programming environment, however, the name function has beenomitted from the implementation. One could imagine applications that use integers or dates as entity names rather than strings.Again, this variation could be handled by a procedural specification of the name derivation, and wefound insufficient demand to explicitly incorporate this into Cypress.Real-world objects can have more than one name, so it might be reasonable to allow this in thedata model as well. Multiple names were omitted from the Cypress model since the user can buildan alias relation on top of the system that is checked whenever names are looked up. The onlycommon example of aliases we have encountered are for person entities, so name aliasing does notsatisfy the utility criteria. However if multiple names are used widely it does make more sense toaugment the model. In that case it is still desirable for every entity to have a primary name toallow for correct operation of GetFS and SetFS.Another more difficult design decision for names in the data model is whether to allow null namesfor entities (Codd [1979], Kent [1978], Date [1981]). Without the guarantee that every entity has aunique identifier, a number of database operations become more complex in both semantics andimplementation: expression of queries, printing of entities, dumping of databases, cross-databasereferences (augments), data entry, and maintaining the correspondence between relational andentity-based database access (GetFS, SetFS). Unfortunately some applications do not requirenames. For example the transistors in an LSI layout may be represented as entities accessible onlythrough their relationships to connected entities in the layout. The best solution seems to be torequire that entities have names, but provide a default system facility for generating unique names.The GENSYM facility for generating unique atoms in LISP does this. The DeclareEntityprocedure in our implementation acts similarly. Thus the data model's invariants remain simple butadd no additional complexity to application programs not requiring names. vfptF ptpt ptptpXtFpt Mp _A \S Z@# XxW U R"p8% O-4 MX K)< IPH GV D" p @#; >6- <F 89pZ 6M 3sp8 1D /hK -3& "sI ?3/  A upup/ L mc 9-7 Gu p9* UI:r  M>Y0DATA MODEL DESIGN ISSUES655.7 Keys, normalization, and dependenciesAny kind of key mechanism is only an approximation to a description of arbitrarily complexdependency relationships between entities and datum values involved in relationships. Functionaland multi-valued dependencies and their implications for relational normalization have beenextensively studied in the literature (Codd [1970], Fagin [1977, 1981], Sadri & Ullman [1980]). Keyshave the advantage of simplicity over a more general, open-ended scheme requiring constraintchecking on a list of axioms on each database update: 80% of the checking can be handled with20% of the complexity. We are forced once more to apply our utility constraint: how complex akey mechanism would most database applications use? Uniqueness of entity names in domains?Single-attribute primary keys for relations? Multiple-attribute primary keys? Optional keys (uniqueif present)? We include all of these mechanisms in the Cypress data model, but not arbitraryassertions. Normalization is typically not enforced by a database system. The database administrator attemptsto design the data schema in a manner to minimize various anomalies, through normalization. Thesubject of database normalization, as noted in the previous paragraph, has been extensively studiedin the literature. It is my opinion that arbitrarily complex kinds of normalization and dependencieswill continue to be discovered. The easiest solution, therefore, is to take the extreme position:adopt the convention of normalizing databases as much as possible at the conceptual level. Wedefine functionally normalized to mean that relationships in the model are single irreducible facts.The result in practice is that most relations are binary; the physical database representation may infact join relations in storage for efficiency, but this optimization is not visible at the client'sconceptual level.Hall et al [1976], Biller [1979], and Schmidt & Swenson [1975] argue that the irreducible relationalform we are advocating is preferable to joining unrelated functional dependencies into a singlerelation. There are several arguments: (1) this is easier to understand because the relationalrepresentation matches the real-world dependencies, (2) if there is efficiency in actually joiningthem, this can be achieved by storing the join in the physical representation and defining the realrelations by views that are projections of this underlying relation, (3) this representation is more"restrictive" than any normal form that could be derived. The irreducible form has therefore beenused by convention, although not enforced, in the Cypress system applications. Features of thedatabase system and associated tools make this form easy to use.Our definition of irreducibilty is not the same as irreducibility to relations whose join can reproducethe original relation, as in Hall et al [1976] but rather what Biller [1979] calls s-irreducibility. Billerproves that our definition is more restrictive.We backed off one step from irreducibility to what we called "functional irreducibility:" in this!dfptFptptptGp ^r) ZpC X3. V[ TV%@ R"G O^ M*5 K.- IP] G> D (p @R >W <\F :'C" 7,6 58( 39+ 1U14 /!S , (pd &X $aP "-H 12 R L [[ @ p#sp 5 >. U/ ypZ : 2>]DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL66form we allow two or more relationships to be combined into a single relationhip when they mustlogically all be present or all be absent for each external (real world) entity. In practice we findthat functional irreducibility differs from irreducibility only when we separate components of asingle real world value into its constituent parts. For example, the birthday relation given as anexample of a functionally irreducible 4th-order relation in Section 2.7 would have been a binaryrelation had the month, day, and year been combined into a single "date" value. Kent [1982]discusses the issue of combining or separating components of values or entity names.The last topic we cover in this section is "existence dependencies" between entities andrelationships, in particular the fact that DestroyEntity is defined to include the destruction ofrelationships that reference the entity. One could imagine other semantics:1.When an entity is destroyed, the attributes of relationships that reference it could becomeNIL.2.We could disallow the destruction of entities altogether, destroying only relationships.3.When two database applications share entities of a particular domain, e.g. both a messagesystem and a phone directory share information about people, deletion of an entity by oneapplication should only delete relationships pertaining to the entity for that application, notthe entity itself. Alternative (1) was ruled out on the basis of our utility criterion. For the database applications wehave and envision, the desired semantics is to destroy all information about an entity when theentity is destroyed. Alternative (2) is attractive, as it simplifies the model by reducing the numberof operations we must understand and implement. However, it was also ruled out on the basis ofour utility criterion; there are cases where an application program really needs to remove an entitycompletely from the database, e.g. when old data is purged. Furthermore, the author knows nosimple implementation for this alternative not requiring "garbage collection" of entities created butno longer used.Alternative (3) is an important issue, as the support of applications with overlapping databaseschemas is crucial. Of course, applications that share domains will have to be in agreement as towhich relations belong to whom anyway, and could delete only their own data. But a morepowerful mechanism that could delete the relevant data in one operation is desirable. The solutionto that problem, in the opinion of the author, is to use the data segmentation mechanisms weconsider further in Section 5.9. These mechanisms not only allow physical independence of theapplications, but deletion of the entity representative of a real-world object in one segment has noeffect on the entity or its relationships in another segment.vfptF ptpt ptptpXtFpt Mp _D \T Z` Xx3sp VD` TP QT Mp !P KR I LUE-p;BU?XU;A2'9 =6_4 0f .[ ,_(> *+/0 'S %] #L !Y }8' I?# 7! Q ": w^ CN = X M =ZDDATA MODEL DESIGN ISSUES675.8 Generalization and type hierarchiesA wide variety of mechanisms have been forwarded in the literature for specifying types andsubtypes of data objects, or, equivalently, allowing the same object to participate in more than oneclass of relationship. These mechanisms have variously been called type hierarchies, type hierarchieswith multiple inheritance, type lattices, roles, and flavors.There are two major data modelling questions to address: how powerful a type-subtype mechanismis desirable, and whether it should apply to domains, relations, and/or datatypes. Both of thesechoices are difficult, and were made on the basis of the applications and schemas of the applicationswe envision. At least some sort of type lattice ranks high on our utility criterion, as many applications definerelations that are constrained to a subset of the entities of a type to which more general relationsalso apply. A database of documents for example, would separate those documents that areindividual works (books, articles, technical reports) from those that are collections (journals,conference proceedings, or collected books).Although we can construct examples where more complex type mechanisms such as roles or flavorsare desirable, for example treating a person entity as either a homeowner or employee or both, noneof the applications we envision would use this generality and the more complex mechanisms areharder to understand as well as more complex in their implementation. The addition of multiplesupertypes to the basic type hierachy mechanism, on the other hand, is a relatively simple conceptto understand and implement, and handles most data schemas the more complex mechanismshandle. Thus we arrive at the type hierarchy with multiple supertypes.Yet another variation on the type-subtype mechanism is to define subtypes by a predicate computedat run-time from data in the database (McLeod [1978]). For example, an entity would be defined tohave type mother if it has type person and also has a parent relation to one or more children.Computing types at run-time unfortunately introduces much of the complexity of arbitraryconstraint satisfaction tests earlier rejected in our database normalization disucssion. It also providesmore power than called for by our applications and the utility criterion. Subtypes computed bypredicate were therefore omitted from the Cypress data model. In their well-known paper on generalization (type hierarchies), Smith and Smith [1977] apply theconcept to all kinds of database objects, and we would naturally apply it to relations, domains, anddatatypes. This idea certainly satisfies our simplicity criterion, as it makes sense to applygeneralization universally in the data model if at all. In our implementation, however, only typehierarchies of relations are not allowed: these do not seem particularly useful to applications. Sinceuser-defined datatypes are not currently permitted, there is no need for a subtype mechanism forthem, either. However a subtype mechanism for datatypes does satisfy our simplicity and utility!dfptFptptptGp _r' [:p!: Y Y VH T= PG N)8 LX Z J#  FH<' D?% A7" ?:& =v, 9^ 7fQ 51W 2J 002 .-) ,_G (@sp &O7+ $D ! 9 19 }6) I=3g m` 9D  S !A  Z g<$ 3!sp# , >^$DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL68criteria when user-defined datatypes are defined. The hierarchy of relation types is difficult toimplement when multiple supertypes are allowed and attributes of relationships are stored in aphysically contiguous memory, because the multiple supertypes may define overlappingincompatible fields. Furthermore, the hierarchy of relation types seems to be of less utility.The effect of subdomains could be achieved by allowing relation attributes to accept one of a list oftypes instead of by using the predefined SubType relation to define a lattice of types, specifying the[one] least common domain for each attribute. These two alternative representations do not differmuch in flexibility of the type mechanism. The former might be better if we found it necessary inthe latter scheme to define a large number of "artificial" intermediate types in order to achieve thetype constraints we desire. However this has not been the case in our experience. The single typefor attributes might be simpler to understand, but the argument is not a convincing one. Thereforethe choice we made, to use the Subtype relation, was arbitrary.5.9 Access primitivesMost database management systems (INGRES and System R, for example) provide an interface totuples (relationships or entities) at a coarser grain than ours. They do not allow tuple-valuedvariables in a client program, on which operations such as GetF and SetF can be invoked to extractor assign particular values. Instead tuples are explicitly copied from a database into buffersallocated by the client, and explicitly copied back on an update. The difference between ourapproach and theirs is subtle, but it may have a major impact on the design of an application.Because the "handles" for the tuples in our scheme are garbage-collected, the client need not keeptrack of the number, logical source, allocation, or size of variables referencing database objects. One additional complexity is introduced by the granularity of database access in Cypress.Specifically, the problem is not a result of our choice of "call by reference" over the "call by value"in other systems, but that all of the updates to a particular tuple are not necessarily encapsulated ina single database call. In examining the interface to the implementation described in Section 3.4,the reader will note that relationship attributes can either be assigned by calling DeclareRelshipwith a list of attribute values, or by successive calls to SetF. The former is preferable, as itsometimes allows the database system to more efficiently update the data. In addition, the lattercomplicates the implementation and/or the rules for database system use, for example in the use ofthe surrogate relation optimization we discuss in Section 6. For now, however, we have retainedSetF in the implementation as a convenience to the client. In addition to the call by reference/value distinction, INGRES and System R differ from our designin providing a higher level of access to database clients: a query language. Of course, we plan theaddition of a Query level to Cypress (or multiple different query levels). The difference remains,vfptF ptpt ptptpXtFpt Mp _L \\ Ze2f" Xx_ T X RhC# P4b MC Ke IS GbO E-? >r :p3( 8I 6 /upup 4^\'DATA MODEL DESIGN ISSUES69however, that we are willing to have clients access the database at the lower "navigational" level aswell. Our introduction of entities into the Relational model makes a navigational interfacecompatible with concurrent use at a higher level. The introduction of entities also makes possiblequery languages of a different style, e.g. more like predicate calculus, or using a "chain" of domainswith relations as connectors.5.10 Views, segments, and augmentsThere is little question that views are quite useful and also relatively easy to understand, and thussatisfy our utility and simplicity criterion. A more difficult question in the design of views is thepower of the view definition language: should an arbitrary programming language be allowed, or amore restricted specialized query language? We have essentially avoided this issue, by separatingthe definition of the query language from the definition of the data model itself. A particularchoice will have to be made in the second phase of development of the Model Level of Cypress.In addition to the utility of relational views as discussed in the literature, e.g. for data independence,they can be used to encode operations on entities in an object-oriented form. The operations ofhiring or firing an employee or sending a message can be invoked by storing a tuple into a relationwhose attributes are the arguments of this function. For example, if the operation of sending amessage takes two arguments, the message and the recipient, the actual operation would be invokedby storing a tuple in the Send view (relation) with the appropriate message entity and recipiententity as attributes. In general, we have taken a reasonably conservative philosophy in the design of the Cypress datamodel, integrating ideas that have separately been studied theoretically if not implemented instudies in the literature. This philosophy increases the probability of a practical implementation ofthe model. The introduction of augments is an area where we have violated the conservativephilosophy, however, because the functionality of augments seems so important to applications. Augments and segments have roots in the segments of System R (Astrahan et al [1976]). Oursegments are slightly more powerful than theirs, as we allow references across segment boundaries;augments are of course more powerful still. System R segments cannot be used for encapsulatingupdates to data or to combine private and public data. Relations cannot cross System R segmentboundaries.The use of augments to represent versions of data makes them similar to that of the "layers" ofGoldstein and Bobrow [1980] and the "hypothetical databases" of Stonebraker [1980]. None of thesemechanisms have been implemented on a database system scale at the time of this writing. Thesemantics of layers, hypothetical databases, and augments differ slightly at the conceptual level.Augments have the additional feature that they have semantics at a physical level: they can be used!dfptFptptptGp _b \B Z*9 XxR VD Or" Lp=( I_ G10 Et4. C@R A ] =/M :N 8c 69' 4^P 2)R / ,&: )[ 'T %|? #G^G? ?I  <& M M m e,3 1\ O P 'sp L?Q\DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL70for separating data that is physically independent in the face of software and hardware failures, andthe data may be distributed over machines.Transactions provide a mechanism to encapsulate updates to a database. A natural approach toproviding the capabilities of augments in a database system might be to transform transactions intopermanent, full-fledged objects. Perhaps in terms of the simplicity criterion, the elaboration oftransactions has some merit. However, the implementation of long-term and short-termencapsulation of data updates must be quite different and the system would at least require somehint as to which physical model to use for a particular update. We chose in the Cypress data modelto make the permanent versus temporary distinction show through to the client, because there seemsto be little or no utility in avoiding the distinction. The applications we envision requiretransactions and/or augments, but not interchangeably.We currently chose to implement segments rather than augments. A segment allows new data(entities or relationships) to be inserted, possibly referencing entities in other segments. However,data may not be deleted or modified in a different segment than it resides. Furthermore,applications must simulate the effect of relations crossing segment boundaries where desired (usingthe "namesakes" correspondence). Segments handle many of the applications we envision,combining orthogonal databases from different applications. A few applications require augments,however (e.g. a database of different versions of systems). Others are simplified by augments (e.g.,several databases of people with overlapping information).Augments add little or no conceptual complexity: they simply make an entity with a given nameunique throughout instead of existing on a segment basis. Subtractive augments also have someutility even where they currently seem unnecessary: e.g., we might want to make personalmodifications rather than simply additions to a public database. We conclude that both the utilityand simplicity criteria are satisfied by augments. The implementation of augments is considerablymore complex than segments, so was deferred. 5.11 SummaryIn this section we have explained the rationale for our choice of data model. The criteria for ourselection are simplicity and utility, viewed from the perspective of existing or envisioned databaseapplications. Surprisingly, the choice of data model features in Cypress has been reasonably clearcut. Examining the needs of actual applications rather than postulating potential uses has greatlysimplified the choice of model.We started with the relational data model, adding the concept of entities. Entities insure referentialintegrity: the constraint that relations be between objects of specified types. We introduce entitiesvfptF ptpt ptptpXtFpt Mp _J \* YG VD T!A Rh4*5+ P4=# M?$ KN I*3 Gb6 C> ARL ?2' <(; : 1 & 8;& 6K.7 4: 0;S .^ +K )R 'i8s p %5,){ r p30 M R e7sp 1 UJ P b M >Q\2DATA MODEL DESIGN ISSUES71in a way different from most other data models. Some models treat entities as relationships: thatis, they allow entities to have fields just like relationships. We rejected this approach as it providestwo redundant representations of information and thereby violates the simplicity criterion as wedefine it. Some models treat relationships as entities; that is, they allow references to relationshipsfrom other relationships. We rejected this approach because it results in a network "spaghetti" towhich a high-level query language cannot be applied.We augmented the basic entity-relationship model with features of high utility and little additionalcomplexity: relational keys, unique entity names, segments, and a hierarchy of types. We alsoprovide a limited mechanism to represent existence dependencies between data: relationships aredeleted when an entity they reference is deleted, but relationships and entities in separate segmentsare independent. The basic features of Cypress are present in one or more other data models,although this is probably the first model combining all of these ideas.Other data model features appear to have high utility, but more work is needed to provide a usefulimplementation of such features. Augments fall in this category. More complex (multi-part, multi-type) entity names would also be useful, but we know no sufficiently general and efficientimplementation. Cypress has a limited mechanism for representing lists and sets using relationshipsto represent membership, but a better mechanism would be desirable.The Query level of Cypress will provide relational views and a higher level access language. A fairamount of research work already exists in these areas, and Cypress provides the same basis as otherdatabase systems for query and view construction. The Query level has been deferred for now,however.!dfptFptptptGp _] \@) ZC XxF" VD` T4 P45/ M+4 KN I Y GbB E-G AR?# ?c <= :14 8C 4J 2p6- 0;F .d -=:K736. Data model implementation issuesIn this section we describe important aspects of our implementation of the Cypress data modeldefined in the previous sections. In the first subsection, we give an overview of the Cypress designand the storage primitives with which we are dealing. Then we discuss a number of specificoptimizations we have found useful in the implementation. In the last subsections, we describe theimplementation of each of the major operations in more detail and summarize our results.The Cypress DBMS runs on personal computers, in the Cedar programming environment developedin the Computer Science Laboratory. It is written in the Cedar language, a descendent of Mesa(Mitchell et al [1979]) with changes aimed at making Cedar particularly convenient for developingprototype systems.Cypress is built in four levels: the Model level, which implements the data model; the Storage level,which provides basic storage allocation and access structures; the Segment level, which manages thedatabase system's virtual memory; and the File level, which provides transaction-based file access.The File level can be one of a variety of file systems that provide page-level transaction-based accessto disk files. We currently use the Pilot system for files on the database client's machine, and theAlpine file server for remote files. In the latter case, the Model, Storage, and Segment levels run onthe user's personal machine, and the File level on an Alpine file server.The Segment level may communicate with the File level by local or remote procedure call. The Filelevel provides procedures to:1.open, close, and abort transactions,2.open disk files under transactions, and3.read and write file pages.The Segment level provides an LRU cache of database pages from the various segments, reducingnetwork traffic when the File level resides on a different machine. The Segment level implementsthe segment abstraction, including a mapping from 32-bit database addresses to underlying filepages. The lower levels of Cypress (the Segment and File levels) will not be of central concern to us for thepurposes of this report. In the next subsection, we describe the Storage level interface, and in theremainder of Section 6 we discuss the Model level implementation.Gfp ^q$ Zp4) XC" VT TV"A R"X NF/, L*4 I-4 G CS Ac ?dD ;N 9Te 7 \ 4I 1M .U*$U'#'U#G k> 7I ,2  [ e A B>QWDESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL746.1 Storage level structuresThe underlying Storage level provides the physical data structures used by the Model level:recordtypes, group lists, and indices. There is little that is particularly new in the Storage level, ituses reasonably well-understood mechanisms (the design is much like the RSS level of System R,Astrahan et al [1976]).Recordtypes are the basic storage mechanism for data. A recordtype is a set of records with fieldsthat can be (1) fixed length blocks of words, (2) variable-length byte strings, or (3) references toother records. Recordtypes are manipulated using recordtype and field handles that describe whatthe Storage level needs to know to perform an operation on a recordtype or one of its fields,respectively. (Specifically, we need to know the byte length of records in a recordtype, and the byteposition and size for its fields.)Both entities and relationships are represented as records. Entities are represented as records whosefirst field is the name. Relationships are normally represented as records whose fields are theattributes of the relation. Each domain or relation maps to a particular recordtype.Every record has a unique 32-bit tuple identifier or TID, which uniquely defines it. The CypressTID for a record contains its segment number, the page number where it is stored in that segment,and its record number on that page. TIDs can be used to find a record relatively efficiently,normally in one page access. The record number on a page is mapped to a record through an arrayat the beginning of the page. A record could be moved to another page without changing allreferences to its TID, simply by entering a forward reference in this array, to its new location on anoverflow page. We do not currently use this feature, however. Groups are doubly-linked lists maintained through records, also known as parent-child sets (SystemR). A "parent" record resides at the head of the group. The other records in the list contain threepointers:next: the TID of the next record in the groupprev: the TID of the previous record in the groupparent: the TID of the parent recordGroups are declared by defining a group field of a recordtype. Every record referenced by one ofthese group fields is the parent of a group list containing all records which reference that record inthat group field.The Model level currently uses groups as follows. If an attribute a is an entity-valued attributedeclared with link=Linked (see DeclareAttribute), it is defined as a group field at the Storage level.Thus, a group list is maintained for each entity e referenced by attribute a of one or more records invfprF prpr prprpXrFpr Mp _s [:p7$ YtptpC V^ T Pt p-+ N Y LX+t ptp J#N Gf E" A Y ? U =vU 95tp 7fP 51I 2O 0D .33 ,_r-p5  (tpCt p &O_ $ ?- 1$  t p4 W  %up u pup' M0upup M ?Q] DATA MODEL IMPLEMENTATION ISSUES75a's relation. Those records (the relationships which reference e) are linked together, with the entitye at the head of the list. For example, an entity of type "person" might be at the head of a numberof group lists, one for the first attribute of the "author" relation, two for the first and secondattributes of the "friend" relation, one for the second attribute of the organization-member relation,and so on. We can quickly find all the author relationships referencing a particular person byfollowing the group list associated with the first author attribute. Group lists are updated by theSetF operation on entity-valued attributes, and are used by the RelationSubset operation to processa query when an attribute value list contains an entity-valued attribute.Indices are B-Trees with variable-length string keys indexing TID values. More than one index entrymay be present with the same key. Indices are maintained by the Model level for the names ofentities in a domain, using the TID to reference the record representing the entity. The B-Treeindices on names are updated by the ChangeName and DeclareEntity operations, and may be usedby the DomainSubset operation. Indices are also maintained as requested by the client forrelations; these are updated by the SetF operation, and are used in the implementation of theRelationSubset operation. We describe DomainSubset and RelationSubset in Section 6.8.6.2 Entities and relationshipsIt is worth noting that Cypress application programs reference database items via handles rather thanthe cursors provided by most database systems. A handle is returned by Cypress whenever anapplication program retrieves a relationship or entity from a database. Application programs mayhave any number of entity-valued or relationship-valued variables, each containing a handle thatuniquely identifies the database object it currently references. Each call to NextRelship, GetF, etc.,produces another such handle. In other database systems (e.g. INGRES or System R), references totuples are made through cursors, with a higher overhead than handles. Client programs may copywhole blocks of data from a client buffer into the database tuple that is at the current cursorposition, or vice versa.We do not claim that handles are any better than cursors; we emphasize the difference because iteffects the design of Cypress. Handles have the advantage of tighter coupling between theprogramming language and DBMS: database items simply appear as values in the programminglanguage. Handles have the disadvantage of requiring garbage collection to determine whichdatabase references are no longer in use. The client indicates this explicitly in a cursor-basedDBMS. As already discussed, the Model level represents both entities and relationships as records. Therecords may be of three internal types: virtual, surrogate, or stored. Virtual records are systemschema entities, e.g. DomainDomain or StringType. Stored records are schema or data records;MfprFprpXr FprGp _up.up& \[ Z.4 Xxf VD\ Td Qup+u p OI Ktp] IF Gb#= E-$u pu p Bu pG @$up >u pu pu p 89s 4^pMtp 2)tp9 /] -@ +Ku pup )W': '#H $2- " ] E uX A6&  )8  L C u pu p L>Q\.DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL76these are the Storage level records we described in Section 6.1. They may represent relations,attributes, domains, relationships, or entities. Surrogate records are used in conjunction with thesurrogate relation optimization, which we describe in Section 6.6.Note that a TID can uniquely define an entity, just as a [segment name, domain name, entity name]triple does. However, TIDs are used only inside Cypress and are never provided to clients. TheEntities (and Relships) returned to the client are pointers to records that contain the TID and otherinformation; the client may not access these data, and the database system keeps a pointer to thisrecord so that the record may later be modified.Keeping TIDs a "secret" in this way has a number of advantages. Note that we can find allreferences to records in running programs, as we keep a list of the record handles in use by clientsof the system. We can find all references to records within databases, as the group lists and indicescan quickly be traversed to find references to a record, and we do not store TIDs outside of groupsand indices. As a result, we can:1.invalidate references to a record when it is deleted,2.move a record and update all references to it (we do not yet do this), and3.re-use TIDs formerly belonging to deleted records.Among other things, these features allow us to get by with TIDs as short as 32 bits, as we need onlyas many TIDs as there are records in the largest possible database. (Actually slightly more TIDs,since there is internal fragmentation due to the fixed maximum number of records per page).We never use TIDs for references across segment boundaries, although they could in theory be usedfor such as TIDs contain a segment number. The lack of cross-segment references makes segmentsindependent, so that one segment can be destroyed or rebuilt without affecting others. It also avoidsthe needs for a common network database that maps segments to database address space.Destruction or inaccessibility of such a common database would make all segments inaccessible.6.3 ValuesThe choice was made to represent values as REF ANYs in Cedar. A REF ANY is a pointer to anobject whose type is determined at run-time, via a type code stored with the object. REF ANYshave some inconveniences and inefficiencies, particularly the overhead of using a pointer when it isnot required (e.g., with INTs) and the overhead of the run-time NARROW operation which coercesthe REF ANY to a particular type. However REF ANY seems the best solution since conveniencevfprF prpr prprpXrFpr Mp _0/ \C! Ztp0 VG T*tp, Rhupup6tp P4Z M0 J#3' G?% E-9 Cc AR"U=v5U9JU52 1(< /M -z[ )Y 'iE %5<* #7 Dtp As ep+upup  1L up =' up$up up up* > M L?Q\DATA MODEL IMPLEMENTATION ISSUES77and simplicity of the interface is important.Because only a few of the many datatypes made available to alpha clients of the original CedarDBMS were widely used, we implement only REF INT, REF BOOL, REF GreenwichMeanTime,ROPE, and Entity values. Some other types, e.g. pointers to Mesa records, are still used by internalprocedures but are not currently exercised by client programs. Note that in our system particular Relship or Entity variables all have the same Cedar type; i.e.,there is not a different programming language type for each domain or relation. Run-timeoperations are provided to fetch the domain or relation to which an entity or relationship belongs.In the associative programming language LEAP (Feldman and Rovner [1969]), this run-time typing is taken to theextreme of treating a relationship's relation not as its type, but as simply another field of the record: relationships arenot typed. If this approach were taken in our implementation, for example, RelationSubset would not need the first(relation) argument: the user could specify the desired relation as part of the AttributeValueList or omit it. The typelessrelationships would provide little or no additional utility in our system, however, and in fact may complicate the worldfor a typical application. The cost of complete typelessness does not seem to justify any utility it may have.The run-time typing of relationships, entities, and values means that a fair amount of the code inthe dozen Model level operations is concerned with error checking. If at the Query level weintroduce a compiler for database operations, we can save execution time by performing many ofthese type checks at compile-time. Fortunately the caching of schema information avoids databaseaccesses for most of this checking, so the current overhead is not overwhelming.The Storage level implements strings of arbitrary and variable length. The length hint supplied toDeclareAttribute is used to decide how much space to allocate in the record itself, but thecharacters are stored elsewhere if the field overflows. It is currently not efficient to store stringswith thousands of characters in Cypress if the strings are frequently updated, because the storageallocation function is not designed to deal efficiently with data chunks larger than a page or so.We are implementing a supplementary package to Cypress which provides efficient storage of theselarge text strings, such as the body of a document or program. The actual text is stored inconventional files; the file, position, and size of a text string is stored in the database where it isreferenced. The supplementary package also implements a log of all database updates so that adatabase client may replay or back up database actions, independent of the database transactionmechanism. 6.4 Caching the data schemaIn order to improve the performance of the database system, we cache information about the client'sdata schema in a quickly-accessible form. On the first reference to an attribute, for example, thesystem saves the type of value the attribute demands, its relation, and its uniqueness. On subsequentMfprFprpXr FprGp _- [:O Yu) VpM T>3g P#upup N; LX$? JGr^ HYj Fk*I D}@= BB6 @~o Q])DATA MODEL IMPLEMENTATION ISSUES79groups provide an efficient way to find all references to an entity via a particular attribute. Thereare three other possible values for the link hint: Unlinked, Colocated, and Remote.Colocated is the same as Linked, but an attempt is made to store the relationship which referencesan entity on the same file page as the entity that it references. It provides even faster access to anentity from a relationship or vice versa. Only one attribute of a relation can be declared colocated.Relationships which reference an entity in this attribute are colocated with the entity and with eachother.Colocation must be performed by DeclareRelship when it creates a new relationship for which avalue has been specified for the colocated attribute. (SetP calls DeclareRelship, so it achieves thesame result; but SetF causes no colocation, as the record has already been allocated.) WhenDeclareRelship is called to create a relationship with a colocated attribute a with value e:1.DeclareRelship attempts to store the new relationship on the same page where e is stored;2.if that page is full, DeclareRelship attempts to store the new relationship on the same pageas the most recently created relationship referencing e in a.3.if that page is also full, or there are no other relationships referencing e in a, then therelationship is stored anywhere.One can imagine a number of variations on this algorithm; we have not experimented with them atthe time of this writing. One might profitably reserve pages for relationships referencing an entityon the basis of an estimate of the space that will be required.NOTE: colocation is not yet implemented at the time of this writing. Relationships are simply placed on pages in theorder they are created. We currently achieve some approximation to colocation by dumping a segment in textual form,with all relationships that reference an entity grouped together, and then reconstructing the segment from the dump file. The string name of referenced entities are stored in Unlinked attributes, rather than using a grouplist. Thus, groups cannot be used to find an entity from an unlinked reference to it, nor vice versa.When an attribute is declared Unlinked, the database client typically declares an index on theattribute with DeclareIndex, thereby making access to a relationship reasonably fast (logarithmic inthe size of the relation instead of a single page access, but rarely more than a few page accesses). Inother cases, an index is better than groups. Indices could use less space if entity names are short:groups require three 32-bit TIDs per record. If TIDs are simply being compared for equality, e.g.for a relational join, the TIDs may be compared directly in B-tree pages rather than fetching thepages containing the records themselves.Note that when GetF is performed on an unlinked attribute, a B-Tree lookup must also beMfprFprpXr FprGp _f \(upupup Yupup8 VR T"D Rhe P4 LXu p* J#7upu p Gup#up Eup>up upUAu p?upU>u p8;6upuU7pKupup5 1W /U -z? )rpr0? 'N& %|^ !p5up kf 7up7 u p/ T ?& e-5 1#> ( up( d ?Q\23DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL80performed, to map the entity name onto the entity record that is returned. If GetFS is used, thislookup is bypassed.Remote indicates that an attribute may reference entities in other segments (in a different segmentthan the relation). Remote attributes are currently treated much like Unlinked: the name is storedinstead of a physical link (remember we do not want links, which are based on TIDs, to crosssegment boundaries). Unlike Unlinked attributes, the segment and domain name must be stored inthe reference as well as the entity name. The Remote attribute link hint is not technically necessary,as the Model level could determine how to store an entity-valued attribute "on the fly." It greatlysimplifies the implementation, however.If an index is not declared on an unlinked or remote attribute, the RelationSubset operation willtake time linear in the size of the relation to process a query whose attribute value list asks for aparticular entity value for the attribute. An unlinked and unindexed entity-valued attribute wouldbe useful only if the relation is known to be of a small fixed size, or queries are never performed onthe attribute.A time comparison of three simple cases of entity-valued attributes follows: an attribute declaredlinked but not indexed, an attribute declared indexed but not linked, and an attribute that is neitherlinked nor indexed. We show the times for the three common operations of fetching the attribute,setting the attribute, and doing a DeclareRelship operation with the OldOnly option to find therelationship with a particular value for the attribute (this is equivalent to a RelationSubset andNextRelship). These and all other measurements were performed on the Xerox Dorado computer; see Section 8.2 fordetails. Operationlinkedindexedunlinked, unindexedGetF1 ms4 ms1 ms + N*.5 ms, N = domain sizeSetF7 ms40 ms4 msFetchRelship4 ms32 msabout N*.5 ms, N=domain size These measurements were performed on relationships whose pages were already in the Segmentlevel's cache, thus the "linked" column is near the ideal time one would obtain if records werecolocated. The difference between linked and colocated times depends on a particular application'sdata schema, of course. A page read and write both require approximately 60 ms. Thus GetF andSetF could take as much as 60 ms in the worst case. For typical applications, the payoff fromcolocation is much smaller than that, but still seems to produce more than a factor of two speedupin both GetF and SetF.If the surrogate relation optimization is applicable, the basic operations are still faster. GetF on alinked entity-valued attribute, for example, averages about 0.3 ms. This compares to 1 ms in thevfprF prpr prprpXrFpr Mp _:up \ YupN Vup,up T$8 Rhup: P4up2 MY K' Gtp2u p EN C;( AR*< ? ;Ac 9 L 6V 4u pup 2pMu p 0;u pr\ .p*+s#,&Oup#,$up#,!u p#,  R L  X m$3up 9up#7 V upup ^up ] M y>Q]ODATA MODEL IMPLEMENTATION ISSUES81above table. Note that the introduction of entities to the relational model allows the DBMS to make someintelligent guesses as to where to use colocation and links. By default, an attempt is made to linkand colocate relationships with the entities they reference. This may be overridden by the client, ofcourse.Our current database applications most frequently retrieve together all or most of the relationships,from a variety of relations, which reference a particular entity. Most implementations of theRelational model store the records of a relation contiguously. The relational arrangement results inmany more page reads than ours for the entity-centric access we find most common.6.6 Surrogate relationsThe surrogate relation optimization is used when DeclareAttribute is called to define the firstattribute of a relation, and: (1) the attribute is entity-valued, (2) it is a primary key of its relation,(3) its type is a domain with no sub-types, and (4) that domain does not yet have any entitiesdefined in it. The relation is then specially tagged as being a surrogate relation, and the 2ndthrough Nth attributes will be added to the domain's recordtype, i.e. the data will be stored in theentities rather than separate records. The relation itself will not exist as a recordtype, and anyrelationships in the relation will be represented as surrogate records. Thus the records that represententities may have fields in addition to the entity name, containing surrogate relation attributes.The data schema must record the relations for which the surrogate relation optimization has beenperformed, for internal use. This information can also be useful to select clients such as the dataschema update tool. When compared to colocation (i.e. first attribute declared Colocated), the surrogate relationoptimization is an improvement in two ways. First, a surrogate relationship uses less space (noadditional record and link overhead); this may allow information to be stored on the same page.The overhead of an additional page access (60 ms) can make a considerable difference in anoperation that takes less than 1 ms (GetF). Also, accessing the field directly rather than followingthe link to another record is somewhat (currently about .4 ms) faster even when the other record ison the same page.The effect of the surrogate relation optimization on Cypress performance is of course dependentupon its applicability to a particular application's data schema. In general, we find that the surrogateoptimization can be performed on about half the relations in the schema. For the electronic maildatabase application, construction of database entries runs about 30% faster with the surrogaterelation optimization enabled (instead of just colocation). Other operations, e.g. displaying aMfprFprpXr FprGp _  [:8# Y13 VB$ T P14 N"< LX>' J#Q Cs ?p& up =L ;H 9TJ 7tp0 4@# 2T 0b ,7) *rA# (=? $a;up "-!? F < %up% [J ' K U E$ A %: yB  2?Q]DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL82message, have equal or better performance improvement.We could allow database clients to specifically indicate when the surrogate relation optimizationshould be performed, as we do with links. The optimization appears desirable whenever possible,however, so this unnecessary. Note the restrictions mentioned earlier on the application of the surrogate relation. The restrictionthat a domain have no subtypes is made to simplify the problem of assigning field positions in therecordtypes. The restriction that the domain be empty is made to avoid reconstructing or markingexisting entities which would not have the field. The latter restriction is similar to one on relations,that attributes may only be added to a relation while it is empty. The schema editing tool described inSection 7 makes such restrictions invisible by copying the domain or relation when necessary.6.7 Basic operationsIn this subsection, we discuss more details of the basic operations on entities and relationships.DeclareEntity and DeclareRelship simply create a record in the underlying domain or relationrecordtype, updating any associated index entries. The DeclareEntity operation uses theDomainSubset operation to determine whether an entity with the given name already exists.DeclareRelship similarly uses RelationSubset to determine whether the given relationship alreadyexists, unless version is NewOnly.A small complication is introduced by the interaction of the surrogate relation optimization and theuse of indices. The attributes of relationships of a surrogate relation are stored as "attributes" ofentities in the target domain. Index entries reference the entities. This would suggest that we mustcreate index entries when the entities are created. The semantics of null values can be used toimprove efficiency, here: we need make no entries until values are actually assigned. This shouldnot be done until surrogate relationships are created and their attributes assigned values. GetF and SetF must make special checks for the surrogate relation optimization; when thisoptimization has been performed, the storage level "field handles" for attributes of the relationactually reference fields of the domain, in which the relation's attributes are stored. With SetF, aparticular constraint has currently been added: the key attribute of a record must be assigned before others, so that theModel level knows which entity in the domain is being referenced. Note this would not be a problem if we disallowedSetF, requiring that whole relationships be created or destroyed as units.Type checking is necessary only for one operation, SetF. If the domain sub-type hierarchy is notused, i.e., an entity stored in an entity-valued field is precisely of the domain required by theattribute, the check is fast. Furthermore, if type-checking is thwarted altogether, i.e., the attribute isvfprF prpr prprpXrFpr Mp _6 [:Y YW V$ RU P6, N># LX[ J#*r& G] As =pb 9u pu p< 718u p 2 5xu p6 3Cu pup ' 1up -3<( *H (Y &tp - $a21 "-[E Q L L Wr .pr"' t J pup* Z Mh M ?Q]DATA MODEL IMPLEMENTATION ISSUES83of type AnyDomainType, the check is even faster: the system need only check that the base type(i.e., entity-valued, string-valued, etc.) is correct. In the case that a check on the attribute's domaintype fails, SetF recursively searches up the hierarchy from the actual entity type looking for theattribute's type. It fails upon finding no more super-types. (Searching upward is faster thansearching downward for randomly-selected examples in the domain type lattice in the largestapplication schema we have so far.) More efficient schemes could be implemented if necessary, e.g., pre-computing the super-domains and redundantly storing the list with each domain.SetF must update group lists when they are being maintained for entity-valued attributes, asdescribed earlier. Conversely, if links are not maintained for an entity-valued attribute, GetF mustperform an index lookup to map the stored entity name into the entity handle returned to the client(GetFS need not do this lookup, however, as it is only required to return the name). SetF usesRelationSubset to determine whether key constraints are violated by the new attribute value.The algorithm for DestroyRelship is relatively simple. The Storage level operation to destroy therecord automatically deletes the record itself and removes it from the linked list maintained throughall records of its recordtype. Variable-length strings which have overflowed the space allocated inthe record (the length argument to DeclareAttribute) are stored in separate pseudo-records thatmust also be de-allocated. Also, any index entries for the destroyed Relship must be deleted, andthe relationship must be removed from any group lists in which it participates.DestroyEntity is similar to DestroyRelship, but any relationships that reference the entity must alsobe destroyed. There are three special cases of relationships referencing an entity:1.Surrogate relationships stored in the entity record itself must be destroyed; the commentsabout deleting variable-length strings and updating groups and indices apply as inDestroyRelship.2.Relationships linked to the entity via a Linked attribute may be found and deleted quickly.3.Relationships referencing the entity via an Unlinked attribute must be found by aRelationSubset operation on attributes that can reference an entity of the kind we aredeleting. A group list of these unlinked attributes is maintained in the database for eachdomain, to make this operation more efficient.The Eq operation returns TRUE iff its arguments have identical TIDs. If its arguments are fromdifferent segments they will have different TIDs, of course, as the segment numbers are stored as thehigh order bits of the TIDs. The correspondence between segment numbers and atoms ismaintained to implement the SegmentOf operation.MfprFprpXr FprGp _u p7 \S Z uptp: Xx_ VDL T#r. QN Mup; K-tp+up IL Gbup0up E-u pN ARu p, ?F <7- :up up, 8 :up 6KO 2pu p up: 0;TU,_<*+R'u pU$)upU ?,up u p!'V. up upB S ]9 )up  & >QY)+DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL84The procedure GetAllRefAttributes is implemented using a special Storage level operation thatreturns all the groups in which an [entity] record participates. As unlinked attributes will not befound by this operation, they must be added to the list returned by GetAllRefAttributes, as well.6.8 Aggregate operationsIn this section, we discuss details of the DomainSubset and RelationSubset algorithms that havenot already been covered.The algorithm for DomainSubset is relatively simple. If a name or range of names is specified inthe call, the index for the domain is used; otherwise the linked list the Storage level providesthrough all the records of the domain recordtype is used. If it is requested that subdomains besearched, a DomainSubset operation is invoked on each in sequence. Whatever the method, theinformation about the enumeration is packaged up in the EntitySet returned so that NextEntity cansequence through the indices, lists, or domains as required.The algorithm for RelationSubset is somewhat more complex than DomainSubset. We are givenan AttributeValueList, composed of triples of the form [attribute, low, high] constraining the givenattribute to have a value greater or equal to low and less than or equal to high. The high value istypically omitted and is then assumed to equal low. The high value must be omitted for entity-valued attributes, since a range of values makes no sense in that case. RelationSubset must choosean efficient manner in which to find the relationships satisfying the list.The following strategies are used by RelationSubset, in this order:1.If the surrogate relation optimization has been performed on the relation, then (a) if a key-valued attribute is contrained to equal a particular entity in the attribute-value list passed toRelationSubset (the constraint list), we may return a single-element RelshipSet; (b)otherwise, we set a boolean in the RelshipSet (the one we will return) to indicate thatwhichever of the strategies 2, 3, and 4 are used below, NextRelship must convert theentities we obtain to surrogate relationships.2.If one of the attributes in the constraint list is entity-valued, then we use the group list forthe given entity value. That constraint is removed from the list, and the remaining reducedlist, is saved to be serially checked by NextRelship against records in the group list.3.If there is an index on one or more of the attributes in the constraint list, the indexcontaining the largest subset of the attributes is used to find the relationships satisfying thatsubset of the list. The remaining reduced list is checked by NextRelship, to filter therelationships enumerated from the index.vfprF prpr prprpXrFpr Mp _ up7 \@$ ZDup TVs Pzpu pu p NF Jjpu pC H6` F00 C u p/ A8upu p ?d< ;pu pu p 9Tup"u p 7.upupup 4upuptp 2< u p 0K ,p%u pU(pD&W $au pu p'u p"-#u p'u p.Up u p6>tp%u p#Up8u po?" ;/u p  'u M >Q]LDDATA MODEL IMPLEMENTATION ISSUES854.If all of the above strategies fail, the underlying storage-level operation to enumerate the listthrough all records of the relation's recordtype is invoked, and NextRelship serially filtersout the relationships satisfying the constraint list.The NextRelship operation enumerates the appropriate group, index, or recordtype as specified bythe RelshipSet that RelationSubset produced. As all three of these enumerations are bi-directionalat the Storage level, PrevRelship is a straightforward inverse of NextRelship. Note the slightcomplication in NextRelship and PrevRelship introduced by the surrogate relation optimization. Inthat case, the indices, group lists, or recordtypes can be scanned as usual, but refer to entities ratherthan the surrogate relationships which have been mapped to them. Surrogate relationships arecreated on-the-fly.The time required by the RelationSubset operation is of course dependent upon the path throughthe above algorithm taken, whether an index or group is scanned, and so forth. A typicalapplication averages about 7 ms for a variety of RelationSubset calls. Thanks to the schema cache,only about half of this time is spent examining the data schema to check the validity of the queryand determine which physical access path to use to process it.The time for NextRelship is typically 0.4 ms, and is nearly the same for a group, index, orrecordtype enumeration. Currently most of this time is for the Cedar allocation of the Relshiphandle itself! However, the NextRelship time will be much larger for a database applicationwithout a working set in the database page cache: the time to fetch pages will dominate.We will not discuss the MultiRelationSubset operation and Query level here, as they have not yetbeen constructed. One possible plan would be to (1) build a parser at the Query level to map textqueries into a tree representation, (2) make a major extension of the RelationSubset operation tooptimize access path selection over the entire query, (3) make a minor extension to RelshipSets(and NextRelship) to allow them to act upon nested relations (still producing a single relation at thetop level), and (4) build a Query-level formatter which prints the RelshipSet enumeration for theclient. 6.9 SummaryThe Cypress DBMS is built in four levels: the Model, Storage, Segment, and File levels; in thissection we have concentrated on the implementation of the Model level. We summarized theoptimizations made in that implementation:1.caching the data schema,2.use of B-tree indices upon entity names and relation attributes,MfprFprpXr FprGpU_3.\'u pZ%u pu Vpu p< Tu pu pA Rhu p!u p P4u pu p4 MB' K7& I Epu p! C%4 AR1u p ?B <> 9 p u p8 6Wu 4p u p4 2pY .purp" ,_P *+ 8u p 'Eu %pu pR #(u p !Yp s 'pO 5$ *UpU p@ 8 >Q]L-DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL863.use of "group" links through relationships referencing an entity4.co-locating relationships which reference the same entity on the same page, and5.storing relationships which reference an entity as surrogates in the entity itself.Caching the data schema is crucial to good performance; almost all operations must examine thedata schema to check the legality of the client's request or to decide how to perform the operation.The payoff can be as much as 90%.The payoff from the use of indices is variable, but they are crucial upon entity names and are oftenbeneficial upon relations. There is a trade-off between using indices and links on relations; indicescan be used on multiple attributes and use less space if the keys are short, but links are typically afactor of three faster.It is enlightening to observe the progressively better performance for the GetF operation on anentity-valued attribute that is (1) indexed, (2) linked, (3) colocated, or (4) surrogate. As just noted,links are typically a factor of three faster than an index, as at most one page (other than the page onwhich the relationship resides) is accessed. Colocation typically produces another factor of three orso, though this varies widely with database application. The surrogate relation optimization providesanother factor of two or so, by avoiding another tuple lookup (on the same page). The best time fora GetF is currently about .2 ms.In Section 5, we showed how the Cypress data model was designed to provide a higher-levelconceptual model, simplifying the development of database applications. In this section we haveshown that a higher-level data model can also be used to improve the performance of a databasesystem, by capitalizing on the additional knowledge of data semantics and interconnectivity. InSection 7 we will concentrate on the third advantage of a higher-level model: the more powerfultools and integration of database applications it enables.vfprF prpr prprpXrFpr MpU_@U[:pOUW^pS Sp^ QN(< O$p K>P I 8. Ff D @Jup >b <\16 :'f 79- 5E 3up / L -zO +ED )I &C $: M$a?QC877. Database environment and applications7.1 Database environmentAlong with the development of the Cypress DBMS, adding the Model level facilities to the formerCedar DBMS, we built a system named Squirrel to provide some general-purpose database tools andsome level of integration between database applications. Several applications of the database systemhave been developed. These applications include Walnut, a database of electronic messages, andHickory, a calendar/appointment database.Squirrel provides the end user with tools to perform a number of basic database operations. Thesetools include:1.A browsing window to examine database entities and relationships.2.An editing window to modify database entities and relationships.3.A simple query window to search databases.4.A facility to dump databases in textual form, and subsequently reload them. The externalform is human-readable text, and contains the database schema as well as the data.5.A facility for examining and modifying the database schema, automatically re-organizingexisting data for simple schema changes.These five basic functions may provide all a user requires for simple kinds of database applications.For example, for some personal databases (a wine database, and a database of phone numbers andaddresses) we have found the Squirrel tools useful in and of themselves. Squirrel is also used bydatabase application programmers to examine and modify a database in the process of testing anddebugging application programs.In addition to the facilities directly provided the end user, Squirrel provides facilities for applicationprograms. Squirrel's function in this regard is to encourage a consistent and integrated userinterface among the various applications. The goal is a confederation of database facilities asopposed to monolithic independent programs. Squirrel provides some convenient procedures forcommon database operations and Cedar Viewers operations, common conventions for the userinterface, and a mechanism for applications to communicate with one another when data isdisplayed and updated. Gfp ^q) X2r TVpB R"K O*; M[ K>) GbpQ D U@pAUZDESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL88The user interface paradigm we use provides one Cedar Viewer on the screen per database entity,an object-oriented menu of commands upon a database entity, and a consistent interpretation ofuser selections on the screen. The paradigm is based upon three basic types of applicationwindows: displayers, editors, and queryers. A displayer displays a database entity, provides a menuof commands upon it, and shows the information the database contains about the entity. Forexample, for an electronic message displayer, the message's header and body would be displayed,along with commands such as Answer, Forward, and Delete. An editor looks much like a displayer,but allows the user a form with which to modify the information about the entity, and usually has adifferent set of commands. For example, a message editor is created in response to a NewForm orAnswer command in the Walnut system; the user fills in the fields and invokes the message editor'sSend or File command. A queryer also looks like a displayer and editor, but the user may fill inthe fields with values, boolean expressions of values, or other application-interpreted information;the queryer represents all entities in its domain which satisfy the constraints. When the userinvokes the Query command, the entities satisfying it are displayed and may be browsed or printed.Each database application program, such as Walnut or Hickory, registers itself with Squirrel bypassing procedures to be called when an event of interest to the application occurs. The applicationmay register a displayer, editor, or queryer procedure to be called when the user requests one for aparticular domain. Walnut registers such for the message domain, for example. An application mayalso register itself to be called when certain database relations are updated, so that it may update itsdisplay.Figure 7-1 illustrates a variety of types of displayers. Note how each application takes bestadvantage of the display for the particular type of entity involved. Figure 7-1a is a messagedisplayer, implemented by Walnut. Figure 7-1b shows a month displayer, implemented by Hickory.Figure 7-1c shows a picture displayer, implemented by a simple image application. Figure 7-1dshows a relation displayer, implemented by a data schema display package. Figure 7-1e shows a"whiteboard" displayer, used for spacially organizing and accessing entities. Figure 7-1f shows a"default" displayer which is invoked when no particular application deals with the given domain (inthis case, the Organization domain). Squirrel implements the default displayers, whiteboarddisplayers, and schema displayers.Figure 7-2 contrasts a displayer, editor, and queryer for messages.Figure 7-3 illustrates one way in which the collection of database applications may appear as anintegrated collection of facilities to the user, showing successive windows on the screen as the userbrowses from a "Whiteboard" display, to a "Message Set" on that whiteboard, to a "Message" inthe set, to the sender of the message. The user crosses application boundaries in going from onekind of entity to another, but the action to browse (pointing with the mouse and pushing its centerbutton) is the same throughout. vfpsF psps pspspXsFps Mp _C \? Z< Xx t ptptptp# VD"9 T-2 Q*tp OI MrS K>F I tpA F.6 D*5 BlH >>tp <\Y :'$@ 7^ 571 3 /> -zT +E)6 )R &J $Z "s$? ?%7  " /C S+5 N )4 +6 "A M$ M >]>DATABASE ENVIRONMENT AND APPLICATIONS89Squirrel, Walnut, and Hickory will be more fully described in a future report. To put the databasework in context, however, we will summarize Squirrel in Section 7.2 and the applications in Section7.3. 7.2 Database toolsA Squirrel window on the screen provides the user with the basic database functions, such ascommiting or aborting a transaction, dumping or loading a database, or erasing portions of adatabase. The Squirrel window also allows the user to explicitly invoke a displayer, editor, orqueryer on a particular domain or entity. If no application has registered itself for the givendomain, Squirrel invokes its default application-independent displayer, editor, or queryer when theuser requests one.The default entity displayer shows the domain and name of the entity displayed at the top of aCedar Viewer window, and all the relationships in the database which reference that entity areshown as the main body of the window. An example of such a window is in Figure 7-3d. Therelationships are diplayed in the formrelation attribute1: value1 attribute2: value2 ... attributeN: valueNwhere relation is the relationship's relation and the attribute: value pairs specify the values of itsattributes. The attribute which references the displayer's entity is not shown, as it is normallyredundant (all of the relationships in the displayer for an entity have at least one attribute whichreference the entity or they would not be displayed). The other attributes are displayed in theobvious way for string, integer, and boolean values. For entity values, the name of the entityreferenced is displayed. In addition, for entity values, the user may select the entity with the center(yellow) button on the mouse, causing a displayer to be created on the screen for the selected entity.This selection with the yellow-button provides the basis for browsing, and is supported by Squirreldisplayers as well as application displayers.The default editor window also shows the domain and name of the entity being edited at the top ofthe window. However it differs from the default displayer in that an editable form is provided inthe body of the window for the entity, showing not only those relationships which already referencethe entity, but "blank" relationships for relations which could potentially reference an entity of itsdomain. A "blank" relationship appears as a relationship in a displayer, but the values are given asblank fields that can be filled in by the user by selecting the blank with the red mouse button andtyping. If the editor window is on a new entity (one that previously did not exist in the domain),then all the relationships shown will be blank ones. Special combinations of mouse buttons invokerfpsFpsps Gp _?$ \4/ Z R"r NFp= L9# IU GK EtG C@ ?dQ =/F :4& 8&4uG 1pup(u p .V ,&> *r:& (=$; & 17 #U !.5 k- Z [Otp '=& U Q c U5. %= >\2 DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL90special commands on the blank relationships: creating a new blank relationship for a particularrelation, explanding the blank relationship to show the types expected of the fields, or specifyingwhether the system should automatically create an entity when a field value is filled in with anentity name that does not exist in the appropriate domain. Menu buttons at the top of the editorcan be used to change the name of an entity or merge it with another entity.The default queryer window has a form similar to an editor window, except that only a domainname is shown at the top of the window. The queryer window, unlike an editor window, representsany number of entities in a domain which satisfy a query. The query is specified by filling in theform fields with expressions that specify the desired values of the fields. The expression mayinclude boolean ORs (written as "|"), and also ranges of values (written using "~", e.g. "5~11").After the form has been filled in and the user uses the window's Query command, the queryeropens a new window on the screen displaying the names of the entities which satisfy the query.The client may scroll through the entities, or select one of them with the yellow mouse button toopen a displayer on it. Note that except for the form in which the answer is examined, this queryerwindow is similar to the Query-by-Example system of Zloof [1975].In addition to displayers and editors on ordinary data items, Squirrel implements special displayersand editors for domains and for relations. The displayer for a domain shows the attributes in thedata schema which can reference an entity from that domain, the sub-types and super-types of thatdomain, and the entities of the domain sorted by name. The user can scroll or yellow-select theentities. The editor for a domain shows only the sub-types and super-types, and allows the user tochange these. The editor will automatically copy entities and rename domains if necessary toachieve the effect of a requested change (because the underlying database system does not allowdefining sub-types of a domain that already contains entities). The displayer for a relation shows atable whose columns are labelled with the names, types, and uniqueness constraints of the relation'sattributes. An example of a relation displayer is shown in Figure 7-1d. The rows of the table showall of the relationships in the relation. Again, the user may scroll through the relationships, and mayyellow-select attribute values that are entities to open a displayer window. The editor for a relationshows only the attribute names, types, and uniquenesses, and allows the user to change these. Theuser may also delete attributes of the relation or create new ones; the relation editor willautomatically copy all the relationships into a new relation with the same name, since the underlyingdatabase system does not allow changing attributes of a relation after relationships exist.Squirrel and application programs support a common interpretation of mouse buttons for the user.This common interpretation is as follows:vfpsF psps pspspXsFps Mp _+up ( \.5 ZN XxB VDL RhB P4U MG KC I%< Gb%6 E-I BD @9+ >A :13 8S 6KP 4K 1 U /] -z T +Ee )R &8, $P "sI ?^  ": J [ R ) \ MJ?QTDATABASE ENVIRONMENT AND APPLICATIONS91ButtonInterpretation YellowOpen a new displayer on the entity at the current positionRedSelect the entity or datum value at the current cursor positionBlueExtend a selection made by red to the current positionCtrl-RedDelete the relationship at the cursor positionCtrl-BlueExtends the action of Ctrl-Red over a number of relationshipsShift-RedCopy the entity or datum value at the cursor into the current input focusShift-YellowExpand the selected entity in place (show more information)Shift-BlueExtend selection with Shift-RedOthersCurrently application-definedThe yellow button was dedicated to the "show me" interpretation because this was deemed the mostcommon operation upon a selection, to open up a new window on an entity. The user does notalways want to open a new window, however; this tends to proliferate windows on the screen. TheSquirrel windows and database applications currently follow the following convention for openingnew windows when a user yellow-selects an entity: 1.If the user has yellow-selected an entity in this window before, and the new displayerwindow thus created is still on the screen, we re-use that window, replacing the entity thatused to be displayed in that window with the selected one.2.Otherwise, we create a new entity displayer on the screen for the selected entity, in non-iconic form on the left-hand side of the user's display screen.The interpretation of the mouse buttons was chosen to be as compatible as possible with existinginterpretations in the Cedar Viewers environment. There is in fact some overlap. For example, onecan select an entity by selecting the entire window (or icon, if in iconic form) for its displayer. Itcan then be used by a command, e.g. to add it to a set of entities in another window.All of the operations afforded the user by Squirrel are also available to client programs, through aCedar Mesa interface called "Nut." The Nut interace is also the one through which applicationprograms register themselves with Squirrel, through the Register operation:Register: PROC[ domain: ROPE, display: DisplayProc_ NIL, -- to make displayer for entity in domain edit: EditProc_ NIL, -- to make editor for new or old entity in domain query: QueryProc_ NIL, -- to make queryer on domain create: CreateProc_ NIL, -- to create viewer for above (defaults to container) update: UpdateProc_ NIL, -- to call when database is updatedrfpsFpsps Gp `_r` p `[:`: `Y`? `V`6 `T`. `Rh`= `P4`I `M `; `K ` `I` E00 CD ARtp+ ?X <0u,up 9 H 6/tp'4: 0 M.? *K (c &OJ $U ?P  7' 8up uF]I)5 0 > y>Q]DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL92 ];7.3 Database applicationsOur first application of the Cypress DBMS was Walnut, a collection of facilities for sending,receiving, filing, and querying electronic messages. Walnut provides four kinds of Cedar Viewersvisible to the user:1.Control window: This viewer window is the one that appears when Walnut is started. Itprovides menu commands to fetch new mail, create a new message form (item 4 below)and create a new "message set" (item 2 below). The control window also reports whennew mail is available, or an error is recognized by the Walnut system.2.Message-set displayer: This window displays a list of messages, their senders, dates, andsubjects. There is a special "Active" message set into which new mail is normallyinserted. The viewer for this message set is automatically put on the screen, along withthe Walnut control window, when Walnut is started. The user may display many messagesets on the screen at the same time. Yellow-selecting one of the messages produces aninstance of item 3 below, a message displayer. A menu of commands at the top allowsremoving messages from the message set and/or adding them to other message sets.3.Message displayer: This window displays a message in the database. The menu ofcommands at the top includes Answer and Forward, that produce an appropriatelyinitialized instance of a message form (item 4 below), and Erase, which deletes themessage itself and removes it from all message sets. The user may yellow-select amessage field to see a database item named there. For example, selecting one of the"Categories" items causes the corresponding message sets to be displayed (whichenumerate messages on that subject or in that message set, which can in turn be selected).Selecting persons in the "To," "From," and "cc" fields causes the person to be displayed(phone number, office number, picture, or whatever the database happens to contain).4.Message editor: This window displays a message form which the user can edit. Some ofthe fields will be initialized according to whether the message editor window was createdby the "Answer," "Forward," or "New Form" commands. 5.Message queryer: This window also displays a message form which the user can edit. Itrepresents all the entities with the values the user specifies in the fields he fills in. Valueranges may be specified. For example, the user might ask for all the messages withsender "Cattell" from last week, containing the string "Cypress" somewhere in thevfpsF psps pspspXsFps Mp_u Vr RpT Pz<% NF Jjt upDH6CF;CF ?tp1==;B9T;78492P .tp&,@ *r.%(=9& 2"#2!Ek6"7T [tp1'M 35 tp*N 5 yM M 2>Q]DATABASE ENVIRONMENT AND APPLICATIONS93message body. When the user invokes the Get query command on the queryer, a newmessage set displayer appears on the screen containing the set of messages which satisfiedthe query. Although the messages in this message set are not actually stored in adatabase message set, the user may select and browse through messages in the windowjust as with any other message set. Walnut uses a special text logging package, for two purposes. The current Cypress implementationdoes not deal efficiently with large string-valued attributes that are frequently created and deleted(the storage allocator is oriented towards objects less than a page in size). The text logging packageallows Walnut to store the bodies of messages in a file, storing a pointer to the text body as arelationship connected to the message entity in the database. The text logging package is also usedto maintain a record of every database-updating operation performed by Walnut, e.g. receiving anew message or adding a message to a message set. The log may be replayed to reconstruct adatabase after a crash, or in the unlikely event that an entire segment is destroyed. The ability toreplay the log makes it unnecessary for Walnut to commit a database transaction after everyoperation. The text logging package used by Walnut is now being rewritten to incorporate it as anextension to the common database software, making it useful to all database applications.Another database application now in preliminary use is Hickory, a Calendar/Clock databaseapplication for recording appointments and reminding the user of them. Hickory implements"event," "event-set," "day," "month," and "year" entity displayers. Hickory makes use of thedatabase system hierarchy of domains to define a number of types of events, such as meetings,seminars, and trips. The user may enter particular events, or sets of events such as periodic events(e.g., a seminar that Hickory will automatically enter every Tuesday). When the client views aparticular day or month he can see the events scheduled for that day or month.A third application, "Whiteboards," has actually been integrated into Squirrel itself, because thefunctionality it provides is useful in conjunction with all applications. The Whiteboard packageprovides a mechanism to lay entities out in a tree-structured, spatial dimension. Consider threeways of moving around in a database by yellow-selection:1.Browsing in the network by selecting adjacent entities, e.g. selecting the sender of amessage to get a displayer on that person, or selecting some day in a month to get adisplayer on that day. This basic browsing functionality is provided by convention inSquirrel and all applications. Let's call this "ground" level browsing (Ground Squirrel).2.Browsing in the network by moving up and down a hierarchy of whiteboards whoseultimate leaf nodes are non-whiteboard entities. This adds a vertical dimension to (1),and is currently provided by the addition of Whiteboards to Squirrel. We might call this"tree" level browsing: although strictly the hierarchy could be a directed acyclic graph,rfpsFpsps Gp_J\ZZ9XxE VD#' RhB P423 MQ K S I,8 GbH E-> BH @.- >@" <\Y 86# 6K*0 47& 10- / X -zO +EN 'iN %57* #[ 8 $27H S< w<CEI -- T =\x#DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL94it's easier to understand as trees (Tree Squirrel).3.Browsing in the network by moving in a two dimensional continuum rather than thediscrete vertical "chunks" provided by (2), as in Herot [1981]. All the database entitieswould be laid out in a two-dimensional plane, probably automatically according to their"closeness" (= the number or importance of the relationships between them). Zoomingwould be continuous rather than discrete as in (2). Let's call this kind of browsing"flying": unlike climbing in a tree, there are no discrete horizontal or vertical levels(Flying Squirrel).Only the first two kinds of browsing have been implemented, although some experimentation withthe third kind might prove it valuable.In addition to the applications already described, several database applications are underconsideration for future development:1.Whitepages. A database of people, organizations, phone numbers, addresses, and theirphotos. Note there are both public and private versions of whitepages, and the userwould like to see both public and private information superimposed.2.Program database. A database of Cedar Mesa program modules and interfaces, in whichthe user could query the locations of definitions and uses of procedures, types, andvariables. This application would be interfaced with the program editor and compiler toautomatically maintain the consistency of the database.3.Super notebook. A database of one-sentence to one-page ideas, interconnected andindexed in a database of logical dependencies, bibliographic references, project notes, etc.Again, this could be both a private notebook or a shared notebook. 7.4 SummaryWe have described some tools and applications built upon the Cypress DBMS. Applications havebeen in regular use for over a year, and have guided as well as demonstrated the utility of theCypress data model and implementation. In the design of database applications, we have tried toachieve some uniformity so as to present the user with a confederation of information managementfunctions rather than independent programs.The reader may see how the data model has influenced Cypress applications and tools. Theintroduction of entities to the data model enabled the object-based user interface paradigm. ThevfpsF psps pspspXsFps Mp_3 [:@Y+/V-*TDRh3"P4 OM J#F G' D-2.( A% =/t pI:;8C 4tp>1</<-z7 (t p 6&P $aB=Sr  /pW $; .2 I ]+ G M\  M =]DATABASE ENVIRONMENT AND APPLICATIONS95information about the types of relationship attributes and their relation keys is used in the defaulteditor to check input and automatically create entities where required. The presence of entitynames and the distinction between entities and relationships greatly simplifies dumping andreloading databases in a linear and human-readable form.rfpsFpsps Gp _:+ \G Z.- Xx8~ V=^6DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL96vfpsF psps pspspXsFps MpF M^1i <%DATABASE ENVIRONMENT AND APPLICATIONS97rfpsFpsps Gp( ^q <DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL98vfpsF psps pspspXsFps MpF M^1i <%DATABASE ENVIRONMENT AND APPLICATIONS99rfpsFpsps Gp( ^q <DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL100vfpsF psps pspspXsFps MpF M^1i <%DATABASE ENVIRONMENT AND APPLICATIONS101rfpsFpsps Fp( ^q <1018. Status and conclusions8.1 SummaryThis document has described the Model level of the Cypress Database Management System,including the data model, the background and motivation for choosing that model, the clientinterface to the system, its implementation, and the database environment and applications for whichit was implemented. The guiding principles in the design and implementation have been simplicityand utility for the set of applications we envision.The result is a model that includes features of the relational model and distillations of desirablefeatures of more recent semantic models. The model includes the concept of entities with uniquenames, a hierarchy of types of entities, types and uniqueness constraints on relation attributes,relational views, and a logical segmenting mechanism that can be used to facilitate physicaldistribution and independence of databases. This report discusses a number of issues in theimplementation of these features, which are not present in any existing database system to theauthor's knowledge. The Cypress data model alleviates the problems motivating its development,reducing the quantity of data modelling mechanism built anew for each application, and simplifyingthe sharing of databases between applications. The Cypress model has also enabled the developmentof general-purpose tools formerly impractical due to the lack of type information and integritychecking in the Cedar database system.8.2 Some resultsSome overall statistics on the performance of the initial implementation may be helpful here. Wedeveloped two benchmark programs, one write-intensive and one read-intensive, to examineperformance. Average times for the most common operations are roughly: 1 ms: GetF 1 ms: NameOf 0.5 ms: NextEntity 0.5 ms: NextRelship 5 ms: DomainSubset 7 ms: RelationSubset10 ms: SetF10 ms: ChangeNameThese times are approximately in order of decreasing frequency of calls by the benchmarks, andFfp ^q Xxr TpE RhI P4L MN K4 GT EP CF ARE ?U <Z :L 8V 6KJ 41. 1& +r 'pW %|X #GGk 7 e1  > >Q\2DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL102include overhead at all levels of the database and file systems. The times were taken on the XeroxDorado, a personal super-computer with a micro-cycle time of about 60 ns. Note that since most ofthe Cypress operations are disk-limited, the times increase only somewhat for slower processors. Thetimes shown above vary widely with a particular application's data schema and access patterns, sothese numbers should be regarded as very rough averages. Some effects of particular optimizationsand schema changes were enumerated in Section 6. A significant result of our work is that the type checking required by the data model is not a largeoverhead in the Cypress implementation. Because we do not compile database accesses, we mustcheck in the implementation of every operation, e.g. SetF, that the arguments passed are of theproper and coordinated types. On a SetF, for example, we must check that: (1) the arguments are arelationship, attribute, and value, respectively; (2) the attribute is of the same relation as therelationship; (3) the value is of the same type as the attribute; and (4) that a key value constraintwould not be violated by the new value. The caching of information about attributes improves theperformance of the first three of these considerably. Without this caching, the GetF operation takesapproximately 8 times as long.A closer analysis of the time spent in a typical read operation, e.g. GetF, is enlightening. For ourbenchmark programs, the time breakdown was roughly as follows: 10% model level consistency checking and access path selection 20% storage level operation: actual read or update of data 50% waiting for disk operation (cache miss) 20% other overhead (page faults, garbage collection)These figures suggest that work on co-location strategies might be fruitful, since a fair portion of thetime is in disk operations. Note again, however, that statistics can vary widely with the particularapplication.8.3 Status and plansThe first implementation of the Model level was completed in December of 1981, and was exercisedand debugged through 1982. Approximately six man-months went into its development. Thisimplementation includes essentially all data model features except views and augments. Views havebeen deferred to the development of the Query level.Plans for the near future include development of more applications, continuing the work sketched inSection 7.vfpsF psps pspspXsFps Mp _A" \4. ZK XxB VD8* T2 P4Etp MF K S Ia Gb^ E-6/ BA @e > :#B 8>4?2p<0;,.5 *+S '(= % kr pH [U ';' 4 R   M>QYpSTATUS AND CONCLUSIONS103AcknowledgementsNori Suzuki and Mark Brown participated in the design and implementation of the original CedarDatabase System. Peter Deutsch and Jerry Popek assisted in the design phase. Eric Bier assisted inthe initial implementation of the Model level, as well as providing a useful sounding board for theseideas. Willie-Sue Haugeland co-implemented Walnut. Jim Donahue developed Hickory. Willie-SueHaugeland, John Maxwell, and Jim Donahue have all helped with the Squirrel system. Mark Brownhas maintained and elaborated the Cypress Storage level and was simultaneously the central designerand implementor of the Alpine file system which Cypress depends on for remote data storage. Samuel Feldman, Dennis McLeod, Jim Donahue, Mark Brown, John Maxwell, Butler Lampson,William Kent, Ken Keller, Dushan Badal, and Peter Deutsch provided useful feedback on drafts ofall or part of this document as it evolved over last year. Kathi Anderson and Subhana Menis helpedprepare this report for publication."fpsF ps Fp _r [:pX YS VU T8' Rh1- P4F MF J#F G_ E] C$ C?>Q$m1059. Annotated bibliographyAbrial, J. R. Data Semantics, in Klimbie, J. and Koffman, K., Database Management, North-Holland, 1974.Describes a data model based on binary relations. Relations that are N-ary even in functionally irreducible formmust be represented by the introduction of artificial entities to represent relationships. See also Bracchi et al [1976]and Senko [1976] for binary relation models.Armstrong, W. W. Dependency structures of data base relationships, Proceedings IFIP Congress,North-Holland, 1974.This was one of the first studies of functional dependencies and relational normalization after Codd [1970]. LeDoux& Parker [1982] summarize more recent work.Astrahan, M. M., et al. System R: Relational Approach to Database Management, ACMTransactions on Database Systems, 1, 2 (June 1976).This is one of the most widely known and successful implementations of the relational data model. Much of thedesign of the original Cedar DBMS was derived from "RSS level" of this system.Bachman, C. W. The Role Concept in Database Models, Proceedings 3rd International Conferenceon Very Large Databases, Tokyo, October 1977.Concerned with behavioral properites of database objects, the treatment of entities in multiple roles, and integratingdata models with programming languages. "Roles" for entities are actually more powerful than the sub-typehierarchy, although they are somewhat more complex.Beeri, C., Bernstein, P. A., and Goodman, N. A Sophisticate's Introduction to DatabaseNormalization Theory, Proceedings 4th International Conference on Very Large Data Bases, WestBerlin, West Germany, 1978.A discussion of relational normalization.Biller, H. On the notion of Irreducible Relations. Database Architecture, G. Bracchi and G.Nijssen, Eds. North-Holland, 1979.Proposes a semantic definition of the irreducible relational form described in this paper. Contrasts the definitionused in this paper with an earlier definition in terms of loss-less relational joins, showing our definition is morerestrictive.Bjorner, D. Formalization of Data Base Models. TR ID811, DCS, Tech. U. of Demark, Decemeber1978.The relational, hierarchical, and network models are sequentially formally defined. Both the data definition and datamodification/query are considered.Ffp ^q Z p?rp W U*sq S[ Q, Mp/rp J HsY F$+ Aup 6 r >!ptp <sF( :N 5p5r 3gp 0sj /hA) -z3 (pK &r;p $ !s) p5rp # dsK) o \ p r"p  sb " y=]DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL106Bobrow, D., and Goldstein, I. Representing Design Alternatives, Prooceedings of the Conference onArtificial Intelligence and the Simulation of Behavior, Amsterdam (July 1980). Also available as partof Xerox Palo Alto Research Center Report CSL-81-3.A description of the layer concept in the PIE system. Layers are similar in concept to the augments discussed inthis paper. However the basic data primitive is a Smalltalk object rather than relationships and entities, so the resultdoes not provide the power of a database system, e.g. a query language. An implementation of layers was built, andfeedback from that experience and the authors' applications built on it were invaluable in the implementation ofaugments.Bobrow, D. An Overview of KRL, a Knowledge Representation Language, Cognitive Science 1, 1,1977.Describes a knowledge representation language KRL, intended for artificial intelligence applications. Many of theideas in knowledge representation languages, most of which are present in KRL, are now being incorporated indatabase models (see Wong & Mylopoulos [1977]). With an implementation of these ideas such as Cypress, theresult is a system with much of the representational flexibility of knowledge representation languages without thedatabase system features they normally lack: performance, shared access, persistent storage, or large quantities of data.Bracchi, G., Paolini, P., and Pelagatti, G. Binary Logical Associations in Data Modelling,Proceedings IFIP TC-2 Working Conference on Modelling in Databa Base Management Systems,North-Holland, 1976.Another discussion of binary relation models, similar to Abrial et al [1974]. Presents a number of arguments in favorof binary over N-ary relations, although these arguments are in reference to 3rd normal form rather than theirreducible form described in the present paper.Brodie, M. L. Data types and databases, IFSM TR #37, University of Maryland, December 1978.Relates data types and data models. We do not directly include this idea in Cypress, although relational views canbe used to represent operations on object-oriented entity "data types", and the Squirrel interface is also orientedaround the entity types. The paper references Brodie's thesis on data types and other topics.Brodie, M. L., and Zilles, S. N., Eds. Proceedings of Pingree Park Workshop on Data Abstraction,Databases, and Conceptual Modelling, SIGMOD Record 11, 2 (February 1981).A collection of papers relating data modelling from the points of view of programming languages, artificialintelligence, and databases. The same concepts (e.g. hierarchies of types) have independently been studied in thesethree areas. The Pingree Park workshop provided a useful interchange of ideas. Brown, M., Cattell, R., Suzuki, N. The Cedar Database Management System: A PreliminaryReport, Proceedings ACM SIGMOD Conference 1981, Ann Arbor (April 1981).An overview of the Cedar DBMS as it existed prior to the introduction of the Cypress data model, Squirrel tools,and Pilot and Alpine file systems.Buneman, P., Frankel, R. FQL: A Functional Query Language, Proceedings of ACM SIGMODInternational Conference on the Management of Data, Boston, (May) 1979.vfpsF psps pspspXsFps Mp _$r! \p0 Z3 X2susW Ve U*5> SW R" Mp+rp Ka HsJ( Gb>. E/< DZ` BH1 >Jp N <rDp 9 7fsB4 5c 4^0 /p O -Vss +X *N^ %p(r' ##p& !s38 A3  Q ~p-* Jr&p sL$ K" pr 2p M C=[0BIBLIOGRAPHY107A "functional" data model and query language (see also Shipman[1981]).Cattell, R., Donahue, J., Haugelan, W., Maxwell, J. Integrating Database Applications in a PersonalComputing Environment, submitted to IFIP Conference on High-Performance Personal Computers,(July) 1983. This paper provides a more thorough discussion of the ideas behind integrating the database applications built on topof Cypress, using Squirrel.Chan, A., Danberg, S., Fox, S., Lin, W., Nori, A., Ries, D. Storage and Access Structures tosupport a Semantic Data Model, Proceedings International Conference on Very Large Data Bases.This is the only other implementation effort of which the author is aware under way to implement some of thefeatures of recent semantic data models. The implementation is of an adaptation of DAPLEX to the programminglanguage ADA (see Shipman [1981], Smith et al [1981]). Although the implementation was still under way at thetime of this writing, the design includes a number of the ideas used in our implementation, e.g. clustering of data onthe basis of entities. They do not maintain back-pointers to assist in updating entity references when data is movedor the entity is destroyed, instead using a level of indirection in an "entity directory".Chen, P. The Entity-Relationship Model - Towards a Unified View of Data, ACM Transactions onData Base Systems 1,1 (January 1976).This was the earliest widely-referenced work on extensions to the relational model. It introduced entities and entitytypes (domains), and showed how they could be integrated with relationships, relations, and values.Codd, E. F. A Relational Model of Data for Large Shared Data Banks, CACM 13, 6 (June 1970).This was the earliest widely-referenced work defining the relational model. It introduced relationships, relations(relationship types), values, and value types.Codd, E. F. Extending the Database Relational Model to Capture More Meaning, ACMTransactions on Data Base Systems 4,4 (December 1979).Codd assembles several proposed extensions to the relational model. The biggest contribution of this paper is inrecognizing that the operations on data are at least as important a component of the data model as the structuraldata definition, a fact most of the literature has ignored. Codd suggests such operations.Demers, A., Donahue, J., and Skinner, G. Data Types as Values: Polymorphism, Type-checking,Encapsulation, Conference Record 5th ACM Symposium on Principles of Programming Languages,1978.Propose some more powerful mechanisms for types in programming languages. Russel, the authors' language,supports these mechanisms (but is not yet implemented).Fagin, R. Multivalued dependencies and a new normal form for relational databases. ACMTransactions on Database Systems 2, 3, 1977.Fagin, R. A Normal Form for Relational Databases that is based on Domains and Keys. ACMTransactions on Database Systems 6, 3, 1981.%fps Fp _9sF Zp40 Xxr6p VD (p Ss1D RE Mp(5 Kr=p I s\ G b FU D}+K B=8 AuZ =p!)r ;ptp 9 s,J 7c 3p1rptp 1Us^ /. ,p ? r )!ptp 'is*G %'J $a[ pD tr6 @p sR A7 p Kr T!tp xp:r D!tp  x =\"DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL108The above two papers by Fagin describe two normal forms for databases, the second more restrictive than the first,and the first more restrictive than Codd's [1970].Feldman, J., & Rovner, P. An Algol-based Associative Language, CACM 12, 8 (August) 1969.The LEAP language described here has many of the properties of more recent data models, and some interestinglanguage-integrated ideas about access to the data (looping constructs and user-defined procedures for matchingtuples). LEAP uses binary relations.Hall, P., Oulett, J., & Todd, S. Relations and Entities, Proceedings of the IFIP TC-2 WorkingConference on Modelling in Data Base Management Systems, 1976.An earlier discussion of entities and the irreducible relations which Biller [1979] further formalizes.Hammer, M. & McLeod, D. The Semantic Data Model: A Modelling Mechanism for Data BaseApplications, ACM Transactions on Database Systems 6, 3 (September) 1981.A more recent summary of the SDM data model described in McLeod's PhD disseration [1978].Herot, C. The Spatial Data Management System. Proceedings International Conference on VeryLarge Data Bases, Rio de Janeiro, 1981.Describes a novel user interface to a database system allowing browsing in a spatial layout of entities.Israel, J., Mitchell, J., Sturgis, H. Separating Data from Function in a Distributed File System,Technical Report CSL-78-5, Xerox Palo Alto Research Center, 1978.Describes Juniper, the transaction-based file system on which the Cedar DBMS was constructed.Kapp, D., & Leben, J. IMS Programming Techniques, Van Nostrand, 1978.Describes IBM's IMS, the most widely used commercial database system, and it's access language DL/I. IMS uses aform of hierarchical data model, and comes in a number of versions (IMS/VS, IMS360) with somewhat differentfeatures.Katz, R. and Wong, E. Heterogenous Data Models - Part 1: Semantic Issues, ERL memoUCB/ERL m79/56, UC Berkeley, 1979.Discusses issues in providing an integrated database system composing database system components based on differentdata models. This is a hard and generally unsolved problem.Kent, W. Data and Reality, North-Holland, 1978.This book has an excellent thorough discussion of philosophical and epistemological aspects of entities, entity types,relations, and issues only briefly mentioned in most recent data model literature. There are little or no new ideashere, the purpose is to revisit questions inadequately studied elsewhere, e.g. "What is an entity?".Kent, W. Limitations of Record-based Information Models, ACM Transactions on DatabaseSystems, 4,1 (March 1979).A good summary of limitations of the relational model, and some of the advantages of more recent data models.vfpsF psps pspspXsFps Mp _9sU ]2 Yp@rptp WsQ Uus$% Ty% Pp:r  N7p Lsg HYpU F$p r$prp CsY ?pr, =p ;Ash 7p;' 5UA 2s] /!prp ,sN" +"us'< ) %p#0 #" !6ss < p rp ~s[ ;9 vd p:r prp sm M =ZgBIBLIOGRAPHY109Kent, W. Choices in Practical Data Desgin, Proceedings 8th International Conference on Very LargeDatabases, Mexico City, (September) 1982.Discusses some important unsolved issues in data modelling and schema design. One issue is whether an entityname composed of multiple parts or a relation with multiple attributes should be collapsed into a singlename/attribute or kept separate (examples are month/day/year, or city/state/country). Another issue is is whether toencode information as type or as data (for example, whether to include sex as a property of persons, or to declaredifferent subtypes of Person for men and women). Other issues are "nameless" entities and uniqueness of entitynames when there are several subtypes of a domain.Kent, W. The Entity Join, Proceedings 5th International Conference on Very Large Databases, 1979.Discusses the distinction between entities and their names.Kerschberg, L., Klug, A., Tsichritzis, D. A Taxonomy of Data Models, Systems for Large DataBases, P. Lockemann & F. Neuhold (eds), North-Holland, 1976.This paper is a brief survey of 23 data models along some global dimensions: graph-theoretic versus set-theoretic,mathematical foundation, terminology, and level of abstraction.King, R., McLeod, D. Semantic Database Models, Database Design, S. B. Yao, Ed., 1982.A recent survey of data models (see also Lochovsky & Tsichritzis [1982]).LeDoux, C., and Parker, D. Reflections on Boyce-Codd Normal Form. Proceedings 8thInternational Conference on Very Large Data Bases, Mexico City, 1982.A recent study of relational normalization.Lindgreen, P. Basic operations on information as a basis for data base design, Proceedings IFIPCongress, North Holland, 1974.A binary-relation data model.McLeod, D. A Semantic Database Model and its Associated Structured User Interface, PhD Thesis,EE & CS, MIT, 1978.Describes the SDM data model, including most of the ideas in the Cypress data model and some other ones, such asthe introduction of events.Mitchell, J., Maybury, W., Sweet, R. Mesa Language Manual. Xerox Palo Alto Research CenterReport CSL-79-3, 1979Describes Mesa 5.0, the language in which the Cedar database system was originally implemented. No reference isexternally available at the time of this writing for Cedar Mesa, the language in which the Cypress database system isimplemented. The Cedar Mesa language of course strongly influenced the design of the programmer's interface tothe Cypress data model.Mylopoulos, J., Bernstein, P., and Wong, H. A Preliminary Specification of TAXIS: A Languagefor Designing Information Systems, TR CCA-78-02 (January 1978).Integrates a number of the concepts discussed in recent data models, including generalization hierarchies andspecialization.%fps Fp _#r6 \p Zfs _ X;- W^)L U%M TV$K R2 Npr@p LXs; Hp-r Fkp6 Cs99 Bl? >p0rp <8sI 8p*r 6K1p 3s+ 0pB r -p +is 'p-2 %| #s@0 !}us p3)  sY 4A  N!  pN ? si  y=]DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL110Pirotte, A. "The Entity-Property-Association Model: An Information-Oriented Data Base Model",MBLE Report R343, (March) 1977.Another early data model.Quillian, M. R. Semantic Memory, in M. Minsky, (ed.), Semantic Information Processing, MITPress, 1968.This is the earliest widely-referenced work with semantic nets, which have been used widely in Artificial Intelligence.Semantic nets, as they are generally used, allow entities and relationships but lacked some capabilities the relationalmodel allows (e.g. queries on relations).Redell, D, et al. Pilot: An Operating System for a Personal Computer, Proceedings 7th Symposiumon Operating Systems Principles, Asilomar, (ACM order #534790) 1979.Describes the Pilot operating system, on which the Cedar programming environment and Cedar DBMS depend. ThePilot file system provides a local transaction-based access to data for the Cedar DBMS (Juniper, described by Israelet al [1979], provides a remote transaction-based file system).Rissanen, J. Independent Components of Relations, ACM TODS 2, 4 (December) 1977.A study of what Biller [1979] calls a-irreducible relational form, which is less restrictive than the irreducible form weuse.Roussopoulos, N., and Mylopoulos, J. Using Semantic Networks for Database Management,Proceedings 2nd International Conference on Very Large Databases, September 1975.Discusses a data model based on semantic networks (Quillian [1968]).Sadri, F., Ullman, J. The Interaction between Functional Dependencies and TemplateDependencies. Proceedings ACM SIGMOD Conference 1980, Santa Monica.Describes more complex kinds of dependencies and relational normal forms.Schmidt, H. A., & Swenson, J. R. On the Semantics of the Relational Data Model, in King, F.(ed.), Proceedings ACM SIGMOD Conference, San Jose, 1975.An early paper discussing shortcomings of the relational model.Senko, M. E. Data Structures and Data Accessing in Data Base Systems Past, Present, Future, IBMSystems Journal 16, 3 (1977a).A binary-relation data model with separate physical, conceptual, and user levels.Shipman, D. W. The Functional Data Model and the Data Language DAPLEX, ACM Transactionson Database Systems 6, 1 (March) 1981.Describes a data model, the functional data model, which is superficially quite different but actually very similar tothe Cypress model. The Functional data model takes an attractively simple point of view, decomposing relationshipsinto functions mapping values to values. Thus all the notation and terminology of mathematical functional notationis available for use. The "properties" described in this paper have some similarities to this idea, but Shipman makesthis the basis for the model.vfpsF psps pspspXsFps Mp _L \ Zfs Vprp Ty Qsi Pzl N) K>p3r I p% FsT E C1 C? ?p3rp =Ss o ; 8pA 5rAp 3CsD /hpEF9 -3r(p *sI &p2* $r"p !s?  p=#  8sQ \p =r (tpr s+K ) h us(= 'O y M 2=]BIBLIOGRAPHY111Shoshani, A. CABLE: A Language Based on the Entity-Relationship Model, Lawrence BerkeleyLab, Berkeley, California, 1978.CABLE is a query language for Chen's Entity-Relationship model, based on chains of relationships between entities.For example, one might ask for all the people that are managers of departments that produce parts that are sold toCompany X. Because the chains seem very natural for expression of queries, we are considering a similar kind oflanguage for use in Cypress.Smith, J.M., and Smith, D.C. Data Base Abstractions: Aggregation and Generalization, ACMTransactions on Database Systems 2, 2 (June) 1977a.A discussion of the idea of hierarchies of types.Smith, J.M., and Smith, D.C. Integrated Specifications for Abstract Systems, TR UUCS-77-112,Computer Science, University of Utah, September 1977b.Concerned with combining structural and behavioral specifications.Smith, J., Fox, S., Landers, T. Reference Manual for ADAPLEX, Technical Report, ComputerCorporation of America, January 1981.Describes ADAPLEX, embedding the DAPLEX (Shipman[1981]) language in the Ada programming language.Stonebraker, M., et al. The Design and Implementation of INGRES, ACM Transaction onDatabase Systems 2, 3 (September 1976).INGRES is one of the two most widely known implementations of the relational data model (the other is System R,Astrahan et al [1976]).Stonebraker, M. Hypothetical Databases as Views, ACM Proceedings ACM SIGMOD 81, AnnArbor (April 1981).Further work on INGRES referenced above has suggested ways to define integrity constraints and hypotheticaldatabases through the use of views, although there may be problems with doing this as an "add-on". Hypotheticaldatabases are described in this reference.Sundgren, B. Conceptual foundation of the infological approach to data bases, Data BaseManagement, J. Klimbie and K Koffeman, Eds, North Holland, 1974.Early paper describing the infological data model, which has not been fleshed out to the degree of most of the othermodels discussed. The basic primitives are objects (entities) and their properties.Tsichritzis, D.C. LSL - A link and selector language, Proceedings ACM SIGMOD 76, 1976.A query language for a network model database.Tsichritzis, D., and Klug, A. (Eds). The ANSI/X3/SPARC DBMS Framework. Report of theStudy Group on Database Management Systems. Information Systems 3, 4 (1978).A comprehensive data model in the Entity-Relationship family.%fps Fp _C \ Zfsb X?3 W^R U QpEr Op Ms1 I-p Q F6 DZsB @~pP >J% ;sa 8p/r 5tp 3gs*E 1 -p2rp + )4sD' '2> & * "-pMr  p5 }su sN psA p7rp \s. pH L-rp s= g=[DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL112Tsichritzis, D., and Lochovsky, F. Data Models, Prentice-Hall, 1982.This book is probably the most extensive and up-to-date discussion of data models currently available. It deals withconceptual models rather than physical models and complete systems, since many of the models described have notbeen implemented. The book covers the relational, network, hierarchical, entity-relationship, binary, semanticnetwork, and infological data models.Wiederhold, G. Database Design, McGraw-Hill, New York, 1977.A good description of database systems from the point of view of implementation and physical access structures asopposed to the conceptual data model.Wiederhold, G., El-Masri, R. The Structural Model for Database Design, Proceedings of theInternational Conference on the Entity-Relationship Approach to Systems Analysis and Design, LosAngeles, (December) 1979.Describes the "structural" data model, further described in El-Masri's PhD dissertation. The structural modelclassifies relations according to their semantic use, much as in Codd's [1980] extensions of the relational model.Wong, H., and Mylopoulos, J. Two Views of Data Semantics: A Survey of Data Models inArtificial Intelligence and Database Management. INFOR 15, 3 (October 1977).Contrasts knowledge representation languages and data models.Zloof, M. M. Query by Example: The Invocation and Definition of Tables and Forms.Proceedings First International Conference on Very Large Data Bases, 1975.The first widely-referenced work on querying databases by filling out forms, an idea used in Squirrel queryerwindows.vfpsF psps pspspXsFps Mp _$r p \sf [&I Y&I W% Tprp Qs\ O% Lp.r I?p G E-sn Cr ?pR =v2rp :s= 6p3 4rCp 2Ls94 0x M0^>711310. Appendix: Axioms and formal definitionsThe definition of a database includes a set of domains D (the domain domain), a set of relations R(the relation domain), a set of attributes A (the attribute domain), and a set of datatypes T (thedatatype domain). We also define a subdomain function sd mapping D to D. The function sd mustdefine a partial ordering of D.We define the fundamental type constraint for a database:The Attribute Type Constraint. (a) Every relationship in a relation R must have the samenumber of elements. We assign identifiers for corresponding elements of the relationshipsbelonging to R, the attributes a1, a2, a3, ... aN. (b) For any of these attributes ai, the valueassociated with ai for every relationship in R must be an element of the same datatype ordomain di, or of a domain si in the transitive closure of sd applied to di.Because every relationship in a relation R has the same attributes, we may unambiguously refer tothese as simply R's attributes. Every relationship in R has a value of the same type associated withan attribute A, so we define a function at mapping A to the union of D and T, and call at(A) A'stype. Relation keysEach data schema includes a uniqueness function au mapping A to the set {none, keypart}. If arelation R has M attributes, of which some subset a1, a2, ... aN (where N < M) have au(ai) =keypart, then no more than one relationship in R is permitted to have the same value associatedwith attributes a1, a2, ... aN..NormalizationFor a relation R of order N, denote the predicate whose truth in the represented world correspondsto the presence of a relationship in the relation by Rp[a1, a2, ... aN]. The arguments ai correspond tothe attributes of R. We then define:A relation R is in irreducible form iff it cannot be split into relations R1, R2, ... Rk such thatfor all possible states of the represented world:Rp[x1, x2, ... xN] <-> R1p[xi1, ... xik] & ... & Rkp[xj1, ... xjl]where{xi1, ... xik} I {x1, x2, ... xN} ...{xj1, ... xjl} I {x1, x2, ... xN} Ffp ^q, [p.rp)r Yp rp0rp W^7rprprprp Trp Qp9NFsp(sp L<I spsIPtIsIPtIsIPtIsIPtIspsIPtIp GsG?tGpsp(EtsDtEtpsDuEtprp sDuEtp Bp sp7 ?spsp- = sprprprprp rpspsp ;spsp 6s 3prprp" 12s12t1s12t1s12t1p rp12u1p /E -zs,t-zs,t-zs,t-zp ){s &spR $>( $u$>p#u$>p#u$>p#u$>p#u$>p "-%ps p-QupQupQup 1~p u~pu~pu~pu~prpu ~pu~pu~pvpvpu ~pu~pu~pSKuKpuKpvpuKpuKpuKpf ^ u ^p u ^pvp u ^p u ^p u ^p >Q[]DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL114A relation is in functionally irreducible form iff (1) it is in irreducible form or (2) it is the naturalrelational join on a key attribute of two or more binary relations, and there is a dependency betweenthe other (non-joined) attributes a1, a2, ... aN such that the semantics of attribute ai is dependentupon the value of ai+1, for i = 1, 2, .. N-1.AxiomsBelow are the beginnings of an equational specification of the DB interface developed by JimDonahue of the Computer Science Lab. Currently missing are complete descriptions of the schemaoperations (DeclareDomain, DeclareRelation, DeclareAttribute, . . .), some of the detailed descriptionof the behavior of operations on erroneous arguments (the type-checking of RelationSubset, forinstance), and any discussion of segments, transactions and initialization. A complete description willappear in a forthcoming paper.In this style of specification, we present the semantics of the operations in terms of behaviorallyequivalent programs composed of sequences of Cypress operations. Since the only way you caninterrogate entities is through DomainSubset, the only important property of DeclareEntity is that itmay affect the value of a later invocation of DomainSubset. The equations specify this by statingthat certain sequences of Cedar statements are equivalent.In the following, we will use the abbreviation(s1; . . .; sn) THEN e1 ~ e2where s1, ..., sn are Cedar statements and e1 and e2 are Cedar expressions to mean that{ s1; ...; sn; RETURN[e1] } ~ { s1; ...; sn; RETURN[e2] }(Quite often in what follows, the equivalences will be between Cedar procedure bodies; if Cedarwere an expression language then the equivalences would be somewhat simpler to state.)Basic Operations on EntitiesThe fundamental property of entities is that no two entities with the same name and same domaincan exist (this is not true for relationships). The operations that manipulate entities include:DeclareEntity -- add (or look up) an entity in a particular domainDestroyEntity -- remove an entity from a domainNameOf -- give the name of an entityvfpuF pupu pupupXuFpu Mp _spsp3 \8- Z#Z uZpZ uZpZ uZpZ uZp XXuXp Ts QpU N7( Lf Jj> H6;- F B%p V ?\ =14 ;b 9T: 4p./p/hu/p /hu/p/hu/pvp/hu +ip*u+ip*u+ip*u+ip*u+ip#&p&Ou&p&Ou&p &Ou&pvp&Ou&p&Ou&p &Ou&p K V s pS afB2/ $r M >Q\U5APPENDIX115DomainOf -- give the domain of an entityEq -- compare two entitiesNull -- has an entity been destroyed?DomainSubset -- produce the subset of a domain with names in a given rangeNextEntity -- sequence through an entity setReleaseEntitySet -- free the storage used by entity set (affects performance only)DeclareEntityDeclareEntityThe basic operation on entities is DeclareEntity, which either creates a new entity in a particulardomain or returns a previously existing one. Its semantics is given in the following equations:1. Declare produces an entity with the right name and domain:(e: Entity = DeclareEntity[d, name, NewOnly]) THEN [ DomainOf[e], NameOf[e] ] ~ [ d, name ]2. NewOrOld is only a convenience:DeclareEntity[d, name, NewOrOld] ~{ e: Entity = DeclareEntity[d, name, OldOnly]; IF e = NIL THEN RETURN[DeclareEntity[d, name, NewOnly]] ELSE RETURN[e] }3. Declare OldOnly is the same as domain subset:DeclareEntity[d, name, OldOnly] ~ { es: EntitySet = DomainSubset[d, name]; e: Entity = NextEntity[es]; ReleaseEntitySet[es]; RETURN[e] }4. Declare NewOnly fails if the entity already exists:IF DeclareEntity[d, name, OldOnly] # NIL THEN DeclareEntity[d, name, NewOnly] ~IF DeclareEntity[d, name, OldOnly] # NIL THEN SIGNAL Error[AlreadyExists]5. Declare doesn't work on system domains:DeclareEntity[DomainDomain, name] ~ SIGNAL Error[ImplicitSchemaUpdate]DeclareEntity[RelationDomain, name] ~ SIGNAL Error[ImplicitSchemaUpdate]DeclareEntity[AttributeDomain, name] ~ SIGNAL Error[ImplicitSchemaUpdate]DeclareEntity[DataTypeDomain, name] ~ SIGNAL Error[ImplicitSchemaUpdate]DeclareEntity[IndexDomain, name] ~ SIGNAL Error[ImplicitSchemaUpdate]DeclareEntity[IndexFactorDomain, name] ~ SIGNAL Error[ImplicitSchemaUpdate]6. DeclareEntity doesn't work for NIL or null arguments:DeclareEntity[NIL, name] ~ SIGNAL Error[NILArgument]IF Null[d] THEN DeclareEntity[d, name] ~IF Null[d] THEN SIGNAL Error[NullifiedArgument]DestroyEntityDeclare may put something in the database; Destroy removes it.1. An entity that is destroyed will not be found:( d: Domain = DomainOf[e]; name: ROPE = NameOf[e]; DestroyEntity[e] ) THEN'fpuFp_(\Z%XxJVD,T6Q Ms Jp"A H` `F$=`DZ2`Bvp `?"`>&!v`<\p.`:H `70`6(v`4^p)`2@ `/6`.*>v`,_pI `)*`'"vp#`&,$vp#`$a%vp#`"$vp#` !vp#`'vp# `c8`vp`'v`p/ (s  p> ` 1` JV p>Q\DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL116DeclareEntity[d, name, OldOnly] ~ NIL2. All relships that reference an entity are destroyed when the entity is destroyed:(SetF[rship, attr, e]; DestroyEntity[e]) THEN Null[rship] ~ TRUE3. Once destroyed, an entity becomes Null:(DestroyEntity[e]) THEN Null[e] ~ TRUE(DestroyEntity[e1]) THEN Eq[e1, e2] ~ Null[e2]4. Destroy doesn't work on NIL or null arguments (can't destroy the same entity twice):DestroyEntity[NIL] ~ SIGNAL Error[NILArgument]IF Null[e] THEN DestroyEntity[e] ~IF Null[e] THEN SIGNAL Error[NullifiedArgument]DomainSubsetDomainSubset returns an EntitySet that, if cycled through using NextEntity, contains all of theentities in the domain having names in the specified range, and only those entities.1. DomainSubset finds only those entities having names between low and high:( es: EntitySet = DomainSubset[d, low, high]; e: Entity = DeclareEntity[d, name, NewOnly] ) THENDomainSubset[d, low, high] ~IF name < low OR name > high THEN es ELSE DomainSubset[d, low, high]2. DomainSubset finds all those entities having names between low and high:( e: Entity = DeclareEntity[d, name, NewOnly] ) THENname >= low AND name <= high ~{ es: EntitySet = DomainSubset[d, low, high]; FOR next: Entity _ NextEntity[es], NextEntity[es] UNTIL next = NIL DO IF Eq[next, e] THEN {ReleaseEntitySet[es]; RETURN[TRUE]} ENDLOOP; ReleaseEntitySet[es]; RETURN[FALSE] }Equality of entitiesTwo entities are equal iff they are both null or their names are the same and they are in the samedomain.Eq[e1, e2] ~{ IF Null[e1] OR Null[e2] THEN RETURN[ Null[e1] AND Null[e2] ]; RETURN[ Rope.Equal[NameOf[e1], NameOf[e2]] AND Eq[DomainOf[e1], DomainOf[e2]]] }Basic Operations on RelationshipsRelationships are tuples; the basic operations on relationships include setting and fetching the valuesof fields and searching the database for all tuples whose fields have values in specified ranges. Theoperations that manipulate relationships include:vfpuF pupu pupupXuFpu Mp`_vp `\wT`Z:vp `X*`VDvp`Ty$vp `QW`Pvp`NF!v`L{p/ Hs Dp9& BT `?L`>&-`<\4`:v`8pD `6(K`4^4`2v`0p-`.G`-3<`+i `)' %s !p@" @ vup?0$ s! p\ @& 1 M L>Q\APPENDIX117DeclareRelship -- add (or look up) a relationship in a particular relationDestroyRelship -- remove a relationship from a relationRelationOf -- give the relation of a relationshipEq -- compare two relationshipsNull -- has a relationship been destroyed?RelationSubset -- produce the subset of a relation with fields in a given range of valuesNextRelship -- sequence through a relationship setReleaseEntitySet -- free storage used by a relationship set (affects performance only)Unlike entities, it is possible to have the several tuples in the database with the same set of fieldvalues. (One can protect against this by declaring some of the attributes for a relation to be keys.Then checking is performed when declaring new relationships or setting the fields of existingrelationships to guarantee that no other tuple exists with the same value for a key attribute. Theequations given below ignore this additional complexity.) Allowing multiple tuples with the samevalues appears in the lack of an equation of the form "DeclareRelship followed by DeclareRelshipwith the same arguments and version OldOnly fails."DeclareRelshipMost of these equations closely parallel the specification of DeclareEntity.1. Declare produces a relationship with the proper values for each specified field:( init: AttributeValueList = LIST[ [a1, v1], ..., [an, vn] ]; rs: Entity = DeclareRelship[r, init] ) THEN GetF[ rs, ai ] ~ vi2. Declare NewOrOld is a convenience:DeclareRelship[r, list, NewOrOld] ~{ rs: DeclareRelship[r, list, OldOnly]; IF rs = NIL THEN RETURN[DeclareRelship[r, list, NewOnly]] ELSE RETURN[rs] }3. Again, Declare OldOnly is the same as the subset operation this time, though, itmay fail because of a multiple match:DeclareRelship[r, list, OldOnly] ~{ rset: RelshipSet = RelationSubset[r, list]; rs: Relship = NextRelship[rs]; IF NextRelship[rset] = NIL THEN { ReleaseRelshipSet[rs]; RETURN[rs] } ELSE { ReleaseRelshipSet[rset]; SIGNAL Error[MultipleMatch] } }4. Declare doesn't work on system domains:DeclareRelship[aRelation, list] ~ SIGNAL Error[ImplicitSchemaUpdate]DeclareRelship[aType, list] ~ SIGNAL Error[ImplicitSchemaUpdate]DeclareRelship[aUniqueness, list] ~ SIGNAL Error[ImplicitSchemaUpdate]DeclareRelship[aLength, list] ~ SIGNAL Error[ImplicitSchemaUpdate]DeclareRelship[aLink, list] ~ SIGNAL Error[ImplicitSchemaUpdate]DeclareRelship[dSubType, list] ~ SIGNAL Error[ImplicitSchemaUpdate]'fpuFp_J\7Z1XxVD*TAQ2O6 LX JjD H6> Fc CS AC ?d3 ;s 8pL `5pS`4p=`2L=vp `/%`-p"v`,p'`*NL `'8'Y(='`%%`$p!v`"Pp-` `H`B `R*`pvp#`vp#`"vp#`(vp#`]vp#`vp# x K?QXDESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL1185. DeclareEntity doesn't work for NIL or null arguments:DeclareRelship[NIL, list] ~ SIGNAL Error[NILArgument]IF Null[r] THEN DeclareRelship[r, list] ~IF Null[r] THEN SIGNAL Error[NullifiedArgument]DestroyEntityBecause duplicates are allowed, the effect of Destroy is somewhat different. Declaring a relshipNewOnly and destroying it has no effect if the relation is in the schema (note that the initializationof the fields of the relship is described below as applications of SetF).1. Destroying a newly created relationship has no effect:{ rs: Relship = DeclareRelship[r, [], NewOnly]; DestroyRelship[rs] } ~[] _ DeclareRelation[ NameOf[r], SegmentOf[r], OldOnly ]2. If an existing relationship is destroyed, then it will not be found:( rs: Relship = DeclareRelship[r, list, OldOnly]; DestroyRelship[rs] ) THENDeclareRelship[r, list, OldOnly] ~ NIL3. Once destroyed, an relationship becomes Null:(DestroyRelship[rs]) THEN Null[rs] ~ TRUE(DestroyRelship[rs1]) THEN Eq[rs1, rs2] ~ Null[rs2]4. Destroy doesn't work on NIL or null arguments (can't destroy the same entity twice):DestroyRelship[NIL] ~ SIGNAL Error[NILArgument]IF Null[rs] THEN DestroyRelship[rs] ~IF Null[rs] THEN SIGNAL Error[NullifiedArgument]RelationSubsetRelationSubset returns a RelshipSet that, if cycled through using NextRelship, contains all of therelationships in the relation having field values in the specified ranges.1. RelationSubset finds only relships satisfying the AttributeValueList:( init: AttributeValueList = LIST[ [a1, v1], . . ., [an, vn] ] search: AttributeValueList = LIST[ [a1, l1, h1], . . ., [an, ln, hn] ]; rset: RelshipSet = RelationSubset[ r, search ]; [] _ DeclareRelship[r, init, NewOnly ] ) THEN RelationSubset[ r, search ] ~ IF v1 < l1 OR v1 > h1 OR . . . vm < lm OR vm > hm THEN rset ELSE RelationSubset[ r, search ]2. And RelationSubset finds all relships satisfying the AttributeValueList:( init: AttributeValueList = LIST[ [a1, v1], . . ., [an, vn] ] search: AttributeValueList = LIST[ [a1, l1, h1], . . ., [an, ln, hn] ]; rs: DeclareRelship[r, init, NewOnly]; rset: RelshipSet = RelationSubset[ r, search ] ) THEN v1 >= l1 AND v1 <= h1 AND . . . vm >= lm AND vm <= hm ~ { FOR next: Relship _ NextRelship[rs], NextRelship[rs] UNTIL next = NIL DO IF Eq[ next, rs ] THEN { ReleaseRelshipSet[rs]; RETURN[TRUE] } ENDLOOP; ReleaseRelshipSet[rs]; RETURN[FALSE] }vfpuF pupu pupupXuFpu Mp `_8`]Kpvp`[(v`Yp/ Vs SpA Qq=) O=I `Lp9`JpEv`I p8 `FkG`DpK`B!vp `@70`>mp#vp`<(vp `:W`89vp`6o$v`4p0 1s .pb ,_J `)H`'>`&,I`$a1`"/` v`p3`7, `K`>`I`9'`n7`8v`p)$`B` D ` y) M 2>Q]APPENDIX119SetF and GetFSetF changes the fields of a tuple, GetF reads them. Here are also the axioms that describe thetype-checking performed on assignment.1. Setting the field of one relship changes its value for all equal relships (Eq for relships isessentially alias):{ SetF[rship1, attr, val]; RETURN[ GetF[rship2, attr] ] } ~{ oldVal: Value = GetF[rship2, attr]; SetF[rship1, attr, val]; RETURN[ IF Eq[rship1, rship2 ] THEN val ELSE oldVal ] }2. Setting a field with its current value is a no-op (except for errors):SetF[ rship, attr, GetF[rship, attr] ] ~{ IF Eq[RelationOf[rship], aRelation] THEN SIGNAL Error[ImplicitSchemaUpdate]; IF Eq[RelationOf[rship], aType] THEN SIGNAL Error[ImplicitSchemaUpdate]; IF Eq[RelationOf[rship], aLength] THEN SIGNAL Error[ImplicitSchemaUpdate]; IF Eq[RelationOf[rship], aUniqueness] THEN SIGNAL Error[ImplicitSchemaUpdate]; IF Eq[RelationOf[rship], aLink] THEN SIGNAL Error[ImplicitSchemaUpdate]; IF Eq[RelationOf[rship], ifIndex] THEN SIGNAL Error[ImplicitSchemaUpdate]; IF Eq[RelationOf[rship], ifAttribute] THEN SIGNAL Error[ImplicitSchemaUpdate]; [] _ GetF[rship, attr] }3. Declare NewOnly can be defined in terms of setting the fields one at a time:DeclareRelship[ r, LIST[ [a1, v1], . . ., [an, vn] ], NewOnly ] ~{ rship: Relship = DeclareRelship[ r, [], NewOnly ]; SetF[ rship, a1, v1 ]; . . .; SetF[ rship, an, vn ]; RETURN[rship] }4. GetF fails if the attribute belongs to the wrong relation:[] _ GetF[rship, attr] ~{ IF rship = NIL OR attr = NIL THEN SIGNAL Error[NILArgument]; IF Null[rship] OR Null[attr] THEN SIGNAL Error[NullifiedArgument]; { rel1: Relation = RelationOf[rship]; rel2: Relation = V2E[ GetP[ e: attr, aIs: aRelationIs, aOf: aRelationOf ] ]; IF NOT Eq[rel1, rel2] THEN SIGNAL Error[ IllegalAttribute ] } }The type-checking that goes on when a SetF is performed is fairly straightforward: If the value tobe assigned is from one of the primitive types, then the type of the attribute (found in the aTyperelation) must be identical. In the case of assigning entity values, the domain of the value must be asubtype of the domain that is the type of the attribute. We define this by a set of similar equationsfor each of the primitive types and one for the entity-valued case.5. Integer type-checking:{ rship: Relship = DeclareRelship[r, [], NewOnly]; SetF[ rship, attr, I2V[integer] ]; DestroyRelship[rship] } ~{ [] _ GetF[rship, attr]; { type: Entity = V2E[ GetP[ e: attr, aIs: aTypeIs, aOf: aTypeOf ] ]; IF NOT Eq[type, IntType] THEN SIGNAL Error[MismatchedAttributeValueType] }}'fpuFp _s \p` Y& `W;+5`Up`S:v`Qp%`P`NF: `KI`I'v`HpN`FHJ`D}L`BP`@J`?L`=SP`; `8O`7@v`5Up4`3G `0=`/!v`-Vp>`+D`)'`'Q`&,D #$Z N 61 !E RC ``2`>v`Tp`F`)'  x>QYDESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL1206. Boolean type-checking:{ rship: Relship = DeclareRelship[r, [], NewOnly]; SetF[ rship, attr, B2V[boolean] ]; DestroyRelship[rship] } ~{ [] _ GetF[rship, attr]; { type: Entity = V2E[ GetP[ e: attr, aIs: aTypeIs, aOf: aTypeOf ] ]; IF NOT Eq[type, BoolType] THEN SIGNAL Error[MismatchedAttributeValueType] }} 7. String type-checking:{ rship: Relship = DeclareRelship[r, [], NewOnly]; SetF[ rship, attr, S2V[string] ]; DestroyRelship[rship] } ~{ [] _ GetF[rship, attr]; { type: Entity = V2E[ GetP[ e: attr, aIs: aTypeIs, aOf: aTypeOf ] ]; IF NOT Eq[type, StringType] THEN SIGNAL Error[MismatchedAttributeValueType] }} 8. Time type-checking:{ rship: Relship = DeclareRelship[r, [], NewOnly]; SetF[ rship, attr, T2V[time] ]; DestroyRelship[rship] } ~{ [] _ GetF[rship, attr]; { type: Entity = V2E[ GetP[ e: attr, aIs: aTypeIs, aOf: aTypeOf ] ]; IF NOT Eq[type, TimeType] THEN SIGNAL Error[MismatchedAttributeValueType] }}9. Entity type-checking (AnyDomainType is a wild card):{ rship: Relship = DeclareRelship[r, [], NewOnly]; SetF[ rship, attr, E2V[entity] ]; DestroyRelship[rship] } ~{ [] _ GetF[rship, attr]; { type: Entity = V2E[ GetP[ e: attr, aIs: aTypeIs, aOf: aTypeOf ] ]; IF NOT (Eq[type, AnyDomainType] OR SubType[DomainOf[entity], type]) THEN SIGNAL Error[MismatchedAttributeValueType] }}RelationOfRelationOf simply gives the relation of a relationship. RelationOf[ DeclareRelship[ r, list ], NewOrOld ] ~ { [] _ DeclareRelship[ r, list, NewOrOld ]; RETURN[r] }Operations on the database schemaThe database schema is stored in a set of system domains and relations. The operations on theseobjects include:DeclareDomain -- add (or look up) a domainDeclareRelation -- add (or look up) a relationDeclareAttribute -- add (or look up) a field name of a relationDestroyDomain -- destroy a domain (and all of its entities)DestroyRelation -- destroy a relation (and all of its relationships)DeclareSubType -- define one domain to be a subtype of anotherDestroySubType -- remove the two domains from the dSubType relationvfpuF pupu pupupXuFpu Mp `_`]K2`[>v`Yp`WG`V!*' `S`Q2`O=v`N#p`LXG`JP `G`F$2`DZ;v`Bp`@G`>O `<\7`:2`8=v`6p`51G`3gH`17 .s +p7 )6v 'ip; #s! pC Q*.?;KD>CT M>QYpAPPENDIX121These operations behave much like DeclareEntity, DeclareRelship, DestroyEntity andDestroyRelship; it's just that they work on the system domains and relations.DeclareDomain/DestroyDomain1. DeclareDomain adds a new (empty) domain:( d: Domain = DeclareDomain[ name, segment, NewOnly ] ) THEN{ es: EntitySet = DomainSubset[d, NIL]; e: Entity = NextEntity[es]; ReleaseEntitySet[es]; RETURN[s] } ~ NIL2. Again, NewOrOld is a convenience:DeclareDomain[name, segment, NewOrOld] ~{ d: Domain = DeclareDomain[name, segment, OldOnly]; IF d = NIL THEN RETURN[DeclareDomain[name, segment, NewOnly]] ELSE RETURN[d] }3. Declare OldOnly is the same as domain subset on DomainDomain:DeclareDomain[name, segment, OldOnly] ~{ es: EntitySet = DomainSubset[d: DomainDomain, lowName: name, searchSegment: segment]; e: Entity = NextEntity[es]; ReleaseEntitySet[es]; RETURN[e] }4. Declare NewOnly fails if the entity already exists:([] _ DeclareDomain[name, segment, NewOrOld]) THENDeclareDomain[name, segment, NewOnly] ~ SIGNAL Error[AlreadyExists]5. Declare NewOnly changes DomainDomain:(d: Domain = DeclareDomain[name, segment, NewOnly]) THENDeclareDomain[name, segment, OldOnly] ~ d6. A domain that is destroyed will not be found:(name: ROPE = NameOf[d]; seg: Segment = SegmentOf[d]; DestroyDomain[d]) THENDeclareDomain[name, segment, OldOnly] ~ NIL7. Destroying a domain causes all of the entities in the domain to be destroyed:( es: EntitySet = DomainSubset[d, NIL]; DestroyDomain[d] ) THEN{ e: Entity = NextEntity[es]; ReleaseEntitySet[es]; RETURN[e] } ~ NIL8. Destroying a domain causes entities to become Null:( e: Entity = DeclareEntity[d, name]; DestroyDomain[d] ) THENNull[e] ~ TRUE9. Destroying a domain causes relationships that reference it to be destroyed:( SetF[rship, attr, e]; DestroyDomain[d] ) THEN Null[rship] ~ TRUE10. Destroy doesn't work on NIL or null arguments (can't destroy the domain twice):DestroyDomain[NIL] ~ SIGNAL Error[NILArgument]IF Null[d] THEN DestroyDomain[d] ~IF Null[d] THEN SIGNAL Error[NullifiedArgument]'fpuFp _0" \M Ys `Vgp+`T<`R'`Q`O=%vp `L$`J'v`I p4`G??`Et `B@`A &v`?Ap/`=v.`;@ `9 6`7B2`5x&vp `2(`18`/D&vp `,0`* A`)&vp `&sP`$?`"@vp ` ?6`t=`vp ` N`AQXDESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL122DeclareRelation/DestroyRelation1. DeclareRelation adds a new (empty) relation:( r: Relation = DeclareRelation[ name, segment, NewOnly ] ) THEN{ rs: RelshipSet = RelationSubset[r, NIL]; rship: Relation = NextRelship[rs]; ReleaseRelshipSet[rs]; RETURN[rship] } ~ NIL2. Again, NewOrOld is a convenience:DeclareRelation[name, segment, NewOrOld] ~{ r: Relation = DeclareRelation[name, segment, OldOnly]; IF r = NIL THEN RETURN[DeclareRelation[name, segment, NewOnly]] ELSE RETURN[r] }3. Declare OldOnly is the same as domain subset on RelationDomain:DeclareRelation[name, segment, OldOnly] ~{ es: EntitySet = DomainSubset[d: RelationDomain, lowName: name, searchSegment: segment]; e: Entity = NextEntity[es]; ReleaseEntitySet[es]; RETURN[e] }4. Declare NewOnly fails if the entity already exists:([] _ DeclareRelation[name, segment, NewOrOld]) THENDeclareRelation[name, segment, NewOnly] ~ SIGNAL Error[AlreadyExists]5. Declare NewOnly changes RelationDomain:(r: Relation = DeclareRelation[name, segment, NewOnly]) THENDeclareRelation[name, segment, OldOnly] ~ r6. A relation that is destroyed will not be found:(name: ROPE = NameOf[r]; seg: Segment = SegmentOf[r]; DestroyDomain[r]) THENDeclareRelation[name, segment, OldOnly] ~ NIL7. Destroying a domain causes all of the relships in the domain to be destroyed:( rs: RelshipSet = RelationSubset[r, NIL]; DestroyRelation[r] ) THEN{ rship: Relation = NextRelship[rs]; ReleaseRelshipSet[rs]; RETURN[rship] } ~ NIL8. Destroying a domain causes relships to become Null:( rship: Relation = DeclareRelship[r, [], NewOnly]; DestroyRelation[r] ) THEN Null[rship] ~ TRUE9. Destroy doesn't work on NIL or null arguments (can't destroy the relation twice):DestroyRelation[NIL] ~ SIGNAL Error[NILArgument]IF Null[r] THEN DestroyRelation[r] ~ IF Null[r] THEN SIGNAL Error[NullifiedArgument]DeclareAttributeThe important properties of DeclareAttribute are the changes made to the system relations aType,aLength, aUniqueness and aLink when a new attribute is declared. If a type, length, uniqueness orlink is specified when the attribute is looked up (Declare with version OldOnly), then theseadditional arguments are checked for consistency with the values stored in the relations. Note thatthere is no corresponding DestroyAttribute.vfpuF pupu pupupXuFpu Mp _s `\wp/`Z@`XM`W*vp `Ty$`R)v`Pp8`OA`MO `JB`H(v`Gp1`EQ)`C? `@6`?4`=S(vp `:*`8<`7(vp `42`2 A`0(vp `.MP`,D`*Lvp `(6`&OM`$ vp `!T` vp`Q#vp us mpA 9L ? L + * MT>QYAPPENDIX1231. DeclareAttribute adds a new attribute:( a: Attribute = DeclareAttribute[r, name, type, uniqueness, length, link, NewOnly] )THEN DeclareAttribute[r: r, name: name, version: OldOnly] ~ a2. DeclareAttribute changes the aType relation:( a: Attribute = DeclareAttribute[r, name, type, uniqueness, length, link, NewOnly] ) THENV2E[ GetP[ e: a, aIs: aTypeIs, aOf: aTypeOf ] ] ~ type3. DeclareAttribute changes the aUniqueness relation:( a: Attribute = DeclareAttribute[r, name, type, uniqueness, length, link, NewOnly] )THEN V2I[ GetP[ e: a, aIs: aUniquenessIs, aOf: aUniquenessOf ] ] ~ LOOPHOLE[uniqueness]4. DeclareAttribute changes the aLength relation:( a: Attribute = DeclareAttribute[r, name, type, uniqueness, length, link, NewOnly] )THEN V2I[ GetP[ e: a, aIs: aLengthIs, aOf: aLengthOf ] ] ~ length5. DeclareAttribute changes the aLink relation:( a: Attribute = DeclareAttribute[r, name, type, uniqueness, length, link, NewOnly] )THEN V2B[ GetP[ e: a, aIs: aLinkIs, aOf: aLinkOf ] ] ~ link6. Again, NewOrOld is a convenience:DeclareAttribute[r, name, type, uniqueness, length, link, NewOrOld] ~{ a: Attribute = DeclareAttribute[r, name, type, uniqueness, length, link, OldOnly]; IF a = NIL THEN RETURN[DeclareAttribute[r, name, type, uniqueness, length, link, NewOnly]] ELSE RETURN[a] }7. Declare OldOnly is the same as domain subset on AttributeDomain:DeclareAttribute[r: r, name: name, version: OldOnly] ~{ es: EntitySet = DomainSubset[d: AttributeDomain, lowName: name, searchSegment: SegmentOf[r]]; e: Entity = NextEntity[es]; ReleaseEntitySet[es]; RETURN[e] }8. Declare OldOnly checks any supplied type, length, link, uniqueness:( a: Attribute = DeclareAttribute[r, name, type, uniqueness, length, link, NewOnly] )THEN DeclareAttribute[r: r, name: name, type: type', version: OldOnly] ~IF NOT Eq[type', type] THEN SIGNAL Error[MismatchedExistingAttribute]ELSE DeclareAttribute[r: r, name: name, version: OldOnly]( a: Attribute = DeclareAttribute[r, name, type, uniqueness, length, link, NewOnly] )THEN DeclareAttribute[r: r, name: name, uniqueness: uniqueness', version: OldOnly]~ IF NOT uniqueness = uniqueness' THEN SIGNAL Error[MismatchedExistingAttribute] ELSE DeclareAttribute[r: r, name: name, version: OldOnly]( a: Attribute = DeclareAttribute[r, name, type, uniqueness, length, link, NewOnly] )THEN DeclareAttribute[r: r, name: name, length: length', version: OldOnly] ~IF NOT length = length' THEN SIGNAL Error[MismatchedExistingAttribute]ELSE DeclareAttribute[r: r, name: name, version: OldOnly]( a: Attribute = DeclareAttribute[r, name, type, uniqueness, length, link, NewOnly] )'fpuFp `_)`]KU`[:vp `X/`W*0`UM0vp `R5`PU`OAv`MOp `J1`HU`G9vp `D}/`BU`@5vp `>J$`<Dv`:pT`8 `7Q`5U `2C`05v`/!p2`-V/`+@ `(F`'#U`%XGv`#pE`!9`.U`cR`vp%`+`;`nU`Kv`pF`9` yU < ` 2:*]DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL124THEN DeclareAttribute[r: r, name: name, link: link', version: OldOnly] ~IF NOT link = link' THEN SIGNAL Error[MismatchedExistingAttribute]ELSE DeclareAttribute[r: r, name: name, version: OldOnly]9. Declare NewOnly fails if the entity already exists:([] _ DeclareAttribute[r, name, type, uniqueness, length, link, NewOrOld]) THENDeclareAttribute[r, name, type, uniqueness, length, link, NewOnly] ~SIGNAL Error[AlreadyExists]10. Declare doesn't work on NIL or null arguments:DeclareAttribute[NIL, name, type, uniqueness, length, link] ~SIGNAL Error[NILArgument]IF Null[r] THEN DeclareAttribute[r, name, type, uniqueness, length, link] ~IF Null[r] THEN SIGNAL Error[NILArgument]vfpuF pupu pupupXuFpu Mp`_Gv`]KpB`[9 `X6`WO`UMCv`Sp `P2`O%"6>9 /> <(#< /< ;A.&:/;A 9.9C/9 7%!G7y/7 6K:-5/6K 4,%4M </4 2b2^/2 1U0/1U //W// ./&- /. ,_/),/,_ *+#q*a O/* )f(Z/) 'ir'N/'i %1(%k/% $3+p#P/$ "s-( "/"s 7+X uh/ %/% }# &/} " / /8+// '$0 / ,& / 9)%v J/9 :!/ %!/ /[ HELVETICA HELVETICA  HELVETICA HELVETICAMATH  HELVETICA  TIMESROMAN LOGO TIMESROMAN  TIMESROMAN  TIMESROMAN  TIMESROMAN  TIMESROMAN  TIMESROMAN  TIMESROMAN HELVETICA  HELVETICA  TIMESROMAN HELVETICA  HELVETICA  TIMESROMAN  TIMESROMAN  TIMESROMAN  HELVETICA  TIMESROMAN  TIMESROMAN HELVETICA  HELVETICAGACHA  TIMESROMAN TIMESROMAN  TIMESROMAN  TIMESROMAN TIMESROMAN  TIMESROMAN  HELVETICA  TIMESROMAN  TIMESROMAN  TIMESROMAN  TIMESROMAN TIMESROMAN  HELVETICA  TIMESROMAN  TIMESROMAN  TIMESROMAN  TIMESROMAN TIMESROMAN  TIMESROMAN TIMESROMAN  TIMESROMAN  TIMESROMAN  TIMESROMAN  TIMESROMAN TIMESROMANMATH J n  !<&- o7G&mt{2Vfh   e%$*19AI!OW_gov} VTP;NI(P '/7?GOW ` iq z P* 7'  $*2 ;BDFHJLNP7U[_e nv~ R<j/jModelLevelDesign.pressCattell18-May-83 10:05:48 PDT: