CYPRESS DOCUMENTATION

2. Cypress data model concepts

In this section, we give an informal description of the Cypress data model. We describe the particulars of the Cedar interface in Section 3.

2.1 Data independence

We deal here with the conceptual data model: the logical primitives for data access and data type definition. This should be carefully distinguished from the physical data storage and access mechanisms. The physical representation of data is hidden as much as possible from the database client to facilitate data independence, the guarantee that a user's program will continue to work (perhaps with a change in efficiency) even though the physical data representation is redesigned.

For any particular database using our conceptual data model, the actual specification of the types of data in the database, using the primitives the model provides, is termed the data schema. Note that a mapping must be provided between the conceptual data model and the physical representation, either automatically or with further instruction from the client; we will do some of both. The logical-to-physical mapping strongly affects the performance of the database system as seen by a user performing operations at the conceptual level.

2.2 Basic primitives

Three basic primitives are defined in the model: an entity, a datum, and a relationship.

An entity represents an abstract or concrete object in the world: a person, an organization, a document, a product, an event.
In programming languages and knowledge representation, entities have variously been called atoms, symbols, or nodes. A datum, unlike an entity, represents literal information such as times, weights, part names, or phone numbers. Character strings and integers are possible datum types.

It is a policy decision whether something is represented as an entity or merely as a datum: e.g., an employee's spouse may be represented in a database system as a datum (the spouse's name), or the spouse may be an entity in itself. As we will see later, the database system provides a higher level of logical integrity checking for entities than for datum values: unique entity identifiers, checks on entity types, and removal of dependent data upon entity deletion. We shall discuss the entity/datum choice further in Section 4.2.

We will use the term value to refer to something that can be either a datum or an entity. In many programming languages, there is no reason to distinguish entity values from datum values. Indeed, most of the Cypress operations deal with any kind of value, and some make it transparent to the caller whether an entity or datum value is involved. The transparent case makes relational operations possible in our model, as we will see in Section 2.5.

A relationship is a tuple whose elements are [entity or datum] values. We refer to the elements (fields) of relationships by name instead of position. These names for positions are called attributes.

Note that we have separated the representatives of unique objects (entities) from the representation of information about objects (relationships), unlike some object-oriented programming languages and data models. Therefore an entity is not an "object" (or "record") in the programming language sense, although entities are representatives of real-world objects.

We also define entity types, datum types, and relationship types.
These are called domains, datatypes, and relations, respectively. We make use of these three types through one fundamental type constraint: every relationship in a relation has the same attributes, and the values associated with each attribute must be from a pre-specified domain or datatype. One might think of a relation as a "record type" in a programming language, although relations permit more powerful operations than record types.

As an example, consider a member relation that specifies that a given person is a member of a given organization with a given job title, as in the following figure. The person and organization might be entities, while the title might be a string datum. We relax the fundamental type constraint somewhat in allowing a lattice of types of domains: a particular value may then belong to the pre-specified domain or one of its sub-domains. For example, one could be a member of a University, a Company, or any other type of Organization one chooses to define. Other relations, e.g. an "offers-course" relation, might apply only to a University.

[Figure 2.1]

Relationships represent facts about the world. A relation (a set of relationships) represents a kind of relationship in which entities and values can participate. We will schematically represent that John Smith is a member of Acme company with title "manager" by drawing lines for the relationship (labelled with the relation and its attributes), a circle for each entity, and a box for the datum:

[Figure 2.2]

One should normally think of relations as types, not sets. A relation can be defined as the set of all relationships of its type, however, and thus can be treated as a relation in the normal mathematical sense. Note that we differ from mathematical convention in another minor point: the use of attributes to refer to tuple elements by name instead of position.
Reference by position is therefore unnecessary, and in fact it is not permitted. We can often omit attribute names without ambiguity, since the types of the participating entities imply their roles in a relationship. However, attribute names are necessary in the general case; e.g., a boss relation between a person and a person needs them to supply the direction that defines its semantics (which person is the boss).

We can summarize the six basic primitives of the data model in tabular form. Familiarity with these six terms is essential to understanding the remainder of this document:

Type        Instance        Instance represents
Domain      Entity          physical or abstract object
Datatype    Datum           numerical measurement or symbolic tag
Relation    Relationship    logical correspondence between one or more objects and values

Our terminology might be more consistent if we called a domain an "entity type" and a relation a "relationship type." Instead we have compromised on the terms most widely used in the literature for all six of the basic concepts. The reader will find the remainder of this document much more understandable if these six terms are committed to memory before proceeding.

2.3 Names and keys

The data model also provides primitives to specify the uniqueness of relationships and entities: keys for relationships, and names for entities. A key for a relation is an attribute or set of attributes whose value uniquely identifies a relationship in the relation. No two relationships may have the same values for a key attribute or attribute set (a relation may have more than one key, in which case relationships must have unique values for all keys). A name acts similarly for a domain: the name of an entity must uniquely specify it in its domain.

Consider a spouse relation, a binary relation between a person and a person.
Both of its attributes are keys, because each person may have only one spouse (if we choose to define spouse this way!). For an author relation, neither the person nor the document is a key, since a person may author more than one document and a document may have more than one author. The person and document attributes together would comprise a key, however: there should be only one tuple for a particular person-document pair.

[Figure 2.3]

We have labelled entities with names in the figures. Names are most useful when they are tags meaningful to people, e.g. the title of a document, or the name of a person. However, their primary function is as a unique entity identifier, so non-unique names that are meaningful to people must be represented as relations. If entities of a domain have more than one unique identifier, e.g. social security numbers and employee numbers, then one identifier must be chosen as the domain's entity names and the other represented as a relation connecting the entities with the unique alternate identifier (a key of that relation).

We require that every entity have a unique name, although the name may be generated automatically by the database system. Thus every entity may be uniquely identified by the pair:

	[domain name, entity name].

Some authors in the database literature use the term entity to refer to a real-world object rather than its representation in the database system. When we use the term entity, we refer to the internal entity, the entity "handle" returned by database operations and stored in entity-valued variables, not the external entity that the internal entity represents, nor the entity identifier [domain, name] that may be used to uniquely identify an internal entity. The three are interchangeable, however, since they must always be in one-to-one correspondence.
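The key and name rules can be made concrete with a minimal Python sketch. Python is used only for illustration (the real interface is the Cedar one described in Section 3), and everything in the block is hypothetical: the Entity, Domain, and Relation classes, the declare_entity and declare_relationship methods, and the spouse attribute names spouseOf and spouseIs are invented for this example. An entity is identified by its [domain name, entity name] pair, and a relation rejects any relationship that repeats the value of one of its keys.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity handle: unique per [domain name, entity name]."""
    domain: str
    name: str


@dataclass
class Domain:
    name: str
    entities: dict = field(default_factory=dict)  # entity name -> Entity

    def declare_entity(self, name):
        # A name identifies at most one entity in its domain, so an existing
        # entity with this name is returned instead of creating a duplicate.
        return self.entities.setdefault(name, Entity(self.name, name))


@dataclass
class Relation:
    name: str
    attributes: tuple
    keys: list  # each key is a tuple of attribute names
    relationships: list = field(default_factory=list)

    def declare_relationship(self, **values):
        if set(values) != set(self.attributes):
            raise ValueError(f"{self.name} requires attributes {self.attributes}")
        # No two relationships may have the same values for any key.
        for key in self.keys:
            probe = tuple(values[a] for a in key)
            if any(tuple(r[a] for a in key) == probe for r in self.relationships):
                raise ValueError(f"duplicate value for key {key} of {self.name}")
        self.relationships.append(dict(values))


person, book = Domain("Person"), Domain("Book")
john, mary = person.declare_entity("John"), person.declare_entity("Mary")

# spouse: each attribute is a key on its own.
spouse = Relation("spouse", ("spouseOf", "spouseIs"), keys=[("spouseOf",), ("spouseIs",)])
spouse.declare_relationship(spouseOf=john, spouseIs=mary)

# author: only the person-book pair taken together is a key.
author = Relation("author", ("person", "book"), keys=[("person", "book")])
for title in ("Backgammon for Beginners", "Advanced Backgammon"):
    author.declare_relationship(person=john, book=book.declare_entity(title))
```

Because spouse declares two single-attribute keys, a second spouse relationship for John is rejected, whereas John may take part in several author relationships as long as each person-book pair is new.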
x6e12j\193i15I101i15I44i17IThe reader may find it simple to think of entity-valued attributes of relationships as pointers to the entities, in fact bi-directional pointers, since the operations we provide allow access in either direction.  This is a useful analogy.  However, there is no constraint that the model be implemented with pointers, and the relationships of a relation could equally well be conceptualized as rows of a table whose columns are the attributes of the relation and whose entries for entity-valued attributes are the string names of the entities.  For example, the author relationships in the previous figure could also be displayed in the form:x6e12j\87i8I466i6IAuthor:l3520x6e12jk160(0,7520)(1,12544)\iPerson	Book George	Backgammon for Beginners	John	Backgammon for Beginners	John	Advanced Backgammon	Tom	Advanced Backgammon	l5708d5696x6j\u6U1u4UThus our introduction of entities to the Relational model does not entail a different representation than, say, a Network model might imply, but simply additional integrity checks on entity names and types, and new operations upon entities.  This compatibility with the Relational data model is important, as it allows the application of the powerful Relational calculus or algebra as a query language.  We return to query languages in Section 2.5.x6e12j(635)\63i3INote that the only information about an entity associated directly with the entity is the name; this contrasts with most other data models.  A person's age or spouse, for example, would be represented in the Cypress data model via age or spouse relations.  Thus the relationships in a database are the information-carrying elements:  entities do not have attributes.  However the model provide an abbreviation, properties, to access information such as age or spouse in a single operation.  We will discuss properties later.  
In addition, the physical data modelling algorithms can use the relation key information to store these data directly with the person entity (since a person can have only one age, a field for it can be allocated in the stored object representing the person).

2.4 Basic operations

The data model provides the capability to define and examine the data schema, and to perform operations on entities, relationships, and aggregates of entities and relationships. In this section we discuss the basic operations on entities and relationships. In Section 2.5, we discuss the operations on aggregate types, i.e. domains and relations. We defer to Section 2.6 the discussion of "convenience" operations built upon the basic and aggregate operations.

Four basic operations are defined for entities:

1.	DeclareEntity[domain, name]: Returns a new or existing entity in a domain. An entity name must be specified.
2.	DestroyEntity[entity]: Destroys an entity; this also destroys all relationships that refer to it.
3.	DomainOf[entity]: Returns the domain of an entity (its type).
4.	NameOf[entity]: Returns the string name of an entity.

Five basic operations are defined for relationships:

1.	DeclareRelationship[relation, list of attribute values]: Returns a relationship with the given attribute values in the given relation.
2.	DestroyRelationship[relationship]: Destroys a relationship.
3.	RelationOf[relationship]: Returns a relationship's relation.
4.	GetF[relationship, attribute]: Returns the value associated with the given attribute of the given relationship.
5.	SetF[relationship, attribute, value]: Sets the value of the given relationship attribute.

The operations upon relationships recognize a specially distinguished undefined value for an attribute. Unassigned attributes of a newly created relationship have this value. A client of the data model may retrieve a value with GetF and test whether it equals the undefined value, and may reset a previously defined attribute to the undefined value with SetF.

Other "convenience" operations are built on top of the basic operations on entities and relationships: properties and translucent attributes. They are described in Section 2.6. Although these operations are not essential to the basis of the Cypress model, they do furnish a fundamentally different perspective on the model. They provide a mechanism to associate information directly with entities (instead of through relationships) and to write programs largely independent of attribute types.

The reader will also note that we have ignored issues of concurrent access and protection in the basic operations. We will see later that an underlying transaction, file, and protection are associated with the relation and domain handles used in the basic operations. This convenience allows us to treat concurrency, protection, and data location orthogonally.

2.5 Aggregate operations

There are two kinds of operations upon domains and relations, the aggregate types in our model: the definition of domains and relations, and queries on domains and relations. We first discuss their definitions.

Schema definition

As in other database models and a few programming languages, the Cypress model is self-representing: the data schema is stored and accessible as data.
Thus application-independent tools can be written without coded-in knowledge of the types of data and their relationships.

Client-defined domains, relations, and attributes are represented by entities. These entities belong to special built-in domains, called the system domains:

	the Domain domain, with one element (entity) per domain
	the Attribute domain, with one element per attribute
	the Relation domain, with one element per relation

There is also a predefined Datatype domain, with predefined elements StringType, BoolType, and IntType, called built-in types. We do not allow client-defined datum types at present.

Information about domains, relations, and attributes is represented by system relations in which the system entities participate. The predefined SubType relation is a binary relation between domains and their subdomains. There are also predefined binary relations that map attributes to information about the attributes:

	aRelation: maps an attribute entity to its relation entity.
	aType: maps an attribute to its type entity (a domain or a built-in type).
	aUniqueness: maps an attribute entity to {TRUE, FALSE}, depending on whether it is part of a key of its relation. We are assuming only one key per relation here; our implementation relaxes this assumption in the case of single-attribute keys.

The following diagram illustrates part of a data schema describing the member relation and several domains.
The left side of the figure shows two subdomains of Organization (Company and University), and the right side shows the types and uniqueness properties of the member relation's attributes memberOf, memberIs, and memberAs.

[Figure 2.4]

New domains, relations, and attributes are defined by creating entities and relationships in these predefined system domains and relations. However, to simplify error checking, our implementation provides special operations to define the data schema. These operations are:

	DeclareDomain[name], and for each subtype relationship:
		DeclareSubType[superdomain, subdomain]
	DeclareRelation[name], and for each attribute:
		DeclareAttribute[name, relation, type, uniqueness].

Queries

The operation RelationSubset[relation, attribute value list] enumerates relationships in a relation satisfying specified equality constraints on entity-valued attributes and/or range constraints on datum-valued attributes. For example, RelationSubset might enumerate all of the relationships that reference a particular entity in one attribute and have an integer in the range 23 to 52 in another.

The operation DomainSubset[domain, name range] enumerates entities in a domain. The enumeration may optionally be sorted by entity name, or restricted to the entities whose names lie in a given range.

More complex queries can be implemented in terms of DomainSubset and RelationSubset. A future implementation will provide a MultiRelationSubset operation to efficiently evaluate a single query spanning more than one relation. MultiRelationSubset operates upon a parsed representation of a query-language expression, and produces the same kind of enumeration as RelationSubset.
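The schema-definition and query operations can be sketched together in the same illustrative style. This Python toy is hypothetical, not the Cypress implementation: the Database class and its snake_case methods merely mirror DeclareDomain, DeclareSubType, DeclareRelation, DeclareAttribute, DeclareEntity, DeclareRelationship, RelationSubset, and DomainSubset; schema facts are kept in ordinary dictionaries rather than in the system domains and relations; only one key per relation is checked; and a range constraint is written as a (low, high) tuple, an encoding chosen just for this sketch. Subdomains let a Company or University entity fill the Organization-typed memberOf attribute, and an omitted attribute simply stays undefined (None).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity handle, identified by [domain name, entity name]."""
    domain: str
    name: str


class Database:
    """Toy in-memory sketch of Cypress-style schema and query operations."""

    BUILT_IN_TYPES = {"StringType": str, "IntType": int, "BoolType": bool}

    def __init__(self):
        self.domains = {}    # domain name -> {entity name: Entity}
        self.supertype = {}  # subdomain name -> superdomain name (the SubType relation)
        self.attrs = {}      # relation name -> {attribute: (type name, uniqueness)}
        self.tuples = {}     # relation name -> list of relationships (dicts)

    def declare_domain(self, name):
        self.domains.setdefault(name, {})

    def declare_subtype(self, superdomain, subdomain):
        self.supertype[subdomain] = superdomain

    def declare_relation(self, name):
        self.attrs.setdefault(name, {})
        self.tuples.setdefault(name, [])

    def declare_attribute(self, name, relation, type_name, uniqueness=False):
        self.attrs[relation][name] = (type_name, uniqueness)

    def declare_entity(self, domain, name):
        # Returns the existing entity with this name, or creates it.
        return self.domains[domain].setdefault(name, Entity(domain, name))

    def _in_domain(self, domain, target):
        # True if `domain` is `target` or one of its (transitive) subdomains.
        while domain is not None and domain != target:
            domain = self.supertype.get(domain)
        return domain == target

    def declare_relationship(self, relation, **values):
        for attr, value in values.items():
            type_name, _ = self.attrs[relation][attr]
            if type_name in self.BUILT_IN_TYPES:
                ok = type(value) is self.BUILT_IN_TYPES[type_name]
            else:
                ok = isinstance(value, Entity) and self._in_domain(value.domain, type_name)
            if not ok:
                raise TypeError(f"{relation}.{attr} must be of type {type_name}")
        # Attributes declared unique together form the relation's single key.
        key = [a for a, (_, unique) in self.attrs[relation].items() if unique]
        probe = [values.get(a) for a in key]
        if key and any([r.get(a) for a in key] == probe for r in self.tuples[relation]):
            raise ValueError(f"duplicate value for key {key} of {relation}")
        self.tuples[relation].append(dict(values))  # omitted attributes stay undefined

    def relation_subset(self, relation, **constraints):
        """Equality constraint: a value. Range constraint: a (low, high) tuple."""
        def matches(value, c):
            if isinstance(c, tuple):
                return value is not None and c[0] <= value <= c[1]
            return value == c
        for r in self.tuples[relation]:
            if all(matches(r.get(a), c) for a, c in constraints.items()):
                yield r

    def domain_subset(self, domain, low=None, high=None):
        """Entities declared directly in `domain`, sorted by name, optionally in [low, high]."""
        entities = self.domains[domain]
        return [entities[n] for n in sorted(entities)
                if (low is None or n >= low) and (high is None or n <= high)]


# Usage, with the member and age relations discussed in this section.
db = Database()
for d in ("Person", "Organization", "Company", "University"):
    db.declare_domain(d)
db.declare_subtype("Organization", "Company")
db.declare_subtype("Organization", "University")
db.declare_relation("member")
for attr, type_name in (("memberOf", "Organization"), ("memberIs", "Person"), ("memberAs", "StringType")):
    db.declare_attribute(attr, "member", type_name)
db.declare_relation("age")
db.declare_attribute("ageOf", "age", "Person", uniqueness=True)
db.declare_attribute("ageIs", "age", "IntType")

john = db.declare_entity("Person", "John Smith")
acme = db.declare_entity("Company", "Acme Company")
state = db.declare_entity("University", "State University")
db.declare_relationship("member", memberOf=acme, memberIs=john, memberAs="manager")
db.declare_relationship("member", memberOf=state, memberIs=john)  # memberAs left undefined
db.declare_relationship("age", ageOf=john, ageIs=34)
orgs = [r["memberOf"].name for r in db.relation_subset("member", memberIs=john)]
in_range = list(db.relation_subset("age", ageIs=(23, 52)))
```

Here orgs is ['Acme Company', 'State University'], and in_range holds the single age relationship, whose ageIs value 34 falls in the range 23 to 52.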
See CSL-83-4 for more details on complex queries.

2.6 Convenience operations

Some more convenient, specialized operations are built upon the basic operations described in the previous two sections. They implement what we call properties and translucent attributes. Although, strictly speaking, these operations add no power to the model, they permit a significantly different perspective on data access and so should be thought of as part of the model.

Properties

Properties allow the client to treat entities as if they, like relationships, had "attributes." They provide the convenience of treating attributes of relationships that reference an entity as if they were attributes (or properties) of the entity itself. The property operations are:

1.	GetPList[entity, attribute1, attribute2]: Attribute1 and attribute2 must be from the same relation. Returns the values of attribute1 for all relationships in the relation that reference the entity via attribute2. Attribute2 may be omitted, in which case it is assumed to be the only other entity-valued attribute of the relation.
2.	GetP[entity, attribute1, attribute2]: This is identical to GetPList except that exactly one relationship must reference the entity via attribute2; otherwise an error is generated. GetP always returns one value.
3.	SetPList[entity, attribute1, value list, attribute2]: Attribute1 and attribute2 must be from the same relation. Destroys any existing relationships whose attribute2 equals the entity, and creates a new one for each value in the list, with attribute1 equal to the value and attribute2 equal to the entity. Attribute2 can be defaulted as in GetPList.
4.	SetP[entity, attribute1, value, attribute2]: This is identical to SetPList except that it simply adds a new relationship referencing the entity instead of destroying any existing ones (unless attribute2 is a key of its relation, in which case the existing relationship must be replaced).

Thus the property operations allow information specified through relationships to be treated as properties of the entity itself, in single operations. The property operations and the operations defined in earlier sections may be used interchangeably, as there is only one underlying representation of information: the relationships. As an example of the use of properties, consider the following database:

[Figure 2.5]

The figure shows the entity John Smith and three relationships in which he participates: an age relationship and two member relationships. The member relationships are ternary; the age relationship is binary. On this database, the property operations work as follows:

	GetPList[John Smith, memberOf] would return the set {Acme Company, State University}.
	GetP[John Smith, ageIs] (or GetPList) would return 34.
	SetP[John Smith, memberOf, Foo Family] would create a new member relationship specifying John to be a member of the Foo Family. SetPList would do the same, but would destroy the two existing member relationships referencing John. In either case, the memberAs attribute would be left undefined in the new relationship.
	SetP[John Smith, ageIs, 35], where ageOf is a key of the age relation, would delete the relationship specifying John's age to be 34, and insert a new one specifying John's age to be 35. Note that SetP acts differently here than on the member relation, because memberIs is not a key.

Again, the property operations are simply a convenience, although they provide a different perspective on the data model by allowing an entity-based view of a database.

Translucent attributes

Some database application programs may not wish to be concerned with whether an attribute is entity-valued, string-valued, or integer-valued. They might prefer to have all values mapped to some common denominator, e.g. a string. An example would be a program that simply displays tuples on the screen.

Another class of applications would like to be independent of whether a particular attribute is represented as an entity or a datum value. Consider the member relation in the previous figure. If we choose to define an Organization domain, then the memberOf attribute is entity-valued; but we might instead choose to make the memberOf attribute string-valued, merely giving the name of the organization without defining organizations as entities. This might be appropriate, for example, if we did not wish to invoke the type checking on uniqueness of names and the correctness of entity types. We would like to write programs that are independent of whether an attribute is string-valued or entity-valued (as in the Relational data model).

We introduce translucent attributes to avoid dependence on attribute types.
Any attribute may be treated as a translucent attribute by using the GetFS and SetFS operations to retrieve or assign its value.

GetFS[relationship, attribute] is identical to the GetF operation, except that it returns a string regardless of the attribute's type. If the attribute is datum-valued, e.g. an integer or boolean, it is converted to a string equivalent. If the attribute is entity-valued, the name of the entity is returned.

SetFS[relationship, attribute, value] performs the inverse mapping. If the attribute is datum-valued, e.g. an integer or boolean, a string equivalent is accepted. If the attribute is entity-valued, the name of the entity is passed to SetFS. If an entity with the given name does not exist in the domain that is the attribute's type, then one is automatically created.

Changing entity names

Another convenience operation is provided on entities to change an entity's name: ChangeName[entity, new name]. This operation is semantically equivalent to destroying the given entity and creating a new one with the new name, participating in the same relationships that the old one did. However, see the description of ChangeName in Section 3.4 for the precise semantics in our implementation.

2.7 Normalization

A few comments on relational normalization are included here for users who must do their own data schema design. Others may skip to the next section.

A relation is normalized by breaking it into two or more relations of lower order (fewer attributes) to eliminate undesirable dependencies between the attributes.
For example, one could define a "publication" relation with three attributes:

Publication:
Person    Book                        Date
George    Backgammon for Beginners    1978
John      Backgammon for Beginners    1978
Mary      How to Play Chess           1981
Mary      How to Cheat at Chess       1982

This relation represents the fact that John and George wrote a book together entitled "Backgammon for Beginners," published in 1978, and that Mary wrote two books on the subject of chess, in 1981 and 1982. Alternatively, we could encode the same information in two relations, an author relation and a publication-date relation:

Author:
Person    Book
George    Backgammon for Beginners
John      Backgammon for Beginners
Mary      How to Play Chess
Mary      How to Cheat at Chess

Publication-date:
Book                        Date
Backgammon for Beginners    1978
How to Play Chess           1981
How to Cheat at Chess       1982

Although the latter two relations may seem more verbose than the first one, they are representationally better in one respect: the publication dates of books are not represented redundantly. If one wants to change the publication date of "Backgammon for Beginners" to 1979, for example, it need only be changed in one place in the publication-date relation, but in two places in the publication relation. If the date were changed in only one place in the publication relation, the database would become inconsistent. This kind of behavior is called an update anomaly. The latter two relations are said to be a normalized form (as it happens, third normal form) of the first relation, and thereby avoid this particular kind of update anomaly.

Relational normalization is not strictly part of the Cypress data model.
However, the model's operations (and the tools we will develop in the implementation) encourage what we will call functionally irreducible form, in which relations are of the smallest order that is naturally meaningful.

A relation is in irreducible form if it is of the smallest order possible without introducing new artificial domain(s) not otherwise desired (all relations can be reduced to binary by introducing artificial domains). We will allow a slight weakening of irreducible form, functionally irreducible form, which permits combining two or more irreducible relations only when their semantics are mutually dependent (and therefore all present or absent in our world representation). For example, a single birthday relation between a person, month, day, and year may be used instead of three separate relations. Another example would be an address relation between a person, street, city, and zip code. Combining an age relation and a phone relation would not result in functionally irreducible form, however, as their semantics are not mutually dependent.

The functionally irreducible relations seen by the user are independent of the physical representation chosen by the system for efficiency, so we are concerned here only with logical data access. Note that in addition to avoiding update anomalies, functionally irreducible form provides a one-to-one correspondence between the relationships in the database and the atomic facts they represent, a canonical form that is in some sense more natural than any other.

2.8 Segments

We would like a mechanism to divide up large databases, to provide different perspectives or subsets of the data to different users or application programs. In this section we discuss a mechanism to provide this separation: segments.
A segment is a set of entities and relationships that a database client chooses to treat as one logical and physical part of a database.

In introducing segments, we will slightly change the definition of an entity, previously defined to be uniquely determined by its domain and name. We will treat entities with the same name and domain in different segments as different entities, although they may represent the same external entity. The unique identifier of an internal entity is now the triple

	[segment, domain, name].

A consequence of this redefinition of entities is that relations and domains do not span segments, either. Application programs must maintain any desired correspondence between entities, domains, or relations with the same name in different segments. We will return to this later. In the next section, we will discuss a more powerful but more complex and expensive mechanism, augments, in which the database system itself maintains the correspondence.

We introduce three new operations to the data model in conjunction with segments:

	DeclareSegment[segment, file]: Opens a segment with the given name, whose data is stored in the given file.
	GetSegments[]: Returns a list of all the segments that have been opened.
	SegmentOf[entity or relationship]: Returns the segment in which a given entity or relationship exists. It may also be applied to relations or domains, since they are entities.

With the addition of segments to the data model, we redefine the semantics of the basic access operations as follows:

1.	DeclareDomain and DeclareRelation take an additional argument, namely the segment in which the defined domain or relation will reside. The entity representing a domain or relation now represents data in a particular segment.
2.	DeclareEntity and DeclareRelationship are unaffected: they implicitly refer to the segment in which the respective domain or relation was defined. By associating a segment (and therefore a transaction and underlying file) with each relation or domain entity returned to the database client, we conveniently obviate the need for additional arguments to every invocation of the basic operations in the data model.
3.	DestroyEntity, DestroyRelationship, GetF, SetF, DomainOf, RelationOf, and Eq are similarly unaffected: they deal with entities and relationships in whatever segment they are defined. Note that by our definition, entities in different segments are never Eq. Also note that nothing in our definition makes a SetF across a segment boundary illegal (i.e., SetF[relationship, attribute, entity] where the relationship and entity are in different segments). Our current implementation, however, requires that the special procedures GetFR and SetFR be used on attributes that can cross segment boundaries; see Section 3.
4.	DomainSubset and RelationSubset are unchanged when applied to client-defined domains or relations, i.e., they enumerate only in the segment in which the relation or domain was declared. However, an optional argument may be used when applied to one of the system domains or relations (e.g. the Domain domain), allowing enumeration over a specific segment or all segments. RelationSubset's attribute-value-list arguments implicitly indicate the appropriate segment even for system relations, so a segment argument is not normally needed unless the entire relation is enumerated.

Note that the data in a segment is stored in an underlying file physically independent from other segments, perhaps on another machine. Introducing a file system into the conceptual data model may seem like an odd transgression at this point. From a practical point of view, however, we believe it better to view certain problems at the level of file systems. This point of view allows segments to be used for the following purposes:

1.	Physical independence: Different database applications typically define their data in separate segments. As a result, one application can continue to operate although the data for another has been logically or physically damaged. One application can entirely rebuild its database without affecting another, or an application can continue to operate in a degraded mode, without the data in an unavailable segment.
2.	Logical independence: Different database applications may have information that pertains to the same external entity, e.g. a person with a particular social security number. When one application performs a DestroyEntity operation, however, we would like the entity to disappear only from that application's point of view. Information maintained by other applications should remain unchanged.
3.	Protection: Clients can trust the protection provided by a file system more easily than a complex logical protection mechanism provided by the database system. An even higher assurance of protection can be achieved by physically isolating the segment at a particular computer site.
A more complex logical protection mechanism would be desirable for some purposes, but was deemed beyond the scope of Cypress.l4269d3669x6e12j\3i10I4.	Performance:  Data may be distributed to sites where they are most frequently used.  For example, personal data may reside on a client's machine while publicly accessed data reside on a file server.  If the file system provides replication, it can be used to improve performance for commonly accessed data.l4269d3669x6e12j\3i12IConcurrency control is handled by the file system. z18697x6e6jk40(2116)As noted earlier, information about an external entity may be distributed over multiple segments.  One or more database applications may cooperate in maintaining the illusion that entities, domains, and relations span segment boundaries.  This illusion may be used in at least two ways:z18697x6e6jk401.	Private additions may be added to a public segment by adding entities or relationships in a private segment.  The new relationships may reference entities in the public segment by creating representative entities with the same name in the private segment.  An example would be personal phone numbers and addresses added to a public database of phone numbers and addresses:  an application program would make the two segments appear to the user as one database.l4269d3669x6e12j(635)2.	If two applications use separate segments A and B, they may safely reference each other's data yet remain physically independent.  One of the applications may destroy and reconstruct its segment if it uses the same unique names for its entities.  If both applications have relationships referencing an entity e, and application A does a DestroyEntity operation on e, the entity and relationships referencing it disappear from application A's point of view, but application B's representative entity and relationships remain.l4269d3669x6e12j\45i1I5i1I260i1I18i1I35i1I73i1I34i1Iz18697x6e6jk40(2116)
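The segment semantics above can be illustrated with a small in-memory sketch.  It covers four behaviors:  entity identity as the triple [segment, domain, name], Eq never holding across segments, DestroyEntity affecting only one segment, and an application presenting several segments as one database.  The Python below is a toy model written for this illustration, not the Cedar interface described in Section 3.  The class and function names (Entity, Database, declare_segment, merged_view, and so on), the file names, and the representation of relationships as attribute dictionaries are all hypothetical.  The sketch also ignores files, transactions, and protection entirely.

```python
from dataclasses import dataclass, field

# Toy, in-memory model of Cypress segments. All names are hypothetical;
# this is not the Cedar interface. Files, transactions, and protection
# are not modeled.


@dataclass(frozen=True)
class Entity:
    # With segments, an entity is identified by the triple [segment, domain, name].
    segment: str
    domain: str
    name: str


@dataclass
class Segment:
    name: str
    file: str  # recorded only; nothing is actually written to it
    entities: set = field(default_factory=set)
    # Each relationship is stored as (relation name, {attribute: value}).
    relationships: list = field(default_factory=list)


class Database:
    def __init__(self):
        self.segments = {}

    def declare_segment(self, name, file):
        # Loosely analogous to DeclareSegment[segment, file].
        return self.segments.setdefault(name, Segment(name, file))

    def get_segments(self):
        # Loosely analogous to GetSegments[]: names of all opened segments.
        return list(self.segments)

    def declare_entity(self, segment, domain, name):
        e = Entity(segment, domain, name)
        self.segments[segment].entities.add(e)
        return e

    def declare_relationship(self, segment, relation, **attrs):
        self.segments[segment].relationships.append((relation, attrs))

    def destroy_entity(self, e):
        # Removes e and the relationships referencing it, but only within
        # e's own segment; other segments are untouched.
        seg = self.segments[e.segment]
        seg.entities.discard(e)
        seg.relationships = [
            (rel, attrs) for rel, attrs in seg.relationships
            if e not in attrs.values()
        ]


def segment_of(e):
    # Loosely analogous to SegmentOf[entity].
    return e.segment


def eq(a, b):
    # Identity compares the full triple, so entities in different
    # segments are never equal.
    return a == b


def merged_view(db, segment_names, relation):
    # Application-maintained illusion of one database: collect a relation's
    # relationships from several segments, identifying entities by
    # (domain, name) and ignoring the segment component.
    rows = []
    for s in segment_names:
        for rel, attrs in db.segments[s].relationships:
            if rel == relation:
                rows.append({k: (v.domain, v.name) if isinstance(v, Entity) else v
                             for k, v in attrs.items()})
    return rows


db = Database()
db.declare_segment("Public", "Public.segment")
db.declare_segment("Private", "Private.segment")
smith_pub = db.declare_entity("Public", "Person", "Smith")
smith_priv = db.declare_entity("Private", "Person", "Smith")  # representative entity
db.declare_relationship("Public", "phone", of=smith_pub, number="555-1234")
db.declare_relationship("Private", "phone", of=smith_priv, number="555-9876")

print(eq(smith_pub, smith_priv))  # False
print(len(merged_view(db, ["Public", "Private"], "phone")))  # 2
db.destroy_entity(smith_priv)
print(len(merged_view(db, ["Public", "Private"], "phone")))  # 1
```

The usage lines at the end mimic the private phone list example.  The application sees both phone numbers through merged_view, yet destroying the private representative of Smith leaves the public segment's entity and relationship in place.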