DESIGN AND IMPLEMENTATION OF A RELATIONSHIP-ENTITY-DATUM DATA MODEL

6. Data model implementation issues

In this section we describe important aspects of our implementation of the Cypress data model defined in the previous sections. In the first subsection we give an overview of the Cypress design and the storage primitives with which we are dealing. We then discuss a number of specific optimizations we have found useful in the implementation. In the last subsections we describe the implementation of each of the major operations in more detail and summarize our results.

The Cypress DBMS runs on personal computers, in the Cedar programming environment developed in the Computer Science Laboratory. It is written in the Cedar language, a descendant of Mesa (Mitchell et al [1979]) with changes aimed at making Cedar particularly convenient for developing prototype systems.

Cypress is built in four levels: the Model level, which implements the data model; the Storage level, which provides basic storage allocation and access structures; the Segment level, which manages the database system's virtual memory; and the File level, which provides transaction-based file access.

The File level can be one of a variety of file systems that provide page-level, transaction-based access to disk files. We currently use the Pilot system for files on the database client's machine, and the Alpine file server for remote files.
In the latter case, the Model, Storage, and Segment levels run on the user's personal machine, and the File level on an Alpine file server. The Segment level may communicate with the File level by local or remote procedure call. The File level provides procedures to:

1. open, close, and abort transactions,
2. open disk files under transactions, and
3. read and write file pages.

The Segment level provides an LRU cache of database pages from the various segments, reducing network traffic when the File level resides on a different machine. The Segment level implements the segment abstraction, including a mapping from 32-bit database addresses to underlying file pages.

The lower levels of Cypress (the Segment and File levels) will not be of central concern to us for the purposes of this report. In the next subsection we describe the Storage level interface, and in the remainder of Section 6 we discuss the Model level implementation.

6.1 Storage level structures

The underlying Storage level provides the physical data structures used by the Model level: recordtypes, group lists, and indices. There is little that is particularly new in the Storage level; it uses reasonably well-understood mechanisms (the design is much like the RSS level of System R, Astrahan et al [1976]).

Recordtypes are the basic storage mechanism for data. A recordtype is a set of records with fields that can be (1) fixed-length blocks of words, (2) variable-length byte strings, or (3) references to other records. Recordtypes are manipulated using recordtype and field handles that describe what the Storage level needs to know to perform an operation on a recordtype or one of its fields, respectively.
(Specifically, we need to know the byte length of records in a recordtype, and the byte position and size of its fields.)

Both entities and relationships are represented as records. Entities are represented as records whose first field is the name. Relationships are normally represented as records whose fields are the attributes of the relation. Each domain or relation maps to a particular recordtype.

Every record is identified by a unique 32-bit tuple identifier, or TID. The Cypress TID for a record contains its segment number, the page number where it is stored in that segment, and its record number on that page. TIDs can be used to find a record relatively efficiently, normally in one page access. The record number on a page is mapped to a record through an array at the beginning of the page. A record could be moved to another page without changing all references to its TID, simply by entering a forward reference in this array to its new location on an overflow page. We do not currently use this feature, however.

Groups are doubly-linked lists maintained through records, also known as parent-child sets (System R). A "parent" record resides at the head of the group. The other records in the list contain three pointers:

next: the TID of the next record in the group
prev: the TID of the previous record in the group
parent: the TID of the parent record

Groups are declared by defining a group field of a recordtype. Every record referenced by one of these group fields is the parent of a group list containing all records which reference that record in that group field.

The Model level currently uses groups as follows. If an attribute a is an entity-valued attribute declared with link=Linked (see DeclareAttribute), it is defined as a group field at the Storage level.
Thus, a group list is maintained for each entity e referenced by attribute a of one or more records in a's relation. Those records (the relationships which reference e) are linked together, with the entity e at the head of the list. For example, an entity of type "person" might be at the head of a number of group lists: one for the first attribute of the "author" relation, two for the first and second attributes of the "friend" relation, one for the second attribute of the organization-member relation, and so on. We can quickly find all the author relationships referencing a particular person by following the group list associated with the first author attribute. Group lists are updated by the SetF operation on entity-valued attributes, and are used by the RelationSubset operation to process a query when an attribute value list contains an entity-valued attribute.

Indices are B-Trees with variable-length string keys indexing TID values. More than one index entry may be present with the same key. Indices are maintained by the Model level for the names of entities in a domain, using the TID to reference the record representing the entity. The B-Tree indices on names are updated by the ChangeName and DeclareEntity operations, and may be used by the DomainSubset operation. Indices are also maintained as requested by the client for relations; these are updated by the SetF operation, and are used in the implementation of the RelationSubset operation. We describe DomainSubset and RelationSubset in Section 6.8.

6.2 Entities and relationships

It is worth noting that Cypress application programs reference database items via handles rather than the cursors provided by most database systems.
A handle is returned by Cypress whenever an application program retrieves a relationship or entity from a database. Application programs may have any number of entity-valued or relationship-valued variables, each containing a handle that uniquely identifies the database object it currently references. Each call to NextRelship, GetF, etc., produces another such handle. In other database systems (e.g. INGRES or System R), references to tuples are made through cursors, with a higher overhead than handles. Client programs may copy whole blocks of data from a client buffer into the database tuple that is at the current cursor position, or vice versa.

We do not claim that handles are any better than cursors; we emphasize the difference because it affects the design of Cypress. Handles have the advantage of tighter coupling between the programming language and DBMS: database items simply appear as values in the programming language. Handles have the disadvantage of requiring garbage collection to determine which database references are no longer in use; in a cursor-based DBMS, the client indicates this explicitly.

As already discussed, the Model level represents both entities and relationships as records. The records may be of three internal types: virtual, surrogate, or stored. Virtual records are system schema entities, e.g. DomainDomain or StringType. Stored records are schema or data records; these are the Storage level records we described in Section 6.1. They may represent relations, attributes, domains, relationships, or entities. Surrogate records are used in conjunction with the surrogate relation optimization, which we describe in Section 6.6.

Note that a TID can uniquely define an entity, just as a [segment name, domain name, entity name] triple does. However, TIDs are used only inside Cypress and are never provided to clients.
The Entities (and Relships) returned to the client are pointers to records that contain the TID and other information; the client may not access these data directly, and the database system keeps a pointer to each such record so that it may later be modified.

Keeping TIDs a "secret" in this way has a number of advantages. We can find all references to records in running programs, as we keep a list of the record handles in use by clients of the system. We can find all references to records within databases, as the group lists and indices can quickly be traversed to find references to a record, and we do not store TIDs outside of groups and indices. As a result, we can:

1. invalidate references to a record when it is deleted,
2. move a record and update all references to it (we do not yet do this), and
3. re-use TIDs formerly belonging to deleted records.

Among other things, these features allow us to get by with TIDs as short as 32 bits, as we need only as many TIDs as there are records in the largest possible database. (Actually slightly more TIDs, since there is internal fragmentation due to the fixed maximum number of records per page.)

We never use TIDs for references across segment boundaries, although they could in theory be so used, as TIDs contain a segment number. The lack of cross-segment references makes segments independent, so that one segment can be destroyed or rebuilt without affecting others. It also avoids the need for a common network database that maps segments to database address space; destruction or inaccessibility of such a common database would make all segments inaccessible.

6.3 Values

We chose to represent values as REF ANYs in Cedar.
A REF ANY is a pointer to an object whose type is determined at run-time, via a type code stored with the object. REF ANYs have some inconveniences and inefficiencies, particularly the overhead of using a pointer where one is not required (e.g., with INTs) and the overhead of the run-time NARROW operation which coerces a REF ANY to a particular type. However, REF ANY seems the best solution, since convenience and simplicity of the interface are important.

Because only a few of the many datatypes made available to alpha clients of the original Cedar DBMS were widely used, we implement only REF INT, REF BOOL, REF GreenwichMeanTime, ROPE, and Entity values. Some other types, e.g. pointers to Mesa records, are still used by internal procedures but are not currently exercised by client programs.

Note that in our system all Relship or Entity variables have the same Cedar type; i.e., there is not a different programming-language type for each domain or relation. Run-time operations are provided to fetch the domain or relation to which an entity or relationship belongs. In the associative programming language LEAP (Feldman and Rovner [1969]), this run-time typing is taken to the extreme of treating a relationship's relation not as its type, but as simply another field of the record: relationships are not typed. If this approach were taken in our implementation, for example, RelationSubset would not need its first (relation) argument: the user could specify the desired relation as part of the AttributeValueList or omit it. Typeless relationships would provide little or no additional utility in our system, however, and in fact might complicate the world for a typical application. The cost of complete typelessness does not seem justified by any utility it may have.
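The run-time typing just described can be illustrated with a minimal Python sketch of a REF ANY-style boxed value and a NARROW-like coercion. The names `Value` and `narrow` are hypothetical illustrations, not part of Cypress or Cedar; the point is only that the type check happens at run time, against a type code stored with the object.

```python
# Sketch of run-time typed values in the spirit of Cedar's REF ANY.
# Value and narrow are invented names for illustration.

class Value:
    """A boxed value carrying a run-time type code, like a REF ANY."""
    def __init__(self, payload):
        self.payload = payload
        self.type_code = type(payload).__name__  # type determined at run time

def narrow(value, expected_type):
    """Coerce a Value to a concrete type, like Cedar's run-time NARROW.

    The type-check failure raised here is the run-time cost the text
    attributes to NARROW."""
    if not isinstance(value.payload, expected_type):
        raise TypeError("cannot narrow %s to %s"
                        % (value.type_code, expected_type.__name__))
    return value.payload

v = Value(42)
print(narrow(v, int))  # prints 42
```

A narrow to the wrong type (e.g. `narrow(Value("rope"), int)`) raises a TypeError at run time, which is the analogue of a failed NARROW.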
The run-time typing of relationships, entities, and values means that a fair amount of the code in the dozen Model level operations is concerned with error checking. If at the Query level we introduce a compiler for database operations, we can save execution time by performing many of these type checks at compile-time. Fortunately, the caching of schema information avoids database accesses for most of this checking, so the current overhead is not overwhelming.

The Storage level implements strings of arbitrary and variable length. The length hint supplied to DeclareAttribute is used to decide how much space to allocate in the record itself, but the characters are stored elsewhere if the field overflows. It is currently not efficient to store strings with thousands of characters in Cypress if the strings are frequently updated, because the storage allocation function is not designed to deal efficiently with data chunks larger than a page or so. We are implementing a supplementary package for Cypress which provides efficient storage of these large text strings, such as the body of a document or program. The actual text is stored in conventional files; the file, position, and size of a text string are stored in the database where it is referenced. The supplementary package also implements a log of all database updates so that a database client may replay or back up database actions, independent of the database transaction mechanism.

6.4 Caching the data schema

In order to improve the performance of the database system, we cache information about the client's data schema in a quickly-accessible form. On the first reference to an attribute, for example, the system saves the type of value the attribute demands, its relation, and its uniqueness.
On subsequent uses, the type-checking performed by GetF and SetF can then be done without access to the data schema stored in the database. The attribute cache is flushed whenever an AbortTransaction or DestroyRelation call is invoked.

Schema caches are also maintained for commonly-accessed information about domains and relations (subtypes of a domain, indices on a relation).

The schema cache is distinct from the Segment level's LRU cache of data pages. Although the schema is cached just as any other data at the Segment level, we wish to avoid all database operation overhead when examining schema entities. The schema-caching optimization makes a considerable improvement in Cypress performance: it gives a factor of ten speedup in GetF in nearly all database applications.

Since the Storage level does not know how to deal with system records such as the DomainDomain or IntType, the Model level must make exceptions for these entities in the implementation of a number of operations. There are currently a few limitations in the way this is done; for example, the DomainDomain is not actually in the enumeration of DomainSubset[DomainDomain], and user-defined Relships may not reference system entities. GetDomainRefAttributes and DomainSubset on the DomainDomain and RelationDomain had to be coded specially, and so on. Since Datatypes (which are system entities, not stored in the database) cannot be stored by the underlying Storage level, integers corresponding to IntType, StringType, and so on are stored instead, and these are mapped to the system entities when passed to or from the client program.

The data schema is stored in four underlying storage-level recordtypes.
These recordtypes are the domain recordtype, the relation recordtype, the attribute recordtype, and the index recordtype. The domain recordtype has as attributes the domain name and the recordtype handle used by the underlying Storage level. The relation recordtype has the same two attributes, plus a special attribute which indicates whether the surrogate relation optimization has been performed on the relation. The attribute recordtype has several attributes, indicating each attribute's type, uniqueness, and relation, and, if applicable, its length and the index it participates in. Indices are represented in an index recordtype.

There are some ordering constraints on schema definition, as noted in Section 3. All of these constraints could be removed by enough special-case code, including deletion of attributes. For now, we chose to pass the buck to the data schema editing tool described in Section 7.2.

6.5 Links and co-location

As discussed in Section 6.1, relationships with entity-valued attributes declared with link=Linked are inserted in a doubly-linked list (a group) with the entity they reference at the head of the list. Thus groups provide an efficient way to find all references to an entity via a particular attribute. There are three other possible values for the link hint: Unlinked, Colocated, and Remote.

Colocated is the same as Linked, but an attempt is made to store a relationship which references an entity on the same file page as the entity it references. This provides even faster access to an entity from a relationship, or vice versa. Only one attribute of a relation can be declared colocated. Relationships which reference an entity in this attribute are colocated with the entity and with each other.
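The page-placement preference behind colocation can be sketched as follows. The page model, capacity constant, and the helper `choose_page` are all invented for illustration; they are not Cypress structures, and (as noted later in this section) Cypress does not yet implement colocation.

```python
# Hypothetical sketch of colocated placement: prefer the referenced
# entity's page, then the page holding the most recent relationship
# that references it, then any page with room. Pages are modeled as
# lists of records; capacity is arbitrary for the sketch.

PAGE_CAPACITY = 4  # records per page, invented for illustration

def choose_page(pages, entity_page, last_ref_page):
    """Return the page number a new relationship should be stored on."""
    # First and second preferences: the entity's own page, then the
    # page of the most recently created relationship referencing it.
    for candidate in (entity_page, last_ref_page):
        if candidate is not None and len(pages[candidate]) < PAGE_CAPACITY:
            return candidate
    # Otherwise store it anywhere: scan for free space, or grow the segment.
    for number, records in enumerate(pages):
        if len(records) < PAGE_CAPACITY:
            return number
    pages.append([])
    return len(pages) - 1
```

Under this sketch, relationships accumulate on the entity's page until it fills, then spill onto the page of the previous referencing relationship, matching the preference order described for DeclareRelship below.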
Colocation must be performed by DeclareRelship when it creates a new relationship for which a value has been specified for the colocated attribute. (SetP calls DeclareRelship, so it achieves the same result; but SetF causes no colocation, as the record has already been allocated.) When DeclareRelship is called to create a relationship with a colocated attribute a with value e:

1. DeclareRelship attempts to store the new relationship on the same page where e is stored;
2. if that page is full, DeclareRelship attempts to store the new relationship on the same page as the most recently created relationship referencing e in a;
3. if that page is also full, or there are no other relationships referencing e in a, then the relationship is stored anywhere.

One can imagine a number of variations on this algorithm; we have not experimented with them at the time of this writing. One might profitably reserve pages for relationships referencing an entity on the basis of an estimate of the space that will be required.

NOTE: colocation is not yet implemented at the time of this writing. Relationships are simply placed on pages in the order they are created. We currently achieve some approximation to colocation by dumping a segment in textual form, with all relationships that reference an entity grouped together, and then reconstructing the segment from the dump file.

For Unlinked attributes, the string name of the referenced entity is stored, rather than a link on a group list. Thus, groups cannot be used to find an entity from an unlinked reference to it, nor vice versa.
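The contrast between the two reference representations can be sketched as below. Everything here is an illustrative simplification: the group is reduced to a Python list rather than a doubly-linked TID chain, and a dict stands in for the B-tree name index; none of these names are Cypress structures.

```python
# Linked vs. unlinked entity-valued attributes, as a Python sketch.

class Entity:
    def __init__(self, tid, name):
        self.tid, self.name = tid, name
        self.referrers = []    # stands in for the doubly-linked group list

name_index = {}                # stands in for the B-tree index on entity names

def set_linked(relationship, attr, entity):
    """Linked attribute: store the TID and join the entity's group."""
    relationship[attr] = entity.tid
    entity.referrers.append(relationship)  # entity -> referrers, no index needed

def set_unlinked(relationship, attr, entity):
    """Unlinked attribute: store only the string name of the entity."""
    relationship[attr] = entity.name

def get_unlinked(relationship, attr):
    """GetF on an unlinked attribute must do a name-index lookup."""
    return name_index[relationship[attr]]
```

The linked form answers "which relationships reference this entity?" by walking the group; the unlinked form must fall back on an index over the relation, which motivates the DeclareIndex convention discussed next.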
When an attribute is declared Unlinked, the database client typically declares an index on the attribute with DeclareIndex, thereby making access to a relationship reasonably fast (logarithmic in the size of the relation instead of a single page access, but rarely more than a few page accesses). In some respects an index is better than groups. Indices can use less space if entity names are short: groups require three 32-bit TIDs per record. And if TIDs are simply being compared for equality, e.g. for a relational join, the TIDs may be compared directly in B-tree pages rather than fetching the pages containing the records themselves.

Note that when GetF is performed on an unlinked attribute, a B-Tree lookup must also be performed, to map the entity name onto the entity record that is returned. If GetFS is used, this lookup is bypassed.

Remote indicates that an attribute may reference entities in other segments (in a different segment than the relation). Remote attributes are currently treated much like Unlinked: the name is stored instead of a physical link (remember that we do not want links, which are based on TIDs, to cross segment boundaries). Unlike Unlinked attributes, however, the segment and domain name must be stored in the reference as well as the entity name. The Remote link hint is not technically necessary, as the Model level could determine how to store an entity-valued attribute "on the fly"; it greatly simplifies the implementation, however.

If an index is not declared on an unlinked or remote attribute, the RelationSubset operation will take time linear in the size of the relation to process a query whose attribute value list asks for a particular entity value for the attribute.
An unlinked and unindexed entity-valued attribute would be useful only if the relation is known to be of a small fixed size, or queries are never performed on the attribute.

A time comparison of three simple cases of entity-valued attributes follows: an attribute declared linked but not indexed, an attribute declared indexed but not linked, and an attribute that is neither linked nor indexed. We show the times for the three common operations of fetching the attribute, setting the attribute, and doing a DeclareRelship operation with the OldOnly option to find the relationship with a particular value for the attribute (this is equivalent to a RelationSubset and NextRelship). These and all other measurements were performed on the Xerox Dorado computer; see Section 8.2 for details.

Operation      linked   indexed   unlinked, unindexed
GetF           1 ms     4 ms      1 ms + N*.5 ms, N = domain size
SetF           7 ms     40 ms     4 ms
FetchRelship   4 ms     32 ms     about N*.5 ms, N = domain size

These measurements were performed on relationships whose pages were already in the Segment level's cache; thus the "linked" column is near the ideal time one would obtain if records were colocated. The difference between linked and colocated times depends on a particular application's data schema, of course. A page read and a page write each require approximately 60 ms, so GetF and SetF could take as much as 60 ms in the worst case. For typical applications the payoff from colocation is much smaller than that, but still seems to produce more than a factor of two speedup in both GetF and SetF.

If the surrogate relation optimization is applicable, the basic operations are still faster. GetF on a linked entity-valued attribute, for example, averages about 0.3 ms.
This compares to 1 ms in the above table.

Note that the introduction of entities to the relational model allows the DBMS to make some intelligent guesses as to where to use colocation and links. By default, an attempt is made to link and colocate relationships with the entities they reference. This may be overridden by the client, of course.

Our current database applications most frequently retrieve together all or most of the relationships, from a variety of relations, which reference a particular entity. Most implementations of the relational model store the records of a relation contiguously; that arrangement results in many more page reads than ours for the entity-centric access we find most common.

6.6 Surrogate relations

The surrogate relation optimization is used when DeclareAttribute is called to define the first attribute of a relation, and: (1) the attribute is entity-valued, (2) it is a primary key of its relation, (3) its type is a domain with no sub-types, and (4) that domain does not yet have any entities defined in it. The relation is then specially tagged as being a surrogate relation, and the 2nd through Nth attributes will be added to the domain's recordtype; i.e., the data will be stored in the entities rather than in separate records. The relation itself will not exist as a recordtype, and any relationships in the relation will be represented as surrogate records. Thus the records that represent entities may have fields in addition to the entity name, containing surrogate relation attributes.

The data schema must record the relations for which the surrogate relation optimization has been performed, for internal use. This information can also be useful to select clients such as the data schema update tool.

When compared to colocation (i.e.
first attribute declared Colocated), the surrogate relation optimization is an improvement in two ways. First, a surrogate relationship uses less space (no additional record and link overhead); this may allow more information to be stored on the same page. The overhead of an additional page access (60 ms) can make a considerable difference in an operation that otherwise takes less than 1 ms (GetF). Second, accessing the field directly rather than following the link to another record is somewhat faster (currently about .4 ms) even when the other record is on the same page.

The effect of the surrogate relation optimization on Cypress performance is of course dependent upon its applicability to a particular application's data schema. In general, we find that the surrogate optimization can be performed on about half the relations in the schema. For the electronic mail database application, construction of database entries runs about 30% faster with the surrogate relation optimization enabled (instead of just colocation). Other operations, e.g. displaying a message, show equal or better performance improvement.

We could allow database clients to indicate specifically when the surrogate relation optimization should be performed, as we do with links. The optimization appears desirable whenever possible, however, so this is unnecessary.

Note the restrictions mentioned earlier on the application of the surrogate relation optimization. The restriction that the domain have no subtypes is made to simplify the problem of assigning field positions in the recordtypes. The restriction that the domain be empty is made to avoid reconstructing or marking existing entities which would not have the field. The latter restriction is similar to one on relations: attributes may only be added to a relation while it is empty. The schema editing tool described in Section 7 makes such restrictions invisible by copying the domain or relation when necessary.
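The four-way applicability test for the surrogate relation optimization, as stated at the start of this subsection, can be sketched as a predicate. The record classes here are simplified stand-ins for the Cypress schema structures, not the actual data schema recordtypes.

```python
# Sketch of the surrogate-relation applicability test of Section 6.6.
# Domain, Attribute, Relation, and surrogate_applicable are invented
# stand-ins for illustration.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Domain:
    name: str
    subtypes: List[str] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)

@dataclass
class Attribute:
    name: str
    value_type: str        # a domain name, for entity-valued attributes
    is_key: bool = False

@dataclass
class Relation:
    name: str
    attributes: List[Attribute] = field(default_factory=list)

def surrogate_applicable(relation, domains):
    """True iff the relation's first attribute is (1) entity-valued,
    (2) a primary key of its relation, (3) typed by a domain with no
    sub-types, and (4) that domain has no entities defined yet."""
    if not relation.attributes:
        return False
    first = relation.attributes[0]
    domain = domains.get(first.value_type)   # (1): names a known domain
    return (domain is not None and first.is_key
            and not domain.subtypes and not domain.entities)
```

Once the predicate holds, the relation is tagged and its remaining attributes become fields of the domain's recordtype; adding even one entity to the domain first (condition 4) disqualifies the relation.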
6.7 Basic operations

In this subsection we discuss more details of the basic operations on entities and relationships.

DeclareEntity and DeclareRelship simply create a record in the underlying domain or relation recordtype, updating any associated index entries. The DeclareEntity operation uses the DomainSubset operation to determine whether an entity with the given name already exists. DeclareRelship similarly uses RelationSubset to determine whether the given relationship already exists, unless the version is NewOnly.

A small complication is introduced by the interaction of the surrogate relation optimization and the use of indices. The attributes of relationships of a surrogate relation are stored as "attributes" of entities in the target domain, and index entries reference the entities. This would suggest that we must create index entries when the entities are created. The semantics of null values can be used to improve efficiency here, however: we need make no entries until values are actually assigned, i.e., until surrogate relationships are created and their attributes are assigned values.

GetF and SetF must make special checks for the surrogate relation optimization; when this optimization has been performed, the Storage level "field handles" for attributes of the relation actually reference fields of the domain, in which the relation's attributes are stored. With SetF, a particular constraint has currently been added: the key attribute of a record must be assigned before others, so that the Model level knows which entity in the domain is being referenced. Note this would not be a problem if we disallowed SetF, requiring that whole relationships be created or destroyed as units.

Type checking is necessary only for one operation, SetF.
If the domain sub-type hierarchy is not used, i.e., an entity stored in an entity-valued field is precisely of the domain required by the attribute, the check is fast. Furthermore, if type-checking is thwarted altogether, i.e., the attribute is of type AnyDomainType, the check is even faster: the system need only check that the base type (entity-valued, string-valued, etc.) is correct. When a check on the attribute's domain type fails, SetF recursively searches up the hierarchy from the actual entity's type looking for the attribute's type, and fails upon finding no more super-types. (Searching upward is faster than searching downward for randomly-selected examples in the domain type lattice of the largest application schema we have so far.) More efficient schemes could be implemented if necessary, e.g. pre-computing the super-domains and redundantly storing the list with each domain.

SetF must update group lists when they are being maintained for entity-valued attributes, as described earlier. Conversely, if links are not maintained for an entity-valued attribute, GetF must perform an index lookup to map the stored entity name into the entity handle returned to the client (GetFS need not do this lookup, however, as it is only required to return the name). SetF also uses RelationSubset to determine whether key constraints are violated by the new attribute value.

The algorithm for DestroyRelship is relatively simple. The Storage level operation to destroy the record automatically deletes the record itself and removes it from the linked list maintained through all records of its recordtype. Variable-length strings which have overflowed the space allocated in the record (the length argument to DeclareAttribute) are stored in separate pseudo-records that must also be de-allocated.
Also, any index entries for the destroyed Relship must be deleted, and the relationship must be removed from any group lists in which it participates.

DestroyEntity is similar to DestroyRelship, but any relationships that reference the entity must also be destroyed. There are three special cases of relationships referencing an entity:

1. Surrogate relationships stored in the entity record itself must be destroyed; the comments about deleting variable-length strings and updating groups and indices apply as in DestroyRelship.

2. Relationships linked to the entity via a Linked attribute may be found and deleted quickly.

3. Relationships referencing the entity via an Unlinked attribute must be found by a RelationSubset operation on attributes that can reference an entity of the kind we are deleting. A list of these unlinked attributes is maintained in the database for each domain, to make this operation more efficient.

The Eq operation returns TRUE iff its arguments have identical TIDs. If its arguments are from different segments they will have different TIDs, of course, as the segment numbers are stored in the high-order bits of the TIDs. The correspondence between segment numbers and atoms is maintained to implement the SegmentOf operation.

The procedure GetAllRefAttributes is implemented using a special Storage level operation that returns all the groups in which an entity record participates. As unlinked attributes will not be found by this operation, they must be added to the list returned by GetAllRefAttributes as well.

6.8 Aggregate operations

In this subsection, we discuss details of the DomainSubset and RelationSubset algorithms that have not already been covered.
The algorithm for DomainSubset is relatively simple. If a name or range of names is specified in the call, the index for the domain is used; otherwise the linked list the Storage level provides through all the records of the domain recordtype is used. If it is requested that subdomains be searched, a DomainSubset operation is invoked on each in turn. Whatever the method, the information about the enumeration is packaged up in the EntitySet returned, so that NextEntity can sequence through the indices, lists, or domains as required.

The algorithm for RelationSubset is somewhat more complex than that for DomainSubset. We are given an AttributeValueList, composed of triples of the form [attribute, low, high] constraining the given attribute to have a value greater than or equal to low and less than or equal to high. The high value is typically omitted, and is then assumed to equal low. The high value must be omitted for entity-valued attributes, since a range of values makes no sense in that case. RelationSubset must choose an efficient manner in which to find the relationships satisfying the list.

The following strategies are used by RelationSubset, in this order:

1. If the surrogate relation optimization has been performed on the relation, then (a) if a key-valued attribute is constrained to equal a particular entity in the attribute-value list passed to RelationSubset (the constraint list), we may return a single-element RelshipSet; (b) otherwise, we set a boolean in the RelshipSet (the one we will return) to indicate that, whichever of strategies 2, 3, and 4 is used below, NextRelship must convert the entities we obtain to surrogate relationships.
2. If one of the attributes in the constraint list is entity-valued, then we use the group list for the given entity value. That constraint is removed from the list, and the remaining reduced list is saved to be serially checked by NextRelship against records in the group list.

3. If there is an index on one or more of the attributes in the constraint list, the index containing the largest subset of those attributes is used to find the relationships satisfying that subset of the list. The remaining reduced list is checked by NextRelship, to filter the relationships enumerated from the index.

4. If all of the above strategies fail, the underlying Storage level operation to enumerate the linked list through all records of the relation's recordtype is invoked, and NextRelship serially filters out the relationships satisfying the constraint list.

The NextRelship operation enumerates the appropriate group, index, or recordtype as specified by the RelshipSet that RelationSubset produced. As all three of these enumerations are bi-directional at the Storage level, PrevRelship is a straightforward inverse of NextRelship. Note the slight complication in NextRelship and PrevRelship introduced by the surrogate relation optimization: in that case, the indices, group lists, or recordtypes can be scanned as usual, but they refer to entities rather than the surrogate relationships that have been mapped onto them. The surrogate relationships are created on the fly.

The time required by the RelationSubset operation is of course dependent upon the path taken through the above algorithm, whether an index or group is scanned, and so forth.
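The four-way choice above can be rendered schematically as a dispatcher. This is an illustrative Python sketch only: the dictionary shapes, field names, and strategy labels are invented for exposition and do not correspond to the actual Cypress interfaces.

```python
def choose_access_path(relation, constraints):
    """Pick an access path, mirroring RelationSubset's four strategies.

    `relation` and `constraints` are illustrative stand-ins, not Cypress
    types: each constraint is {"attr": name, "is_entity": bool}."""
    # 1(a). Surrogate relation with a key attribute constrained to equal a
    # particular entity: the answer is a single-element RelshipSet.
    if relation["surrogate"] and any(
            c["attr"] in relation["keys"] and c["is_entity"]
            for c in constraints):
        return ("surrogate-single", [])
    # 1(b). In other surrogate cases, a flag on the returned RelshipSet
    # would tell NextRelship to convert the entities found by strategies
    # 2-4 into surrogate relationships; the choice falls through below.

    # 2. An entity-valued constraint lets us scan that entity's group list;
    # the reduced constraint list is checked serially by NextRelship.
    for c in constraints:
        if c["is_entity"]:
            return ("group-scan", [d for d in constraints if d is not c])

    # 3. Otherwise use the index covering the largest subset of the
    # constrained attributes; the remainder again filters via NextRelship.
    attrs = {c["attr"] for c in constraints}
    best = max(relation["indices"],
               key=lambda ix: len(attrs & set(ix)), default=None)
    if best and attrs & set(best):
        return ("index-scan",
                [c for c in constraints if c["attr"] not in best])

    # 4. Fall back to enumerating every record of the relation's recordtype,
    # with NextRelship filtering against the full constraint list.
    return ("full-scan", constraints)
```

The second element of each result plays the role of the "remaining reduced list" that NextRelship checks serially against the records it enumerates.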
A typical application averages about 7 ms for a variety of RelationSubset calls. Thanks to the schema cache, only about half of this time is spent examining the data schema to check the validity of the query and to determine which physical access path to use to process it.

The time for NextRelship is typically 0.4 ms, and is nearly the same for a group, index, or recordtype enumeration. Currently most of this time goes to the Cedar allocation of the Relship handle itself! However, the NextRelship time will be much larger for a database application without a working set in the database page cache: the time to fetch pages will then dominate.

We will not discuss the MultiRelationSubset operation and the Query level here, as they have not yet been constructed. One possible plan would be to (1) build a parser at the Query level to map text queries into a tree representation, (2) make a major extension of the RelationSubset operation to optimize access path selection over the entire query, (3) make a minor extension to RelshipSets (and NextRelship) to allow them to act upon nested relations (still producing a single relation at the top level), and (4) build a Query-level formatter which prints the RelshipSet enumeration for the client.

6.9 Summary

The Cypress DBMS is built in four levels: the Model, Storage, Segment, and File levels; in this section we have concentrated on the implementation of the Model level. We summarized the optimizations made in that implementation:

1. caching the data schema,

2. use of B-tree indices upon entity names and relation attributes,

3. use of "group" links through relationships referencing an entity,

4.
co-locating relationships which reference the same entity on the same page, and

5. storing relationships which reference an entity as surrogates in the entity itself.

Caching the data schema is crucial to good performance; almost all operations must examine the data schema to check the legality of the client's request or to decide how to perform the operation. The payoff can be as much as 90%.

The payoff from the use of indices is variable, but they are crucial upon entity names and are often beneficial upon relations. There is a trade-off between using indices and links on relations; indices can be used on multiple attributes and use less space if the keys are short, but links are typically a factor of three faster.

It is enlightening to observe the progressively better performance of the GetF operation on an entity-valued attribute that is (1) indexed, (2) linked, (3) co-located, or (4) surrogate. As just noted, links are typically a factor of three faster than an index, as at most one page (other than the page on which the relationship resides) is accessed. Co-location typically produces another factor of three or so, though this varies widely with the database application. The surrogate relation optimization provides another factor of two or so, by avoiding another tuple lookup (on the same page). The best time for a GetF is currently about .2 ms.

In Section 5, we showed how the Cypress data model was designed to provide a higher-level conceptual model, simplifying the development of database applications. In this section we have shown that a higher-level data model can also be used to improve the performance of a database system, by capitalizing on the additional knowledge of data semantics and interconnectivity.
In Section 7 we will concentrate on the third advantage of a higher-level model: the more powerful tools and integration of database applications it enables.