DBStoragePage:
CEDAR
DEFINITIONS
IMPORTS DBCommon =
BEGIN
This interface contains types and procedures relating to the structure of pages (tuple pages and overflow pages). Most of the types are PRIVATE; clients are expected to use a purely procedural interface.
Types
MaxWordsPerPage: CARDINAL = 512;
RestOfWord: CARDINAL = 128;
LengthField: TYPE = CARDINAL [0..MaxWordsPerPage);
SlotIndexField: TYPE = CARDINAL [0..RestOfWord);
SlotTypeField:
TYPE =
CARDINAL [0..RestOfWord);
There seems to be no advantage in having these vary with the true pagesize. These structures will need to be re-thought for 1024 word pages (which may be too large for our expected applications).
VecPage:
TYPE =
PRIVATE
MACHINE
DEPENDENT
RECORD[
tag:
CARDINAL [0..377B],
Page type, on all pages in the system.
unused: CARDINAL [0..1] ← NULL,
highSlot: SlotIndexField,
Largest slot index that is valid on this page.
nFreeSlots: SlotIndexField,
Number of slots in [1..highSlot] that are unused.
nWordsInFreeVecs: LengthField
Number of words in free vecs on this page.
];
--VecPage
Header of a page that is formatted to hold vecs. There are three possible values for tag: DBStoragePagetags.Tuple, .SystemTuple, and .OverflowTuple.
Slot:
TYPE =
PRIVATE
MACHINE
DEPENDENT
RECORD[
type: SlotTypeField,
"Type" of object stored in vec.
vecOffset: LengthField
Page-relative pointer to a vec.
];
--Slot
A Slot is a typed pointer to a vec. The types are irrelevant inside of this interface, with the following exceptions:
FreeType: SlotTypeField = 0;
The slot is free; ignore vecOffset.
UnFreeType: SlotTypeField = 177B;
The slot is not free. This is the type given to newly-created slots by AllocVec.
MaxTuplesetPerPage: SlotTypeField = 171B;
Slots with types IN [1..MaxTuplesetPerPage] point to tuples.
IndTupleType: SlotTypeField = 172B;
The slot points to an indirect tuple **unimplemented.
LStringType: SlotTypeField = 173B;
The slot points to a local string extension.
EStringType: SlotTypeField = 174B;
The slot points to an external string extension.
NonlocalStringExtensonType: SlotTypeField = 175B;
The slot points to a nonlocal string extension **unimplemented.
TSDictType: SlotTypeField = 176B;
The slot points to a tupleset dictionary vec.
VecHeader:
TYPE =
PRIVATE
MACHINE
DEPENDENT
RECORD[
slotIndex: SlotIndexField,
Slot index of slot that points here, or FreeSlotIndex for free vecs.
length:
PUBLIC LengthField
Number of words in vec, including vec header.
];
--VecHeader
Information located at the beginning of each vec. A pointer to a vec points to the header, so the header length must be added to get the first word of data.
TupleBody:
TYPE =
MACHINE
DEPENDENT
RECORD[
header: VecHeader,
TupleBody is stored in a vec, so prefix is a VecHeader
groupOffset:
CARDINAL,
offset from TupleBody of 1st GroupEntry
fields:
ARRAY[0..0)
OF
CARDINAL
words for tuple fields (fixed length), uninterpreted at this level
];--TupleBody
This is the layout of the "body" of a tuple, consisting of the fixed-length representations of the fields, plus the variable-length representation of pointers to the groups of which this tuple is head. This is limited in length to the maximum vec length; tuple bodies do not cross page boundaries. Variable-length fields are represented by a fixed-length portion, at a fixed offset in the tuple body, and a variable-length portion, which lies outside of the tuple body (maybe on another page).
SizeOfNullTuple: CARDINAL = SIZE[TupleBody];
PageHeader:
TYPE =
MACHINE
DEPENDENT
RECORD[
pageTag: CARDINAL [0..256),
dontCare: CARDINAL [0..256)
]; -- PageHeader
First word of a page.
Page tag values.
Unused: CARDINAL = 0; -- unused page, obtained by extending Juniper file
Free: CARDINAL = 1; -- page on segment or local free list
AMap: CARDINAL = 2; -- page used to store segment address-map values
Tuple: CARDINAL = 3; -- page used to store tuples
SystemTuple: CARDINAL = 4; -- page used to store system tuples
OverflowTuple: CARDINAL = 5; -- page used to store overflow tuples & long strings
BTree: CARDINAL = 6; -- page used to store part of a B-tree
in future: HashTable, BitVector, etc...
IString:
TYPE =
MACHINE
DEPENDENT
RECORD[
bytesInString: LengthField,
length, in bytes, of string stored in text portion of InlineString.
slotOfExtension: SlotIndexField,
zero means no extension (whole string fits into text field)
nonzero is slot number of an LString.
rest:
SELECT
OVERLAID *
FROM
word => [words: ARRAY [0..0) OF WORD ← NULL],
text => [text: PACKED ARRAY [0..0) OF CHARACTER ← NULL],
ENDCASE
];--IString
This is the portion of a variable length field (VarByte or VarWord) that is stored in the tuple body. The text portion allows space to be reserved in the tuple body for the "expected" length of the field. The actual amount of space for text in an IString is not represented anywhere in the tuple body, but is kept in the field handle that is used for all accesses to this field.
SizeOfNullIString: CARDINAL = SIZE[IString];
LString:
TYPE =
MACHINE
DEPENDENT
RECORD[
header: VecHeader,
LString is stored in a vec, so prefix is a VecHeader
bytesInRemString:
CARDINAL,
length of string represented by this LString (so storage allocation can be done all at once).
eStringID:
DBCommon.TID
ID of a "tuple" on another page; this tuple holds the first chars from remainder of string. Can't be null.
];--LString
SizeOfNullLString: CARDINAL = SIZE[LString];
EString:
TYPE =
MACHINE
DEPENDENT
RECORD[
header: VecHeader,
EString is stored in a vec, so prefix is a VecHeader
eStringID:
DBCommon.TID,
ID of a "tuple" on another page; this tuple holds the first chars from remainder of string. Null => this tuple holds tail of string.
rest:
SELECT
OVERLAID *
FROM
word => [words: ARRAY [0..0) OF WORD ← NULL],
text => [text: PACKED ARRAY [0..0) OF CHARACTER ← NULL],
ENDCASE
];--EString
SizeOfNullEString: CARDINAL = SIZE[EString];
Reserved slot index on tuple pages:
TSDictSlotIndex: SlotIndexField = 1;
This slot points to the tupleset dictionary vec.
TSDict:
TYPE =
MACHINE
DEPENDENT
RECORD[
header: VecHeader,
TSDict is stored in a vec, so prefix is a VecHeader
allocLink: DBCommon.DBPage,
TSDict is stored in a vec, so prefix is a VecHeader
seq:
ARRAY [1..1)
OF TSDictEntry
sequence of TSDictEntry, length implicit in header.length indices in this array are used as slot types; NOTE 1-origin
];--TSDict
TSDictEntry:
TYPE =
MACHINE
DEPENDENT
RECORD[
tuplesetID: DBCommon.TID,
next: DBCommon.DBPage,
prev: DBCommon.DBPage
links in a doubly-linked list of pages that contain tuples from this tupleset
];--TSDictEntry
TuplesetObject:
TYPE =
MACHINE
DEPENDENT
RECORD[
wordsForTupleFields:
CARDINAL [0..377B] ← 0,
length of fixed-length data for the fields of a tuple in this set, in words.
nVarFields:
CARDINAL [0..377B] ← 0,
number of VarByte or VarWord fields in a tuple in this set.
searchList: TSDictEntry,
includes tuplesetID, next, prev. Head of doubly-linked list of all pages containing tuples from this TS. Pointers to front and back allow all types of scans.
allocList: DBCommon.DBPage ← DBCommon.NullDBPage,
a list of part-full pages that contain at least one tuple from this tupleset, and may be used for creating new tuples in this tupleset. a part-full page is located on exactly one tupleset's allocList, despite the fact that it may contain tuples from many different tuplesets.
pageAllocator: PageAllocator
a source of fresh pages for tuples in this tupleset
];--TuplesetObject
This cannot be longer than DBStorage.TupleSetObjectSize
PageAllocator:
TYPE =
MACHINE
DEPENDENT
RECORD[
segmentID: DBCommon.DBPage,
name of the segment containing this tupleset (the source for new pages)
firstPageInBlock: DBCommon.DBPage ← DBCommon.NullDBPage,
block is the initial allocation for this tupleset.
nPagesInBlock: CARDINAL ← 0,
nFreePagesInBlock:
CARDINAL ← 0,
free pages are at end of block.
nPagesPerExtent:
CARDINAL ← 0,
Number of pages to allocate from segment when allocator is empty.
freeList: DBCommon.DBPage ← DBCommon.NullDBPage,
singly-linked list of free blocks or pages (freed by deletions from this TS)
nPagesOnFreeList:
CARDINAL ← 0
purpose is to allow pages to be freed gracefully to segment, if TS shrinks.
];--PageAllocator
In addition to serving as a method of assigning short names to tuple types, the tupleset dictionary is used to link together all pages that contain tuples of a given type. This allows a tupleset to be scanned without looking at all pages in the segment. There is a corresponding cost when tuples are created and destroyed.
SizeOfInitialTSDict: CARDINAL = SIZE[TSDict] + SIZE[TSDictEntry];
SizeOfNullTSDict: CARDINAL = SIZE[TSDict];
Procedures
InitializeVecPage:
PROC[p:
LONG
POINTER
TO VecPage, pageTag:
CARDINAL];
Creates an empty page in page p of the cache, ready to store vecs using the procedures below. The new page has tag = pageTag.
TagOfPage:
PROC[p:
LONG
POINTER
TO VecPage]
RETURNS[
--pageTag--
CARDINAL] =
TRUSTED
INLINE
BEGIN
Returns the page tag of page p. We intend that a page's tag be checked before any of the procedures below are called on that page.
RETURN[p.tag];
END;--TagOfPage
AllocVec:
PROC[p:
LONG
POINTER
TO VecPage, nWords:
CARDINAL]
RETURNS[
--slotIndex--
CARDINAL,
--success--
BOOLEAN];
Makes a new vec of total length nWords (including vec header), a slot to hold it, and returns the index of the slot. If ~success, then the call failed for lack of space, and slotIndex is garbage.
WordsInLargestAllocableVec:
PROC[p:
LONG
POINTER
TO VecPage]
RETURNS[
CARDINAL];
Returns the largest value nWords such that AllocVec[p, nWords] is guaranteed to succeed.
FreeVec:
PROC[p:
LONG
POINTER
TO VecPage, slotIndex:
CARDINAL];
Frees the vec held in the slot at slotIndex, and the slot also.
ModifyVec:
PROC[p:
LONG
POINTER
TO VecPage, slotIndex:
CARDINAL, deltaWords:
INTEGER, preserveContents:
BOOLEAN]
RETURNS[
--success--
BOOLEAN];
Changes the length of the vec at slotIndex by deltaWords. If deltaWords<0, the final deltaWords words of data in the vec are lost forever. If deltaWords>0 and preserveContents, then the old contents of the vec will be found in the initial words of the new vec; the new words are not initialized. If ~success, then the call failed for lack of space. (The call cannot fail if deltaWords<0).
VecOfSlot:
PROC[p:
LONG
POINTER
TO VecPage, slotIndex:
CARDINAL]
RETURNS[
LONG
POINTER
TO VecHeader] =
TRUSTED
INLINE
BEGIN
Returns a LONG POINTER to the VecHeader of the vec stored in the slotIndex-th slot of page p. The pointer may be used for data retrieval from the vec, and to find the vec's length (below). When retrieving data, the header must be skipped explicitly.
IF slotIndex=0
OR slotIndex>HighSlotIndexOfPage[p]
THEN
ERROR DBCommon.InternalError -- BadSlotIndex --;
RETURN[
LOOPHOLE[p +
LOOPHOLE[p + (DBCommon.WordsPerPage -
SIZE[Slot]) - slotIndex*
SIZE[Slot],
LONG POINTER TO Slot].vecOffset, LONG POINTER TO VecHeader]];
END;--VecOfSlot
LengthOfVec:
PROC[v:
LONG
POINTER
TO VecHeader]
RETURNS[
CARDINAL] =
TRUSTED
INLINE
BEGIN
Returns the number of words in vec v, including the header.
RETURN[v.length];
END;--LengthOfVec
TypeOfSlot:
PROC[p:
LONG
POINTER
TO VecPage, slotIndex:
CARDINAL]
RETURNS[
CARDINAL] =
TRUSTED
INLINE
BEGIN
Returns the type code of the slotIndex-th slot of page p.
IF slotIndex=0
OR slotIndex>HighSlotIndexOfPage[p]
THEN
ERROR DBCommon.InternalError -- BadSlotIndex --;
RETURN[
LOOPHOLE[p + (DBCommon.WordsPerPage -
SIZE[Slot]) - slotIndex*
SIZE[Slot],
LONG POINTER TO Slot].type];
END;--TypeOfSlot
SetTypeOfSlot:
PROC[ p:
LONG
POINTER
TO VecPage, slotIndex:
CARDINAL, newType:
CARDINAL] =
TRUSTED
INLINE
BEGIN
Makes the type code of the slotIndex-th slot of page p be newType.
IF slotIndex=0
OR slotIndex>HighSlotIndexOfPage[p]
THEN
ERROR DBCommon.InternalError -- BadSlotIndex --;
LOOPHOLE[p + (DBCommon.WordsPerPage -
SIZE[Slot]) - slotIndex*
SIZE[Slot],
LONG POINTER TO Slot].type ← newType;
END;--SetTypeOfSlot
WordsLeftOnPage:
PROC[p:
LONG
POINTER
TO VecPage]
RETURNS[
CARDINAL] =
TRUSTED
INLINE
BEGIN
Partially obsoleted by WordsInLargestAllocableVec. Returns the number of words available on the page for vec allocation. Note that slots are also allocated from this space, so the total number of words required to construct a vec of nWords words may be nWords + SIZE[Slot].
RETURN[p.nWordsInFreeVecs - SIZE[VecHeader]];
END;--WordsLeftOnPage
HighSlotIndexOfPage:
PROC[p:
LONG
POINTER
TO VecPage]
RETURNS[
CARDINAL] =
TRUSTED
INLINE
BEGIN
Returns the highest slot index valid on this page. This is to be used for iterations that wish to look at each slot on a page. Such an iteration runs through [1..HighSlotIndexOfPage[p]], and must ignore all slots with type = FreeType.
RETURN[p.highSlot];
END;--HighSlotIndexOfPage
CheckVecPage:
PROC[p:
LONG
POINTER
TO VecPage, pageTag:
CARDINAL];
Verifies that the internal structure of the page is consistent, and that it has tag = pageTag.
IndexToSlot:
PRIVATE
PROC[p:
LONG
POINTER
TO VecPage, slotIndex:
CARDINAL]
RETURNS[
LONG
POINTER
TO Slot] =
TRUSTED
INLINE
BEGIN
RETURN[
LOOPHOLE[p + (DBCommon.WordsPerPage -
SIZE[Slot]) - slotIndex*
SIZE[Slot],
LONG POINTER TO Slot]];
END;--IndexToSlot
IndexToOffset:
PRIVATE
PROC[slotIndex:
CARDINAL]
RETURNS[
CARDINAL] =
TRUSTED
INLINE
BEGIN
RETURN[DBCommon.WordsPerPage - SIZE[Slot] - slotIndex*SIZE[Slot]];
END;--IndexToOffset
AssertVecIsLString:
PROC[pagePtr:
LONG
POINTER
TO VecPage, slotIndex:
CARDINAL] =
TRUSTED
INLINE {
SELECT TypeOfSlot[pagePtr, slotIndex]
FROM
LStringType => {};
ENDCASE => ERROR DBCommon.InternalError; -- BadVecTag
};--AssertVecIsLString
AssertVecIsEString:
PROC[pagePtr:
LONG
POINTER
TO VecPage, slotIndex:
CARDINAL] =
TRUSTED
INLINE {
SELECT TypeOfSlot[pagePtr, slotIndex]
FROM
EStringType => {};
ENDCASE => ERROR DBCommon.InternalError; -- BadVecTag
};--AssertVecIsEString
AssertVecIsTSDict:
PROC[ pagePtr:
LONG
POINTER
TO VecPage, slotIndex:
CARDINAL] =
TRUSTED
INLINE {
SELECT TypeOfSlot[pagePtr, slotIndex]
FROM
TSDictType => {};
ENDCASE => ERROR DBCommon.InternalError; -- BadVecTag
};--AssertVecIsTSDict
AssertPageIsTuplePage:
PROC[p:
LONG
POINTER] =
TRUSTED INLINE {
SELECT
LOOPHOLE[p,
LONG
POINTER
TO PageHeader].pageTag
FROM
Tuple => {};
ENDCASE => ERROR DBCommon.InternalError; -- BadPageTag
}; -- AssertPageIsTuplePage
AssertPageIsSystemTuplePage:
PROC[p:
LONG
POINTER] =
TRUSTED INLINE {
SELECT
LOOPHOLE[p,
LONG
POINTER
TO PageHeader].pageTag
FROM
SystemTuple => {};
ENDCASE => ERROR DBCommon.InternalError; -- BadPageTag
}; -- AssertPageIsSystemTuplePage
AssertPageIsAnyTuplePage:
PROC[p:
LONG
POINTER] =
TRUSTED INLINE {
SELECT
LOOPHOLE[p,
LONG
POINTER
TO PageHeader].pageTag
FROM
Tuple, SystemTuple => {};
ENDCASE => ERROR DBCommon.InternalError; -- BadPageTag
}; -- AssertPageIsAnyTuplePage
AssertPageIsOverflowTuplePage:
PROC[p:
LONG
POINTER] =
TRUSTED INLINE {
SELECT
LOOPHOLE[p,
LONG
POINTER
TO PageHeader].pageTag
FROM
OverflowTuple => {};
ENDCASE => ERROR DBCommon.InternalError; -- BadPageTag
}; -- AssertPageIsOverflowTuplePage
NEntries:
PROC[
LONG
POINTER
TO TSDict]
RETURNS[
CARDINAL];
Returns the number of TSDictEntries in the seq of a TSDict. This can be computed from the VecHeader.
GetIndex:
PROC[
--p--
LONG
POINTER
TO VecPage,
--tsID--DBCommon.TID]
RETURNS[
--index--
CARDINAL,
--found it--
BOOLEAN];
Returns index of the TSDictEntry on page p that contains tsID as its tuplesetID field. Second result is TRUE iff entry was actually found; otherwise we return 1+number of entries in the TSDict, and FALSE.
GetEntry:
PROC[
--p--
LONG
POINTER
TO VecPage,
--tsID--DBCommon.TID]
RETURNS[
LONG
POINTER
TO TSDictEntry];
Returns ptr to the TSDictEntry on page p that contains tsID as its tuplesetID field.
ERRORs InternalBug if no such entry exists.
EntryFromIndex:
PROC[
--p--
LONG
POINTER
TO VecPage,
--index--
CARDINAL]
RETURNS[
LONG
POINTER
TO TSDictEntry];
Returns ptr to the index-th TSDictEntry on page p.
ERRORs InternalBug if no such entry exists.