LoganBerryDoc.tioga
Doug Terry, July 1, 1986 9:19:38 pm PDT
LOGANBERRY
CEDAR 6.1 — FOR INTERNAL XEROX USE ONLY
LoganBerry
a simple data management facility
Doug Terry
© Copyright 1985 Xerox Corporation. All rights reserved.
Abstract: LoganBerry is a simple facility for managing databases. Data is stored in one or more log files and indexed using btrees. A database survives processor crashes, but the data management facility does not provide atomic transactions that span multiple operations. Databases may be shared by multiple clients and may reside on a workstation's local disk or be accessed remotely via RPC.
Created by: Doug Terry
Maintained by: Doug Terry <Terry.pa>
Keywords: database, logs, btrees, query
XEROX  Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, California 94304

For Internal Xerox Use Only
Introduction
LoganBerry is a simple facility for managing databases. It allows various types of data to be stored persistently and be retrieved by key or key subrange. Data is stored in one or more log files and indexed using stable btrees. A database survives processor crashes, but the data management facility does not provide atomic transactions. Databases may be shared by multiple clients and accessed remotely via RPC.
To get started, bringover [Cedar]<CedarChest®>Top>LoganBerry.df. Additional tools, such as the LoganBerry browser, can be found in [Cedar]<CedarChest®>Top>LoganBerryTools.df. See also LoganBerryToolsDoc.Tioga.
Data representation
A LoganBerry database is maintained as a collection of entries, where each entry is a set of type:value attributes. Both the type and value of an attribute are simply text strings (i.e. ROPEs). However, the programmer interface treats attribute types as ATOMs. For instance, a database entry pertaining to a person might be [$name: "Doug Terry", $phone number: "494-4427", $workstation: "Lake Champlain"].
Database entries are stored in a collection of logs. A database is updated by appending an entry to the end of a log, log entries are never overwritten (except when a log is compacted or replaced in its entirety). Updating individual attributes of an entry requires rewriting the complete entry. In many cases, all of the logs are readonly except for a single activity log. All updates are applied to the activity log (log 0) unless the client explicitly controls which logs are written.
Access methods
Only key-based access methods are supported on a database. These are provided by btrees. A separate btree index exists for each attribute type that serves as a key for the database. Two types of btree indices are supported: primary and secondary. A primary index must have unique attribute values for the associated attribute type, whereas secondary indices need not have unique values.
A sequence of database entries, sorted in btree order, can be accessed using either an enumerator or generator style of retrieval. Generators employ a cursor to indicate a logical position in the sequence being retrieved. A cursor can be used traverse a sequence in increasing or decreasing order.
Database schema
A database must be "opened" before any of its data can be accessed. The Open operation reads a database schema, which contains information about all logs and indices that comprise a database, and returns a handle that identifies the database in subsequent operations.
A schema is created by clients of this package and stored in a DF file that can be used to backup the database. Lines of the file starting with "-->" are deemed to contain schema information for the file named in the subsequent line of the DF file or at the end of the same line. The two types of schema entries are as follows:
--> log <logID> <readOnly or readWrite> <optional CR> <filename>
--> index <key> <primary or secondary> <optional CR> <filename>
The following is a sample database schema file:
============================
-- SampleDB.df

Directory [Indigo]<LoganBerry>Backups>Top>
SampleDB.df!3 12-Aug-85 13:16:12 PDT

Directory [Indigo]<LoganBerry>Backups>SampleDB>
--> log 0 readwrite
Activity.log!2 12-Aug-85 12:07:24 PDT
--> log 1 readonly
Readonly.log!1 5-Aug-85 11:50:51 PDT
--> index "Name" primary Name.index
============================
Basic operations
Following is a summary of operations that can be performed on a LoganBerry database; see the interface, LoganBerry.mesa, for the complete truth. All of these operations (except EnumerateEntries) can be invoked via RPC. They take a RPC.Conversation as the first argument so that secure remote procedure calls can be made to a LoganBerry database server; see LupineUsersGuide.tioga for information on establishing secure conversations.
Open: Initiates database activity and checks for consistency. This can be called any number of times to get a new OpenDB handle or reopen a database that has been closed. Indices are automatically rebuilt if any are missing or if a partially-completed update left them out-of-date.
Describe: Returns schema information about the database.
ReadEntry: Returns an entry that contains the given attribute value for the given key; returns NIL if none exists. If the key refers to the primary index then the unique entry is returned and `others' is FALSE. If the key is secondary and several entries exist with the same value for the key, then an arbitrary entry is returned and `others' is set to TRUE; use EnumerateEntries to get all of the matching entries.
EnumerateEntries: Calls `proc' for each entry in the specified range of key values. The enumeration is halted when either the range of entries is exhausted or `proc' returns FALSE. A NIL value for `start' represents the least attribute value, while a NIL value for `end' represents the largest attribute value. Thus, the complete database can be enumerated by specifying start=NIL and end=NIL with `key' equal to the primary key.
GenerateEntries: Similar to EnumerateEntries, but creates a cursor so that entries in the specified range of key values can be retrieved using NextEntry (defined below). Initially, the cursor points to the start of the sequence. A NIL value for `start' represents the least attribute value, while a NIL value for `end' represents the largest attribute value. Thus, the complete database can be enumerated by specifying start=NIL and end=NIL with `key' equal to the primary key.
NextEntry: Retrieves the next entry relative to the given cursor. The cursor, and thus the sequence of desired entries, must have been previously created by a call to GenerateEntries. The cursor is automatically updated so that NextEntry may be repeatedly called to enumerate entries. NIL is returned if the cursor is at the end of the sequence and dir=increasing or at the start of the sequence and dir=decreasing.
EndGenerate: Releases the cursor; no further operations may be performed using the given cursor. This must be called once for every call to GenerateEntries.
WriteEntry: Adds a new entry to the database. The entry is added to the activity log unless another log is explicitly requested. The entry must have an attribute for the primary key. The primary attribute value must be unique throughout the database unless replace=TRUE; in this case, an existing entry with the same primary attribute value is atomically replaced with the new entry (both must reside in the same log).
DeleteEntry: Deletes the entry that contains the given attribute value for the given key. If the key is for a secondary index and the value is not unique then an Error[ValueNotUnique] is raised. Deletes are actually performed by logging the delete request (so that the log can be replayed at a later time) and then updating all of the indices.
Close: Terminates database activity and closes all log and index files. Databases are always maintained consistently on disk so it doesn't hurt to leave a database open. The main reason for calling Close is to release the FS locks on the log files so they can be manually edited.
An important feature of the btree-log design is: the log is the truth! Thus, the database facility need only worry about maintaining a consistent set of logs in the presence of processor crashes. Logs are fully contained. In the event of inconsistencies arising between the logs and their indices, the associated btrees can be easily rebuilt by reading the logs.
BuildIndices: Rebuilds the indices by scanning the logs and performing WriteEntry or DeleteEntry operations.
CompactLogs: Removes deleted entries from the logs by enumerating the primary index and writing new logs.
Editing logs by hand: be careful!
Logs are human readable and may be edited or created using facilities outside of LoganBerry, such as the Tioga text editor. You should do so cautiously. In particular, logs must be of the following format: 1) attribute types and values are text strings separated by a `:'; thus, attribute types may not contain colons; 2) attributes are terminated by a CR; attribute values may contain CR's if the complete value is enclosed in double-quotes `"', i.e. the attribute value looks like a ROPE literal; 3) entries are terminated by a CR; 4) the end of the log must be the character 377C. Do not delete or change entries in a log that has entries of the form: DELETED: number, since the number is a pointer into the log. It is always safe to add entries to the end of a log (right before the 377C character) or edit a log after it has been compacted. You must remember to run LoganBerry.BuildIndices after manually editing a log!
The following is a sample log file (containing a single entry with three attributes):
============================
Name: Doug Terry
Phone number: 494-4427
Workstation: "Lake
Champlain"

ÿ
============================