Cedar Nucleus (files): Implementation Notes.
File: FileNotes.tioga
Last edited by:
Andrew Birrell, August 8, 1983 11:31 am
Bob Hagmann January 14, 1985 10:51:14 am PST
Note: sections "Pages, Labels and Data Structures" , "Maintaining the Hints", and "Operations" were written by Andrew. "Multi-page Run tables" describes the extensions necessary to handle this feature.
Pages, Labels and Data Structures
A volume consists of a set of pages. Each page has a label and data. The data is what is transferred during the Read and Write operations, and is where properties are stored, and is where the system's permanent data structures are stored. The truth about the pages of a volume is recorded solely in the labels of the pages. Two forms of permanent data structure are used as hints for accessing the volume: a single volume allocation map (VAM) and a run-table for each file. The VAM is itself a file, whose data pages contain a bitmap indicating which pages of the volume are (probably) free. The run-table represents the set of pages which (probably) constitute the pages of the file. This set of pages is, in sequence: the file properties, the run-table itself, and the data pages of the file. The FP of a file consists of a volume-relative UID (2 words) plus the disk address of the file's first page. The labels of the pages of a file contain the file's UID, plus an identification of the page number relative to the file. The disk allocator attempts to allocate contiguous pages as far as possible. The file's property pages and the first page of the run-table are contiguous on disk. The run-table of a file indicates the set of runs of contiguous pages constituting the file, giving for each run its disk address and length. In this discussion, all disk addresses are relative to a logical volume.
Maintaining the Hints
The VAM for each volume is kept in virtual memory. It is written back to the volume within one second of being changed, but is written back asynchronously. Thus is there are no allocations/frees for a second then the VAM on disk is correct. Otherwise, the VAM on disk may indicate pages as free when they aren't, and may indicate pages as in use when they aren't. If a page is marked in the VAM as free but really isn't, this is detected by verifying the page's label when allocating it subsequently. Pages that are free but marked in use in the VAM are lost until the next scavenger run.
To maintain the status of a file's run-table as a hint, the implementation maintains the invariant that the run-table on disk describes a superset of the pages of the file. That is, all the pages of the file are recorded in the run-table, but some pages in the run-table may no longer be part of the file. The hint is verified by checking the label of the last page of the file. If the run-table in fact describes too many pages, the hint can be corrected because the run-table will include the correct last page of the file: this can be found by binary chop of the run-table. When extending (or creating) a file, its run-table is written to disk before writing the labels of the new pages (NB: tricky for the first page of the run-table itself!). When contracting the file, the pages are marked as free before rewriting the run-table. When contracting, we can delay rewriting the run-table, but this doesn't seem worth the effort (considering the rarity of file contractions).
The VAM and the run-tables can be reconstructed completely by reading the label of every page of the volume.
Operations
The VAM of an open volume is kept in virtual memory, as is the run-table of an open file.
The disk operations for opening a file are:
 - read the run-table.
 - verify the label of the last page of the file (delayable).
To extend a file:
 - choose the pages from the VAM, verifying each with a check-label disk operation.
 - write the new run-table.
 - write the new labels of the pages
 - write the VAM (delayable).
This amounts to (2N+2) disk page transfers for allocating N pages, hopefully 4 disk requests.
To contract a file:
 - write the VAM indicating the freed pages
 - write the page labels marking them as free
 - write the run-table (delayable, but probably synchronous).
This amounts to (N+2) disk page transfers for freeing N pages, hopefully 3 disk requests.
To delete a file:
 - write the VAM indicating the freed pages
 - write the page labels indicating them as free (run-table first).
Multi-page Run tables
A shadow page technique is used to handle update of files with headers bigger than 2 pages (multi-page run tables or multi-page properties). Simple updates are done for the normal 2 page run table, and for when the number of header pages does not change. The number of header pages never decreases, except when a file is deleted.
A file's headers are in exactly one or two runs: either it is one run containing the two header pages, plus possibly some data pages; or header page 0 is its own run and the rest of the header pages are in a single run. To write a new header that extends the size of the header, allocate a new run for the new header pages 1 on up. Modify the memory resident image of the header, and write out header pages 1 on up to the new pages. Now write out header page 0. Finally, free the old header pages 1 on up.
The header always is valid on disk. However, a crash may leave us with two sets of headers from 1 on up. The scavenger can tell what's committed by reading header page 0. There always is exactly one of these.