Disks: The Alto File System This document describes the disk formats used in the Alto File System. It also describes a "disk object," a Bcpl software construct that is used to interface low-level disk drivers with packages that implement higher-level objects, such as streams. The primary focus of the description will be for the "standard" Alto disks: either (1) up to 2 Diablo Model 31 disk drives or (2) one Diablo Model 44 disk drive. The low-level drivers for these disks are called "Bfs" (Basic File System). With minor modifications, the description below applies to the Trident Model T80 and T300 disk drives, when formatted for Alto file system conventions. The differences are flagged with the string [Trident]. Low-level drivers for the Trident disks are called "Tfs." 1. Distribution Relocatable binary files for the BFS are kept in BFSBrs.dm. The sources, command files, and test program (described later in this document) are kept in BFSSources.dm Relocatable binary files for the TFS are kept in TFS.dm; sources are kept on TFSSources.dm. 2. File and Disk Structure This section describes the conventions of the Alto file system. The files AltoFileSys.D and Bfs.D contain Bcpl structure declarations that correspond to this description ([Trident]: See also "Tfs.D"). The unit of transfer between disk and memory, and hence that of the file system, is the disk sector. Each sector has three fields: a 2- word header, an 8-word label, and a 256-word data page. ([Trident]: The fields are a 2-word header, a 10-word label, and a 1024-word data page.) A sector is identified by a disk address; there are two kinds of disk addresses, real and virtual. The hardware deals in real addresses, which have a somewhat arbitrary format. An unfortunate consequence is that the real addresses for all the pages on a disk unit are sparse in the set of 16 bit integers. To correct this defect, virtual addresses have been introduced. They have the property that the pages of a disk unit which holds n pages have virtual addresses 0 ... (n-1). Furthermore, the ordering of pages by virtual address is such that successive pages in the virtual space are usually sequential on the disk. As a result, assigning a sequence of pages to consecutive virtual addresses will ensure that they can be read in as fast as possible. ------------ Copyright Xerox Corporation 1982 Disks & Bfs April 10, 1982 2 2.1. Legal Alto Files An Alto file is a data structure that contains two sorts of information: some is mandatory, and is required for all legal files; the remainder is "hints". Programs that operate on files should endeavor to keep the hints accurate, but should never depend on the accuracy of a hint. A legal Alto file consists of a sequence of pages held together by a doubly-linked list recorded in the label fields. Each label contains the mandatory information: The forward and backward links, recorded as real disk addresses. A page number which gives the position of the page in the file; pages are numbered from 0. A count of the number of characters of data in the page (numchars). This may range from 0 (for a completely empty page) to 512 (for a completely full page). ([Trident]: A full page contains 2048 characters.) A real file id, which is a three-word unique identifier for the file. The user normally deals with virtual file ids (see the discussion of file pointers, below), which are automatically converted into real file ids when a label is needed. Three bits in the file id deserve special mention: Directory: This bit is on if the file is itself a directory file. This information is used by the disk Scavenger when trying to re- build a damaged disk data structure. Random: This bit is currently unused. NoLog: This bit is no longer used, but many existing files are likely to have it set. Leader Page: Page 0 of a file is called the leader page; it contains no file data, but only a collection of file properties, all of which are hints. The structure LD in AltoFileSys.D declares the format of a leader page, which contains the following standard items: The file name, a hint so that the Scavenger can enter this file in a directory if it is not already in one. The times for creation, last read and last write, interpreted as follows: A file's creation date is a stamp generated when the information in the file is created. When a file is copied (without modification), the creation date should be copied with it. When a file is modified in any way (either in-place or as a result of being overwritten by newly-created information), a new creation date should be generated. A file's write date is updated whenever that file is physically written on a given file system. Disks & Bfs April 10, 1982 3 A file's read date is updated whenever that file is physically read from within a given file system. A pointer to the directory in which the file is thought to be entered (zeroes imply the system directory SysDir). A "hint" describing the last page of the file. A "consecutive" bit which is a hint that the pages of the file lie at consecutive virtual disk addresses. The changeSerial field related to version numbering: whenever a new version of a file "foo" is made, the changeSerial field of all other files "foo" (i.e., older versions) is incremented. Thus, a program that wishes to be sure that it is using the most recent version of a file can verify that changeSerial=0. If a program keeps an FP as a hint for a file, and is concerned about the relative position of that file in the list of version numbers, it can also keep and verify the changeSerial entry of the file. Version numbers have been deimplemented. These standard items use up about 40 words of the leader page. The remaining space is available for storing other information in blocks which start with a one word header containing type and length fields. A zero terminates the list. The structure FPROP in AltoFileSys.d defines the header format. The only standard use of this facility is to record the logical shape of the disk in the leader page of SysDir. Data: The first data byte of a file is the first byte of page 1. In a legal file with n pages, the label field of page i must contain: A next link with the real disk address of page (i+1), or 0 if i=n-1. A previous link with the real disk address of page (i-1), or 0 if i=0. A page number between 0 and (n-1), inclusive. A numchars word = 512 if i>FP.serialNumber.word1 portion of the file pointer. This allows the directory and random bits to be set in the file id. If useOldFp is true, then filePtr already points to a legal file; the purpose of calling CreateDiskFile is to re-write all the labels of the existing file with the new serial number, and to re- initialize the leader page. The data contents of the original file are lost. Note that this process effectively "deletes" the file described by filePtr when CreateDiskFile is called, and makes a new file; the FP for the new file is returned in filePtr. If pageBuf is supplied, it is written on the leader page of the new file after setting the creation date and directory FP hint (if supplied). If pageBuf is omitted, a minimal leader page is created. DeleteDiskPages(disk, CA, firstDA, filePtr, firstPage, newFp, hintLastPage) Arguments beyond firstPage are optional. Deletes the pages of a file, starting with the page whose number is firstPage and whose disk address is firstDA. CA is a page-sized buffer which is clobbered by the routine. hintLastPage is as described under ActOnDiskPages. If newFp is supplied and nonzero, it (rather than freePageFp) is installed as the FP of the file, and the pages are not deallocated. 6. Allocating Disk Space The disk class also contains routines for allocating space and for converting between virtual and real disk addresses. In most cases, users need not call these routines directly, as the four routines given above (ActOnDiskPages, WriteDiskPages, DeleteDiskPages, CreateDiskFile) manage disk addresses and disk space internally. AssignDiskPage(disk, virtualDA, nil) returns the virtual disk address of the first free page following virtualDA, according to the bit table, Disks & Bfs April 10, 1982 11 and sets the corresponding bit. It does not do any checking that the page is actually free (but WriteDiskPages does). If there are no free pages it returns -1. If it is called with three arguments, it returns true if (virtualDA+1) is available without assigning it. If virtualDA is eofDA, AssignDiskPage makes a free-choice assignment. The disk object remembers the virtual DA of the last page assigned and uses it as the first page to attempt to assign next time AssignDiskPage is called with a virtualDA of eofDA. This means that you can force a file to be created starting at a particular virtual address by means of the following strategy: ReleaseDiskPage(disk, AssignDiskPage(disk, desiredVDA-1)) CreateDiskFile(disk, ...) // or whatever (e.g., OpenFile) ReleaseDiskPage(disk, virtualDA) marks the page as free in the bit table. It does not write anything on the disk (but DeleteDiskPages does). VirtualDiskDA(disk, lvRealDA) returns the virtual disk address, given a real disk address in rv lvRealDA. (The address, lvRealDA, is passed because a real disk address may occupy more than 1 word.) This procedure returns eofDA if the real disk address is zero (end-of-file), and fillInDA if the real disk address does not correspond to a legal virtual disk address in this file system. RealDiskDA(disk, virtualDA, lvRealDA) computes the real disk address and stores it in rv lvRealDA. The function returns true if the virtual disk address is legal, i.e. within the bounds of disk addresses for the given "disk." Otherwise, it returns false. 7. Lower-level Disk Access The transfer routines described previously have the property that all disk activity occurs during calls to the routines; the routines wait for the requested disk transfers to complete before returning. Consequently, disk transfers cannot conveniently be overlapped with computation, and the number of pages transferred consecutively at full disk speed is generally limited by the number of buffers that a caller is able to supply in a single call. It is also possible to use the disk routines at a lower level in order to overlap transfers with computation and to transfer pages at the full speed of the disk (assuming the file is consecutively allocated on the disk and the amount of computation per page is kept relatively small). The necessary generic disk operations and other information are available to permit callers to operate the low-level disk routines in a device-independent fashion for most applications. This level makes used of a Command Block Zone (CBZ), part of whose structure is public and defined in Disks.d, and the rest of which is private to the implementation. The general idea is that a CBZ is set up with empty disk command blocks in it. A free block is obtained from the CBZ with GetDiskCb and sent to the disk with DoDiskCommand. When it is sent to the disk, it is also put on the queue which GetDiskCb uses, but GetDiskCb waits until the disk is done with the command before returning it, and also checks for errors. Disks & Bfs April 10, 1982 12 If you plan to use these routines, read the code for ActOnDiskPages to find out how they are intended to be called. An example of use of these routines in a disk-independent fashion (i.e., using only the public definitions in Disks.d) may be found in the DiskStreamsScan module of the Operating System. Only in unusual applications should it be necessary to make use of the implementation-dependent information in Bfs.d or Tfs.d. InitializeDiskCBZ(disk, cbz, firstPage, length, retry, lvErrorRoutine). CBZ is the address of a block of length words which can be used to store CBs. It takes at least three CBs to run the disk at full speed; the disk object contains the values DSK.lengthCBZ (fixed overhead) and DSK.lengthCB (size of each command block) which may be used to compute the required length (that is, length should be at least lengthCBZ+3*lengthCB). FirstPage is used to initialize the currentPage field of the cbz. Retry is a label used for an error return, as described below. lvErrorRoutine is an error routine for unrecoverable errors, described below; it defaults to a routine that simply invokes SysErr. The arguments after firstPage can be omitted if an existing CBZ is being reinitialized, and they will remain unchanged from the previous initialization. cb = GetDiskCb(disk, cbz, dontClear[false], returnIfNoCB[false]) returns the next CB for the CBZ. If the next CB is empty (i.e., it has never been passed to DoDiskCommand), GetDiskCb simply zeroes it and returns it. However, if the next CB is still on the disk command queue, GetDiskCb waits until the disk has finished with it. Before returning a CB, GetDiskCb checks for errors, and handles them as described below. If there is no error, GetDiskCb updates the nextDA and currentNumChars cells in the CBZ, then calls cbz>>CBZ.cleanupRoutine(disk, cb, cbz). Next, unless dontClear is true, the CB is zeroed. Finally, the CB is returned as the value of GetDiskCb. If returnIfNoCB is true, GetDiskCb returns zero if there are no CBs in the CBZ or the next CB is still on the disk command queue. If the next CB has suffered an error, then GetDiskCb instead takes the following actions. First it increments cbz>>CBZ.errorCount. If this number is ge the value disk>>DSK.retryCount, GetDiskCb calls the error routine which was passed to InitializeDiskCBZ; the way this is done is explained in the description of ActOnDiskPages above. (If the error routine returns, GetDiskCb will proceed as if an error hadn't occurred.) Otherwise, after doing a restore on the disk if errorCount ge disk>>DSK.retryCount/2, it reinitializes the CBZ with firstPage equal to the page with the error, and returns to cbz>>CBZ.retry (which was initialized by InitializeDiskCBZ) instead of returning normally. The idea is that the code following the retry label will retry all the incomplete commands, starting with the one whose page number is cbz>>CBZ.currentPage and whose disk address is cbz>>CBZ.errorDA. DoDiskCommand(disk, cb, CA, DA, filePtr, pageNumber, action, nextCb) Constructs a disk command in cb with data address CA, virtual disk address DA, serial and version number taken from the virtual file id in filePtr, page number taken from pageNumber, and disk command specified by action. The nextCb argument is optional; if supplied and nonzero, DoDiskCommand will "chain" the current CB's label address to nextCb, in such a way that the DL.next word will fall into nextCb>>CB.diskAddress. DoDiskCommand expects the cb to be zeroed, except that the following Disks & Bfs April 10, 1982 13 fields may be preset; if they are zero the indicated default is supplied: labelAddress lv cb>>CB.label numChars 0 If DA eq fillInDA, the real disk address in the command is not set (the caller should have either set it explicitly or passed the CB as the nextCb argument for a previous command). Actions are checked for legality. The public cells in the CBZ most likely to be of interest are the following: client: information of the caller's choosing (e.g., a pointer to a related higher-level data structure such as a stream.) cleanupRoutine: the cleanup routine called by GetDiskCb (defaulted to Noop by InitializeDiskCBZ). currentPage: set to the firstPage argument of InitializeDiskCBZ and not touched by the other routines. (Note, however, that GetDiskCb calls InitializeDiskCBZ when a retry is about to occur, so when control arrives at the retry label, currentPage will be set to the page number of the command that suffered the error.) errorDA: set by GetDiskCb to the virtual disk address of the command that suffered an error. nextDA: set by GetDiskCb to the virtual disk address of the page following the one whose CB is being returned. (This information is obtained from the next pointer in the current page's label. Note that errorDA and nextDA are actually the same cell, but they are used in non-conflicting circumstances.) currentNumChars: set by GetDiskCb to the numChars of the page whose CB is being returned. head: points to the first CB on GetDiskCb's queue; contains zero if the queue is empty. 8. Error Codes The following errors are generated by the BFS. Similar errors are generated by other instances of a disk object. 1101 unrecoverable disk error 1102 disk full 1103 bad disk action 1104 control block queues fouled up 1105 attempt to create a file without creation ability 1106 can't create an essential file during NewDisk 1107 bit table problem during NewDisk 1108 attempt to access nonexistant bit table page Disks & Bfs April 10, 1982 14 9. Implementation -- Bfs The implementation expects a structure BFSDSK to be passed as the "disk" argument to the routines. The initial portion of this structure is the standard DSK structure followed by a copy of the DiskDescriptor header and finally some private instance data for the disk in use. (Note: The Alto operating system maintains a static sysDisk that points to such a structure for disk drive 0.) Bfs ("Basic File System") is the name for a package of routines that implement the disk class for the standard Alto disks (either Diablo Model 31 drives or a single Diablo Model 44 drive). The definitions (in addition to those in AltoFileSys.D and Disks.D) are contained in Bfs.D. The code comes in two "levels:" a "base" for reading and writing existing files (implements ActOnDiskPages, RealDiskDA and VirtualDiskDA only); and a "write" level for creating, deleting, lengthening and shortening files (implements WriteDiskPages, CreateDiskFile, DeleteDiskPages, AssignDiskPage, ReleaseDiskPage). The source files BfsBase.Bcpl, Dvec.Bcpl and BfsMl.Asm comprise the base level; files BfsWrite.Bcpl BfsCreate.bcpl, BfsClose.bcpl, and BfsDDMgr.bcpl implement the write level. BfsMakeFpFromLabel(fp, la) constructs a virtual file id in the file pointer fp from the real file id in the label la. disk = BFSInit(diskZone, allocate[false], driveNumber[0], ddMgr[0], freshDisk[false], tempZone[diskZone]) returns a disk object for driveNumber or zero. The permanent data structures for the disk are allocated from diskZone; temporary free storage needed during the initialization process is allocated from tempZone. If allocate is true, the machinery for allocating and deallocating disk space is enabled. If it is enabled, a small DDMgr object and a 256 word buffer will be extracted from diskZone in order to buffer the bit table. A single DDMgr, created by calling 'ddMgr = CreateDDMgr(zone)', can manage both disks. If freshDisk is true, BFSInit does not attempt to open and read the DiskDescriptor file. This operation is essential for creating a virgin file system. success = BFSNewDisk(zone, driveNum[0], nDisks[number spinning], nTracks[physical size], dirLen[3000], nSectors[physical size]) creates a virgin Alto file system on the specified drive and returns true if successful. The zone must be capable of supplying about 1000 words of storage. The logical size of the file system may be different from the physical size of driveNum: it may span both disks (a 'double-disk file system'), or it may occupy fewer tracks (a model 44 used as a model 31). The length in words of SysDir, the master directory, is specified by dirLen. Some machines that emulate Altos implement 14 sectors per track. BFSExtendDisk(zone, disk, nDisks, nTracks) extends (i.e. adds pages to) the filesystem on 'disk'. Presumably 'nDisks' or 'nTracks' or both is bigger than the corresponding parameters currently in disk. A single model 31 may be extended to a double model 31 or a single model 44 or a double model 44, and a single model 44 may be extended to a double model 44. The zone must be capable of supplying about 750 words of storage. 0 = BFSClose(disk, dontFree[false]) destroys the disk object in an Disks & Bfs April 10, 1982 15 orderly way. If dontFree is true, the ddMgr for the disk is not destroyed; presumably it is still in use by the other disk. (Note that this procedure is the one invoked by the CloseDisk generic operation.) BFSWriteDiskDescriptor(disk) insures that any important state saved in memory is correctly written on the disk. virtualDA = BFSFindHole(disk, nPages) attempts to find a contiguous hole nPages long in disk. It returns the virtual disk address of the first page of a hole if successful, else -1. BFSTryDisk(drive, track, sector[0]) returns true if a seek command to the specified track on the specified drive is successful. Note that the drive argument can contain an imbedded partition number. Seeks to track zero will fail if the drive is not on line. Seeks to track BFS31NTracks+1 will fail if the drive is a model 31. 10. Implementation -- Tfs Operation and implementation of the Trident T80 disks is described in separate documentation under the heading "TFS/TFU" in Alto Subsystems documentation. 11. BFSTest BFSTest is the test and utility program for the Basic File System. It performs many of the same functions as TFU (in fact much of its code is lifted from TFU), except that commands which are better provided by other subsystems such as the Executive (copy, rename, delete, etc.) are omitted. It has a conventional command scanner and implements the following commands. ERASE formats one or more disks as an Alto file system. It asks you to specify the number of disks, cylinders and sectors. Any sectors marked "incorrigable" (by the CERTIFY command, below) are not included in the file system and will never again cause trouble unless the disk is erased with DiEx. The OS erase command also preserves this bad spot information. EXERCISE creates, deletes, reads, writes and positions files the same way that normal programs do, and checks the results which normal programs do not do. These high-level operations cause patterns of disk commands which are quite different from those generated by lower-level tests such as DiEx. Exercise assumes that a file system exists on DP0 (and perhaps extends on to DP1). It creates as many test files (named Test.001, Test.002, ...) as will fit in the file system, filling each file with a carefully chosen test pattern. When it is done, it deletes all of its files. One 'pass' consists of stepping through the test files, performing a randomly chosen operation on the file, and checking the results. The duration and throughness of a pass depends on the amount of free space on the disks. Disks & Bfs April 10, 1982 16 Ideally, the disk(s) under test have just been erased with the ERASE command, below. While running, exercise looks for commands from the keyboard. The current commands are: Q Quit Delete all test files and stop. S StopOnError Wait until a character is typed. All test files are 100 pages long. Each page of a file has the page number in its first and last words and a data pattern in the middle 254 words. The data pattern is constant throughout a file, consisting of a single one-bit in a word of zeros or a single zero-bit in a word of ones. Files are read and written with ReadBlock and WriteBlock using buffers whose lengths are not multiples of the page size. The operations are: Write Write the entire file with the data pattern. Read Read the entire file checking the data pattern. Delete Delete the file, create it again and then write it. Copy Copy the file to some other randomly chosen file. If both disks are being tested, one third of the time pick a destination file on the other disk. Position Position to twenty randomly chosen pages in the file. Check that the first word of the page is indeed the page number. One third of the time dirty the stream by writing the page number in the last word of the page. CERTIFY tests one or more disks for bad spots and marks such pages "incorrigable" so that they will not be included in subsequent file systems. Ideally, a disk should be certified before being used. Certify can be stopped at any time by hitting any key. A bad spot is any sector which gives three or more checksum errors. If the Scavenger encounters a bad spot, it will also mark the sector incorrigable. Alto and Dorado disks almost never have bad spots; but Dolphin disks typically have a few, so CERTIFYing is a must for trouble-free operation. Note that bad spot information will be lost if a certified pack is written by COPYDISK or DIEX. CREATEFILE attempts to create a contiguous file of a specified size. If it can't, it creates a file with a minimum number of page runs. PARTITION allows you to set the disk partition on which BFSTest will operate. This command is only available on Dolphins and Dorados. A thorough, high-level "acceptance test" for the disk subsystem of an Alto or Alto-emulating machine (i.e. a Dolphin or Dorado) consists of at least 100 passes of CERTIFY with less than 5 bad spots (any bad spots on cylinder 0 are unacceptable), followed by ERASE, followed by at least 10 passes of EXERCISE with no errors.