Cedar Disk Formats

XEROX

To: Cedar Users
From: Robert Hagmann, PARC
Subject: Cedar Disk Formats
Date: January 7, 1985

Release as: ???
Last edited by: Hagmann, January 7, 1985

Abstract: This is a piece of archaeology to discover the truth about the disk formats on Cedar.
Attributes: technical, Alpine, Cedar, File, Disk Formats
See also: FileNotes.tioga in the latest Cedar release

Introduction and Description

General

Disks are magnetic media that store data. They have some number of platters (disks) that store data by magnetism and spin at a high rate. Platters normally store data on both surfaces. Often the outward-facing surfaces of the outermost platters are not used and are only there for physical protection. A physical disk is called removable if you can easily remove it from the drive (e.g., a T-80). Sometimes both of the outside platters are there for physical protection of the removable media, and the outside platters are somewhat different from the data recording platters (different color, thinner, no oxide).

Many disks also have servo surfaces: a servo surface is a normal read/write surface that is used (read-only) for head positioning. Thus, our five platter T-80's have ten surfaces: 5 for data, 1 for servo and 4 for physical protection. T-300's have 19 data surfaces (?). T-315's have 19 read/write heads (see below), but only 5 platters for data recording.

For each surface used for data (and servo), there is a disk head (or more than one, see below). This is a read/write head flying (no joke!) millionths of an inch above the disk surface. It encodes and decodes "bits" into magnetic fields onto/from the disk surface. There is room to store many bits on the surface as the disk does a revolution. In addition, the disk head may be positioned to a different location. Each trace of a head around a particular surface is called a track. All the tracks for a given position of the head positioning mechanism are together called a cylinder.
Tracks are divided into pieces called sectors (sometimes called pages and sometimes blocks). Sectors may be variable in size, but it is more common to see fixed sector sizing: all sectors on a disk are the same size. All of our disks are formatted with fixed sector sizing. Sometimes the number of sectors on a track is a function of the cylinder number (tracks on cylinders closer to the spindle are shorter). We do not do this.

Back to our T-80: it has 5 heads and 815 cylinders. Each track has 28 sectors of 512 bytes each, for a total of 114,100 sectors. Hence, our T-80's store 58,419,200 bytes of data. The T-315's have 433,580 sectors, and 221,992,960 bytes. Since the "80" and "315" are supposed to indicate how many megabytes each drive holds, there is a lot of space used by the format (see below).

On a fixed head disk (also called a drum, incorrectly, or a head per track disk), there is a head for each track. Here, heads are never positioned. I don't think we use any of these, but VAXC may have one. Most disks have movable heads: the heads seek to different cylinders. There are also disks where some of the disk is a fixed head disk, while other parts have movable heads.

Some disks have more than one head mounted on an actuator arm flying above the same disk surface. Sometimes these are used simply to decrease seek time (I used an NCR computer that did this). Another use is to make a single surface appear to be many surfaces. Our T-315's have two heads per surface.

There is some way for the hardware to tell when index occurs, that is, when the disk has started a new revolution. Sometimes there is a physical notch on a disk surface that is sensed. On every track, sector 0 is at index.

If the disk controller and computer are fast enough (capable of doing back to back sector transfers), the sectors are numbered in simple increasing order. This is called 1:1 interleaving. If the controller and computer are not fast enough, consecutive sectors cannot be read or written.
Sectors are then skipped to give the disk controller and computer time to turn around and process the next sector. If every other sector is transferred, this is 2:1 interleaving, and if every third sector, then it is 3:1 interleaving.

The Dorado is fast enough, and it uses 1:1 interleaving. Previous versions of this document claimed, incorrectly, that 2:1 interleaving is used. The next paragraph is wrong, but it is left in this document for general information and historical reasons.

The Dorado is not fast enough, and it uses 2:1 interleaving! I don't know the sector ordering for sure, but typical interleaving goes as follows: 0, 14, 1, 15, 2, 16, 3, 17, 4, 18, 5, 19, 6, 20, 7, 21, 8, 22, 9, 23, 10, 24, 11, 25, 12, 26, 13, 27 (and we are back to index). Note that it takes three revolutions to read a track in a multi-track transfer: start at 0 and read to 27, which is two revolutions; you do not have enough time to pick up sector 0 on the next track, so you have to waste a revolution waiting for the disk to turn. The real reason is that 2 and 28 are not relatively prime. It is possible to precess the index location to get a higher transfer rate, and we think that this is done on the Dorado for drive 0 only. It precesses by 4 sectors between cylinders. I think that is a four interleaved sector precess, so it really is 8 sectors. This allows 16.7 milliseconds/revolution * 8 sectors / 28 sectors per track = 4.77 milliseconds to do a track to track seek, which is just about what the specifications say.

You also have to understand the various performance characteristics of disks to understand how they work. These include rotational speed (typically 3600 RPM), track density per inch (a few hundred; the inverse of the distance between two tracks), bit density, and head positioning time (track to track, that is, between adjacent tracks; average; and worst case).
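
The arithmetic in this section can be checked with a short Python sketch. It is only an illustration: the function names are mine, the 815-cylinder figure for the T-315 is inferred from the quoted sector count (433,580 / (19 * 28)), and the 2:1 ordering is the "typical" one quoted in the paragraph flagged above as wrong for the Dorado, not a statement of what the Dorado does.

```python
# Sketch only: figures come from the text; names are illustrative.

BYTES_PER_SECTOR = 512
SECTORS_PER_TRACK = 28
CYLINDERS = 815  # stated for the T-80; inferred for the T-315


def capacity(heads, cylinders=CYLINDERS, sectors=SECTORS_PER_TRACK):
    """Return (total sectors, total bytes) for a disk with fixed sector sizing."""
    total = heads * cylinders * sectors
    return total, total * BYTES_PER_SECTOR


def interleave_2_to_1(sectors=SECTORS_PER_TRACK):
    """Sector numbers in physical order around the track for a 2:1 layout."""
    half = sectors // 2
    order = []
    for i in range(half):
        order.append(i)         # read on the first revolution
        order.append(i + half)  # read on the second revolution
    return order


def rotation_time_ms(sector_slots, ms_per_rev=16.7, sectors=SECTORS_PER_TRACK):
    """Time for the disk to rotate past the given number of sector slots."""
    return ms_per_rev * sector_slots / sectors


t80 = capacity(heads=5)
t315 = capacity(heads=19)
order = interleave_2_to_1()
seek_window_ms = rotation_time_ms(8)

print("T-80 :", t80)
print("T-315:", t315)
print("2:1 order:", order)
print("seek window (ms):", round(seek_window_ms, 2))
```

The same capacity function reproduces both quoted totals, which is some evidence that the T-315 is formatted with the same 28 sectors per track as the T-80.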
Logical and Physical Addresses, Labels and Headers

Most disks have a preamble area for each sector that is called the format and is written by "Format" programs. There is also an interrecord gap between sectors. Once this format is written it is permanent. It identifies the sector (cylinder, track and sector address) and provides very fine grain synchronization for reading or writing the disk. Sometimes, there is a special area preceding each sector that holds read/write data logically identifying the sector. This is called the label, and we use this feature on every sector (almost nobody else in the world uses this feature). Our labels are ten 16-bit words long (this may be false of Dolphins), and are specified in DiskFace.Label as follows (see VolumeFormat.Attributes for the definitions of Attributes, AbsID and RelID below):

    Label: TYPE = MACHINE DEPENDENT RECORD[
      fileID(0): FileID,
      filePage(5): INT,
      attributes(7): Attributes,
      dontCare(8): DontCare
      ];

    FileID: TYPE = MACHINE DEPENDENT RECORD[id(0): SELECT OVERLAID * FROM
      rel => [ relID(0): RelID, fill4(4): CARDINAL _ 0],
      abs => [ absID(0): AbsID ],
      ENDCASE
      ];

    RelID: TYPE[4];
    AbsID: TYPE[5];
    Attributes: TYPE[1];
    DontCare: TYPE[2];

A physical disk is called a Physical Volume (at least in Iago). A Physical Volume is divided into parts called Physical Sub-Volumes. These are contiguous (see below) regions of the Physical Volume. Cedar deals with Logical Volumes. Each logical volume is a collection of Physical Sub-Volumes (normally a single Physical Sub-Volume) that has a name such as "Cedar" or "Debugger" or "Luther.alpine".

Disk sectors have both a physical disk address (cylinder, track and sector) and a logical disk address. The latter is the offset of the sector within the logical volume.

Due to backward compatibility with the Alto and Alto microcode for the Dorado, there is something funny about the real physical disk addresses and the idea of contiguous sectors.
One normally expects that disk addresses will vary sector fastest, then track (head), and finally cylinder. This is because for large sequential file transfers, you do all the I/O you can before you move the head. This is the way it works for DLions, I think also for Dolphins, and for all drives on a Dorado except drive 0, the system drive. On drive 0 of a Dorado, the order is sector, cylinder and then track. This is because the Alto microcode expects to have the sole use of a disk surface. I think that it thinks of a surface as being a double drive model 44 (or something like that). Hence, when you build a disk in Iago, you specify the number of Alto partitions. These are surfaces, and are allocated from the high surface number (19 and down on a T-315, and 5 and down on a T-80) and only on drive 0.

All remaining room on the disk (almost) is available for Cedar logical volumes. The "Create Logical Volume" command in Iago lets you allocate some unused room on the disk to a new logical volume. The "Create Physical Volume" command in Iago destroys all the logical volumes on a physical disk.

Files

A collection of sectors can be grouped together into a file. This is the physical representation of the abstraction provided by the Cedar module File. A file is something with an id that is unique for all time, and that uses some number of sectors to store an array of data. The sectors are divided between the header and the data. The header is currently exactly two sectors, while the data is zero (or maybe it's one) or more logical sectors numbered consecutively from zero.

Any collection of logical sectors from a logical volume (with some restrictions) can be made into a file. Sectors are grouped into extents called runs: a run is a sequence of consecutive sectors in the file that have consecutive logical disk addresses. There is a maximum (currently) of 83 runs per file. This is because the 2 page headers don't have enough room for more.
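
As an illustration of runs, here is a small Python sketch that groups the logical disk addresses of a file's pages into (first, size) runs, and that works out how many 3-word run entries fit in one header page. The function names are mine. The 256-word page size is an assumption (512-byte sectors and 16-bit words), and the starting offset of 5 and the 3-word entry size are taken from the Header Page 0 layout given later in this memo. The result is consistent with the limit of 83, but that is my reading, not something the interfaces were checked against.

```python
# Sketch only: names are illustrative, constants are assumptions noted above.

WORDS_PER_PAGE = 256   # assumed: 512-byte sector / 2 bytes per 16-bit word
RUN_TABLE_START = 5    # runs(5) in the Header Page 0 layout
WORDS_PER_RUN = 3      # first: 2 words (RECORD[INT]), size: 1 word (CARDINAL)


def max_runs_in_one_page():
    """Number of whole run entries that fit after the fixed fields."""
    return (WORDS_PER_PAGE - RUN_TABLE_START) // WORDS_PER_RUN


def group_into_runs(logical_pages):
    """Group logical addresses, given in file-page order, into (first, size) runs."""
    runs = []
    for page in logical_pages:
        if runs and runs[-1][0] + runs[-1][1] == page:
            first, size = runs[-1]
            runs[-1] = (first, size + 1)   # extends the current run
        else:
            runs.append((page, 1))         # address jump: start a new run
    return runs


example = group_into_runs([100, 101, 102, 200, 201, 50])
print(max_runs_in_one_page())
print(example)
```

A badly fragmented file needs one run per discontinuity, so a file whose pages are scattered across more than 83 separate extents could not be described by this table.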
Just looking at the label of a sector, as described above, you can find the FileID, filePage, and attributes. Header pages are "filePage" zero and one, with the proper "FileID", and have "attributes" of "header". Header page 0 is for the run table. The run table is a hint as to where the pages of the file are. The truth is, as always, in all the file headers and labels on the disk. See "Header Page 0" below for the format of page 0 for all files. Page 1 is for file attributes as defined by the File System (e.g., FS or Alpine) built over File. See "Header Page 1" below for the format for each kind of file, FS or Alpine.

Data pages are "filePage" zero on up, with the proper "FileID", and have "attributes" of "data". Note that there are two "filePage" zeros: one has attributes of "header" and one has "data".

File Naming: FS

"File" does not support naming or the other higher level functions of a File System. These are provided by a "File System", and the normal one to talk about is FS. FS has mapping information stored in the FS BTree (also called simply the BTree). There is a special way to find this special file: see the Logical Root description below. The BTree has attachment information (///FS.mesa => /Indigo/Cedar5.2/FS/FS.mesa!2); it has local files, that is, files where the only copy exists on this disk; and finally, it has entries from the file cache: it might have a local copy of /Indigo/Cedar5.2/FS/FS.mesa!2. The BTree is a cache of the truth, which is stored in the labels and headers of all the sectors in the logical volume. Scavenge will rebuild the BTree from the labels and headers, and the remnants of a BTree (if any). The "FS BTree" is kept in a B-Tree maintained through the BTree package.

Free Pages and the VAM

Another important file is the VAM (volume allocation map) for a logical volume. This file contains the bit map of pages that are free. The VAM is a hint: the truth is stored in the label of the page (attributes of freePage).
This is important since the VAM does not truly have to be up to date at the time of a crash. When File tries to allocate a file extent, it looks in the VAM and reads the labels of the corresponding sectors. It then accepts the initial run of truly free pages as the next extent (approximately).

Low Level Data Structures

Physical page 0 is the place to start. I think it is two pages long, and has attribute PhysicalRoot. The second page is the bad page table. Anyway, VolumeFormat.PhysicalRoot describes all the fields in page 0. From the physical root, you can find the checkpoint, microcode, germ, and bootFile (bootingInfo); the label; and the sub volumes (subVolumes). These are the physical sub volumes (type the "describe physical" command to Iago). From the logical volume ids (VolumeFormat.SubVolumeDesc.lvID) in the physical sub volumes, the logical volumes can be discovered. There can be gaps between physical sub volumes for reasons I don't know (could be Pilot compatibility or initial microcode).

Now we have discovered all the logical volumes. Reading logical page 0 of a logical volume will give you the Logical Root (it has attribute LogicalRoot). See VolumeFormat.LogicalRoot. This has a single page of information. The more interesting stuff is the bootingInfo field, which points to the checkpoint, microcode, germ, bootFile, debugger and debuggee files (included for Pilot compatibility?); and the rootFile field, which points to the same stuff as bootingInfo (? some of this is redundant and probably not used), plus VM, VAM, client, and alpine. VM is the virtual memory backing file, and normally is also included in the FS file name BTree as VM.DontDeleteMe. "client" is the file for the FS BTree. "alpine" is the file for the root of the Alpine directory system. Type the "describe logical" command to Iago to get a description of the disk.

In VolumeFormat there are also sectors described called PhysicalMarker and LogicalSubvolumeMarker. I think that they are no longer used.
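
The discovery step described above (from the sub volumes of each physical root to the set of logical volumes) amounts to grouping sub volume descriptors by lvID. A minimal Python sketch follows. The dictionaries stand in for VolumeFormat.PhysicalRoot and VolumeFormat.SubVolumeDesc; only the names subVolumes and lvID come from the text. The "label", "start" and "size" keys, and the use of readable strings as ids, are hypothetical and are there only to make the example concrete.

```python
from collections import defaultdict

# Sketch only: dictionaries are stand-ins for the real Mesa records.


def logical_volumes(physical_roots):
    """Map each lvID to the (physical volume index, sub volume) pieces that make it up."""
    volumes = defaultdict(list)
    for pv_index, root in enumerate(physical_roots):
        for sub_volume in root["subVolumes"]:
            volumes[sub_volume["lvID"]].append((pv_index, sub_volume))
    return dict(volumes)


# Hypothetical example: "Cedar" spans two physical volumes.
roots = [
    {"label": "disk0", "subVolumes": [
        {"lvID": "Cedar", "start": 0, "size": 60000},
        {"lvID": "Debugger", "start": 60000, "size": 20000},
    ]},
    {"label": "disk1", "subVolumes": [
        {"lvID": "Cedar", "start": 0, "size": 40000},
    ]},
]

volumes = logical_volumes(roots)
sizes = {lv: sum(sv["size"] for _, sv in pieces) for lv, pieces in volumes.items()}
print(sizes)
```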
Files to Know About

Much of the detail of the formats is in the following interface files: VolumeFormat, PhysicalVolume, FileInternal, File, BootFile, and DiskFace.

Selected Page Formats

Header Page 0

This is from VolumeFormat.LogicalRunObject. It is stored in the data part of header page 0 for a Cedar file.

    Word    Use
    0       headerPages(0): CARDINAL -- number of pages in header (should be 2)
    1       maxRuns(1): CARDINAL -- 83
    2       intention(2): RunTableIntention -- stable or unstable
    5-end   runs(5): SEQUENCE COMPUTED CARDINAL OF LogicalRun

where:

    LogicalRun: TYPE = MACHINE DEPENDENT RECORD[ -- 3 words each
      first(0): LogicalPage, -- which is a PhysicalVolume.LogicalPage, which is a RECORD[INT]
      size(2): RunPageCount -- CARDINAL
      ];

The number of actual runs is computed by scanning forward until a sentinel is reached. The sentinel is VolumeFormat.lastLogicalRun.

Header Page 1 - For an FS File

This is from FSPropertiesImpl.PropertiesObject.

    Word    Use
    0-3     validation: GVBasics.Password -- should be GVBasics.MakeKey["August 19, 1983 1:16 pm"], no joke.
            -- word 0 is 151306B
    4-5     bytes: INT -- how big FS thinks the file is in bytes.
    6       keep: CARDINAL
    7-8     created: BasicTime.GMT
    9       version: FSBackdoor.Version
    10-end  nameBody: TextRep, where TextRep is
            RECORD[length: CARDINAL, chars: PACKED ARRAY [0 .. 0) OF CHARACTER];

Header Page 1 - For an Alpine File (called the Leader page in Alpine)

This is from LeaderPageFormat.LeaderPageRecord.

    Word    Use
    0       seal (0): CARDINAL -- = 43875 (which is 125543B)
    1       dataStart (1): LeaderPageOffset
    2       dataEnd (2): LeaderPageOffset
    10      data (offsetData): ARRAY [offsetData..AlpineEnvironment.wordsPerPage) OF UNSPECIFIED

Each entry has three fields (PropertyRep):

    size      (1/2 word) the number of words in the entry
    property  (other 1/2 word) from PickledProperty
    value

In the label (DiskFace.Label), the fields "fileID", "attributes" and "dontCare" have opaque types and are not interpreted by the Head, only by the higher level software. When verifying a label, all fields except "dontCare" must match.
When the head/microcode moves to the next page of a run, label.filePage is incremented. The "dontCare" field is ignored in label verification. It is typically used for boot chain links, in which case it is actually a DiskFace.Address.
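
The label rules above can be sketched in Python as follows. This is an illustration under assumptions, not the Head's implementation. The field sizes follow DiskFace.Label (5 words of fileID, 2 of filePage, 1 of attributes, 2 of dontCare), but the big-endian byte order, the word order inside the INT, the attribute codes HEADER and DATA, and all function names are hypothetical.

```python
import struct
from dataclasses import dataclass, replace

# Sketch only: byte order, attribute codes and names are assumptions.

LABEL_WORDS = 10
_LAYOUT = ">10siH4s"  # fileID (5 words), filePage (INT), attributes, dontCare


@dataclass(frozen=True)
class Label:
    file_id: bytes             # opaque, 10 bytes
    file_page: int             # page number within the header or data part
    attributes: int            # opaque, 1 word
    dont_care: bytes = bytes(4)  # opaque, 2 words

    def pack(self):
        """Pack the label into its ten 16-bit words."""
        return struct.pack(_LAYOUT, self.file_id, self.file_page,
                           self.attributes, self.dont_care)


def verify(expected, on_disk):
    """All fields except dontCare must match."""
    return (expected.file_id == on_disk.file_id
            and expected.file_page == on_disk.file_page
            and expected.attributes == on_disk.attributes)


def next_in_run(label):
    """Label expected for the following page of the same run."""
    return replace(label, file_page=label.file_page + 1)


HEADER, DATA = 1, 2  # hypothetical attribute codes
want = Label(file_id=b"FILE-00001", file_page=0, attributes=DATA)
seen = Label(file_id=b"FILE-00001", file_page=0, attributes=DATA,
             dont_care=b"\x00\x01\x02\x03")

print(len(want.pack()), verify(want, seen))
```

Because filePage zero exists both as a header page and as a data page, the attributes field has to take part in the comparison; the sketch would reject a header page 0 when data page 0 is expected.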