Heading:
Dolphin Booting and Maintenance Panel Codes
Page Numbers: Yes X: 527 Y: 10.5"
XEROX
PARC
6 March 1984
To:Dolphin Users and Maintenance Personnel
From:Edward Fiala
Subject:Dolphin Booting and Maintenance Panel Codes
Filed On:[Indigo]<D0Docs>MPCodes.Bravo, .Press
The purpose of this memo is to provide enough information to diagnose malfunctions to the extent that MP codes permit, covering Pilot, Cedar, and Alto microcode systems. Lisp and Smalltalk presently use the same MP codes as Alto, but note that Cedar now has diverged from Pilot to the point that its MP codes are no longer similar.
This revision incorporates changes in booting procedures and maintenance panel codes associated with the Pilot 11.0 (Klamath) and Cedar 5.0 releases. It is still inaccurate and incomplete for Tor (5700) configurations. Please report discrepancies or omissions. The original version of this memo was on 10 October 1980 by Hal Murray. I have revised it and extended it periodically.
Please note that MP codes produced during booting by Initial (one of the microprograms discussed below) changed about 1 June 1982 and about 10 January 1983; a few revisions also were made in January 1984. If you are using Pilot or Alto emulator microcode produced before 1 June 1982, then this document will be inaccurate with respect to Initial. If you ether boot Initial (which happens if you have just powered up, and your SA4000 disk is not yet up to speed), then this memo should be accurate. There are only a few changes in Initial’s MP codes between 8.0 and 11.0 Pilot releases, and between 4.0 and 5.0 Cedar releases.
I have included a recent summary of MP codes shown by Pilot and Cedar software, but this is NOT authoritative with regard to MP codes generated by software--only microcode-generated MP codes are reliably described in this document. An authoritative document for Pilot MP codes is the [Idun]<APilot>11.0>*>PilotMP.mesa source file; for Cedar, the equivalent file is [Indigo]<Cedar5.1>*>MPCodes.mesa, I think.
Some recent changes in MP codes and booting procedures are as follows:
1) Cedar now boots BasicCedarD0 from both the disk and the Ethernet; Iago is used instead of Othello; Othello is no longer needed by Cedar users. The MP codes shown by BasicCedarD0 are all different.
2) Othello is still used by Pilot, but the MP codes shown by both the 11.0 germ and Othello are enriched and changed slightly from 10.0. 928 (waiting for boot server) now appears in the MP until a boot server responds; 930 (Pilot control and Mesa runtime components being initialized) is usually not seen now; and 975 (initializing transactions) no longer appears.
3) Dolphin microcode files are now obtained from one of the following places:
[Indigo]<D0>Klamath>11.0 microcode
[Indigo]<D0>Trinity>8.0 to 10.0 microcode
[Indigo]<D0>Cedar>Cedar microcode (compatible with Pilot 6.0)
[Indigo]<D0>Alto>Alto microcode
Note that the initial microcode for Cedar (=Pilot 7.0), for PIlot 8.0 to 10.0, and for Pilot 11.0 are mutually incompatible because the location and format of the Pilot germ, germ request, and switches differ.
Files of interest are as follows:
InitialAltoD0.ebDisk initial microcode for default Alto boot.
InitialPilotD0.ebDisk initial microcode for default Pilot or Cedar boot.
Initial.eb3 mb boot server initial microcode.
EtherInitialD0.eb10 mb boot server initial microcode (not ready yet)
PilotD0.ebPilot microcode for workstations (UTVFC), used both
for ether boot and disk boot.
CedarD0.ebCedar microcode for workstations (UTVFC), used both
for ether boot and disk boot.
PilotTor.ebPilot 5700 microcode (UIB).
AltoD0.eb3 mb boot server Alto microcode (UTVFC).
Alto5700.eb3 mb boot server Alto microcode (UIB).
These files may appear elsewhere outside PARC, and the 5700 and Tor named files may not appear on [Indigo]<D0>.
4) It is now possible to 3 mb ether boot Pilot or Cedar while holding the "BS" and "p" keys (current release) or the "BS" and "v" keys (AlphaMesa release). 10 mb ether booting is being worked on. A summary of keyboard boot features is given later in this document.
5) During ether loading of a file by Initial, the MP code shown is now 740 + 2 x low four bits of boot file number.
6) Pilot microcode released after 11 May 1983 will crash with an MP code of 220 + pipe task when an io task memory reference experiences a page or write protect fault. Emulator references are still passed to software.
7) Pilot microcode released after 11 May 1983 will crash with an MP code of 137 when a MX (monitor exit) or MW (monitor exit and wait) opcode is executed and the selected monitor is already unlocked.
Booting
A Dolphin bootstrap takes place in five or six stages, as follows:
(1) Hardware loads the Boot microcode from an EPROM;
(2) Boot (EPROM microcode) tests the processor and loads Initial;
(3) Initial microcode tests the map and storage and loads an emulator;
(4) Pilot or Alto microcode initializes itself, then loads and starts the Pilot Germ or Alto breath-of-life program;
(5a) Pilot Germ reads in the physical volume or Ethernet boot file. The Ethernet boot file is BasicCedarD0 for Cedar users, OthelloD0 for others.
(5b) Alto breath-of-life program reads in the Alto OS from the 3 mb Ethernet or the disk. The Alto OS completes the bootstrap.
(6a) BasicCedarD0 or OthelloD0 completes an ether boot; other boot files are also possible on a physical volume boot.
Lisp and Smalltalk booting is similar but won’t be discussed here.
Hardware Boot
A bootstrap may be initiated in the following ways:
(1) by pushing and releasing the start (power-on) button;
(2) from the test hardware (Midas);
(3) from a TOD clock (not standard equipment);
(4) from a watchdog timer (not standard equipment);
(5) by executing the Boot function in a microinstruction;
(6) pushing the keyboard boot button while the UTVFC microcode is running normally (CSL keyboards only).
In addition to these normal methods of booting, if your machine is sick, it might spontaneously boot itself when a fault happens while the fault task is running--the Dolphin hardare does this. If the problem is really bad, it may reboot over and over.
On machines with CSL keyboards, repeat booting may occur when the display is turned off; the back channel reports continuous null in this case, perceived by microcode as "boot button depressed". This doesn’t happen with Star keyboards.
Also, some deficiencies in error handling by Initial cause certain classes of recoverable hardware failures (disk and Ethernet problems) to reboot the machine rather than recovering from the error. The reboot discards any keyboard control interpreted by Initial.
While you depress the start button, the hardware shows 8888 in the MP as a light test. When you release the power-on button, the machine will then run through the boot sequence discussed above. During the hardware boot, you might see either 8808 or 8880 in the MP if RM or IMX parity errors are detected by the hardware during loading. Control is transferred to location 0 after loading.
EPROM Boot Microcode
Boot is a tiny diagnostic and bootstrap loader limited (for reasons discussed in the D0 Hardware Manual Section 4.10) to 1024 instructions and 16 immediately accessible RM registers; other registers can be referenced using the stack pointer, however. Its function is to test parts of the machine needed for the next stage of the boot thoroughly but quickly, reporting any failures on the MP. If these tests are passed, it then boots the Initial microcode from an io device; the SA4000 disk and the 3 mb Ethernet are boot devices; 5700 (Tor) configurations use a variant Initial that boots from floppy disk; and 10 mb Ethernet booting will probably be made available at some point. Because loading Initial from the disk or Ethernet requires most of the processor to work, the tests performed by Boot must include most of the machine. However, the map and storage are not needed during the boot of Initial, so these parts of the machine are not tested. Also, many processor functions needed only by the emulator are not tested by Boot.
Boot runs a few processor tests to find out whether or not the processor is healthy enough to continue loading. Many machines malfunction when first powered up, then work correctly. If the processor tests fail, Boot will show an error MP code (0000 to 0039) for a second or so and then reboot. Otherwise, registers are initialized.
Then Boot determines whether or not it was started from the maintenance panel (i.e., by Midas during debugging). If so, it will show the MP code 0002 and read the Kernel debugging program over the printer interface from Midas.
Otherwise, Boot tries to read the first program on the SA4000 boot record directly into the microstore. When Boot starts this, it will show 0040 in the MP. If you see this (or 0041 or 0046 which frequently follow immediately), your processor is at least somewhat alive.
InitialAltoD0.eb and InitialPilotD0.eb are two standard microcode configurations which might be installed on the local disk. The initial microcode file lives in a special reserved portion of the disk (cylinder 0 on the SA4000), so you won’t see it among your Alto or Pilot files. Each of these is the concatenation of Initial and AltoD0 with different starting addresses for Initial. Unless redirected by the keyboard boot feature, InitialAltoD0.eb will boot Alto microcode and OS from the local disk, while InitialPilotD0.eb will boot Pilot microcode and physical volume boot file from the local disk. However, if Initial is unable to read Alto or Pilot microcode from the local disk, it will invoke a 3 mb ether boot.
If the disk won’t work (0041 to 0045, 0047 to 0048) or isn’t ready yet (0046), Boot will try to obtain Initial.Eb from the Ethernet; when this decision is made, a one second pause allows you to read the MP; if the reason for the boot is NOT a button push, the delay is extended to one minute to prevent a sick machine from hogging the boot servers. When ether booting starts, 0060 (trying to ether boot) will appear in the MP; other 006x MP codes indicate ether boot problems. While waiting for a boot server to respond and while transmitting Initial microcode from the boot server, 004x slowly alternates with 006x, so that you can see both the reason why the disk boot failed and the current ether boot indication.
Unfortunately, many machines experience a short period of unreliability after being powered on, but then work correctly. These machines encounter the one minute wait intended to prevent a sick machine from hogging the boot servers, and this can be frustrating, if you are waiting for the machine to become ready. If your machine does this, keep pushing and releasing the start button until you see a healthy 0060.
The standard trick for forcing an ether boot is to turn power off and then back on. It takes the disk about two minutes to become ready again. If you push the start button before the disk is ready, you should get to the Alto NetExec.
Note that Initial is loaded directly into the microstore without using either the map or storage.
Initial Microcode
Initial is primarily responsible for testing and initializing the map and storage, reporting any failures in the MP, then loading and starting an emulator. For Pilot, Initial also loads the germ, but the Alto emulator itself loads the breath-of-life program. Initial should also test whatever was not tested by Boot, but it doesn’t presently do so. Much of the machine is unfortunately not tested at all by any program during the boot sequence.
Initial has several entry points which determine the emulator booted later and the io device used for the boot; the file which has each starting address is also shown:
InitialPilotD0.ebSA4000 Pilot physical volume boot;
InitialAltoD0.ebSA4000 partition 1 Alto start;
Initial.eb3 mb Ethernet Alto start;
EtherInitialD0.eb10 mb Ethernet Pilot boot (undebugged);
--3 mb Ethernet Pilot boot;
--3 mb Ethernet AlphaMesa Pilot boot;
--10 mb Ethernet AlphaMesa Pilot boot (undebugged);
--SA4000 partition 2 Alto start.
Other entries are used by test programs. If the keyboard boot feature has been used to modify the default boot, then Initial will automatically restart itself at the appropriate starting address and clear the keyboard information.
The ’Ethernet Alto start’ entry point later will cause the Alto emulator, breath-of-life packet, and NetExec to be obtained from the Ethernet also. The ’SA4000 Pilot boot’ starting address will boot the Pilot microcode and germ installed on your SA4000; the Pilot germ must be compatible with Initial.
Shortly after Initial receives control, it puts 0700 (’starting map test’) into the MP very briefly followed by 0400 (’starting storage test’); if you see 0040 then 0700 or 0400 (without an intervening 0060), your RDC is at least somewhat healthy since Initial was loaded from the disk. Initial first tests the map; it will hang with the ’bad map’ MP code (0702) displayed, if the map is imperfect.
Then Initial tests storage and uses only ’good’ pages. 0400 is seen in the MP during storage testing, which lasts about 3 seconds with 8 96k-word storage boards.
If Initial detects any storage imperfection, it will do additional testing, and 0400 will be shown for 4 to 9 seconds (timing approximately proportional to the amount of storage). If the number of pages with correctable failures exceeds 1/8 of all pages, and if the amount of good storage is less than 128k words, then the entire test will be repeated allowing pages with soft failures to be used; otherwise, only perfect pages are used. After all testing is complete, four numbers will be shown in the MP for about 1.0 seconds each:
1) 0400+ 1 if 1st storage board imperfect,
+ 2 if 2nd imperfect,
+ 4 if 3rd imperfect
+ 8 if 4th imperfect,
+ 16 if 5th imperfect,
+ 32 if 6th imperfect,
+ 64 if 7th imperfect,
+128 if 8th imperfect.
This interpretation is for the 96k-word storage boards with 16k RAMs. Each such board occupies 128k words of real address space but implements only the first 96k words of the space. Hence, the bits actually represent imperfections in the existing storage of each 128k-word bank of storage.
2) Hard bad page count (some uncorrectable failures). This count, and the imperfect board indication previously, are incomplete if a page with hard errors is thought to be a non-existent page. The current algorithm is very unlikely to make this mistake.
3) Soft bad page count (correctable failures only).
4) Blk and Syndrome bits for the last failure detected by the error correction hardware. This is shown as four octal characters BSSS, where the first character of the MP code is the two Blk.0 and Blk.1 bits and the last three characters are the Syndrome.
Only 400 is shown when all storage boards are perfect.
Even when some storage is bad, unless the amount of ’good’ storage is reduced by failures to less than 64k words, initialization will continue normally following the bad-page MP codes.
When the storage test has finished, consecutive map entries enumerate pages of ’good’ storage and storage has been zeroed; map entries above the last good page are initialized to Vacant. The counts of good pages and hard and soft bad pages and the bad-board word are stored in high RM words where they will not be smashed by Pilot or Alto. A Pilot opcode allows these test results to be accessed and reported.
On an Alto disk boot, Initial then puts 0720 into the MP and continues reading the SA4000 boot record. But this time the emulator from the boot record is placed into storage rather than directly into the microstore. On a 3 mb ether boot, it instead shows 0758 (Star or CSL), or 0762 (Tor) in the MP and reads Alto.Eb from the Ethernet into storage. When it is going to start Pilot, Initial instead shows 0725 in the MP; this kind of boot is more complicated since it has to locate and load the germ as well as the emulator microcode.
At the end of file, the microcode image in storage is loaded and started with LoadRAM.
Emulator Microcode
Early in initialization both Pilot and Alto emulators show the MP code ’Start device init’ (0104) barely long enough to see. Seeing 0104 means that Initial tested and zeroed storage, loaded the microcode image into storage from the disk or Ethernet and then from storage into the microstore, transferred control to it, and executed at least the first few microinstructions successfully.
After initializing io microcode, Pilot and Cedar then show the number of pages found to be good by Initial; this will be up long enough to see (~ 0.4 seconds). Microcode initialization then finishes by starting the Germ, which will show 0900 (Pilot) or 0810 (Cedar) followed by the sequences given later. Note that Initial must be compatible with the Pilot or Cedar Germ, and there were incompatible changes in 6.0, 8.0, and 11.0 Pilot releases.
Since a germ and OthelloD0 or BasicCedarD0 compatible with Initial can be directly booted from the Ethernet, there is a booting difficulty only when an incompatible germ and boot file are wanted.
In this case, one must presently boot the obsolete Pilot germ by means of the Alto NetExec. From the Alto NetExec, a MesaNetExec corresponding to the Pilot release is booted; then the "Othello" command is issued to the MesaNetExec to boot OthelloD0. The possibilities are:
CedarMesaNetExec6.0 = Rubicon = Cedar
MesaNetExec8.0 to 10.0 = Trinity to Sierra
AlphaMesaNetExec11.0 = Klamath
Beginning with Cedar 5.0, Iago in BasicCedarD0 performs functions equivalent to Othello, which is no longer needed by Cedar users.
For the 96k-word storage boards using 16k RAMs, the MP will show 96K/256 pages/board which is 384 x number of storage boards (i.e., 0768, 1152, 1536, 1920, 2304, or 2688). Some MPs will show this value plus 1 occasionally for unknown reasons.
The Alto emulator has several different entry points, as discussed below. After 0104, on a disk boot, 0118 (GotBreathOfLife) is shown for 0.3 seconds after the first page from the disk boot record has been read successfully. On an Ethernet boot, 0114 (start booting the NetExec) is shown, then 0118 when the breath-of-life has been received from the Ethernet boot server. Finally, on both disk and Ethernet bootstraps, the the number of good pages found by Initial is shown. Then the final microcode overlay is loaded and started at the breath-of-life program’s disk or ether boot address.
An emulator is usually started by Initial, but Lisp and Smalltalk are started directly, preserving the Alto PC and disk partition. When Initial doesn’t immediately precede an emulator, the number of good pages displayed will be correct unless you have run the Audio or Jasmine microcode since you last booted the hardware--these io drivers smash the results left earlier by Initial.
When started at its normal entry point, the Alto emulator will boot the OS from SA4000 partition 1 only if you have not used the keyboard boot button kludge (which is only available with CSL keyboards, not with Star or Tor terminals).
Keyboard-Controlled Booting
The entry point selected in the .Eb file is overruled when you have carried out a keyboard boot with one of the key selections interpreted by Initial. Keyboard control is only available to you only in the following circumstance:
1) You have a CSL microswitch keyboard; Dallas keyboards don’t have a keyboard boot button. Keyboard control is not available when you push the power-on button.
2) You have a UTVFC terminal and the UTVFC task for one of the standard emulators is refreshing the display normally at the instant the keyboard boot button is depressed. (This is necessary because the keystrokes are interpreted by the emulator that is running when the keyboard boot button is depressed, not by Initial.). If the UTVFC task isn’t running, the keyboard boot button will be ignored.
3) The key selection exactly matches one expected by Initial; extra keys depressed may (or may not) invalidate the selection.
Note that you may release the depressed keys as soon as you have released the keyboard boot button. If Initial cannot interpret the keystroke combination, it passes it on unchanged to the default emulator. The Alto emulator, for example, will switch from partition 1 to partition 2 if the "0" key is depressed, but "0" is ignored by Initial.
Since an emulator has to be running to effect keyboard control, you may have to cycle power, if your machine won’t disk boot normally, to force an ether boot and get to the Alto NetExec; then hold down an appropriate keystroke combination and push the keyboard boot button.
Here is a list of interpreted keystroke combinations:
7Storage test boot. Initial’s map and storage tests will repeat until a failure is detected; then the four MP codes describing the failure will be shown in sequence indefinitely.
BSAlto 3mb ether boot.
dAlto partition 1 disk boot.
0Alto partition 2 disk boot. This only works when the Alto emulator is booted by default because this keystroke is interpreted by the Alto emulator, not by Initial.
d pPilot disk (SA4000) boot. Pilot is booted using the Pilot microcode and germ installed on your disk.
pPilot disk (SA4000) boot.
BS pPilot 3mb ether boot.
BS vAlphaMesa Pilot 3mb ether boot.
e pPilot 10 mb ether boot (not implemented yet).
e vAlphaMesa Pilot 10 mb ether boot (not implemented yet).
Microcode Coordination
If you install microcode on your disk, you can be reasonably sure that it won’t change out from under you, but if you ether boot, you will get whatever microcode is currently installed for your type of machine on the nearby boot servers.
Note that all boot servers try to automatically keep their boot files up-to-date. This makes it hard to have two different versions of the microcode available in different geographic locations. It is possible to defeat the automatic update heuristics. See Hal Murray if you really need to know more.
The two microcode systems you might want to install on your disk as initial microcode (not including Lisp and Smalltalk) are:
InitialAltoD0.Eb, the concatenation of Initial.Eb and AltoD0; it will by default load the microstore with Alto and Alto-style Mesa emulators and start the Alto emulator at its normal starting address after running Initial. Various keyboard combinations can be used on a CSL keyboard to overrule the default boot.
InitialPilotD0.Eb, also the concatenation of Initial.Eb and AltoD0, but with a different starting address; it will be default boot Pilot using the pilot microcode and germ installed on the Pilot portion the disk; the pilot microcode can be either PilotD0.Eb or CedarD0.Eb. The Tor system uses a variation of InitialPilotD0 with which I am not very familiar.
Note that different versions of these files exist for 6.0, 8.0, and 11.0 Pilot releases; Cedar uses the Rubicon version.
(Details of installing microcode and such are described in the Othello document for Pilot users, and this is done by a command file for Cedar.)
Normal MP Code Sequences
While you push the power-on button, you should see 8888 in the MP. As soon as you release the power-on button, you should see one of the sequences given below; these are viewable sequences--numbers not left up long enough to see are not mentioned. On a boot initiated by pushing the keyboard boot button, you won’t see 8888.
"[n, n]" below indicates that the sequence repeats for awhile, usually during ether booting. "x" as in "9x0" indicates that the digit represented by the "x" can’t be determined because of flickering.
If your hardware is working properly, the following sequences are reasonable:
[46 60] 700 400 758 104 114 118 NP.3 mb Alto boot
[46 60] 700 400 762 104 114 118 NP.3 mb 5700 Alto boot
40 700 400 720 104 118 NP.Alto disk boot
40 700 400 725 104 NP [910 920] 930 940 9x0 990.6.0 to 10.0 Othello disk boot
40 700 400 742 760 104 NP 900 928 [910 920] 940 9x0 990.11.0 Othello 3 mb boot
40 700 400 742 744 104 NP 900 928 [910 920] 940 9x0 990.11.0 AlphaMesa Othello 3 mb boot
40 700 400 725 104 NP [910 920] 930 940 9x0 990.Cedar BasicCedarD0 disk boot
40 700 400 742 744 104 NP 900 928 [910 920] 940 9x0 990.Cedar BasicCedarD0 3 mb boot
If your hardware is working properly, but your disk is not yet up to speed, you get the 3 mb Alto boot sequence given above. In this case, if your boot server is busy, "46, 60" may repeat several times before continuing with the rest of the sequence. 700 is up barely long enough to see.
If you have just powered up your machine, it might get a 22 or 24. I am also suspicious that a microcode bug in the boot microcode may be causing 84 (unexpected wakeup from task 16b). After a second or so, your Dolphin will automatically reboot, but this time 46 will stay in the MP for a minute or so. If you get tired of waiting, poke the button again.
40 is from Boot, 700 to NP from Initial, 900 to 930 from the Pilot germ, the rest from OthelloD0. If you are booting Tajo directly, the numbers after 930 may be different.
On Pilot ether booting, 742 is shown while Pilot microcode is being read into storage from the 3mb Ethernet, 760 or 744 while reading the Pilot germ; these numbers are 740 + 2 x low four bits of boot file number. 928, which means "waiting for boot server to respond," may or may not be visible.
If Your Machine Won’t Boot...
If your machine won’t boot or boots very slowly, it is important to go through the following check list:
1) If you have just powered up, make sure your display is turned on--the emulator won’t run if the display is powered off, and you obviously won’t be able to see anything.
2) If you have a CSL keyboard, sometimes the keyboard microcomputer will power-on in a bad state; if this happens, you can have all kinds of trash happen on the backchannel--erroneous keystrokes, mouse button clicks, and mouse movement. To fix this, push the keyboard boot button.
3) Otherwise, watch the MP while the problem is happening. The detailed sequence of numbers may indicate what is going wrong. You will have to get your head down low to observe the numbers on the MP reliably; people have frequently reported numbers to me with 1’s translated into 7’s, and some other observation errors are occasionally made.
In the following pages, a # indicates a final MP code. The machine will hang with this number in the MP until you boot again. All other MP codes are either errors that will be retried automatically or simple indications of progress.
MP Codes from Hardware
8888Lamp test (shown while the start or power-on button is down). It can also occur when the microcode is repeatedly resetting the MP and counting up to a number < 1000. An "8" is what appears in a MP digit when all of it is illuminated, and this happens when the MP is counting through all states of the digit faster than the eye can follow.
8808#RM parity error. The MP freezes with this code, indicating (??) a failure during the hardware boot.
8880#IMX or control store parity error. The MP freezes with this code indicating (??) a failure of the hardware boot.
0888#This can occur when the microcode is repeatedly resetting the MP and counting up to a number < 1000 faster than the eye can follow.
0088#This can occur when the microcode is repeatedly resetting the MP and counting up to a number < 100 faster than the eye can follow.
0008#This can occur when the microcode is repeatedly resetting the MP and counting up to a number < 10 faster than the eye can follow.
MP Codes from Rev L EPROMs
0000One of the first instructions in the EPROM clears the MP; seeing this number means that the ALU all-zeroes, all-ones, or GoTo tests failed.
0001The ALU all-zeroes, all-ones, and GoTo tests passed. You should never see this since some other error should happen or the MP should change to 0040 when RDC booting starts.
0002Midas boot.
0004 to 0015
One of the preliminary ALU tests failed. See Boot.mc for the details. This MP code list may be buggy in this area.
0004XOR failure using 0 constant.
0005XOR failure using 0 from T.
0006ALU failure using 125b constant.
0007ALU failure using 125b from T.
0008ALU failure using 252b constant.
0009ALU failure using 252b from T.
0010ALU failure using 125000b constant.
0011ALU failure using 125000b from T.
0012ALU failure using 52400b constant.
0013ALU failure using 52400b from T.
0014ALU failure using 177400b constant.
0015ALU failure using 177400b from T.
0016Mismatch after write then read of an RM register via the stack.
0017The contents of an RM register have changed.
0018Register read and compare error using the Stack.
0020 to 0035A fault happened. The MP contains 20 plus the contents of the Parity register. A fault in the fault handler will reboot the machine, so you may not get to see these codes. The value shown is 20d + (1 if memory error) + (2 if RM parity error) + (4 if control store parity error) + (8 if stack overflow or underflow).
0020A Breakpoint microinstruction (i.e., one containing the SetFault function) was executed.
0021Memory error. Since the EPROM code doesn’t touch the map or storage, this is probably an H4 Parity error.
0022RM register parity error.
0024Control Store parity error.
0022 and 0024 are to be expected if you have just powered up your machine. (The bias on the RAM chips hasn’t been pumped up yet.) This will invoke a one minute delay to avoid hogging the boot servers when something is broken. Poke the button again if you want faster service.
0028Stack error.
0040Starting to load microcode from RDC.
0041Can’t find RDC. (Will now try to ether boot)
0042SA4000 disk read error.
0043SA4000 seek timed out.
0044SA4000 disk checksum failure.
0045SA4000 bad Control Store address--attempt to load into EPROM area.
0046SA4000 disk not ready. (Will now try to ether boot)
0047The label word which should contain a link to the next page of microcode to be loaded has an invalid disk address.
Most disk errors (0042, 0043, 0044, 0045, 0047) can be caused by simple transient read problems. The RDC task simply retries all of them while the emulator task is counting down a timer. If the timer runs out, you will see 0048.
0048Didn’t load microcode from RDC within 1 second. (Will now try to ether boot)
0060Trying to load microcode via Ethernet.
0061#Can’t find Ethernet board.
0062Bad Ethernet checksum while reading microcode.
0063Bad Control Store address--attempt to load into EPROM area.
0064Hardware error (bad status) at end of packet.
0065#Timeout after 15 tries at about 10 seconds each.
If ether booting doesn’t work, the MP will slowly alternate between 006x and 004x so that you can see both what was wrong with the disk and what is wrong with the Ethernet. If the ether boot eventually times out, you will see 0065 alternating with the bad disk code.
0070 to 0085An unnexpected wakeup happened. The MP contains 70 plus the number of the offending task.
0071Unexpected wakeup for task 1.
0072Unexpected wakeup for task 2.
0073Unexpected wakeup for task 3.
0074Unexpected wakeup for task 4. (RDC or Ethernet output.)
0075Unexpected wakeup for task 5.
0076Unexpected wakeup for task 6.
0077Unexpected wakeup for task 7.
0078Unexpected wakeup for task 10B. (Ethernet input.)
0079Unexpected wakeup for task 11B.
0080Unexpected wakeup for task 12B.
0081Unexpected wakeup for task 13B.
0082Unexpected wakeup for task 14B.
0083Unexpected wakeup for task 15B.
0084Unexpected wakeup for task 16B (timer task).
0085Unexpected wakeup for task 17B (fault task).
If you get one of these, one of the IO controllers is probably broken. For example, its reset logic is not working, or the wakeup logic on the RDC or Ethernet board is generating the wrong task number.
0104Loading SCB Initial from floppy disk--don’t confuse this MP code with StartDeviceInit (also 0104) from emulators; the preceding sequence of MP codes disambiguates them.
MP Codes from Initial
Fault handler MP codes in the range 100 to 255 may also happen when running the new Initial; these are in a different table because they are common to Initial, Pilot, and Alto.
0700Start map test.
0702#Bad map.
0400Start storage test.
0401 to 0655Failures detected on one or more boards. MP shows for about 1.0 seconds:
0400
+ 1 if storage board 1 had failures
+ 2 if board 2
+ 4 if board 3
+ 8 if board 4
+ 16 if board 5
+ 32 if board 6
+ 64 if board 7
+128 if board 8.
The above is the interpretation for the 96k-word storage boards; +1 actually means ’the first 128k words’; +2 ’the second 128k words’; etc.
NHardBadCount of hard bad pages (ones with uncorrectable failures), where pages are 256 16-bit words; shown after 0401-0655.
NSoftBadCount of soft bad pages (only correctable failures); shown after NHardBad.
BlkSyndromeA four character octal number in which the first character is the value of Blk[0:1] and the last three characters are the Syndrome of the last storage failure detected by the error correction hardware; shown after NSoftBad. A translation table from values of BlkSyndrome to chip positions on the storage boards is given in a table later.
0701#Not enough memory. (Initial requires 64K words.)
0720Starting to load emulator microcode from disk.
0721#No RDC.
0722#RDC Read error.
0725Starting to boot Pilot from disk.
0730 to 0736 appear during floppy disk booting which only happens on Tor and ESS configurations, which have different Initial microcode.
0730Start floppy disk loader.
0731#Mesa never ready.
0732#Illegal for floppy disk booting.
0733#Bad floppy disk status.
0734#Execute input channel transfer not found.
0735#Loader ended.
0736#Invalid FIFO type.
0741#Can’t find UTVFC or UIB.
When Initial starts ether booting, it puts 740+2*low four bits of the boot file number. If EtherLoad can’t load that file within a reasonable length of time it will give up and bump the MP by one.
0742Trying to load PilotD0.eb or AlphaMesaPilotD0.eb from the 3 mb Ethernet.
0743#Timeout trying to load PilotD0.eb or AlphaMesaPilotD0.eb.
0744Trying to load D0.eg or AlphaMesaD0.eg from the 3 mb Ethernet.
0745#Timeout trying to load D0.eg or AlphaMesaD0.eg.
0746Trying to load PilotTor.eb from the 3 mb Ethernet.
0747#Timeout trying to load PilotTor.eb.
0758Trying to load AltoD0.Eb from the Ethernet.
0759#Timeout trying to load AltoD0.Eb.
0762Trying to load Alto5700.Eb from the Ethernet.
0763#Timeout trying to load Alto5700.Eb.
0809#Pilot Microcode not installed on SA4000. (Try to ether boot.)
0810#Germ not installed on SA4000. (Try to ether boot.)
0811#Physical Boot Volume not set on SA4000. (Try to ether boot.)
0812#Label check from SA4000. (Try to ether boot.)
Syndrome Translation Table for 256k Storage Boards using 64k RAMs
Initial records Blk and Syndrome for the last error correction fault of its storage test and shows these as a single octal number with Blk.0 and Blk.1 displayed in the first character and Syndrome in the last three characters. If the Syndrome is not in the table below, then it is not a single error, so no single chip can be identified. Blk.0 and Blk.1 are irrelevant for the 256k storage boards.
Syn-Bad BitSynBad Bit
drome0-63dLocationdrome0-63dLocation
(octal) (c.0 to c.7)(octal)(c.0 to c.7)
1c.7U9717230U70
2c.6U10717446U60
4c.5U11717762U51
70U155200c.0U160
10c.4U1272061U111
1316U14821217U101
1532U13821433U91
1648U12121749U81
20c.3U1262279U158
268U8423325U151
3224U6723541U141
3440U5723657U125
3756U482475U92
40c.2U14325321U66
464U11325537U56
5220U12225653U47
5436U11226613U116
5752U10227229U106
6712U15927445U96
7328U15227761U86
7544U1423073U156
7660U13231319U149
100c.1U15331535U139
1062U8231651U123
11218U6832611U95
11434U5833227U69
11750U4933443U59
12710U12433759U50
13326U1143467U93
13542U11535223U83
13658U10535439U104
1476U15735755U94
15322U15036715U87
15538U14037331U71
15654U10337547U61
16614U8537663U52
Syndrome Translation Table for 96k Storage Boards using 16k RAMs
Initial records Blk and Syndrome for the last error correction fault of its storage test and shows these as a single octal number with Blk.0 and Blk.1 displayed in the first character and Syndrome in the last three characters. If the Syndrome is not in the table below, then it is not a single error, so no single chip can be identified.
Syn-Bad BitBlkBlkBlk
drome0-63d)=0=1=2
(octal) (c.0 to c.7)
1c.7U082U143U143
2c.6U104U165U165
4c.5U103U164U164
70U003U003U065
10c.4U081U142U142
1316U005U005U067
1532U065U126U126
1648U067U128U128
20c.3U020U020U082
268U011U011U073
3224U013U013U075
3440U073U134U134
3756U075U136U136
40c.2U042U042U104
464U007U007U069
5220U009U009U071
5436U069U130U130
5752U071U132U132
6712U015U015U077
7328U017U017U079
7544U077U138U138
7660U079U140U140
100c.1U041U041U103
1062U026U026U088
11218U028U028U090
11434U088U149U149
11750U090U151U151
12710U034U034U096
13326U036U036U098
13542U096U157U157
13658U098U159U159
1476U030U030U092
15322U032U032U094
15538U092U153U153
15654U094U155U155
Syn-Bad BitBlkBlkBlk
drome0-63d)=0=1=2
(octal) (c.0 to c.7)
16614U038U038U100
17230U040U040U102
17446U100U161U161
17762U102U163U163
200c.0U019U019U081
2061U025U025U087
21217U027U027U089
21433U087U148U148
21749U089U150U150
2279U033U033U095
23325U035U035U097
23541U095U156U156
23657U097U158U158
2475U029U029U091
25321U031U031U093
25537U091U152U152
25653U093U154U154
26613U037U037U099
27229U039U039U101
27445U099U160U160
27761U101U162U162
3073U004U004U066
31319U006U006U068
31535U066U127U127
31651U068U129U129
32611U012U012U074
33227U014U014U076
33443U074U135U135
33759U076U137U137
3467U008U008U070
35223U010U010U072
35439U070U131U131
35755U072U133U133
36715U016U016U078
37331U018U018U080
37547U078U139U139
37663U080U141U141
MP Codes from Cedar and Pilot Microcode
Fault handler MP codes in the range 100 to 255 may also happen when running Pilot; these are in a different table because they are common to Initial, Pilot, and Alto.
0104Start device initialization.
NPagesNumber of ’good’ pages found by Initial, shown briefly after initializing devices and just before loading the final microcode overlay. This number will be garbage unless you ran the new Initial the last time you booted, and it will be garbage if you ran Jasmine or Audio microcode since last booting.
MP Codes From Alto Emulator Microcode
Fault handler MP codes in the range 100 to 255 may also happen when running Alto; these are in a different table because they are common to Initial, Pilot, and Alto.
Assembly switches allow the Alto emulator to be assembled for direct startup from Midas with no microcode overlays and no requirement for running Initial first. The 0100 MP code only happens in this debugging configuration, which must do its own map and storage test and initialization.
0100Start map and storage test and initialization.
0101#Not enough memory. (You need 512 pages or 128K words--more than the 64k words required by Initial.)
0104Start device initialization.
0105Started UTVFC initialization (invisible).
0106Finished loaded or flushing CSL keyboard overlay (invisible--only happens on systems with CSL keyboard overlays; debugging systems and some Lisp and Smalltalk systems don’t have such an overlay).
0107Finished display initialization (invisible--this will be 0106 with no CSL keyboard overlay).
0110Started disk boot (invisible).
0111#Timeout waiting for disk status.
0112#Hardware error reading disk (after 10 retries).
0114Start booting the NetExec.
0118Breath-of-life read successfully from disk or Ethernet.
NPagesThe number of ’good’ 256-word pages determined by Initial is put in the MP just before the final microcode overlay overwrites initialization; it normally remains in the MP until you boot or crash. The Alto emulator starts immediately after loading the final overlay.
MP Codes from Pilot Germ and Othello
Fault handler MP codes in the range 100 to 255 may also happen when running Pilot; these are in a different table because they are common to Initial, Pilot, and Alto. I think these MP codes are the same for both Rubicon and Trinity Pilot releases.
0900Germ started loading boot file.
0901#Germ allocation trap--out of frames.
0902#Control fault. This might be an unexpected KFCB or trap, or it might indicate that the germ was put together incorrectly. 902 will be followed by a 999 sequence, as discussed below.
0903#Start trap--attempt to start an already started module (Pilot bug).
0904#Germ page or write protect fault (Pilot bug).
0905#Germ incompatible with initial microcode.
0906#Germ and running Pilot have different version numbers.
0907#Germ reschedule error [A reschedule kfcb trap occurs when the reschedule microcode is entered with interrupts disabled, or while the requeuing microcode or timeout scan microcode is running. This might be caused by a page or write protect fault, by failing to acquire a monitor lock, or by waiting on a condition variable. Since the germ disables interrupts until it is nearly done with its inload, page or write protect faults by the germ will cause this MP code.]
0909#Germ SIGNAL or ERROR (Pilot bug).
0910Germ action running (e.g., inLoad or outLoad). 910 normally alternates with 920 while reading a boot file from the disk or Ethernet, and 920 is visible most of the time.
0911#Germ and physical volume have incompatible version numbers.
0912#Germ and boot file have incompatible version numbers.
0913#No physical boot file installed.
0914#Boot file contains invalid data (e.g., bad checksum on ether boot).
0915#Running TeleDebug server (frequently means couldn’t get to CoPilot or no CoPilot on disk).
0916#Boot file won’t fit in storage.
0917Talking to Ethernet debugger.
0919Germ has returned to caller who has hung.
0920Germ disk, floppy disk, or Ethernet driver running. 920 normally alternates with 910 while reading a boot file; ether booting takes awhile, so be patient as long as the 920 flickers occasionally.
0921#Hard error on device being booted.
0922#Timeout of boot.
0923#Broken link in chained boot file. [Something about your germ or boot files is wrong. First execute the Othello Set Physical Volume Boot Files command; if that fails to cure the problem, try refetching and reinstalling germ and boot files.]
0924#No response from any boot server to Germ’s request for an Ether boot file.
0925Germ bad packet--unexpected packet sequence number or size.
0926Germ trying to find a Pup/3 mb Ethernet 8-bit address.
0927#Boot file ends prematurely (Try reinstalling).
0928Waiting for any boot server to respond.
0930Initializing Pilot Control and Mesa Runtime. This is the first thing that happens after the germ gives control to the Pilot boot file. On 10.0 and earlier releases, this MP code was visible long enough to see, but it seems not to be seen with the 11.0 germ and Othello.
0931#Pilot and StartPilot have incompatible version numbers.
0932#Trap before the trap handler has been set up (Pilot bug).
0933#Pilot and Germ have incompatible version numbers.
0934#Boot file’s StartList contains bad data (Pilot bug).
0935#Can’t get to the debugger or too early in startup to find debugger. Try using Othello’s Set Debugger Pointers command.
0936#Swap to debugger canceled because of H key switch. Waiting for microcode debugger.
0937Attempting to set the time from either the hardware clock or the Ethernet; sometimes get this MP code when you are trying to go to CoPilot but failed to boot CoPilot before booting your Client volume.
0938#Running cleanup procedures, normally before a world swap.
0939#System.PowerOff called, and no power control relay.
0940Initializing Pilot Store component.
0946#System logical volume needs scavenging [riskyRepair].
0947Drive not ready. (Waiting for it to become ready.)
0948#System physical volume needs scavenging.
0949#Disk hardware error while scavenging system volume.
0950Scavenging logical volume (wait awhile).
0951Alternate feedback for progress during a pass of logical volume scavenging.
0952Alternate feedback for additional passes during logical volume scavenging.
0953#Debugger pointers have been set to a non-existent volume or to a volume without an installed debugger.
0960Deleting temporary files from previous run.
0965#Insufficient file space for data space backing storage (specify smaller size with boot switch).
0970Client and other non-bootloaded code being mapped.
0980Initializing Pilot Communication component.
0981Trying to find a Pup/3 mb 8-bit address.
0990PilotClient.Run has been called.
0990 is what you see after Pilot has finished initialization.
999Following a 902, 907, and some other codes, the germ shows 999 followed by a sequence of global frame addresses and byte PC’s for local frames going backwards from the current local frame to the top level. Each number is shown for several seconds; the whole sequence ends with 0 and then repeats from the MP code which preceded 999. Each global frame and PC is shown as four numbers, like the following:
0GGG, 0GGG, 0PPP, 0PPP
You must concatenate the 6 G’s and the 6 P’s to get the complete decimal address of the global frame and byte PC.
MP Codes from Cedar Germ and BasicCedarD0
Fault handler MP codes in the range 100 to 255 may also happen when running Cedar; these are in a different table because they are common to Initial, Pilot, Cedar, and Alto.
0810Germ entered or reentered.
0811Writing outLoad file.
0812Loading boot file or outLoad file.
0813Preserving/restoring vacant VM map entries.
0814Germ transferred control back to client (who has hung).
0815Teledebug server running.
0817Unexpected trap or kernel function call (kfcb).
0818Attempt to start an already started module.
0821Unnamed ERROR.
0822Germ and physical volum have incompatible versions.
0823Germ and boot file have incompatible version numbers.
0824No physical boot file installed.
0825Hard error on device being booted.
0826Operation on boot device not completed in expected time.
0827Broken link in chained boot file. First try seting the physical volume boot file; if that fails to correct the problem, try reinstalling germ and boot files.
0828No response to germ’s request for ether boot file.
0829Packet received from boot server with unexpected sequence number or size.
Debugger-related codes (0830-0839)
0830Cleaning up.
0831Hanging.
0832Can’t world swap.
Initialization codes (0840-?)
0840Initialization started.
0841Incorrect machine type.
0842Incorrect microcode (release date from VERSION opcode not recent enough?).
0845Mesa runtime initialized.
0850VM initialized.
0855Storage initialized.
0860File initialized.
0861Ethernet2 trying to find its Pup address.
0862Ethernet host different at DeviceCleanup.TurnOn.
0865Communication initialized.
0870FS Initialized
Fault MP Codes from Initial, Cedar, Trinity Pilot, and Alto
The old Initial has different fault MP codes.
115Unexpected Ethernet output task wakeup (Alto only).
116No state vector error indicating process code inconsistency (Pilot only); rumor is that this happens when a process that shouldn’t page fault does.
117Two MC2 errors; both MC2A and MC2B pipes indicate errors; since LogSE is illegal, this might mean two consecutive references experienced uncorrectable storage failures. Alternatively, this been caused by violating the Output-Output-PStore4 Gotcha (microcode bug).
119Stack overflow or underflow. For Pilot and Cedar, this can only happen during microcode initialization because stack errors are subsequently passed through to software. For the Alto emulator, this always crashes.
0120 to 0135RM or CS parity error, possibly in combination with other errors. MP code is 120 plus:
1 if MC1 or MC2 error;
2 if RM parity error;
4 if CS parity error;
8 if stack overflow or underflow.
Many codes show a multiple of 20d + a task number, where the task assignments are given below. The ’pipe task’ is the one issuing the reference causing an error--this can determined from the error pipe; the ’current task’ is the one running when the fault aborted execution.
Alto task assignments:SDD Pilot task assignments
0emulator 0emulator
1unused 1unused
2unused 2unused
3unused 3unused
4unused 41st MIOC or 2nd 3mb Ethernet
output
5color display 52nd 3mb input or 3rd 10 mb
Ethernet
63 mb Ethernet output 6color display, 2nd MIOC, or 2nd
10mb Ethernet
73 mb Ethernet input 71st 10mb Ethernet
8RDC (SA4000 controller) 81st 3mb Ethernet output
9unused 91st 3mb Ethernet input
10UTVFC (display controller)10RDC (SA4000 controller)
11unused11unused
12unused12 UTVFC or UIB terminal controller
13unused13unused
14timers14 timers
15fault task15 fault task
The following codes imply no RM or CS parity error.
0136MC12 error occurred but none of the reasons for it is indicated; i.e., neither H4PE, MOB, MC1A, MC1B, MC2A, nor MC2B errors are true. Conceivably, this can be caused by an MC1 fault on the reference following a PFetch4, if the PFetch4 experiences error correction. Due to a hardware bug, the fault isn’t started soon enough in this case, so an extra non-fault-task instruction is executed. If the extra instruction is a reference it wipes out the MC1ErA and MC1ErB indicators.
0137MX (monitor exit) or MW (monitor exit and wait) opcode executed but the monitor referred to was already unlocked [Pilot only].
It is better to show the pipe task rather than the current task for H4PE and MOB errors, but for H4PE’s the hardware doesn’t indicate whether pipe A or pipe B was involved, and I couldn’t find out whether or not the pipe is indicated correctly for MOB’s.
140 to 155Map out of bounds indicating virtual address greater than 22d bits. Code shown is 140 + current task. All MOB faults are passed to software when running Pilot.
160 to 175H4 parity error indicating bad parity on the processor bus used by Input, IOStore4, and IOStore16 references. Code shown is 160 + current task. This can never happen at present since these errors are ignored for all tasks by all microcode systems. However, Initial has an entry point which causes any emulator to treat H4PE’s as a crash condition, and if this entry point were used, these errors would crash.
180 to 195Some fault when the preceding instruction contains a LoadPage and the fault handler decides to continue execution. This indicates a microcode bug and should be reported. For Pilot, this cannot happen for the emulator; for Alto it is never supposed to happen.
200 to 215MC2 crash, indicating correctable storage failure of PFetch1, 2, or 4 with LogSE true in the map entry or an uncorrectable storage failure on any reference. The code shown is 200 plus the pipe task. Since LogSE is presently illegal, this code should indicate an uncorrectable storage failure; some microcode bugs may cause it (e.g., Output-Output-PStore4 Gotcha).
220 to 235MC1 crash, indicating a page or write protect violation. The code shown is 220 plus the pipe task. Pilot only reports MC1 crashes on io task references; emulator MC1 faults are passed to software (except that MC1 faults when executing the LoadRAM-and-jump opcode will crash). MC1 faults always crash for Alto.
240 to 255SetFault or Breakpoint crash. The code shown is 240 plus the current task. This is used by the microcode in a few places when impossible conditions are detected; for unused tasks, it represents an unexpected wakeup.