:TITLE[MemInit];        *Map and storage diagnostic and initialization

%Ed Fiala 10 January 1983: PNIP display of Blk.0 and Blk.1 plus the relevant
8-bit syndrome as a single number when failures are detected during the
storage test; moved many mi from InitialPage1 to opPage2 to make room.
Parameterize refresh timer period. Optionally loop the storage test until
an error is detected, and then loop the error MP codes; on CSL keyboards,
this is a keyboard boot option.
Ed Fiala 27 October 1982: Fix bug at imNoPage+2 causing PageCount to become
0 when 2^22 words of real storage are used.
Ed Fiala 24 June 1982: Fix bug in imMarkSector causing wrong page report
on MP.
Ed Fiala 23 April 1982: Add sophistication to storage enumeration algorithm
to better distinguish hard error pages from non-existent pages. Add wait
after write sweeps to allow slowed refresh to provoke storage failures.
Set refresh timer to standard emulator value after storage testing.
Created by Ed Fiala 22 January 1982: Rewrite program to improve testing and
speed. Earlier version and changes were by Johnsson, Frandeen, and Henning.

Problems:
1) Cannot examine VM with Midas because FFault<0.
2) Need tests for PFetch1/2 and PStore1/2 with/without DF2 addressing,
task=0 and task#0.
%

Loca[MC2ErrRet0,InitialPage1,17];
Loca[MC2ErrRet1,InitialPage1,16];
Loca[imBST,InitialPage1,20];
Loca[imFST,InitialPage1,40];

SetTask[0];

*All RM registers except BootType appear to be available here,
*except that RTemp and RTemp1 (RM 52-53) are smashed by PNIP.
RV[BootType,20];        *Must agree with identical def in Initial.mc
RV[StorageFaults,21];   *Number of quadwords with correctable
                        *failures detected by the diagnostic in pages
                        *which had no uncorrectable failures.
RV2[XMAdLo,XMAdHi,22];  *Base register
RV4[XMBuf0,XMBuf1,XMBuf2,XMBuf3,24];    *Buffer for XMap
RV4[RBuf0,RBuf1,RBuf2,RBuf3,24];        *Buffer for PFetch4
RV4[WBuf0,WBuf1,WBuf2,WBuf3,30];        *Buffer for PStore4
RV[RLink0,34];          *Subroutine return link
RV[MapAddr,35];         *Current map location
RV[RealPage,36];        *Current real storage page
RV[PageCount,37];       *Count of 'good' pages; ***Location known to
                        *old versions of Alto emulator.
RV[Transient,40];       *Holds soft error count during storage test.
RV[ZPage,41];           *Virtual page being tested during sweep
RV[ZWord,42];           *Word within page being tested during sweep
RV[RLink1,43];          *MC2 fault return link
RV[RLink2,44];          *Another subroutine return link
RV[SoftQThreshold,45];
RV[SoftBadPages,46];
RV[HardBadPages,47];

%If there are more than EnoughGoodPages after the first pass of the storage
test, then a second pass is not made, even if there are a large number of
soft bad pages. In this case, no questionable pages are used.
%
MC[EnoughGoodPages,1000];       *= 128k words

%During the first pass of the storage test, no pages with soft failures are
put in service. If there are 'too many' such pages and some other criteria
are met, a second less conservative pass of the storage test is made in
which pages with up to BadQThreshold correctable errors on any one test are
put into service. BadQThreshold .eq. 1 allows pages with a single
correctable failure to be used but discards pages with more failures;
.ge. 100b allows the page to be used regardless.
%
MC[BadQThreshold,100];

OnPage[InitialPage];

IMap:
        LoadPage[InitialPage1];
        MapAddr _ 140000C, GoToP[.+1];

%First, test the map; any failure results in the BadMap MP code and
termination. The test verifies that each bit can assume both '1' and '0'
values and that there are no stuck address drivers nor on-chip addressing
problems. Timing ~ 0.12 seconds.

First, zero the map by writing consecutive addresses from 0 to 37777b.
The XMap in the imWriteMap subroutine writes the map from XMBuf0 and reads
the old contents into XMBuf1-3.
%
OnPage[InitialPage1];

imMapTest:
        XMBuf0 _ 0C, Call[imWriteMap];
*Loop here.
        MapAddr _ (MapAddr) + 1;
        T _ (MapAddr) and (37400C), GoTo[imWriteMap1,ALU<0];

%Now write -1 and verify old value .eq. 0 using forward sweep of all map
addresses. This will find not only data bits that can't assume the 0 state
but also stuck address drivers or internal map RAM addressing failures.
%
        T _ MapAddr _ 140000C;
imOMapLoop:
        XMBuf0 _ (XMBuf0) or not (0C), Call[imWriteMap];
        LU _ T;
        MapAddr _ (MapAddr) + 1, Skip[ALU=0];
        LoadPage[InitialPage], GoTo[imMapFail];
        GoTo[imOMapLoop,ALU<0];
*Now write 0 and verify old value .eq. -1 using forward sweep.
        T _ MapAddr _ 140000C;
imZMapCheckLoop:
        XMBuf0 _ 0C, Call[imWriteMap];
        LU _ XMBuf1;
        MapAddr _ (MapAddr) + 1, Skip[ALU=0];
        LoadPage[InitialPage], GoTo[imMapFail];
        DblGoTo[imZMapCheckLoop,imStorageTest,ALU<0];

imWriteMap1a:
        XMAdHi _ T, LoadPage[opPage2], GoTo[imWriteMap2];

imWriteMapT:
        MapAddr _ T;
*imWriteMap writes the data in XMBuf0 into map location MapAddr
*and returns the old value in T, complement of old value in XMBuf1.
imWriteMap:
        T _ (MapAddr) and (37400C);
imWriteMap1:
        XMAdHi _ T, LoadPage[opPage2];
imWriteMap2:
        T _ LSh[MapAddr,10], GoToP[.+1];
OnPage[opPage2];
        XMAdLo _ T;
        XMap[XMAdLo,XMBuf0,0];
        T _ LSh[XMBuf3,10];
        XMBuf1 _ (RHMask[XMBuf1]) or T;
        T _ (XMBuf1) xnor (0C), Return;

OnPage[InitialPage1];

imMapFail:
        T _ BadMap, GoToP[InitFail];    *Some map entry was bad

%The storage test has the following goals:

1) To find and use all the "good" storage pages.

2) To execute as fast as possible. Suppose that a user boots his machine
3 times/day, 240 days/year and that the test takes 10 sec. Then he waits
7200 sec/year = 2 hr/year--already too long. With 4 perfect storage boards
(384k words), this diagnostic completes in less than 2 seconds, which is
satisfactory.
Our experience with a large sample of 96k storage boards (120d boards) for
a period in excess of one year was that less than 0.05 16k RAMs per storage
board per year were replaced, after the board was checked out and found to
be working. Also, we have not observed either pattern sensitive or
intermittent failures. Our experience with 64k RAMs on another computer has
also been good. Also, the chance of an uncorrectable failure happening if
this diagnostic misses a failure seems to be remote, so the diagnostic and
failure handling objectives discussed below may be much less important than
fast execution of this test. Bearing this fact in mind...

3) To report information to AMesa or Pilot or directly to the user via the
maintenance panel so that the user and/or system maintainer will find out
that the machine does or doesn't need service, even if the failure can be
bypassed or ignored at present. Proposed service strategy is to replace bad
boards at the customer's site and then diagnose and repair the failure back
at the shop; so it would also be convenient to report the board number, so
the service man wouldn't have to run diagnostics before replacing the board.
If enough information to identify the failing RAM on the board could be
reported, that would be better yet.

4) To detect as many storage board failures as possible. The following are
some that we believe are more likely than others:

a) Single-bit failure on a storage RAM; i.e., one bit of the 16k in a RAM
stuck at 0, stuck at 1, or flakey.

b) Row addressing failure that will affect all or most of the SqRt(16k)
consecutive bits (= 2 consecutive pages) in a RAM by mapping writes or
reads addressed to one row into another row.

c) Column addressing failure that will affect all or most of the SqRt(16k)
bits in a RAM at addresses spaced SqRt(16k) bits apart (= 1 bit in every
other one of the 128 pages spanned by the RAM); this would map writes or
reads addressed to one column into another.
d) Bad sockets, bent pins, broken wires, etc. that cause all the bits in
one or more RAMs to malfunction; the storage boards being built now have
the RAMs soldered into the board, so there are no sockets. Also, bent pins
are not a significant problem because the holes in the board are bigger
than the socket holes, so pin bending is much less likely; and, even if a
bent pin happens, the solder is likely to make good contact.

e) Address driver on a storage board stuck at 0 or 1 or flakey; this will
affect many RAMs and cause uncorrectable failures.

f) Pattern sensitive failures within a RAM.

g) Failures due to excessive leakage (cell not holding charge).

Although (a) and (d) failures could be found by isolated testing of
individual quadwords, (b), (c), (e), (f), and (g) are best tested by
sweeping through storage writing at all addresses, then sweeping through
again reading at all addresses as follows: One sweep initializes storage to
a known value; a second sweep first checks the value at each address, then
writes the 1's complement back into that quadword. On-chip addressing
failures or stuck address drivers manifest as a comparison failure later in
the sweep.

Data patterns should ensure that each bit in each RAM (including check
bits) is checked at both '1' and '0' values during sweeps in which the
complement value is being written in each quadword after reading; the
following pattern sequence does this:

        0, 0, 0, 0;                             chkbits 0
        100000b, 100000b, 100000b, 1;           chkbits 177b
        0, 0, 0, 0;                             chkbits 0, data 0
        177777b, 177777b, 177777b, 177777b;     data -1
        0, 0, 0, 0;                             data 0

In the event of an on-chip addressing failure or an address driver failure,
it is necessary to sweep in both forward and backward directions to detect
failures at both the high and low affected addresses. In other words, two
different addresses (call them X and Y with Y .gr.
X) will appear to be written when either one of them is addressed; during a
forward sweep, X will be read correctly, but because the complement value
is written into X after reading, Y will be read as the complement value,
and the failure will be discovered. However, the failure at X is not
discovered until a sweep in the reverse direction occurs.

To detect failures due to excessive leakage, the timer task in Initial.Mc
carries out refresh 50 percent slower than the emulators. Here, pause
longer than the refresh period after a write sweep to allow leaky cells to
lose charge.

Pages in which uncorrectable errors are detected are, of course, discarded,
but if the only failures detected in a page are correctable, then MemInit
has to decide whether to use the storage or discard it. The following are
some considerations that affect this decision:

1) If such a page is discarded, less storage will be available to the
program; a column failure, for example, would affect half the pages in a
RAM or 64 pages, so the loss of storage might have significant performance
implications. Also, if discarding results in storage declining below
MinGoodPages (.eq. 64k words), then Initial will crash.

2) If a page is used despite correctable failures, references that invoke
error correction will be slowed significantly; one benchmark running on a
system in which one RAM was removed from the top storage board without
discarding the affected storage was 7 percent slower than with perfect
storage. (However, except with astronomical amounts of storage, Pilot would
be slowed more than 7 percent by discarding the 128 pages affected by the
RAM removal.)

3) If another failure coincides with one ignored by MemInit, then an
uncorrectable failure will cause a crash.
If this diagnostic were a perfect test (i.e., no bad RAMs go undiscovered)
and no fluke failures occur, then an uncorrectable error caused by
coincidence of a bad RAM ignored by MemInit with a RAM which fails under
normal operation is unlikely. To show this, suppose that only (a), (b), and
(c) solid failures occur and that 0.8 are single, 0.1 row, and 0.1 column;
and suppose 0.05 failures/storage board/year on the average. Then, if a
single failure is ignored, only 1/16k of future single failures and 1/128
of future row or column failures on the same storage board will coincide,
which happens once every 12000 years/storage board; if a row failure is
ignored, only 1/128 of future row failures will coincide, but every column
failure will coincide, so a coincident failure occurs about every 180
years/storage board.

This means that any major risk of uncorrectable errors must be the result
of imperfect testing, much higher than expected failure rate, or fluke
failures. With 256k RAMs, fluke failures in good RAMs due to alpha particle
collisions are theoretically likely, but these don't happen in 16k RAMs;
not sure about 64k RAMs. Intermittent or pattern sensitive failures might
make perfect testing difficult, but we don't seem to have had significant
problems in this area either.

This means that, except possibly with 256k RAMs, we are unlikely to suffer
uncorrectable failures as a result of missing a bad RAM during testing or
going ahead and using a page despite a correctable error in the page.

Based on all these objectives the algorithm used here is as follows:

a) To run as fast as possible, the forward sweeps are done only when one or
more bad bits are detected by the reverse sweeps.

b) Initially, pages with any bad bits are eliminated, and a count of these
is kept.

c) After testing, if 'too many' pages were eliminated, the entire storage
test is repeated eliminating only pages affected by uncorrectable failures.
d) If any failures were detected, the MP shows the board number and then
the numbers of pages affected by uncorrectable and by correctable failures.
%

%Initially, make any page with one or more bad bits a bad page; if there
are too many such pages, repeat less conservatively.
%
imStorageTest:
        RLink0 _ HiA[TestingStorage];
        SoftBadPages _ T _ 0C;
        HardBadPages _ T, LoadPage[0];
        T _ (RLink0) or (LoA[TestingStorage]), CallP[PNIP];
        SoftQThreshold _ 0C;
imRepeatStorageTest:
        StorageFaults _ 0C;

%Now write the 1st quadword in each 256-word real page with the page number
and some constants using map entry 0; sweep from page 7777b down to page 0.
Storage boards using 16k RAMs span 128k words but implement only 96k words
of storage--does attempting to write into the 32k-word 'hole' clobber some
lower word? If so, the reverse scan is needed to write the lower address
correctly after it is clobbered by the write into the 'hole'.
%
        WBuf0 _ 10000C;         *max real page +1 = 4096d
        MapAddr _ 140000C;      *carries beyond max VM cause ALU Carry
*I think (?) that a missing storage board or a 'hole' delivers all
*1's data.
        WBuf1 _ 326C;           *Arbitrary constant
        WBuf2 _ 134000C;
        WBuf3 _ 0C, Call[.+2];
*imWriteMap returns here.
        PStore4[XMAdLo,WBuf0,0];
        WBuf0 _ T _ (WBuf0) - 1;
        XMBuf0 _ T, Skip[ALU<0];
        T _ (MapAddr) and (37400C), GoTo[imWriteMap1];

%During this phase, sweep upward through real storage and the map. If the
real page number and constants are in the 1st quadword of a page, then the
page exists--i.e., it is neither beyond the largest storage address nor in
the 'hole' of a 96k storage board.

If the quadword does not have the expected value, then the page either
doesn't exist or has uncorrectable data failures. Attempt to distinguish
non-existent from hard bad pages by counting 1 bits in the XOR of the
correct and actual values; if 4 or more bits are wrong then the page is
called non-existent, else it is bad but left in for now--testing will
presumably mark it bad later.
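
Illustration only (not microcode, and not part of the original program): a
minimal Python sketch of the bit-counting rule just described, using the
same "X and (-X) is the right-most 1" trick that imCountErrs relies on. The
names count_one_bits and classify_page are hypothetical, 16-bit words are
assumed, and the expected quadword (page number, 326b, 134000b, 0) is taken
from the constants written by the preceding sweep.

```python
MASK16 = 0xFFFF  # assumption: 16-bit words


def count_one_bits(x):
    """Count 1 bits by repeatedly clearing the right-most 1 (x and -x)."""
    x &= MASK16
    count = 0
    while x:
        lowest = x & (-x & MASK16)   # isolates the right-most 1 bit
        x &= ~lowest & MASK16        # clear it
        count += 1
    return count


def classify_page(real_page, quadword):
    """Classify a page from the first quadword read back.

    Returns 'non-existent' when 4 or more bits differ from the expected
    contents, else 'present' (which may still turn out to be bad later).
    """
    expected = (real_page & 0o7777, 0o326, 0o134000, 0)
    wrong = sum(count_one_bits(a ^ e) for a, e in zip(quadword, expected))
    return "non-existent" if wrong >= 4 else "present"
```

For example, a quadword of all 1's (as a missing board is thought to
deliver) differs in far more than 4 bits and is classified non-existent,
whereas a single flipped bit leaves the page present so that the later
sweeps can judge it.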
We go to this trouble because, if an existing page is skipped here because
of a hard error, then the HardBadPages count will be wrong.

Set bit 0 in task 17's FFault register so that the fault handler will exit
to the location pointed at by Stack on MC2 errors rather than crashing.
%
        RLink0 _ IP[FFault]C, Call[imSetStkP];
        Stack _ (Stack) or (100000C);
        RLink1 _ LoA[MC2ErrRet0], Call[imSetFaultRet];
*Zero quadword for PStore4.
        T _ WBuf3 _ 0C, Call[imBlockSet1];
        T _ RealPage _ 170000C;
imTloop:
        XMBuf0 _ T;
*Point base reg at MapAddr and the map entry at RealPage.
*Set all map flags off except LogSE.
        XMBuf0 _ (XMBuf0) and not (70000C), Call[imWriteMap];
        PFetch4[XMAdLo,RBuf0,0];
*Return here on an MC2 error indicating that error correction
*occurred on the PFetch4 below. Ignore the MC2 error and compare
*the data irrespective of the fault.
        RTemp1 _ 0C, At[MC2ErrRet0];
        T _ LdF[RealPage,4,14];
        T _ (RBuf0) xor T, Call[imCountErrs];
        T _ (RBuf1) xor (326C), Call[imCountErrs];
        T _ (RBuf2) xor (134000C), Call[imCountErrs];
        T _ RBuf3, Call[imCountErrs];
*Page is non-existent if .ge. 4 bits wrong
        LU _ (RTemp1) - (4C);
        ZWord _ 374C, Skip[ALU<0];
        GoTo[imNoPage];
*Page exists; zero it in preparation for first data test sweep.
        MapAddr _ (MapAddr) + 1, Call[.+1];
        T _ ZWord, GoTo[imNoPage,R<0];
        PStore4[XMAdLo,WBuf0];
        ZWord _ (ZWord) - (4C), Return;

imNoPage:
        T _ RealPage _ (RealPage) + 1;
        XMBuf0 _ 60000C, GoTo[imTloop,Carry'];
*Have completed mapping all the storage that exists; now mark the
*rest of the map entries vacant.
        T _ LdF[MapAddr,3,15];
        PageCount _ T, Call[imWriteMap];        *Page vacant
*Loop here
        MapAddr _ (MapAddr) + 1;
        T _ (MapAddr) and (37400C), GoTo[imWriteMap1a,Carry'];

%Now test storage by writing and reading various patterns in both forward
and backward sweeps, as discussed earlier. Any unequal data compare
indicates an uncorrectable storage failure and the page is marked bad.
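
Illustration only (not microcode, and not part of the original program): a
minimal Python sketch of why sweep direction matters when two addresses
select the same cell. It models one hypothetical fault, an address bit
stuck at 0, on a toy one-bit-wide store that starts out zeroed; each sweep
checks for 0 and then writes the complement, as in the sweeps here. The
function name sweep_failures and the 16-cell size are assumptions.

```python
def sweep_failures(size, stuck_bit, order):
    """Return the sorted addresses at which a check-then-complement sweep
    sees a bad compare, given an address bit that is stuck at 0."""
    cells = [0] * size                 # storage already initialized to 0

    def physical(addr):
        return addr & ~stuck_bit       # stuck-at-0 driver: bit is ignored

    failures = []
    for addr in order:
        if cells[physical(addr)] != 0:     # expected the initial value
            failures.append(addr)
        cells[physical(addr)] = 1          # write the complement back
    return sorted(failures)


forward = sweep_failures(16, 4, range(16))
reverse = sweep_failures(16, 4, reversed(range(16)))
```

With address bit 4 stuck, the forward sweep reports only the higher address
of each aliased pair and the reverse sweep reports only the lower one, which
is the X and Y behavior described earlier.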
The 128k-word bank of real memory in which an error is detected is recorded
in StorageFaults; bits 8 to 15d are a bit table corresponding to banks 7,
6, ..., 0, where 1's represent detected errors in that bank. After testing,
the MP will indicate what storage boards need replacement.

Since LogSE is true in the map, a fault handler return occurs for any
detected failure. The register Transient is initialized to 100000b when
testing begins for a page; any MC2 fault return causes the sign bit to be
cleared and the rest of the word incremented, so a count of 1 to 63d
indicates the number of quadwords causing MC2 fault returns on a particular
sweep. If Transient is greater than the value in SoftQThreshold, the page
will be discarded; otherwise, it is put into service.

NOTE: following the PFetch4/PStore4 sequence used in the tests below, any
MC2 fault on the PFetch4 happens AFTER completing the first mi that touches
RBuf--this is a memory controller bug, but the data in RBuf will have been
corrected, so it is not especially harmful; also, the TPC clock is not
disabled by abort, so any Call in the mi which is aborted by the fault will
clobber TPC before the fault starts, even though that mi won't be complete
then.

If a page is discarded, the map entry at PageCount-1 is copied into the map
entry for the page being discarded; then the map entry at PageCount-1 is
made vacant; finally, PageCount is reduced by 1 and either the SoftBadPages
or HardBadPages register is incremented.
%
        RLink1 _ LoA[MC2ErrRet1], Call[imSetFaultRet], At[imBST,1];

%imBackSweep will return to caller+1 for every quadword during the sweep.
At the end of the sweep, it returns to caller+2.
%
*Check for all '0' data and check bits; write all '1' check bits.
imBS1:
        T _ WBuf2 _ 100000C, Call[imBlockSet2], At[imBST,2];
        WBuf3 _ 1C, Call[imBackSweep], At[imBST,3];
        T _ RBuf0, GoTo[imAllZeroTest], At[imBST,4];
*Check for check bits all '1', write all '0' data and check bits.
imBS2:
        T _ WBuf3 _ 0C, Call[imBlockSet1], At[imBST,5];
        Call[imBackSweep], At[imBST,6];
        LU _ (RBuf0) xor (100000C), GoTo[imChkB1Test], At[imBST,7];
*Check for check bits and data all '0', write all '1' data.
imBS3:
        T _ WBuf3 _ (WBuf3) or not (0C), Call[imBlockSet1], At[imBST,10];
        Call[imBackSweep], At[imBST,11];
        T _ RBuf0, GoTo[imAllZeroTest], At[imBST,12];
*Check for data all '1'; write 0 check bits and data.
imBS4:
        T _ WBuf3 _ 0C, Call[imBlockSet1], At[imBST,13];
        Call[imBackSweep], At[imBST,14];
        LU _ (RBuf0) xnor (0C), GoTo[imAllOneTest], At[imBST,15];

%Here, we make the judgment that it is not worth doing the forward sweeps
unless we found some problem during the reverse sweeps. It takes about 0.5
seconds/storage board to do all the forward sweeps = about 24 minutes/year
for machines with 384k words booted 3 times/day. Forward sweeps are
intended to find the higher addresses affected by on-chip addressing
problems, but in this case we expect that the lower addresses involved in
such a failure would have been detected during the reverse sweeps.
%
        LU _ StorageFaults, At[imBST,16];
        Skip[ALU#0];
        GoTo[imClearLogSE];

*Check for check bits and data all '0'; write all '1' check bits.
imFS1:
        T _ WBuf2 _ 100000C, Call[imBlockSet2], At[imFST,1];
        WBuf3 _ 1C, Call[imForwardSweep], At[imFST,2];
        T _ RBuf0, GoTo[imAllZeroTest], At[imFST,3];
*Check for all '1' check bits; write all '0' data and check bits.
imFS2:
        T _ WBuf3 _ 0C, Call[imBlockSet1], At[imFST,4];
        Call[imForwardSweep], At[imFST,5];
        LU _ (RBuf0) xor (100000C), GoTo[imChkB1Test], At[imFST,6];
*Check for all 0 data and check bits; write all '1' data.
imFS3:
        T _ WBuf3 _ (WBuf3) or not (0C), Call[imBlockSet1], At[imFST,7];
        Call[imForwardSweep], At[imFST,10];
        T _ RBuf0, GoTo[imAllZeroTest], At[imFST,11];
*Check for all '1' data; zero storage.
*NOTE: Storage must be zeroed.
imFS4:
        T _ WBuf3 _ 0C, Call[imBlockSet1], At[imFST,12];
        Call[imForwardSweep], At[imFST,13];
        LU _ (RBuf0) xnor (0C), GoTo[imAllOneTest], At[imFST,14];

imClearLogSE:
*Clear the 'Return on MC2 errors' bit in FFault.
        RLink0 _ IP[FFault]C, Call[imSetStkP], At[imFST,15];
        Stack _ (Stack) and not (100000C);
        T _ (PageCount) - 1;
imClearLogSELoop:
        MapAddr _ T, GoTo[imDone,Carry'];
*Exchange garbage with the contents of the map entry at MapAddr.
        Call[imWriteMap];
*Rewrite map entry with LogSE turned off.
        XMBuf0 _ T;
        XMBuf0 _ LdF[XMBuf0,1,17], Call[imWriteMap];
        T _ (MapAddr) - 1, GoTo[imClearLogSELoop];

imChkB1Test:
        LU _ (RBuf1) xor (100000C), Skip[ALU=0];
        GoTo[imQWBad];
        LU _ (RBuf2) xor (100000C), Skip[ALU=0];
        GoTo[imQWBad];
        LU _ (RBuf3) xor (1C), Skip[ALU=0];
imQWBad1:
        GoTo[imQWBad];
imTestData:
        Skip[ALU#0];
        Return;
imQWBad:
        Transient _ 40000C;
        ZWord _ (ZWord) or not (0C), Return;

imAllOneTest:
        LU _ (RBuf1) xnor (0C), Skip[ALU=0];
        GoTo[imQWBad];
        LU _ (RBuf2) xnor (0C), Skip[ALU=0];
        GoTo[imQWBad];
        LU _ (RBuf3) xnor (0C), DblGoTo[imTestData,imQWBad1,ALU=0];

imAllZeroTest:
        T _ (RBuf1) or T;
        T _ (RBuf2) or T;
        LU _ (RBuf3) or T, GoTo[imTestData];

%Add the number of 1 bits in T to RTemp1 using RTemp as temporary storage.
Uses the fact that X and (-X) is equal to the right-most 1 in X. Called by
the algorithm which enumerates storage.
%
imCountErrs:
        RTemp _ T, LoadPage[opPage2];
        T _ (Zero) - T, GoToP[.+1];
OnPage[opPage2];
        T _ (RTemp) and T, GoTo[.+3,ALU=0];
        RTemp1 _ (RTemp1) + 1, LoadPage[InitialPage1];
        T _ RTemp _ (RTemp) and not T, LoadPage[opPage2], GoToP[.-3];
        Return;

OnPage[InitialPage1];

%If any errors were detected, whether or not they were ignored, show an MP
code (400 + 1, 2, 4, 8, 16, 32, 64, and/or 128) to indicate which banks
have problems and dally a little while to let the user view the MP.
Then show the number of pages with uncorrectable failures; then show the
number of pages with correctable failures; finally, show the Blk and
Syndrome values of the last error correction fault.

As a keyboard boot option, loop the map and storage tests until some
failure is detected; then loop the MP display of the four error codes.

Otherwise, if 'too many' pages were discarded because of correctable
failures, repeat testing with SoftQThreshold .eq. BadQThreshold. 'Too many'
is defined as
        ((PageCount .ls. MinGoodPages) & (SoftBadPages .ne. 0)
         OR (PageCount .ls. EnoughGoodPages) &
            (SoftBadPages > (PageCount rsh 3)))
        & (SoftQThreshold .eq. 0).
%
Set[RefConstant,Add[50017,LShift[RShift[RefreshPeriod,6],4]]];

imDone:
        LU _ StorageFaults;
        StorageFaults _ (StorageFaults) + (HiA[TestingStorage]), GoTo[imDone0,ALU#0];
*No errors were detected; determine whether to loop or finish.
        LU _ (BootType) xor (400C);
        RBuf0 _ HiA[RefConstant], GoTo[imDone2,ALU#0];
        MapAddr _ 140000C, GoTo[imMapTest];

imDone0:
        LoadPage[opPage2];      *Failures were detected.
        LU _ (BootType) xor (400C), GoToP[.+1];
OnPage[opPage2];
*Check for less conservative repeat.
        LU _ SoftQThreshold, Skip[ALU#0];
        GoTo[imDone1];          *No repeats if looping until failure
        LU _ SoftBadPages, Skip[ALU=0];
        GoTo[imDone1];          *No repeat if we already repeated
        ZWord _ HiA[MinGoodPages], Skip[ALU#0];
        GoTo[imDone1];          *No repeat if no soft bad pages
        T _ (ZWord) or (LoA[MinGoodPages]);
        LU _ (PageCount) - T;
        LU _ (PageCount) - (EnoughGoodPages), Skip[ALU>=0];
*Repeat maximally aggressively if PageCount < MinGoodPages
        SoftQThreshold _ 100C, GoTo[imRepeat];
        T _ RSh[PageCount,3], Skip[ALU<0];
        GoTo[imDone1];          *No repeat if enough good pages
*No repeat if bad pages .ls. 1/16 to 1/8 of good pages
        LU _ (SoftBadPages) - T - 1;
        SoftQThreshold _ BadQThreshold, GoTo[imRepeat,ALU>=0];
*Have failure(s) and not repeating.
imDone1:
*Show the 400 + sum(2^board) MP value.
        T _ (StorageFaults) + (LoA[TestingStorage]), Call[imMPDelay];
*Show the count of pages with uncorrectable errors.
        T _ HardBadPages, Call[imMPDelay];
*Show the count of pages with correctable errors.
        T _ SoftBadPages, Call[imMPDelay];
*Show Blk.0 and 1 (high true) and the relevant 8-bit syndrome.
        RLink0 _ IP[xStorageFaults]C;
        StkP _ RLink0;
        T _ Stack, Call[imShowOctal];
        LU _ (BootType) xor (400C);
        Skip[ALU#0];
        GoTo[imDone1];          *Loop the MP failure display
        LoadPage[InitialPage1];
        RBuf0 _ HiA[RefConstant], GoToP[imDone2];

imRepeat:
        LoadPage[InitialPage1];
        GoToP[imRepeatStorageTest];

*Convert octal number in T into the decimal number which will
*have the correct octal characters when shown on the MP.
*Then show the number on the MP. Smashes RBuf0-3 and RLink0.
imShowOctal:
        RBuf2 _ T, UseCTask;    *Save octal value.
        T _ APCTask&APC;
        RLink0 _ T;             *Save procedure return
        T _ LdF[RBuf2,15,3];
        RBuf0 _ T;              *Accumulate result in RBuf0.
        RBuf1 _ HiA[1750];      *1000d = 1750b
        RBuf1 _ (RBuf1) or (LoA[1750]);
        T _ LdF[RBuf2,4,3], Call[imMulAdd];
        RBuf1 _ 144C;           *100d = 144b
        T _ LdF[RBuf2,7,3], Call[imMulAdd];
        RBuf1 _ 12C;            *10d = 12b
        T _ LdF[RBuf2,12,3], Call[imMulAdd];
        LoadPage[0], GoToP[imMPDelay1];

imMulAdd:
        RBuf3 _ T;
imMulAddLp:
        T _ RBuf1, Skip[ALU#0];
        Return;
        RBuf0 _ (RBuf0) + T;
        RBuf3 _ (RBuf3) - 1, GoTo[imMulAddLp];

%Loop to delay for a period of time so that user can notice warning on
maintenance panel. imMPDelay delays for
(OuterLoopCount+2)*(2*InnerLoopCount+13)+5 cycles and then returns. Values
below make the total delay about 32,514 x 311 + 5 cycles = 1.01 sec with
40 MHz crystal.
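
Illustration only (not microcode, and not part of the original program): a
minimal Python check of the delay arithmetic above. The loop constants are
the ones defined just below; the 100 ns cycle time is an ASSUMPTION inferred
from the stated 1.01 sec total, since the text itself mentions only the
40 MHz crystal. The function name mp_delay_cycles is hypothetical.

```python
OUTER_LOOP_COUNT = 0o77400   # MC[OuterLoopCount,77400] = 32,512 decimal
INNER_LOOP_COUNT = 0o225     # MC[InnerLoopCount,225]   = 149 decimal
CYCLE_SECONDS = 100e-9       # ASSUMPTION: one cycle takes 100 ns


def mp_delay_cycles(outer, inner):
    """Cycle count from the formula in the comment above."""
    return (outer + 2) * (2 * inner + 13) + 5


cycles = mp_delay_cycles(OUTER_LOOP_COUNT, INNER_LOOP_COUNT)
seconds = cycles * CYCLE_SECONDS
```

Under that assumption the inner factor is 2*149 + 13 = 311 cycles and the
total is 32,514 x 311 + 5 = 10,111,859 cycles, or about 1.01 sec.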
%
MC[OuterLoopCount,77400];       *= 32,512d
MC[InnerLoopCount,225];         *This makes 2*InnerLoopCount+13 = 311

imMPDelay:
        RBuf0 _ T, UseCTask;
        T _ APCTask&APC;
        RLink0 _ T, LoadPage[0];
imMPDelay1:
        T _ RBuf0, CallP[PNIP];
        RBuf1 _ OuterLoopCount;
imSweepDelay:
        RBuf0 _ InnerLoopCount, Call[imMPD1];
        RBuf1 _ (RBuf1) - 1, Skip[R<0];
        GoTo[.-2];
        APCTask&APC _ RLink0, GoTo[imMPDRet];

*This subroutine returns after 2*RBuf0 + 7 cycles.
imMPD1:
        RBuf0 _ (RBuf0) - 1, GoTo[.,R>=0];
imMPDRet:
        Return;

OnPage[InitialPage1];

imRL0Ret:
        APCTask&APC _ RLink0;
imRet:
        Return;

imSaveRL0:
        RLink0 _ T, Return;

%Change timer period from its current slowed value to the standard value
used by the emulators--see comment in Initial.Mc. Then, if PageCount .ls.
MinGoodPages (.eq. 256d pages or 64k words), crash with the
NotEnoughMemory MP code.
%
imDone2:
        RLink0 _ IP[RTimer]C, Call[imSetStkP];
        T _ (RBuf0) or (LoA[RefConstant]), Call[imPush];
        ZWord _ HiA[MinGoodPages];
        RLink0 _ Sub[IP[xPageCount],1]C, Call[imSetStkP];
*Save test results in high RM registers where emulator can avoid
*smashing them more easily.
        T _ PageCount, Call[imPush];
        T _ StorageFaults _ (StorageFaults) - (HiA[TestingStorage]), Call[imPush];
        T _ HardBadPages, Call[imPush];
        T _ SoftBadPages, Call[imPush];
        T _ (ZWord) or (LoA[MinGoodPages]);
        LU _ (PageCount) - T, LoadPage[InitialPage];
        T _ NotEnoughMemory, DblGoTo[InitFail,MemInitDone,ALU<0];

*Make the fault handler send control to MC2ErrRet0 on a fault.
*(This is below the label "imTloop".)
imSetFaultRet:
        RLink1 _ (RLink1) or (HiA[MC2ErrRet0]);
        RLink0 _ IP[RLink1]C;
imSetStkP:
        StkP _ RLink0, Return;

imPush:
        Stack&+1 _ T, Return;

imInitForwardPage:
        ZWord _ 0C, Skip;
imInitBackPage:
        ZWord _ 374C;
        XMAdLo _ T;
        Transient _ 100000C, Return;

imBSNewPage:
        Transient _ LdF[Transient,1,17], Skip[R<0];
        UseCTask, Call[imMarkSector];   *One or more failures
        T _ SoftQThreshold;
        LU _ (Transient) - T - 1;
        Skip[ALU<0];
*Too many failures or uncorrectable failure
        UseCTask, Call[imPageBad];
        ZPage _ (ZPage) - 1, GoTo[imBSBegin];

imBackSweep:
        T _ (PageCount) - 1;
        ZPage _ T, UseCTask;
*RLink0 _ where to go to compare data; RLink0+1 is where to go
*at end of sweep.
        T _ APCTask&APC, Call[imSaveRL0];
imBSBegin:
        T _ LHMask[ZPage], Skip[R>=0];
        RLink0 _ (RLink0) + 1, GoTo[imRL0Ret];
        XMAdHi _ T;
        T _ LSh[ZPage,10], Call[imInitBackPage];
*Return here after each successful data compare
        T _ ZWord, GoTo[imBSNewPage,R<0];
        PFetch4[XMAdLo,RBuf0];
        ZWord _ (ZWord) - (4C);
imFSDisp:
        APCTask&APC _ RLink0;
        PStore4[XMAdLo,WBuf0], Return;

*MC2 fault return
        APCTask&APC _ RLink0, At[MC2ErrRet1];
        Transient _ (LdF[Transient,1,17]) + 1, Return;

imForwardSweep:
        UseCTask;
*RLink0 _ where to go to compare data; RLink0+1 is where to go
*at end of sweep.
        T _ APCTask&APC, Call[imSaveRL0];
        ZPage _ 0C, GoTo[imFSBegin];

imFSNewPage:
        Transient _ LdF[Transient,1,17], Skip[R<0];
        UseCTask, Call[imMarkSector];   *One or more failures
        T _ SoftQThreshold;
        LU _ (Transient) - T - 1;
        ZPage _ (ZPage) + 1, Skip[ALU<0];
*Too many failures or uncorrectable one; since the page at the
*last good map entry will be substituted for the one removed,
*continue testing at the same page.
        ZPage _ (ZPage) - 1, UseCTask, Call[imPageBad];
imFSBegin:
        T _ (PageCount) - 1;
        LU _ (ZPage) - T;
        T _ LHMask[ZPage], Skip[ALU<0];
        RLink0 _ (RLink0) + 1, GoTo[imRL0Ret];
        XMAdHi _ T;
        T _ LSh[ZPage,10], Call[imInitForwardPage];
*Return here after data compare for each word of page.
        LU _ (ZWord) + (177400C);
        T _ ZWord, GoTo[imFSNewPage,Carry];
        PFetch4[XMAdLo,RBuf0];
        ZWord _ (ZWord) + (4C), GoTo[imFSDisp];

%Data on the page cannot be read correctly, or more than BadQThreshold
correctable errors occurred on the page during one of the sweeps. ZPage is
the map entry affected; UseCTask in calling mi.
%
imPageBad:
        T _ APCTask&APC, Call[imSaveRL2];
*Exchange Vacant with the map entry at PageCount-1.
        XMBuf0 _ 60000C;
        T _ PageCount _ (PageCount) - 1, Call[imWriteMapT];
*Exchange the value obtained from the map entry at PageCount-1
*with the contents of the map entry at ZPage.
        XMBuf0 _ T;
        T _ ZPage, Call[imWriteMapT];
imRL2Ret:
        APCTask&APC _ RLink2, GoTo[imRet];

imSaveRL2:
        RLink2 _ T, Return;

%Mark the 128k word sector in StorageFaults. NOTE: imMarkSector must be
called before imPageBad. ZPage is the map entry affected; UseCTask in
calling mi.
%
imMarkSector:
        T _ APCTask&APC, Call[imSaveRL2];
*Exchange garbage with the contents of the map entry at ZPage.
        T _ ZPage, Call[imWriteMapT];
*Rewrite the map entry.
        XMBuf0 _ T, Call[imWriteMap];
*'OR' 1 LShift[128k-word region number] into StorageFaults.
        XMBuf1 _ 100000C;
        XMBuf0 _ LdF[XMBuf0,4,3];
        XMBuf0 _ (XMBuf0) - 1;
        T _ XMBuf1 _ LCy[XMBuf1,1], GoTo[.-1,ALU>=0];
%Only count bad pages during the first pass of the storage test when any
failure is treated as a bad page. During the second pass, pages already
counted in SoftBadPages will be retested and may be counted again, screwing
up the statistics.
%
        LU _ SoftQThreshold;
        LU _ (Transient) - (100C), Skip[ALU=0];
        StorageFaults _ (StorageFaults) or T, GoTo[imRL2Ret];
        StorageFaults _ (StorageFaults) or T, Skip[ALU<0];
        SoftBadPages _ (SoftBadPages) + 1, GoTo[imRL2Ret];
        HardBadPages _ (HardBadPages) + 1, GoTo[imRL2Ret];

*imBlockSet1 puts T into WBuf0-2.
imBlockSet1:
        WBuf2 _ T;
imBlockSet2:
        WBuf1 _ T;
        WBuf0 _ T, UseCTask;
        T _ APCTask&APC, Call[imSaveRL0];
%Now delay about twice the refresh period to allow leaky RAMs to lose
charge.
During this delay, the only storage references are those by the Timer
task's Refresh; at other times, ordinary storage references refresh all
words in the referenced row except the one being read. Since the refresh
period is 3840*32 = 122880 cycles, the required constant here is
(245760-1003)/311 = 787 = 1423b.
%
        RBuf1 _ HiA[1423];
        LoadPage[opPage2];
        RBuf1 _ (RBuf1) or (LoA[1423]), GoToP[imSweepDelay];

:END[MemInit];