Recovery from Checkdsk Errors          Maxc Operations          50

14. RECOVERY FROM CHECKDSK ERRORS

As explained in Section 6, when Tenex is restarted, whether initially or due to an auto-restart after a crash, the programs Bsys and Checkdsk are run to verify the consistency of the file system. If either of these programs detects errors that cannot be corrected automatically, the message "Tenex not available: Disk needs fixing" is broadcast to all terminals instead of the usual "Tenex in operation" message, and logins are prohibited from all terminals except the two in the Maxc room.

The following procedures require wheel or operator status and are intended principally for reference by system personnel. With some assistance from a user in the Maxc room, a system maintainer can perform these procedures from a home terminal. Only in extreme circumstances should non-system personnel attempt any of these procedures.

Errors detected by Bsys (which are usually reflected by some further errors detected by Checkdsk) indicate inconsistencies in the structure of user file directories. Fixing these requires a fairly intimate knowledge of the Tenex directory structure; this should be left to system personnel. Information about the structure of directories may be found in the Tenex Monitor Manual, section VII, pages 2-5. Other helpful information is available in the Bsys manual, pages 33-36. Copies of both these documents are kept in the Maxc room book case.

Checkdsk errors come in a number of guises. For each file with errors, data will be printed as in the following example:

(NOTE: Before looking at Checkdsk errors you may have to do a "Diablo Copy ON" in the upper AltIO window to get hard copy output of the error messages; then Boot Micro-Exec, Start Tenex.)

    MESSAGE.COPY;3              Filename
    40050172166 MDA 0           } List of errors
    140050172170 MDA 63         }
    1 PTE                       } Error
    2 MDA                       } summary

If there are many errors in a single file, Checkdsk will print out only the first few, followed by the summary.
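
When many files are affected, it can help to tally the printed error lines by category before deciding how to proceed. The following Python fragment is a minimal sketch of such a tally, not part of Checkdsk or Tenex. It assumes that each error line has the shape shown in the example above, with the category code as the second field; the function name summarize is hypothetical.

```python
from collections import Counter


def summarize(error_lines):
    """Count error categories in lines shaped like '<disk address> <code> <number>'.

    Assumption: the category code (MDA, IDA, PTE) is the second
    whitespace-separated field, as in the example listing.
    """
    counts = Counter()
    for line in error_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or unrecognized lines
        counts[fields[1]] += 1
    return counts


# The two error lines from the example listing, copied as printed.
listing = ["40050172166 MDA 0", "140050172170 MDA 63"]
for code, n in sorted(summarize(listing).items()):
    print(n, code)
```

Because Checkdsk prints only the first few errors for a file, a tally of the printed lines can be smaller than the counts in the error summary; treat the summary as authoritative.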
Study the output carefully.

First, note that "NOT IN BT" errors have been corrected by Checkdsk, so don't worry about them. If these were the only errors that occurred, Checkdsk wouldn't have complained and the system would have flown on.

The other kinds of errors reported by Checkdsk are more serious:

    MDA    Multiply-assigned disk address
    IDA    Illegal disk address
    PTE    Page table error

MDA errors are the only ones that cause Tenex to prohibit users from logging in, since further file activity is likely to make the damage spread.

Note that you will have to use some judgment in discriminating between garbaged page tables and real MDA errors. A file with a garbaged page table will have an enormous error count (in the hundreds) with many categories of errors (IDA, MDA, PTE, etc.). This is frequently caused by an untimely Tenex crash occurring between directory update and page table update during new file creation, so that the page table for the file will not have been written on the disk yet and whatever was on that page before will be interpreted as a page table. This type of error may result in many other files getting bogus MDA errors because some of the entries in the garbaged page table look like valid disk addresses that happen already to be assigned.

For further confirmation that the problem is a garbaged (unwritten) page table, a QFD of the filename should reveal that it was created within a minute or two before the time of the last system crash. Such a file should be deleted using the following procedure:

    @ENABLE password
    !CONNECT
    !DELETE
    !EXPUNGE
    !

This procedure causes the bad file to be expunged from the directory. A number of valid addresses possibly in use by other files may be deallocated, but don't worry about this.
The system will generate a number of BUGCHKs for illegal disk addresses, but don't worry about this either. (Be sure DCHKSW is set to zero, however, to prevent the system from breakpointing on these errors.)

Run Checkdsk again after performing this surgery to make sure you did it right and that there is nothing else wrong. Checkdsk will reallocate pages incorrectly deallocated by the preceding procedure and will type out "NOT IN BT" for these.

    !CONNECT SYSTEM
    !CHECKDSK
    REBUILD BIT TABLE? N
    SCAN FOR DISK ADDRESSES? N

(This currently takes about 15 minutes.)

After all files with garbaged page tables have been eliminated (if there were any), any further errors are considerably more serious, particularly MDA errors. MDA stands for multiply-assigned disk address, meaning that a particular page has somehow been assigned to more than one file. For each such error, Checkdsk has printed out the second file owning the page that it encountered in its scan of the file system; you do not yet know the name of the other owner of that page. Hence you should follow this procedure:

    !CONNECT
    !COPY GARBAGE
    !DELETE
    !EXPUNGE
    !RENAME GARBAGE

Repeat this procedure for all affected files. You should be careful to type the filename in full, including version number, so you don't mistakenly fix up the wrong file.

Next, re-run Checkdsk as explained above. While running, Checkdsk will type out a number of NOT IN BT errors whose disk addresses correspond to the disk addresses in the original MDA error printouts; the filenames typed out will be those of the other owners of the pages that were multiply assigned. It may not be obvious which owner of a page has the correct copy (a QFD of the filenames will include the write dates, which may give some indication; i.e.
the file with the newer write date is more likely to have the correct data), but you have done the best that can be done by giving non-conflicting copies to everybody involved. Use SNDMSG to notify all users who have (potentially) lost files. Both the original MDA and the final NOT IN BT files are involved in the loss.

When you have pieced the file system back together to what you believe is a reasonable state, you should open the system to users by the following procedure:

    !QUIT
    ./FACTSW[ 500100,,0 400100,,0
    ABORT.^
    !LOGOUT
    LOGOUT JOB ......

After you type control-C, the auto-jobs should start logging in, and shortly thereafter "Tenex in operation" will be broadcast.

MaxcOps14.Bravo          RWeaver.PA          April 20, 1984 1:59 PM