Building the Cedar TeX82
To        The TeX wizard        Date    February 14, 1984
From    Lyle Ramshaw        Location    PARC
Subject    Building TeX in Cedar    Organization    CSL
XEROX
Filed on    BuildingTeX.tioga  in  Tex.df
Last edited    by Ramshaw.pa, February 15, 1984 9:23:46 pm PST
Abstract    TeX82 is a Pascal program that Michael Plass and Lyle Ramshaw succeeded in porting into Cedar.  This memo describes 
some of the funny things that we did and why we did them.  After a Cedar programmer reads this memo, she should be able to rebuild 
TeX from the sources, taking it step by step through all three of the compilers involved.
The big picture
TeX is distributed from Stanford as a source file in the Web language.  Web is a programming system built by Prof. Donald E. Knuth 
of Stanford University expressly for the purpose of producing a TeX implementation of publishable quality.  A Web source file 
consists of executable code fragments in Pascal and documentation fragments in TeX, combined with macro definitions of various 
flavors.  This source file is operated on by two translators:  Tangle takes it, evaluates all of the macros, throws away the 
documentation, and produces one huge and complete Pascal program;  Weave decorates the source file with still more typesetting 
stuff and produces a TeX source file for a beautiful listing of the program, complete with documentation and indices of various 
sorts.
As a programming language, Mesa is almost a superset of Pascal;  and the syntaxes of the two languages differ only in the details.  
Capitalizing on this, Edward M. McCreight wrote a Pascal-to-Mesa translator some years ago;  it takes Pascal programs, parses them 
by recursive descent, and outputs the source of an equivalent Mesa program.  I converted PasMesa to run in Cedar, and used the 
resulting PasMesa to port Knuth's Tangle processor.  Then, Michael Plass and I used PasMesa and Tangle to port TeX itself.  As of 
this writing, no one has yet bothered to port Weave, although it wouldn't be hard.
Thus, TeX goes through three compilers.  First, it goes through Tangle to be converted to Pascal.  Then, it goes through PasMesa to 
be converted to Mesa.  And finally, it goes through the Cedar compiler and binder to be converted into a runnable BCD.  At each 
step in the process, there are special files on the side that adjust how things are done and supply small changes.
The introductory documentation for TeX is in the file TeXDoc.tioga, available through TeX.df.    
It's not what you know, it's who you know.
Don Knuth <DEK@SU-AI> at Stanford wrote TeX.  But he is a busy man, and he doesn't know too much about the detailed hassles of 
porting TeX to other machines.  You probably want to ask your question of Dave Fuchs instead <DRF@SU-SCORE>.  He has guided 
innumerable people through ports of TeX, and he is generally very helpful.  If your question concerns fonts, either fonts for 
Dovers or TFM files, you might also be interested in speaking with Arthur Keller <ARK@SU-AI>, who maintains the Dover font 
dictionary at Stanford. 
Fine points about TeX itself
TeX's memory array.
TeX does all of its own storage allocation by working out of a big array.  That makes good sense from one point of view, since it 
would have been risky for Knuth to trust the storage allocators of many Pascal compilers.  But it is rather a pain from our point 
of view, since the Mesa world is not prepared to handle very large arrays.  The biggest block of storage that the normal allocator 
can construct is only 128 KBytes.  Thus, it was not possible to implement TeX's memory array as a Cedar array.  Instead, we took 
advantage of the ProcArray feature of Mesa (which I had implemented to handle a similar problem with Tangle).  When you ask for a 
ProcArray, PasMesa compiles the Pascal array of thingies into a Mesa inline procedure that takes the index type (or types) of the 
array as an argument (or arguments) and returns a long pointer to a thingy as result.  Every use of the array in either a fetch or 
a store gets a dereference tacked on by PasMesa as well, so that everything works.  TeX's main memory is an array of TexTypes.MemMax 
(currently 58,000) 32-bit thingies, implemented as a ProcArray named Mem.  The storage for the array is actually allocated by the 
start code of TexSysdepInlineImpl by calling VM.Allocate.  The body of the inline proc that does the array calculation is in the 
definitions module TexSysdepInline.
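As a rough illustration of the ProcArray idea (in Python rather than Mesa, and with hypothetical names throughout), the sketch below replaces a large array by an accessor procedure over a separately obtained block.  The 58,000-entry and 32-bit figures and the 128 KByte limit come from the text above;  MEM_MAX, Ref, and mem are invented for the example and are not the names that PasMesa generates.

```python
MEM_MAX = 58_000               # entries, per the memo (TexTypes.MemMax)
WORD_BYTES = 4                 # each entry is a 32-bit thingy
ALLOCATOR_LIMIT = 128 * 1024   # largest block from the normal allocator

# 58,000 * 4 = 232,000 bytes, which exceeds the 131,072-byte limit,
# so the array cannot be an ordinary array.
assert MEM_MAX * WORD_BYTES > ALLOCATOR_LIMIT

# Stand-in for the block obtained separately (VM.Allocate in the memo).
_block = [0] * MEM_MAX


class Ref:
    """Plays the role of the long pointer returned by the accessor."""

    def __init__(self, block, index):
        self.block = block
        self.index = index

    def get(self):
        return self.block[self.index]

    def set(self, value):
        self.block[self.index] = value


def mem(index):
    """The 'array' is really a procedure from index to reference."""
    return Ref(_block, index)


mem(7).set(42)        # a store: accessor call plus dereference
value = mem(7).get()  # a fetch: accessor call plus dereference
```

The point of the design is that the calling code still reads like an array access, while the storage itself can live wherever the start code chose to put it.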
By the way, we used inlines only very sparingly in the TeX port because of their tendency to exacerbate the problem of modules too 
big for the Mesa compiler.  The memory ProcArray and a few calls to SirPress for showing characters are the only inlines used.   In 
particular, the file is input via calls to the PascalRuntime that turn into calls on IO.GetChar, doing at least two xfers per 
character.  I have done essentially no performance work on TeX, so I don't know if this is a significant performance problem.       
  
Writing Press files.
The current TeX gives the user the choice of writing either DVI or Press file output.  I left the DVI stuff in just to make the 
TRIP torture test easier to perform;  I presume that everyone will actually use Press format, at least as long as the Dovers are 
our workhorse printers.  When the Imager comes along, TeX should probably be converted to it from its current dependence upon 
SirPress.  I wrote the Press output module for TeX by taking the DVI output module and replicating the procedures hlist_out and 
vlist_out into two pairs of procedures:  hlist_out and vlist_out for DVI output and hlist_press_out and vlist_press_out for Press.  The file 
TeX.changes includes a long change that changes the DVI output code from itself to itself;  this is so that any bug fixes in that 
code that might occur in later releases of TeX will cause error messages from Tangle, pointing out where the corresponding Press 
procedures might have to be changed.
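The itself-to-itself change works as a tripwire.  A minimal sketch of the mechanism follows, in Python and under a simplifying assumption:  changes are modeled here as plain string replacements, whereas the real Tangle matches change files against the Web source line by line.  The function names are hypothetical.

```python
def apply_change(source, old, new):
    """Replace the single occurrence of `old` in `source` with `new`.

    Raises ValueError when `old` does not match exactly once, which is
    the signal that the upstream text moved underneath the change file.
    """
    if source.count(old) != 1:
        raise ValueError("change did not match the source")
    return source.replace(old, new)


def upstream_changed(source, watched):
    """An identity change (old == new) edits nothing but must still match."""
    try:
        apply_change(source, watched, watched)
        return False
    except ValueError:
        return True
```

If a later release alters the watched text, the identity change stops matching and the failure points at the code whose Press twin needs a second look.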
In general, the Press code is just a simplification of the DVI code.  For example, the DVI code has to keep track of the stack of 
positions, while Press has no stack, just one global position.  The DVI code also has lots of fanciness in it for doing what is 
essentially register allocation on the DVI abstract machine;  none of that is relevant for Press.  There are two non-trivial parts 
to the Press code.  First, as an efficiency move, I chose to use SirPress pipes to handle the character output.  Thus, if TeX is 
walking along a horizontal list just outputting character boxes, spaces, and kerns, all of these commands will be stored in a pipe 
and dumped all at once by a call to ClosePipe when something else happens, such as a font change or a vertical move.  This adds a 
small amount of complexity to the code, from two directions:  I have to be careful to keep track of whether a pipe is open or not, 
and I have to do my own bounds checking to make sure that the pipe doesn't run off the end.  Since TeX is compiled with bounds 
checking off (it shouldn't need it, and the bounds checking code also exacerbates the problem of the compiler size limits), running 
off the end of a pipe inside of Cedar would crash the world instead of raising a signal.
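The two bookkeeping duties described above (tracking whether a pipe is open, and checking bounds by hand) can be sketched as follows.  This is Python with a hypothetical Pipe class, not the SirPress interface;  in particular, flushing and reopening when the pipe is full is an assumption about one reasonable policy, since the memo does not say what the real code does at the limit.

```python
class Pipe:
    """Hypothetical stand-in for a fixed-capacity output pipe."""

    def __init__(self, capacity, sink):
        self.capacity = capacity
        self.sink = sink      # list that receives each flushed batch
        self.items = []
        self.is_open = False

    def put(self, command):
        # Explicit bounds check: flush before the buffer would overflow.
        if len(self.items) >= self.capacity:
            self.close()
        self.is_open = True
        self.items.append(command)

    def close(self):
        # Dump everything batched so far in one go.
        if self.is_open and self.items:
            self.sink.append(list(self.items))
        self.items.clear()
        self.is_open = False


out = []
pipe = Pipe(capacity=3, sink=out)
for ch in "TeX!":
    pipe.put(ch)
pipe.close()   # e.g. on a font change or a vertical move
```

With bounds checking compiled out, the explicit test in put is the only thing standing between a long run of characters and a write past the end of the buffer.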
The other non-trivial part of the Press output code is the units conversion.  SirPress tries to be very nice about units, working 
in a very small unit and letting you specify your conversion factor.  But pipes are different:  with them, you must use micas.  
Hence, I just call TeX's own arithmetic procedures and do the conversion to micas before calling SirPress.  This appears in the 
code as multiplication by an unexplained and mysterious-looking rational number;  that number is the closest approximation to the 
correct conversion factor between scaled points and micas for which the arithmetic doesn't overflow.
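For readers who want to see where such a number could come from, here is a hedged Python sketch.  It assumes 65,536 scaled points per point, 72.27 points per inch, and 2,540 micas per inch, and it assumes that both terms of the fraction must stay at or below 2^16 (the limit that TeX's xn_over_d routine places on its multiplier and divisor).  The memo does not say which routine or which number the code actually uses, so the fraction computed here is illustrative only.

```python
from fractions import Fraction

SP_PER_PT = 65536                   # TeX scaled points per point
PT_PER_INCH = Fraction(7227, 100)   # TeX points per inch
MICAS_PER_INCH = 2540               # a mica is 10 micrometres

# Exact micas per scaled point.
EXACT = Fraction(MICAS_PER_INCH) / (PT_PER_INCH * SP_PER_PT)


def approx_factor(max_term):
    """Closest fraction to EXACT whose denominator is <= max_term.

    EXACT is less than 1, so the numerator stays below the denominator.
    """
    return EXACT.limit_denominator(max_term)


def sp_to_micas(sp, factor):
    """Integer conversion, rounded to nearest, for nonnegative sp."""
    n, d = factor.numerator, factor.denominator
    return (sp * n + d // 2) // d


factor = approx_factor(2 ** 16)
ten_points = sp_to_micas(10 * SP_PER_PT, factor)
```

Under these assumptions a length of ten points (655,360 scaled points) converts to 351 micas.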
While we are on the topic of units, I should mention that TeX seems to be getting the mica sizes of some fonts off by one from what 
Spruce thinks they are---in particular, the five point fonts at magstephalf.  Spruce won't bitch about an off-by-one-mica request 
for a ten point font, but, by the time you get down to five points, it will bitch (the error threshold is a percentage of the font 
size).  I vaguely recall that I used to get lots more of these font substitution messages, and that I put an extra plus one-half in 
on the call to SirPress.GetFontPipeCode and fixed lots of them.  But I haven't gotten things exactly right yet.  Exactly right, in 
this context, means that TeX does the calculation the same way that the Sail Metafont did when it wrote the OC's.  I'm not sure, by 
the way, whether the Sail Metafont used 1.095 as the magnification for magstephalf or a more accurate approximation to the square 
root of 1.2;  it might make a difference, sad to say.  I wonder why they don't get these messages at Stanford.  You might ask Dave 
Fuchs what their DVI-to-Press converter does for font sizes, if the fix doesn't look obvious. 
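The following bit of arithmetic (Python, offered as an illustration and not as a diagnosis) shows that the choice of magnification can matter at five points.  It assumes 72.27 points per inch, 2,540 micas per inch, and rounding to the nearest mica;  the memo confirms none of these for Spruce or for the Sail Metafont.

```python
import math

MICAS_PER_PT = 2540 / 72.27   # assumes TeX points, 72.27 to the inch


def font_micas(design_pt, mag):
    """Font size in micas after magnification, rounded to nearest."""
    return round(design_pt * mag * MICAS_PER_PT)


def compare(design_pt):
    """Sizes under the two candidate values for magstephalf."""
    return font_micas(design_pt, 1.095), font_micas(design_pt, math.sqrt(1.2))


five = compare(5)
ten = compare(10)
```

Under these assumptions a five point font comes out as 192 micas with 1.095 but 193 micas with the square root of 1.2, while a ten point font comes out as 385 micas either way.  One mica is also roughly twice as large a fraction of a five point font as of a ten point font, which fits the percentage threshold mentioned above.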
Operational hints
Feature changes in TeX.
Suppose that you have fixed a bug in TeX, and you want to release a new one.  But suppose that Tangle and Cedar and PasMesa and the 
other parts of the runtime support system are working and haven't changed.  Then, here is what you do.  First, you type ``Tangle 
TeX''.  This tells Tangle to read TeX.web and TeX.changes and do its processing.  Remember that TeX.web is a big file;  if Tangle 
seems to take a long time getting started, it might be that FS is busy flushing your disk cache in order to free up space for 
TeX.web.  Once Tangle gets started, it plugs along at a respectable rate.  I changed the error messages so that they report errors 
by character number, so that should help in tracing down bugs that Tangle reports.  The abbreviated module names that Knuth 
insisted on using are one frequent source of problems here.  I suspect that Tangle reports at most one error message per pass;  in 
any case, don't be surprised if you fix the one error that it reports and then it stumbles across some totally unrelated error on 
the next try.
When Tangle is done, the next processor that you want to run is PasMesa, and the command is ``PasMesa TeX.mod''.  Be warned, 
however, that PasMesa is quite a hog of virtual memory;  it allocates many, many short ropes and long ropes, holding onto some of 
them well beyond their useful lifetime because of the plethora of global variables in this papered-over Alto Mesa program.  Thus, I 
suggest that you Rollback just before and just after running PasMesa.  The elapsed time will be less if you include the Rollbacks, 
I assure you.
PasMesa starts by reading the file ``TeX.mod'', and gets the rest of its instructions from there.  It ends by writing out many Mesa 
source files.  I haven't ever tried running PasMesa in a working directory, so there might be some problems in that area;  be 
careful.  In the unlikely event that PasMesa finds a bug in TeX.pas, it will probably have something to do with an undeclared 
variable of some kind.  Perhaps you put in a change that calls an external procedure and forgot to declare the external procedure, 
for example.  PasMesa reports one error at a time.
When the Rollback returns, you are ready to put the resulting Mesa modules through the compiler and binder.  PasMesa has written a 
command file named ``CompileTeX.cm'' that will do just the right thing.  This is the longest step of the process, and takes roughly 
15 minutes.  I stopped working on the performance of PasMesa as soon as I got it to run at least as fast as the Mesa compiler, so 
that it was no longer the bottleneck.
A correct compile of TeX includes no errors, of course, but does include several warning messages.  TexScanImpl will report a 
warning of a signed/unsigned ambiguity.  I looked at the types in that expression with great care, and I'll be damned if I can 
figure out why the compiler is getting confused.  Russ Atkinson couldn't figure it out either.  But we looked at the code that is 
generated, and the compiler is doing the right thing.  I recommend that you just tolerate this warning.
Next, there are three modules that will report one warning each for unreachable code:  TexRest2Impl, TexFinalizeImpl, and 
TexMainControlImpl.  The first two arise because the TeX that you are making is an IniTeX.  There are two things that IniTeX can do 
that regular TeX's can't:  build the hyphenation trie and do a \dump.  In each case, the code for IniTeX has an unreachable error 
message in it;  a vanilla TeX would hit the error message instead, and it would report that only IniTeX can do what you ask.  The 
third instance of unreachable code is more subtle, and arises from PasMesa's clever way of translating goto's.  The block in the 
procedure MainControl ends with an unconditional goto;  that is to say, control never exits this block by falling off the bottom.  
But PasMesa has inserted an EXIT into the Mesa code to arrange that control paths that fall off the bottom will get out correctly 
past all of the intermediate blocks that are handling the Pascal gotos.  It is this EXIT that is unreachable code.  Life is hard.
There is one more module with warnings, four of them this time;  but they're very minor.  The module TexExtensionsImpl declares 
four unused variables on purpose so that people who are trying to debug with a Pascal debugger can store integers somewhere.  The 
Mesa compiler quite correctly notices that these variables are unused.  These warnings would go away if TeX's debug switch were 
turned off, which probably wouldn't hurt anything.  The code that it enables is essentially useless in comparison with the Cedar 
debugger. 
The bind of TeX is next, and has no surprises.  When the bind is done, you should have a runnable TeX.bcd, ready to try out.  There 
are two auxiliary files to worry about as well:  TeX.pool and Plain.fmt.  TeX.pool is handed out with TeX, since TeX isn't really 
ready to start reading user profiles and looking around for things on remote servers before the strings are working.  Tangle 
produced a new TeX.pool, and your new TeX will access it by its short name, and all should be well.  When you SModel TeX.df, this 
TeX.pool will go out along with TeX.bcd for other folk to retrieve via their Bringover.
The other basic file is Plain.fmt, and that one works a little differently.  Format files are generally referenced on a remote 
server.  You might get into trouble if you have made any changes that would invalidate the old format file.  If so, and you try and 
test your new TeX by typing ``TeX story'', or something like that, TeX will try and load the default format file, which will be the 
old one (because you haven't SModel'ed a new one yet).  Thus, early on in your testing, you should produce a new format file by 
typing ``IniTeX plain \dump''.  And then reference that format file instead of the default one by trying out the story with ``TeX 
&plain story''.  At least, this is what I did.  You might be able to devise a better procedure now that the default format is set 
in your user profile;  just changing your profile to point to the local Plain.fmt as the default might work.  Note that you 
probably shouldn't put Plain.fmt out on Indigo until you are really ready to SModel the new TeX, because other folk are referencing 
Plain.fmt without an explicit version stamp (to avoid polluting their working directories with the short name Plain.fmt;  if this 
gets to be a problem, change TeX.df to export Plain.fmt and to hell with working directories).
Once you have found a bug, you have to decide how much of this tedious loop you have to go back around in order to fix it and 
continue debugging.  If you are lucky, your bug was in TexSysdepImpl.  You can recompile just that module and rebind TeX, and you 
are back in business.  If your bug is anywhere else, you almost certainly have to run PasMesa again.  And, if the bug involved 
TeX.changes, Tangle as well.  In either of these latter cases, you'll have time for a pleasant coffee break.  It is somewhat 
annoying, by the way, to have to watch your machine for the better part of an hour just in order to type short command lines and do 
rollbacks.  Back in Cedar 4.4, I wrote a program called RollBackHack;  it registered a proc to be called after Rollback that looked 
at a text file of ropes (called ``Com.cm'' for sentimental historical reasons).  If Com.cm had any ropes in it, the Rollback proc 
would take the first one off, write the rest back onto Com.cm, then get a UserExecutive, and hand the rope to that executive.  The 
net result was that I didn't have to babysit the three compilers.  (The Rollback proc was careful to wait for 30 seconds or so 
before starting off, so that I could intervene in the process if necessary!)  I did not update this hack to Cedar 5.0 because it 
depended on the UserExecutive session log stuff:  when you return to your Dorado after an hour, you want to see the typescripts of 
all three compiles in order to check for error messages.  With work, of course, a Cedar 5.0 version of RollBackHack could implement 
session logging on its own.  
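The queue handling at the heart of RollBackHack is simple enough to sketch.  The version below is Python, not Cedar, and it models only the step that takes the first command off Com.cm and writes the rest back;  the function name is hypothetical, and the 30 second delay, the Rollback registration, and the hand-off to an executive are all left out.

```python
from pathlib import Path


def pop_next_command(queue_file):
    """Remove and return the first pending command line, or None if empty."""
    path = Path(queue_file)
    if not path.exists():
        return None
    # Ignore blank lines so a trailing newline does not count as a command.
    lines = [ln for ln in path.read_text().splitlines() if ln.strip()]
    if not lines:
        return None
    first, rest = lines[0], lines[1:]
    # Write the remainder back so the next Rollback picks up where we left off.
    path.write_text("".join(ln + "\n" for ln in rest))
    return first
```

Because the remaining commands are written back before the first one runs, the queue survives each Rollback, which is what lets one file drive all three compilers in turn.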
A new release of Cedar.
Things are a little bit more complicated when there is a new release of Cedar, especially if interfaces have changed.  Remember 
that Tangle is a bootstrap processor;  think twice before doing anything to it or deleting any version of it.  You must have a 
relatively competent Tangle that you can run in the current Cedar before you can work on fixing bugs in any Web program, including 
Tangle itself.  Fortunately, the worst problems in this area are probably over, since Tangle itself is likely to be very stable 
from now on;  the trickiest bootstrap occurred back in September of 1983, when I wanted to build, in a new Cedar, a new Tangle for 
which the format of change files had changed.
Back to the subject of a Cedar release.  The first thing to do is to convert PasMesa.  This should be old hat, considering how much 
converting of old code to new releases of Cedar we all get to do.  Also, convert the PascalRuntime.  You will find this a more 
tedious job, because the PascalRuntime is in a pretty ugly state at the moment and because a runtime package has to have its 
fingers in many pies.
Note that the runtime services needed by a Pascal program have been divided up into various classes, with different interfaces for 
each class.  Given that there is no way in Mesa to bind up several interfaces into a bigger interface, I couldn't figure out any 
better way to proceed.  The issue is that different Pascal programs want to have different file systems under them, and some Pascal 
programs don't use Sets at all, while the rest of the runtime stuff is common to all Pascal programs.  Hence, there is a 
PascalBasic with the basic stuff, three  different file interfaces, and a Sets interface, along with implementations for each.  The 
PascalNoviceFiles package tries to be really nice to the novice programmer.  There is code to make text files avoid reading one 
character ahead (as most Pascal files do), so that terminal interaction can work correctly.  In Cedar 4.4, there was code to have 
the PascalOutput file appear in a viewer on your screen as well as in your file system, so that you could see your program at work; 
 but I fear that I didn't get that working in Cedar 5.0 (the Cedar 4.4 implementation used DribbleStreams, which went away).  
PascalWizardFiles is a much thinner layer on top of IO.STREAM;  this is more to the liking of big applications programs like TeX, 
which generally open files and the like by calling Cedar procedures that are declared as external to the Pascal program in any 
case.  PascalInlineFiles is an inline version of PascalWizardFiles.  Be warned that using the inline version will make the modules 
into which you have broken your Pascal program somewhat less likely to make it past the size limits of the Mesa compiler.
Having converted PasMesa and the PascalRuntime, turn your attention next to Tangle.  Start with Tangle.pas, which was carefully 
saved away in addition to Tangle.web and Tangle.bcd by the DF file.  Run Tangle.pas through PasMesa and the compiler to get (I 
hope) a working Tangle.  Then, run this Tangle over Tangle.web, and check (using Waterlily) that the resulting Tangle.pas file is 
identical to the one that you started with.  At this point, you can breathe a little easier;  and you can SModel the new Tangle.
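The check at the end of the bootstrap is a fixed-point test:  the new Tangle, run over Tangle.web, must reproduce the very Tangle.pas it was built from.  The memo uses Waterlily for the comparison;  the Python sketch below is only a stand-in that compares two files byte for byte, with a hypothetical function name and caller-supplied paths.

```python
import filecmp


def bootstrap_is_stable(saved_pas, regenerated_pas):
    """True when the two files are byte-for-byte identical.

    shallow=False forces a comparison of contents, not just of the
    size and modification time reported by the file system.
    """
    return filecmp.cmp(saved_pas, regenerated_pas, shallow=False)
```

Any difference at all means the new Tangle is not yet equivalent to the one that produced the saved Tangle.pas, and it should not be SModel'ed.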
You could jump right on to producing a new TeX, but it probably wouldn't be a bad idea to run through the four little programs that 
constitute the TexWare package first.  This will give you more practice before you hit the big time.  And, if you don't do them 
now, you will be tempted not to do them at all.  They should be easy.  Each of PoolType, DviType, PLtoTF, and TFtoPL has a separate 
DF file;   they each go through the three compilers in order in the obvious way.
Then take TeX through the three compilers as described in the last section.  And you're done.
A new release of TeX itself from Stanford.
All of the stuff from Stanford is described in the DF file TexWeb.df.  This DF file should only be SModel'ed if you have retrieved 
a new version of TeX82 from Stanford, probably over the ArpaNet.  If Maxc is still alive, the right place to go is the directory 
<tex.web> on SU-SCORE.  On that directory, you will find ``-read-.me'' and ``textap.cmd'';  the former describes what is in the 
various files in English while the latter tells you what directory they are on at SCORE.  Note that TexWeb.df has almost all of the 
same files as textap.cmd;  the only differences are:  (i) Tangle.pas and Tangle.changes aren't included, because we have already 
done the Tangle bootstrap, and (ii) the TFM's aren't included because they are included instead in /Indigo/Tioga/TFM/tfm.df.
Log in at SU-SCORE as the user with name ``Anonymous'' and any password.  Retrieve text files in the default mode, but change the 
``structure'' to ``F'' and the ``type'' to ``L 8'' for binary files (such as TFM's).  (I'm pretty sure that ``F'' is the right 
structure;  if it doesn't work, try the other one.  ``L 8'' is definitely the correct Type.)
To retrieve new TFM's, if Maxc is still alive, I would recommend going to directory [tex,sys] on SU-AI, since that is the directory 
that Arthur Keller considers as the ultimate truth for the fonts that he maintains, and he maintains the Stanford fonts.  Remember 
that directories come after the file names on SU-AI rather than before, and remember that font file names on SU-AI are shortened to 
six letters by dropping all but the first three and last three letters of a longer name.  Then, copy the new TFM's from Maxc to 
your Dorado, and SModel the TFM DF file again. SModel will give you warnings about all of the 250 other TFM's that you don't have 
on your disk, but the warnings don't indicate a real problem in this case.
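The SU-AI shortening rule quoted above is easy to get wrong by hand, so here is a small Python sketch of it.  The function name is hypothetical, the rule is applied to the name without its extension (an assumption), and the font names in the usage lines are made up for illustration.

```python
def su_ai_short_name(name):
    """Shorten a name to six letters: the first three plus the last three.

    Names of six letters or fewer are returned unchanged.
    """
    if len(name) <= 6:
        return name
    return name[:3] + name[-3:]


short = su_ai_short_name("examplefont10")   # illustrative name only
same = su_ai_short_name("abc10")            # already short enough
```

Note that two different long names can collapse to the same six letters, so it is worth checking the result against the directory listing.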
Nifty hacks for writing changes files
The files of changes to the standard Web source files have been formatted to be relatively convenient to browse around in with 
Tioga Levels.  Each change is a top-level branch;  the root node of this branch consists of the position number of the beginning of 
the change in the Web source followed by a comment describing the nature and purpose of the change.  Then come children nodes with 
the change itself.  If you look at the first level only, you see just the index of changes, which is convenient.   If you want to 
add a change, the most convenient way is to type a control-return to get yourself a new top-level node.  Then, type ``web.ch'' 
followed by control-E.  Along with the sources for TeX comes a web.abbreviations file that defines the abbreviation ``ch'' to 
expand into a template for a change branch.
There is also a hack program contributed by Michael Plass to help in dealing with position numbers.  It is very helpful to include 
the position numbers in the changes for several reasons.  First, the file TeX.web is huge, and getting to the right place by other 
means would be a pain.  Second, the changes must be in the correct order for the merge performed by Tangle to work.  With 
the position numbers included, it is easy to figure out where to put a new change.  (One could even use the EditTool sort-branches 
stuff to reorder a change file that was out of order, although I have never tried that myself.)  The helpful hack program is called 
ShowPosition.  If you run it, it posts a button at the top of the screen, which is backed up by a register that can hold an 
integer.  The value of the register is displayed as the label of the button.  Left-clicking the button stores into the register the 
position count for the current selection.  Right-clicking the button inserts at the current caret position a six-digit text 
representation of the integer in the register.  Thus, when making a new change, select the location of the change (I use the first 
character of the old text as the reference point);  left-click the ShowPosition button;  select the six-digit number at the 
beginning of the change template in pending-delete mode;  then right-click the ShowPosition button.  
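Since the ordering of changes matters to Tangle, the position numbers also make a mechanical order check possible.  The Python sketch below is not part of ShowPosition;  both function names are hypothetical, and zero padding of the six-digit form is an assumption, since the memo says only that the number is six digits.

```python
def position_label(position):
    """Six-digit text form of a character position (zero padding assumed)."""
    if not 0 <= position <= 999_999:
        raise ValueError("position does not fit in six digits")
    return f"{position:06d}"


def first_out_of_order(labels):
    """Index of the first label smaller than its predecessor, or None."""
    for i in range(1, len(labels)):
        if int(labels[i]) < int(labels[i - 1]):
            return i
    return None
```

Fed the root-node numbers of a change file from top to bottom, first_out_of_order reports the first change that would break the merge.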
Ideas for improvements to TeX in Cedar
Automatic spooling
It is a minor annoyance that TeX doesn't send your output to your favorite printer automatically.  It should be possible to fix 
this by programming the shell in some way, at least if the Commander is really getting up into the Unix class.  But I haven't 
designed a scheme for this.  It might be easier to make such a scheme work well if TeX returned an ATOM result from its CommandProc 
that revealed the worst level of error message.  TeX computes this level already, and it would be pretty easy to get it returned, 
although it might demand a minor change to PascalBasic (in order to get the right hook in for the return result in the 
ExclusiveProc).  
Making TeX a server
The InterLisp folk would love it if TeX could be made into a server called via RPC from Lisp before the demise of Maxc.  I estimate 
that such a project would take me a couple of weeks. 
IncludePress
The only way to merge illustration Press files into a TeX document at the moment is by going back to the Alto world and running 
PressEdit/M.  It should be relatively straightforward to do a lot better.  Various schemes are possible.  One big decision is what 
you believe the lingua franca of illustrations is:  Press or Interpress or Imager.  Let us explore first schemes in which Press 
format is the way that all of the programs in the world produce the illustrations (which is pretty close to true at the moment).  
One-page Press files have a weakness as a way to encode illustrations:  there is no standard way to give the bounding box and the 
origin of the illustration on the Press page.   This can be fixed in one of two ways:  either make the user specify these four or 
five dimensions when she invokes the illustration;  or settle on a hacky way to encode the information in the file itself that any 
illustrator will be able to handle, such as the strings ``<<<'' and ``>>>'' in the lower-left and upper-right corners, for example.
Once you decide these questions, the rest of a TeX IncludePress should be a lot like the Tioga version.  In detail, you should pick 
a name to serve as a \special operator, such as ``includepress'', and an argument structure.  I would recommend that 
``includepress'' take just one argument, the name of the press file.  Its semantics would be to copy all of the commands from the 
Press file into the TeX output, shifting the origin of the Press page to the current position.  In the scheme where the user has to 
type in the offsets and bounding box dimensions, the rest can all be done with TeX macros and appropriate use of negative glue.  
The top-level macro call might look something like \includePressIllustration{fig1.press width 5truept depth 1in height 7in xoffset 
3in yoffset 2in}.
The scheme where the Press file includes its own markers is trickier to implement, since the dimensions of the illustration box 
must be known by glue-setting time, which is before the output routine runs.  And TeX, as currently set up, assumes that \special 
processing will be done only by the output routine.  But something could be worked out, I'm sure.
On the other hand, maybe this whole thing should wait until the Imager conversion, and either Interpress masters or Imager display 
lists should become the lingua franca of illustrations.  Then, the design issues would be somewhat different.  

Current bugs
5.0 bugs that 5.1 fixes
Without any change to TeX (other than to the TSetter reference in the DF file), Cedar 5.1 fixes two problems with TeX.  First, a 
bug in SirPress positioning is fixed which caused characters to come out in the wrong places if you backed up to exactly the same 
place you had been before when using a pipe.  Second, a CommandProc can successfully open the terminal for input even if it has 
just been loaded for the first time, which it couldn't in 5.0.  Unfortunately, 5.1 seems to introduce a new bug:  TeX is careful to 
arrange that the commands ``TeX'' and ``IniTeX'' will be made uninterpreted, since both ``&'' and ``\'' are quite useful characters 
to type in command lines with their TeX semantics, and it is annoying to have to quote them.  (ShiftInterp gives the user access to 
the interpretation functionality if that is what they want instead.)  This worked fine in 5.0, but uninterpreted commands seem to 
be broken in 5.1;  I sent Larry Stewart a message about it. 
Output routine narrows
The 32-bit INT's that TeX works with get narrowed to 16 bits at some point on the way into SirPress pipe positioning commands 
(inside of ClosePipe, I think).  Thus, users who position things in funny places off the page may end up looking at a bounds fault, 
which is a little impolite.
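To make the failure concrete, here is a Python sketch of a checked narrowing, together with one politer alternative.  It assumes that the 16-bit quantity is signed, which the memo does not state, and both function names are hypothetical;  clamping is shown only as a possible remedy, not as what the current code does.

```python
INT16_MIN, INT16_MAX = -32768, 32767   # signed 16-bit range (assumed)


def narrow_to_16(value):
    """Checked narrowing: fails loudly when the value does not fit."""
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return value


def clamp_to_16(value):
    """A politer alternative: pin out-of-range positions to the limit."""
    return max(INT16_MIN, min(INT16_MAX, value))
```

At 2,540 micas to the inch, a signed 16-bit limit of 32,767 micas corresponds to about 12.9 inches, so under this assumption anything positioned much beyond the edge of a normal page is at risk.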