HerringDebrief.tedit
Fri 20 jun 86
M Herring

Herein (1) I try to describe how things are around TASM & related code, prior to my leaving them for others to use &/or maintain. I try to restrict myself to things that I expect might be used, though I will mention other modules just in case. (2) I briefly say what any file I've left around is.

TABLE OF CONTENTS

  Catalog of all the files
  Modules & major entry points grouped / described
    TCODEPs
    CCODEPs
    D->T translation
    Tamarin ucode tests
    Miscellaneous
  Assembler source code format
    More on generic handling.
    Format of true-opcode instructions.
    Variable references.
    Jumps.
    D/UNBIND.
    Entry Vector.
  Phases of ASM
  Misc notes on ASM
    Error handling.
    Machine description.
    Variable Allocation.

CATALOG OF ALL THE FILES

Everything in any directory can be thrown.

{eris}work>dt> *.tedit

ASMINTERNALS.TEDIT -- goes into some detail about the insides of ASM, including how data flows thru the program, how errors are flagged, what the phases do, etc. Obsolete in places.
D2T.TEDIT -- an attempt to say what DT/DD/ASM do, viewed as a facility for converting CCODEPs to TCODEPs. Obsolete in many details, obsolete point of view. Perhaps useful for orientation though, if you need to dig into that system.
D2TOPCODES.TEDIT -- attempt to say what happens to D-machine opcodes when DT translates them, by opcode. Obsolete in places.
DOPCODEREC.TEDIT -- A great leap forward for mankind, this document actually says what the possible values for the LEVADJ and OPPRINT fields of an OPCODE record are. A fair slide backwards for mankind: it is probably obsolete w.r.t. Tamarin extensions as used by ASM & PC.
HERRINGDEBRIEF.TEDIT -- this document
LOG.TEDIT -- things I noticed or did in the last while that I thought needed to be pursued, retrofixed, etc
NEWDTNOTES.TEDIT -- how the front-end functions in DT work. Doesn't mention TCO & TAMSV ? The processing-of-lists-of-fns is no longer of use.

{eris}work>dt> other

ASM -- the body of the assembler.
This file is too large for convenience.
D2DTEST -- stuff for running the CCODEP->CCODEP tests of the translation system. No longer of use.
D2DTESTSUITE -- test samples for CCODEP->CCODEP translation. No longer of use.
D2T -- translates D-machine source code to generic source code. For the CCODEP->TCODEP translation. No longer of use.
DASM -- defines a D-machine to ASM & PC. No longer of use.
DASMTESTSUITE -- test samples for testing DASM.
DDISASM -- disassembles a CCODEP to D-machine source code. Originally for the CCODEP->TCODEP translation, but still useful for doing static analyses of the occurrences of code sequences in a sysout. Cf. DSA.
DOPVALSEARCH -- accumulated q&d for exploring the existing DOPVALs and OPCODE records.
DSA -- D-machine Static Analysis. Uses DDISASM to explore the existing code usage in a sysout.
DT -- D-to-T translation -- actually mostly front-end routines & apparatus for standardized front-end routines. E.g. CO, TCO, ...
FRAMENAMECHECK -- q&d for searching for existing CCODEPs whose FrameName NEQ FunctionName
LISTOPCODES -- q&ds for that
PC -- putative replacement for PRINTCODE. Knows about Tamarins, etc. Several features dropped, but several bugs removed. More information displayed. DPC in DASM for D-machine. TPC in TASM for Tamarin.
T2D -- Tamarin-source to D-source conversion, for CCODEP->CCODEP translation for testing the CCODEP->TCODEP translation machinery. No longer of use, and outdated in many specifics as well.
TAMGETMACHINE -- called by ASM & PC to find out what their target machine is like. It calls T.GETMACHINE in TASM or D.GETMACHINE in DASM.
TASM -- defines a Tamarin to ASM & PC.

{eris}work>microtest>

Some tests from older days, based on the idea that CCODEPs could be translated (& that the compiler could be persuaded not to optimize). Obsolete now.

{eris}work>simulate>

TACCESS contains TFNHDR stuff defining TCODEP fn header to ASM & PC.

{eris}tut>

TUTBASE -- misc stuff used by "all" the TUT Tamarin Ucode Tests
TUT1 -- some Tamarin Ucode Tests

{eris}ucode>

FORCEOPNRS -- run after RELOADEMULATOR & AUCode to (1) check that there are no major incompatibilities between the emulator's (i.e. the microcode's) Tamarin opcodes & those that ASM, PC, and TOP.EDIT access; (2) if there are none, impose the opcode numbers from the former onto the latter. "No major incompatibilities" means: the set of opcode names is the same AND which opcodes are opK-format (have an implicit argument in the opcode byte) is agreed on AND the number of alpha bytes is agreed on. There are some special cases, e.g. microcode NOPs are JUMPks for ASM/PC. If there are "major incompatibilities" then one should use TOP.EDIT to remove them, then rerun FORCEOPNRS. If FORCEOPNRS actually does anything then one should use TOP.EDIT to save to the file TAMOPS.
OCTALATOR -- some tools I used for converting decimal/hex/octal & absolute/relative byte pointers (PCs) <-> addresses.
MICROASSEMBLER -- includes the "Loader" functions discussed elsewhere, e.g. AddCode, LoadFnHdr, LinkCode, AddUfns

MODULES & MAJOR ENTRY POINTS GROUPED / DESCRIBED

TCODEPs

TFNHDR & TFNHDR.EVN in TACCESS define the Tamarin function header to ASM & PC. Note that some fields of the fn hdr have different formats in the D-machine (after TASMV, during PC) than in the Tamarin (after loading/linking, during emulation). Some fields are not defined until loading.

Opcodes are known to ASM & PC via \TAMOPCODEARRAY and the TOPCODE records of opcodes. These come ultimately from the file TAMOPS, maintained by TOP.EDIT. FORCEOPNRS (q.v. under catalog of files) is used to keep the TAMOPS opcode list in line with the ucode opcode list after each AUCode.

Much of what a Tamarin is like is made known to ASM & PC via TAMGETMACHINE & the file TASM, especially: T.GETMACHINE loads a lot of descriptive specvars for ASM/PC; the TASMFN properties of CARs-of-instructions are essentially macros for ASM. More under ASM.

TASMV in TASM, which is mostly really ASM, assembles "source code" (which is the value of an atom) into a TCODEP + linking/loading information (on the TCODE property of the atom). More on this below. Major brokennesses that occur to me -- non-(lambda-spread) functions semi-disabled; no support of overflow areas; obsolete name table format.

TPC in TASM, which is mostly really PC, PrintCodes a TCODEP.

Linking/loading is set up for by ASM, then done by functions in MICROASSEMBLER: AddCode, LoadFnHdr, LinkCode. The point is that the actual location of atoms, cons cells, etc is not known until the TCODEP has been placed in Tamarin memory.

Hints--

If you want to change the function header format, you will probably have to change TFNHDR, parts of T.GETMACHINE, the emit-fn-hdr part of ASM, PC, LoadFnHdr, and tamSetUp.trivialFnHdr in TUTBASE.

If you want to add linking/loading for more data types, you will probably have to change the code-generation phase of ASM, and LinkCode.

If you want to add more OPPRINTs or LEVADJs for new opcodes, you will probably have to change the stack-modelling &/or code-generation phases of ASM. You should also bring DOPCODEREC.tedit up to date. You will also need to bring TAMOPS up to date via TOP.EDIT.

If you want to bring the name tables up to date, you will probably have to change the variable-allocation phase of ASM, perhaps add a variable-reallocation phase after the stack-modelling phase, change the "emit fn hdr" phase, change PC, perhaps change LoadFnHdr.

If you want to change variable-reference stuff, you will probably have to change the variable-allocation phase of ASM (at the end of "eat fn hdr"), the variable-reference-generic phase of ASM, the "emit fn hdr" phase of ASM, and probably PC.

If you want to change jump opcodes, you may have to change the stack-modelling phase of ASM, the jump-length phase, the code-generation phase, and PC.

If you want to change vanilla opcodes, you will probably have to change the stack-modelling & code-generation phases of ASM. And perhaps TAMOPS via TOP.EDIT.

Load order--

Have TACCESS loaded. Loading DT will bring in everything you need, which is: part of DT; PC, TASM, ASM, TAMGETMACHINE. Unfortunately DT will bring in lots of other stuff too. This could be fixed by editing out of DT those front-end entry-points you don't need, & editing the FILES statements. Load FORCEOPNRS explicitly.

CCODEPs

DDISASM converts a CCODEP to D-machine assembler source code. This can be useful for static analysis of existing code, such as DSA does.

D->T translation -- obsolete

DT was originally conceived as controlling a sequence DDISASM, D2T, TASM, that would convert a CCODEP to a TCODEP. This is largely broken: too hard given the broad distribution of obscure, macro-generated low-level code, & other problems.

As a means of testing this, a "D to D" apparatus was set up: DDISASM then D2T but then T2D and DASM. If everything is working, the resulting function should be equivalent to the original one. Obsolete & broken now. But that's why ASM has to be told what a Tamarin is like (that and, there was once a theory of variant Tamarins).

Tamarin ucode tests

TUTBASE (1) adds a little to the D-machine opcode set; (2) contains initialization & setup stuff for AssembleOp-ing into the emulator; (3) extends TASMV by adding macros specifically for test-building. More below.

TUT1 contains some specific tests built over TUTBASE.

Miscellaneous

OCTALATOR is useful in probing emulator printouts.

Other stuff as described in the Catalog of Files section above.

ASSEMBLER SOURCE CODE FORMAT

The atom that is an argument to TASMV has as its value a list that is "assembler source code". This list describes the TCODEP that is to be attached to the atom as its TCODE property. (TASMV will also accept a list of such atoms.)

Let me start with a simple case of assembler source code and work up.
I will not try to show everything that is possible.

A trivial case: a function of no arguments that returns 5.

    [(LAMBDA: FOO NIL)
     CODE:
     (SICX 5)
     (RETURN]

This is a list of (pseudo-)instructions & (pseudo-)labels.

The first instruction declares the function name, type, and arguments. It looks a lot like it would in Lisp. CAR can be LAMBDA: or NLAMBDA:, and the argument list can be a list of variable names, or a variable name. Only LAMBDA spread functions are fully supported; the others are semi-disabled. For ucode tests only lambda spreads are needed.

Then comes a section (null in this example) that declares ivars, pvars, and fvars.

Then the label CODE: .

Then instructions. The entry vector is built by the use of special labels in this section.

Instruction format generally.

Comments: (* stuff).

True instructions: (opcode [inline args]).

TASMFN generics. These may or may not look like true instructions. Simple cases are like SIC (=> '0, '1, SICX, SICXX, or ICONST as needed) and GCONST (even more general). Non-simple cases are like MakeTest, described in a following section, which builds non-trivial sequences of instructions.

Built-in generics also look like true instructions. Specifically these are variable-reference & jump generics, which are treated specially so they can e.g. generate an appropriate opcode after the code has been studied for a while.

Assembler directives have ':'s ending their names.

More on generic handling.

In an early phase of ASM, those instructions whose CAR has a TASMFN property get their TASMFN called with one argument: the instruction. The TASMFN can generate code in either of two ways. If it wants to generate instructions whose TASMFN generics will get expanded, it can call ASM.EATCODE with argument = a list of instructions. If it does not want the instructions it generates to be examined for TASMFN generics (it wants the "original" definitions used) then it should return a list of instructions.
Actually this is not either/or: it can call ASM.EATCODE as much as it wants to, which generates that code in order, and it must return a list of instructions, perhaps NIL.

Warning: The TASMFNs will in general be called more than once on each instruction. They should generate the same code each time they are called on a particular (instance of an) instruction. ASM.NEXTLABEL, for example, helps in this.

Note: current stack modelling information is available to TASMFNs.

Then, after this phase, the variable-reference generics get treated. These are deferred to this point so that stack size can be known and variable slots can be pushed into an overflow area before the variable-reference generics turn into actual opcodes.

Then the jump generics get treated. These get deferred until last so that the length of all the other code is known, so that the shortest appropriate jump opcodes can be used.

Then the code is generated.

Note that no system-generic has the same name as a true opcode. Thus one can always specify the exact opcode one wants, but can use the generics when that is more convenient.

Format of true-opcode instructions.

The format is determined by the TOPCODE record of the opcode. If the OP# is a list, then there is an argument which is built into the opcode byte. If OPNARGS > 0 then there is an argument.

If there is an argument, the default is that it is presented in the instruction as one number, which supplies all the alpha bytes. This is overridden by the OPPRINT.

If an opcode's in-line argument is an atom#, a large integer, or an arbitrary object, then the opcode's instruction format has an atom, integer, or arbitrary object as the single argument. This requires no further discussion as it works as you'd expect. The exception is that restrictions in the loading/linking apparatus currently require "arbitrary objects" (e.g. PCONST arguments) to be integers or atoms.

Variable-reference opcodes & generics, and jump opcodes & generics are treated in two sections below.
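
To illustrate why the jump generics are deferred until the length of all the other code is known, here is a minimal sketch of iterative jump sizing, written in Python rather than Interlisp. It is not ASM's code. The item format, the function name size_jumps, and the two jump encodings (a 1-byte short form reaching at most 15 bytes forward, a 3-byte long form reaching anywhere) are all assumptions made up for the illustration; the real Tamarin jump formats are not given in this document.

```python
# Hypothetical encodings, for illustration only; these are NOT the
# Tamarin's real jump formats.
SHORT_LEN, SHORT_REACH = 1, 15   # assumed 1-byte jump, 0..15 bytes forward
LONG_LEN = 3                     # assumed fallback form that reaches anywhere


def size_jumps(items):
    """items is a list of ("label", name), ("jump", target) or ("code", nbytes).
    Returns the chosen length of each jump, in source order."""
    jump_idx = [i for i, item in enumerate(items) if item[0] == "jump"]
    length = {i: SHORT_LEN for i in jump_idx}   # start optimistic: all short
    while True:
        # Assign addresses under the current length guesses.
        addr, labels, starts = 0, {}, {}
        for i, (kind, arg) in enumerate(items):
            if kind == "label":
                labels[arg] = addr
            elif kind == "jump":
                starts[i] = addr
                addr += length[i]
            else:
                addr += arg
        # Grow any short jump that cannot reach its target.
        changed = False
        for i in jump_idx:
            disp = labels[items[i][1]] - (starts[i] + length[i])
            if length[i] == SHORT_LEN and not (0 <= disp <= SHORT_REACH):
                length[i] = LONG_LEN
                changed = True
        if not changed:              # lengths have stabilized
            return [length[i] for i in jump_idx]


# Growing the second jump pushes label L out of reach of the first one,
# so a single pass would not be enough.
example = [("jump", "L"), ("jump", "M"), ("code", 14),
           ("label", "L"), ("code", 10), ("label", "M")]
lengths = size_jumps(example)
```

Jumps only ever grow in this sketch, so the loop must terminate. The price of starting optimistic is that a jump may end up long even though a cleverer ordering could have kept it short.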

Variable references.

Gvars are not variable-references in this sense. They are atom references.

The ovar and mvar opcodes' treatment by the assembler is not debugged at this point. I am hoping to get by with explicit numeric arguments.

This leaves ivars, pvars, and fvars. The following function has two of each. All of them will show up in the name table. There is a SETTOPVAL in there too.

    [(LAMBDA: FOO (I1 I2))
     (VARS: I1 I2)
     (VARS: P1 P2)
     (FVARS: F1 F2)
     CODE:
     (VAR I1)
     (VAR I2)
     (PLUS)
     (VAR P1)
     (VAR P2)
     (PLUS)
     (PLUS)
     (FVAR F1)
     (FVAR F2)
     (PLUS)
     (PLUS)
     (GVAR_ G1)
     (RETURN]

All the variables were declared with VARS: or FVARS: statements before the CODE: pseudolabel. The ivars end up being declared twice. This is because the assembler has to be told if they are in the name table.

An ivar or pvar need not be in the name table. The variable-declaration section of the code can be split into two parts by a LOCAL: pseudolabel. Then the declarations after the LOCAL: do not cause entries in the name table.

ASM tries to allocate the variables' stack-frame slots in the order the variables are declared, but adapting to whatever restrictions it currently believes in, such as pvars before fvars. ASM does not know how to build an overflow area.

Note that all the instructions in the above example use generics. One could specify e.g. VARK or VARX, in which case an error would be signalled if the variable-slot were out of range.

Note that a variable name need not be an atom. (1) A variable name can also be a list of an atom and a number (only the atom is entered into the name table; thus one can have more than one variable with the same name-table name). (2) A variable-reference opcode or generic will take a number as argument (the explicit frame-slot number desired).

Note that variable-reference generics are turned into opcodes late in the game by a little expert that tries for shortest code sequences.
The built-in variable-reference generics recognized are: VAR, VAR_, VAR_^, FVAR, FVAR_, FVAR_^.

Jumps.

    [(LAMBDA: FOO (X))
     (VARS: X)
     CODE:
     (VAR X)
     (INTEGERP)
     (FJUMP nonInteger)
     (VAR X)
     (RETURN)
     nonInteger
     ('NIL)
     (RETURN]

Any non-list in the assembler source, after the CODE: pseudolabel, is a label. Even numbers.

Jump opcodes & generics require labels as arguments.

The FJUMP in the example is a generic.

Note that jump generics are turned into opcodes late in the game by a little expert that tries for shortest code sequences.

The built-in jump generics recognized are: JUMP, TJUMP, FJUMP, NTJUMP, NFJUMP.

D/UNBIND.

The UNBIND & DUNBIND opcodes can take their stack-level argument in either of two forms. (1) A label. The stack level referred to is that current at that location in the code. (2) A list of one number. The explicit stack level.

Note that the UNBIND instruction's argument describes the stack level before old-TOS is pushed back onto the stack. This is different from the UNBIND opcode. You only notice this if you specify the target stack level explicitly.

Entry Vector.

T.GETMACHINE etc contain an apparatus for getting default entry vector code inserted per function type & number of arguments. This doesn't currently do anything.

The default is for all 8 entry points to be at the first byte of the code. This can be overridden in two ways. The rule is: for each number-of-arguments (say 3), the entry point is at: if there is a label ENTRYk (say ENTRY3) then that, else if there is a label ENTRY then that, else the first byte of the code.

Thus for example a function that returns T if called with either 2 or 3 arguments, otherwise STOPs:

    [(LAMBDA: FOO NIL)
     CODE:
     (STOP)
     ENTRY2
     ENTRY3
     ('T)
     (RETURN]

PHASES OF ASM

This is also a running commentary on ASMCOMS.

...

ASM binds many specvars, in groups.

ASM.1 is the top level flow of control.

input header & name tables
i.e. read the source code thru the CODE: pseudolabel. Includes allocation of variable slots, at this writing.
Includes some consistency checking. Computes some specvars. Does not emit anything.

code pass 1
i.e. prepend any default entry vector code, then repeatedly pass the input list (1) expanding macros, emitting to OLT, (2) stack modelling, until stack modelling is as done as it can be, or we give up in disgust. These passes are normally done with error messages disabled. If there are errors there is one more pass with error messages enabled. Stack modelling is quite detailed. This whole mechanism is hairier than it need be now, but each hair was needed at some point. Usually you can leave the basic apparatus alone. But you often need to add new stack-effect-of-a-LEVADJ cases. Macros are just TASMFNs, as discussed above.

code pass 2
i.e. (0) reallocate variable slots, generate overflow areas, etc; (1) now that variable-slots are allocated, variable-reference generics can generate code; (2) now that all other code is fixed, jump generics can decide how long they have to be. If I remember correctly, this "pass" is done so: first pass all code, doing variable-reference generics & collecting a list of all jumps & labels. Then iteratively pass this list computing the length of jump instructions until it stabilizes.

emit header & name tables
i.e. now that we know the size of the code we can make a codearray. Then build header & name tables into it.

code pass 3
i.e. emit code. The cases per OPPRINT are something you may well have to add to. The variable-reference & jump-generic "experts" here are also used in "code pass 2".

...

The ASM.ERR.xxx functions are those where the same error message may be emitted in the same place on several passes. Doing them thru a function guarantees that EQness can be used to detect duplication. Not really meaningful anymore?

...

MISC NOTES ON ASM

Error handling.

Most errors cause messages to T. ASM actually outputs the TCODEP if it can. Thus TPC together with the messages to T is usually enough to find the problem.
If it's not, look at the value of \ASM.OUTPUTLISTING after the ASM.

All the stuff that was to run under DT "breaks" via ASM.HELP. If some output file variable is bound then this bitches to that file & throws some kind of error; else it breaks.

More on error handling mechanisms in AsmInternals.tedit, perhaps D2T.tedit.

Machine description.

Of course ASM & PC actually know a lot about Tamarins & D-machines. But much is also parameterized. The mechanism is: there are certain specvars (named "TAM.xxx"); ASM or PC bind some of these then call TAMGETMACHINE in TAMGETMACHINE. This looks at its machine-type argument and calls T.GETMACHINE in TASM or D.GETMACHINE in DASM. These two functions set whatever of the standard set of specvars are bound. They are the only functions that do this. They also document what the specvars mean & what their initialization of them means, to some extent. This is real handy. MASTERSCOPEing for these specvars is also useful.

The other places that know details about the Tamarin are: (1) the TOPCODE records, \TOPCODEARRAY, etc; (2) see above section Function Header under TCODEPs under Modules Described; (3) see ... Loader ... .

Variable Allocation.

At this writing: Stack frame has 6 words of header. Then 2 words unused. Then 7 words reserved for ivars. Then 1 word unused. Var allocation starts with the next (16th) word. No treatment of frame overflow.
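
The frame layout given under Variable Allocation can be checked with a little arithmetic. The following is a minimal sketch in Python, not part of ASM. The region sizes come from the notes above; the region names, the function name frame_offsets, and the assumption that words are numbered from zero are mine.

```python
# Region sizes in words, as stated under Variable Allocation.
# The names are hypothetical labels chosen for this sketch.
FRAME_REGIONS = [
    ("header", 6),
    ("unused_a", 2),
    ("ivars", 7),
    ("unused_b", 1),
]


def frame_offsets(regions):
    """Return ({region name: first word offset}, first offset past all
    regions), with offsets counted from zero."""
    offsets, next_free = {}, 0
    for name, size in regions:
        offsets[name] = next_free
        next_free += size
    return offsets, next_free


offsets, first_var_word = frame_offsets(FRAME_REGIONS)
print(offsets, first_var_word)
```

Under the counting-from-zero assumption the fixed regions occupy words 0 through 15, so the first word available for var allocation is word 16.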