Internal Memo

XEROX
To:       Interested Parties
From:     Dick Sweet, CSL
Subject:  Dragoman Memory Model
Date:     July 31, 1985


Dragoman Memory Model
    The Dragoman interpreter was written to provide an easily instrumented execution engine for PrincOps bytecodes.  The desire was to 
    gather statistics from essentially arbitrary Cedar programs rather than a few hand coded benchmarks.  While PrincOps 
    instructions are not the same as DragOps, you can wave your hands and claim that statistics obtained from PrincOps 
    simulations are good enough for order-of-magnitude sorts of decisions.
    While the original goal of Dragoman was ease of instrumentation, it was also desired that it run reasonably fast.  It is currently 
    two to three orders of magnitude slower than Dorado microcode.  I believe that none of my attempts at execution speed 
    tuning has invalidated any of the memory modelling.  It's fair to say that quite a bit more speed tuning is possible.
    The following sections attempt to describe exactly what the interpreter tells the cache model about its memory usage.
Data reference modelling
    The Dragon is a 32 bit machine; it has a 32 bit word size, and an addressing granularity of 32 bits.  The PrincOps architecture is 
    16 bits on both counts.  Some portion of the local variables of the Dragon are stored in registers.  Attempts have been made to 
    deal with each of these differences.
Pointer references to data
    Whenever a pointer is dereferenced, a procedure (DragomanUtil.ReadAtAddress or DragomanUtil.WriteAtAddress) is called with the 
    value of the pointer.  This procedure determines which word of memory would hold the referent if memory were of 32 bit 
    granularity.  It then determines which cache (if there are multiple caches) would hold the data. This is done by selecting on 
    the low order bits of the page number.  Finally, it notifies the appropriate cache model of the Fetch or Store.  For doubleword 
    fetches, it only notifies the cache model for the Dragon word containing the first word of the doubleword.  I had earlier 
    decided to do two fetches if the doubleword crossed a Dragon word boundary, but concluded that this gave an unrealistically high 
    value to memory usage, since the Dragon/Cedar compiler will put such data into a single word.
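    As a rough illustration of the steps just described, the following Python sketch (not Dragoman's own Cedar code) maps a 
    PrincOps word address to a Dragon word, picks a cache from the low order bits of the page number, and notifies it.  The 
    CacheModel class, the function names, the 256-word page size, and the power-of-two cache count are all assumptions made for 
    the example.

```python
# Hypothetical sketch; every name and constant here is an assumption, not Dragoman's code.
PRINCOPS_WORDS_PER_PAGE = 256        # assumed page size, in 16-bit PrincOps words
PRINCOPS_WORDS_PER_DRAGON_WORD = 2   # a 32-bit Dragon word holds two 16-bit words


class CacheModel:
    """Stand-in cache model that only records the references it is told about."""

    def __init__(self):
        self.refs = []

    def fetch(self, dragon_word):
        self.refs.append(("fetch", dragon_word))

    def store(self, dragon_word):
        self.refs.append(("store", dragon_word))


def dragon_word_of(princops_addr):
    """Dragon word that would hold the referent under 32-bit granularity."""
    return princops_addr // PRINCOPS_WORDS_PER_DRAGON_WORD


def select_cache(princops_addr, caches):
    """Select on the low order bits of the page number.

    Taking the page number modulo the cache count equals the low order
    bits only when len(caches) is a power of two (assumed here).
    """
    page = princops_addr // PRINCOPS_WORDS_PER_PAGE
    return caches[page % len(caches)]


def read_at_address(princops_addr, caches):
    """Report one Fetch. For a doubleword, pass the address of its first
    word; only that word's Dragon word is reported."""
    select_cache(princops_addr, caches).fetch(dragon_word_of(princops_addr))


def write_at_address(princops_addr, caches):
    """Report one Store to the cache selected for this address."""
    select_cache(princops_addr, caches).store(dragon_word_of(princops_addr))


caches = [CacheModel(), CacheModel()]
# A doubleword at PrincOps words 0x0101..0x0102 straddles Dragon words
# 0x80 and 0x81, but only the Dragon word of its first word is reported.
read_at_address(0x0101, caches)
write_at_address(0x0004, caches)
```

    In this sketch a doubleword that crosses a Dragon word boundary still produces a single reference, matching the decision 
    described above.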
Local Variable References
    There is a named constant in DragomanUtil called LocalsInRing (equated to 16).  For references to local variables, if the 
    offset is smaller than LocalsInRing, the reference is not reported to the cache model.  Otherwise, the interpreter fabricates 
    a long pointer to the variable and simulates a pointer reference.
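    The rule for locals can be sketched in a few lines of Python.  This is an illustration only: the function name, the 
    frame_base parameter (standing in for the long pointer to the local frame), and the report callback are hypothetical; only 
    the constant's name and value come from the text above.

```python
LOCALS_IN_RING = 16  # value given for DragomanUtil.LocalsInRing


def reference_local(frame_base, offset, report):
    """Model one reference to the local variable at `offset` in a frame.

    frame_base and report are hypothetical: frame_base stands for the
    address of the local frame, and report(addr) stands for simulating a
    pointer reference to addr. Returns True if the cache model was told.
    """
    if offset < LOCALS_IN_RING:
        # Treated as register-resident: nothing is reported.
        return False
    # Fabricate the address of the variable and treat it as a pointer reference.
    report(frame_base + offset)
    return True


reported = []
reference_local(0x2000, 3, reported.append)    # offset below 16: silent
reference_local(0x2000, 16, reported.append)   # first offset that is reported
```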
Instruction reference model
    All instruction data are obtained from a single procedure, DragomanUtil.NextOpByte.  This procedure attempts to simulate an 
    instruction buffer on the IFU chip.  The size of this buffer, iBufferSize, is a variable, but is defaulted to 16 bytes.  The 
    IFU will periodically fetch words to fill this buffer.  After discussions with Ed McCreight, I decided to model this by having 
    the procedure check each time the first byte of an instruction is fetched; if there is room in the buffer for another word's 
    worth of bytes, then a word is fetched from the cache.  If there is more than one instruction cache being modelled, NextOpByte 
    selects the proper cache by looking at the low order bits of the page address.
    Executing a jump instruction (or an XFER) causes the buffer to be emptied. On the next call of NextOpByte, the word containing the 
    byte associated with the new program counter will be fetched.  Note that if the new pc is not at the beginning of a word, an 
    additional word may have to be fetched to satisfy all of the data bytes of the next instruction (the additional word would 
    otherwise be fetched when the opcode byte of the next instruction is fetched).
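    The buffer behaviour described above might be modelled roughly as follows.  This Python sketch is one reading of the 
    description, not the actual NextOpByte: the class and method names are hypothetical, the pc is treated as a byte address, 
    the fetch_word callback stands in for the cache model, and cache selection by page address is left out.

```python
WORD_BYTES = 4  # bytes in one 32-bit Dragon word


class InstructionBuffer:
    """Hypothetical model of the IFU instruction buffer."""

    def __init__(self, code, fetch_word, i_buffer_size=16):
        self.code = code                  # bytes of the instruction stream
        self.fetch_word = fetch_word      # called with each word index fetched
        self.i_buffer_size = i_buffer_size
        self.pc = 0                       # byte address of the next byte to deliver
        self.fetched_end = 0              # byte address just past the last word fetched

    def jump(self, new_pc):
        """A jump or XFER empties the buffer."""
        self.pc = new_pc
        self.fetched_end = (new_pc // WORD_BYTES) * WORD_BYTES

    def _fetch_next_word(self):
        self.fetch_word(self.fetched_end // WORD_BYTES)
        self.fetched_end += WORD_BYTES

    def next_op_byte(self, first_byte_of_instruction):
        if first_byte_of_instruction:
            # Bytes fetched but not yet consumed.
            buffered = max(0, self.fetched_end - self.pc)
            # Prefetch one word if there is room for a word's worth of bytes.
            if self.i_buffer_size - buffered >= WORD_BYTES:
                self._fetch_next_word()
        # Data bytes that run past the buffered words force extra fetches.
        while self.pc >= self.fetched_end:
            self._fetch_next_word()
        value = self.code[self.pc]
        self.pc += 1
        return value


fetched = []
ifu = InstructionBuffer(bytes(range(32)), fetched.append)
ifu.jump(6)                 # new pc is not at the beginning of a word
ifu.next_op_byte(True)      # opcode byte: fetches word 1 (bytes 4..7)
ifu.next_op_byte(False)     # data byte 7, already buffered
ifu.next_op_byte(False)     # data byte 8: forces the additional word 2
```

    With a three-byte instruction starting at byte 6, the sketch fetches word 1 for the opcode and then word 2 for the last 
    data byte, which is the extra fetch noted above.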

Interpretation/direct execution strategies
    There are areas of the system where it is undesirable to run interpretively.  For example, code that acquires system monitors is 
    typically expected to hold them for only a short time.  Another example is a program that computes for a long time then writes 
    out results; it is the computation that wants to be instrumented, not the I/O operations.  For these reasons, Dragoman checks 
    on each XFER to see whether to interpretively execute the target procedure, or to call directly, executing in microcode.  
    Dragoman installs a slightly modified signaller, so that signals raised by the directly executed code can be successfully 
    caught by that which is being interpreted.
    Dragoman bases its interpret/execute decision on the gfi of the destination.  Every module instance in Cedar has a global frame.  
    Associated with this frame is an index into a global frame table.  This global frame index, or gfi, is part of a procedure 
    descriptor.  Dragoman keeps a bit vector of gfi's that it is willing to interpret; for all others, it calls directly.  It might 
    have been better to determine those modules that cannot be interpreted, and keep a bit vector with the opposite sense of the 
    existing one.  Nonetheless, this was not done, so some care must be taken in specifying the set of interesting modules.
    As a rule, I have successfully enabled interpretation of all gfi larger than some number, say 400.  I have then used the ability of 
    the interpreter to keep a list of directly called procedures and see if there is something that should really be interpreted.  
    I have added these modules individually to the interesting list and iterated this process until the set of executed 
    procedures appears to be suitably down in the bowels of the system that I wouldn't want to interpret them.
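    The gfi-based decision and the tuning loop described in this section can be sketched as follows.  This Python fragment is 
    illustrative only: the class, its methods, the table_size parameter, and the procedure names are hypothetical, and a Python 
    integer stands in for the bit vector.

```python
class InterpretFilter:
    """Hypothetical model of the interpret/execute decision made on each XFER."""

    def __init__(self):
        self.bits = 0               # bit i set means gfi i will be interpreted
        self.directly_called = []   # (gfi, procedure) pairs that were called directly

    def enable(self, gfi):
        self.bits |= 1 << gfi

    def enable_above(self, threshold, table_size):
        """Enable every gfi larger than threshold (table_size is an assumed bound)."""
        for gfi in range(threshold + 1, table_size):
            self.enable(gfi)

    def should_interpret(self, gfi):
        return bool((self.bits >> gfi) & 1)

    def on_xfer(self, gfi, procedure):
        """Decide how the destination of an XFER is run."""
        if self.should_interpret(gfi):
            return "interpret"
        # Remember what ran directly so the list can be reviewed later.
        self.directly_called.append((gfi, procedure))
        return "direct"


f = InterpretFilter()
f.enable_above(400, 1024)              # interpret all gfi larger than 400
first = f.on_xfer(512, "Client.Run")   # interpreted
second = f.on_xfer(37, "Lib.Sort")     # called directly and recorded
f.enable(37)                           # after review, add the module by hand
third = f.on_xfer(37, "Lib.Sort")      # now interpreted
```

    Reviewing directly_called after a run and enabling further gfi values by hand corresponds to the iteration described above.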