XEROX Internal Memo

To: Interested Parties
From: Dick Sweet, CSL
Subject: Dragoman Memory Model
Date: July 31, 1985

Dragoman Memory Model

The Dragoman interpreter was written to provide an easily instrumented execution engine for PrincOps bytecodes. The goal was to gather statistics from essentially arbitrary Cedar programs rather than from a few hand-coded benchmarks. While PrincOps instructions are not the same as DragOps, you can wave your hands and claim that the statistics obtained from PrincOps simulations are good enough for order-of-magnitude decisions.

While the original goal of Dragoman was ease of instrumentation, it was also desired that it run reasonably fast. It is currently two to three orders of magnitude slower than Dorado microcode. I believe that none of my attempts at tuning execution speed have invalidated any of the memory modelling. It's fair to say that quite a bit more speed tuning is possible.

The following sections attempt to describe exactly what the interpreter tells the cache model about its memory usage.

Data reference modelling

The Dragon is a 32 bit machine: it has a 32 bit word size and an addressing granularity of 32 bits. The PrincOps architecture is 16 bits on both counts. In addition, on the Dragon some portion of the local variables are stored in registers. Attempts have been made to deal with each of these differences.

Pointer references to data

Whenever a pointer is dereferenced, a procedure (DragomanUtil.ReadAtAddress or DragomanUtil.WriteAtAddress) is called with the value of the pointer. This procedure determines which word of memory would hold the referent if memory were of 32 bit granularity. It then determines which cache (if there are multiple caches) would hold the data by selecting on the low order bits of the page number. Finally, it notifies the appropriate cache model of the Fetch or Store.
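The pointer-reference path can be illustrated with a minimal Python sketch. It is not Dragoman's code: read_at_address, write_at_address, select_cache, dragon_word_of and CacheModel are hypothetical stand-ins for DragomanUtil.ReadAtAddress, DragomanUtil.WriteAtAddress and a cache model. Two details are assumptions rather than facts from this memo: that the Dragon word is found by halving the 16-bit PrincOps word address, and that the page number used for cache selection comes from a 256-word page.

```python
from dataclasses import dataclass, field

# Assumed: a page holds 256 sixteen-bit words, so the page number is the
# 16-bit word address shifted right by 8.
PAGE_SHIFT = 8


@dataclass
class CacheModel:
    """Hypothetical stand-in for a cache model: it only records what it is told."""
    refs: list = field(default_factory=list)

    def fetch(self, dragon_word: int) -> None:
        self.refs.append(("Fetch", dragon_word))

    def store(self, dragon_word: int) -> None:
        self.refs.append(("Store", dragon_word))


def select_cache(caches: list, address: int) -> CacheModel:
    """Pick a cache from the page number; for a power-of-two number of
    caches this selects on the low-order bits of the page number."""
    page = address >> PAGE_SHIFT
    return caches[page % len(caches)]


def dragon_word_of(address: int) -> int:
    """Assumed mapping: two 16-bit PrincOps words share one 32-bit Dragon word."""
    return address >> 1


def read_at_address(caches: list, address: int) -> None:
    select_cache(caches, address).fetch(dragon_word_of(address))


def write_at_address(caches: list, address: int) -> None:
    select_cache(caches, address).store(dragon_word_of(address))


caches = [CacheModel(), CacheModel()]
read_at_address(caches, 0x1234)   # page 0x12 -> cache 0, Dragon word 0x91A
write_at_address(caches, 0x1235)  # same Dragon word 0x91A, cache 0
read_at_address(caches, 0x1300)   # page 0x13 -> cache 1, Dragon word 0x980
```

Note that adjacent 16-bit words such as 0x1234 and 0x1235 collapse onto one Dragon word, which is the effect the 32 bit granularity modelling is meant to capture.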
For doubleword fetches, it notifies the cache model only for the Dragon word containing the first word of the doubleword. I had earlier decided to do two fetches if the doubleword crossed a Dragon word boundary, but decided that this gave an unrealistically high value to memory usage, since the Dragon/Cedar compiler will put such data into a single word.

Local variable references

There is a named constant in DragomanUtil called LocalsInRing (equated to 16). For references to local variables, if the offset is smaller than LocalsInRing, the reference is not reported to the cache model. Otherwise, Dragoman fabricates the long pointer of the address of the variable and simulates a pointer reference.

Instruction reference model

All instruction data are obtained from a single procedure, DragomanUtil.NextOpByte. This procedure attempts to simulate an instruction buffer on the IFU chip. The size of this buffer, iBufferSize, is a variable, but defaults to 16 bytes. The IFU periodically fetches words to fill this buffer. After discussions with Ed McCreight, I decided to model this by having the procedure check each time the first byte of an instruction is fetched: if there is room in the buffer for another word's worth of bytes, then a word is fetched from the cache. If more than one instruction cache is being modelled, NextOpByte selects the proper cache by looking at the low order bits of the page address.

Executing a jump instruction (or an XFER) causes the buffer to be emptied. On the next call of NextOpByte, the word containing the byte at the new program counter is fetched. Note that if the new pc is not at the beginning of a word, an additional word may have to be fetched to satisfy all of the data bytes of the next instruction (the additional word would otherwise be fetched when the opcode byte of the next instruction is fetched).
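The buffer policy just described can be sketched in Python as follows. This is a hypothetical reconstruction, not the DragomanUtil code: next_op_byte stands in for NextOpByte, jump stands in for the buffer flush on a jump or XFER, addresses are byte addresses, a word is four bytes, and PAGE_BYTES (512) is an assumed page size used only to choose among instruction caches.

```python
BYTES_PER_WORD = 4   # one 32-bit Dragon word
PAGE_BYTES = 512     # assumed page size in bytes, used only to pick a cache


class InstructionBuffer:
    """Sketch of an IFU instruction buffer feeding one or more instruction caches."""

    def __init__(self, start_pc: int, ibuffer_size: int = 16, n_caches: int = 1):
        self.ibuffer_size = ibuffer_size
        # fetched[i] lists the word addresses reported to instruction cache i.
        self.fetched = [[] for _ in range(n_caches)]
        self.jump(start_pc)

    def jump(self, new_pc: int) -> None:
        """A jump or XFER empties the buffer."""
        self.pc = new_pc
        self.next_fetch = None  # byte address of the next word to fetch; None = empty

    def _fetch_word(self) -> None:
        page = self.next_fetch // PAGE_BYTES
        cache = page % len(self.fetched)  # low-order page bits for power-of-two counts
        self.fetched[cache].append(self.next_fetch // BYTES_PER_WORD)
        self.next_fetch += BYTES_PER_WORD

    def _buffered(self) -> int:
        # Bytes fetched but not yet consumed.
        return self.next_fetch - self.pc

    def next_op_byte(self, first_byte: bool) -> int:
        """Return the address of the next instruction byte, modelling IFU fetches."""
        if self.next_fetch is None:
            # First call after a jump: fetch only the word holding the new pc.
            self.next_fetch = self.pc - self.pc % BYTES_PER_WORD
            self._fetch_word()
        elif first_byte and self.ibuffer_size - self._buffered() >= BYTES_PER_WORD:
            # Opcode byte: top up the buffer by one word if there is room.
            self._fetch_word()
        if self._buffered() <= 0:
            # An operand byte beyond the buffered data (possible after an unaligned jump).
            self._fetch_word()
        addr = self.pc
        self.pc += 1
        return addr


buf = InstructionBuffer(start_pc=0)
buf.jump(6)  # unaligned target
# A three-byte instruction at 6, then the opcode byte of the next one.
addrs = [buf.next_op_byte(True), buf.next_op_byte(False),
         buf.next_op_byte(False), buf.next_op_byte(True)]
```

In this trace the jump to pc 6 fetches word 1 (bytes 4 to 7), the third byte of the instruction forces an extra fetch of word 2, and the next opcode byte then tops up the buffer with word 3. Had the target been word-aligned, the second word would only have been fetched at the next opcode byte.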
Interpretation/direct execution strategies

There are areas of the system where it is undesirable to run interpretively. For example, code that acquires system monitors is typically expected to hold them for only a short time. Another example is a program that computes for a long time and then writes out results; it is the computation that should be instrumented, not the I/O operations. For these reasons, Dragoman checks on each XFER whether to interpret the target procedure or to call it directly, executing in microcode. Dragoman installs a slightly modified signaller, so that signals raised by directly executed code can be caught by the code being interpreted.

Dragoman bases its interpret/execute decision on the gfi of the destination. Every module instance in Cedar has a global frame. Associated with this frame is an index into a global frame table. This global frame index, or gfi, is part of a procedure descriptor. Dragoman keeps a bit vector of gfis that it is willing to interpret; for all others, it calls directly. It might have been better to determine which modules cannot be interpreted and keep a bit vector with the opposite sense of the existing one. However, this was not done, so some care must be taken in specifying the set of interesting modules. As a rule, I have successfully enabled interpretation of all gfis larger than some number, say 400. I have then used the interpreter's ability to keep a list of directly called procedures to see whether any of them should really be interpreted. I have added these modules individually to the interesting list and iterated this process until the set of directly executed procedures appears to be far enough down in the bowels of the system that I wouldn't want to interpret them.
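The gfi-based decision and the tuning loop above can be modelled with a small Python sketch. The class XferPolicy and its methods are hypothetical; the memo says only that Dragoman keeps a bit vector of gfis it is willing to interpret and can list the procedures it called directly. The bound of 1023 on gfi values is an arbitrary choice for the example.

```python
class XferPolicy:
    """Hypothetical model of Dragoman's interpret/direct-call decision."""

    def __init__(self, max_gfi: int):
        self.max_gfi = max_gfi
        # One bit per gfi; a set bit means "willing to interpret".
        self.interesting = bytearray((max_gfi + 8) // 8)
        # Procedures that were called directly, kept for later inspection.
        self.direct_calls = set()

    def enable(self, gfi: int) -> None:
        """Mark one module instance (by gfi) as interesting."""
        self.interesting[gfi >> 3] |= 1 << (gfi & 7)

    def enable_above(self, threshold: int) -> None:
        """Enable interpretation of every gfi larger than threshold."""
        for gfi in range(threshold + 1, self.max_gfi + 1):
            self.enable(gfi)

    def should_interpret(self, gfi: int, proc_name: str) -> bool:
        """Decide, on an XFER, whether to interpret the destination."""
        if self.interesting[gfi >> 3] & (1 << (gfi & 7)):
            return True
        self.direct_calls.add(proc_name)  # record it for the next tuning pass
        return False


policy = XferPolicy(max_gfi=1023)
policy.enable_above(400)                 # first pass: interpret every gfi above 400
policy.should_interpret(37, "Low.Proc")  # called directly, so it is recorded
# After inspecting policy.direct_calls, add an interesting module by hand.
policy.enable(37)
```

Each pass widens the interesting set by hand, which mirrors the iterate-until-satisfied procedure described above; a bit vector with the opposite sense would instead start from "interpret everything" and exclude known-unsafe modules.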