These timing estimates assume that the IFU remains caught up, so that no extra cycles are imposed waiting for the IFU to fetch the next instruction. It is possible for the IFU to get behind if it encounters too many 5-byte opcodes, but this should be unusual. The total time for an instruction's execution is the number of IFU cycles required plus the number of wait cycles imposed by the memory system. Under the assumption that there are no wait cycles, most instructions require 1 cycle; however, the following categories of instructions require more than 1 cycle: loads, jumps, procedure calls and returns, Xops, conditional jumps, and bounds checks.
X.1 Loads
A read of data that is present in the cache requires 1 cycle for execution. If that data is referenced in the next cycle, by storing it into another location, moving it to another EU register, or combining it with other values, an extra cycle is required to allow time for the data to reach the EU. (In principle, storing the data to another location immediately after fetching need not require an extra cycle, but the bypass paths to make this case fast do not exist in the EU.)
If the address for a read hits in the cache but the data misses, the read costs 7 cycles. If both the address and data miss in the local cache, but the address hits in the map cache, a read takes 10 cycles. If the map cache does not contain the address, the read will take from 100 to 200 cycles.
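The read cases above can be tabulated in a small Python sketch. The names ReadCase, READ_CYCLES, and read_cycles are illustrative and do not come from the manual. The sketch also assumes that the 1-cycle load-use penalty applies only to the cache-hit case, because the text does not say how it interacts with misses.

```python
from enum import Enum


class ReadCase(Enum):
    """Where a read is satisfied, following the cases in the text."""
    CACHE_HIT = "data present in the cache"
    DATA_MISS = "address hits in the cache, data misses"
    MAP_CACHE_HIT = "address and data miss locally, address hits in the map cache"
    MAP_CACHE_MISS = "address not in the map cache"


# (min, max) cycle counts quoted in the text; only the last case is a range.
READ_CYCLES = {
    ReadCase.CACHE_HIT: (1, 1),
    ReadCase.DATA_MISS: (7, 7),
    ReadCase.MAP_CACHE_HIT: (10, 10),
    ReadCase.MAP_CACHE_MISS: (100, 200),
}


def read_cycles(case, used_next_cycle=False):
    """Return (min, max) cycles for one read.

    Assumption: the extra cycle for data referenced in the next cycle
    is charged only when the read hits in the cache.
    """
    low, high = READ_CYCLES[case]
    if used_next_cycle and case is ReadCase.CACHE_HIT:
        low, high = low + 1, high + 1
    return low, high
```

For example, read_cycles(ReadCase.CACHE_HIT, used_next_cycle=True) gives (2, 2): one cycle for the read plus one for the data to reach the EU.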
X.2 Stores
A write to a memory location that is present in the cache requires 1 cycle in all cases. If the address for a write hits in the cache but the data misses, the write costs 10 cycles. If both the address and data miss in the local cache, but the address hits in the map cache, a write takes 13 cycles. If the map cache does not contain the address, the write will take from 100 to 200 cycles.
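As a rough illustration, the write costs above can be summed over a mix of stores. The case labels, the function name total_write_cycles, and the store mix in the usage note are hypothetical; only the cycle counts come from the text.

```python
# (min, max) write cycle counts quoted in the text.
WRITE_CYCLES = {
    "cache_hit": (1, 1),
    "data_miss": (10, 10),
    "map_cache_hit": (13, 13),
    "map_cache_miss": (100, 200),
}


def total_write_cycles(counts):
    """Sum (min, max) cycles over a mapping of case label -> number of writes."""
    low = sum(WRITE_CYCLES[case][0] * n for case, n in counts.items())
    high = sum(WRITE_CYCLES[case][1] * n for case, n in counts.items())
    return low, high
```

With a made-up mix of 97 cache hits, 2 data misses, and 1 map cache hit, total_write_cycles({"cache_hit": 97, "data_miss": 2, "map_cache_hit": 1}) returns (130, 130), or 1.3 cycles per store on average.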
X.3 Jumps, Procedure Calls and Returns, Xops
Jumps, procedure calls and returns, and Xops take 1 extra cycle. If the instruction at the target location crosses a word boundary, an additional cycle is required, so that in the best case an indefinitely long series of jump opcodes, each fitting within a single word, takes 2 cycles per jump. Note that the four jump instructions, J1, J2, J3, and J5, are implemented as no-ops, so they do not require the extra cycle needed by other jumps. KFC is 1 cycle slower than other procedure calls; it requires 3 cycles (4 if the target instruction crosses a word boundary).
Note: The trap location is always the first byte of a word, so for Xops, KFC, traps, and bounds-checking instructions, the first instruction in the trap procedure will cross a word boundary only if it is 5 bytes long. In addition, the compiler will normally choose to begin all procedures in the first byte of a word, so extra cycles will be unusual on procedure calls.
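The rules in this section can be sketched in Python as follows. Two assumptions are made here and are not stated in the text: words are taken to be 4 bytes long (inferred from the note that only a 5-byte instruction starting at the first byte of a word crosses a boundary), and the no-op jumps J1, J2, J3, and J5 are charged 1 cycle regardless of the target. The function names and the kind labels are illustrative.

```python
WORD_BYTES = 4  # assumption: 4-byte words, inferred from the note above


def crosses_word_boundary(start_byte, length):
    """True if an instruction of `length` bytes whose first byte is at
    byte address `start_byte` extends into the following word."""
    return (start_byte % WORD_BYTES) + length > WORD_BYTES


def transfer_cycles(kind, target_crosses=False):
    """Cycles for a control transfer, assuming no memory wait cycles.

    kind is one of "jump", "call", "return", "xop", "kfc", or
    "noop_jump" (J1, J2, J3, J5).
    """
    if kind == "noop_jump":
        return 1  # implemented as no-ops, so no extra cycle (assumed for any target)
    if kind == "kfc":
        base = 3  # 1 cycle slower than other procedure calls
    elif kind in ("jump", "call", "return", "xop"):
        base = 2  # 1 cycle plus 1 extra cycle
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return base + (1 if target_crosses else 0)
```

For instance, a 5-byte instruction at the first byte of a word gives crosses_word_boundary(0, 5) == True, so a KFC whose trap procedure begins with such an instruction would cost transfer_cycles("kfc", True) == 4 cycles.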
X.4 Conditional Jumps
A conditional jump correctly predicted to jump takes 2 cycles. A conditional jump correctly predicted not to jump takes 1 cycle. An incorrectly predicted conditional jump takes 5 cycles (6 if the target crosses a word boundary) whether the incorrect prediction was to jump or not to jump.
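To see how prediction accuracy affects average cost, the figures above can be combined as in the sketch below. The function names and the parameters p_taken and p_correct are illustrative. The average assumes that prediction accuracy is independent of the jump direction and that no target crosses a word boundary; neither assumption comes from the text.

```python
def conditional_jump_cycles(predicted_taken, actually_taken, target_crosses=False):
    """Cycles for one conditional jump, assuming no memory wait cycles."""
    if predicted_taken == actually_taken:
        return 2 if actually_taken else 1
    # Misprediction costs the same in either direction.
    return 5 + (1 if target_crosses else 0)


def expected_cycles(p_taken, p_correct):
    """Average cycles per conditional jump.

    p_taken:   fraction of conditional jumps that actually jump
    p_correct: fraction of predictions that are correct
    """
    correct_cost = p_taken * 2 + (1 - p_taken) * 1
    return p_correct * correct_cost + (1 - p_correct) * 5
```

If half of all conditional jumps are taken and 90% are predicted correctly, expected_cycles(0.5, 0.9) comes to about 1.85 cycles per jump.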
X.5 Bounds Checks
The bounds-checking instructions, RBC, QBC, and BC, require 1 cycle if they don't trap and 5 cycles if they do. If the first instruction of the trap procedure crosses a word boundary, then the trap requires 1 extra cycle. (Thus, these opcodes have the same timing as incorrectly predicted conditional jumps when they trap.)
X.6 Additional Instructions with Special Timing Considerations
X.6.1 LIP and SIP
The LIP and SIP instructions that address EU registers require 1 cycle. However, LIP and SIP on any IFU register require 5 cycles. (See the definition for ProcessorRegister in Chapter 2.0 for a complete listing of the operands LIP and SIP instructions can use.) For SIP these extra cycles are needed to ensure no conflict between the IFU register being written and the opcodes that follow the SIP, and to ensure that the K bus is not used by another instruction when the SIP needs it. For LIP these cycles are needed to ensure no conflict with the instructions immediately preceding the LIP.
X.6.2 SFC, SFCI, SJ, and CST
In addition to the instructions discussed above, the SFC, SFCI, SJ, and CST instructions have special timing. SFC, SFCI, and SJ require 5 cycles, and CST requires 6 cycles plus the number of wait cycles imposed by the memory. With one level of cache between the EU and the M bus, the number of wait cycles is at least 2 because that is the cost of CST acquiring the M bus lock on its fetch-and-hold memory reference.
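A minimal sketch of these figures, assuming one level of cache between the EU and the M bus, is given below. The constant and function names are illustrative. The sketch reads the text as charging wait cycles only to CST, which is an interpretation rather than something the text states outright.

```python
FIXED_CYCLES = {"SFC": 5, "SFCI": 5, "SJ": 5}
CST_BASE_CYCLES = 6
CST_MIN_WAIT_CYCLES = 2  # cost of acquiring the M bus lock, per the text


def special_cycles(opcode, wait_cycles=CST_MIN_WAIT_CYCLES):
    """Cycles for SFC, SFCI, SJ, or CST.

    wait_cycles is used only for CST and may not be below the
    2-cycle minimum quoted in the text.
    """
    if opcode == "CST":
        if wait_cycles < CST_MIN_WAIT_CYCLES:
            raise ValueError("CST needs at least 2 wait cycles")
        return CST_BASE_CYCLES + wait_cycles
    return FIXED_CYCLES[opcode]
```

Under these assumptions the best case for CST is special_cycles("CST") == 8 cycles: 6 base cycles plus the minimum of 2 wait cycles.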