BreakpointsDoc.tioga
Peter B. Kessler, April 16, 1990 3:22 pm PDT
Breakpoints (SPARCBreaks)
CEDAR 10.0 FOR INTERNAL XEROX USE ONLY
Breakpoints
Peter B. Kessler
Ó Copyright 1989 Xerox Corporation. All rights reserved.
Abstract: SPARCBreaks provides breakpoints for SPARCs. After some setup, you register a proc to be called (with some client data) when execution reaches a particular instruction.
Created by: Peter B. Kessler
Maintained by: 
Keywords: breakpoint, SPARC, debugging, performance monitoring
Further reading: ``Fast Breakpoints: Design and Implementation'', in Proceedings of the ACM SIGPLAN'90 Conference on Programming Language Design and Implementation, June 1990.
XEROX  Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, California 94304

For Internal Xerox Use Only
1. Programmer's Interface
Stuff you are expecting to find
You probably want a procedure to set a breakpoint. Breakpoint.SetBreakpoint is it. Then you'll want to clear the breakpoint. That's Breakpoint.ClearBreakpoint. What could be simpler? The arguments to those procedures, for one thing. Basically you give Breakpoint.SetBreakpoint an address, a procedure and some client data, and it promises that when execution reaches that address, your procedure will be called with your client data. The hard part is specifying the address. There's a lot of hair around separating the access to this code from the world where it's planting the breakpoints, see the section on stuff you wish you didn't need.
Stuff you might not be expecting to find
All the breakpoints currently in place (in a given world) can be enumerated by calling Breakpoint.EnumerateBreakpoints. Note that if more than one program has been planting breakpoints, each program only enumerates its own breakpoints. (However one can have at most one active breakpoint at each address. This is checked for.) When you create the breakpoint with Breakpoint.SetBreakpoint, you also give it some client data about the breakpoint. When you enumerate the breakpoints, you are called back with that client data for each breakpoint. This client data is different from the client data that gets passed to your registered procedure when the breakpoint is hit. The breakpoint enumerator and the breakpoints it plants might be in diferent address spaces. The client data for the enumeration callback is a REF ANY, since it doesn't go over to the address space where the breakpoints are. The client data for the procedure that gets called from the breakpoint is just a CARD32, since it's over there in the breakpoint world.
Stuff you wish you didn't need (but you do)
The breakpoint code is intentionally insulated from the address space in which it is planting the breakpoints. So, you have to provide with accessors to that world. That's what all the stuff in BreakWorldArchitecture is for. You have to provide procedures to
Peek at the contents of a location in that address space,
Poke the contents of a location in that address space,
Get the address of a procedure in that address space,
Find a patch area for an address in that address space, and
Execute a call (locally), while holding a monitor lock on the breakpoint world.
What you do is to collect these procedures (e.g. by loading them and referring to them through some interface) and pass them to BreakWorldArchitecture.CreateBreakWorld, and it hands you back a world data structure. Then you can use BreakWorldArchitecture.NewAddress to construct a BreakWorldArchitecture.Address to hand off to procedures like Breakpoint.SetBreakpoint. For all the standard worlds (e.g. Cirio remote debugging, Cirio local debugging, and setting breakpoints in your very own local address space) these routines have been provided. See Cirio, or FakeCirio and its implementation by FakeLocalCirioImpl and FakeRemoteCirioImpl.
There are two breakpoint packages exported by SPARCBreaks-Suite. The one you probably want is CirioBreaksPackage, which exports to the Breakpoint interface procedures to set and clear breakpoints when you are using Cirio. The other one is SPARCBreaksPackage, which is marginally faster, and is the one I use for getting timings for papers, etc.
2. What really happens
Self-Modifying code
Breakpoints are planted by replacing the instruction at the breakpoint address with a transfer to some code that calls your registered procedure. The breakpoint code is assembled out of line in a patch area left by the incremental loader (or, in the grand and glorious future, allocated by the breakpoint package). Before calling your procedure, the breakpoint code saves all the global state (think, registers) it thinks you can get your hands on, and after your procedure returns the breakpoint code restores that state and executes a replacement for the instruction it removed to plant the transfer to the breakpoint code. Clearing a breakpoint is just replacing the instruction that was removed to plant the transfer. Breakpoint code space can not be reclaimed, though is there's enough pressure, I might write a procedure to free the patch space.
Nitty gritty details
The patch space for each breakpoint contains code to save the global state of the machine, call your procedure, and restore the global state. This code, called the closure caller, has the same form for each breakpoint, though the details of the particular procedure to call, and the client data for the call, are ``compiled into'' the code, so each patch space is unique. There are, so far, two different types of closure callers: one for speed, and one to help out the debugger. the only difference is that the one that helps out the debugger sets the return address register to the address of the breakpoint, so the deubgger can make sense of the extra frame that the breakpoint code pushes onto the register window and memory stacks to hold the saved state around the closure call.
The closure caller looks like (this one has the setting of the return address register):
save %sp,-(stackPointerOffset+stackAllocationForCallee+registerSaveArea),%sp
  -- allocate a register window and a register save area.
call ←save←regs
add %sp, stackPointerOffset+stackAllocationForCallee, %o0
  -- save←regs(@registerSaveArea).
sethi %hi(return�ress), %i7
or  %i7, %lo(return�ress), %i7
  -- set up ``return address''.
sethi %hi(clientData), %o0
callclientProc
or  %lo(clientData), %o0
  -- clientProc(clientData).
callrestore←regs
add %sp, stackPointerOffset + stackAllocationForCallee, %o0
  -- restore←regs(@registerSaveArea).
restore %sp,+(stackPointerOffset+stackAllocationForCallee+registerSaveArea),%sp
  -- deallocate the register window and register save area.
where the constants stackPointerOffset and stackAllocationForCallee are defined and described in SPARCArchitecture.mesa, and registerSaveArea is defined in either SPARCBreakpoint.mesa or CirioBreakpoint.mesa, depending on which closure caller you use. Note that registerSaveArea really depends on the runtime routines for saving and restoring registers (available though CirioThings-Suite.df), and should be gotten by calling ←reg←state←size, but I don't do that because I generate the code on one side of the wall, and ←reg←state←size lives on the other side of the wall. There are actually several resgister saving and restoring procedures available. The most general one, ←save←regs, saves the floating point registers, the in registers of a new register window, and the global registers. If you know you aren't going to be touching, say, the floating point registers in your breakpoint procedure, you could make up a closure caller that called ←save←regs←mini, which just saves the in and the global registers. Then you need to call ←restore←regs←mini to restore only those registers. Not saving and restoring the floating point registers saves more than half the time of the breakpoint code.
Following the closure caller in the patch is the code, called the manger, to simulate the instruction that was replaced with the branch to the patch. So far I've distinguished five different classes of instructions on which breakpoints can be set, and each has it's own manger. Each class of instruction also needs a different instruction to transfer to patch code, so those two issues are discussed together. The different mangers come in different sizes, though for reasons I'll explain below, I always allocate the maximum size.
The simplest manger is for an instruction which does not have a delay slot. All that is needed to transfer to the patch space is a ba,a instruction (assuming the patch space is within 16M bytes of the breakpoint address). The manger for the normal case looks like:
instruction'  -- relocated instruction.
ba,a continue -- branch to the instruction after the breakpoint.
Most control transfers on the SPARC have a delay slot, in which the instruction after the transfer instruction executes before the transfer takes effect. Conditional control transfer instructions must be relocated because they use PC-relative addressing of the destination. We must not execute the instruction in the delay during the transfer to the patch space, but must execute it in the delay slot of the relocation transfer instruction in the patch space. The manger for conditional branch instructions (generically, bicc) looks like:
bicc destination -- relocated conditional branch instruction.
delaySlot  -- the instruction in the delay slot of the original branch.
ba,a continue -- branch to the instruction after the original delay slot.
The transfer to the patch is accomplished with a ba,a instruction, as in the normal case.
The call instruction is a delayed control transfer which also writes the address of the instruction (the return address) into the return address register (register %o7). For various reasons we need the return address pointing at the reall return address, so we can't just use an ordinary call instruction from the patch space. However, we know the address that is being called (by decoding the instruction). The technique we use is to use a call instruction to transfer to the patch space, and have the patch space transfer to the called procedure with a jump. The instruction in the delay slot of the original call is executed in the delay slot of the call to the patch space. That's okay, since the instruction in the delay slot of a call is usually part of the argument set up, since the caller can't know the first instruction of the callee. The manger for the patch a ``call Foo'' contains:
sethi %hi(Foo), %g1  -- set up %g1 with called procedure address.
jmpl %g1+%lo(Foo), %g0  -- transfer to Foo, without saving a return address.
nop  -- needed, since the jmpl has a delay slot.
Calls can also be made with the address of the procedure in a register (or at an offset from a register). These are handled similarly to the direct call case above, with a call to the patch space. The manger for a patch of a ``jmpl %r+offset,%o7'' contains:
jmpl %r+offset, %g0  -- transfer to %r+offset, without saving a return address.
nop  -- needed, since the jmpl has a delay slot.
where the ``%r+offset'' operand is just copied from the original jmpl instruction. As in the direct call case, the instruction in the delay slot of the original call is executed in the delay slot of the call to the patch space.
Returns are handled similarly, since they are almost indistinguishable from indirect jumps, except that they don't save a return address. Since they don't save a return address, the transfer to the patch space is done with a ba,a instruction, and the instruction in the delay slot of the return is relocated to the delay of the return in the patch space. The manger for a patch of a ret instruction (which is a pseudo-op for a ``jmpl %i7+8,%g0'') contains:
jmpl %i7+8, %g0  -- transfer to %7+8, without saving a return address.
delaySlot  -- the instruction in the delay slot of the original return.
The case of a retl instruction (which is a pseudo-op for a ``jmpl %o7+8,%g0'') is handled similiarly, by changing the source operand of the jmpl instruction in the manger.
The different types of mangers are distinguished by writing an extra word at the end of the manger which happens to be a ``sethi %hi(tag), %g0'', where tag is a member of the enumeration SPARCManger.MangerVariant. This instruction is never actually executed, but it is an executable instruction for historical reasons. When it comes time to uninstall a breakpoint, the original instruction can be reconstructed by examining the tag and doing the appropriate reconstruction from the instructions in the manger. Patches are also prefixed (by the Shepherd code) with the address of the breakpoint, so that completes the information needed to uninstall a breakpoint.
3. Testing and Timing
Testing is accomplished by running the TestBreakpoints.cm file. That loads up all the stuff you need. Then you can issue the command ``TestBreakpointFlavors'', to check out all 5 different kinds of breakpoints (including those on conditional branches that do and do not branch), and also test clearing breakpoints. The output should be self-explanatory.
Timing is accomplished by running TestBreakpoint.pcr, in as little of PCR as you can (e.g., ``CedarBasement TestBreakpoint.pcr''). When it is done loading, invoke ``CallAll ←TimeBreakpoint←P720'' (or whatever its unique name is) and it will print out the timing statistics when it's all done (so the printing doesn't perturb the times). As of now, on a SPARCstation1, normal breaks take 24.6 microseconds and fast breaks (not saving the floating point registers) take 11.2 microseconds.
4. Shortfalls and Wishlist
There are probably a few kinds of instructions on which you'd like to plant breaks, but you can't, because I haven't figured out what the manger's for them look like. Let me know and I'll think about how to do it.
You can't plant breakpoints on the instruction in a delay slot. I can't figure out exactly why you'd want to do this, and I've figured out a way to get it to work on a uni-processor, but it seems like it's only of academic interest. If you really need it, let me know.
The amount of patch space per module is (severely) limited. I think the default is to give you space for at least 4 patches per mdoule, plus some number depending on the size of the module you are loading. Alan claims to have a callback in the loader for changing those numbers, but I haven't been able to get him to show me how to change it, so I don't know how difficult it is to change. There's also a plan to ask the storage allocator for some space ``close enough'' to the code for the module to plant breakpoints. That certainly isn't there in the Weiser allocator, and it's not clear it will even make it into any of the early versions of the Boehm allocator. If someone goes and does it, they should remember that I'm willing to take (and use) more space than I ask for, so if they can't find the perfect chunk of memory, they can hand me back a bigger chunk. When this happens, we will have to change the patch data structure to have a link field at the front of it, and teach FirstPatch and NextPatch to walk the links. I suppose I could go put in the field and the code now, but I wouldn't be able to test it, and it might confuse someone reading the code.