<<SPARCBreaksDoc.tioga>>
<<Peter B. Kessler, April 16, 1990 3:22 pm PDT>>
<<>>
SPARCBreaks
PCEDAR 2.0 - FOR INTERNAL XEROX USE ONLY
SPARCBreaks
Peter B. Kessler
Ó  Copyright 1989 Xerox Corporation. All rights reserved.
Abstract: SPARCBreaks provides breakpoints for SPARCs.  After some setup, you register a proc to be called (with some client data) 
when execution reaches a particular instruction.
Created by: Peter B. Kessler
Maintained by: Peter B. Kessler <PeterKessler>
Keywords: breakpoint, SPARC, debugging, performance monitoring
Further reading: ``Fast Breakpoints:  Design and Implementation'', in Proceedings of the ACM SIGPLAN'90 Conference on Programming 
Language Design and Implementation, June 1990.

XEROX            Xerox Corporation
                Palo Alto Research Center
                3333 Coyote Hill Road
                Palo Alto, California 94304

For Internal Xerox Use Only
1. Programmer's Interface
Stuff you are expecting to find
    You probably want a procedure to set a breakpoint.  Breakpoint.SetBreakpoint is it.  Then you'll want to clear the breakpoint.  
    That's Breakpoint.ClearBreakpoint.  What could be simpler?  The arguments to those procedures, for one thing.  Basically you 
    give Breakpoint.SetBreakpoint an address, a procedure and some client data, and it promises that when execution reaches that 
    address, your procedure will be called with your client data.  The hard part is specifying the address.  There's a lot of hair 
    around separating the access to this code from the world where it's planting the breakpoints, see the section on stuff you wish 
    you didn't need.
Stuff you might not be expecting to find
    All the breakpoints currently in place (in  a given world) can be enumerated by calling Breakpoint.EnumerateBreakpoints.  Note that 
    if more than one program has been planting breakpoints, each program only enumerates its own breakpoints.  (However one can 
    have at most one active breakpoint at each address.  This is checked for.)  When you create the breakpoint with 
    Breakpoint.SetBreakpoint, you also give it some client data about the breakpoint.  When you enumerate the breakpoints, you are 
    called back with that client data for each breakpoint.  This client data is different from the client data that gets passed to 
    your registered procedure when the breakpoint is hit.  The breakpoint enumerator and the breakpoints it plants might be in 
    diferent address spaces.  The client data for the enumeration callback is a REF ANY, since it doesn't go over to the address 
    space where the breakpoints are.  The client data for the procedure that gets called from the breakpoint is just a CARD32, 
    since it's over there in the breakpoint world.
Stuff you wish you didn't need (but you do)
    The breakpoint code is intentionally insulated from the address space in which it is planting the breakpoints.  So, you have to 
    provide with accessors to that world.  That's what all the stuff in BreakWorldArchitecture is for.  You have to provide 
    procedures to
    Peek at the contents of a location in that address space,
    Poke the contents of a location in that address space,
    Get the address of a procedure in that address space,
    Find a patch area for an address in that address space, and
    Execute a call (locally), while holding a monitor lock on the breakpoint world.
What you do is to collect these procedures (e.g. by loading them and referring to them through some interface) and pass them to 
BreakWorldArchitecture.CreateBreakWorld, and it hands you back a world data structure.  Then you can use 
BreakWorldArchitecture.NewAddress to construct a BreakWorldArchitecture.Address to hand off to procedures like 
Breakpoint.SetBreakpoint.  For all the standard worlds (e.g. Cirio remote debugging, Cirio local debugging, and setting breakpoints 
in your very own local address space) these routines have been provided.  See Cirio, or FakeCirio and its implementation by 
FakeLocalCirioImpl and FakeRemoteCirioImpl.
    There are two breakpoint packages exported by SPARCBreaks-Suite.  The one you probably want is CirioBreaksPackage, which exports to 
    the Breakpoint interface procedures to set and clear breakpoints when you are using Cirio.  The other one is 
    SPARCBreaksPackage, which is marginally faster, and is the one I use for getting timings for papers, etc.
2. What really happens
Self-Modifying code
    Breakpoints are planted by replacing the instruction at the breakpoint address with a transfer to some code that calls your 
    registered procedure.  The breakpoint code is assembled out of line in a patch area left by the incremental loader (or, in the 
    grand and glorious future, allocated by the breakpoint package).  Before calling your procedure, the breakpoint code saves all 
    the global state (think, registers) it thinks you can get your hands on, and after your procedure returns the breakpoint code 
    restores that state and executes a replacement for the instruction it removed to plant the transfer to the breakpoint code.  
    Clearing a breakpoint is just replacing the instruction that was removed to plant the transfer.  Breakpoint code space can not 
    be reclaimed, though is there's enough pressure, I might write a procedure to free the patch space.
Nitty gritty details
    The patch space for each breakpoint contains code to save the global state of the machine, call your procedure, and restore the 
    global state.  This code, called the closure caller, has the same form for each breakpoint, though the details of the 
    particular procedure to call, and the client data for the call, are ``compiled into'' the code, so each patch space is unique.  
    There are, so far, two different types of closure callers: one for speed, and one to help out the debugger.  the only 
    difference is that the one that helps out the debugger sets the return address register to the address of the breakpoint, so 
    the deubgger can make sense of the extra frame that the breakpoint code pushes onto the register window and memory stacks to 
    hold the saved state around the closure call.
    The closure caller looks like (this one has the setting of the return address register):
    save    %sp,-(stackPointerOffset+stackAllocationForCallee+registerSaveArea),%sp
            -- allocate a register window and a register save area.
    call    _save_regs
    add    %sp, stackPointerOffset+stackAllocationForCallee, %o0
            -- save_regs(@registerSaveArea).
    sethi    %hi(return_address), %i7
    or     %i7, %lo(return_address), %i7
            -- set up ``return address''.
    sethi    %hi(clientData), %o0
    call    clientProc
    or     %lo(clientData), %o0
            -- clientProc(clientData).
    call    _restore_regs
    add    %sp, stackPointerOffset + stackAllocationForCallee, %o0
            -- restore_regs(@registerSaveArea).
    restore    %sp,+(stackPointerOffset+stackAllocationForCallee+registerSaveArea),%sp
            -- deallocate the register window and register save area.
where the constants stackPointerOffset and stackAllocationForCallee are defined and described in SPARCArchitecture.mesa, and 
registerSaveArea is defined in either SPARCBreakpoint.mesa or CirioBreakpoint.mesa, depending on which closure caller you use.  
Note that registerSaveArea really depends on the runtime routines for saving and restoring registers (available though 
CirioThings-Suite.df), and should be gotten by calling _reg_state_size, but I don't do that because I generate the code on one side 
of the wall, and _reg_state_size lives on the other side of the wall.  There are actually several resgister saving and restoring 
procedures available.  The most general one, _save_regs, saves the floating point registers, the in registers of a new register 
window, and the global registers.  If you know you aren't going to be touching, say, the floating point registers in your 
breakpoint procedure, you could make up a closure caller that called _save_regs_mini, which just saves the in and the global 
registers.  Then you need to call _restore_regs_mini to restore only those registers.  Not saving and restoring the floating point 
registers saves more than half the time of the breakpoint code.
    Following the closure caller in the patch is the code, called the manger, to simulate the instruction that was replaced with the 
    branch to the patch.  So far I've distinguished five different classes of instructions on which breakpoints can be set, and 
    each has it's own manger.  Each class of instruction also needs a different instruction to transfer to patch code, so those two 
    issues are discussed together.  The different mangers come in different sizes, though for reasons I'll explain below, I always 
    allocate the maximum size.
    The simplest manger is for an instruction which does not have a delay slot.  All that is needed to transfer to the patch space is a 
    ba,a instruction (assuming the patch space is within 16M bytes of the breakpoint address).  The manger for the normal case 
    looks like:
    instruction'        -- relocated instruction.
    ba,a    continue    -- branch to the instruction after the breakpoint.
    Most control transfers on the SPARC have a delay slot, in which the instruction after the transfer instruction executes before the 
    transfer takes effect.  Conditional control transfer instructions must be relocated because they use PC-relative addressing of 
    the destination.  We must not execute the instruction in the delay during the transfer to the patch space, but must execute it 
    in the delay slot of the relocation transfer instruction in the patch space.  The manger for conditional branch instructions 
    (generically, bicc) looks like:
    bicc    destination    -- relocated conditional branch instruction.
    delaySlot        -- the instruction in the delay slot of the original branch.
    ba,a    continue    -- branch to the instruction after the original delay slot.
The transfer to the patch is accomplished with a ba,a instruction, as in the normal case.
    The call instruction is a delayed control transfer which also writes the address of the instruction (the return address) into the 
    return address register (register %o7).  For various reasons we need the return address pointing at the reall return address, 
    so we can't just use an ordinary call instruction from the patch space.  However, we know the address that is being called (by 
    decoding the instruction).  The technique we use is to use a call instruction to transfer to the patch space, and have the 
    patch space transfer to the called procedure with a jump.  The instruction in the delay slot of the original call is executed 
    in the delay slot of the call to the patch space.  That's okay, since the instruction in the delay slot of a call is usually 
    part of the argument set up, since the caller can't know the first instruction of the callee.  The manger for the patch a 
    ``call Foo'' contains:
    sethi %hi(Foo), %g1        -- set up %g1 with called procedure address.
    jmpl %g1+%lo(Foo), %g0        -- transfer to Foo, without saving a return address.
    nop        -- needed, since the jmpl has a delay slot.
    Calls can also be made with the address of the procedure in a register (or at an offset from a register).  These are handled 
    similarly to the direct call case above, with a call to the patch space.  The manger for a patch of a ``jmpl %r+offset,%o7'' 
    contains:
    jmpl %r+offset, %g0        -- transfer to %r+offset, without saving a return address.
    nop        -- needed, since the jmpl has a delay slot.
where the ``%r+offset'' operand is just copied from the original jmpl instruction.  As in the direct call case, the instruction in 
the delay slot of the original call is executed in the delay slot of the call to the patch space.
    Returns are handled similarly, since they are almost indistinguishable from indirect jumps, except that they don't save a return 
    address.  Since they don't save a return address, the transfer to the patch space is done with a ba,a instruction, and the 
    instruction in the delay slot of the return is relocated to the delay of the return in the patch space.  The manger for a patch 
    of a ret instruction (which is a pseudo-op for a ``jmpl %i7+8,%g0'') contains:
    jmpl %i7+8, %g0        -- transfer to %7+8, without saving a return address.
    delaySlot        -- the instruction in the delay slot of the original return.
The case of a retl instruction (which is a pseudo-op for a ``jmpl %o7+8,%g0'') is handled similiarly, by changing the source 
operand of the jmpl instruction in the manger.
    The different types of mangers are distinguished by writing an extra word at the end of the manger which happens to be a ``sethi    
    %hi(tag), %g0'', where tag is a member of the enumeration SPARCManger.MangerVariant.  This instruction is never actually 
    executed, but it is an executable instruction for historical reasons.  When it comes time to uninstall a breakpoint, the 
    original instruction can be reconstructed by examining the tag and doing the appropriate reconstruction from the instructions 
    in the manger.  Patches are also prefixed (by the Shepherd code) with the address of the breakpoint, so that completes the 
    information needed to uninstall a breakpoint.
3. Testing and Timing
    Testing is accomplished by running the TestBreakpoints.cm file.  That loads up all the stuff you need.  Then you can issue the 
    command ``TestBreakpointFlavors'', to check out all 5 different kinds of breakpoints (including those on conditional branches 
    that do and do not branch), and also test clearing breakpoints.  The output should be self-explanatory.
    Timing is accomplished by running TestBreakpoint.pcr, in as little of PCR as you can (e.g., ``CedarBasement TestBreakpoint.pcr'').  
    When it is done loading, invoke ``CallAll _TimeBreakpoint_P720'' (or whatever its unique name is) and it will print out the 
    timing statistics when it's all done (so the printing doesn't perturb the times).  As of now, on a SPARCstation1, normal breaks 
    take 24.6 microseconds and fast breaks (not saving the floating point registers) take 11.2 microseconds.
4. Shortfalls and Wishlist
    There are probably a few kinds of instructions on which you'd like to plant breaks, but you can't, because I haven't figured out 
    what the manger's for them look like.  Let me know and I'll think about how to do it.
    You can't plant breakpoints on the instruction in a delay slot.  I can't figure out exactly why you'd want to do this, and I've 
    figured out a way to get it to work on a uni-processor, but it seems like it's only of academic interest.  If you really need 
    it, let me know.
    The amount of patch space per module is (severely) limited.  I think the default is to give you space for at least 4 patches per 
    mdoule, plus some number depending on the size of the module you are loading.  Alan claims to have a callback in the loader for 
    changing those numbers, but I haven't been able to get him to show me how to change it, so I don't know how difficult it is to 
    change.  There's also a plan to ask the storage allocator for some space ``close enough'' to the code for the module to plant 
    breakpoints.  That certainly isn't there in the Weiser allocator, and it's not clear it will even make it into any of the early 
    versions of the Boehm allocator.  If someone goes and does it, they should remember that I'm willing to take (and use) more 
    space than I ask for, so if they can't find the perfect chunk of memory, they can hand me back a bigger chunk.  When this 
    happens, we will have to change the patch data structure to have a link field at the front of it, and teach FirstPatch and 
    NextPatch to walk the links.  I suppose I could go put in the field and the code now, but I wouldn't be able to test it, and it 
    might confuse someone reading the code.