Chapter 4

Solution Techniques

The preceding chapter demonstrated many difficulties with debugging optimized programs. This chapter describes techniques for solving some of these difficulties: first the basic techniques that are useful in providing either truthful or expected behavior, then the techniques that support only truthful behavior, and finally the techniques that support only expected behavior.

There are two ideas underlying the basic techniques. First, optimizations remove information from the unoptimized program in creating the optimized version. This information can be saved during compilation or execution, either to present the user with the effects of the optimizations (truthful behavior) or to reconstruct an unoptimized view of the optimized program's execution (expected behavior). Second, optimizations or debugging can be limited in some way to make the presentation or reconstruction job easier. The techniques that support truthful behavior help the user to understand the effects of the optimizations and to manipulate the optimized computation in terms of the unoptimized source text. In contrast, the techniques that support expected behavior help the debugger to translate between the user's view of the unoptimized program and the actual optimized computation.

This chapter presents a wide range of ideas and suggestions for debugging optimized programs. A few of these ideas were tested in the Navigator implementation described in the second part of this dissertation. The remainder have not been tested; some are extensions of ideas suggested in previous discussions of debugging optimized programs, while others are new to this work.

4.1 Basic techniques

The basic techniques, which can be used to provide either truthful or expected behavior, include adding static information, which the compiler collects and passes to the debugger; adding dynamic information, which the debugger collects during the program's execution; restricting optimization to allow more intelligible debugging; restricting debugging capabilities to allow more extensive optimization; and recompiling (small) portions of the program to turn off optimizations.

4.1.1 Adding static information

Static information is collected by the compiler during the symbol table construction, code generation, and optimization phases and is saved for the debugger's later use. Static information should be placed in auxiliary tables rather than in the object code itself, so that the optimized program can execute at maximum speed and minimum space until debugging is requested (at runtime).

The most crucial information for an interactive source-level debugger is the correspondence between variable names specified in the source program and storage locations assigned to these variables in the object program, as well as the correspondence between source statements and object code locations. Without correct information of this kind, which can only be provided by the compiler, interactive source-level debugging is impossible: whether or not a program is optimized, a debugger that has no notion of its variable or procedure names or its statement ordering cannot present intelligible information about the executing program. A conventional compiler gives the debugger simple symbol tables and monotonically increasing statement maps, aside from the object code itself. An optimizing compiler must construct much more sophisticated symbol tables and statement maps.
If no attempt is made to keep this information current during the optimization process, the debugger cannot communicate intelligibly with the programmer about the executing program. The debugger might have no notion of procedure or variable names, no distinction between compiler temporaries and user variables, and no knowledge of statement (or even procedure) boundaries. At best, the debugger could disassemble the program into some corresponding source-level representation with arbitrary choices of variable names. Given the effects of the optimizations and the variable renaming together, this would be a formidable puzzle.

A wide variety of static information can be useful to a debugger. From simpler to more complex, this information includes symbol tables, statement maps, global flowgraphs and local dags, results of flow analyses, and special tables of information specific to certain optimizations. The compiler can also lay the groundwork for the next technique, adding dynamic information, by noting places or items that the debugger might wish to record at runtime. Each of these will be discussed in turn.

4.1.1.1 Augmented symbol tables

Information about variable names, types, and storage locations, as well as information to handle specific optimizations, can be stored in an augmented symbol table. Information about variable names and types must be linked to the source text (for instance, by statement ranges), while information about storage locations must be linked to the object code. Control-flow optimizations provide a clear example for this distinction. Consider inline procedure expansion, which creates multiple copies of the object code for each source statement in the procedure's declaration. The variable names and types to which the procedure refers are the same for every copy of the procedure, but the storage locations assigned to those variables depend upon the program region enclosing each expanded call. By contrast, consider cross-jumping, which combines multiple copies of object code, thereby creating multiple source statements that correspond to the same object code. Suppose that the two arms of an IF-THEN-ELSE statement each declare separate local variables. Thus a single object instruction in the merged region corresponds to two different variable contexts, depending upon the source statement on whose behalf it is executing. To emphasize this distinction, storage information could be placed in a separate storage map. Previous researchers have not realized the necessity for this distinction; naturally, it is not necessary for a nonoptimizing compiler. For example, the design of the FDS debugging system included a table that mapped each variable at each source statement to a list of storage locations at the start of that statement.

Storage location information arising from register and storage allocation must be reflected in the symbol table or storage map. For instance, if register allocation targets a user variable to a register for the duration of a procedure or loop, the symbol table must reflect this allocation so that the debugger can display the current value of the variable inside that program region. If the debugger permits variable modification, the symbol table must list every storage location that has a current copy of a variable (rather than just one) so that they can all be updated whenever the user changes the variable's value. Furthermore, later optimizations (dead store elimination in particular) can affect the storage location information.
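As a concrete illustration of the shape such a table might take, the following sketch (in Python, with all record and field names invented for illustration rather than taken from any of the systems discussed) separates name and type information, which is tied to the source text, from storage information, which is tied to object code ranges; one variable can therefore have several bindings, and a single binding can list several locations that all hold a current copy.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class CodeRange:
        start_pc: int          # first object-code address of the range
        end_pc: int            # one past the last address

    @dataclass
    class StorageBinding:
        valid_in: CodeRange    # object-code range in which this binding holds
        locations: List[str]   # every location holding a current copy,
                               # e.g. ["R3", "frame+12"]; all must be updated
                               # if the debugger changes the variable's value

    @dataclass
    class VariableEntry:
        name: str
        type_name: str
        source_range: Tuple[int, int]   # statement range in which the name is visible
        bindings: List[StorageBinding] = field(default_factory=list)

    def locations_at(entry: VariableEntry, pc: int) -> List[str]:
        """All storage locations that hold the variable at object location pc."""
        return [loc
                for b in entry.bindings
                if b.valid_in.start_pc <= pc < b.valid_in.end_pc
                for loc in b.locations]

In such a scheme, inline procedure expansion would give one VariableEntry several bindings, one per expanded copy of the procedure, while cross-jumping would make the bindings of two different entries overlap the same object code range.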
To access variable values correctly at other than statement boundaries (at the point of a program exception, during a procedure traceback, or possibly at an asynchronous interrupt), even more information must be saved.

Several items specific to other optimizations can also be placed in the augmented symbol table. For constant propagation, the symbol table can hold the constant value associated with a given variable. Similarly, for copy propagation, the symbol table can hold the variable name associated with a given variable. For induction variable elimination, the inverse induction function can be stored in the table entry for the eliminated induction variable so that the debugger can compute the value of the eliminated variable from the substituted variables on demand. In general, different extra information must be associated with different sections of the program (for example, if the same loop index is used as a different induction variable for multiple loops). "Advising" a symbol table entry in this way can also work for object code optimizations, such as collapsed integer arithmetic: the symbol table could record that the expected value of the variable in a certain object code region is equal to the actual value plus a recorded constant.

If the debugger allows variable modification, the symbol table could also contain information for checking second-order effects. In the table entry for a user variable, the optimizer could store properties that it assumed and/or deduced about that variable inside certain object code ranges. For instance, the optimizer might have used the fact that the value of the variable x was even, greater than zero, equal to y, etc. as the precondition for a later optimization. When the user changes the value of x from the debugger, the debugger could then check any applicable properties for x in that program region to ensure that the resulting program is sound. This information would need to be stored efficiently, as there is potentially a great deal of it.

4.1.1.2 Augmented statement maps

Information to allow mapping between source statements and object code locations can be stored in an augmented statement map. Some control-flow optimizations force one source statement to have multiple corresponding sequences of object code (e.g., inline procedure expansion); others force multiple source statements to have one corresponding sequence of object code (e.g., cross-jumping). These simple examples show that statement maps must be able to represent many-to-one and one-to-many relationships. Therefore, the simple monotonically increasing statement maps generated by conventional compilers are not sufficient. As was the case for symbol tables, mapping from object locations to source statements correctly at other than statement boundaries (for example, at the point of a program exception) requires that even more information be saved.

If the optimizer performs code-reordering optimizations, such as code motion out of loops or local code reordering, then the statement map must record both syntactic and semantic mappings for moved statements. Recall from Section 3.2.2.3 that the syntactic mapping of a moved statement is its original location. This reflects its position with respect to other surrounding statements and with respect to the degree of progress through the entire computation. The syntactic mapping is useful for answering questions like "How many times does execution reach this point in the program?" [See Section 2.2.2.]
On the other hand, the semantic mapping of a moved statement is its new location. This reflects the position of the actual computations specified by the statement. The semantic mapping is useful for reporting the offending statement at a program exception and for answering questions like "What is the value of variable x before this assignment to x?" To answer the question "What is the value of variable x after this assignment to x?" for moved statements, the optimizer must also either record a semantic-after object location or provide a semantic version of the next-statement primitive.

Both the syntactic and the semantic definitions of statement mapping are useful; neither can be derived from the other. Because only the user can specify which mapping she desires, it follows that a debugger cannot provide completely expected behavior (in the sense of completely mimicking the behavior of a conventional code-breakpoint-oriented debugger) for programs whose statements have been reordered. Other researchers have noted neither the distinction between the two statement mapping methods nor the rationale behind them. Hennessy, motivated primarily by the desire to report the causes of program exceptions correctly, provided only the semantic mapping, while the design of the FDS debugger included only the syntactic mapping.

4.1.1.3 Other static information

Other information that the compiler could collect for the debugger's later use includes the program's global flowgraph, the results of flow analyses, special tables for particular optimizations, and cues for dynamic information collection.

The compiler could store the program's entire global flowgraph. If the compiler uses dags for local code generation, the dags would appear as basic block nodes in the flowgraph. The flowgraph would also contain statement mapping information and symbol table pointers. Special debugger algorithms would process the flowgraph at runtime to respond to user queries. Hennessy used a scheme of this sort. His augmented flowgraph and dag representations, which included object code location information, encoded both the optimized and the unoptimized forms of the program. For most local optimizations and some global ones, the debugger could process this flowgraph to discover whether or not the value of a variable was current. Sometimes the debugger could reconstruct the correct value of the variable from the flowgraph. In conjunction with the augmented symbol tables described earlier, such flowgraphs could handle more optimizations than Hennessy considered [see Section 3.3.2], such as constant propagation and register allocation. Unfortunately, the problem of reflecting the effects of later object code optimizations back into such flowgraphs remains an open research area.

The compiler could record the results of several kinds of program flow analysis, such as reaching definitions, available expressions, live variables, and so on [Aho+77]. In contrast with storing the program flowgraph and processing it on demand at runtime, this information represents partially precomputed answers to questions that a user might ask the debugger. (Of course, the compiler could store both.) Aside from its lists of variable storage locations in the symbol table, the FDS system planned to use stored tables of flow information as its major debugging assistance. During optimization, the compiler would record the movement (as insertions and deletions) of assignments and uses of variables.
It would then propagate this information around its flowgraph to create lists of assignments and uses of variables that "chain" to each variable at each statement. [An occurrence of a variable v chains to a program location p if there is a path from the occurrence of v to p (in either direction) that does not assign v.] Figure 4-1 shows the complete list of tables that the FDS compiler planned to collect for its debugger.

The compiler could also record special-purpose tables to assist debugging for particular optimizations. For example, for inline procedure expansion, the compiler could record the name of each "deleted" call so that the debugger could reinsert it into its proper place in the procedure traceback when necessary. The Navigator system, described in the second part of this dissertation, does this.

Finally, the compiler could record cues for the collection of dynamic information. These cues would tell the debugger what debugging activity should trigger information collection at runtime, what information to collect, and when and where to collect it. This information will be described further in the next section. The Navigator compiler collects such cues, called path determiners, to help distinguish among the multiple source alternatives that arise from the cross-jumping optimization.

Figure 4-1. Runtime debugging tables provided by the FDS compiler.

4.1.2 Adding dynamic information

Dynamic information is collected by the debugger during the program's execution. The debugger can collect information about the order in which the computation has traversed control flow paths, or it can save old values of variables. The debugger might also collect statement execution counts or data use counts, either selectively or wholesale, but this information does not address any special problems of debugging optimized programs.

Dynamic control flow information can be a powerful addition to static information. If the debugger has a global flowgraph or results of flow analysis and it also collects dynamic control flow information, it can always determine when the values of variables are incorrect with respect to the unoptimized program. For example, suppose that code motion has moved a loop-invariant assignment to the variable x outside a loop. If the loop has been executed more than once, then x has the correct value for the remainder of the loop's current execution. As another example, suppose that global dead store elimination has removed a store to the variable y along one execution path but not along another. At a program location between the deleted store and the next assignment to y, if the deleted store's execution path has been traversed more recently than the other path, then the value of y is incorrect; otherwise it is not.

The absence of dynamic control flow information in Hennessy's design made it impossible for his flowgraph algorithms to detect all cases in which variables have noncurrent values. As partial compensation, he suggested a modification of the code motion optimization: loops would be unrolled once, thereby automatically determining control flow. Dynamic control flow information would also have helped the FDS debugger design.
Its static information about inserted and/or deleted assignments that chain to the value of a variable at the current execution point could be pruned at runtime to list only those assignments along the most recently traversed execution path. The debugger can also use dynamic control flow information to help with statement mapping. For instance, the cross-jumping optimization merges the object code for several source statements in different control flow paths, creating multiple source statements that correspond to a single object instruction. The debugger can determine the source statement on whose behalf that object instruction is currently executing if it knows which of the control flow paths was traversed most recently. The Navigator debugger uses this method to provide expected behavior for cross-jumped programs. Dynamic information about old values of user variables can also be useful. For example, suppose that local code reordering has moved an assignment to the variable z earlier in its basic block. If the debugger records the old value of z before it is overwritten, it can display the expected value of z at program locations between the new earlier location of the assignment and its original location. In many situations, the debugger will also need to record dynamic control flow information to ascertain when to use the saved value for user displays and when to use the actual value in the variable's storage location. This technique works similarly for code hoisting, dead store elimination, and even storage overlaying. It is clear that dynamic information can assist in debugging optimized programs, but when and how can it be collected efficiently? If the debugger is given all of the static information discussed in the preceding section and it also records the complete source-level history of every computation, the debugger could potentially unravel the effects of most optimizations: only optimizations that move computations with side effects toward points later in the program cannot be debugged with expected behavior. Optimizations that remove computations entirely could be handled by interpreting the relevant portion of the flowgraph. (These computations are guaranteed not to have side effects.) Unfortunately, recording the entire history of every execution of an optimized program is more expensive than not optimizing the program in the first place. The debugger could wait to begin recording until the user has issued a debugging command, but even this reduced collection is likely to record unnecessary information and to be excessively slow. On the other hand, it can certainly be described simply: during debugging, collect every user-visible change to the computation state. Since some necessary information might reside in the portion of the history that occurred before recording began, it does not guarantee expected debugging behavior. Instead of collecting the complete history of each computation, the debugger could collect only selected information. At most, the control flow through branches and the old values of variables whose assignments have been moved, deleted, or overlaid need to be recorded. However, this information is probably still too much to record by default, even during debugging. To limit the dynamic information still further, the debugger could collect only the dynamic information that is useful for the currently active debugging requests. 
Its collection could be requested explicitly by the user (which would be truthful behavior), or the debugger could collect it automatically, triggered by debugging requests in optimized program regions (for expected behavior). For automatic collection, each debugging command would potentially trigger the collection of some future information to be used in its processing. Note that this dramatically reduces the amount of dynamic information collected, but at the same time, it restricts the usefulness of the information to preplanned debugging activity only (that is, breakpoints or explicit user collection requests). For unplanned activity (program exceptions, asynchronous interrupts, and procedure tracebacks), the necessary information would probably not have been recorded, so expected debugging behavior could not be guaranteed. Of course, if whole computations have been removed, simply collecting an existing value is not sufficient. In this case it may be possible to perform the computation at its original location and then collect its result, but further optimizations may have altered values so that this cannot be done. Given the entire history of the computation, the debugger must be able to find the history information relevant to a given debugging request. If debugging activity is to trigger information collection, the debugger must be able to decide what debugging activity should trigger what information collection. The compiler can record static cues for this triggering, or the debugger can analyze its other static information to derive the cues on demand. Furthermore, the debugger must be able to use the recorded information properly, either to give more truthful behavior (exact knowledge of noncurrent variable values) or to give expected behavior (reconstructing the correct values). The dynamic information could be collected as potentially unbounded lists of relevant control flow and data information or, more efficiently, as only the most recent "layer" of that information. The most recent "layer" might be needed in program regions affected by multiple optimizations; it includes all variable values that are possibilities for any future display request and all control flow information that can still be used to answer future debugging queries. For instance, suppose that two assignments to the same variable were hoisted across a given program location. Then both old values would need to be saved: the first to display the correct value at that program location, and the second to display the correct value between the first assignment and the second. Similarly, in program regions that have been cross-jumped repeatedly, it may be necessary to know the direction of several earlier branches. (Furthermore, these are not necessarily the n most recent branches). As an example of debugging activity triggering the collection of old values of variables, suppose that storage overlaying has allocated the same storage location for variables v and w, making the value of variable v inaccessible in some program region. When the user requests a breakpoint in that program region, it triggers the (hopefully future) collection of the value of v before it is overwritten by w. In a program region that has been heavily optimized, a single breakpoint might trigger a large amount of dynamic information collection. To reduce this amount, the user could specify in advance which variables she wishes to examine at the breakpoint. 
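One plausible way to organize such triggered collection (the record layouts and helper names below are invented for illustration; they are not drawn from Navigator, FDS, or Hennessy's design) is for the compiler to leave behind cues describing what must be captured and where, and for the debugger, when a breakpoint is set in an affected region, to turn each cue into a collection point that keeps only the most recent layer of saved values and branch directions.

    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    @dataclass
    class CollectionCue:
        pc: int            # object location at which the information must be captured
        kind: str          # "old_value" or "branch_direction"
        variable: str = "" # for "old_value": the variable about to be overwritten

    @dataclass
    class DynamicRecorder:
        # only the most recent "layer" is kept: one slot per capture point
        saved_values: Dict[Tuple[int, str], object] = field(default_factory=dict)
        branch_directions: Dict[int, int] = field(default_factory=dict)

        def capture(self, cue: CollectionCue, state) -> None:
            # `state` stands for the debugger's access to the suspended program;
            # read(variable) and taken_successor() are assumed operations
            if cue.kind == "old_value":
                self.saved_values[(cue.pc, cue.variable)] = state.read(cue.variable)
            else:
                self.branch_directions[cue.pc] = state.taken_successor()

    def arm_collection_points(cues: List[CollectionCue],
                              recorder: DynamicRecorder, plant) -> None:
        # Called when the user sets a breakpoint in an optimized region.
        # plant(pc, action) stands for the machinery that makes the program stop
        # briefly at pc, run `action`, and then continue at full speed.
        for cue in cues:
            plant(cue.pc, lambda state, c=cue: recorder.capture(c, state))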
To avoid decreasing the execution speed of optimized programs that are not being debugged, the debugger should whenever possible collect dynamic information only on demand. Therefore, the compiler should not generate object code to collect the information as part of the program itself. This restriction could be relaxed as long as the optimized program with the information collection is still significantly cheaper than the unoptimized program. [See the modified code motion optimization described in the following section.]

Four implementations of dynamic information collection lend themselves to being triggered at runtime: tracing, interpreting, breakpoints, and patching. The debugger could use these methods automatically, without notifying the user (except that the user might see a reduction in execution speed during debugging). For invisible tracing, the debugger could trace the program, either at the source level or at the instruction level, recording the desired information. This method would probably be painfully obvious to the user. Similarly, the debugger could interpret the object program invisibly whenever debugging is active; this would likely be even slower. The use of invisible breakpoints can be a much more efficient method. The debugger inserts an invisible breakpoint wherever it wishes to collect control flow or data information. The optimized program runs at full speed between collection points. Obviously there is a tradeoff between the proximity of collection points and the value of full speed execution, especially if breakpoints are expensive to process. Invisible breakpoints could be extended to invisible patching: the debugger could patch in code to record the information, or even to insert the deleted computations themselves.

In fact, static and dynamic information could cooperate. For example, in some cases of dead store elimination, the variable's value at a given program location can be computed from other values that are still available there. When the debugger can determine this from the flowgraph, it need not collect the value of that variable, because it can compute the value if it is requested.

4.1.3 Restricting optimization

The problem of debugging optimized code can also be attacked by restricting optimizations in some way short of turning them off completely. This method is not as desirable as the preceding methods, so it should be used only when additional static and dynamic information does not permit expected behavior, or when the information is too expensive to collect and manage. There are several possible ways to restrict optimizations.

Some optimizations might be very damaging to debugging, while only slightly improving the execution characteristics of the program. These optimizations could be removed from the compiler. A possible example is dead store elimination.

Another alternative is to inhibit some optimizations under certain circumstances. The compiler might choose the circumstances (when it believes that debugging will be substantially impaired), or the user might explicitly request restricted optimization. For example, the user might specify that certain variables are debugging variables. As a result, the optimizer would not perform constant propagation or constant folding on these variables, overlay their storage, etc., so that the user can alter the values of these variables during debugging. The user might also specify that certain program regions must be able to be debugged with expected behavior.
The user should be able to specify these directives separately from the program itself, so that the process of their removal cannot introduce errors. Of course, it is less desirable to require the user to specify debugging preferences before compilation than at runtime: the user may not know in advance where the program might fail.

The compiler could also limit the range of optimizations. That is, it could force the optimized and unoptimized versions of the program to correspond at "fixed points", which could occur at several levels of frequency: at statement boundaries, at basic block boundaries, and at screen/page or procedure boundaries. Three specific problems must be considered for this restriction method. First, whatever method is used to implement the restrictions in the optimizer must limit object code optimizations as well, including instruction collapsing (which might cross a fixed point), branch chaining, and cross-jumping. Second, even if breakpoints are only permitted at the same granularity as the fixed points occur, other debugging events, such as program exceptions, can occur at a finer granularity. (Asynchronous interrupts might also be permitted at a finer granularity, or the debugger might continue execution until a fixed point is reached. However, even if the currently active procedure can be suspended only at fixed points, the procedure call chain will contain procedures that have been suspended between fixed points, unless optimizations never cross procedure calls, a very severe optimization restriction.) As a result, some thought must still be given to providing the values of variables (which might be in registers) and to reporting the current execution point for program regions between fixed points. Minimally, for truthful behavior, the debugger must admit that it cannot provide the correct response. To minimize this problem, Hennessy required that the compiler generate code such that changes to the computation state (e.g., stores) occur only at the end of a statement's object code. Third, compiler-introduced temporaries might be exempted from the range restriction, allowing their optimized values to differ from their unoptimized values at a fixed point. For example, common subexpressions could be stored in a compiler temporary for use across several fixed points. If the debugger permits variable modification, this exemption can cause problems during debugging: if the value of a compiler temporary is computed from a user variable, the user could alter the value of that variable between the temporary's assignment and its use.

For the remainder of the discussion of optimization range restriction, we return to the assumption of statement breakpoint granularity. When the optimizer places fixed points at every statement boundary, no additional compiler or debugger effort is needed to achieve expected debugging behavior (except for the three problems described above). Optimizations are very limited: one of the major allowable optimizations is common subexpression elimination for data structure accesses; constant folding and some object code optimizations are also permitted. The design of the FDS debugger included a "no source change" mode of this kind; no mention was made of the three problems. Of course, recompilation and re-execution would have been necessary to invoke this mode if an error was encountered during a "full optimization" mode execution.
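Whatever granularity is chosen, the core of the range restriction can be phrased as a simple legality test. The sketch below (in Python, with invented statement numbering) allows a computation to move only if the move crosses no fixed point, so that the optimized and unoptimized programs still agree at every fixed point.

    from typing import Iterable

    def may_move(orig_stmt: int, dest_stmt: int, fixed_points: Iterable[int]) -> bool:
        """True if moving a computation from orig_stmt to dest_stmt crosses no fixed point.

        Statements are numbered in source order; fixed_points lists the boundaries
        (for example, every statement or every basic block) at which the two
        versions of the program must correspond.
        """
        lo, hi = sorted((orig_stmt, dest_stmt))
        return not any(lo < fp <= hi for fp in fixed_points)

    # With fixed points at statements 10 and 20, an invariant assignment at
    # statement 14 may move to statement 12 but not out of the region to statement 8.
    assert may_move(14, 12, fixed_points=[10, 20])
    assert not may_move(14, 8, fixed_points=[10, 20])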
When the optimizer places fixed points at every basic block boundary, more optimizations are permitted, such as: local common subexpression elimination, local code reordering, local constant and copy propagation, as well as object code optimizations. Nevertheless, the lack of unknown control flow makes both detecting and unraveling the effects of the optimizations much easier than for global optimizations. Hennessy's experiments showed that noncurrent variable values could always be detected for local optimizations, and correct values could often be reconstructed. In this progression, the next logical step is the procedure, but an intermediate level has some practical value. The optimizer would place fixed points at the more frequent of procedure boundaries (possibly including inline procedure boundaries) and every 15 to 50 statements (an experimentally-determined number, chosen to maximize optimization benefits and minimize debugging difficulties; not to exceed one screen or page full). [When procedures do not exceed the one-page limit commonly recommended in software engineering books [Kernighan+74], this level and the procedure level coincide.] Since most basic blocks are quite small (the average basic block length is commonly reported to be less than five statements), this level permits global optimizations and their accompanying difficulties. However, this level of restriction can significantly ease the burden that other solution techniques must bear. For instance, to assist new debugging algorithms to reconstruct values of variables, the fixed points limit the number of variables that can have incorrect values at a given program point. To assist in showing the effects of the optimizations, the fixed points limit the number of reordered and altered computations and allow the full region (one fixed point to the next) to be shown in a single screen or page image. To assist dynamic information collection, the fixed points limit the distance that computations can be moved, so that it is more likely that a debugging request will occur before it is too late to record the necessary information. To assist limited recompilation, the fixed points give a reasonably close place to switch between the optimized and unoptimized versions of a procedure; true fixed points might occur only rarely in a fully optimized program. Note that the assistance for dynamic information collection and limited recompilation applies only to preplanned debugging activity. The exact implementation of an optimization can also be restricted, so that the optimization is performed in a way that is less damaging to debugging. For example, suppose that the code motion optimization is altered slightly, so that the hoisted location always saves the old value of the variable in a compiler temporary before performing the new store. This costs one extra store and one extra temporary (which only need be live until the store's original location). This new form of the code motion optimization is still quite effective, because the computation and the store have been moved out of the loop. More important, the debugger is then able to provide high-quality truthful behavior at a program exception or an interrupt: it can show both the current and saved values of the variable. To provide expected behavior, the debugger would need to collect dynamic control flow information to ascertain whether this is the first time through the loop. 
Similarly, the optimizer could always store hoisted common subexpressions in compiler temporaries rather than in user variables.

4.1.4 Restricting debugging capabilities

Instead of restricting optimization to permit more intelligible debugging, the system could restrict debugging capabilities to permit more extensive optimization. The debugger could completely omit certain functions that are very difficult to handle correctly for optimized programs. A prime candidate in this category is variable modification. Hennessy's algorithms did not permit variable modification. The debugger could also restrict its breakpoint granularity, possibly to coincide with the occurrence of optimization fixed points. As for optimization range restriction, the interesting granularity choices include basic block boundaries and screen/page or procedure boundaries. Hennessy enforced a "semantic mapping" restriction on breakpoint placement: when the evaluation of a common subexpression was the first object code for a statement, the user was permitted to set a breakpoint only at the first statement containing that common subexpression.

Restricting debugging capabilities does not mean that the debugger is allowed to give incorrect responses to certain debugging queries; instead, it means either that the user is not permitted to present queries that would result in an incorrect response, or that the debugger gives truthful rather than expected responses to more requests. In its proposed "full optimization" mode, the FDS debugger would not really have satisfied this constraint: its responses were too cryptic to be understood readily.

4.1.5 Recompiling a small portion of the program

Automatic recompilation of a program region, together with on-the-fly code replacement, can allow for more extensive optimization without requiring elaborate compile-time preparation for the possibility of runtime debugging requests. The ideal optimization restriction for a given debugging request is the restriction that results in the most optimized program that can respond correctly to that request. Only those transformations that cause the request to be unanswerable are inhibited. For example, if the user requests the value of a variable at a point affected by dead store elimination, the most optimized program that can provide the value has exactly that one store added. The primitive debugging method of inserting a print statement will provide this behavior (because the store will not be dead in the recompiled version), but at the cost of recompiling and re-executing the entire program for each request. The recompilation and replacement technique is an outgrowth of this idea, with the size of the recompiled region reduced and little or no re-execution.

When the user requests debugging activity in a program region, a portion of the program is automatically recompiled, either with a lower level of optimization or with a fixed point [see the preceding section] at the desired debugging location. The system then replaces the fully-optimized version of that portion by its less-optimized version in the executing program. The size of the portion could vary from a single statement through a basic block or procedure to an entire module for a large program. [A particular system could allow several sizes or choose a single one, such as the procedure.] As a result of the replacement, a less sophisticated debugger can provide expected debugging behavior.
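The overall flow of this technique might look like the following sketch, in which program, compiler, and loader are hypothetical interfaces standing for the running optimized program, a compiler that accepts per-region constraints, and a facility for splicing new object code into the running image.

    def handle_breakpoint_request(program, compiler, loader, source_location):
        """Sketch of recompile-and-replace for a planned debugging request."""
        # Choose the portion to recompile; a procedure is one reasonable choice.
        region = program.enclosing_region(source_location)
        # Recompile it less aggressively, or with a fixed point at the request.
        new_code = compiler.recompile(region, fixed_point_at=source_location)
        # Splice in the less-optimized code; invocations already inside the
        # region keep running the old code, so the breakpoint takes effect
        # the next time control enters the region.
        loader.replace(region, new_code)
        program.set_breakpoint(source_location)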
The replacement generally requires a fixed point at its beginning and end to allow the new region to mesh well with its old version. The code generator could conceivably be given constraints to permit meshing at other places, but these constraints might interact poorly with the ability to debug.

This technique works only for preplanned debugging activity, such as breakpoints or tracing; it is not particularly useful at program exceptions or for examining the state of currently-suspended procedures in the procedure call chain. For such unplanned activity, the program's execution must be restarted. Furthermore, if the current execution point is inside the affected region, the ability to debug can only be assured the next time through the region. When control has already entered the affected region, it may be impossible to switch from executing the fully-optimized version of the object code to executing the less-optimized version. For instance, hoisted computations that appear later in the less-optimized version may already have been performed in the fully-optimized version. These computations may not be repeatable with the same effect (such as x ← x+1). Also, if there are suspended procedure invocations whose execution point is inside the region, the system must keep the fully-optimized version of the region's code with which to continue their execution. (This would not be a problem if the system inhibited optimizations across all procedure calls, but that is a harsh restriction.)

Feiler's incremental compilation and debugging system, LOIPE, included optimization capabilities [Feiler82]. Because LOIPE had no notion of fixed points or of multiple copies of object code, the insertion of debugging statements in a procedure active anywhere in the call chain made it impossible to resume execution of the program. This restriction could be avoided for two intermediate levels of program optimization: the "complete display" level required all data values to be stored in their permanent homes before any procedure call or debugging statement, while the "full debugging" level prohibited optimizations across any procedure call or debugging statement.

A checkpointing capability would extend the utility of this technique to unplanned or semi-planned activity. If the values of all (or just changed) accessible variables are recorded at intervals throughout the execution of the program, such as at procedure entries or at fixed points, program execution can be restored to the last checkpoint, the affected region can be recompiled and replaced, and program execution can resume. The system would also need to record all input and output so that it could be replayed. Of course, an automatic checkpointing capability would be quite expensive if it were always active. However, the debugger could include commands to enable automatic checkpointing or to record a single checkpoint.

If the amount of static information becomes excessive, recompilation can also be used to support the technique of adding static information. The debugger would invoke the compiler to collect more information about a specific region than is normally retained. For example, variable storage locations could be recorded on a procedure or loop basis only; for detailed information about a variable's storage location during a specific statement, the register allocator could be invoked on the source text or the flowgraph of the region, with the initial state as noted in the symbol table.
If peephole optimization can affect variable storage locations, then it would be necessary to apply the entire code generator to the region.

4.2 Techniques to support truthful behavior

For truthful behavior, the debugger can admit to the user that the program has been optimized and can allow the user to shoulder some of the responsibility of unraveling the effects of the optimizations. Techniques to support truthful debugging behavior include showing the effects of the optimizations, so that the user can debug the program that is actually executing, and adding new debugging capabilities especially designed to handle the problems that arise when debugging optimized programs.

4.2.1 Showing the effects of optimizations

A major stumbling block when debugging optimized code is that the programmer does not know what the transformed program looks like. Thus she may request the debugger to stop the program at a statement that the optimizer has deleted because it was redundant. At some time when the program's execution is suspended, she may ask for the value of a variable that has been removed by constant propagation or by induction variable elimination. Even to provide truthful behavior, the debugger must have sophisticated capabilities.

Suppose, on the other hand, that there were an automatic source-translator that could give the programmer a new version of the source program that showed the effects of the optimizations. This source-translator could also provide the debugger with the usual straightforward mappings from this new version of the source text to object code and data locations in the executing program. Then the responsibility of reformulating any debugging questions about the program in terms of the executing version could be shifted to the programmer. Loveman and Kuck report on program transformation systems that can recreate a new version of the source program for the user to examine at any point in the transformation process [Loveman77, Kuck+81]. These systems probably use a standard prettyprinting algorithm working on the program graph generated by the optimizer, similar to the prettyprinters used in tree-structured editors [Teitelman78, Donzeau-Gouge+75]. Prettyprinting systems commonly have difficulty with placement of comments; program transformations can only add to these problems.

If the source text correspondence can be fairly easily determined by the programmer, this approach might solve the problem of debugging optimized code. However, since optimizations can radically alter the program, the programmer must understand a new program that solves her problem in order to ask meaningful debugging questions. For example, Loveman shows an optimized matrix multiplication program that is much different from its original source version [Loveman77]. Another problem with this approach is that the programmer may have to read through the entire new source text to be able to formulate a debugging request about some small area of the original program. This can happen as a result of inline procedure expansion, the movement of invariant computations out of loops, constant propagation, and other transformations.

One way to solve these problems is to annotate statements in the new source text with references to the places in the original program to which they correspond, and with descriptions of the optimizations performed. However, this can result in an enormous amount of information for even a reasonably small program. Furthermore, this solution would probably overwhelm naive programmers.
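As an illustration of what such annotations might record (the record layout below is invented, and the sample entry is purely hypothetical), each statement of the new source text could carry the original statement numbers it came from and a short description of each transformation applied to it.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class AnnotatedStatement:
        new_text: str               # the statement as it appears in the new source text
        original_stmts: List[int]   # statement numbers in the original program
        transformations: List[str]  # descriptions of the optimizations performed

    # Hypothetical entry for a loop-invariant assignment hoisted out of a loop.
    example = AnnotatedStatement(
        new_text="t1 ← a * b;",
        original_stmts=[17],
        transformations=["moved out of the loop spanning statements 15-22"])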
Several other issues must be considered in providing a new version of the source text. First, at what level (between source level and machine level) must this new source text be written in order to allow for a given set of debugging capabilities? The effects of some optimizations cannot be shown using only the source language. Also, supporting assignments to variables in the debugger requires a lower level than supporting only the display of values of variables, because structured variable accesses must be exposed when addresses are cached across statement boundaries. Ideally, the programmer should not be exposed to any more details than the original program mentioned (for example, to the way that structures in the high-level language are stored and accessed). As Figure 4-2 shows, addressing computations can be exposed without displaying multiplications by element sizes or other low-level implementation details. However, only highly motivated programmers will wish to examine the large amount of additional information that is displayed.

To make this information more intelligible, it can be localized. With respect to a given point in the source program, the debugger could show which computations that appear lexically after that point in the original text have already been performed, and which computations that appear lexically before that point have not been performed. (Of course, this requires the definition of a corresponding point in the optimized program.) Hennessy's algorithms can probably be modified for use in this way. It should be possible to obtain such a report about a program section before placing any breakpoints in that area, so that the user would not arrive at a breakpoint and discover that the event she wished to monitor had already occurred. The next section shows an example of localized information presentation.

OptTst: PROGRAM =
{ Ten: TYPE = [1..10];
  IntArr: ARRAY Ten OF INTEGER;
  a,b: IntArr;
  i, j: INTEGER;

  j ← 3;
  i ← 1;
  WHILE i <= 10 DO
  { a[i] ← 2*i;
    i ← i + 1
  };
  i ← 1;
  WHILE i <= 10 DO
  { b[i] ← a[i] + j*2;
    i ← i + 1
  };
}.

OptTstOptimized: PROGRAM =
{ Ten: TYPE = [1..10];
  IntArr: ARRAY Ten OF INTEGER;
  a,b: IntArr;
  aPtr1, bPtr1: POINTER TO IntArr;
  inductionVbl1: INTEGER;  -- i induction vbl = inductionVbl1 / 2
                           -- j constant: deleted
  inductionVbl1 ← 2;
  aPtr1 ← FirstEltPtr[a];
  bPtr1 ← FirstEltPtr[b];
  -- loops have same range, combined
  WHILE inductionVbl1 <= 20 DO
  { -- check aPtr1 IN [a, a+Ten]
    -- check bPtr1 IN [b, b+Ten]
    aPtr1^ ← inductionVbl1;
    bPtr1^ ← aPtr1^ + 6;   -- 6 = j*2
    inductionVbl1 ← inductionVbl1 + 2;
    aPtr1 ← NextEltPtr[a, aPtr1];
    bPtr1 ← NextEltPtr[b, bPtr1];
  };
}.

Figure 4-2. Showing the effects of optimization at a level suitable for variable modification.

An added benefit of explaining the effects of the optimizations to the programmer (in addition to allowing debugging) is that this new source view may bring programming errors to the programmer's attention. For example, the deletion of an assignment statement because the assigned variable was never subsequently used often indicates either a logical error or a typographical error.

4.2.2 Adding new debugging capabilities

The debugger could provide special commands to assist in debugging optimized programs. For example, the debugger could implement two different types of breakpoints: one based on the syntactic statement mapping and another based on the semantic statement mapping, since neither type of breakpoint alone is sufficient to answer all typical debugging queries.
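A sketch of how the two breakpoint types might share one augmented statement map entry (field and command names here are illustrative only, in Python rather than in any particular debugger's notation):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class StatementMapEntry:
        stmt: int                   # source statement number
        syntactic_pc: int           # object location of the statement's original position
        semantic_pc: int            # object location of its possibly moved computation
        semantic_after_pc: Optional[int] = None   # just after the moved computation

    def breakpoint_pc(entry: StatementMapEntry, kind: str) -> int:
        # kind is "syntactic" ("stop when execution reaches this point in the
        # program") or "semantic" ("stop when this statement's computation runs");
        # for an unmoved statement the two locations coincide.
        return entry.syntactic_pc if kind == "syntactic" else entry.semantic_pc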
The debugger's documentation must explain the distinction between the two types. Distinct syntactic and semantic breakpoints can be combined with showing the effects of the optimizations to provide a useful debugger. Figure 4-3 shows a program fragment, a source view of its optimized form, and a sample user interaction.

Another way to implement the distinction between syntactic and semantic breakpoints is analogous to the distinction between statement breakpoints (specified by number) and procedure entry or exit breakpoints (specified by name) in conventional debuggers. A breakpoint on a selected statement always uses the semantic mapping. A breakpoint inside a control construct, such as a loop or block, is obtained by requesting an INSIDE breakpoint on the construct. This implementation does not directly permit the finer control of the syntactic breakpoint.

Figure 4-3. Local explanation and new breakpoint types.

As a more elaborate way to handle the statement-mapping problem, the user could specify a list of constraints, such as AFTER statement s1, BEFORE statement s2, or INSIDE construct c1. Of course, a place satisfying all of these constraints might not exist in the optimized program.

A debugger that ordinarily provides expected behavior could also provide a command to display a new version of the source for a region of the optimized program. The debugger might also allow the user to see other static or dynamic information about a given program point, such as use-def chains, dead variables, and recent control flow. As several researchers have noted, the static information collected by an optimizer can sometimes point out programming errors [Lew74, Ferrante83, Ottenstein+83]. For example, a dead variable may signal either a logical error or a typographical error. Several program development systems use this information to warn users (at compile-time) of possible programming errors [Masinter78, Alberga+81].

Requests to save a variable's value or a complete checkpoint for later use, or to explicitly enable dynamic information collection in case a program exception occurs, also fall in this category. Although not in an optimized setting, Feiler's LOIPE system allows a user to save old values of variables for use in specifying assertions, such as loop invariants, to be checked by the system [Feiler82]. LOIPE also allows the user to explicitly request that an execution image be kept so that execution can later be restored to that point. Navigator has a SUSPECT command that enables the collection of dynamic control-flow information for a given procedure. [See Section 5.8.]

4.3 Techniques to support expected behavior

For expected behavior, the debugger must hide the effects of the optimizations from the user. Thus the major technique in this category is adding new debugger algorithms to process the additional static and dynamic information, thereby giving the user a sanitized view of the execution state.

4.3.1 Adding new debugger algorithms

To provide expected or even truthful behavior from additional static or dynamic information, the debugger must process it. The necessary algorithms frequently resemble flow analysis algorithms. For example, if the optimizer presents the debugger with a program flowgraph, the debugger could walk the flowgraph to determine which variables might have incorrect values at a given program point.
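One such flowgraph walk, sketched below with an invented graph representation, treats the optimizer's moved or deleted assignments as "suspect" definitions and propagates them forward in the usual may (union) fashion; a variable is possibly noncurrent at a point if some suspect definition of it reaches that point without an intervening ordinary assignment.

    from collections import defaultdict
    from typing import Dict, List, Set, Tuple

    # Per basic block: variables with moved/deleted ("suspect") assignments there,
    # variables assigned normally there (which kill older suspicions), successors.
    Block = Tuple[Set[str], Set[str], List[str]]

    def possibly_noncurrent(blocks: Dict[str, Block], at_block: str) -> Set[str]:
        """Variables that may hold a noncurrent value on entry to at_block."""
        in_sets: Dict[str, Set[str]] = defaultdict(set)
        changed = True
        while changed:                      # standard iterative data-flow solution
            changed = False
            for name, (suspect, kills, succs) in blocks.items():
                out = (in_sets[name] - kills) | suspect
                for s in succs:
                    if not out <= in_sets[s]:
                        in_sets[s] |= out
                        changed = True
        return in_sets[at_block]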
When a breakpoint is requested, this information might be used to trigger the collection of old values of those variables and/or the collection of dynamic control-flow information along the relevant paths. When a breakpoint is reached, the debugger would examine any previously-collected data or control-flow information and relate it back to the flowgraph, either to tell the user when a variable's value cannot be correctly reported or to reconstruct the expected value of that variable.

Hennessy constructed flowgraph-processing algorithms to find the set of roll-back variables, which are variables that have been assigned a new value earlier than they would have been in the unoptimized version of the program, and the set of roll-forward variables, which are variables whose assignments have been delayed. In the presence of global optimizations, these sets could not be positively determined. Hennessy could sometimes reconstruct the expected value from the information in the flowgraph. In the Navigator system, breakpoint insertion can trigger the collection of dynamic control-flow information. The debugger processes the control-flow information to decide among several source alternatives in a cross-jumped region.

As another example of special debugger algorithms, data-flow equations can help to determine whether the value of a variable is always correct, sometimes incorrect, or always incorrect at a given program location. A definition d of a variable v reaches a program location p if there is a path in the flowgraph from d to p that does not redefine v. Therefore, reaching definitions data-flow equations show, for each program location p, which definitions d might have created the value of a variable v at p. Reaching definitions data-flow equations are normally created by using the union operator to propagate a definition set through a flowgraph node that has more than one predecessor [Aho+77]. I will call this form may-reach definitions. In contrast, must-reach definitions are formed by replacing the union operator with the intersection operator in the reaching definitions equations (similar to available expressions data-flow equations [Aho+77]). Suppose that may-reach and must-reach definitions are computed for the unoptimized flowgraph, yielding equations UMay and UMust, and are also computed for the optimized flowgraph, yielding equations OMay and OMust. For a program location p and a definition d of a variable v, the following table describes the conditions under which v will have an incorrect value at p in the optimized program.

    d in UMust    d in UMay    d in OMust    d in OMay    v incorrect at p?
    yes           yes          no            no           always
    yes           yes          no            yes          sometimes
    no            yes          no            no           sometimes

This information could also be gathered directly rather than by comparing before- and after-optimization equations: always-correct and always-incorrect information could be propagated through forks and joins. The debugger could use the information to provide truthful behavior for displaying variable values at p. To provide expected behavior at a breakpoint, the information could be used to determine which definitions of which variables must be saved.

4.4 Summary

As many sections of this chapter have mentioned, the different solution techniques proposed here can be used in combination, each technique supporting the rest. With the wide variety of possibilities, it is clear that program development systems need not produce only fully optimized programs with no debugging capabilities on the one hand and completely debuggable programs with no optimization on the other.
The selection of a suitable level of use for each technique to provide the optimal mix of optimization and debugging capabilities remains an open research topic. The remainder of this dissertation discusses Navigator, a program development system that usually provides expected debugging behavior for two control-flow optimizations: inline procedure expansion and cross-jumping.