Chapter 3

Optimization and the Problems It Causes for Debugging

This chapter examines the conflict between optimization and debugging. The first section briefly reviews program optimization. The second section describes the effects of common optimizations on the operation of a conventional debugger. The third section surveys the previous work on debugging optimized programs. The next chapter will discuss general solution methods for the problems described here.

3.1 Program optimization

Many current compiler textbooks include detailed descriptions of optimization techniques [Aho+77, Barrett+79]. There are also several books that describe specific optimizing compilers [Wulf+75, Anklam83, Chow83]. As a result, this dissertation assumes a basic familiarity with optimization topics. This section serves as a brief reminder of some of the major ideas.

Program optimizations are transformations that are applied to a program, usually by a compiler, to decrease its execution time and/or space consumption. An optimized program exhibits the same input/output behavior as its unoptimized counterpart but is faster or smaller or both. Classical optimizations are mechanical transformations that do not alter the basic algorithm implemented by a program.

Optimizations can have a variety of effects. For example, optimizations can move or delete statements. Variables may be assigned values at different relative locations in the computation specified by the optimized program than in the computation specified by the source program, or the program's flow of control may be altered. The relative ordering of execution events (such as calling a procedure) may be changed. Optimizations can also eliminate or overlay variables or allocate them to registers in some program regions.

Optimizations can be applied at a number of different levels. Source-to-source transformations are performed upon the source text; source transformation systems have generally been implemented as interactive systems, so that the system can rely upon the programmer to verify the more complex enabling conditions for a given transformation [Loveman77, Standish76, Arsac79, Gilbert82]. An example of a source-to-source transformation is loop jamming [Loveman77], which combines the bodies of loops that have the same number of iterations. Its enabling condition is: for all i > j, the results computed in iteration i of loop A are not needed to compute iteration j of loop B, where loop A precedes loop B. In this dissertation, interactive source-to-source transformations are considered to be part of the program development process; the resulting transformed program is the source program. Nevertheless, the category of optimizations discussed here does include source-to-source transformations that are performed completely automatically.

Interprocedural optimizations are applied across procedure boundaries and require interprocedural analysis [Barth78, Banning79]. A common example is common subexpression elimination across a procedure call, for which the calling procedure must be certain that the called procedure does not modify any variable that participates in the common subexpression.

A basic block is a maximal straight-line section of code with a single entry, a single exit, and no internal control flow. Global optimizations are applied across basic blocks and require program flow analysis [Aho+77]. Local optimizations are applied within a basic block. (A sketch of how basic blocks can be identified appears below.)
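As a concrete reminder of the basic-block definition, the following sketch partitions a linear instruction list into basic blocks using the standard leader-finding method [Aho+77]. (The sketch is in present-day Python notation for concreteness; the instruction format and names are invented, not those of any compiler discussed in this dissertation.)

    def basic_blocks(instrs):
        """Partition a linear instruction list into basic blocks.
        An instruction is a dict, e.g. {"op": "add"} or
        {"op": "branch", "target": 4}. Leaders are the first
        instruction, every branch target, and every instruction
        that follows a jump or branch [Aho+77]."""
        leaders = {0}
        for i, ins in enumerate(instrs):
            if ins["op"] in ("jump", "branch"):
                leaders.add(ins["target"])     # a branch target starts a block
                if i + 1 < len(instrs):
                    leaders.add(i + 1)         # so does the fall-through point
        starts = sorted(leaders)
        return [instrs[s:e] for s, e in zip(starts, starts[1:] + [len(instrs)])]

Each block produced this way has a single entry (its leader) and a single exit (its final instruction), matching the definition above.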
Intra-statement optimizations are applied within a single source statement, which usually corresponds to a subset of a basic block. However, a statement that includes conditional expressions comprises multiple basic blocks.

Optimizations can be performed on a variety of representations of the program: the source text, an abstract syntax tree, a flowgraph representation, a linear intermediate form (such as triples or quads [Aho+77]), or the (linear) object code. Local and global optimizations are often applied to an intermediate form of the program, frequently a flowgraph or quads. Object code optimizations are performed on the generated assembly or machine code and may be local or global. These optimizations are machine-dependent and are often referred to as peephole optimizations [McKeeman65].

Figure 3-1 lists the optimizations under consideration, classifying them into global, local, and object code optimizations. The next section explains and gives examples of many of them in the context of illustrating their effects upon debugging. Detailed descriptions of these optimizations, their enabling conditions, and implementation algorithms can be found in the literature [Aho+77, Allen+72, Wulf+75].

Global optimizations
    Code hoisting
    Code motion
    Constant folding
    Constant propagation
    Copy propagation
    Dead store elimination
    Dead variable elimination
    Global common subexpression elimination
    Global register allocation
    Induction variable elimination
    Inline procedure expansion
    Loop unrolling
    Storage overlaying

Local optimizations
    Code reordering
    Eliminated stores
    Local register allocation

Object code optimizations
    Adjacent-instruction collapsing
    Bliss-11 auto-increment/decrement
    Branch chaining
    Cross-jumping
    Instruction scheduling (i.e., instruction reordering)
    Instruction simplification

Figure 3-1. Optimizations.

3.2 Problems optimization causes for debugging

As we mentioned in Chapter 1, the effects of optimizations cause problems for interactive source-level debuggers. When optimizations move or delete statements, or when they alter the program's flow of control, it may be difficult for the programmer to specify breakpoint locations or for the debugger to report exception locations. When optimizations eliminate or overlay variables or allocate them to registers, the debugger may be unable to display the values of those variables. When variables are assigned values at different relative locations in the compiled code than in the source program, the debugger's ability to display the values of the variables and to report the current execution point may both be compromised. When optimizations change the relative ordering of execution events (including arriving at a breakpoint), the debugger's reports of successive program execution points may confuse the programmer.

According to Gaines, the first step in diagnosing a program error is the knowledge that the program reaches an incorrect state during its execution. There are two possible causes for the incorrect state: some variables could have incorrect values, or there could be an error in the program's flow of control [Gaines69]. If a programmer uses a conventional debugger to monitor the behavior of an optimized program, the effects of the optimizations can cause the debugger's responses to mislead the programmer about both the values of variables and the program's flow of control.

This section categorizes and describes the problems that optimization creates for a conventional debugger. The problems are divided into problems with data, code, and history elements of the program execution. Optimizations that can cause each problem are listed.
An example will be given for each difficulty. In these examples, the original unoptimized source text is shown on the left, and corresponding optimized text is shown on the right. Since the optimizations in question are not always expressible in the source language, the corresponding text may not be a legal source program fragment. The source language is extended as necessary to show the result of an optimization. For instance, an optimized version may include register values, address computations for accessing components of structured data, or expression computations.

The chosen demonstration language is Cedar, which is a relative of Mesa [Mitchell79, Geschke+77]. Readers familiar with Algol or Pascal should have no trouble with the examples. One minor note regarding syntax: Cedar uses braces ({}) and BEGIN-END pairs interchangeably; a comment is preceded by two dashes (--).

The discussion first focuses on a simple debugger with the basic examination and control functions described in Section 2.2.1.1. Recall that these functions are:

1. Report the current execution location in source-level terms.
2. Display values of visible variables in the appropriate format for their declared types. A variable's environment is normally determined by the current execution location.
3. Display the current procedure traceback.
4. Set the variable environment to some other program statement.
5. Signal a user interrupt.
6. Set and remove location breakpoints. A location breakpoint can be placed at any source statement, and it is reached immediately prior to the execution of that statement.
7. Resume the computation.

Later sections consider debuggers that allow modifying the values of variables and debuggers with advanced capabilities.

3.2.1 Data problems

This section discusses ways in which values of variables might be reported incorrectly when a conventional debugger is applied to an optimized program. Hennessy has considered this problem in some detail [Hennessy79].

Suppose that a debugger does not report the value predicted by the execution model for variable x at some arrival at program location p. From the system's point of view, the incorrect report can have several underlying causes:

1. The variable x has been removed by the optimizer, i.e., there is no storage location for x at p.

2. The correct value of x at p is available, but the debugger is looking in the wrong place for it. The value of x can be in an unexpected location either because the optimizer chose to keep the value of x temporarily somewhere other than where the symbol table reports, or because the debugger can't find the correct symbol table information for point p.

3. The correct value of x at p is not available: either the correct value has not been assigned to x (or even computed) yet (Hennessy calls this a roll-forward variable); or some other value has been assigned to x early, destroying the correct value (Hennessy calls this a roll-back variable).

From the user's point of view, the incorrect report can manifest itself in two different ways:

1. The variable x is unknown to the debugger at p.

2. The value of x at this arrival at point p is not what the source-level execution model predicts from the source program.

3.2.1.1 Variable is unknown to debugger

Three of the optimizations under consideration can cause a variable to have no storage location at a program location: constant propagation, copy propagation, and induction variable elimination. These are global optimizations.
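Before the individual examples, a sketch of the transformation mechanics may help show why a propagated variable loses its storage location entirely. (The sketch is in present-day Python over an invented quad format; it is illustrative only, and a real global pass would also need flow analysis and liveness information.)

    def propagate_constants(quads):
        """Toy constant propagation over a single basic block.
        A quad is (dest, op, arg1, arg2); op "const" makes dest a
        known compile-time constant."""
        known, out = {}, []
        for dest, op, a1, a2 in quads:
            a1, a2 = known.get(a1, a1), known.get(a2, a2)  # replace uses
            if op == "const":
                known[dest] = a1       # remember the value; emit no code at all
            else:
                out.append((dest, op, a1, a2))
                known.pop(dest, None)  # dest no longer holds a known constant
        return out

Once every use of a variable such as debug has been replaced by its constant value, as in the first example below, the defining assignment is dead and no storage for the variable survives into the object code; this is precisely why a conventional debugger reports it as unknown.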
An example of copy propagation appears in the next section.

Constant propagation

The first example shows a case of constant propagation. The variable debug is constant throughout the program, so the optimizer has replaced all occurrences of debug by FALSE. Thus at any source statement in the program, a conventional debugger would report debug as an unknown variable.

10 debug _ FALSE;
   . . .                            . . .
30 IF debug OR a > b THEN        30 IF a > b THEN
31   Print[a];                   31   Print[a];

Induction variable elimination

The next example shows an instance of induction variable elimination. The variable i and the computations 2*i and i + 1 have been replaced by the cheaper t1 and t1 + 2. Again, at any source statement in the program, a conventional debugger would respond to a request to display the value of i with an 'unknown variable' message (assuming that there are no other uses of i in the program). If there were other uses of i in the program, the optimizer might insert the statement i _ t1 div 2 after statement 14. Then i would be known to the debugger, but it would have an incorrect value between statement 10 and statement 15.

10 i _ 1;                        10 t1 _ 2;
11 WHILE i <= 10 DO              11 WHILE t1 <= 20 DO
12   a[2*i] _ 5;                 12   a[t1] _ 5;
13   i _ i + 1;                  13   t1 _ t1 + 2;
14 ENDLOOP;                      14 ENDLOOP;

3.2.1.2 Reported variable value is incorrect

The value of a variable can be incorrect at a program location in several ways. The value can be incorrect because an assignment to the variable was delayed (assignment removal is a special case of this) or because an assignment to the variable was hoisted. An assignment is hoisted when it is moved to a point that is earlier in the program's control flow. A variable's value can also be incorrect because of storage allocation. The value can be incorrect every time that the program location is reached, or the value can be incorrect only some times that it is reached. The first two examples demonstrate cases in which a variable's value is always incorrect at a program location; the remaining examples illustrate cases in which the value is sometimes incorrect there.

Storage overlaying

In the following example, storage overlaying has been performed. The program never uses i and j simultaneously, so the storage allocator has assigned the same memory location for both variables. Therefore, if the value of i is requested at any program point after statement 30, a conventional debugger cannot respond with the correct value. A naive debugger might display j's value instead.

10 FOR i IN [1..n] DO            10 FOR i IN [1..n] DO
11   IF a[i] = key THEN          11   IF a[i] = key THEN
12     {found _ TRUE;            12     {found _ TRUE;
13      EXIT};                   13      EXIT};
   ENDLOOP;                         ENDLOOP;
   . . .                            . . .
30 FOR j IN [1..m] DO            30 FOR i IN [1..m] DO
31   b[j] _ 2*j;                 31   b[i] _ 2*i;
   ENDLOOP;                         ENDLOOP;

Copy propagation

This example shows a case of copy propagation. The variable x has the same value as variable y, so the optimizer has replaced the use of x by a use of y in statement 11. Therefore x has an incorrect value between statement 10 and statement 30. Both forms of propagation (that is, constant or copy propagation) can cause either an incorrect value or an unknown variable.

10 x _ y;
11 z _ x + 3;                    11 z _ y + 3;
   . . .                            . . .
30 x _ 39;                       30 x _ 39;

Local code reordering

Local code reordering can also cause the value of a variable to be reported incorrectly. In this code fragment, local common subexpression elimination has made it desirable to store the new value of f before statement 11 rather than after it. Thus the value of f is always incorrect at statement 11.
[Of course, the previous value of f might have been b + c.]

10 a _ b + c;                    10 R1 _ b + c; a _ R1; f _ R1;
11 d _ 5;                        11 d _ 5;
12 f _ b + c;                    12
13 g _ x;                        13 g _ x;

Global register allocation

The following example shows the effect of global register allocation. The optimizer noticed that the variable remxy is used frequently in the WHILE loop, and therefore placed remxy in machine register 1 for the duration of the loop. Suppose that the user places a breakpoint at statement 13 and requests the value of remxy when the breakpoint is reached. A conventional debugger would report the value contained in the normal storage location for remxy, rather than the value contained in register 1. Therefore, the reported value is correct the first time that statement 13 is reached, but on subsequent loop iterations, the reported value does not reflect the subtraction(s) of y.

10 remxy _ x;                    10 remxy _ x;
11 y _ F[z];                     11 y _ F[z];
                                    R1 _ remxy;
12 WHILE remxy > 0 DO            12 WHILE R1 > 0 DO
13   remxy _ remxy - y;          13   R1 _ R1 - y;
   ENDLOOP;                         ENDLOOP;
                                    remxy _ R1;
   . . .                            . . .
30 Print[x,y,remxy];             30 Print[x,y,remxy];

Local register allocation holds the result of a computation in a register and uses it for other computations within a single basic block. It causes the variable's value in memory to be incorrect at every arrival at every program location between the computation and the store of the value. Of course, if the computed value is not used outside the basic block, the store may be deleted altogether.

Code motion

As the example shown in Figure 1-1 and Figure 2-2 has illustrated, code motion out of loops also causes a variable to have an incorrect value on some arrivals at certain program locations. The code motion optimization can only be applied when a loop-constant piece of code will always be executed before the loop exits. In the execution of the unoptimized version of the program, if the loop's execution begins at execution state s1, and an assignment (moved in the optimized version) is executed at state s2, then the assigned variable will have an incorrect value at every state t between s1 and s2. In the following example, x's value is incorrect only the first time that statement 11 is reached.

                                    x _ 4*c;
10 FOR i IN [1..10] DO           10 FOR i IN [1..10] DO
11   a[i] _ i;                   11   a[i] _ i;
12   x _ 4*c;                    12
13   b[2*i] _ a[i] + x;          13   b[2*i] _ a[i] + x;
   ENDLOOP;                         ENDLOOP;

A combination of optimizations: global dead store elimination and code hoisting

The following example is fairly complex. The value of variable j defined in statement 15 is never used, so global dead store elimination removes statement 15. The expression b + c is a very busy expression (it is used on multiple flowgraph paths before variables b or c are redefined), so its computation is hoisted to precede statement 10. After statement 15 is removed, the value of that expression can safely be assigned directly to j.

Consider the behavior of a conventional debugger on the optimized program fragment. At statements 10 and 14, the reported value of j is always incorrect, because the new store to j precedes those statements on every possible execution path to them, whereas the old store followed those statements on every path to them. At statement 12, the reported value of j is always correct. At statement 16, the reported value of j is always incorrect, both because the dead store at 15 was removed and because the store from 18 was hoisted. At statement 17, the reported value of j may or may not be correct. If execution came through the THEN clause, the value of j is correct.
However, if execution came through the ELSE clause, then the situation of statement 16 holds, and the value of j is incorrect.

                                                          value of j before stmt
                                    j _ b + c;
10 IF cond THEN                  10 IF cond THEN          always incorrect
11   {j _ b + c;
12    w _ j}                     12   {w _ j}             always correct
13 ELSE                          13 ELSE
14   {w _ b;                     14   {w _ b;             always incorrect
15    j _ 1;
16    y _ c};                    16    y _ c};            always incorrect
17 z _ w + 3;                    17 z _ w + 3;            sometimes incorrect
18 j _ b + c;
19 x _ j - 17;                   19 x _ j - 17;           always correct

3.2.2 Location problems

The preceding examples were carefully constructed to require only a straightforward mapping between source text and object code. This section describes difficulties with the program location: displaying the current location, displaying the current procedure traceback, setting the variable context, and setting and fielding breakpoints.

Recall that the debugger uses the mapping from program statements to object code locations to set breakpoints. It uses the mapping from object code locations to program statements to field breakpoints and display the current execution location. Obviously, if either of these mappings does not correctly reflect the correspondence between the source text and the object code, there will be problems with the associated debugger functions.

How should these mappings be defined? A conventional nonoptimizing compiler generates separate object code for each statement, in the order that the program statements appear in the source text. In this case, the beginning (or end) of the object code for a source statement is unique, and each object code instruction is part of the code for exactly one executable program statement. Some pieces of source text, such as ENDs or comments, may not generate any object code. For other pieces of source text, there may be several possible interpretations of what the corresponding object code is. For example, if a breakpoint is requested on a FOR statement, the breakpoint could refer to the initial entry to the statement or to the entry to each iteration. Similarly, if execution is interrupted inside loop control code at the end of a FOR loop, the source location could be reported as inside the last statement in the loop, as inside the ENDLOOP, or as inside the FOR loop header. Nevertheless, for a given language or debugger, once an interpretation for these cases is defined, the mapping between source text and object code is straightforward.

Optimization can complicate these mappings in several ways. First, an optimizing compiler may not emit any object code for an executable statement. Second, optimizations may preserve the order of execution of instructions that are meaningful to the programmer, while creating correspondences between the source text and the object code that are no longer monotonic. Finally, optimizations may rearrange the order of execution of programmer-specified instructions, causing several additional problems. These complications will be discussed in order of their complexity.

3.2.2.1 No corresponding object code

If an optimizing compiler fails to emit any object code for an executable source statement, it makes it difficult for the debugger to place a breakpoint on that statement. Note that it does not prevent displaying the current execution location properly, because execution can never reach a nonexistent location.
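A sketch of the two statement maps makes the failure modes described in the following subsections easy to see: a deleted statement simply has no entry in the source-to-object map, and a merged region gives the object-to-source map several candidate answers. (Present-day Python notation; the names and addresses are invented.)

    # Statement maps as built by a conventional nonoptimizing compiler:
    # each executable statement owns one contiguous object-code region,
    # and the regions appear in source order.
    stmt_to_addr = {10: 0x100, 11: 0x108, 12: 0x110}  # used to set breakpoints

    def set_breakpoint(stmt):
        """Fails visibly when no object code was emitted for stmt."""
        if stmt not in stmt_to_addr:
            raise KeyError("no object code for statement %d" % stmt)
        return stmt_to_addr[stmt]

    def current_statement(pc):
        """Object-to-source map: report the statement whose code region
        contains pc, i.e., the last statement starting at or before pc."""
        before = [s for s, a in stmt_to_addr.items() if a <= pc]
        return max(before, key=stmt_to_addr.get) if before else None

Every optimization discussed in this section breaks one of the two assumptions embedded in these maps: that every executable statement has an entry, and that each object code location belongs to exactly one statement.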
Two optimizations can cause a statement to have no corresponding object code: unreachable code elimination, which removes program statements that control will never reach (given the values of the variables known at compile-time), and dead code elimination, which removes program statements that compute values that are never used.

If the removal optimization also removes the statement marker, and the statement map uses character positions or line numbers, the statement will look like a comment to a conventional debugger. Therefore, the debugger will place the breakpoint at the statically preceding statement, which may or may not dynamically precede the desired statement. If the statement map uses statement numbers, the debugger can at least determine that there is no entry for that statement, and give a warning message. If the removal optimization leaves the statement marker in place, the breakpoint will be placed at the following statement, which may or may not dynamically follow the statement. No warning will be given to the user, because it will look to the debugger as if the statement were a statement that does not ordinarily generate code. The problems associated with altering variables from the debugger so as to direct program flow into areas that have no corresponding object code are discussed in Section 3.2.4.

A similar problem is that a corresponding stopping place may be missing, even though code has been generated to perform the specified operation(s). Such optimizations are typically machine-dependent. For example, in the following program fragment, the optimizer has generated a single instruction to increment and test the value of i. Therefore, there is no place for a breakpoint corresponding to statement 11. This problem occurs whenever a source statement boundary is embedded within a machine instruction. Entire loops can be eliminated in this way on machines with vector [Russell78] or string manipulation instructions [Morgan+82].

10 i _ i + 1;                    10 IF (i _ i+1) # 0 THEN   -- one instruction
11 IF i # 0 THEN
12   x _ y/i;                    12   x _ y/i;

Optimization can also cause the flow of control to bypass a location that would not be bypassed in the unoptimized version of the program. The following program fragment searches a linked list for a desired record. If the record is not in the list, it is considered an error. The branch-chaining optimization (often a machine-dependent optimization) has been applied to the fragment, eliminating the second test of the variable found. If the programmer wishes to examine the desired record at the loop exit, a natural place to place a breakpoint is at statement 15. However, the optimized version of the fragment arrives at statement 15 only when the list is exhausted without finding the desired record.

10 found _ FALSE;                    10 found _ FALSE;
11 WHILE ptr # NIL AND ~ found DO    11 WHILE ptr # NIL DO
                                          IF found THEN GO TO 17
12   IF ptr.key = desired THEN       12   IF ptr.key = desired THEN
13     found _ TRUE                  13     found _ TRUE
     ELSE                                 ELSE
14     ptr _ ptr.next;               14     ptr _ ptr.next;
   ENDLOOP;                             ENDLOOP;
15 IF ~ found THEN                   15 IF ~ found THEN
16   ERROR['Not in list'];           16   ERROR['Not in list'];
17 -- next statement                 17 -- next statement

3.2.2.2 Control-flow optimizations: one-to-many and many-to-one source-to-object correspondences

A somewhat more complex situation arises when optimizations preserve the order of execution of instructions that are meaningful to the programmer, but create correspondences between source text and object code that are no longer monotonic.
I call these optimizations control-flow optimizations; they rearrange the object code for a program either by merging identical code sequences, thereby making the program smaller, or by copying code sequences, thereby making the program faster.

Merging optimizations

Examples of merging optimizations include procedure discovery and cross-jumping. Procedure discovery locates identical sequences of instructions and forms a single new procedure that is called from each original location. Cross-jumping is a special case of procedure discovery that examines code paths that join. If the tail portions of any two of the paths are the same, cross-jumping moves the join point for those two paths from its original location backward to the earliest identical point and deletes one copy of the identical code.

Merging optimizations create a many-to-one correspondence from the source text to the object code. A many-to-one source-to-object correspondence creates problems with setting breakpoints and displaying the current execution point at a breakpoint or program exception. If a merged region contains a procedure call, it can also cause problems with a procedure traceback.

The following example shows an instance of cross-jumping. Part of statements 12 and 14 and all of statements 13 and 15 are identical, so the earlier copy has been replaced by a jump to the later copy. In a conventional system, one of the merged regions appears in the statement map as though it has been deleted; therefore, a request for a breakpoint on a source statement in that region will result in a breakpoint on a statically adjacent statement. In this case, the debugger will place a breakpoint for statement 13 on statement 14 (which, unfortunately, will never be executed after statement 12). The debugger would probably provide feedback about where the breakpoint was actually placed, but a user would find it frustrating if not confusing to be prevented from setting a breakpoint at what appears to be a reasonable place.

A breakpoint requested inside the other merged region (at statement 15, for example) will be placed "correctly" in terms of the desired semantics, at least. However, because there is actually only one piece of object code for the merged region, the breakpoint will be reached more often than it should be (in this case, twice as often). Also, even though user-specified computations have not been rearranged and values of variables are not being kept in cached locations, the values of variables will appear to be incorrect whenever the arrival at the breakpoint does not actually correspond to the requested breakpoint location. Thus the user expects the value of i to be odd every time that the breakpoint is reached; half of the time i will be even. Each time, the debugger will report that the current location is statement 15.

A conventional debugger will not be able to display the location of a program exception inside a merged region correctly. Suppose that the array x has been declared to have subscripts in the range [1..5]. The last time through the loop, the store to x[i] will attempt to store into x[6]. If array bounds checking is in force, the resulting program exception will be reported as occurring inside statement 14, even though the value of i is even. This occurrence demonstrates that the mapping from object locations to source statements must have a finer granularity than the reverse mapping, since a conventional debugger can correctly place a breakpoint on statement 14.
10 FOR i IN [1..6] DO            10 FOR i IN [1..6] DO
11   IF Even[i] THEN             11   IF Even[i] THEN
12     {x[i] _ i;                12     {; GO TO L;
13      filled[i] _ TRUE}        13     }
     ELSE                             ELSE
14     {x[i] _ 2*i;              14     {; L: ;
15      filled[i] _ TRUE};       15      filled[i] _ TRUE};
   ENDLOOP;                         ENDLOOP;

There is also a connection between reporting the correct source location and displaying the values of variables. Suppose that the THEN arm of the IF statement declared a local variable y, while the ELSE arm declared a local variable z. If the debugger always thought that the execution of filled[i] _ TRUE corresponded to source statement 15, then it would never allow the user to display the value of y there, because y is not in statement 15's context.

Copying optimizations

Examples of copying optimizations include loop-unrolling and inline procedure expansion [Allen+72]. Loop-unrolling makes multiple copies of the statements inside a short loop in order to reduce the effects of loop overhead. Inline procedure expansion, also known as procedure integration, replaces a call to a procedure by an instance of the actual code of the procedure in order to save the execution time associated with procedure linkage (such as moving parameters, saving and restoring registers, and allocating a new activation frame). Inline procedure expansion may also provide opportunities for further optimizations.

Copying optimizations create a one-to-many correspondence from the source text to the object code. Such a correspondence makes it difficult to set breakpoints and to provide procedure tracebacks.

Inline procedure expansion is illustrated in the following example. Procedure P1 has been expanded at both of the calls to it from P3. Since the program contains no other calls to P1, no freestanding code has been generated for it. Therefore, any breakpoint requested inside P1 will be placed on statement 14. In order to achieve an effective breakpoint at statement 11, the user must be aware of the optimization and place two breakpoints: one on statement 31, and one on statement 33. An effective breakpoint on statement 12 cannot be obtained. If a program exception occurs inside an expanded procedure, it will be reported at its top-level invocation point, causing the user to think that something has gone awry in the procedure linkage. Finally, procedure traceback information will never include an expanded procedure. If an inline procedure calls another procedure (which may or may not also be expanded), an expanded procedure could logically appear anywhere in the procedure call chain, and inline calls could be nested.

In the first call to P1, the parameter (1) is sufficiently simple that the compiler substituted it directly for the use of i. This optimization is called subsumption; it is similar to copy propagation. This substitution would create problems for displaying the value of i if a breakpoint could be made to occur within that expansion of P1, or if a program exception occurred there.

10 P1: PROC[i: INT] =
11   {a _ i;
12    count _ count + 1};
13 P2: PROC =                    13 P2: PROC =
14   {x _ 5};                    14   {x _ 5};
   . . .                            . . .
30 P3: PROC =                    30 P3: PROC =
31   {P1[1];                     31   {a _ 1; count _ count + 1;
32    b _ 2;                     32    b _ 2;
33    P1[5 - b];                 33    i _ 5 - b; a _ i; count _ count + 1;
34    x _ a + b};                34    x _ a + b};

3.2.2.3 Code rearrangement: problematical correspondences between source and object

The user who monitors a computation need rarely be concerned when compiler-introduced computations for addressing, subexpression computations, and so on are moved or eliminated; these compiler temporaries are outside the source-level model of execution anyhow. [As discussed below, such movement can cause history problems for a monitoring debugger or problems with modifying the values of variables for more sophisticated debuggers.] However, if the optimizer rearranges the order of execution of programmer-specified instructions, several new problems arise.

Conventional debugger behavior when entire statement moves as a unit

Let us first consider the case in which optimizations move an entire statement as a unit. Suppose that code motion has moved an assignment statement with a loop-constant expression out of a loop, as shown below. If the optimizer leaves the statement marker in its original position, this movement will look the same in the statement maps as a deleted statement. Therefore a breakpoint requested on statement 32 will be placed at the same spot as a breakpoint on statement 33. To decide whether this placement causes a problem, the questions that the programmer wishes to answer via this breakpoint (as discussed in Section 2.2.2) must be considered.

When the breakpoint at statement 32 is reached, the programmer has several expectations concerning the program execution state. These expectations are a direct reflection of the unoptimized program's expected behavior, i.e., the program execution model. First, she expects that the assignment to x will not have been executed yet, so x should have its previous value, 1. Second, if a breakpoint is also placed on statement 33, she expects that x's value will change between the first and second breakpoints on the first time through the loop. Third, if the debugger allows variable modification, the user expects that changing the value of c at the breakpoint will be reflected in a change in the value of x. There is also a more subtle expectation in this particular example: if a breakpoint is also placed inside the function Fun, the breakpoint on statement 32 should precede that breakpoint. None of these expectations will be valid, and considerable confusion could result.

Suppose that range-checking is enabled and Fun[c] returns a value that is outside x's range. A runtime exception will be triggered, and the debugger will report that the current execution point is inside statement 29. This report will lead the user to think that computing d+1 has caused the illegal value.

10 x _ 1;                        10 x _ 1;
   . . .                            . . .
29 c _ d + 1;                    29 c _ d + 1;
                                    x _ Fun[c];
30 FOR i IN [1..10] DO           30 FOR i IN [1..10] DO
31   a[i] _ i;                   31   a[i] _ i;
32   x _ Fun[c];                 32
33   b[2*i] _ a[i] + x;          33   b[2*i] _ a[i] + x;
   ENDLOOP;                         ENDLOOP;

Code hoisting and code reordering can cause similar problems.

Meaning of mapping between source and object: semantic and syntactic mappings

It may be difficult to decide what object code corresponds to a given program statement. For simplicity, we first consider the case in which optimizations move an entire statement as a unit. In this case, the object code location corresponding to the statement can be defined in two distinct ways.
One natural definition yields a semantic mapping: namely, the corresponding object code is the object code that performs the operations specified by the program statement. Suppose that code motion moves an invariant assignment statement s out of a loop. Then, if execution of s causes a program exception, that exception should be reported as occurring within s. (Reporting the location of an exception utilizes the object-to-source mapping.) However, the information that a user wants to discover about her program as a result of a location breakpoint may not be discernible from any version of semantic mapping. For example, the semantic mapping of a loop-invariant assignment statement is executed only once per entire loop execution. Therefore, a breakpoint based on a semantic mapping cannot be used to examine variables each time around a loop if it is unknowingly placed on an invariant statement.

The syntactic mapping has quite different properties. It preserves the position of a statement with respect to its neighboring statements; it can also be viewed as an indicator of "how far into the source program" the statement is. Notice that source and object entries increase monotonically together in the syntactic mapping for computation-reordering optimizations, as in statement mappings for conventional compilers. In contrast, the semantic mapping can be ordered either by source location or by object location, not both at once.

Semantic mapping when entire statement moves as a unit

When optimizations rearrange statements, the object location corresponding to the end of one source statement is no longer equivalent to the object location corresponding to the beginning of the next source statement. Therefore, for the source-to-object mapping (i.e., setting location breakpoints), we must again consider the information that the user is attempting to gather with the breakpoint. If the user wants to examine the variable state before the execution of a statement, either to see the value of a variable before an assignment to it, or to see (and possibly modify) values of variables that will be used to evaluate the statement, the breakpoint should be placed before the new location of the object code. If the user wants to see the effects of executing a statement, the breakpoint should be placed after the new location of the object code. With a conventional debugger, a semantic-after breakpoint is typically placed on the following statement(s), or is performed by placing a breakpoint on the desired statement and executing a single-step command upon arrival at the breakpoint.

The following example illustrates semantic and syntactic mappings for a program fragment to which code motion has been applied. Statement 12, x _ 4*c, is loop-invariant and has been moved to precede the loop.

                                    x _ 4*c;               <- semantic-before[12]
10 FOR i IN [1..10] DO           10 FOR i IN [1..10] DO    <- semantic-after[12]
11   a[i] _ i;                   11   a[i] _ i;
12   x _ 4*c;                    12                        <- syntactic[12]
13   b[2*i] _ a[i] + x;          13   b[2*i] _ a[i] + x;
   ENDLOOP;                         ENDLOOP;

Semantic mapping when statement is scattered

Let us now consider the situation in which optimizations scatter the execution of a statement both statically and dynamically, creating disjoint groups of instructions that carry out the actions indicated by the statement. The definition of the semantic mapping for such a statement is not fully specified by the discussion above. Recall that in a conventional system, a location breakpoint for a statement will occur before the execution of any object code for the statement.
The possibilities for the appropriate location of a semantic breakpoint are:

1a) Before first subcomponent evaluation, or on the first instruction that corresponds to the evaluation of any subcomponent of the statement. This is the strictest definition, and is necessary if the user wants to alter values that will be used by the statement. If global common subexpression elimination is applied, such a breakpoint may be placed far away from any state changes associated with the statement. This may mean that not all variables accessed by the statement have yet been given the value that will be used by the statement.

1b) Before each subcomponent evaluation. Definition 1a can be extended to place a breakpoint on the first instruction of every corresponding group of instructions. Such a definition would not give expected performance, since a single breakpoint on a statement that is executed only once in the program's execution could cause many suspensions of the program, but there are cases in which it would be useful for a sophisticated user. However, combined with debugger-generated displays of the current syntactic location and lists of expressions and assignments that have been evaluated early or will be evaluated late, it could give a good picture of the effects of optimization on the statement.

2) Before first state change, or on the first instruction that corresponds to any alteration of user-visible execution state specified by the statement. An alteration of user-visible execution state includes function activations and changes to user-declared variables visible from the statement's scope. It is important to precede function activations because the function may have side-effects, or the user may have placed a breakpoint inside the function or one of its callees (and this breakpoint should be reached before that inner breakpoint). It is important to precede changes to visible variables so that embedded assignments, such as FOR loop variable assignments, assignment expressions, or assignments to visible variables caused by function calls, will not yet have occurred.

Definition 2 attempts to bring a location breakpoint closer to the state-changing portion of the object code for a statement. It relaxes the necessity to place a location breakpoint for a statement that includes the evaluation of a simple common subexpression (one that does not involve function calls) on the object code that evaluates that subexpression. For example, the subexpression's code may have been moved outside a loop. However, this definition can cause unexpected behavior for variable modification from the debugger: if a subexpression has already been evaluated, then using the debugger to modify its components can no longer affect its value. Relaxing this constraint could cause confusion even if the debugger does not allow modifying the values of variables. Suppose that a simple common subexpression causes a program exception, such as zero divide or overflow. Then the exception will occur before the breakpoint, rather than after the breakpoint, as it would in a conventional system. The IBM FDS system's "no source change" mode would allow only simpler common subexpressions to appear before the breakpoint for a statement: namely, those common subexpressions that involve only compiler temporaries. This mode could still create an exception before the breakpoint, unless the optimizer restricted common subexpressions even further, to those that could be guaranteed not to cause exceptions.
3) Before first variable state change, or on the first instruction that corresponds to any alteration of user-visible variable state specified by the statement. An alteration of user-visible variable state is any change to a user-declared variable visible from the statement's scope. Thus embedded assignments will not have occurred yet, but some breakpoints or exceptions corresponding to subcomponents of the statement may already have occurred, as noted in the previous case.

4) Before key statement effect, or on the first instruction that corresponds to the core semantic effect of the statement. The definition of the key statement effect for each statement would be given, ideally as a part of the language definition. For example, the key statement effect of an assignment statement is the assignment, and the key statement effect of a GOTO statement is the control transfer.

These possible breakpoint definitions can be extended in the obvious way to handle end-of-statement location breakpoints.

3.2.3 History problems

The preceding sections have shown the problems that optimizations cause for displaying values of variables and displaying current execution locations. The history element of a computation reflects sequences of variable values and/or execution locations, so naturally it will be affected as well. However, optimizations can change the history component of a computation without changing the other components.

For example, the effects of altering the order of evaluation of expressions can be visible even in a debugger with statement granularity. Suppose that register scheduling has interchanged two function calls inside a source statement so that fewer registers will be needed to perform the computation, as shown below. If the user traces the computation at medium or fine granularity, the execution ordering of the optimized program will be different from what the user might expect. Similar changes would be seen if the user requests two breakpoints: one inside F and another inside G.

10 v _ F[a-b] + G[c+d]/(c*e);    10 v _ G[c+d]/(c*e) + F[a-b];

Many languages refuse to specify the order of evaluation of expressions so that the optimizer can perform such rearrangement freely. Nevertheless, similar problems with breakpoint or trace ordering arise if any expression containing a function call is moved, either to an earlier or later program point. For instance, if a common subexpression contains a call to the function H, hoisting its evaluation could cause breakpoints inside H to occur before other breakpoints. When such a common subexpression is stored to a compiler temporary, only these history aspects of the computation are affected. In contrast, hoisting an entire assignment statement causes incorrect variable values [as discussed in Section 3.2.1].

3.2.4 Allowing the debugger to modify values of variables

Additional complications arise if the debugger allows the values of variables to be modified. Recall that a roll-back variable is one whose assignment has been performed early, and a roll-forward variable is one whose assignment has been delayed. If a value is assigned to a roll-back variable from the debugger, then the variable will fail to take on the new value at the appropriate spot. If a value is assigned to a roll-forward variable from the debugger, the value will later be overwritten with the delayed store. In induction variable elimination, using the debugger to assign a value to the original induction variable will have no effect on the derived induction variable.
In common subexpression elimination and hoisted stores, assigning to a component of the expression from the debugger will not affect the computed result (because the computation has already been performed). Similarly, using the debugger to assign a value to a component of an expression whose computation is delayed will cause a different value to be computed. Using the debugger to assign a value to a variable whose current value is in a register (or other faster memory) will have no effect. Using the debugger to assign a value to a variable that has participated in storage overlaying can have very strange results: the desired variable's value is not altered, and some other variable's value is changed.

If the variable being assigned from the debugger had a constant value in that area of the original source, its value may have been propagated to other statements via constant propagation. Assigning a value to it from the debugger will have no effect; this situation is a special case of roll-back variables, in which the assignment is performed at compile time. If any of these values is later used to affect control-flow, even odder results may be seen. These situations can be even worse if the programmer computes the new value to be assigned from the debugger by constructing expressions that use variables whose values are incorrect.

A similar problem is created when the value of a variable that remains constant throughout the program causes different code to be generated. Suppose that a variable has a constant value at compile-time, implying a particular control-flow. Then code to perform tests may be skipped over, or worse, deleted. The worst case occurs when a debugging variable is set to false in the beginning of a program and is used throughout the program to control debugging output. An optimizing compiler would typically remove all of the statements that produce debugging output as unreachable code; therefore, using the debugger to set the debugging variable to true cannot have the desired effect. Special debugging routines for printing or checking the integrity of data structures may also be removed by this transformation, and will hence not be available from the debugger.

The compounded effects of constant propagation or other assumptions can also trigger different instruction selection during code generation or peephole optimization. If a variable's value is known to be 1, then an increment instruction may be used. Obviously the real problem here is that the variable's value was frozen at compile-time. Similarly, if a variable's value is known at compile-time to fit in one byte, then the compiler will allocate storage and generate instructions accordingly, with disastrous results if a value set from the debugger exceeds the one-byte size limit.

This brings up the subject of runtime checking. Checks may be moved outside of loops or removed altogether based on the structure of the written program. Altering the values of variables from the debugger may well invalidate those assumptions, and the protection of the checks may be subverted. It is difficult (perhaps even impossible) to detect these second-order effects.

3.2.5 Effects of optimization on advanced debugger capabilities

This section considers the effects of optimization on several more complicated debugger capabilities: single-stepping, data breakpoints, and assertion-driven breakpoints. It also considers the use of a debugger as a performance monitoring tool.
3.2.5.1 Single-stepping

A major assumption of single-stepping is that the change in execution state (and the actions performed) between two debugger activations reflects the semantics of the program statement at the first activation. The change in execution state includes both data changes and control-flow changes. Optimizations can easily void this assumption. Branch chaining or combining simple source statements into powerful machine instructions can cause statements to be skipped during single-stepping. Code-reordering optimizations cause control-flow (i.e., statement ordering) anomalies and problems with values of other variables if a semantic mapping method is chosen; they cause statement semantics anomalies and problems with the values of specified variables if a syntactic mapping method is used. For fine-grained stepping, in which called functions are also single-stepped, the order of arrival at function applications is an additional problem.

3.2.5.2 Assertion-driven breakpoints

By now, the reader may have begun to suspect that setting location breakpoints and displaying the values of variables are overly primitive operations for debugging optimized programs, and that higher-level facilities such as assertion-driven breakpoints might map more cleanly from the unoptimized program's execution to the optimized version. Unfortunately, this does not appear to be the case. It is easy to construct examples in which the unoptimized version of a computation will satisfy an assertion that will never be satisfied by the optimized version and vice versa. There may also be difficulties with the program location at which the assertion is satisfied, with the values of other program variables when the assertion is satisfied, and with the number of times that the assertion is satisfied.

In the following example, the optimizer has noticed that the variable finished is always set to TRUE before the loop exit, so code motion and dead store elimination combine to set finished to TRUE directly at the loop entrance.

29 count _ 0;                        29 count _ 0;
30 finished _ FALSE;                 30 finished _ TRUE;
   DO                                   DO
31   Read[char];                     31   Read[char];
32   count _ count + 1;              32   count _ count + 1;
33   IF char = BLANK THEN            33   IF char = BLANK THEN
34     finished _ TRUE;
35     EXIT;                         35     EXIT;
36   word _ Append[word,char];       36   word _ Append[word,char];
     . . . -- process word                . . . -- process word
   ENDLOOP;                             ENDLOOP;

Suppose the user requests a breakpoint when the assertion (count = 10) AND (finished = FALSE) holds, in order to examine words longer than ten characters. This assertion will be satisfied in the unoptimized program, but not in the optimized version.

We now turn our attention to what previous researchers have done to solve the problems that optimization creates for debugging.

3.3 Survey of previous work

There has been very little previous work on debugging optimized programs. Chapter 1 briefly described Hennessy's work on displaying the values of variables for optimized programs and Warren and Schlaeppi's proposed debugger for optimized programs.

3.3.1 The FDS debugger: (semi-)truthful debugging

The Firmware Development System (FDS) was an experimental programming environment for microprocessors developed at IBM Yorktown [Warren+78]. Microprograms were written in a version of PL/I tailored for systems programming, compiled by a cross-compiler on a larger computer, and downloaded onto the microprocessor. Optimizations were considered essential for the microprocessor environment.
Since programming errors were acknowledged as inevitable, some provision for debugging optimized programs had to be made. The FDS system was intended to furnish two modes of compiler/debugger operation. In "full optimization" mode, the proposed debugger would have tersely explained the ways in which optimizations had affected the program. Even for sophisticated users, this mode would have fallen short of providing truthful behavior. In "no source change" mode, optimizations could involve only compiler-introduced temporaries, or could occur only within a statement. In this mode, the FDS debugger could almost provide expected behavior. Complete recompilation and re-execution would have been required to switch from one mode to the other.

The FDS system intended to implement a wide range of optimizations, from global optimizations to register allocation to object code optimizations, as well as a complete set of debugging capabilities, including breakpoints, displaying and modifying values of variables, and single-stepping. Its foremost goal was to help experienced programmers construct correct optimized programs for a microprocessor; it had no underlying philosophy or rules of debugging, such as providing expected or truthful behavior.

Its primary approach to debugging optimized programs was to add static information. In "full optimization" mode, the optimizer would collect a variety of pre-computed tables during its optimization phases, such as an augmented storage map containing lists of storage locations for each variable at each source statement (a sketch of such a map appears at the end of this section). In "no source change" mode, the optimizer would not collect special information. The FDS debugger planned to provide expected behavior only in its references to variables at breakpoints: using the augmented storage map, the debugger would have automatically translated a reference to a variable into an access of its current storage location(s). For other user interactions, the debugger would have provided only semi-truthful behavior. The debugger used only a syntactic statement mapping, so that execution would appear to progress in the expected way through the source program rather than jumping about randomly [see Section 3.2.2.3]. To handle code-reordering optimizations, the compiler would have recorded the movement of variable assignments and uses (by number only) that could affect or be affected by the values of variables at a given source statement. When the user asked to display or modify the value of an affected variable, the debugger would have listed all relevant assignments and uses that the optimizer had inserted or deleted.

This technique was a step in the right direction, but it was hampered by the debugger's sole use of the syntactic statement mapping, as well as some skimpiness and inefficiency in its pre-computed tables. No attempt was made to reconstruct variable values. Furthermore, the FDS system made no provision for describing the state of a computation that was suspended between statement boundaries (for example, due to a program exception), either in statement mapping or in variable storage mapping. Optimizations that create one-to-many and/or many-to-one correspondences from source statements to object code locations, such as inline procedure expansion and cross-jumping, would not have been handled well either.

Unfortunately, the FDS debugger project lasted only long enough to produce a design document, so no information is available regarding its performance or its effectiveness [Warren82].
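To make the flavor of the proposed tables concrete, the following sketch shows an augmented storage map of the kind FDS described, together with the variable-display translation it enables. (Present-day Python notation; the actual FDS table formats were never published in detail, so the shape and names here are assumptions, keyed to the remxy example of Section 3.2.1.2.)

    # Hypothetical FDS-style augmented storage map: for each source
    # statement, the current storage location(s) of each variable.
    storage_map = {
        12: {"remxy": ["R1"]},          # cached in a register inside the loop
        13: {"remxy": ["R1"]},
        30: {"remxy": ["mem:0x2000"]},  # back in its memory home
    }

    def display(var, stmt):
        """Translate a variable reference at a breakpoint into an access
        of the variable's current storage location(s)."""
        locs = storage_map.get(stmt, {}).get(var)
        if locs is None:
            return "%s: no location recorded at statement %d" % (var, stmt)
        return "%s is currently in %s" % (var, " and ".join(locs))

    print(display("remxy", 13))   # -> remxy is currently in R1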
3.3.2 Hennessy: displaying expected values of variables

Hennessy considered a more restricted problem: providing the most critical debugging features (displaying variable values, setting breakpoints, and reporting error locations) in the presence of some selected local and global optimizations [Hennessy79]. The local optimizations include common subexpression elimination, redundant store elimination, and code reordering. The global optimizations include code motion, induction variable elimination, and dead store elimination. These optimizations are sufficient to cause large differences in code ordering between the source text and the optimized object code. Because of the more complex problems associated with modifying the values of variables from the debugger, Hennessy disallowed that feature.

Rather than pre-computing a variety of tables that might be useful at runtime, as the FDS system did, Hennessy assumed that the compiler would record the augmented global flowgraph used during the optimization process and that the debugger would process the flowgraph as needed. Each node of the augmented flowgraph was a directed acyclic graph, representing one basic block of both the unoptimized and the optimized programs. Hennessy developed algorithms for processing the flowgraph at runtime that can usually detect when a variable's value is incorrect in terms of the source program. Furthermore, his algorithms can sometimes correct such values to display the expected value. He did not directly address the problem of finding the correct storage location for a variable's current value.

In contrast to the FDS system, Hennessy used only a semantic statement mapping. As a result, the user could set a breakpoint to view the old value of a variable before an assignment without any concern about the effects of optimizations. Also, when reporting the location of a synchronous program exception, the semantic mapping indicates the statement responsible for the exception. However, the use of the semantic mapping restricted breakpoint placement for some common subexpression evaluations: when the start of multiple source statements corresponded to the same object code, a breakpoint was permitted only on the lexically first such statement.

Hennessy's algorithms were implemented and tested on a small group of programs to demonstrate their correctness and feasibility, but they have not yet been incorporated into any program development system.

3.3.3 Teeple and Anderson: statement mapping for cross-jumping

In an unpublished manuscript, Teeple and Anderson briefly discussed an even more restricted problem: how to set breakpoints in the presence of the cross-jumping optimization [Teeple+80]. As described in Section 3.2.2.2, this optimization merges identical tails of joining code paths, thereby causing one object code instruction to correspond to multiple source statements. At a breakpoint in the common code, the debugger needs help to determine which source statement is executing.

Teeple and Anderson sketched an algorithm to identify "disambiguator points" for one of the two paths to each merged region during the application of cross-jumping. They also noted flowgraph conditions under which no disambiguator points exist. Finally, they suggested a way that the debugger could use the disambiguator points at runtime to determine which statement is really executing. Their scheme was based upon tracing and breakpoints that are invisible to the user.
If there were no disambiguator points, the debugger would reply with a list of the possibilities. This work provided the initial idea for the way that Navigator handles cross-jumping, which is explained in the second part of this dissertation. Since their technique was never fully designed or implemented [Teeple81], its description is incomplete. Among other things, they did not realize that merging less than an entire statement still requires path determination, and they did not discover that repeated cross-jumping can cause nested determiners.

3.3.4 Feiler: local recompilation, optimization, and debugging

As in the FDS system described earlier, Feiler also suggested that debugging for optimized programs should be provided by using multiple levels of optimization [Feiler82]. For each procedure or module, the user would select one of four optimization levels. The more optimized a region of the program, the less debugging support it would receive. While the FDS environment is reasonably conventional, Feiler's environment, called LOIPE, has some novel features: it adds debug statements to the source language and is based entirely upon incremental compilation. The debugger has no special access to the compiled object code.

LOIPE is an interactive programming environment for incremental program construction based on compilation. It provides a uniform view of all system actions through a syntax-directed editor; the user never sees the traditional edit-compile-execute-debug cycle. The user makes a debugging request by editing a debug statement, such as BREAK or TRACE, into the source program. In response, the system automatically recompiles and relinks that portion of the program and then resumes program execution. Similarly, the user can modify the executing program to insert new declarations or arbitrary program statements. The modified program can be resumed from the point of suspension as long as no active procedure has been changed in a semantically-inconsistent way. When this fails, the procedure call chain can sometimes be unwound to the earliest call of a modified procedure. Otherwise, the program must be restarted from the beginning. Note that the ability to resume execution after inserting a debug statement and recompiling the enclosing procedure is central to this debugging approach. The system does not permit user interrupts.

Procedures compiled without optimization would have full debugging support for program modification and resumption, as described above. Program execution can always be resumed after debug statements are added or removed. In contrast, an optimized procedure would receive restricted debugging support. No attempt would be made to use information collected by the optimizer to hide the effects of the optimizations or to present alternatives. Nevertheless, Feiler claims that the debugger would provide a limited form of truthful behavior: it would report only those variable values and current execution locations that can be shown consistently. [He does not describe how the debugger would determine which items fall into this class.] For example, only variables whose current values are not in registers can be displayed, and the current execution point might not be reported uniquely. The system would not support execution traces of optimized procedures.
Furthermore, if the programmer makes any modification to an active optimized procedure (including adding or removing debug statements), that procedure would be recompiled without optimizations, and the procedure call chain would be unwound to allow resumption from the calling statement.

Two intermediate optimization levels would mitigate the adverse effects on debugging facilities. At "full debugging" level, the optimizer would not allow any optimization to cross any procedure call or debug statement. At "complete display" level, the optimizer would not cache data values in registers across procedure calls or debug statements. As a result, values of variables could be displayed for all active procedures at a breakpoint. However, this level does not address the issue of incorrect values resulting from computation-reordering optimizations.

All of this discussion represents design work; the prototype implementation of LOIPE did not include specific support for debugging optimized programs.

3.4 Summary

This chapter has presented a broad range of examples of ways in which common optimizations can cause a conventional debugger to respond in neither the expected nor even the less-desirable truthful manner. It has also reviewed the previous work on debugging optimized programs. The next chapter discusses general techniques for achieving improved debugger behavior.