Chapter 1

Introduction

Computer programs rarely perform their intended function the first time they are presented to the computer. A significant fraction of a program's development time is normally spent debugging it. Conventional interactive source-level debuggers are inadequate for debugging optimized programs because the effects of optimizations disturb the correspondence between the program's source text and its object code.

Nevertheless, the ability to apply an interactive source-level debugger to an optimized program is important. Interactive source-level debuggers allow a marked increase in programmer productivity [Evans+66], and compilers that perform optimizations are becoming increasingly common. Some reasons for the trend toward optimizing compilers are the emphasis on portability and modularity in current compiler construction practices, the increased speed and reliability of optimizing transformations, and the continuing need for efficient use of a computer's time and space resources [Harrison81].

This dissertation demonstrates that effective interactive source-level debugging can be provided for optimized programs. It investigates the previously unstudied interactions between optimizing transformations and debugging capabilities, identifies general techniques that can be used to solve the resulting problems, and demonstrates the practicality of these techniques by implementing selected ones in a prototype debugging system called Navigator.

1.1 Debugging and optimizations

A debugger is a tool that helps a programmer examine, monitor, and possibly modify the execution of a program, usually for the purpose of isolating and correcting mistakes in the program [Johnson82a]. In an interactive debugger, the user performs these operations during the program's execution, so that the response to a command can guide the formulation of later commands. In a source-level debugger, the user refers to features and events of the program's execution in the familiar terms of the source program. The user need not be aware of pieces of low-level machine state that do not appear in the source program (such as registers, condition codes, and numeric code and data addresses).

Typical debugger features include reporting the current execution location, displaying or modifying the values of variables, executing new statements in the current context of the program, and setting and removing breakpoints. Breakpoints are locations in a program at which the execution of the program will be suspended and debugger commands will be performed. Breakpoint and program exception locations are specified as positions in the source text. The programmer refers to variables by their names; the debugger displays their values in formats appropriate to their declared types.

This dissertation considers source-level debuggers for high-level languages such as Algol, rather than debuggers for low-level assembly or machine languages. Source-level debuggers for high-level languages are sometimes called high-level debuggers. This dissertation uses the term "source-level" because the term "high-level" is also used to refer to debugging at a higher conceptual level, closer to the problem specification level than the program implementation level [Gramlich83, Hamlet83].

In particular, this dissertation concentrates on debuggers for Algol-like languages, such as Algol, Pascal, and Ada. These languages are block-structured, permit recursive procedures, and have a fairly rich type structure including pointers and records.
FORTRAN and BASIC can be considered members of this family, but are simpler in several ways. Examples of languages outside the family of Algol-like languages include LISP, APL, and SNOBOL. Some of the concepts discussed here would also apply to such languages.

Program optimizations are transformations that are applied to a program to decrease its execution time and/or space consumption. Typically the compiler itself performs the optimizations, but sometimes they are performed by a pre- or post-compilation processor. An optimized program exhibits the same input/output behavior as its unoptimized counterpart but is faster or smaller or both. However, basic changes to the algorithm implemented by a program, such as replacing an O(n²) algorithm by an O(n) algorithm, are considered to be outside the realm of program optimization. The study of such basic changes is called automatic programming [Green+79]. Program optimization should be called program "improvement", because the resulting program is seldom optimal in any sense; nevertheless, "optimization" is the accepted term in the compiler literature.

Optimizations can be performed on a variety of representations of the program: the source text, an abstract syntax tree, a flowgraph representation, a linear intermediate form (such as quads [Aho+77]), or the (linear) object code.

Optimizations can have a variety of effects. For example, statements may be moved or deleted. Variables may be assigned values at different relative locations in the optimized code than in the source program, or the program's flow of control may be altered. The relative ordering of execution events (such as calling a procedure) may be changed. Optimizations may also eliminate or overlay variables or allocate them to registers in some program regions.

The effects of optimizations cause problems for interactive source-level debuggers. When optimizations move or delete statements, or when they alter the program's flow of control, it may be difficult for the programmer to specify breakpoint locations or for the debugger to report exception locations. When optimizations eliminate or overlay variables or allocate them to registers, the debugger may be unable to display the values of those variables. When variables are assigned values at different relative locations in the compiled code than in the source program, both the debugger's ability to display the values of the variables and its ability to report the current execution point may be compromised. When optimizations change the relative ordering of execution events (including arriving at a breakpoint), the debugger's reports of successive program execution points may confuse the programmer.

In fact, the basic presuppositions of optimization and interactive source-level debugging conflict. An optimizer assumes that the purpose of a program is precisely to compute the program's output, and that the exact intermediate steps required to achieve this goal are irrelevant. The optimizer rearranges and restructures the program, removing information about many of the program's intermediate steps. Therefore, a particular intermediate point in the unoptimized program may not directly correspond to any point in the optimized version. In contrast, an interactive source-level debugger allows access, in source program terms, to a large class of program points, including some that the optimizer may have changed.
The optimizer cannot avoid altering program points that the programmer wishes to examine from the debugger and still do its job well, because it cannot know which points these are. Most programs execute several times before they are recompiled, and the points of interest can change from one execution of the program to another.

According to Gaines, the first step in diagnosing a program error is knowing that the program reaches an incorrect state during its execution. There are two possible causes for the incorrect state: some variables could have incorrect values, or there could be an error in the program's flow of control [Gaines69]. If a programmer uses a conventional debugger to monitor the behavior of an optimized program, the effects of the optimizations can cause the debugger's responses to mislead the programmer in either of these areas.

1.2 Example

A small example illustrates the kind of problems that arise. Figure 1-1 shows a simple FOR loop that reads items into an array. The unoptimized code appears on the left, and a source-level representation of the program after optimization appears on the right. [The source-level view of the effects of the optimizations appears for purposes of explanation; it is not available to the user.] Because the assignment x ← 1 (at statement 18) is not affected by the execution of the loop, the code motion optimization can be applied. This optimization moves x ← 1 outside the loop, thereby reducing the program's execution time. After the assignment to x has been moved to precede statement 16, the value of x assigned in statement 15 (0) is not used before it is reassigned (to 1). The optimizer uses the dead store elimination optimization to delete statement 15.

     source program fragment         after code motion &
                                      dead store elimination

  15 x ← 0;                          x ← 1;
  16 FOR i IN [1..3] DO              FOR i IN [1..3] DO
  17   Read[a[i]];                     Read[a[i]];
  18   x ← 1;
  19 ENDLOOP;                        ENDLOOP;

     Figure 1-1. Effects of optimization on the reported value of x at statement 17.

Note: in program examples throughout this dissertation, source program statements are numbered for ease of reference. An actual programming environment may identify statements by some other means.

Suppose that the user stops the program at statement 17 on the first time through the loop. In the unoptimized version of the program, the value of x would be 0, but in the optimized version, x's value is 1. Because the user debugs the program using the original source text, she expects the value of x to be 0; she would be confused to be told that its value is 1. This research explores ways to provide effective debugging capabilities for optimized programs without confusing users in this manner.

1.3 Motivation

Providing debugging facilities for optimized programs is important. Optimizers should be more widely used during program development; the lack of debugging facilities is one reason why they are not used more often. Furthermore, it can be inconvenient or impossible to delay optimization until a program is known to be correct. Currently available methods for debugging optimized programs do not fill this need.

1.3.1 Current program development paradigm

The usual method for developing a program that may later require optimization is to compile the program with optimizations disabled during its creation, testing, and debugging phases. Often a special version of the compiler, called a checkout or debugging compiler, is used to compile the program during the development phase.
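For concreteness, the following sketch suggests the kind of runtime test such a compiler adds. It is purely illustrative: it is written in Python with invented names (CheckedArray, store, fetch), whereas a real checkout compiler would compile equivalent tests directly into the object code.

    # Hypothetical runtime checks of the sort a checkout compiler emits
    # (a sketch for illustration; names and representation are invented).

    class CheckedArray:
        def __init__(self, low, high):
            self.low, self.high = low, high
            # None marks an element that has never been assigned.
            self.elems = [None] * (high - low + 1)

        def store(self, index, value):
            if not (self.low <= index <= self.high):      # bounds check
                raise IndexError("subscript %d out of bounds [%d..%d]"
                                 % (index, self.low, self.high))
            self.elems[index - self.low] = value

        def fetch(self, index):
            if not (self.low <= index <= self.high):      # bounds check
                raise IndexError("subscript %d out of bounds [%d..%d]"
                                 % (index, self.low, self.high))
            value = self.elems[index - self.low]
            if value is None:                             # uninitialized-use check
                raise RuntimeError("use of uninitialized a[%d]" % index)
            return value

    a = CheckedArray(1, 3)
    a.store(1, 10)
    print(a.fetch(1))                        # 10
    try:
        print(a.fetch(2))
    except RuntimeError as e:
        print("checkout-compiled code reports:", e)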
A checkout compiler generates object code that performs runtime testing to facilitate error detection; common tests include array bounds checking, checking that the value of a variable stays within its declared range, checking for uses of uninitialized variables, and checking for dereferencing a null pointer. A checkout compiler may also generate code to collect program execution statistics, such as the number of times that each statement or procedure executes [Satterthwaite75]. After the program is "known to be correct", it is compiled one last time, with optimizations enabled. The optimized version of the program is used thereafter.

There are two major reasons for this strategy. First, an optimizing compilation is usually more expensive than a nonoptimizing one. That is, an optimizing compilation takes longer, and it may also use more space. This reason is becoming less valid because of advances in optimization and compiler technology, as the following two paragraphs explain. Second, the use of an interactive source-level debugger requires compiler-provided mappings between source lines or statements and machine code locations, and among variable names, types, and data locations; optimizations can alter these mappings in ways that have not previously been understood. The remainder of this dissertation disputes the second reason by analyzing the ways in which optimization affects debugging and by describing methods for debugging optimized programs.

One argument supporting more use of optimizing compilation is that some optimizations are not expensive to apply. Extensive research in both practical and theoretical aspects of optimization [Allen+72, Lowry+69] has made some optimizations fast and reliable to perform. Among these optimizations are transformations performed on the machine code produced by a compiler, as well as transformations performed before or during code generation. Efficient machine code optimizations include retargeting branches to branches or collapsing multiple instructions into one specialized instruction [Wulf+75, Zellweger80, Lamb80, Davidson+80]. Efficient transformations that occur earlier in the compilation process include delaying stores within straight-line code and allocating important quantities to high-speed registers [Sites78]. In fact, application of these optimizations can decrease rather than increase compilation time because less intermediate code is processed for register allocation, less pseudo-object code participates in final address assignment, and less object code is written to the output file [Cocke+80].

Another argument supporting more use of optimizing compilation is that optimization phases are sometimes used to make a compiler simpler or more portable. For example, some current compilation environments contain machine-independent code generators [Glanville+78, Cattell80] or source code pre-processors [Kernighan+76] that generate fairly naive and inefficient code. In these situations, the unoptimized program would be sufficiently large or slow that it is worthwhile to spend the extra compilation time for optimization.

1.3.2 Why debug an optimized program?

In some situations, it is inconvenient or even impossible to delay optimization until a program is known to be correct:

1) Some optimizations may be an integral part of the compilation process, either because of their high payoff/speed ratio or because of the organization of the compiler (such as use of a local code generator based on directed acyclic graphs (dags) [Aho+77]).
It may not be possible to inhibit these optimizations.

2) Without optimization, some programs are too large to fit on the host computer or too slow to test effectively, especially in the late development stages.

3) Errors can be discovered at any time, even in a production program. In fact, optimization can indirectly contribute to their discovery: optimization of range and subscript checks can allow production programs to execute with checking enabled without paying a prohibitive price [Sites78, Markstein+82]. When an optimized program halts with an error, it is desirable to collect as much information as possible about the error while its context is still available. Being able to use a debugger immediately, rather than recompiling and re-executing the program to the point of error, would be helpful for several reasons. The program may have executed for a substantial amount of time, the sequence of inputs may be difficult or impossible to duplicate (for example, in an operating system or in any other program accepting data from the real world), or the bug may simply be intermittent. Furthermore, recompilation to enable debugging may be expensive, particularly if other program modules must be recompiled or reloaded as a result.

4) Incompatibilities between the checkout and the optimizing compilers may make it painful to switch from one to the other. Even if the two versions of the compiler accept precisely the same input language, a program might run correctly under the checkout compiler, yet still contain an error. For example, if the checkout compiler sets the values of all variables to zero before beginning execution and does not specifically check for uses of uninitialized variables, a use of an uninitialized variable might not be caught.

5) Debuggers are frequently the basis for program monitoring tools that provide statement execution counts or timing information. It would be useful to be able to apply these tools to optimized programs.

1.3.3 Current support for debugging optimized programs

When optimizations cannot be disabled, the absence of specific support for interactive source-level debugging of optimized programs forces programmers either to insert source-language debugging statements (and then recompile the program) or to use an interactive machine-language debugger.

Inserting source-language debugging statements is the most primitive source-level debugging facility: the new statements can verify assertions, print intermediate results, or trace execution flow. Because the programmer must modify and recompile the program, this method destroys the current context of the error. Even more important, it is not interactive: if some debugging output prompts a new question about the internal workings of the program, the modify/compile/execute cycle must be entered again, lengthening the time needed to locate the error. In addition, the programmer must remove the changes after the error has been found.

From the optimizer's viewpoint, requiring recompilation to answer a debugging query is an advantage. Because the added statements exist at compile time, the optimizer knows during the optimization process which variable values are desired where, and which locations in the control flow are of interest to the programmer. By referencing variables or generating output, these statements change the intermediate steps in the program, possibly inhibiting optimizations in the surrounding area so that those variables or locations appear properly in the optimized program.
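As an illustration of this inhibition, consider a toy loop-invariant code motion pass that hoists a constant assignment out of a loop only when no other statement in the loop reads the assigned variable. This is a hypothetical sketch in Python, not the algorithm of any real optimizer (the statement representation and the safety condition are invented and deliberately conservative), but it shows how inserting a debugging statement that prints x into the loop of Figure 1-1 blocks the code motion, so that the debugger's report of x at statement 17 again matches the source.

    # Toy loop-invariant code motion (hypothetical, for illustration).
    # Each statement is a tuple (kind, var, uses): 'kind' marks a constant
    # assignment, 'var' is the assigned variable (or None), and 'uses' is
    # the set of variables the statement reads.

    def hoist_invariant_assignments(loop_body):
        hoisted, remaining = [], []
        for i, (kind, var, uses) in enumerate(loop_body):
            # A constant assignment is loop-invariant; in this toy model it
            # may be hoisted only if no other statement in the loop reads
            # the assigned variable.
            read_elsewhere = any(var in u
                                 for j, (_, _, u) in enumerate(loop_body)
                                 if j != i)
            if kind == 'assign-const' and not read_elsewhere:
                hoisted.append((kind, var, uses))
            else:
                remaining.append((kind, var, uses))
        return hoisted, remaining

    # Loop body of Figure 1-1: Read[a[i]]; x ← 1
    body = [('other', None, {'a', 'i'}),    # Read[a[i]]
            ('assign-const', 'x', set())]   # x ← 1
    print(hoist_invariant_assignments(body))        # x ← 1 is hoisted

    # With a debugging statement Print[x] added inside the loop, the loop
    # now reads x, and the same pass leaves x ← 1 in place:
    body_debug = [body[0], ('other', None, {'x'}), body[1]]
    print(hoist_invariant_assignments(body_debug))  # x ← 1 stays put

A real optimizer's safety conditions are considerably more involved, but the effect is the same: the added statement constrains the optimizations around it, and the recompiled program no longer behaves like the optimized program of Figure 1-1.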
However, the recompiled program may be less efficient than the program that exhibited the error and may take longer to reach the error point.

Using an interactive machine-level debugger is probably even less inviting than inserting debugging statements. This method is available only if the programmer understands basic code generation techniques and the host machine language, which are details that the high-level language is intended to suppress. A stronger requirement is that the programmer must also be an optimization expert: she must be able to unravel the effects of the optimizations to determine the machine code location corresponding to a given source statement or to determine the storage location that holds the current value of a given variable. Even such minimal aids as symbolic procedure and variable name translations may be invalidated by optimizations such as inline procedure expansion and allocation of frequently-used variables to registers during loops.

1.4 Brief discussion of previous work

Researchers have only recently begun to consider the problem of interactive source-level debugging of optimized programs, either in studying the interactions between optimization and debugging or in designing and implementing debugging systems for optimized programs. As of 1981, only two papers explicitly discussed debugging optimized programs. A detailed description and critique of these papers and other related work is delayed until the end of Chapter 3 so that concepts defined in this dissertation can be exploited in the description.

The first paper, by Warren and Schlaeppi, is a design document for a debugger for the Firmware Development System at IBM Yorktown Heights [Warren+78]. The proposed debugger was to have a full range of debugging capabilities, but would probably have confused all but those programmers with optimizer implementation experience. A second debugging mode, which would inhibit optimizations across statement boundaries, was to be provided for less sophisticated users. This debugging system was never implemented, so no experience concerning its utility was gained [Warren82].

In the other paper, Hennessy describes algorithms for interactively exploring an augmented global flowgraph of a program at runtime to determine which variables have incorrect values [Hennessy79]. A variable in an optimized program has an incorrect value if it has a different value than it would have had in an unoptimized version of the program. Each node of the flowgraph is a directed acyclic graph, which represents one basic block of both the unoptimized and the optimized program. Hennessy also presents conditions under which the value of the variable with respect to the unoptimized program can be reported. These algorithms were tested, but were not incorporated into any programming environment.

In addition to the lack of implementation experience, the previous research did not fully explore the interactions between optimization and debugging. For example, both papers claimed that machine-dependent optimizations do not affect debugging. This is not true: statement boundaries can be removed by collapsing a number of instructions into a single powerful machine instruction, causing problems if the user desires a breakpoint at one of these boundaries. This problem is exacerbated by so-called "exotic instructions" that can perform the function of an entire loop in one instruction (for example, the Search Linked List instruction on the Burroughs B4800) [Morgan+82].
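To make the lost-boundary problem concrete, the following sketch (a hypothetical Python illustration; the map layout, statement numbers, and addresses are invented) models a statement map that records the first object-code address of each source statement. When instruction collapsing merges the code for two adjacent statements into one machine instruction, the second statement no longer begins at any instruction boundary, and a breakpoint request for it cannot be honored directly:

    # Statement map: source statement number -> starting object address.
    # Before collapsing, each statement begins at its own instruction.
    stmt_map_before = {10: 0x100, 11: 0x104, 12: 0x108}

    # Suppose a peephole pass collapses the instructions for statements 11
    # and 12 into a single specialized instruction at 0x104; statement 12
    # then has no instruction boundary of its own.
    stmt_map_after = {10: 0x100, 11: 0x104}

    def plant_breakpoint(stmt_map, stmt):
        addr = stmt_map.get(stmt)
        if addr is None:
            return "statement %d: no instruction boundary remains" % stmt
        return "statement %d: breakpoint planted at %#x" % (stmt, addr)

    print(plant_breakpoint(stmt_map_before, 12))   # planted at 0x108
    print(plant_breakpoint(stmt_map_after, 12))    # boundary was removed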
Branch chaining, another well-known machine-dependent optimization [Wulf+75], can also make breakpoint insertion difficult, because it allows some execution control paths to bypass a program point.

Another key issue that these papers ignored is how to map locations between the source program text and the optimized object code: how does the debugger translate a breakpoint request at a source statement into the appropriate object location(s), and how does the debugger tell the user where in the program execution has been suspended? These early researchers thought that some simple partial solutions would be adequate, but in fact there are common optimizations and debugging requests for which those solutions fail. Section 3.2.2 discusses the location-mapping problem further.

1.5 Scope and goals of this work

The goal of this dissertation is to show that effective interactive source-level debugging can be provided for optimized programs. Since debugging optimized programs has been a largely unexplored research area, the first part of the dissertation maps out the problem area. The second part of the dissertation describes implementation experience with a small subset of the problem, supporting the discussion and design in the first part. The dissertation addresses three subgoals:

1) To investigate the interactions between optimizing transformations and debugging capabilities, delineating the problems that arise when trying to debug optimized programs. Chapter 2 discusses interactive source-level debugging of Algol-like programs and arrives at a notion of "effective debugging". Chapter 3 discusses program optimization and examines in detail the problems that optimization creates for debugging.

2) To explore general techniques that can be used to solve these problems, culminating in the design of effective debugging tools for optimized programs. Chapter 4 describes general solution techniques and illustrates their use.

3) To demonstrate the practicality of these debugging techniques by implementing selected ones in a real programming environment. Chapters 5 through 7 describe efficient ways to provide source-level debugging capabilities in the presence of two simple but nontrivial optimizations: inline procedure expansion and cross-jumping. A prototype implementation of these methods, in a system called Navigator, has been developed in the Cedar programming environment at the Xerox Palo Alto Research Center. Chapter 5 presents an overview of Navigator, Chapter 6 describes its implementation details, and Chapter 7 presents a theoretical analysis of the key Navigator algorithms and discusses possible improvements to them.

The dissertation emphasizes debugging over optimization for two reasons. First, because the theory and philosophy of optimization is better understood than that of debugging, debugging requires more attention. Second, exploring the meaning of debugging in general helps to guide the selection of an appropriate definition of "effective debugging".

Many dissertations on debugging categorize programming errors and discuss the relative utility of various debugging techniques for discovering errors in the different categories [Gaines69, Davis75, Model79]. This dissertation does not consider those topics; instead, it discusses ways to achieve a sensible interaction between the programmer and the debugger during the error detection and correction process.
For example, although runtime checking to ensure that a program follows the language semantics during its execution (such as array bounds checking) is an effective error detection method, whether or not such checking is enabled does not affect the problems described here.

A note on terminology: throughout this dissertation, the words "debugger" and "compiler" always refer to programs. People are referred to as "programmers" or "users" (intended to evoke "debugger users"), even though they may be doing debugging or compiling. In addition, I intend no distinction between a person who originally wrote a program that is being debugged and one who maintains or examines that program.

I also assume that optimizing transformations are applied correctly and that all of their enabling conditions have been checked. This work is not intended to create tools for debugging faulty optimizers, except insofar as additional information supplied by the debugger allows the programmer to notice optimizer errors more easily.