Chapter 2

Debuggers for Algol-like Languages

This chapter discusses debugging in more detail, laying the groundwork for the remainder of the dissertation. The first section presents a program execution model that can form the conceptual basis for building source-level debuggers for Algol-like languages. The second section is a user-oriented overview of a broad range of popular debugging features; it also shows how a programmer uses these features to answer some typical debugging questions. This overview prepares the reader for Chapter 3, which describes the effects of common optimizations on a conventional debugger's operation. In a compilation-based environment, the compiler must tell the debugger how to translate the low-level machine state into source-level terms; the debugger must perform the indicated translations, interact with the user, and do some bookkeeping. The third section discusses conventional compiler and debugger implementations, focusing on a few basic debugging operations that will play a major role in the implementation portion of this dissertation. This section provides useful background and contrast for the description of Navigator's compiler and debugger implementations. The fourth section describes the properties that a source-level debugger for optimized programs must have to be considered effective, and it defines two key terms for evaluating a debugger's responses: "expected" and "truthful". The chapter closes with a brief survey of the history of debuggers and recent research directions in debugging. Previous work on debugging optimized programs is discussed at the end of the next chapter.

2.1 Source-level model of execution for Algol-like languages

A program is a static entity, while its execution is a dynamic one. The semantics of a programming language defines a source-level model of computation for programs written in that language. In his proposal for adopting a functional style of programming, Backus characterized conventional programming languages quite succinctly [Backus78]:

"Von Neumann programming languages use variables to imitate the computer's storage cells; control statements elaborate its jump and test instructions; and assignment statements imitate its fetching, storing, and arithmetic. ...[A von Neumann program] is dynamic and repetitive. One must mentally execute it to understand it. ...Its statements operate on an invisible `state' according to complex rules."

The source-level model of computation is a combination of that `state' and its complex rules.

2.1.1 The source-level model

One way to define the behavior of a program is via a set of input/output pairs: the program transforms a given input into its corresponding output. This definition is adequate for a program's user, who wishes only to see the program's result. However, a view of the computation's internal state is essential for a program's implementor or maintainer, who must locate the problem when the program fails to produce the desired result. On the other hand, this model is not intended to capture the particulars of a machine-level view of a program's computational behavior. A machine-level model would include many details about an individual implementation, such as the machine's instruction set, that do not appear in and are not specified by a high-level language. The purpose of the model is to define the view of execution that should be presented to a user by a source-level debugger.
The debugger typically constructs its source-level view of a computation from a lower-level representation that is hidden from the programmer. The model developed in this section will be used in Chapter 3 to determine what the debugger's response to a given command should be at a given point in a computation. This model was produced by combining Horning and Randell's view of sequential processes [Horning73] and the standard semantics of Algol-like languages. Although the resulting model is neither as general nor as precise as Horning and Randell's, it is well-suited for describing the expected behavior of a source-level debugger. A more formal treatment of much of the material in this section, though with a somewhat different emphasis, appears in Chapters 3 and 4 of Satterthwaite's dissertation [Satterthwaite75].

Programmers often envision a program's execution behavior in terms of a processor interpreting the source statements of the program. This processor has a set of state variables, which hold values of user-defined variables and auxiliary variables (used by the processor to execute the program). The program's execution can be described in terms of a sequence of values of these state variables. This sequence is called a computation. An action specifies the transition from one state to the next: it assigns new values to some state variables; the remaining state variables keep their old values. The final state of a finite computation is a state for which no action is defined. A computation's action sequence is the sequence of actions performed to generate the computation. In general, a given action sequence does not completely specify a computation, because it could be applied to several initial states. A program is a compact description of these state transitions and their sequencing.

A program is executed by a processor. The state variables of a processor include the user-defined variables, the statements of the program, and several auxiliary variables; the auxiliary variables, which include the program counter, the procedure call chain, and data from currently-active procedures, are discussed below. In an Algol-like language, the transition from one state to the next involves changes to only the variables and the auxiliaries; the statements are constant over the program's execution. That is, the program cannot create new statements and execute them. (This is not true of LISP, for example.)

The most important auxiliary variable is the program counter, which identifies a program "portion" that is about to be executed. In general, the granularity, or level of detail, at which a program's execution can be viewed is dependent on the language definition. For a language that specifies the execution order within a statement, or for an expression language (such as Bliss [Wulf+71]), it might be an expression. For a language that does not specify the execution order of expressions within a statement, the granularity must be statement level or higher; in the typical language a statement is a reasonable program "portion". (Most Algol-like languages define "statement" recursively. Therefore the execution of a composite statement, such as a loop, generally implies the execution of several nested statements. A more complete computational model for Algol-like languages could also mark the end of execution of each composite statement.) The remainder of the discussion assumes statement granularity unless otherwise noted.
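This state-and-action model can be made concrete with a small sketch. The following Python rendering is purely illustrative; the names State, Action, run, and the two-statement example program are mine and are not part of the model's definition:

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class State:
    V: Dict[str, object]    # values of the user-defined variables
    PC: int                 # program counter: which program portion executes next
    C: List[str]            # procedure call chain

Action = Callable[[State], State]

# The constant program text T maps each program counter value to an action.
T: Dict[int, Action] = {
    1: lambda s: State({**s.V, "x": 0}, 2, s.C),              # 1: x := 0
    2: lambda s: State({**s.V, "x": s.V["x"] + 1}, 3, s.C),   # 2: x := x + 1
}

def run(s: State) -> List[State]:
    """A computation: the sequence of states generated from initial state s."""
    states = [s]
    while s.PC in T:          # the final state has no action defined for it
        s = T[s.PC](s)
        states.append(s)
    return states

print(run(State({}, 1, [])))  # three states; the PC moves 1 -> 2 -> 3

Note that an action changes only some state variables (here, one variable and the program counter); the rest keep their old values, exactly as in the prose definition above.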
With the exception of dynamically-allocated variables (which can be handled by a simple extension), a user-defined variable can be thought of as having a conceptual lifetime that extends over the entire computation. Each variable has a type, which specifies the range of values it can assume. However, a single variable identifier can refer to different variables at different points in a computation. An environment is a mapping from identifiers to a subset of the variables. If a given environment associates a previously-used identifier with a new variable, the old variable is inaccessible in that environment. Algol-like languages are statically scoped, meaning that the environment is a function of a point in the program text (i.e., the program counter). In contrast, most dialects of LISP are dynamically scoped, meaning that the environment is a function of the point in the computation.

Variable binding is a related concept. Algol-like languages permit recursion, implying that multiple instances of a particular variable can exist at the same time. For a given identifier, the correct instance of its variable (i.e., the binding of the identifier) is a function of the history of the currently active procedure calls, that is, the order of their occurrence and their data. At a given point in a computation, statements in an Algol-like language cannot refer to a different instance of a variable than that specified by the binding rule. To implement the binding of non-local variables, the processor keeps a history of the currently active procedure calls, called the procedure call chain, as part of the auxiliaries. At procedure entry and exit, the processor manipulates these auxiliaries to maintain the proper binding.

Therefore, the source-level view of a computation in an Algol-like language can be completely described by the constant program text (T) together with the sequence of values of the user-defined variables (V), the program counter (PC), and the procedure call chain (C). These are the items that the user will want to observe via the debugger to monitor the program's behavior. The first line of the following figure shows a fully-specified finite computation. The second line shows that the transformation from state s_i to its successor s_{i+1} is caused by the execution of the piece of program text at PC_i.

<<{T, V_0, PC_0, C_0}, {T, V_1, PC_1, C_1}, ..., {T, V_n, PC_n, C_n}>>

T(PC_i)[{T, V_i, PC_i, C_i}] = {T, V_{i+1}, PC_{i+1}, C_{i+1}}

Note that the location of the statement that is about to be executed (i.e., the program counter) is an explicit element of the computational state. Therefore, a computation that executes the same actions on the same data, but that uses different copies of those actions (i.e., actions at some other location in the program text), is a different computation.

For ease of later reference, we informally partition the model into three parts: locations, data, and history. The location elements of a computation are the program counter and the action about to be executed at some state in the computation. The data elements of a computation are the variables and their values at some state in the computation. The history elements of a computation refer to the sequencing of code or data elements, such as the procedure call chain or the number of times that a given action was executed.

This source-level model is easily extended to handle multiple processes. Each process creates a separate computation, with its own program counter and procedure call chain.
Variables and program text can be shared among more than one process. [We are not interested in methods for interprocess communication here.]

2.1.2 Behind the source-level model

This source-level computational model is an abstraction; an executing program demonstrates observable behavior that is outside the model. Furthermore, the actual sequence of steps required to execute a program on a computer is much more detailed than this model suggests. To present a source-level view of a computation, a debugger must map these details into a correct source view and it must hide the parts of the implementation that are extraneous to the source view.

A program's computational behavior does not include any encoding of program or output timing, data or program storage particulars that the user might be able to deduce from the program's physical behavior, or audible cues from specific hardware (such as disks). These and other similar items are not directly specified by the high-level specification of the program; rather they are implied by the low-level implementation of the programming language, the operating system, the hardware, and so on. The definition of computational behavior constructed here expressly omits all such behavior so as not to constrain the low-level implementation.

A computer could directly execute a program written in an Algol-like language, either by interpreting it or by having an instruction set that implements the language. For efficiency and flexibility, a high-level program is usually translated into a lower-level representation that is more suited to the particular machine, frequently the native instruction set of the machine. This translation is performed by a compiler. Of course, a program's execution on an actual computer can be viewed at many different levels: the sequence of program statements, the sequence of machine instructions, the sequence of steps to perform each machine instruction (often in the form of microcode), the physical behavior of the circuits in the machine, etc. The source-level view is thus only one of several possibilities; it has the advantage of being the view formulated by the programmer.

A compiler translates each source statement into a sequence of lower-level instructions that carry out the actions specified by the statement. There are usually many distinct ways to accomplish these same actions; which one the compiler chooses does not affect the definition of program behavior we have already constructed. The compiler frequently uses temporary storage to hold intermediate results; where and how many of these temporaries exist is immaterial to the user. The compiler also selects particular arrangements of storage to hold values of user-defined variables. For example, most users are not concerned with whether array elements are stored by row or by column.

2.2 Overview of debugging features

An interactive source-level debugger can be used for many different purposes, depending upon its capabilities: debugging, development, testing, and performance analysis. Debugging is the process of finding and fixing program errors. Program development is the process of adding new functionality to a program. Program testing involves more or less systematic examination of external program behavior using various inputs; performance analysis examines internal program behavior using various inputs.
2.2.1 Monitoring a computation

A single source program represents a possibly infinite number of computations, corresponding to different initial states (i.e., different inputs). If the program text contains no branches or procedure calls, then it specifies exactly one action sequence, which contains the same number of actions in precisely the same order as the program. But even a single action sequence can represent multiple computations for different input. If the program text contains conditional branches, the program specifies many different action sequences, depending on the program's input. If the program contains loops or procedure calls, the sequence of actions executed for a given input can be longer than the total number of actions in the program. When a program fails to specify a termination condition, the resulting computations are infinitely long. Analytically determining the sequence of actions and states that arise from a given input is at best tedious (for simple programs) and at worst virtually impossible (for complex ones). In fact, the major use of a debugger is to help programmers monitor program behavior, to pinpoint where in a computation a program begins to stray from its intended behavior.

A debugger that can monitor a computation can help a user with debugging, testing, and performance analysis activities. The important features for debugging are the following basic operations: reporting the current execution point, displaying the values of variables, setting and fielding breakpoints, and displaying the procedure call chain. For testing, a debugger can collect information for statement execution counts or data use counts, to ensure that all control paths and data configurations have been tested. Execution summaries can also isolate program areas that might profit from tuning.

2.2.1.1 Basic monitoring functions

The basic monitoring functions of a debugger include examination commands, which display information about a single state of the computation, and control commands, which specify the set of states to be examined. For an interactive debugger, the controlling commands suspend and resume the computation, so that the single (current) state is usually the last state that has been generated so far in the computation.

These are the basic examination commands:

1. Report the value of the current state's program counter in source-level terms. This function reports which source program statement in which procedure is either executing or about to execute.

2. Display the value of a variable v in the current state, in the appropriate format for the declared type of v in that environment. In an Algol-like language, the current program counter uniquely defines an environment, or a set of visible variables. The effect of this function is as if the programmer had included a statement to print the value of v in the original source program at the value of the current state's program counter. With the aid of an expression parser and/or interpreter, a debugger can use this basic function to display the values of expressions.

3. Display the value of the procedure call chain in the current state (also called the procedure traceback). This function reports a list, usually in reverse order of invocation, of the names of the currently active procedures, their points of suspension (i.e., the calling statement), and possibly their arguments and local variables.

4. Set the variable environment to some other program statement.
This function is defined for statically or dynamically enclosing program statements, or for program statements in procedures that declare static or own variables. The user can then display the current values of variables in that environment. There is no analog of this function in the source program.

These are the basic control commands:

1. Suspend the computation immediately (called a user interrupt).

2. Suspend the computation at some specified future state (a breakpoint; see below).

3. Resume the computation.

Many programming environments automatically invoke the debugger when an erroneous state of some kind (i.e., a program exception) is detected. These computations usually cannot be resumed in a sensible fashion.

A breakpoint is placed on a program location. As a result, the debugger gives control to the user whenever the value of the program counter equals that program location during the remainder of the computation. Traditionally, the computation is suspended before the execution of the action at that program location, but it could be suspended afterwards. Some debuggers provide both before-statement and after-statement breakpoints [Johnson81].

The definition of a breakpoint mechanism includes what values of the program counter can be specified. Two debuggers for the same programming language could define a different set of legal stopping places. Legal stopping places cannot be any more frequent than the granularity of the source-level computational model. On the other hand, the implementation of a particular debugger may dictate less frequent stopping places. For example, a very simple debugger may allow breakpoints only at procedure entry and exit. Another common breakpoint granularity that is dictated solely by the implementation is the source line. Source line granularity differs from source statement granularity in that the user can only request a breakpoint on the first statement in a line. Since most Algol-like languages allow free-format input with multiple statements on a single line, a debugger with source line granularity does not permit the user to place a breakpoint on each statement. Such a debugger may also restrict certain program locations, such as continuation lines of multi-line statements or lines containing only ELSE or END.

2.2.1.2 Composite monitoring functions

Many useful composite functions can be built from the basic monitoring functions. Some examples are dumps, traces, statement execution summaries, data use summaries, data breakpoints, assertion-driven breakpoints, and conditional breakpoints.

A dump is a listing of program variables and their values. A full dump shows all of the variables (including dynamically-allocated variables) and their values, and is likely to contain more information than the user either wishes to see or can reasonably absorb. A full dump is most commonly provided after a computation has terminated abnormally, when it is called a post mortem dump. A selective dump shows a specified (and typically small) subset of the variables.

A trace shows the complete sequence of states of some finite computation or subcomputation. It can show a varying amount of information about each state. An action trace shows the sequence of actions that are about to be executed (i.e., the value of the program counter, either alone or with its text) in each state. A result trace shows the sequence of actions together with their results.
A full trace shows the sequence of actions, including the values of variables that are referenced in each action and the results of the action's execution. Another axis upon which the amount of trace information can vary is the granularity of the trace. Tracing with fine granularity shows the execution of every action in the source-level computation. Tracing with medium granularity shows the execution of every branch point, which is an action that changes the program counter to something other than the lexically following statement. Tracing with coarse granularity shows only procedure calls and returns.

A full trace with fine granularity of an entire nontrivial computation generates much more information than a full dump, and is therefore likely to be harder to use. Satterthwaite's idea of the k-limited trace, implemented in the Algol W debugger, limits the amount of information produced by such a trace to (approximately) k times the size of the program source. It traces the first k executions of a statement, except that a procedure is traced the first k times that it is invoked from each separate call location. When k is greater than one, the intuitive basis for this idea comes from mathematical induction: the first time a statement is traced forms the basis step, and the second (or kth) time shows the application of the induction step to the first (or (k-1)st) time.

To single-step a computation means to watch it execute. The debugger stops the computation before (or possibly after) every statement is executed so that the user can examine features of the sequence of states in detail. It can be thought of as enabling a breakpoint at every possible successor to the current statement; when some successor is reached, all of the breakpoints are removed. Since the "statement" unit is recursively defined in the source program, the notion of "successor" has several possible meanings. For single-stepping with fine granularity, the successor state is the next state in the source-level computation. For single-stepping with coarse granularity, the successor state is the state following the end of all computations implied by the statement. Thus procedure evaluations or loop iterations can be considered as a unit. Note that single-stepping with coarse granularity is dependent on the current nesting level in the source program. If the user specifies that the debugger is to proceed automatically from each such breakpoint, the resulting output is essentially an action trace at some granularity.

Dumps are more frequently associated with non-interactive debuggers, while breakpoints and single-stepping are more frequently associated with interactive debugging. Traces are effective in either environment.

Statement or procedure execution counts show the number of times that each statement or procedure was executed during some finite computation or subcomputation. They therefore summarize the information presented by an action trace, removing its sequentiality. A user can examine such summaries to ensure that a test suite exercises all of the program statements or to locate possible areas of inefficiency in the program. The Algol W debugger presented this information embedded in a listing of the program. This format, called an execution flow summary, was very effective. Data use counts are a related concept. They show the number of times that a variable was accessed or the number of times that it was stored. A user can examine these use counts to ensure that a program accesses data in the expected way.
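As a sketch of how such counters and the k-limited trace might fit together, consider the following Python fragment. It assumes an invasive implementation in which the debugger is entered at every statement, and all of the names in it are illustrative rather than taken from any real system:

from collections import Counter

stmt_counts = Counter()    # statement number -> times executed
load_counts = Counter()    # variable name -> times its value was accessed
store_counts = Counter()   # variable name -> times a value was stored into it
K = 2                      # limit for the k-limited trace

def at_statement(stmt_no, loads=(), stores=()):
    """Called once per executed statement, e.g. by compiled-in debugger calls."""
    stmt_counts[stmt_no] += 1
    for name in loads:
        load_counts[name] += 1
    for name in stores:
        store_counts[name] += 1
    if stmt_counts[stmt_no] <= K:            # k-limited trace: report only the
        print("executing statement", stmt_no)  # first K executions of each statement

An execution flow summary in the Algol W style would then merge stmt_counts back into a listing of the program text.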
Gaines mentioned data use counts in his dissertation [Gaines69], but I am not aware of any debugging system that has implemented them.

Elements of a computation other than the program counter can be used to determine when to suspend a computation, defining two different kinds of breakpoints: data breakpoints and assertion-driven breakpoints. Also, a programmable breakpoint can do more than simply suspend the computation. When confusion might result below, the "program counter" breakpoint described in the previous section is called a location breakpoint.

A data breakpoint is placed on a user-defined variable. There are two possible definitions of data breakpoint. One specifies that the debugger gives control to the user whenever the value of the variable is stored. (Note that the states in which the variable's value is actually altered are a subset of this condition, because a store might write the same value.) The other specifies that the debugger gives control to the user whenever the value of the variable is accessed.

A conditional breakpoint is an extension of a location (or data) breakpoint in which the debugger gives control to the user when the program location is reached (or the variable is stored or accessed) and the execution state of the program satisfies a specified condition. The values of variables, the program location, and some elements of the execution history (such as values in the procedure call chain, statement execution counters, or data use counters) may be legal components of this condition.

An assertion-driven breakpoint causes the debugger to give control to the user whenever the execution state of the program satisfies a specified condition. It differs from a conditional breakpoint in that its condition is evaluated at every state, while a conditional breakpoint's condition is evaluated only in the states specified by the program counter or variable. Note that if an assertion-driven breakpoint's condition can refer to the value of the program counter or the access of a variable, then all other types of breakpoints are special cases of assertion-driven breakpoints.

Any of these breakpoints can be extended by allowing the user to associate a debugging procedure with the breakpoint, to examine and display elements of the execution state. These breakpoints are called programmable breakpoints. Ideally, any command that could be entered interactively to the debugger could also be specified in a debugging procedure. Thus a programmer could construct a special trace routine, for instance. A sketch of these breakpoint varieties appears below.

Finally, the debugger could record some or all of the data and location changes in a computation for later access by various history commands. For example, a historic display of the execution path of a program is called a retrospective trace [Johnson82a]. This information can be recorded at varying granularities. Possible granularities for retrospective traces include: a list of recently-executed statements, a list of recently-executed branches, and a list of recently-executed procedure calls. For a typical debugger, the primary form of retrospective trace is the procedure traceback, which lists the active procedure calls. Note that the recently-executed procedure calls and the active procedure calls are incommensurate: some recent calls may have completed, while some active calls may not be at all recent. It is rarer for a debugger to record old values of variables.
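The promised sketch of the breakpoint varieties follows, in Python. It is illustrative only: the state representation and every name in it are assumptions of mine, and a real debugger would index location breakpoints by address rather than scanning a list of guards:

# Each breakpoint pairs a guard, evaluated against the current execution
# state, with a debugging procedure to run when the guard holds.
breakpoints = []

def add_breakpoint(guard, action):
    breakpoints.append((guard, action))

def check_breakpoints(state):
    """Called by the debugger at each state it gets to observe."""
    for guard, action in breakpoints:
        if guard(state):
            action(state)

# A location breakpoint guards only on the program counter:
add_breakpoint(lambda s: s["PC"] == 17,
               lambda s: print("stopped before statement 17"))

# A conditional breakpoint adds a condition on the rest of the state:
add_breakpoint(lambda s: s["PC"] == 17 and s["V"]["x"] > 0,
               lambda s: print("stopped: x positive at statement 17"))

# An assertion-driven breakpoint omits the location test; its guard must
# therefore be evaluated at every state, which is far more expensive:
add_breakpoint(lambda s: s["V"].get("x") == 99,
               lambda s: print("assertion satisfied"))

check_breakpoints({"PC": 17, "V": {"x": 99}})   # fires all three examples

A programmable breakpoint corresponds to supplying an arbitrary user-written procedure as the action argument.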
It is important to note two basic facets of debugger use here. Some debugging activity is preplanned. That is, the user specifies interest in a given event before it occurs. For example, breakpoints and traces represent preplanned debugging activity. In contrast, other debugging activity is unplanned. Program exceptions, user interrupts, and procedure tracebacks fall in this category. In many debugger implementations, the debugger can give more information about preplanned events because it knows ahead of time to collect relevant information. Similarly, preplanned debugging activity necessarily occurs at the debugger's implementation granularity, because the user cannot specify otherwise. In contrast, unplanned debugging events may frequently occur at program locations that are below the level of the source-level model. The debugger must report these locations intelligibly.

2.2.2 Sample uses of some monitoring features

To construct a program that embodies some function, the programmer constructs a series of steps that are intended to transform the input into the output, little by little. In the popular top-down programming methodology, the programmer decomposes the desired function into a series of large steps, then decomposes each of these steps into a series of smaller steps, and so on, until each step is a legal statement in the chosen programming language. As a result, the programmer is familiar with the program's steps and its data structures as they appear in the program text. It is a debugger's responsibility to communicate with the programmer in precisely these terms.

For example, the programmer frequently associates assertions with specific program sections, such as "the value of y is non-negative here", or "the value of x is monotonically increasing in successive iterations of this loop". These assertions may be explicitly stated in the program (as special assertion statements [Sites72] or simply as tests that print a warning message) or they may be implicit, appearing only in comments or in the programmer's mind. A debugger must report the current execution location and the values of variables correctly to help the programmer verify implicit assertions of this kind during the program checkout process.

A programmer formulates questions about the computational behavior of a program and then translates the questions into the available debugging commands. The following examples show specific questions that a programmer might want answered, as opposed to general questions (such as "Why does the value of x differ from the value that I expected at this program point?"). A sophisticated debugger might be able to answer these general questions directly.

Questions corresponding to location breakpoints:

1. What is the value of variable x before a particular assignment to x? To answer this question, a programmer usually places a location breakpoint on the assigning statement.

2. What are the values of variables used to compute the new value of x? Again, the location breakpoint is usually placed on the assigning statement.

3. What is the value of variable x after a particular assignment to x? In this case, a programmer usually places a location breakpoint on the statement following the assigning statement, because most current debuggers do not allow a user to specify that a breakpoint is to occur after the execution of a statement.

4. What are the values of some variables (other than variables assigned or used in this statement) at this point in the program?

5. How many times does execution reach this point in the program?
6. By what control path did execution reach this point in the program? The response to this question depends on the granularity of the debugger's location-oriented history commands. The procedure traceback represents the canonical granularity.

Questions corresponding to data breakpoints:

1. Where in this computation is the value of variable x altered?

2. How many times is the value of variable x altered?

3. What are the values of some other variables when the value of variable x is altered?

4. Where is the value of variable x used?

5. How many times is the value of variable x used?

Questions corresponding to assertion-driven breakpoints:

1. Where in this computation does the program's execution state satisfy the given condition?

2. How many times does the program's execution state satisfy the given condition (when the previous state did not)?

A question corresponding to tracing or single-stepping:

1. What control path does the program follow for the given input? This question can also be answered with location breakpoints, but that method is more work for the user.

2.2.3 Modifying a computation

Many debuggers permit the user to modify as well as monitor a computation. This section briefly describes the range of modification functions and their uses. These functions are discussed in less detail than the monitoring functions because they are somewhat less important, both for debugging and for this dissertation. Although Chapters 3 and 4 consider the effects of optimization on some modification functions, the implementation portion of this dissertation concentrates on the basic monitoring functions.

A debugger that can modify a computation can help a user with debugging, development, and testing activities. The ability to assign a value to a program variable or return to a point in the procedure call chain with a return value can accelerate debugging, allowing a programmer to get a program's execution back to its intended state so that further programming errors can be discovered during the rest of that computation. A debugger can be used for interactive program development if it allows a user to assign values to variables, call procedures, or create new variables, statements, or procedures whenever the execution of the program is suspended. A debugger that allows procedure calls and the assignment of values to variables aids the program testing process by allowing easy checkout of individual program components.

The basic modification commands are:

1. One-time data change. Assign a value to a visible variable.

2. One-time code change. Execute some statement in the programming language, including a branch, an arbitrary procedure call, or a return.

3. Alter the data. Declare new variables, either for the current environment or for some other environment.

4. Alter the program. Insert new statements in the program, either at the current program counter or at some other point.

5. Alter the computation. Restore the state of the computation to some previous state, sometimes called reverse execution.

When a debugger allows computation modification, its programmable breakpoints can be more powerful. A programmable breakpoint that modifies the execution state and then resumes program execution without user intervention acts as a runtime program patch, and is sometimes called advising a procedure, program point, or variable.
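Reusing the hypothetical add_breakpoint of the sketch in Section 2.2.1.2, such a patch-style programmable breakpoint might look like the following fragment; the statement numbers and the repair itself are invented for illustration:

def patch(state):
    """A programmable breakpoint acting as a runtime patch: repair the state
    and resume execution without user intervention."""
    state["V"]["x"] = 0     # one-time data change: assign a visible variable
    state["PC"] = 21        # one-time code change: branch around the bad code

add_breakpoint(lambda s: s["PC"] == 20, patch)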
2.3 Conventional compiler and debugger implementations

To support interactive source-level debugging in a compilation environment, a conventional nonoptimizing compiler supplies the debugger with mappings between source statements and object code locations, and among variable identifiers, types, and data storage locations. Assuming the existence of these two abstract mappings, Section 2.3.1 briefly describes how the debugger implements the basic monitoring functions. Section 2.3.2 is more concrete: it discusses several ways that a compiler might supply the statement and variable mappings and describes how each form of mapping is used. Readers will find it instructive to compare the conventional implementation described here with the version for optimized programs presented in Chapters 5 and 6. This section does not consider machine architecture requirements to support interactive source-level debugging; Johnson presents a good treatment of that issue [Johnson82b].

2.3.1 Abstract implementation of basic monitoring functions

The basic debugging operations are to set a (location) breakpoint, to determine the current execution point, to display the value of a variable, and to display the procedure traceback.

To set a breakpoint, the debugger uses the mapping from source statements to object code locations to determine the object location corresponding to a given source statement. It places a breakpoint there (i.e., it ensures that the program execution will stop there so that the debugger and possibly the user can gain control of the computation). It also records the existence of the breakpoint at the source statement in an internal database, so that the user can list or remove the breakpoint later.

To determine the current execution point, the debugger uses the mapping from object code locations to source statements, applying it to the current program counter. This mapping usually has multiple levels, such as a mapping from absolute object locations to module names, a mapping from module-relative object locations to procedure names, and a mapping from procedure-relative object locations to source statements. [Note: these levels correspond roughly to loading, linking, and compiling, respectively.] The debugger can then report the source statement, procedure name, and module name to the user. Some debuggers use this function to field a breakpoint: the debugger determines the current execution point and searches its breakpoint database to see whether the user has requested a breakpoint at that source statement. Because unplanned debugger activity may occur at an object code location representing the middle of the execution of some program portion, the debugger may need to record more information to permit mapping object locations to source statements than it needs to permit mapping in the opposite direction.

To display the value of a variable, the debugger must know what source statement defines the variable environment to be used. By default, the debugger sets the environment from the current program counter, determining the current execution point as described above. Given a source statement, the debugger uses the mapping from identifiers to user-defined variables to get the type information for the variable and find its storage location. For a non-local variable, this operation involves accessing the procedure call chain in exactly the same way that a compiled variable reference might. The debugger then prints the contents of the storage location formatted according to the type.
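These first three operations can be sketched concretely. The following Python fragment uses toy single-level tables, and every name in it is invented for illustration; a real debugger would use the multi-level mappings just described (the traceback, discussed next, layers on top of the same machinery):

import bisect

TRAP = 0xCC    # trap opcode on a hypothetical byte-coded machine

stmt_to_obj = {1: 0x00, 2: 0x08, 3: 0x14, 4: 0x20}   # statement -> object offset
obj_to_stmt = sorted((off, s) for s, off in stmt_to_obj.items())
symbols = {"x": ("INTEGER", 0), "y": ("REAL", 1)}    # name -> (type, frame offset)
breakpoint_db = {}                                   # statement -> saved instruction

def set_breakpoint(stmt, code):
    """Source statement -> object location; plant a trap and record it."""
    loc = stmt_to_obj[stmt]
    breakpoint_db[stmt] = code[loc]   # saved so it can be listed, removed, or re-executed
    code[loc] = TRAP

def current_statement(pc):
    """Object location -> nearest preceding statement start (interpolation)."""
    i = bisect.bisect_right(obj_to_stmt, (pc, float("inf"))) - 1
    return obj_to_stmt[i][1]

def display(name, frame_base, memory):
    """Find the variable's type and storage location, then format its value."""
    vtype, offset = symbols[name]
    print(f"{name}: {vtype} = {memory[frame_base + offset]}")

code = bytearray(64)
set_breakpoint(2, code)
print(current_statement(0x0A))   # -> 2: the pc lies within statement 2's code
display("x", 0, [7, 3.5])        # -> x: INTEGER = 7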
To display the procedure traceback, the debugger accesses the current procedure call chain, which typically records object code return addresses. For each object code location in this chain, the debugger reports the corresponding source information, as described in the paragraph on determining the current execution point. The debugger can also report the values of local variables or parameters by using the source information to determine the correct environment.

2.3.2 Concrete versions of mappings and corresponding debugger use

The compiler is responsible for constructing mappings that allow a debugger to communicate with the programmer about a computation in source language terms. These mappings include information about the user-defined variables, the procedure history, and program locations. Some of this information is needed whether or not a debugger is present. The compiler needs the variable information to generate code correctly; most language support systems do not need this information during execution. The language support system needs the procedure call history information during execution to manage the program's control flow and to bind variables properly. Use of this information is compiled into the program's object code. Location information is needed by the debugger only.

Debugger implementations can be categorized by the amount of extra object code they require specifically for debugging support, in addition to the object code needed to compute the statements specified in the source program. Invasive debugger implementations include explicit calls to the debugger in the object code generated for the program. Semi-invasive debuggers require extra object code, but do not explicitly call the debugger. As a result, for invasive and semi-invasive debugger implementations, the user must specify a desire to debug a program before compiling it. To support a non-invasive debugger, a compiler places all debugging information in auxiliary tables, keeping the object code as small and as fast as when not debugging. The auxiliary tables can be kept in secondary storage until a debugging request is issued. A compiler supporting a non-invasive debugger can make the generation of the auxiliary debugging tables optional, thereby decreasing compilation time and space when no debugging is desired, or it can create them during every compilation. If the compiler always generates the auxiliary tables, no special compilation directives are needed to allow later debugging activity.

This dissertation emphasizes non-invasive debuggers because they are more in keeping with the philosophy of optimization: the extra object code generated for invasive debuggers can consume a significant amount of execution time and space, even when no specific debugging requests (such as a breakpoint request) are active. Although advanced debugging capabilities, such as single-stepping, data breakpoints, or assertion-driven breakpoints, are easier to implement with invasive debuggers, they can also be implemented with non-invasive debuggers.

2.3.2.1 Variable mapping

For a debugger to refer to data items in source terms, the debugger must have precisely the same information that the compiler used to generate variable accesses and assignments during compilation, and it must have this information about every user-defined variable in the entire program. Four pieces of information must be available at runtime about each variable. The first is the identifier associated with the variable.
The second is the scope of the variable, so that the debugger can access the proper binding of the identifier when the user requests its value. The third is the type of the variable. The debugger uses the type to print variable values in an appropriate format and to check the legality of expressions that the user presents to the debugger's interpreter. Some debuggers also display type information upon request; this is particularly useful for structured types or procedure types. The fourth is the runtime storage location of the variable (for accessing simple static variables) or a storage template (for accessing fields of structured variables). The debugger uses the procedure call chain to implement the binding rules for non-local variables, finding the most recently activated frame of the defining procedure, which holds the current value of the variable.

This information is collected during the parsing and storage allocation phases of compilation, and is typically stored in a symbol table. For a conventional debugger, the most complex part of this information is the scope of a variable. For correct compilation in a conventional one-pass compiler, only the currently accessible variables need be present in the symbol table at any given time. The scope information can be captured either in a nesting level number for each variable or, implicitly, in a linkage structure that links together all variables at a given nesting level. Variables are removed from the symbol table when that scope is closed (by exiting a block or procedure). To handle the redefinition of an identifier in an inner scope, all definitions of the same identifier are typically linked together so that the innermost (i.e., the current) definition is found first in a symbol table lookup.

To provide a symbol table for a debugger to use during the program's execution, this simple symbol table management must be augmented. One possible data structure gives each variable a scope number in the symbol table. Note that a scope number is not a nesting level number, because many different scopes can have the same nesting level. A separate table describes the tree of scope numbers. At runtime, the debugger must be able to determine the scope corresponding to any given program location. The scope tree can include ranges of source statements during which each scope is the innermost. Another possible data structure for the scope nesting and statement information is a list of scopes with ranges of applicable source statements. In this structure, the scope nesting is implicit in the nesting of the statement ranges. The debugger must search scopes from inner to outer to find the proper definition of multiply-defined identifiers; a small sketch of this scheme appears at the end of this subsection.

Symbol tables generated by conventional compilers generally equate source statement ranges and object location ranges, since these two representations are in lock-step for straightforward compilation. These symbol tables frequently represent scope information directly, as object code addresses, so that the debugger can omit the extra translation from object code location to source statement when accessing information about variables. Furthermore, these compilers assume that each variable has a single unique storage location that is valid for the entire scope. As we shall see, neither of these assumptions holds for optimized programs.
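The promised sketch of the scope-number scheme follows, in Python. The three-scope example and all names are invented for illustration: scope 1 is a procedure body, and scopes 2 and 3 are two disjoint inner blocks, so both have the same nesting level but different scope numbers:

scope_parent = {1: None, 2: 1, 3: 1}                 # the tree of scope numbers
scope_stmts = {1: range(1, 100), 2: range(10, 20), 3: range(40, 60)}
symtab = [("x", 1, "INTEGER"), ("x", 2, "REAL"), ("y", 3, "BOOLEAN")]

def ancestry(scope):
    """A scope followed by its enclosing scopes, innermost first."""
    while scope is not None:
        yield scope
        scope = scope_parent[scope]

def innermost_scope(stmt):
    """The innermost scope whose statement range contains stmt."""
    containing = [s for s, stmts in scope_stmts.items() if stmt in stmts]
    return max(containing, key=lambda s: len(list(ancestry(s))))

def lookup(name, stmt):
    """Search scopes from inner to outer to resolve multiply-defined names."""
    for scope in ancestry(innermost_scope(stmt)):
        for n, s, vtype in symtab:
            if (n, s) == (name, scope):
                return vtype
    raise KeyError(name)

print(lookup("x", 15))   # -> REAL: the inner definition hides the outer x
print(lookup("x", 45))   # -> INTEGER: scope 3 declares no x, so the search moves outward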
2.3.2.2 Statement mapping

A debugger uses a mapping from source statements to corresponding object code locations to set location breakpoints. A mapping from object code locations to corresponding source statements is used to report the source location when a computation is suspended for any reason. Indirectly, this second mapping is also used to find the correct variable environment.

In an invasive debugger, the generated code for each source statement begins with a call to the debugger, providing the number or some other identification of the source statement as an argument. Thus the mapping between source and object locations is embedded in the object code itself. Since these debuggers are activated at each statement boundary, they can implement all types of breakpoints or single-stepping easily. The debugger keeps lists of events of which the user wishes to be notified. Each time the debugger is called, it checks to see whether any of these events have occurred. For example, it can determine whether a given statement is about to be executed, whether the value of a set of variables has changed since the last statement, or whether a certain condition now holds. Implementing data access breakpoints is a little more difficult: the compiler generates all data accesses indirectly, as calls to a data-accessing procedure in the debugger to which the desired variable is a parameter. To single-step, the debugger simply gives control to the user each time it is called. In case a program exception occurs during the execution of the object code associated with a statement, the debugger also stores the current statement number internally. Although invasive debuggers are simple and powerful, they tend to be very slow.

Semi-invasive debuggers exist primarily to overcome difficulties in the machine architecture with respect to debugging. If the machine cannot determine an exception location after an exception handler has been invoked, the object code corresponding to a given source statement begins with instructions to store the statement number in a known location. If the machine's trap instruction requires more than the minimum number of instruction bytes, or if it would be difficult to execute the original instruction in context when resuming the program's execution, the object code corresponding to a given source statement begins with a null instruction.

In a non-invasive debugger implementation, the compiler creates a separate table that shows the relative program offset of the start of the generated object code for each source statement. This table is called a statement map. The information for a statement map is typically collected in three major steps. The first step takes place while the compiler is scanning and parsing the source text: the compiler records a statement (or line) number with the subtree for each statement in the resulting abstract syntax tree. The second step occurs while the compiler is generating the abstract code stream (by walking the tree): the compiler associates a statement number with the first instruction generated for each statement. The third step takes place while the compiler is writing the completed object code (after final object instruction selection and address assignment) to the output file: for each source reference in the object code, the compiler records the current relative program offset and the statement number in the statement map. No source information appears in the executable object code output by the compiler. [The procedure described here would obviously be simplified for a one-pass compiler.]
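The second and third steps might look like the following Python sketch of a hypothetical compiler back end; the data layout and all names are mine, not those of any particular compiler:

abstract_code = []   # list of (instruction bytes, statement number or None)

def gen(instr, stmt=None):
    """Step 2: tag the first instruction generated for each statement."""
    abstract_code.append((instr, stmt))

def write_object_code():
    """Step 3: record (statement, offset) pairs as the final code is written."""
    statement_map, offset = [], 0
    for instr, stmt in abstract_code:
        if stmt is not None:
            statement_map.append((stmt, offset))
        offset += len(instr)          # instructions here are just byte strings
    return statement_map

gen(b"\x55", stmt=1)                  # first instruction of statement 1
gen(b"\x89\xe5")                      # more code for statement 1
gen(b"\xc3", stmt=2)                  # first instruction of statement 2
print(write_object_code())            # -> [(1, 0), (2, 3)]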
A non-invasive debugger uses a statement map in a straightforward manner. To set a breakpoint at a given source location, the debugger searches the map, finds the source location, and places the breakpoint at the corresponding object location, usually by replacing the existing instruction by a trap instruction. [The machine architecture can permit other methods: for instance, some machines have a special trap bit in each instruction or each word. The latter allows for efficient data breakpoints.] If the debugger replaces an object location with a trap instruction, the debugger must save the original instruction and execute it when the user resumes execution from the breakpoint. To report the current execution point when a breakpoint or exception occurs, the debugger finds the nearest preceding object code location in the map and reports that the corresponding source statement is executing. Optimization may compromise this interpolation step.

In an alternative non-invasive implementation, the debugger interprets each object code instruction. By using the auxiliary tables provided by the compiler, such a debugger can perform all of the functions of an invasive debugger without requiring special compiled code. For instance, these debuggers simply enter a breakpoint in a breakpoint list rather than enabling a hardware trap. When debugging requests are active, this kind of implementation is even slower than an invasive debugger. Because it is preferable that object code run at close to full speed even when debugging requests are active, non-invasive interpreting debuggers are ordinarily less desirable than the non-invasive trap-oriented kind.

Conventional compilers generate statement maps in which source and object entries both increase monotonically. This fact is a result of simple code generation schemes that generate code for each statement 1) without reference to other program statements and 2) in the order that the statement appears in the source text. Conventional debuggers use this fact either to speed up searches through the statement map, by using binary search, or to make the statement map smaller, by encoding the map as (source increment, object increment) pairs. Statement maps for optimized programs typically do not have this property.

Figure 2-1 shows a small program, a high-level view of the object code generated by a conventional nonoptimizing compiler, and a conventional statement map and symbol table.

Figure 2-1. Unoptimized code generation with monotonically increasing statement map and simple symbol table.

2.4 Properties for effective debugging of optimized programs

This section delineates the properties of a programming environment that are essential for effective debugging of optimized programs, the properties that are desirable, and the properties that are relatively unimportant in this context.

2.4.1 Essential properties

The primary objective of this research is to provide effective debugging tools for optimized programs. An essential property of any effective debugging tool is that it must be intelligible to a relatively naive user. A tool that can only be understood by compiler and optimizer experts is not sufficient. A user will find it easiest to debug an optimized program if the debugger always responds exactly as it would for an unoptimized version of the same program. I call this property expected behavior for a given optimization.
A debugger with expected behavior must present a restructured view of the actual state of the optimized program, a view that conforms to the source-level computational model. It hides the changes caused by the optimizations from the user, much as a source-level debugger hides the details of a lower-level implementation of an executing program. A less desirable but still acceptable alternative is to provide truthful behavior for an optimization. This means that the debugger avoids misleading the user: either it displays (in source program terms) how optimizations have changed the program portion under consideration, or it admits that it cannot give a correct response. Thus a debugger with truthful behavior acknowledges that optimizations have altered the program and its computational behavior. The user must participate in the restructuring of the program state. A debugger that has neither expected nor truthful behavior is likely to give incorrect and/or confusing responses to queries about an optimized program.

An example will help to characterize the distinction between expected and truthful behavior. Figure 2-2 is a repetition of Figure 1-1, in which the code motion optimization moves the assignment x _ 1 at statement 18 outside the loop, thereby reducing the program's execution time. The dead store elimination optimization then removes the unnecessary assignment x _ 0 at statement 15.

source program fragment          after code motion & dead store elimination

15  x _ 0;                       x _ 1;
16  FOR i IN [1..3] DO           FOR i IN [1..3] DO
17    Read[a[i]];                  Read[a[i]];
18    x _ 1;
19  ENDLOOP;                     ENDLOOP;

Figure 2-2. [Repeat of Figure 1-1.] Effects of code motion.

As noted in Chapter 1, if the optimized program is stopped at statement 17 on the first time through the loop, a conventional debugger would report that the value of x is 1. According to the unoptimized program, x's value at that point would be 0. Therefore, a debugger with expected behavior for code motion would report that x's value is 0, even though x's storage location contains 1. Note that at subsequent arrivals at statement 17, the value of x is the same in the optimized and unoptimized program versions. To achieve expected behavior, the debugger must realize that the value of x is incorrect with respect to the source program on the first time through the loop; it must also be able to discover what the value of x should be.

Truthful behavior is not as well-defined as expected behavior. Its primary goal is a negative one: to avoid hindering the user by reporting source-level information that may not be true; it can achieve this goal in a wide variety of ways, ranging from the extremely taciturn to the extremely voluble. In order of increasing informativeness, truthful responses range from a bare warning that the program has been optimized, through an admission that the requested value may not be correct in source-level terms, to a description (in source terms) of how the optimizations have transformed the relevant program portion.

An argument stressing the importance of expected behavior, although in a setting of optimizations at the machine architecture level rather than at the program transformation level, appears in Johnson's paper on architectural requirements for source-level debugging [Johnson82b]:

"Good debugging requires that the `semantic gap' between the source and object languages be as small as possible, since the user's mental picture of how a program executes is based on the structure of the source program, not the structure of the object program as it actually executes. If the underlying hardware changes the user's high-level view substantially (for example, by instruction pipelining or by other optimization features), the onus of closing the semantic gap falls on the debugger."
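Returning to the example of Figure 2-2, the three behaviors can be summarized in a toy Python sketch. It is illustrative only, not Navigator's method: it hardwires knowledge of this single transformation, whereas the point of the later chapters is to derive such knowledge systematically:

# What each kind of debugger might print for "display x" when the optimized
# program of Figure 2-2 is stopped at statement 17 with loop index i:
STORAGE_X = 1          # code motion has already stored 1 into x's location

def conventional(i):   # neither expected nor truthful: just reads storage
    return f"x = {STORAGE_X}"

def truthful(i):       # warns that the reported value may not match the source
    return (f"x = {STORAGE_X}, but the assignment x _ 1 was moved out of the "
            "loop; this value may be premature in source-level terms")

def expected(i):       # reconstructs the source-level value
    return "x = 0" if i == 1 else f"x = {STORAGE_X}"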
A practical implementation of a system for providing expected debugging of optimized programs must have two additional properties. First, an optimized program that is capable of being debugged should not be larger or slower than an unoptimized version of the same program (although larger but faster might be tolerable on a large machine, and slower but smaller might be tolerable on a small machine). Ideally, adding debugging capabilities for optimized programs should not cost any extra execution time or space unless the debugger is actively responding to a user request. Second, the modified compiler and debugger should still be reasonably efficient in their use of time and space.

2.4.2 Desirable properties

To be able to exploit the context of a program exception, as discussed in Chapter 1, the debugger must be available at any time during any execution of a program. Therefore, if a program exception is signalled, or if the user simply decides to monitor the progress of the computation, the debugger can display information about any piece of the executing program. This philosophy is particularly important in an experimental programming environment (such as the Cedar programming environment at Xerox PARC), in which a user typically calls upon several packages developed by other programmers. Such packages fail frequently enough to warrant instant availability of the debugger to explore the program state, but infrequently enough that implementors would not enable expensive prearranged debugging features. Using the debugger to monitor a program's use of such a package can also be a valuable tool for learning about the package.

2.4.3 Unimportant properties

The primary concern of this research is the achievement of certain levels of debugger functionality in an optimized setting. Several features of the debugging environment are relatively unimportant to this work; they may affect the implementation of a debugger for optimized programs, but they do not affect the basic concepts discussed here. One such feature is the debugger's user interface (for example, whether the user selects a statement in the source text with a mouse or by typing in a statement number).

2.5 Historical connections

This section connects the present work to other debugging research by presenting a brief history of debuggers and discussing recent research directions in debugging.

2.5.1 Brief history of debuggers

Programmers have long realized the benefits of debugging a program interactively using the language in which it was written. However, partially because of excessive optimism about error rates, debuggers have tended to trail behind advances in other parts of programming environments.

Interactive source-level debuggers for machine languages. The first programmers debugged their programs at the console of the computer. They watched console lights for a binary representation of the current program counter and register contents, and they set console switches to examine memory locations or to alter register or memory contents. They could start, stop, or single-step the machine directly.

Non-interactive low-level debuggers for assembly languages. After a time, programmers were banned from the machine room. By this time, programs were commonly written in an assembly language. If a program halted with an error, the programmer was given an octal dump of registers and memory to pore over in solitude. (In fact, this debugging method has survived until today on some machines, even for debugging high-level languages like Fortran or C [Thompson+75].)

Interactive source-level debuggers for assembly languages.
With the advent of time-sharing machines, users could again debug their programs (written in assembly language) interactively. Two important debugging features were introduced with the FLIT debugger at MIT in 1959: the user-controlled breakpoint, and symbolic variable and procedure names [Evans+66]. Contents of registers and memory could be printed in a variety of formats: instruction format, characters, and octal and decimal numeric representations; after selecting the proper format, the programmer could examine program instructions or data in the same form as they appeared in the source program. Users could also modify values of variables and patch the program. To maintain the correspondence between the source and object versions of the program, advanced debuggers of this time copied patches created during the debugging process into the source representation of the program [Lampson65].

Non-interactive source-level or interactive low-level debugging for high-level languages.
With the development of high-level languages, there were more alternatives. The easiest method was to insert statements to print values of variables or indications of control flow. This method required no knowledge of debugger commands (indeed, it works even when there is no debugger), but it was not interactive. Another non-interactive method was exemplified by the Algol W debugger. When a runtime error was encountered, this debugger showed the program statement in which the error occurred and provided a symbolic memory dump: variables were identified by name, and their values were formatted according to their declared types. The Algol W debugger also allowed contextual tracing of source statement executions, an idea based on proof by induction [Satterthwaite75]. [Note: Satterthwaite's dissertation is well worth reading. It gives an extraordinarily careful explanation of his implementation and its theoretical and historical underpinnings.] Users who insisted on interactive debugging capabilities were forced to use a machine-language debugger. These adventurous users had to understand compilation techniques to locate statement boundaries or to access array elements or fields within records.

Interactive source-level debuggers for high-level languages.
As high-level languages matured, interactive source-level debuggers for them became available. By suppressing extraneous details and by interpreting low-level information from a high-level viewpoint, these debuggers translated into high-level language terms the useful techniques that were available in earlier assembly-language debuggers. These techniques include breakpoints, displays of variable values, single-stepping, and traces. There are now many examples of interactive source-level debuggers for high-level languages, such as BAIL (a debugger for the SAIL language) [Reiser75] and pdx (a debugger for Pascal) [Linton81].

2.5.2 Recent debugging research

Much of the debugging research of the past fifteen years has addressed the following fundamental debugging problem: an error may be detected much later in the program's execution than the time at which the program began to diverge from its intended behavior. This research centered on providing sophisticated monitoring capabilities and reverse execution. Several systems were based on the idea of collecting a total history of a single computation; in such a system, the debugger is a post-processor that manipulates the completed "history tape".
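Before surveying these systems, a minimal sketch may make the history-tape idea concrete. The record format and the two query functions below are illustrative assumptions, not the interface of any system described in this section.

    # A minimal history-tape debugger, assuming the running program appended
    # one record per statement execution: (time, statement, variable, value),
    # where variable/value describe the assignment that the statement
    # performed.  The format is purely illustrative.
    history = [
        (0, 15, "x", 0),
        (1, 16, "i", 1),
        (2, 17, "a[1]", 42),   # input read into a[1]
        (3, 18, "x", 1),
    ]

    def values_of(variable):
        """Post-processing query: the succession of values a variable took on."""
        return [value for _, _, var, value in history if var == variable]

    def flowback(variable, time):
        """Post-processing query: which statement gave the variable the
        value it held at the given time?"""
        for t, stmt, var, _ in reversed(history):
            if t <= time and var == variable:
                return stmt
        return None            # the variable had not yet been assigned

    print(values_of("x"))      # prints [0, 1]
    print(flowback("x", 2))    # prints 15: statement 15 supplied x's value at time 2

Each record here corresponds to one statement execution, matching the kind of information the systems below recorded; every query runs over the completed tape rather than over a live computation.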
These systems freed users from examining a computation closely just in case an error might occur. However, the approach does have a drawback: a post-processing debugger is necessarily read-only; it cannot modify a computation.

Balzer's EXDAMS system worked by modifying source programs to collect history information [Balzer69]. Executing the program created a history tape, which recorded statement executions and changes to variable values (similar to our computational model). The debugger answered user queries by examining the completed history tape. Because it dealt only with the history tape, the EXDAMS debugger was language-independent and extendable. The history tape could be played backward and forward at will. As a result, the debugger could perform complex analyses that were impossible for a conventional debugger, such as flowback analysis to determine what statement gave a variable its value. It could also present the succession of values of a given variable.

Pattis proposed a more ambitious history-based system [Pattis81]. It included a time-oriented language for constructing complex queries about a completed computation, such as: "Is ClearLine ever called twice in a row without an intervening call to FillLine?" It also proposed an efficient way to gather and manage the large amounts of history data, including background processing during user interactions and lazy evaluation.

The HELPER system could list the last several (50) assignment statements executed [Kulsrud69]. The AIDS system allowed retreating some number of instructions in a computation; a common use of this system was to specify that whenever the program terminated abnormally, the debugger should retreat and then trace forward to the error point [Grishman70]. Zelkowitz extended these ideas, providing a non-interactive system for reversible execution in Cornell's PL/C environment [Zelkowitz71]. The system stored the last 200 source-level steps of the computation to allow for reverse execution. The programming language was extended to include a RETRACE statement, which could reverse the program's execution (1) to a specified label, (2) for a certain number of steps, or (3) until some condition held. The RETRACE statement also specified a new statement to execute at that point; the statement could try to get the computation back on track, or it could enable tracing or other debugging actions. The computation would then continue forward from that point. In case the error occurred again in the re-execution of the computation, the RETRACE statement could include several reversal conditions and new statements; each would be tried in turn.

Davis implemented a knowledge-based debugger for the PLATO system at the University of Illinois [Davis75]. Targeted at introductory computer science students, the system executed programs interpretively. When a program exception was encountered, the system used reversible execution and a database of common programming misconceptions to diagnose the source of the error and suggest a correction. Flowback analysis was also a major part of this system.

Several researchers built systems to display the values of variables graphically. Earlier display-based systems, such as the RAID system at Stanford [Petit75], had been able to monitor the values of a set of variables continuously without changing their locations on the display screen. Yarwood's non-interactive system drew pictures of arrays as sequences of values in a box.
Variables used as array indices were shown as flags marking their current position in the array [Yarwood77]. The Incense system, implemented by Myers at Xerox PARC [Myers80], could display complex structures built up of records with embedded pointers. The user could also program Incense to display the values of certain variable types analogically, such as by a pie chart or bar chart. For instance, a variable of type Time could be displayed as the face of a clock.

Other recent debugging research has attempted to provide debugging capabilities in new environments, such as programming environments for robot arms and multiprocessor environments. These projects have generally focused on providing conventional debugging capabilities in those new settings.

Mitchell L. Model addressed the problem of providing interactive source-level debugging for sophisticated artificial intelligence languages [Model79]. As an example of the new problems these languages raise, displaying the changing locus of control is difficult because control is decentralized. The problem faced by Model is quite similar to the one addressed in this dissertation: how to present a high-level view of a program's execution that is not a simple mapping from its actual implementation. The first half of his dissertation, in which he strongly motivates the need for an expected high-level view, is both interesting and enjoyable to read.

Goldman implemented a debugging environment for programming robot arms [Goldman82]. It included special-purpose data displays, such as a graphical presentation of the forces experienced by a given joint during a motion.

Researchers have recently begun to tackle the problems involved in debugging multiple processes and distributed systems, such as nondeterministic choices that depend upon interprocess communication and the inability to discover the relative execution order of statements in separate processors [Baiardi+83, Bates+83, Smith83]. Several proposed solutions involve recording interprocess events.

Another area of recent research involves unified program development environments that have no separable "debugger" [Archer81, Feiler82]. Instead, debugging commands are added to the source language. These environments typically support incremental program construction with the ability to modify a program and resume its execution.

A few researchers have considered the problem of debugging optimized programs. Their work is surveyed in Section 3.4.

2.6 Summary

This chapter has discussed debugging features and their implementation. It has also defined a source-level model of the execution of a program and has tied the expected behavior of a source-level debugger to that model. The next chapter presents a broad range of examples of ways in which common optimizations can cause a debugger to respond in a manner that is neither expected nor even truthful.