A program is a static entity, while its execution is a dynamic one. The semantics of a programming language defines a source-level model of computation for programs written in that language. In his proposal for adopting a functional style of programming, Backus characterized conventional programming languages quite succinctly [Backus78]:
"Von Neumann programming languages use variables to imitate the computer's storage cells; control statements elaborate its jump and test instructions; and assignment statements imitate its fetching, storing, and arithmetic. ...[A von Neumann program] is dynamic and repetitive. One must mentally execute it to understand it. ...Its statements operate on an invisible `state' according to complex rules."
The source-level model of computation is a combination of that `state' and its complex rules.
2.1.1 The source-level model
One way to define the behavior of a program is via a set of input/output pairs: the program transforms a given input into its corresponding output. This definition is adequate for a program's user, who wishes only to see the program's result. However, a view of the computation's internal state is essential for a program's implementor or maintainer, who must locate the problem when the program fails to produce the desired result. On the other hand, this model is not intended to capture the particulars of a machine-level view of a program's computational behavior. A machine-level model would include many details about an individual implementation, such as the machine's instruction set, that do not appear in and are not specified by a high-level language.
The purpose of the model is to define the view of execution that should be presented to a user by a source-level debugger. The debugger typically constructs its source-level view of a computation from a lower-level representation that is hidden from the programmer. The model developed in this section will be used in Chapter 3 to determine what the debugger's response to a given command should be at a given point in a computation.
This model was produced by combining Horning and Randell's view of sequential processes [Horning73] and the standard semantics of Algol-like languages. Although the resulting model is neither as general nor as precise as Horning and Randell's, it is well-suited for describing the expected behavior of a source-level debugger. A more formal treatment of much of the material in this section, though with a somewhat different emphasis, appears in Chapters 3 and 4 of Satterthwaite's dissertation [Satterthwaite75].
Programmers often envision a program's execution behavior in terms of a processor interpreting the source statements of the program. This processor has a set of state variables, which hold values of user-defined variables and auxiliary variables (used by the processor to execute the program). The program's execution can be described in terms of a sequence of values of these state variables. This sequence is called a computation.
An action specifies the transition from one state to the next: it assigns new values to some state variables; the remaining state variables keep their old values. The final state of a finite computation is a state for which no action is defined. A computation's action sequence is the sequence of actions performed to generate the computation. In general, a given action sequence does not completely specify a computation, because it could be applied to several initial states.
A
program is a compact description of these state transitions and their sequencing. A program is executed by a
processor. The state variables of a processor include:
" the program statements,
" the user-defined variables, and
" a set of auxiliary variables, used by the processor for the bookkeeping necessary to execute the program. These auxiliaries, which include the program counter, the procedure call chain, and data from currently-active procedures, are discussed below.
In an Algol-like language, the transition from one state to the next involves changes to only the variables and the auxiliaries; the statements are constant over the program's execution. That is, the program cannot create new statements and execute them. (This is not true of LISP, for example.)
The most important auxiliary variable is the program counter, which identifies a program "portion" that is about to be executed. In general, the granularity, or level of detail, at which a program's execution can be viewed is dependent on the language definition. For a language that specifies the execution order within a statement, or for an expression language (such as Bliss [Wulf+71]), it might be an expression. For a language that does not specify the execution order of expressions within a statement, the granularity must be statement level or higher; in the typical language a statement is a reasonable program "portion". (Most Algol-like languages define "statement" recursively. Therefore the execution of a composite statement, such as a loop, generally implies the execution of several nested statements. A more complete computational model for Algol-like languages could also mark the end of execution of each composite statement.) The remainder of the discussion assumes statement granularity unless otherwise noted.
With the exception of dynamically-allocated variables (which can be handled by a simple extension), a user-defined variable can be thought of as having a conceptual lifetime that extends over the entire computation. Each variable has a type, which specifies the range of values it can assume. However, a single variable identifier can refer to different variables at different points in a computation. An environment is a mapping from identifiers to a subset of the variables. If a given environment associates a previously-used identifier with a new variable, the old variable is inaccessible in that environment. Algol-like languages are statically scoped, meaning that the environment is a function of a point in the program text (i.e., the program counter). In contrast, most dialects of LISP are dynamically scoped, meaning that the environment is a function of the point in the computation.
Variable binding is a related concept for the handling of dynamically-allocated variables. Algol-like languages permit recursion, implying that multiple instances of a particular variable can exist at the same time. For a given identifier, the correct instance of its variable (i.e., the binding of the identifier) is a function of the history of the currently active procedure calls, that is, the order of their occurrence and their data. At a given point in a computation, statements in an Algol-like language cannot refer to a different instance of a variable than that specified by the binding rule. To implement the binding of non-local variables, the processor keeps a history of the currently active procedure calls, called the procedure call chain, as part of the auxiliaries. At procedure entry and exit, the processor manipulates these auxiliaries to maintain the proper binding.
Therefore, the source-level view of a computation in an Algol-like language can be completely described by the constant program text (T) together with the sequence of values of the user-defined variables (V), the program counter (PC), and the procedure call chain (C). These are the items that the user will want to observe via the debugger to monitor the program's behavior. The first line of the following figure shows a fully-specified finite computation. The second line shows that the transformation from state si to its successor si+1 is caused by the execution of the piece of program text at PCi.
<<{T,V0,PC0,C0}, {T,V1,PC1,C1}, ... , {T,Vn,PCn,Cn}>>
T(PCi)[{T,Vi,PCi,Ci}] = {T,Vi+1,PCi+1,Ci+1}
Note that the location of the statement that is about to be executed (i.e., the program counter) is an explicit element of the computational state. Therefore, a computation that executes the same actions on the same data, but that uses different copies of those actions (i.e., actions at some other location in the program text), is a different computation.
For ease of later reference, we informally partition the model into three parts: locations, data, and history. The location elements of a computation are the program counter and the action about to be executed at some state in the computation. The data elements of a computation are the variables and their values at some state in the computation. The history elements of a computation refer to the sequencing of code or data elements, such as the procedure call chain or the number of times that a given action was executed.
This source-level model is easily extended to handle multiple processes. Each process creates a separate computation, with its own program counter and procedure call chain. Variables and program text can be shared among more than one process. [We are not interested in methods for interprocess communication here.]
2.1.2 Behind the source-level model
This source-level computational model is an abstraction; an executing program demonstrates observable behavior that is outside the model. Furthermore, the actual sequence of steps required to execute a program on a computer is much more detailed than this model suggests. To present a source-level view of a computation, a debugger must map these details into a correct source view and it must hide the parts of the implementation that are extraneous to the source view.
A program's computational behavior does not include any encoding of program or output timing, data or program storage particulars that the user might be able to deduce from the program's physical behavior, or audible cues from specific hardware (such as disks). These and other similar items are not directly specified by the high-level specification of the program; rather they are implied by the low-level implementation of the programming language, the operating system, the hardware, and so on. The definition of computational behavior constructed here expressly avoids defining all such behavior so as to avoid constraining the low-level implementation.
A computer could directly execute a program written in an Algol-like language, either by interpreting it or by having an instruction set that implemented the language. For efficiency and flexibility, a high-level program is usually translated into a lower-level representation that is more suited to the particular machine, frequently the native instruction set of the machine. This translation is performed by a compiler. Of course, a program's execution on an actual computer can be viewed at many different levels: the sequence of program statements, the sequence of machine instructions, the sequence of steps to perform each machine instruction (often in the form of microcode), the physical behavior of the circuits in the machine, etc. The source-level view is thus only one of several possibilities; it has the advantage of being the view formulated by the programmer.
A compiler translates each source statement into a sequence of lower-level instructions that carry out the actions specified by the statement. There are usually many distinct ways to accomplish these same actions; which one the compiler chooses does not affect the definition of program behavior we have already constructed. The compiler frequently uses temporary storage to hold intermediate results; where and how many of these temporaries exist is immaterial to the user. The compiler also selects particular arrangements of storage to hold values of user-defined variables. For example, most users are not concerned with whether array elements are stored by row or by column.