Chapter 6 Complete Navigator Algorithms This chapter describes the Navigator implementation, expanding on the overview given in the previous chapter. It presents the Navigator compiler and debugger algorithms for handling inline procedure expansion and cross-jumping, including the complications caused by the language, the system environment, and multiple optimizations. Chapter 7 analyzes the compiler and debugger path determination algorithms, which distinguish among multiple source alternatives in regions merged by cross-jumping. Chapter 7 demonstrates the correctness of the algorithms, analyzes their efficiency for repeated cross-jumping, and discusses some possible improvements to them. 6.1 Cedar compiler This section describes the overall organization of the Cedar compiler and its data structures. Some of the data structures were enlarged to create the Navigator compiler; these changes will be detailed in later sections. The Cedar compiler is a six-pass compiler with a typical organization. Figure 6-1 shows the compiler's overall structure, along with the responsibilities of each pass. Input scanning and parsing. Pass 1 scans and parses the input program, creating an abstract syntax tree representation of the program. fig61.press leftMargin: 2.3125 in, topMargin: 0.25 in, width: 4.625 in, height: 8.5 in Figure 6-1. Structure of the Cedar compiler. Declaration processing. Passes 2 through 4 operate on the abstract syntax tree. Pass 2 builds a symbol table. Pass 3 performs type-checking and inline procedure expansion. These operations may require consulting symbol tables or abstract syntax trees created by previous compilation of definitions modules. Pass 4 allocates variables to stack frame positions according to their static frequency of use, since the first few local variables can be accessed more efficiently than the rest. Code generation and improvement. Pass 5 generates object code for a fairly simple stack machine, one procedure at a time. It has two phases. Pass 5A generates an abstract code stream from the decorated abstract syntax tree. Pass 5B performs a number of object code optimizations [Wulf+75], computes branch lengths, and does final opcode selection. Output. Pass 6 builds the statement maps and writes the object code, the symbol table, and the statement maps to the output file. 6.1.1 Data structures The compiler uses three major data structures: the abstract syntax tree, the symbol table, and abstract code streams. A fourth and less important data structure is the statement map. The abstract syntax tree, produced by the parser in Pass 1, is doubly-linked so that it is possible to access a node's parent or children with equal ease. Each abstract syntax tree node that corresponds to a statement in the Cedar language, as defined by the Cedar grammar, will be called a statement node. The subtree rooted at a statement node contains a complete description of that source statement. The symbol table, constructed from the abstract syntax tree in Pass 2, is also tree-structured. Its nesting captures the static name-inheritance of Cedar scopes. Each symbol table entry contains a complete description of the meaning of some name within the enclosing scope: the name and its type are important components. Storage allocation information also resides in the symbol table; it is added in Pass 4. Symbol table entries are themselves tree-structured, to express structured types such as procedures, arrays, and records. Abstract code streams, created for each procedure by the code generator in Pass 5A, are represented as doubly-linked linear lists of two types of cells: instruction cells and marker cells. Each instruction cell contains a complete description of one machine instruction (either a jump instruction or a code instruction). A marker cell does not correspond to any object code; it is either a label marker or an info marker. Each jump instruction has a pointer to its destination cell, which is a label marker. All jumps to the same label are linked together so that given either a jump or a label, all jumps to that label can be found. Info markers are used to record information for statement maps and symbol tables. An abstract code stream may also be referred to as just a code stream. Statement maps are created from the abstract code stream in Pass 6. In the Cedar compiler, a statement map is a table of entries, each consisting of a source statement number and the object code location corresponding to the start of its execution. The statement map is stored as a list of (source increment, object increment) pairs, so it allows only monotonically increasing values for statement numbers and object locations. This structure is space-efficient, but it cannot represent one-to-many and many-to-one source-to-object mappings. For the Navigator compiler, the statement map representation was changed to allow full source statement numbers and full object code addresses and to allow lists of statement numbers for a single object code location. 6.1.2 Keeping track of source correspondences The movement of source correspondence information through the compiler is crucial to truthful debugging. There are three major steps. From input program to abstract syntax tree. The compiler's input scanner, in Pass 1, records the source position of the beginning of each token. As the parser constructs the abstract syntax tree, also in Pass 1, it fills the sourcelink field of each statement node with the recorded token start point of the first token in the statement. Passes 2 through 4 modify the abstract syntax tree, but do not alter the sourcelink fields (except for the effects of inline procedure expansion in Pass 3, which will be discussed shortly). From abstract syntax tree to code streams. In Pass 5A, the first cell that the code generator produces when translating each statement subtree into a section of the code stream is a sourcelink cell, a type of marker cell. The sourcelink cell is filled with the source reference from the statement node's sourcelink field. Therefore, every instruction cell between two sourcelink cells is part of the object code for the source statement that begins at the source position mentioned in the statically preceding sourcelink cell. In the original Cedar compiler, the object code optimization phase (Pass 5B) inserts and deletes instruction cells but does not alter sourcelink cells. The Navigator compiler changes the sourcelink cells also. [Note: for descriptive ease, the algorithms described in this chapter assume that each instruction cell has its own source information. Section 6.4.5 discusses the changes necessary to handle separate sourcelink cells.] From code streams to output statement maps. In Pass 6, the output routines step linearly through the abstract code streams for each procedure, writing the contents of the instruction cells to the object code file. Marker cells do not appear in the output object code. Whenever a sourcelink cell is encountered, the compiler records the current relative program offset and the sourcelink cell's operand in the statement map. After all of the object code has been emitted, the completed statement map and the symbol table are also written to the object code file or to a related file. 6.2 Inline procedure expansion The compiler performs inline procedure expansion on the abstract syntax tree representation of the program, as part of Pass 3. For each call to an inline procedure, the compiler assigns the actual parameters of the call to the formal parameters of the inline procedure and replaces the call by the statements from the body of the inline procedure. This section describes how the Navigator compiler and debugger handle the effects of inline procedure expansion. Section 5.7.1 gave an overview of this process; this section presents more detail. Additional changes are needed to handle nested inline procedures, which are inline procedures that are called from other inline procedures, and imported inline procedures, which are inline procedures declared in separate modules. In addition, this section briefly considers the issues of further optimizations on the expanded statements. Recall that in Cedar, a programmer explicitly marks procedures that should be expanded; the compiler does not select which procedures or procedure calls to expand. This has three consequences. First, procedures that are not marked as INLINE procedures will not be expanded, even if they are as small and called as frequently as marked procedures. Second, every call to a marked procedure is expanded, rather than only calls that are expected to execute frequently (such as calls inside loops). Third, the Cedar compiler does not generate stand-alone object code for inline procedures. fig62.press leftMargin: 1.75 in, topMargin: 1 in, width: 5.6875 in, height: 8.5 in Figure 6-2. Inline procedure expansion in the abstract syntax tree form. As mentioned in Chapter 5, inline procedure expansion affects the following primary debugging functions: setting a breakpoint on a statement in an inline procedure, reporting the execution location at such a statement, and providing a procedure traceback that logically includes an expanded inline procedure. Inline procedure expansion also affects several secondary debugging functions: setting breakpoints at the entry or exit of inline procedures, calling inline procedures from the debugger's interpreter, displaying the values of parameters and variables when inside expanded inline procedures, and single-stepping with coarse granularity. The Navigator compiler's inline procedure expansion algorithm (Algorithm NI1, shown below) handles the primary debugging functions. Extensions to handle the secondary debugging functions are discussed at the end of this chapter. The changes from the original Cedar algorithm appear in italics. The simpler changes (not underlined) were explained in Section 5.7.1. The underlined changes are explained following the statement of the algorithm. These additional changes are needed to handle nested inline procedures and imported inline procedures. Figure 6-2 shows the abstract syntax tree of the program from Figure 5-3 before and after the application of Algorithm NI1. The value of a statement node's sourceLink field (which indicates the corresponding source statement) is shown at the lower right of the node; the value of the inlineCallId field appears as a superscript on the statement number. 6.2.1 Algorithm NI1: Navigator compiler inline procedure expansion algorithm Data structures: A new Inline Table with fields calleeName, callerStmt, callerCallId, and declFileId. The table entry number (callId) is also used. The debugger uses the calleeName field to report the name of the innermost procedure when it displays the current execution location. It uses the callerStmt field to report the statement number in the caller when it displays a procedure traceback. If the caller's caller is also an inline procedure (i.e., an inline procedure calls another inline procedure), the debugger uses the callerCallId field to report location information for the caller's caller. See Section 6.3.2.1. If the callee is declared in a separate module, the debugger uses the declFileId field to access the source text for the callee's statement number. See Section 6.3.2.2. A statement node's sourceLink field is expanded to contain a new field inlineCallId. The information from the sourceLink field will be propagated into the statement maps. Statement map entries for expanded statements contain the statement number inside the callee and the new inlineCallId field, which the debugger uses to access the information in the inline table. 1. Walk the abstract syntax tree of the module. 2. Whenever a call node to an inline procedure (determined by looking in the symbol table entry for the procedure identifier) is encountered: 2A. Create a new inline table entry. 2A1. Set calleeName = the name of the called procedure. 2A2. Set callerStmt = statement number of the calling statement (i.e., the sourceLink field of the call node). 2A3. Set callerCallId = the number of the inline table entry of the caller, if any. 2A4. Set declFileId = a pointer to the recorded filename and timestamp of the interface module in which the called procedure is declared, if any. 2B. Create a new block node as follows: 2B1. Allow a new naming scope for the procedure statements. Copy the parameter and declaration subtrees of the inline procedure's declaration node into the declaration subtree of the block node. 2B2. Substitute actual parameters for formal parameters. 2B2a. Insert statements assigning the actual parameters of the call node to the formal parameters of the inline procedure as the first elements of the block's statement list. Set the sourceLink fields of these assignment statements to the sourceLink field of the inline procedure's declaration.1 This allows program exceptions that occur during parameter storing to be reported in the callee. This distinction is more important for languages that delay parameter type-checking until runtime. 2B2b. Inside each assignment statement, insert an eval node above the evaluation of each actual parameter. Set its sourceLink field equal to the sourceLink field of the call node. This allows program exceptions that occur during parameter evaluation to be reported in the caller. 2B3. Expand the statements of the procedure. 2B3a. Copy the statement list subtree of the inline procedure's declaration node into the block's statement list. 2B3b. 2B3c. For each copied statement node, set the sourceLink field's inlineCallId = the index of the new inline table entry. 2C. Replace the call node by the newly-formed block node. Notes: 1 In the unmodified Cedar compiler, step 2B2a set the sourceLink fields of these assignment statements to the sourceLink field of the call node. There was no step 2B2b. 2 In the unmodified Cedar compiler, step 2B3b replaced the sourceLink field of each copied statement node by the value of the sourceLink field of the call node. 6.2.2 Two additional complications: nested and imported inline procedures The Cedar language allows an inline procedure to call another inline procedure [of course, an inline procedure cannot call itself]. As part of its separate compilation feature, inline procedures can also be declared in separately-compiled modules. This section discusses the additional information that must be recorded to handle these language features. 6.2.2.1 Nested inline procedures In general, a procedure has multiple potential callers. The runtime procedure call chain records (indirectly, via the return address) which procedure was the caller at any given procedure invocation. In the case of a call to an inline procedure, the caller is fixed at compile-time. As so far described, the statement maps record the statement number in the callee, and the inline table records the name of the callee and the statement number in the caller. The name of the caller of an inline procedure is retrieved from the runtime procedure call chain. However, an inline procedure can call other procedures, which may or may not also be declared to be INLINE. When an inline procedure calls another inline procedure, neither the caller nor the callee appear in the procedure call chain. Therefore there is another field in an inline table entry, callerCallId, which gives information about the caller. If callerCallId is 0, the caller is in the procedure call chain. If callerCallId is not 0, this call is a nested inline call and the callerCallId entry of the inline table describes the caller. Figure 6-3 shows an example of a nested inline procedure. The procedure SetX is called from the procedure SetXY. Note that the subsumption optimization [see Section 6.2.3] was performed after the call to SetX was expanded; the actual parameter j in SetXY was substituted directly for the formal parameter i in SetX. SetX is also called from the ordinary procedure AdjustXY. 6.2.2.2 Imported inline procedures The separate compilation feature of the Cedar language adds another complication. In particular, an inline procedure is a special case of a compile-time constant. Inline procedures can be declared in definitions modules, and can be imported and called in separate program modules. In this case the inline procedure bodies must appear in the definitions module. Compiling a definitions module produces a symbol table containing an abstract syntax tree for each inline procedure declared in it. fig64.press leftMargin: 1.5 in, topMargin: -0.3125 in, width: 6 in, height: 8.5 in Figure 6-3. Nested inline procedure expansion with subsumption. The optimization process is not materially altered. When a call node for an imported inline procedure is encountered in the abstract syntax tree of an definitions module, the compiler finds the appropriate file, opens it, and copies the procedure's abstract syntax tree from the symbol table in the same way as before. The compiler records the filename and timestamp of the definitions module in the implementation module's symbol table for version-checking purposes. However, imported inline procedures present several additional debugger difficulties. When an imported inline procedure is called, the source statements for the inline procedure are in a different file than the statements of the calling procedure. To determine the execution location inside an imported inline procedure, the statement numbers in the statement maps refer to the interface module rather than the implementation module. This fact is indicated by an additional field declFileId in the inline table entry for a call. If declFileId is not nil, it points to the recorded filename and timestamp of the definitions module. The harder problem for imported inline procedures is setting breakpoints. When a user requests a breakpoint on a statement inside an imported inline procedure, the resulting breakpoint(s) must be set in an unknown group of program modules. During compilation, the compiler records the dependence from program modules to definitions module in the object code file; it does not record all uses of definition information (from definitions module to program modules). When the binder forms a program from a group of modules, it records this use information. Furthermore, since many programs could be loaded and executing at one time that use the same imported inline procedure, the loader must record another level of use information. The combination of binder and loader information of this sort is called the runtime model. The debugger must query the runtime model to discover all currently loaded programs that import a given inline procedure, and set breakpoints in each module as before. A system without a runtime model could use a partial solution to set breakpoints inside imported inline procedures. If the user specified a program module M and a statement s inside an inline procedure, the debugger would set breakpoints at all expansions of s inside M. The resulting behavior would not always be expected, but this approach would provide the necessary functionality. fig65a.press leftMargin: 1.4375 in, topMargin: 0.59375 in, width: 6 in, height: 8.5 in Figure 6-4. Inline procedure declared in definitions module. Figure 6-4 shows an example of an inline procedure declared in a definitions module. The procedure SetX is declared in the definitions module D, and is imported and called in the program module M (from the procedure SetX). 6.2.3 Applying further optimizations to the expanded statements Additional optimizations can be applied to the expanded statements. These additional optimizations fall into two categories: statement simplification based on the actual parameters to the call, and statement simplification and reordering based on the statements surrounding the call. In many cases it is safe to substitute the actual parameters directly for the formal parameters, rather than inserting assignment statements. This optimization, called subsumption [Loveman77], is a form of propagation (see Chapter 3). Subsumption makes it difficult for a debugger to display the current value of a formal parameter inside an expanded inline procedure [a secondary debugging function only peripherally considered by Navigator]. The Cedar compiler applies parameter subsumption only in very restricted cases: when the actual parameter is either a compile-time constant or a variable value and the compiler can guarantee that the call by name semantics implemented by this method is the same as the ordinary call by value semantics. The user can determine these values either by examining the text of the procedure call or by requesting the value of some variable in the caller from the debugger (except when the procedure has side effects). This behavior is not expected behavior; in some cases it may not be truthful behavior. Section 6.5.1.1 discusses how to extend Navigator to handle this problem. An expanded statement from the body of the inline procedure or a portion of one can be deleted if its results are not needed for this particular call. An expanded statement or a portion of one can also be reordered, possibly even moved outside of the expanded procedure to mingle with statements from the calling procedure. The Cedar compiler applies statement-based optimizations in the form of cross-jumping and the other object code optimizations. The effects of these later optimizations will be discussed in Section 6.5.2. 6.3 Cross-jumping In Pass 5A, the compiler generates the abstract code stream for a procedure by walking the procedure's abstract syntax tree. In Pass 5B, the compiler repeatedly performs several object code optimizations, including cross-jumping, on the code stream until the optimizations are no longer applicable. The other optimizations in this iterative process are: replacing a conditional jump around an unconditional jump by an opposite-sense conditional jump, removing branch chains and jumps to the next location, and examining groups of adjacent instructions for opportunities to delete instructions (notably instructions that pop the computation stack) or to combine them into a more powerful single instruction. All of the optimizations preserve the actual ordering of computations along any execution path, although the control flow may be altered. As a result of this compiler organization, the cross-jumping optimization can be applied repeatedly to a single section of the code stream. Furthermore, it can be applied to sections of code stream that have already been transformed by other optimizations, and other optimizations can be applied to sections of code stream that have been transformed by it. {vice versa?} This section first describes the details of the simple cross-jumping and path determination algorithm sketched in Section 5.7.2. The simple version of the algorithm does not correctly handle the application of cross-jumping to a previously cross-jumped region. It then considers several problems that manifest themselves even on the simple version of the algorithm. These problems include two problems that arise during compilation (ambiguous programs and externally-accessible labels) and two problems that arise during debugging (multiple processes and/or recursion, and determination cell allocation). Then the full version of the cross-jumping and path determination algorithm is presented. The full version collects additional information to track cross-jumping's control-flow transformations through repeated applications to the same region. The section concludes with a description of how to perform the path determination algorithm using a modified abstract code stream that represents source information more efficiently. The next section discusses the additional effects of the interactions between inline procedure expansion, cross-jumping, and the other object code optimizations. The complications arising from the cross-jumping algorithm fall into three categories. In order of decreasing seriousness, these categories are: 9 The problem causes incorrect debugger behavior. 9 The problem causes truthful debugger behavior. 9 The problem causes inefficient compiler and/or debugger behavior. The problems in the first category must be dealt with in some waypossibly by converting them into problems of the second or third type. The problems in the second category can be ignored if necessary to reduce complexity or avoid inefficiency. The problems in the third category must be considered seriously if the resulting system is to be of practical use, but can be ignored in some instances to reduce complexity. When a problem causes incorrect or truthful behavior, there are several general solution techniques to consider. Limit cross-jumping slightly. If the problem can be identified reasonably narrowly during the application of cross-jumping, the compiler can choose to inhibit cross-jumping for that region (or possibly just to decrease the size of that region). "Reasonably narrowly" means that the condition used to identify the problem includes only a small number of cases that do not have the problem. (It must include all cases that do have the problem.) This solution eliminates the problem, but results in a less-optimized program than that produced by the original Cedar compiler. Since the cross-jumping optimization saves only program space, the resulting program will be larger but will execute at the same speed (ignoring secondary size effects, such as paging). Insert a null instruction. Again, if the problem can be identified during the application of cross-jumping, the compiler could insert an extra null instruction. This extra instruction can hold some necessary source information, provide a place that corresponds to a user-visible location, or distinguish one execution path from another. This change eliminates the problem, but the resulting program is slightly larger (one more instruction) and slower than the fully optimized version. However, null instructions are small and fast on most machines. The benefit of this solution over the preceding one depends upon the size of the merging regions. The tradeoff of decreasing the program's speed slightly may be acceptable for fairly large regions; it is definitely not appropriate when the size of the region equals the size of the null instruction. Do nothing. This technique is allowable only when the problem causes truthful behavior. In addition, the problem should either arise infrequently or be expensive to fix. It does not solve the problem. 6.3.1 Graph notation and terminology In a code stream, an instruction x is the immediate static predecessor of an instruction y if x is the instruction positioned before y in the stream. An instruction x is an immediate dynamic predecessor of an instruction y if x can execute immediately preceding y. That is, either x is a jump to y, or x is the immediate static predecessor of y and not an unconditional jump to some other instruction. An instruction has exactly one immediate static predecessor; it may have many immediate dynamic predecessors. Static and dynamic successors are defined similarly. A code sequence is one or more statically adjacent instructions. A sentry of a code sequence is any instruction outside the sequence that is an immediate dynamic predecessor of an instruction inside the sequence. An entrance of a code sequence is any instruction inside the sequence that is an immediate dynamic successor of a sentry of the sequence. That is, a sentry of a code sequence is the last instruction executed outside the sequence on some control-flow path, and an entrance is the first instruction executed inside the sequence on that path. An extended basic block is a maximal code sequence that has the following property: there may be multiple exit points, but there is a single entry point at the beginning of the block; there is no internal non-sequential control flow. If two instructions x and z are in the same extended basic block and z is a static successor of x, then if z executes, x must have executed in the same iteration through the block. An extended basic block is more general than an ordinary basic block because of the possibility of multiple exits. 6.3.2 Simple cross-jumping and path determination algorithm To ensure that readers have the basic idea firmly in mind before more complications are added, this section presents the simplified cross-jumping and path determination algorithm whose effects were described in Chapter 5. This version of the algorithm does not correctly handle the application of cross-jumping to a previously cross-jumped region. 6.3.2.1 Discussion of the algorithm Ignoring the collection of path determination information for a moment, the Navigator compiler's version of the cross-jumping algorithm examines the abstract code stream sequentially. When it finds a label, each immediate dynamic predecessor of the label defines the end of a path to that label. The algorithm compares every pair of joining paths for identical tail code sequences. It finds identical code sequences by searching the two paths in reverse order until it encounters unequal instructions, an unconditional jump, a backward conditional jump, or a label. The first of these conditions is clearly necessary for any form of cross-jumping: only equal instructions can be merged. The other three conditions are necessary to ensure that the path determination algorithm can reconstruct the original control flow from the optimized program and the path determination information. Terms are needed to identify key items of the merging sequences: A path is a code sequence that ends in an immediate dynamic predecessor of the joining label. A path is called a fall-through path if it precedes the label both statically and dynamically; otherwise it is called a jump-in path. A jump-in path ends in a joining jump. If the label's immediate static predecessor is an unconditional jump to some other label, there is no fall-through path. The link cell of a path is the label cell if the path is a fall-through path; otherwise it is a jump cell whose jump destination is the label. All paths whose link cells are conditional jumps are removed from consideration because the static successor cells must execute when the condition is false. The bottom cell of a path is the immediate static predecessor of the link cell. Continuing with the cross-jumping algorithm, when the compiler finds identical code sequences, it designates one code sequence as the to-delete sequence, and the other as the to-remain sequence. To avoid introducing extra jumps, if one of the paths is the fall-through path, it is selected as the to-remain path. It replaces the to-delete sequence by a crossing jump from the beginning of the to-delete sequence to a crossing label at the beginning of the to-remain sequence. This effectively merges the identical parts of the two sequences. Each of the identical code sequences currently under consideration is called a merging sequence. Given one of the sequences, the other sequence that participates in the merge is called its opposing sequence. The main features of the path determination algorithm are the following: 9 Whenever two instructions are merged, the algorithm retains source information for both original copies of an instruction in the remaining instruction. 9 Each of the two separate source alternatives for a merged instruction is marked with its sequence's path determiner identifier. 9 Every sentry of each sequence is marked as a path determining instruction for that sequence. fig65c.press leftMargin: 1.8125 in, topMargin: 0.84375 in, width: 5.5 in, height: 8.5 in Figure 6-5. Ordering of path comparisons for cross-jumping. fig65b.press leftMargin: 1.75 in, topMargin: 0.34375 in, width: 5.8125 in, height: 8.5 in Figure 6-6. Anatomy of a merge: paths 1 and 2 of Figure 6-6. When more than two paths join at a label, the order of path comparisons is important. The algorithm compares paths in the order that the jumps to the joining label were generated (which is essentially arbitrary), rather than based upon any properties of the merging sequences (which are as yet unknown). As long as the fall-through path is never a to-delete path, the algorithm produces the same object code regardless of the choice of ordering. However, a poor choice of ordering can require more iterations of cross-jumping to completely merge a set of paths that join, because embedded labels are excluded from a merging sequence. A poor choice of ordering has worse consequences for the path determination algorithm; Section 7.2 considers this issue in detail. Figure 6-5 shows five execution paths that join at the label L and describes one possible ordering of the path comparisons. It also describes what matching sequences are found and what path determiners are assigned to each sequence. Figure 6-6 labels the merge of paths 1 and 2 from Figure 6-5 with the terms defined above. Data structures for cross-jumping and path determination The essential properties of the compiler's abstract code stream for the application of the cross-jumping and path determination algorithms described in this chapter are: 9 The algorithms process only instruction and label cells in the code stream, ignoring any other items in the code stream. 9 Each instruction cell has source correspondence information embedded in it. 9 The dynamic connectivity of the code stream is explicit. That is, labels appear at all destinations of non-sequential control-flow, and labels and jumps are linked together in both directions so that dynamic predecessors and successors can be found easily. Instruction y can execute after instruction x if and only if 9 y is the immediate static successor of x (and x is not an unconditional jump) or 9 y is the immediate static successor of a label L, and x is a jump to L. 9 There are never two label cells in a row. This condition allows jump destinations to be compared simply; it must be enforced by the (centralized) routines that insert labels or delete cells. These routines may need to alter the name of a jump's destination. fig61a.press leftMargin: 1.5 in, topMargin: .375 in, width: 5.1825 in, height: 8.5 in Figure 6-7. Data structures of the abstract code stream cells. We now describe the modifications necessary to the Cedar compiler's data structures to record the path determination information. Figure 6-7 shows the format of the three types of code stream cells. The fields nextCell and prevCell link the code stream statically, while the fields jumpDest, nextJump, and firstJump link the code stream dynamically. The shaded boxes indicate the fields that are added for Navigator. The boxes with the dark borders indicate the fields that are needed only for repeated cross-jumping; these fields are described with Algorithm NCJ2, in Section 6.3.4. An instruction cell's source information is expanded from a single source descriptor (with fields stmtNum and isStart) to a list of source descriptors (via a sourceList field). The isStart field is true if and only if this instruction represents the start of execution of the statement. Each source descriptor is expanded to contain a new field pdToCheck. If the value of a source descriptor's pdToCheck field is p, we say that the instruction, or more specifically the source descriptor, uses determiner p. We may also say that the source descriptor has a determiner use for p. The pdToCheck field ends up as a path determiner use in a source descriptor in the object table. A path determiner use looks backward along some execution path to a dynamically preceding path determiner assignment (see below). The debugger utilizes the determiner use in two ways: first, to tell which path determiners to activate when a primary breakpoint is set at that instruction, and second, to distinguish between the different statement numbers in the source descriptors when that primary breakpoint is reached. Each instruction cell is also expanded to contain new sentry information (via a pdToSetList field). Each sentry descriptor has a single field pdToSet. If the value of a sentry descriptor's pdToSet field is p, we say that the instruction, or more specifically the sentry descriptor, assigns determiner p. We may also say that the sentry descriptor has a determiner assignment for p. A nonempty pdToSetList indicates that this is a sentry instruction for some merged region(s). Therefore the debugger should collect dynamic control-flow information at this instruction, via a determining breakpoint, whenever the user places a breakpoint in an associated merged region (the association is shown by the information in the object table). The determining breakpoint will collect information for each path determiner that has a determiner assignment in the pdToSetList. Each pdToSet field (determiner assignment) becomes an entry in the path determiner table. This entry contains the path determiner identifier mentioned in the pdToSet field and the object location of this sentry instruction. The debugger uses the object location of the sentry in two ways: to place determining breakpoints when a path determiner is activated, and to ascertain which determination cells hold the timestamp value of a path determiner when a primary breakpoint is being processsed. Implications of the cross-jumping algorithm's termination conditions Notice that the choice of conditions on which to stop the code sequence comparison implies that all of the instructions in a given sequence are in the same extended basic block. The path determination algorithms described in this chapter rely heavily on this restriction. It is a slightly unusual restriction: the Bliss-11 cross-jumping algorithm allowed both unconditional jumps and labels embedded in a merging sequence [Wulf+75]. Chapter 7 discusses the additional path determination processing necessary to relax this restriction. In fact, the termination condition is a bit stronger than leaving the extended basic block, because unconditional jumps are not permitted inside either merging sequence (except, of course, for the joining jump at the end of the sequence). Unconditional jumps are not permitted for two reasons. First, if there were an unconditional jump inside a merging sequence, the statically following instruction(s) in the sequence would be unreachable, because labels are not permitted inside a merging sequence and all non-sequential control flow is assumed to be marked explicitly by labels. The unreachable instructions can be removed by another object code optimization. Second, if two unconditional jumps match and therefore could be included in the merging sequence, then they jump to the same label and can be cross-jumped directly later. Note that this situation is not true of embedded conditional jumps, because both branch destinations must be identical for a match to occur: a conditional jump as the joining jump of one merging sequence does not match an identical instruction as the joining jump of the opposite sequence, while a conditional jump embedded in a merging sequence does match an identical conditional jump in the opposite sequence. To avoid undeterminable control flow, backward conditional jumps are not permitted inside merging sequences. [This discussion of conditional jumps assumes that they have only two immediate dynamic successors. Multi-way conditional branches are a bit different.] We now show the details of the simple cross-jumping and path determination algorithm that was sketched in Section 5.7.2. The following high-level presentation of the algorithm, shown as Algorithm NCJ1, is intended to be read. The subsequent material in this chapter discusses its implications and fine points. Steps that were added to the original Cedar cross-jumping algorithm to collect the information for path determination appear in italics. 6.3.2.2 Algorithm NCJ1: Simple cross-jumping and path determination algorithm Define a cell match to occur between two instruction cells whenever both cells 1) have the same opcode and operand fields, 2) are not unconditional jumps, and 3) are not backward conditional jumps. Two forward conditional jumps are equal if they have the same condition, the same operand(s), and the same destination label. A label cell does not match any other cell. 1. Step forward linearly through the code stream of the procedure. 2. Whenever a label cell is encountered: 2A. Find the bottom cell in each unconditional path to the label. 2B. For every pair of paths to the label, neither of which was previously a to-delete path: 2B1. If the two bottom cells match, BEGIN Cross-jump 2B1a. Designate one path as the to-remain path, and designate the other path as the to-delete path. To avoid introducing extra jumps, if one of the paths is the fall-through path, it is selected as the to-remain path. 2B1b. Assign a new path determiner identifier to each of the two paths. 2B1c. Delete the joining jump (i.e., the link cell on the to-delete path). 2B1c1. Keep the joining jump's source information (see Section 6.3.2.3, below). 2B1d. On each path, set the current cell pointer to the bottom cell. 2B1e. As long as the current cells on the two paths match (note that all cells in each merging sequence are in the same extended basic block): BEGIN Merge one instruction 2B1e1. [Keep source information.] On each path, set the current cell's pdToCheck field (in the single source descriptor for the cell) to the path determiner of that path. Add the to-delete cell's source descriptor to the to-remain cell's sourceList, making two source descriptors for the to-remain cell. 2B1e2. Delete the current cell on the to-delete path from the code stream. 2B1e3. On each path, move the current cell pointer backward in the code stream to its immediate static predecessor cell. END Merge one instruction 2B1f. Mark the sentry (or sentries) for the to-remain path as follows: 2B1f1. Find all immediate dynamic predecessors of the top merged cell. (Note: the top merged cell is on the to-remain path; it has not yet been connected to the to-delete path.) 2B1f2. Add a determiner assignment for the to-remain path determiner to the pdToSetList of each such cell. 2B1g. Make the top merged cell jump-addressable by inserting a label cell before it if there isn't already one there. Call the label cell the crossing label. 2B1h. Insert the crossing jump (an unconditional jump cell to the crossing label) after the current cell on the to-delete path. 2B1h1. Create source information for the crossing jump: If its immediate static predecessor is not an unconditional jump, copy its source information. Otherwise, copy the source information from the to-delete path of the top merged cell. Do not mark it as the start of a source statement. 2B1i. Mark the sentry for the to-delete path as follows: 2B1i1. Add a determiner assignment for the to-delete path determiner to the crossing jump's pdToSetList. END Cross-jump 6.3.2.3 Step 2B1d1: Keeping the joining jump's source information Because the joining jump (i.e., the link cell on the to-delete path) may not have a corresponding instruction on the to-remain path, keeping its source information can be difficult. This section describes various possible solutions and their tradeoffs: the discussion illustrates the kind of problem-solving methods that are applicable. Also, another environment or implementor might dictate a different solution than the one chosen here. If the to-remain path is a jump-in path, there is no problem: the joining jump has a corresponding instruction on the to-remain path (the link cell), and thus these two instructions can be treated like any other instructions in the merging sequences. The algorithm adds the appropriate determiner use to each instruction's source descriptor, and then adds the joining jump's source descriptor to the source descriptor from the to-remain path's link cell (as in step 2B1e1). If the to-remain path is a fall-through path, then there will be no remaining instruction cell corresponding to the joining jump. But if the joining jump does not represent the beginning of a source statement, the user cannot request a breakpoint there anyway. In this case, the joining jump can safely be deleted without keeping its source information. However, if the to-remain path is a fall-through path and the joining jump represents the beginning of a source statement, then the user might request a breakpoint on the joining jump. The goal is to have such a breakpoint appear in the expected place in the program's execution. There are several possible solutions: 1. Move the joining jump's source information backward. For breakpoint purposes only (i.e., in the source table only), the source information for the joining jump could be moved backward to mark the dynamically preceding instruction, which is the bottom instruction of the to-delete sequence. [After merging, the bottom instruction would thus have three source descriptorstwo that refer to the to-delete path determiner identifier, and one that refers to the to-remain path determiner identifier. One of these would be specially marked so that it would not appear in the object tablethat is, it would never be used to report the current execution loaction.] There are two problems with this solution: 1) since a primary breakpoint is visible to the user, the user will expect any effects of the bottom instruction to have taken place already when the breakpoint is reached, and 2) the bottom instruction might be a conditional jump, in which case a breakpoint there would give control to the user more often than it should. If the debugger and machine architecture support after-instruction breakpoints and provide an indication of the direction of a just-executed branch (for instance, in condition codes), these problems could be overcome. Otherwise, the debugger could interpret the preceding instruction and give control to the user afterwards. In the case of a conditional jump, the debugger could honor the user's breakpoint only when the conditional jump does not branch. 2. Swap the to-remain and to-delete paths. The algorithm could swap the to-remain and to-delete paths in this case, so that the joining jump won't be deleted. If the original to-remain path is path 1 and the original to-delete path is path 2, then swapping the two paths will cause execution path 1 to contain two more instructions than it would otherwise have: the crossing jump and the joining jump. Execution path 2 would be unchanged. Statically, the object code would contain one more instruction (the joining jump, which would otherwise have been deleted). Note that the cross-jumping algorithm normally preserves the number of instructions executed along any execution path, so this change actually slows the program down slightly. On pipelined machines, the extra jumps can cause extra delays while the pipe is flushed. Moreover, when the merging sequences are only one instruction long, this change does not save space. In fact, in this case the optimized program could be larger than the unoptimized program on a machine with variable-sized instructions (if the joining jump is longer than the instruction in the merging sequences). Therefore, this solution is particularly unsuitable when the merging sequences take less storage than the joining jump; it would be better to avoid cross-jumping such regions. 3. Insert a null instruction. The algorithm could replace the joining jump with a null instruction and insert a corresponding null instruction in the to-remain sequence. The to-delete null instruction would get the source information of the joining jump, and the to-remain null instruction would get the source information of its immediate dynamic predecessor. The path determination algorithm would treat the source information as before (see steps 2B1f1 and 2B1f2). Statically, the object code would contain one more instruction than the fully optimized version (the null instruction). Both execution paths would contain one more instruction than they would otherwise have: the null instruction. Again, this change slows the program down slightly. Since null instructions are usually small, fast, and do not cause pipe turbulence, this solution is probably preferable to the preceding solution. 4. Logically extend the to-remain sequence. The algorithm could logically extend the to-remain sequence to include one more instruction cell: the cell following the joining label. The joining jump's source information would be added to the additional cell's source information (as in step 2B1f2), adding determiner uses (as in step 2B1f1). Unfortunately, this can cause problems. The label would then become an embedded label. As long as control reaches the additional instruction via either the to-delete path or the to-remain path, all is well. But if there is a third immediate dynamic predecessor of the label, and control comes through the to-delete path and then through the third path, the debugger will believe that the joining jump's source statement is executing, although it is not! To try to repair this flaw, the algorithm could add to-remain determiner assignments to all other sentries of the joining label. This would greatly complicate the merged region; it could also cause incorrect behavior if either merging sequence contained a (forward) conditional jump to the joining label. As another repair attempt, the algorithm could add determiner assignments for a third determiner to all other sentries of the joining label. Then the instruction following the label would need three source descriptors: its original source descriptor with a to-remain determiner use, its original source descriptor with a third determiner use, and the source descriptor of the joining jump with a to-delete determiner use. This is also quite complicated, and it destroys the simple property that every merged region has exactly two associated path determiners. In my judgment, this case is not worth the additional complexity that the possibility of three determiners could create throughout the optimization phase. 5. Cross-jump an auxiliary region. Cross-jumping could be invoked on the single instruction following the joining label (which would be considered to match the joining jump). That instruction would comprise the entire new merged region and would get two determiner uses for new determiners. 6. Inhibit cross-jumping slightly. Of course, cross-jumping could be inhibited altogether for such regions. Note that reducing the size of the region cannot help this case, because the problem occurs on the bottommost instruction of the merge. The slightly inhibited optimized program would be smaller than the unoptimized program, assuming that other regions can be cross-jumped, but larger than the fully optimized program. Execution speed would be unchanged. As long as explicit jumps (or statements that can be optimized to become explicit jumps) in the source program are rare, the space penalty would be small. The cost also depends on the length of a particular merging region: if a large region ends in an explicit jump, the space penalty of inhibiting the merge is greater. The Navigator compiler uses solution 3: when necessary, it replaces the joining jump by a null instruction. Because the Cedar environment does not support after-instruction breakpoints, solution 1 is not suitable. Solutions 2 and 4 have drawbacks, and solution 5 would have been somewhat harder to implement for an uncertain return. Complex heuristics embodying a combination of these considerations could also be designed. 6.3.2.4 Fine points of the debugger algorithms {brief summary of debugger actions? & maybe move? references ambiguity & mult timestamps} A single sentry instruction can assign multiple determiners, and multiple sentry instructions can assign a single determiner. The resulting determining breakpoints and determination cells can be organized either by object code location or by path determiner. Navigator organizes them by object code location: a single determining breakpoint is placed on any path determining instruction. A reference count indicates how many determiners the determining breakpoint is currently collecting information for; the reference count is used to ascertain when the determining breakpoint can be removed (which is when all of the primary breakpoints that use its history information are removed). To find a path determiner's current timestamp, the debugger searches the path determiner table to find all object locations corresponding to that determiner. It then finds the determination cell for each such object location (for the current procedure invocationexplained below in Section 6.4.3). The timestamp for that determiner is the latest of the timestamps recorded in these determination cells. An organization by determiner would make determining breakpoints slower and primary breakpoints faster. Each determiner would have one determination cell (per procedure invocation). When a single instruction is a sentry for more than one merging sequence (that is, if a single instruction assigns multiple determiners), this new organization could use multiple distinct determining breakpoints, or it could use a single determining breakpoint that can set multiple determination cells. In this organization, care must be taken that two logically distinct determining breakpoints at the same location receive the same timestamp, because that is hiw the debugger detects ambiguity. 6.3.3 Fundamental difficulties with the path determination algorithm Two fundamental difficulties with the path determination algorithm arise at compile-time. Some source programs are essentially ambiguous with respect to path determination: two entire joining paths generate the same object code, causing the two opposing determiner assignments to collide at some sentry of the merged region. If no special compensation is made, the debugger can give only truthful behavior, rather than expected behavior. In other cases, a sentry of a merged region can be outside the procedure containing the merged region; that is, an instruction in another procedure branches directly to a label at the top of a merged region. This second situation arises as a result of Cedar's exception-handling mechanism. It is crucial to do something about this second compile-time problemotherwise the debugger will give incorrect behavior for some computations. Difficulties also arise at runtime. Two paths that contain a directly or indirectly recursive procedure call can be cross-jumped; also, multiple processes can execute a cross-jumped procedure. Of course, in order to record the flow of control through each concurrently active invocation of the procedure, there must be a separate timestamp for each such invocation. A second problem concerns the debugger's runtime storage efficiency: where and when should determination cells be allocated (and de-allocated)? 6.3.3.1 Compile-time problem 1: essential ambiguity (path determiner collision) If an entire path generates the same object code as some other joining path, the same instruction will assign both a path determiner and its opposing path determiner. Figure 6-8 illustrates this situation. Such programs may not occur frequently in practice, but if they occur, the path determination algorithm can at least provide truthful behavior by correctly identifying the source alternatives. Actually, repeated optimizations must occur for this situation to arise, because the cross-jumping algorithm always replaces the extra path by a jump instruction, which can then be used for disambiguation. See Section 6.5.2.5 for a description of the steps needed to create this problem. source program fragment after cross-jumping 1 b _ 5; b _ 5; pd1, pd2 2 IF cond THEN 3 a _ 1 ELSE 4 a _ 1; a _ 1; source: 31, 42 Figure 6-8. Program fragment with undeterminable paths. In fact, the ambiguity may not be apparent in the source program. The two constants "1" in Figure 6-8 might have been expressed as different constant identifiers or elements of enumerated types that happened to have the same underlying numeric representation. Or statement 3 might have been a call to an inline procedure that happened to expand to "a_1". There are three alternatives to handle essential ambiguity: 1. do nothing special, yielding only truthful behavior for such regions, 2. stop the merge one instruction early, yielding a slightly larger program (than the fully optimized program), or 3. insert a null instruction on one of the two paths to act as a placeholder for the path determiner, yielding a program that is slightly slower than the original unoptimized program. Alternatives 2 and 3 preserve expected behavior. Special processing of ambiguous regions would be triggered by the lack of distinct instructions on which to put the opposing path determiner assignments. These alternatives are possible because the path determination algorithm is part of the cross-jumping optimization rather than a later step in the compilation. Different code generation templates can also lead to ambiguity. If a SELECT statement is implemented as an indexed jump into a table of jump destinations, path determiner assignments for two identical cases will collide at the indexed jump (recall that a path determiner assignment, because it is implemented as a determining breakpoint, must be an executable instruction). But the same source code will not cause path determiner collision if the compiler implements the SELECT statement as an indexed jump into a table of jump instructions or as a series of tests and jumps. Figure 6-9 illustrates this distinction. Of course, a special runtime mechanism could be designed to monitor indexed jumps (and also conditional jumps) so that different determiners could be assigned on each path away from the jump. The debugger could achieve this result reasonably easily by interpreting complex jumps that are path determining instructions. fig66a.press leftMargin: 2 in, topMargin: 1.5 in, width: 5.3125 in, height: 3.9375 in Figure 6-9. Two different code generation templates for SELECT statements. 6.3.3.2 Compile-time problem 2: externally-accessible labels The Cedar language allows a statement to signal an exception, which will be caught by a dynamically-scoped catch phrase. The catch phrase may be in a different procedure or a different module than the signaller. In effect, this mechanism allows statements in one procedure to branch indirectly to certain labels inside other procedures. These branches pass through the runtime exception handler, which searches backward through the procedure call chain for blocks or procedure calls that have enabled a catch phrase for that exception. Internal labels that can be accessed from outside their procedure are not unusual in Algol-like languages. For example, Pascal allows a nested procedure to contain a branch to a label in a containing procedure [Jensen+74]. The SAIL language allows any label to be declared INTERNAL [Reiser76], allowing it to be accessed from externally compiled programs. Externally-accessible labels cause problems for cross-jumping. If an externally-accessible label's instruction is merged (and hence the labelled instruction is an entrance of a merged region), the proper operation of the path determination algorithm requires that a path determiner assignment be placed at each immediate dynamic predecessor of the label. But in this case, the predecessor instructions are outside the current procedure and may even be outside the current module. [A similar problem might arise if the first instruction in a procedure were inside a merged region. This cannot occur in many implementations, including Cedar, because procedure prologue instructions are unique.] The path determination algorithm cannot place path determiner assignments at these instructions for three reasons. First, the algorithm deals only with the code stream for a single procedureinstructions in other procedures are not available. Second, deciding which instructions might be immediate dynamic predecessors can be difficult. For example, any SIGNALX instruction that can execute in the dynamic scope of the CATCHX is potentially an immediate dynamic predecessor. Third, when a SIGNALX is in another module, its association with the CATCHX may not be known until runtime. There are two solutions to this difficulty. The first solution provides only truthful behavior for cross-jumped regions that contain catch phrases. The second solution preserves expected behavior, but increases code size and execution time slightly. Truthful behavior: permanently-true determiners. In this solution, whenever an externally-accessible label is an entrance of a merging sequence, the cross-jumping and path determination algorithm places the path determiner identifier associated with that sequence in the path determiner table immediately (rather than placing a determiner assignment on the sentry for that sequence and waiting for the output routine to create the appropriate path determiner table entry). This determiner is marked as permanently true. [One way to implement this marking is to replace the object code location of this path determiner table entry by a special value signifying TRUE.] The debugger handles permanently-true determiners in the following way: whenever execution is halted in a merged region, it always reports source alternatives associated with permanently-true determiners (that is, with catch phrases) as possibilities, as well as the source alternative identified by the normal runtime path determiner mechanism. This solution maintains minimal code size and execution time, but forces the debugger to check for permanently-true determiners at every object-to-source translation. Expected behavior: null instructions at externally-accessible entrances. In this solution, whenever an externally-accessible label represents an entrance to a merging sequence, the cross-jumping and path determination algorithm inserts a new (null) instruction and a new label between the externally-accessible label and the top instruction in the corresponding merging sequence. After the insertion, the new label labels the top instruction in the merging sequence, and the externally-accessible label labels the new (null) instruction. Therefore the sentry instruction for the merging sequence is now the new instruction rather than an external instruction. [Note: it is crucial that later optimizations not remove this added null instruction; see Section 6.5.2.] This solution adds an extra instruction to the object code, but leaves the runtime path determination algorithm unchanged. Remember that the cross-jumping algorithm saves only program space, so that this extra instruction will make the optimized program slightly slower. However, in this case, the extra instruction will execute only when control arrives through the external path. Program exceptions are assumed to happen relatively infrequently, so the speed penalty would be incurred infrequently. Since program exceptions involve a fair amount of processing, one extra instruction would be a small percentage of the total expense. 6.3.3.3 Runtime problem 1: multiple processes and/or recursion Two paths that contain a directly or indirectly recursive procedure call can be cross-jumped. In order to match timestamps and execution paths properly, a single determining breakpoint must record timestamps in one of several determination cells, each containing a timestamp for one invocation of the procedure. This solution also takes care of multiple processes executing a cross-jumped procedure. The mechanism for associating determination cells and procedure invocations is described in the following paragraphs. 6.3.3.4 Runtime problem 2: allocation and de-allocation of determination cells The choice of where to allocate determination cells affects the efficiency of the debugger. If the compiler were to allocate any necessary determination cells in the procedure's local frame, the association between a determination cell and a particular procedure invocation would be implicit. However, this strategy would be quite expensive, as runtime space would be consumed even for procedures in which no breakpoints had been set. Furthermore, recursive procedures or multiple processes would consume multiple copies of this unused space. It is likely that this extra data space would more than offset the space saved by the cross-jumping optimization. Instead, when a breakpoint is placed in a merged region, the debugger allocates determination cells in a debugger-managed history pool. The timestamp value in each determination cell is associated with a single procedure invocation by having a pointer to the activation frame for that procedure invocation. Similarly, the choice of when to allocate and de-allocate determination cells also affects the efficiency of the debugger. The Navigator debugger allocates a determination cell at the determining breakpoint if there is not already a cell for the current procedure invocation (i.e., with a pointer to the current activation frame). All determination cells for a determining breakpoint are de-allocated when the determining breakpoint is removed. In addition, the Navigator debugger garbage-collects determination cells corresponding to de-activated frames at a determining breakpoint if there is not sufficient space in its history pool. Unfortunately, the checks for allocation and de-allocation add complexity to the determining breakpoint processing. A fine point: the Cedar runtime environment allocates activation frames on a heap, from a pool of frames of the proper size. Therefore, a procedure could: be called (allocating a frame), set some determination cells associated with that frame, return, be called again, get the same activation frame, and have old determination cells with the proper activation frame pointer lying around. The timestamps for the new invocation will always be later than those for the old invocation, so this scenario can cause trouble only if the region is entered via a permanently-true determiner (see Section 6.4.3, above) in the new invocation. In that case, the non-catch phrase source alternative would be the extra one, rather than vice versa (a small problem for advanced user documentation). It would be better all around if each procedure invocation had a truly unique identifier that could be used instead of the activation frame pointer. If procedures are long or contain loops (so that procedure entry and exit happen much less frequently than determining breakpoints), it would be profitable to allocate determination cells corresponding to activated determiners at procedure entry and de-allocate them at procedure exit. This could be done as part of the procedure entry and exit instructions (the checking for which would slow down all procedure calls), or by means of extra invisible breakpoints at procedure entry and exit, activated by breakpoint requests as before. For safety, such a scheme must allocate and de-allocate properly in the presence of exception-raising (exits from the middle of a procedure) and exception-handling (externally-accessible labels, or entries to the middle of a procedure). All of this has one major ramification for timestamp-checking: an activated determining breakpoint may not have a corresponding determination cell when the primary breakpoint is reached. The timestamp value for that determining breakpoint is considered to be time zero. 6.3.4 Complete cross-jumping and path determination algorithm (repeated merging) If cross-jumped regions were never considered for further application of the cross-jumping algorithm, the path determination procedure described in the previous section would be adequate. However, not only can a single pass through the generated object code create multiply-merged regions, but the compiler applies the cross-jumping algorithm and other object code optimizations to the code stream repeatedly, until no changes occur. To keep track of the effects of these iterated optimizations, some additional static and dynamic information is needed. 6.3.4.1 Summary of the complete cross-jumping and path determination algorithm The problems arise when a candidate sequence for a new merge (either the to-delete sequence or the to-remain sequence) contains either a portion of a previously merged region or a sentry of a previously merged region. Given a path determiner d, we refer to the determiner of its opposing sequence as d. If a candidate sequence contains a portion of a previously merged region, the candidate sequence contains at least one instruction cell that already has more than one source descriptor. Each source descriptor has a determiner use for a previously merged sequence's path determiner in its pdToCheck field. The complete algorithm records the instruction cell's use of the new path determiner (for the candidate sequence) by allowing a single source descriptor to have multiple pdToCheck fields. That is, each source descriptor from the candidate sequence uses the candidate path determiner in addition to its previous path determiners. As before, the compiler merges the candidate sequence sourcelist with the sourcelist from the corresponding instruction cell in the opposing candidate sequence. Figure 6-10 shows an instance of repeated merging. In the flowgraph form of the source fragment, the instructions are shortened to their first identifier. The numbers at the lower right of fig67.press leftMargin: 1.875 in, topMargin: 1 in, width: 5.375 in, height: 8.5 in Figure 6-10. Simple repeated cross-jumping. an instruction letter refer to its source information: path determiner uses appear here as subscripts. An instruction's path determiner assignments appear at its upper left. For example, after the second merge, source descriptor 15 uses path determiners 1 and 4. The debugger interprets this information as follows: path determiners 1 and 4 must both have later timestamps than their opposing determiners (that is, path determiners 2 and 3) in order for statement 15 to be executing. All possible execution path traces for the final merged form, along with the path determiner assignments and source alternatives that they imply, are shown at the bottom of the figure. Now consider the case in which a candidate sequence contains a sentry of a previously merged region; that is, the candidate sequence contains an instruction cell with a non-empty pdToSet list. It is tempting to suppose that the compiler could remove these path determiner assignments from their current location and place a copy of them at each sentry of the candidate sequence. By this action, the compiler would represent the fact that the previously merged code sequences actually merge earlier than was known before. Unfortunately, there are programs for which this strategy fails to provide full path determination. These programs are characterized by multiple control-flow paths that are wholly contained in a merged region. Figures 6-11A and 6-11B show the correct repeated cross-jumping of such a program. To see how the proposed handling of embedded sentries would fail, suppose that the fourth cross-jumping step (in Figure 6-11B) were to move determiner assignments 1, 2, and 5 up to the location of determiner assignment 7, and similarly move determiner assignments 3, 4, and 6 up to the location of determiner assignment 8. It would then be impossible for the debugger to distinguish between the execution of statements 25 and 27, or between statements 34 and 36. Instead, the instruction's sentry information is made to depend upon the candidate path determiner, in much the same way as a source descriptor is made to depend upon the path determiner for the sequence that contains it. To record the fact that a path determiner assignment depends on the value of the new candidate determiner, each of the instruction's sentry descriptors is given an indirect determiner use for the candidate path determiner. As a result, the debugger will collect timestamp values for the two opposing determiners at runtime, and will check these values to determine on behalf of which of the two sentry descriptors the instruction is executing. Finally, if the candidate sequence is the to-delete sequence of this merge, the algorithm adds the updated sentry information to the current cell in the to-remain sequence. fig68a.press leftMargin: 1.9375 in, topMargin: 1 in, width: 5.625 in, height: 8.5 in Figure 6-11a. Repeated cross-jumping with embedded control-flow, part 1. fig68b.press leftMargin: 1.625 in, topMargin: 1 in, width: 5.875 in, height: 8.5 in Figure 6-11b. Repeated cross-jumping with embedded control-flow, part 2. To see that this algorithm works, suppose that an object location x has a possible path determiner assignment d3 whose pdToCheck list contains (only) d1. At runtime, the execution of x corresponds to a sentry d3 of an inner merged region if and only if the outer merged region was entered via path determiner d1 more recently than it was entered via path determiner d1. Dealing with the added complexity of repeated cross-jumping also requires some runtime modifications. The following path determiners are activated as a result of a breakpoint request: first, all path determiners that are used by any source alternative for the breakpoint's object location, and second, (recursively) for any path determiner d that is indirectly used by a determiner activated in step 1, both d and d. Timestamp checking at a primary breakpoint must also be altered slightly: a list of determiner uses of the form (d1, d2) on either a single source descriptor or a single sentry descriptor is satisfied if and only if latest timestamp (d1) > latest timestamp (d1) and latest timestamp (d2) > latest timestamp (d2). Ascertaining the latest timestamp value for a single path determiner at a primary breakpoint also reflects determiner nesting: a timestamp value in a determination cell is valid only if the latest timestamp value of the determiners that it uses is not earlier than the latest timestamp value of their opposing determiners. These timestamp-checking algorithms thus need to be able to find the opposing determiner for any determiner. A simple scheme, which works as long as cross-jumping introduces exactly two determiners for each merge, is to use an even/odd distinction. Notice that the order of consideration of joining paths for cross-jumping can affect the final placement and number of path determiners. Figure 6-12 shows the same program as Figure 6-10, but a different cross-jumping ordering has caused more determiners and more complex runtime behavior. (Figure 6-12 has indirect determiner uses, while the earlier figure did not need them.) We now show the full cross-jumping and path determination algorithm. Changes from the original Cedar algorithm are italicized; changes from Algorithm NCJ1 are italicized and underlined. fig69.press leftMargin: 1.8125 in, topMargin: 1 in, width: 5.25 in, height: 8.5 in Figure 6-12. Repeated cross-jumping of Figure 6-10 with a different cross-jumping ordering. 6.3.4.2 Algorithm NCJ2: Complete cross-jumping and path determination algorithm Data structures: An instruction cell's source information is expanded from a single source descriptor (with fields stmtNum and isStart) to a list of source descriptors (via a sourceList field). Each source descriptor is now permitted to have a list of path determiner uses rather than just one path determiner use. Each instruction cell also gets a new list of sentry descriptors (pdToSetList). Each sentry descriptor has one path determiner assignment (pdToSet) and a list of indirect path determiner use (pdToCheckList). Define a cell match to occur between two instruction cells whenever both cells 1) have the same opcode and operand fields, 2) are not unconditional jumps, and 3) are not backward conditional jumps. Two forward conditional jumps are equal if they have the same condition, the same operand(s), and the same destination label. A label cell does not match any other cell. 1. Step forward linearly through the code stream of the procedure. 2. Whenever a label cell is encountered: 2A. Find the bottom cell in each unconditional path to the label. 2B. For every pair of paths to the label, neither of which was previously a to-delete path: 2B1. If the two bottom cells match, BEGIN Cross-jump 2B1a. Designate one path as the to-remain path (to avoid introducing extra jumps, the fall-through path is always designated as the to-remain path), and designate the other path as the to-delete path. 2B1b. Assign a new path determiner identifier to each of the two paths. 2B1c. Delete the joining jump (i.e., the link cell on the to-delete path). 2B1c1. Keep the joining jump's source and sentry information: 2B1c1a. If the to-remain path is a fall-through path: 2B1c1a1. If the isStart field is TRUE for any source alternative of the joining jump, insert a null instruction immediately preceding the link cell on the to-remain path. Copy source information for the new null instruction from the bottom cell in the to-remain path. Do steps 2B1e1 and 2B1e2 with the to-delete current cell = the joining jump and the to-remain current cell = the new null instruction. 2B1c1a2. Otherwise, throw the joining jump's source information away. Unless it would create ambiguity, move the joining jump's path determiner assignments backward to its immediate static predecessor (see Section 6.4.4.3, below). 2B1c1b. If the to-remain path is a jump-in path, do steps 2B1e1 and 2B1e2 with the to-delete current cell = the joining jump and the to-remain current cell = the to-remain link cell. 2B1d. On each path, set the current cell pointer to the bottom cell. 2B1e. As long as the current cells on the two paths match (note: all cells in each merging sequence are in the same extended basic block), BEGIN Merge one instruction 2B1e1. [Keep source information.] On each path, add a determiner use for that path's determiner to each source descriptor for the current cell. Add the to-delete cell's source information to the to-remain cell's source information. There may be multiple source descriptors in each list. 2B1e2. [Keep sentry information.] On each path, add an indirect determiner use for that path's determiner to each sentry descriptor for the current cell. (Either or both cells may lack sentry information.) Add the to-delete cell's sentry information to the to-remain cell's sentry information. 2B1e3. Delete the current cell on the to-delete path from the code stream. 2B1e4. On each path, move the current cell pointer backward in the code stream to its immediate static predecessor cell. END Merge one instruction 2B1f. Mark the sentry (or sentries) for the to-remain path as follows: 2B1f1. Find all immediate dynamic predecessors of the top merged cell. (Note: the top merged cell is on the to-remain path; it has not yet been connected to the to-delete path.) 2B1f2. Add a determiner assignment for the to-remain path determiner to the pdToSetList of each such cell. Each new sentry descriptor has no indirect determiner uses. 2B1g. Make the top merged cell jump-addressable by inserting a label cell before it if there isn't already one there. Call the label cell the crossing label. 2B1h. Insert the crossing jump (an unconditional jump cell to the crossing label) after the current cell on the to-delete path. 2B1h1. Create source information for the crossing jump. If its immediate static predecessor is not an unconditional jump, copy its source information. Otherwise, copy the source information from the to-delete path of the top merged cell. Do not mark it as the start of a source statement. 2B1i. Mark the sentry for the to-delete path as follows: 2B1i1. Add a determiner assignment for the to-delete path determiner to the crossing jump's pdToSetList. The new sentry descriptor has no indirect determiner uses. END Cross-jump 6.3.4.3 Step 2B1c1a2: Keeping the joining jump's sentry information As in the case of keeping the joining jump's source information [step 2B1c1a1; see Section 6.4.2.4], keeping the joining jump's sentry information is only a problem when the joining jump has no corresponding instruction on the to-remain paththat is, when the to-remain path is a fall-through path. These two problems are similar, but not identical. The joining jump's source information must be kept only when the joining jump represents the start of execution of a source statement, so as to permit a statement breakpoint there. Furthermore, since a statement breakpoint is visible to the user, the new instruction holding the source information must execute in the proper order with respect to its surrounding instructions: that is, after the bottom instruction of the merged sequence and before the instruction following the joining label. In contrast, the joining jump's sentry information must always be kept, so as to properly distinguish among merged instructions that execute after it. Since a determining breakpoint is invisible to the user, there is a bit more flexibility in its placement, but we would like it to fall on the new sentry of the previously-merged region. [Recall that the sentry instruction for a merged region is always an immediate dynamic predecessor of the region.] If the joining jump is a sentry for a previously-merged region, the joining jump is the immediate dynamic predecessor of a previously-merged instruction. Since the joining jump is an unconditional jump, that merged instruction can only be the instruction following the joining label. So the new sentry instruction that will take the joining jump's place is the new immediate dynamic predecessor of that merged instruction after the current merge. If step 2B1c1a1 inserted a null instruction to hold the joining jump's source information, the new immediate dynamic predecessor is that null instruction when it executes on behalf of the to-delete sequencethat is, qualified by the to-delete path determiner. If no null instruction was inserted (and the source information was discarded), the new immediate dynamic predecessor is the bottom instruction of the new merge, qualified by the to-delete path determiner. Moving the joining jump's path determiner assignments backward in this way could include more execution flow paths to the previously-merged region than the joining jump did originally. Because the bottom instruction of the to-delete sequence is a matching instruction and is therefore neither a label nor an unconditional jump, that bottom instruction is guaranteed to be the single immediate dynamic predecessor of the joining jump. However, under unusual circumstances, the bottom instruction might be a conditional jump into the same previously-merged region, in which case the bottom instruction would be marked with the opposing determiner of one that marks the joining jump. Figure 6-13 illustrates this situation. In this unusual case, moving the joining jump's path determiners back would create ambiguity: when control comes through the bottom instruction, if the to-delete determiner of the new merge is true, then both the to-delete and the to-remain determiners of the previously-merged region are true. To avoid this ambiguity, solutions similar to the proposed solutions for the joining jump's source information can be used. To retain expected behavior: 1. Insert a null instruction on the to-remain path in this case also. This would make the program slightly slower, but hopefully smaller. 2. Abort the cross-jump of the new region. The resulting program would be larger than the fully optimized version, but the same speed as the unoptimized version. To fall back to truthful behavior: 3. Continue with cross-jumping, moving the joining jump's path determiners back to mark the bottom instruction in the to-delete path. fig60a.press leftMargin: 1.625 in, topMargin: .9375 in, width: 5.6875 in, height: 8.5 in Figure 6-13. Keeping the joining jump's sentry information. The Navigator compiler uses solution 1 because it matches the chosen solution for the joining jump's source information. Would these unusual circumstances ever arise? It seems very unlikely. The limited form of GOTO statements in Cedar ensures that the source version of a Cedar program can never pose this problem. But it is quite difficult to predict the effects of repeated applications of cross-jumping and other object code optimizations. Because I cannot prove that situations like this do not arise from repeated optimization, the Navigator compiler must be prepared to handle them correctly. 6.3.5 Efficiency: separate sourcelink cells Since the compiler generates several object code instructions for each source statement, the abstract code stream is actually organized a bit differently than I have previously indicated. Rather than having source information embedded in each instruction cell, this information is placed in separate sourcelink cells. The code generator produces a sourcelink cell as the first cell for each statement. Therefore every code stream cell between two sourcelink cells has the relationship to the source text that is described by the statically preceding sourcelink cell. No object code is emitted for a sourcelink cell. This organization saves space but adds processing complexity when compared with marking each instruction. The structure of a sourcelink cell mirrors the structure of the source information shown in Figure 6-1a. At the start of the object code optimization phase, all sourcelink cells have a single source descriptor in their sourcelists, with isStart true. Because the source table is concerned only with the object location for the beginning of a source statement (for setting breakpoints), retaining information for later construction of the source table is straightforward. When the compiler encounters a sourcelink cell in the to-delete code sequence during the backward scan, it need only move the sourcelink cell to the current position in the to-remain sequence (sourcelists of resulting adjacent sourcelink cells are merged). However, retaining information for the object table (for reporting the current execution location) is more complicated, for two reasons. First, source statements need not change at the same point on two merging paths. For example, if f1 and f2 are integer fields in a record rec, and a and b are integers, the two statements a _ rec.f1; b _ rec.f2 generate the same code as the single statement [f1:a, f2:b] _ rec. Second, cross-jumping may merge only a portion of a statement. A typical example of this case is merging a path that ends with c _ 1 with a path that ends with c _ 2. If bounds-checking is enabled, only the check and the store to c are merged. If the check fails, either statement could be executing. The compiler retains sufficient information in the sourcelink cells to create both the source table and the object table. The following cases specify the bookkeeping necessary when the backward scan encounters a sourcelink cell in either code sequence. 1. If a sourcelink cell is encountered in the to-remain code sequence, the to-delete code sequence is searched for the nearest sourcelink cell preceding the current point. The sourcelist from the to-delete code sequence is appended to the sourcelist from the to-remain code sequence. If the search crossed an instruction cell, the isStart field of each element of the to-delete code sequence sourcelist is set to false as it is appended. 2. If a sourcelink cell is encountered in the to-delete code sequence only, the to-remain code sequence is searched for the nearest preceding sourcelink cell. That cell is copied to the current point in the to-remain code sequence, setting isStart to false for each element of the sourcelist. The sourcelist from the to-delete code sequence is then appended to that sourcelist. To reflect the full relationship between the source text and the newly merged object code, there must be sourcelink cells that correctly describe the first piece of object code in the new merged sequence (sourcelink cell expansion), the first piece of object code following the to-remain sequence (sourcelink cell contraction), and the first piece of object code following the to-delete sequence (sourcelink cell completion). If these locations do not already have sourcelink cells, appropriate sourcelink cells must be created. Similarly, since only a few instructions in a code stream become path determining instructions, the sentry information is separated out into sentry cells, thereby reducing the size of most code stream cells by one field. In general, separate sentry cells are not much harder to handle than embedded sentry information. However, later object code optimizations must take care to treat a separate sentry cell and its associated instruction as a single unit. For expository ease, we now return to the assumption that source and sentry information are embedded in each code stream cell. 6.4 Interactions between optimizations This section describes how Navigator ensures that the debugging solution for one optimization is not destroyed by a later optimization. Inline procedure expansion occurs first. Then there is an object code optimization loop in which a few object code optimizations, followed by cross-jumping, are applied repeatedly. Finally, a few more object code optimizations are applied. 6.4.1 Cross-jumping after inline procedure expansion Because inline procedure expansion occurs earlier in the compilation process than the object code optimizations, an expanded inline procedure can be cross-jumped with statements outside that inline procedure (but inside the calling procedure). Because the algorithms were designed with these two optimizations in mind, no additional mechanisms beyond those described in Algorithms NI1 and NCJ2 are needed to handle this situation. The debugger can easily set and field breakpoints and report the current execution location. However, the possibility that a program region can be copied and then merged constrained the compile-time solution for providing a procedure traceback: each source descriptor must have a separate inline table pointer, since cross-jumping may move that statement away from the other statements that came from the same expansion. Figure 6-14 shows an example of inline procedure expansion followed by cross-jumping. An alternative solution for the inline procedure expansion optimization was considered. In this solution, the inline table information would have been embedded in the code stream: a startInline cell, stating the calling location (and called procedure name, which is constant for all source statements with that address) would mark the location of the deleted call, and an endInline cell would mark the location of the deleted return. Unfortunately, this form would have made later cross-jumping more difficult. During cross-jumping, the creation of surrounding startInline and endInline cells in the to-remain path could have been necessary if the to-delete instructions were the result of an expansion. But discovering whether a to-delete instruction came from an expansion could have required searching a large portion of the procedure's code stream. fig610.press leftMargin: 2.125 in, topMargin: 0.90625 in, width: 5.3125 in, height: 8.5 in Figure 6-14. Cross-jumping after inline procedure expansion. 6.4.2 Other object code optimizations after cross-jumping A more serious problem is that other control-flow optimizations can occur after cross-jumping. The difficulty arises when one of these later optimizations alters the code stream so that path determiner assignments no longer mark each of the ways to enter the merged region. One solution to this problem would be to prevent later optimizations from including an instruction with a path determiner assignment. A better (although ad hoc) solution is to ascertain how each optimization affects the placement of path determiner assignments and then individually inhibit unanalyzed or troublesome optimizations. The best solution is to develop a general (or at least centralized) mechanism for updating path determiner assignments when needed. The organization of the compiler's peephole optimization phase permits any of the implemented optimizations to occur after cross-jumping, possibly allowing a later optimization to interfere with the proper placement of path determiners. This section explains how correct determiner placement is ensured despite the effects of later optimizations. 6.4.2.1 Navigator peephole optimizations The peephole optimization phase improves a single procedure's object code in two stages. The first stage repeats until the opportunities for its optimizations are exhausted. First stage optimizations, in the order of their application, are: 1. Delete jumps to the next instruction, unreferenced labels, and multiple consecutive labels. 2. Redirect unconditional jumps whose destinations are also unconditional jumps to remove intermediate hops; this process is known as branch chaining. 3. Delete unreachable code. All instructions that lie between an unconditional jump and the statically following label are unreachable. 4. Replace a conditional jump around an unconditional jump by an opposite-sense conditional jump; this process is called condition reversal. 5. Perform cross-jumping. In the second stage, a collection of optimizations that affect instructions within a single basic block is applied once. In these optimizations, statically adjacent instructions are examined for opportunities to delete instructions or to combine them into a more powerful single instruction [Lamb80]. (These optimizations might profit from repetition also; only optimizations that modify or delete a single instruction without reference to surrounding instructions are guaranteed to have exhausted their opportunities after one pass.) 6.4.2.2 Why path determiners must be updated A path determiner assignment is intended to identify a given execution path at runtime, so that when execution reaches a merged region, the path that was traversed to get there can be used to distinguish among the different source alternatives. A path determiner assignment appears on a particular instruction only as a placeholder on that execution path. Path determiner assignments always mark executable instructions because a path determining instruction is replaced by a trap instruction for the determining breakpoint. The path determination portion of the cross-jumping algorithm locates all of the sentries of two sequences that are being merged; these are the newly-merged region's path determining instructions. From the definitions of "sentry" and "merged region", a path determining instruction is therefore an executable instruction that is an immediate dynamic predecessor of an instruction that executes on behalf of more than one source location. When a path determining instruction is moved or deleted, or when instructions are inserted immediately following a path determining instruction, other instructions become sentries of the merged region. Navigator identifies these new sentries by considering the local execution flow properties of the old path determining instruction, rather than by examining new dynamically following merged region(s) for unmarked sentries. 6.4.2.3 Primitive dynamic transformations The important aspect of path determiner placement is the position of the path determining instruction in every runtime execution path. The position is a dynamic position: the physical placement of instructions in the linear code stream is immaterial; what is important is which instruction executes before or after another instruction. Therefore, any transformation that alters the dynamic position affects the correct placement of the path determiner. I call these transformations dynamic transformations. The consideration of dynamic transformations captures the important features of an optimization with respect to path determiner placement more fully and less repetitiously than the effects of static transformations. For example, it makes no difference to path determiner placement whether an instruction is deleted from an execution path by means of physically removing the instruction from the code stream, or by rerouting instructions (that is, altering jump destinations) so that the execution path no longer passes through that instruction. The primitive dynamic transformations are dynamic deletion, dynamic insertion, and dynamic replacement. Note that a primitive dynamic transformation is a generalization of the corresponding primitive static transformation. Dynamic deletion occurs when an instruction is removed from a runtime execution path. Several static transformations cause dynamic deletion: deleting an instruction from the code stream, bypassing an instruction, and moving an instruction from one location in the code stream to another. An instruction x is bypassed when an execution path that passes through x is rerouted so that it no longer passes through x. Dynamic insertion occurs when an instruction is inserted in a runtime execution path. In static terms, dynamic insertion can be caused by inserting a new instruction in the code stream, by rerouting an instruction, and by moving an instruction from one location in the code stream to another. Dynamic replacement is simultaneous dynamic insertion and deletion; it is useful to consider it separately because it can be handled more easily than either insertion or deletion alone. Primitive static alterations to the code stream are: instruction insertion, instruction deletion, instruction replacement, instruction motion, and instruction rerouting. In order to discuss the dynamic positioning of instructions in the linear code stream, the following terminology is useful. A simple instruction is an instruction that has exactly one dynamic predecessor and exactly one dynamic successor. In a linear form, such as a codestream, a simple instruction's dynamic predecessor may or may not be its static predecessor. Similarly, in a linear form a simple instruction's dynamic successor may or may not be its static successor. For instance, the single dynamic successor of an unconditional jump instruction is the jump's destination instruction, which is usually not the jump's static successor. A collector instruction is an instruction that has more than one dynamic predecessor. In object code terms, a collector instruction is statically preceded by a label that is the destination of more than one jump (or one jump and a fall-through instruction). A selector instruction is an instruction that has more than one dynamic successor. In object code terms, a selector instruction is a conditional jump or an indexed jump. [Other machine architectures might define other selector instructions.] A collector-selector instruction is an instruction that is both a collector instruction and a selector instruction. 6.4.2.4 Effects of dynamic transformations on path determiner placement Dynamic deletion Suppose that there is an execution path x; y; z, where z is a merged instruction. Then y is a sentry of a merged region R that assigns path determiner d1. If a dynamic deletion removes y from that execution path (giving execution path x; z), the new sentry instruction of z along this path is x, which is y's dynamic predecessor along the original execution path. fig611.press leftMargin: 2.125 in, topMargin: 2 in, width: 5 in, height: 2.1875 in Figure 6-15. Bypassing: path determiner movement for dynamic deletion. If y is physically deleted, it is removed from all of its execution paths; in this case there may be multiple new sentries, corresponding to all of the dynamic predecessors of y. To update the sentry relationships when y is deleted from an execution path, each such new sentry instruction gets an additional determiner assignment for d1. Note that the old determiner assignments are not removed. fig612.press leftMargin: 2.125 in, topMargin: 1.9375 in, width: 5.0625 in, height: 2.125 in Figure 6-16. Path determiner movement for physical code deletion. Case 1. If no new sentry is a selector instruction, the new sentries represent exactly the same execution paths as y does (that is, some new sentry executes if and only if y does). In this case the deletion does not alter the sentry information. It is possible that some new sentry si is already assigns a different determiner pd2. This situation can arise because another merged region R( directly precedes R, as shown in Figure 6-17, or because three or more paths join to form R, as illustrated in Figure 6-18. Even though si assigns both d1 and d2 after the deletion of y, these two determiners do not conflict and path determination still works properly. Case 2. If some new sentry sk of x is a selector instruction, the new sentries represent different execution paths than y does, because execution can pass from sk to some instruction other than y. Normally this situation does not create difficulties, but if two further conditions hold, deleting y from that execution path can introduce ambiguity: (1) sk is already a sentry of R that assigns a different determiner d3, and (2) some instruction in R has one source descriptor that uses d1 (and possibly some other group D of path determiners), and a different source descriptor that uses d3 (and D). That is, d1 and d3 are the only differences between the determiner uses of two source descriptors for some instruction in R. In this case, deleting y will make it impossible to distinguish between those two source alternatives whenever R is entered via sk. If the system designers are willing to sacrifice a small amount of optimization for better information, y should not be deleted when these conditions hold. If an instruction that is to be deleted has multiple determiner assignments, all of its determiner assignments are added to the sentry information of each new sentry. fig614a.press leftMargin: 2.5 in, topMargin: .5 in, width: 3.5625 in, height: 8.5 in Figure 6-17. Merged regions can abut. fig614b.press leftMargin: 2.5 in, topMargin: .5625 in, width: 3.5625 in, height: 8.5 in Figure 6-18. Merged regions can overlap. When several dynamically consecutive instructions are deleted in one step, there are two cases. If the deletion is only dynamicthat is, it is a bypassing transformationthen by definition there can be only one new sentry along that execution path: it is the dynamic predecessor of the first bypassed instruction. Otherwise, if a sequence of instructions I is selected for physical deletion, then I can be treated as a unit as long as it has a single entrance. The new sentries are all of the dynamic predecessors of the unique entrance. All determiner assignments on any instruction in S are added to the determiner lists of the new sentries. If I has more than one entrance, it must be partitioned into sections that each have a unique entrance before proceeding. A special case of the procedure to update path determiner assignments deserves mention. If a path determining instruction has no dynamic predecessors, its determiner assignments are not added to any instructions and therefore disappear when it is deleted. A path determining instruction that has no dynamic predecessors is unreachable. Although it represents a sentry of a merged region, that sentry is never traversed at runtime, so removing the determiner assignments does no harm. Dynamic insertion To ensure that no execution path can sneak into a merged region wthout passing a determiner assignment, a dynamically inserted instruction x must copy or move all of its predecessors' determiner assignments down to it. The determiner assignments must be copied for a solely dynamic insertion and moved for a static insertion. However, when the insertion is between the top instruction of a merged region and its preceding label, the inserted instruction is logically part of the merged region and receives no determiner assignments for that region. Dynamic replacement If a path determining instruction x is replaced in an execution path by a new instruction y, x's determiner assignments can be moved to y as long as y has the same dynamic successors in the same semantic situations as x. An example of a replacement instruction that has the same dynamic successors in different (in fact, opposite) semantic situations is JumpIfEqual L replaced by JumpIfNotEqual L. If a replacement causes a change in the successor relationships, it can be decomposed into its constituent deletion and insertion. 6.4.2.5 Analysis and examples of Navigator peephole optimizations First stage optimization 1: deletion of jumps to the next instruction The optimization that removes jumps to the following instruction acts as either a dynamic deletion or a dynamic replacement. An unconditional jump to the following instruction is physically deleted. A conditional jump to the following instruction is replaced by a POP, which removes the conditional jump's operand from the stack (recall that the Cedar virtual machine is a stack machine). unoptimized optimized unoptimized optimized a a, pd1 a a jump L,pd1 cjump L,pd1 POP, pd1 L: L: L: L: c c b b This optimization performs the final step in creating an ambiguous region when both parts of an IF-THEN-ELSE statement are identical. As suggested above, this step could be inhibited in order to provide expected behavior rather than truthful behavior. Figure 6-19 shows the steps in the creation of the ambiguity. source fragment after cross-jumping 10 Const1: INT = 7; 11 Const1: INT = 7; 9 9 9 60 b _ 5; b _ 5; 61 IF cond THEN IF cond THEN pd1 62 a _ Const1 pd2 63 ELSE ELSE 62 a _ Const2; a _ 7; 621 642 after deleting jump to next instruction after deleting unused test b _ 5; b _ 5; pd1, pd2 IF cond THEN pd1, pd2 a _ 7; 621 642 a _ 7; 621 642 Figure 6-19. Steps in the creation of essential ambiguity. First stage optimization 2: branch chaining Branch chaining removes intermediate hops in chains of unconditional jumps by redirecting the first jump so that its new destination is the first instruction in the chain of destinations that is not an unconditional jump. The branch chaining optimization acts as a dynamic deletion. Multiple instructions can be bypassed at once in the case of long chains. Therefore, all determiner assignments that mark any of the intermediate jump instructions must be added to the sentry information of the new direct jump. Another possible form of this optimization redirects all of the jumps in the chain to the final destination at once. This alternate form still requires an entire pass through the code stream, and requires more bookkeeping to update the path determiner assignments correctly. First stage optimization 3: unreachable code deletion Unreachable code deletion also acts as a dynamic deletion, but since the code to be deleted has no dynamic predecessors (by definition of unreachable code), the instructions and their determiner assignments can be deleted with no further processing. First stage optimization 4: condition reversal The optimization that replaces a conditional jump around an unconditional jump by an opposite-sense conditional jump acts as a dynamic deletion of the unconditional jump. First stage optimization 5: cross-jumping Repeated cross-jumping is described in Section 6.4.4. Second stage optimizations: intra-basic block pattern-matching The second stage optimizations are pattern-matching optimizations in which fixed sequences of two to five instructions, all in the same basic block, are deleted or replaced by faster sequences of instructions. In some cases the new sequence of instructions is longer than the original sequence. These optimizations act as dynamic deletions, dynamic insertions, or dynamic replacements. For simplicity, the current Navigator implementation inhibits these optimizations whenever any instruction in a pattern has determiner assignments. Because no control flow is altered, and no sequence is completely deleted, these changes cannot cause problems. Therefore, it would be reasonably straightforward to funnel all of the changes through a single routine that would push all determiner assignments in a sequence to be replaced to the bottom of the new version of the sequence. Extensions: other peephole optimizations A more extensive array of peephole optimizations than those implemented in the Cedar compiler is possible. For example, the Bliss-11 compiler includes peephole optimizations that can modify instructions that are arbitrarily far apart in the code stream (both statically and dynamically) to combine literal constants as operands, to exploit the PDP-11's auto-increment and auto-decrement instructions, to use faster addressing modes, and to remove redundant stores. For correctly updating the path determiner assignments, these optimizations could be decomposed similarly into dynamic insertions, deletions, and replacements. 6.4.2.6 Improvements to this path determiner updating scheme We really want to mark an execution point corresponding to a sentry of a merged region, but we need to put the determiner assignment somewhere once the cross-jumping and path determination algorithms identify the spot. At runtime, a determiner assignment must mark an executable instruction so that it can be replaced by a determining breakpoint. The complicated transformations on determiner assignments that are needed to track other peephole optimizations can be viewed as the result of binding sentries to instructions too early, since later optimizations can alter sentry relationships. It might be possible to create region sentry nodes during cross-jumping, then propagate determiner assignments back to all predecessors of each sentry node laterfor example, during code emission. Then what happens outside the region during later optimizations does not affect path determination. As an implementation of this idea, a label could act as a region sentry node if (1) every sentry corresponding to a different path gets a distinct label and (2) later optimizations do not remove multiple consecutive labels. A sentry node for the fall-through sentry is also needed. Determiner assignments would mark sentry nodes until code emission. Because multiple adjacent labels are allowed, the proposed scheme must use a slightly more complicated test for jump destination equality. The current test for jump destination equality is simple: two jump destinations are equal if and only if they have the same label name. 6.5 Extensions to Navigator for secondary debugging functions Navigator provides effective debugging for inline procedure expansion and cross-jumping for the primary location- and history-oriented debugging functions: setting breakpoints at program statements, determining the current execution location, and reporting the procedure traceback. This section describes ways to extend Navigator to provide effective debugging for the secondary debugging functions also. 6.5.1 Inline procedure expansion The secondary debugging functions that are affected by inline procedure expansion are: displaying the values of parameters and variables when inside expanded inline procedures, setting breakpoints at the entry or exit of inline procedures, calling inline procedures from the debugger's interpreter, and single-stepping with coarse granularity. 6.5.1.1 Displaying the values of local variables or parameters of inline procedures As alluded to in Section 6.2.3, step 2B2a of the inline procedure expansion algorithm, which describes the substitution of actual parameters for formal parameters, is an oversimplification. If the evaluation of an actual parameter involves runtime computation, the substitution is performed similarly to an ordinary procedure call: nodes to perform the computation and assign the result to the formal parameter are generated. However, if an actual parameter is simple (such as a compile-time constant expression or a simple variable), it is substituted directly for the formal parameter throughout the expanded procedure (parameter subsumption). One alternative would be to disallow parameter subsumption. This would result in a less efficient program. To handle parameter subsumption, Navigator would have to record the value of the compile-time constant or the name of the actual parameter in the symbol table entry for the formal parameter, so that requests to display the formal parameter get the expected response. Of course, if the actual parameter were a variable whose value were subsequently changed from inside the inline procedure (by a global assignment), then the value of the formal parameter would no longer be available. A more serious problem would arise for debuggers that allow modification of variable values: the user could not change the value of such a parameter at runtime. These considerations are outside the scope of Navigator, which concentrates on location-oriented debugging functions. Local variables present a minor way in which Navigator does not provide expected behavior. Upon request, the Cedar debugger will list all of the local variables and parameters of the current procedure. With respect to the symbol table, an inline procedure becomes an inner block of its enclosing ordinary procedure. Therefore, when the user requests the local variables inside an inline procedure, the debugger reports the variables for the inline procedure, the inline procedure's parameters (which appear as ordinary variables), as well as the variables for the enclosing ordinary procedure (and any enclosing inline procedures). The extra variables could be removed by recording inline procedure information in the symbol table as well as the statement maps (possibly through the same inline table). A request to list the current local variables would only list the contents of those blocks belonging to the inline procedure. 6.5.1.2 Breakpoints at procedure entry and exit The Cedar debugger also allows a user to place breakpoints at the entry or exit of a procedure, by specifying the procedure's name (as opposed to statement breakpoints, which use a statement number and module name). These entry/exit breakpoints are normally implemented by looking in the symbol table entry for the procedure name, where the first and last object code locations for the procedure's code are recorded. The Cedar compiler ensures that all source-level RETURN statements jump to a single procedure exit at the end of the procedure's object code. The code extent information in the symbol table is part of the debugger's mapping from object code location to procedure name for ordinary procedures. If the Navigator debugger is to provide entry/exit breakpoints, there are two possible solutions. The first solution would be to record a list of pairs of first and last object code locations in the symbol table entry for an inline procedure. For an imported inline procedure, each implementation module would keep its own list. Break entry and exit would set breakpoints at all of the indicated object locations. The second and preferable solution would be to record the first and last statement numbers of each inline procedure in its symbol table entry. Imported inline procedures would need no special processing. In this case, break entry and exit use the source table to translate the indicated statement numbers to the corresponding object locations. Either of these two solutions has a potential problem: it is important that later optimizations do not remove source information corresponding to the entry or exit of an inline procedure. It is typically easier to ensure proper behavior of later optimizations if indicators are present in whatever code form will be transformed, so that undesirable optimizations can be inhibited or symbol table information can be updated. One way to handle this problem is to flag an expanded statement number as the first (or last) statement in an inline procedure. Among the Cedar optimizations that could endanger the entry or exit information are unreachable code deletion (entry or exit) and branch chaining (exit). Of course, code reordering optimizations would create a difference between semantic and syntactic mappings for the entry or exit of an inline procedure. [Recall that the Cedar compiler does not perform code reordering optimizations.] 6.5.1.3 Calling an inline procedure from the debugger's interpreter The Cedar compiler does not emit stand-alone object code for an inline procedure. As a result, the Cedar debugger does not permit users to call an inline procedure at runtime. This capability can be provided in several ways: 9 The Navigator compiler could emit stand-alone object code for inline procedures. 9 The Navigator debugger could interpret an inline procedure's abstract syntax tree. 9 The Navigator debugger could use the compiler's code generator interactively to create an executable representation of an inline procedure when necessary. The Navigator compiler could emit stand-alone object code for inline procedures. In two situations, this solution is particularly inefficient: imported inline procedures and one stylistic use of inline procedures. Since definitions modules are not permitted to contain executable code, each program module that uses an imported inline procedure would need its own copy of the object code. Some Cedar programmers use the inline procedure mechanism as a way of structuring their programs without paying an execution penalty: they separate an ordinary procedure into logical parts (parts that are nevertheless specific to this procedure) and declare each part to be an inline procedure. The result is that each of the inline procedures is called exactly once. If the compiler were to create a stand-alone copy of the code for each inline procedure, the code size of a module constructed in this way could double. A better solution is to leave the abstract syntax tree for an inline procedure in the symbol table. (For inline procedures declared in definitions modules, this is the status quo: it is the communication method from definitions module to program module. For inline procedures declared in program modules, the Cedar compiler deletes the tree before writing the object file.) Either the debugger could interpret the abstract syntax tree when a user calls an inline procedure from the debugging interpreter, or it could use the code generator interactively to create an executable representation of an inline procedure when necessary. 6.5.1.4 Single-stepping with coarse granularity The major problem with single-stepping is keeping the granularity correct from the source viewpoint. This means that whenever a user requests single-stepping with coarse granularity inside a procedure P, the debugger must not report the execution of any statement in an inline procedure called from P or its callees. Single-stepping with fine granularity is not affected. 6.5.2 Cross-jumping The secondary debugging functions that are affected by cross-jumping are single-stepping inside merged regions and displaying the values of variables inside merged regions. 6.5.2.1 Single-stepping Actually, single-stepping inside merged regions works reasonably correctly with the algorithms described. Because determining breakpoints collect control-flow information in a way that can be used by any debugging activity, as long as a region's determiners are activated early enough, single-stepping can use the information also. This property need not have been true. Another way to implement path determination would have been to have a determining breakpoint collect control-flow information for a specific future point. Then it would have been harder for some other place, even though inside the same merged region, to use that information. The "early enough" issue also works properly. Since we require that a path determining instruction is a sentry for the merged region, and a statement-level single-step always executes at least one instruction, it is "early enough" as long as execution has not already entered the merged region. Obviously, if single-stepping is requested after the merged region has already been entered and the determiners were not activatedthat is, the user has neither set breakpoints inside the region nor issued a SUSPECT command for the enclosing procedurethen the debugger can report only a list of source alternatives at the next stopping point. In fact, since statement boundaries need not occur in the same place on two merging paths, in this situation the next stopping point might be inside a statement on one path but at a statement boundary in another. The debugger would have to stop at all such locations. 6.5.2.2 Displaying values of variables If two merged paths each declare local variables, the runtime path determination algorithm's choice of the correct source statement must be used to access the proper variable identifiers and type information for the correct path. As previously noted, the association of variable identifiers with variables is dependent on the correct source statement, while the association of variables with storage locations is dependent on the correct object code location. If the debugger were to use the object location to access the symbol table information for a variable, it might find the wrong variable declaration. 6.6 Summary This chapter has explained the implementation details and rationale behind the Navigator system. The Navigator compiler includes changes to 2CHAPTER 6: COMPLETE NAVIGATOR ALGORITHMS Êê–;(firstPageNumber) 129 .def (firstHeadersAfterPage) 129 .def–"thesis" style˜Iblock•Mark centerHeaderš ÐlsÏsžœžœžœž ™)J™chapheadšœ)˜)I thesisbodyšœÌ˜ÌMšœ-Ïiœˆ˜Ç—headšœ˜M˜ÝMšœ¨˜¨MšŸœk˜ˆI pagebreakšœ˜Icenter• ArtworkClass IncludePressšœZ˜ZI bigcenteršÏb-˜-MšŸœÒ˜ëMšŸ"œùž œ=˜áMšŸ œ{˜„˜Mšœ¹˜¹Mšœ¤Ÿœe˜—Mšœ™˜™Mš œ™Ÿ œ ŸœdŸœŸœUŸœŸœù˜˜Mšœû˜û—˜-Mšœ†˜†MšŸ-œç˜”MšŸ)œ¾žÜ˜ÃMšŸ-œ˜Ê——˜MšœÌŸœPŸœ¨˜öMšœjŸœžœÚ˜ÌO˜P– IncludePressšœV˜VQš I˜IMšœ…˜…Mšœ¦˜¦MšœÏf œv¡ œ8˜á˜Ltalg1šœ˜talgš ŸÐfi Ÿ¢ ŸÑfiz Ðiz£ Ÿ¢Ÿ˜ƒSšŸ¢ Ÿe˜…SšŸ ¢ Ÿ[˜qSš¤…£ ¤T˜åSš¤F£ ¤Y˜©— šŸ¢ Ÿ*¢ Ÿ˜TSšŸ¢ Ÿ¢ ŸN˜š——Ršœ/˜/ šœ˜ šŸ$˜$SšŸ ¢ Ÿ$˜7SšŸ ¢ Ÿ8¢ Ÿ˜nSš¤ £ ¤>˜SSš¤ £ ¤~˜‘— šœ'˜'Sšœ 6œˆ˜Ã šœ 3œ˜:Sšœ°Ÿ¢ Ÿ.¢ Ÿ-ÐiuŸÆ˜îSšŸt¢ Ÿ¢ Ÿ}˜™— šœ 'œ˜.Sšœq˜qSšŸ¢ Ÿ"¥Ÿ˜GSšŸ.¢ Ÿ ¢ Ÿ+˜x——S˜9—S˜—RšÏz˜SšÏuœ5¡ œ.¡ œ1˜©Sš§œ9¡ œ9¡ œ˜¡O˜˜IMšœOž3œâ˜ä˜ Mšœ¯˜¯Mš œdÏkœ¾¡ œ0¡ œ6¡ œ5¡ œ0˜£MšœI¡œ¡œ*žœ!¡œ$¡œ¡œ3¡œ¡œ¡œ,¡œ˜ø—˜"Mšœñ˜ñP– IncludePressšœV˜VQš @˜@MšœÕ˜ÕM˜øMšœé¡ œ+¡ œX˜€Mšœ…Ÿœ£Ÿ œª˜ãMš œœŸœŸœUŸœŸœt˜‚O˜P– IncludePressšœZ˜ZQš =˜=Mšœd¡œq¡œ˜ß——˜@M˜œMšœµž œ²žJœ¦˜âM˜’O˜——šœ˜MšœÏ˜ÏMšœçŸ ˜ôMšœ¯ ˜¯ O˜˜‘ItenumšÑmvxœ0˜1Tš©œ/˜0Tš©œB˜C—Itblock˜¤˜rItdiscussšŸœÝ˜úVšŸœ½˜×VšŸ œÀ˜Ë—šœ$˜$Mš'œ!Ÿ œŸœŸœŸœ&Ÿœ Ÿ œŸœŸœŸœ#ŸœŸœŸœŸœ(Ÿœ½Ÿ œŸ œ5ŸœŸœÉ˜åMšœŸœè¡œ¡œ*¡œ¡œ ¡œ ¡œ°˜“—˜;MšœÜ˜Ü˜#Mšœù˜ùMšœB˜BIdsindentš œŸœIŸ œŸ œYŸœ!Ÿ œ{˜†WšœŸœ¤˜¬WšœŸœE˜OMš œ†Ÿ œ Ÿ œ«Ÿ œ3ŸœÀŸœ_Ÿœ˜òM˜ITš©œ˜˜™Tš©œ€˜Tš©œ]˜^O˜P– IncludePressšœ\˜\Qš <˜ž œ?˜åMšœãžÕ˜¸Mš œåžœ:žœAžœžœ0žœžœ ˜ÎM˜ûMšŸ0œÛž’œƒ˜  MšŸHœÐžiœ€˜ —˜>M˜ˆ—˜NM˜¥O˜U˜£Mšœ²Ÿœ§˜öMšœŸœü˜¨M˜†MšœŽ˜Ž——˜PM˜«˜NMšœŸœÇ¡œ9Ïx¡œ˜°Mšœ¡¡ œ³¡ œ¸˜žMšœ  œ²˜¾O˜P– IncludePressšœV˜VQš ,˜,Ušœ‚Ÿœ›˜ Mšœ³¡œÉ˜ƒ MšœƒŸœ¯˜ÉO˜P– IncludePressšœX˜XQš I˜IP– IncludePressšœW˜WQš I˜IMšœB œ* œ¡ œ œ  œ œb œ7± ± œ˜ôMšœŸ±œ˜¢šœq¡Ðdfœ¡²œ`˜×Wš œ¡²œÏmœ±¡²œ˜.Pšœ˜Wš œ¡²œ³œ±¡²œ˜/—MšœÂ˜ÂMšœù˜ùM˜ûM˜ºO˜—P– IncludePressšœV˜VQš \˜\˜O šœ˜Sš ŸU¢Ÿ ÑfijŸ´¢Ÿ'´ Ÿ ¤x˜ªSšŸB¢ Ÿ>¢ Ÿ¤,£ ¤˜Ð—RšœŸœß˜òR˜B ˜(SšœA˜A ˜[ ˜$Sšžœ  ˜SšœÈ˜ÈSšŸG˜G šœJ˜J šŸ&¤ Ÿ ˜= šŸ5˜5SšŸ´Ÿ ®Ÿø¤ Ÿn˜”SšŸG¤ ˜ç—SšŸ@¤ Ÿm˜¶——S˜D šœŠ˜ŠSšžœ ˜SšŸ0¤7Ÿƒ¤6˜ Sš¤§˜§SšœJ˜JS˜xSšžœ ˜— šŸF˜FSšŸ³˜³SšŸL¢ Ÿ¤;˜§—SšœŸ œ˜ž ˜SšŸ¤˜¤— šŸ8˜8SšŸ\¢ Ÿ¤:˜¤—Sšžœ  ˜————šœ Ÿœ/˜CMšœ@ž®žœë˜ÎMšœÔžrœÌŸœÃ˜Ü Mšœø˜øšœ˜XšœŠ˜ŠXšœ¢˜¢—˜"Xšœ…˜…—P– IncludePressšœ\˜\Qš <˜Mšœì˜ì—˜(M˜ò——˜