[Indigo]<Voice>pollez>ThesisDone>NewChap7!3

CHAPTER 7: ANALYSIS OF THE PATH DETERMINATION ALGORITHMS

Chapter 7

Analysis of the Path Determination Algorithms

The preceding chapter explained Navigator's compiler and debugger algorithms in detail. This chapter carefully investigates the compiler and debugger path determination algorithms. The first section presents proofs that the algorithms work correctly. The second section analyzes the efficiency of the algorithms and the resulting structure of path determiner uses and assignments for several forms of repeated merging. The third section reports on several attempts to improve the efficiency by changing the compiler's path determination algorithms, each of which failed to handle some programs or some debugging scenarios correctly. The fourth section informally describes a separate minimization post-pass that can effectively reduce the complexity of the path determination structures.

7.1 Proofs of correctness

This section shows that the compiler and debugger path determination algorithms work together properly to provide correct path determination at runtime. That is, (1) when a breakpoint is requested early enough in the program's execution [or when the SUSPECT command is issued early enough], the debugger provides the single correct source alternative for execution locations inside the breakpoint's merged region, and (2) otherwise, the debugger reports precisely those source alternatives that are possible for the current execution location.

7.1.1 Compiler

To show that the compiler's path determination algorithm works correctly, we must prove that the following properties hold:

C1. There is no way to enter a merged region without passing an associated determiner assignment. In other words, every sentry of a merged region assigns one of the two determiners for that region.

C2. Every source descriptor (sentry descriptor) uses the correct determiner.

C3. For every object instruction x before cross-jumping, x's source descriptor marks x's corresponding instruction after cross-jumping. (This is what is needed for truthful behavior.)

C4. No non-essential ambiguities are introduced by the path determination algorithm.

C5. In the absence of essential ambiguity, every source alternative has a different set of determiner constraints to satisfy.

C6. Other object code optimizations do not invalidate the preceding conditions.

Two lemmas will be useful:

L1. Each simple merged region has exactly one entrance: the top instruction in the region.

L2. Every immediate dynamic predecessor of the entrance to a merged region (that is, every sentry of the region) is outside the region.

L1. Each simple merged region has exactly one entrance: the top instruction in the region.

Proof. Recall that the cross-jumping and path determination algorithms assume that the destinations of all non-sequential control flow are explicitly marked by labels, each of which precedes the labelled instruction. Since the cross-jumping algorithm never crosses labels, only the top instruction in a merging sequence can be labelled. Furthermore, sequential control flow can only enter a merging sequence through the top instruction. Therefore, each merged region has exactly one entrance: the top instruction in the region. Ž

L2. Every immediate dynamic predecessor of the entrance to a merged region (that is, every sentry of the region) is outside the region.

Proof. Suppose there is a sentry s that is inside the merged region that it guards. By L1, control can only enter the merged region at the top instruction in the region. The immediate dynamic predecessor of the top instruction for sequential control flow is the immediate static predecessor, which is outside the region because the top instruction is defined to be the statically earliest instruction in the merged region. Therefore s must correspond to non-sequential control flow, i.e. a jump instruction. A forward jump to the top instruction is also outside the region by the definition of the top instruction. Therefore, if s exists, it must be a jump instruction that branches backward to the top instruction of the merged region. But the cross-jumping algorithm expressly prohibits such jumps inside merging sequences. It disallows unconditional jumps of any kind, and it also disallows backward conditional jumps. Ž

Lemma L2, and the theorems that depend upon it, are necessary for the following reason: If a sentry s of one of the merging sequences were inside the region, then the history information collected by the path determiner assignment(s) at s might incorrectly identify the source statement for an instruction executing inside the region. To see this, suppose that a breakpoint is placed inside the merged region, activating the region's determiners. When execution reaches the determining breakpoint associated with s, the determiner(s) assigned by s would acquire a more recent timestamp. This would be correct when execution proceeds from s back to the entrance (top instruction) of the region. But when execution proceeds from s on to other instructions in the region (that is, when s is a backward conditional jump), the debugger would always conclude that s and any statically following instructions in the region are executing on behalf of the merging sequence that s guards. This would sometimes be wrong.

C1. There is no way to enter a merged region without passing an associated determiner assignment.

Proof. In other words, we want to ensure that some instruction on every path to every entrance of the merged region has a determiner assignment for that region. By L1, we need only ensure that the immediate dynamic predecessors of the top instruction (that is, the sentries) have determiner assignments for the region. The cross-jumping algorithm adds a to-remain determiner assignment to every immediate dynamic predecessor of the top instruction on the to-remain path. It adds a to-delete determiner assignment to the crossing jump, which is the single immediate dynamic predecessor of the top instruction on the to-delete path. There are no other immediate dynamic predecessors of the merged top instruction. Therefore, all sentries of the merged region have a determiner assignment for that region. Ž

C2. All source descriptors and sentry descriptors have the correct determiner use(s).

Proof. Part A: source descriptors. Every instruction has a source descriptor. Before the merge takes place, the path determination algorithm adds a to-remain determiner use to all source descriptors of all instructions in the to-remain sequence, and a to-delete determiner use to all source descriptors of all instructions in the to-delete sequence. The merge consists of merging the two lists of source descriptors for each instruction. Therefore the resulting instruction has at least one source descriptor that uses each determiner; no source descriptor uses both a determiner and its opposite.

Part B: sentry descriptors. A path determining instruction is an instruction with one or more sentry descriptors. Before the merge, the path determination algorithm adds an indirect to-remain determiner use to all sentry descriptors in the to-remain sequence, and an indirect to-delete determiner use to all sentry descriptors in the to-delete sequence. At runtime, the debugger will compare the timestamp of a determiner that is used indirectly with the timestamp of its opposing determiner. In contrast with source descriptors, a sentry descriptor can indirectly use both a determiner and its opposite. This situation arises when the same instruction in each merging sequence assigns the same determiner; it does not mean that correct information has been lost (see determiner assignment 5 in Figure 7-4). Ž

C3. For every object instruction x before cross-jumping, x's source descriptor (sentry descriptor) marks x's corresponding instruction after cross-jumping.

Proof. Part A: source descriptors. Since the cross-jumping algorithm merges source entries from matching instructions, this is certainly true for instructions other than the joining jump. The source descriptors on the joining jump are moved to mark a new null instruction at the bottom of the merged region. [See the discussion in Section 6.3.2.4.] They cannot mark the dynamically preceding instruction (i.e., the bottom instruction of the sequence) because 1) the bottom instruction might be a conditional jump, in which case a breakpoint there would give control to the user more often than it should, and 2) since a primary breakpoint is visible to the user, the user will expect any effects of the bottom instruction to have already taken place when the breakpoint is reached.

Part B: sentry descriptors. Again, since cross-jumping merges sentry descriptors from matching instructions, the only problem area is any sentry descriptors on the joining jump. These are moved backward to the single dynamically preceding instruction before the merge begins. Because a determining breakpoint is invisible to the user, we must only ensure that the new sentry location and the old sentry location cover the same paths. There is exactly one dynamically preceding instruction because

1) there was a match, which triggered the merge (thus the statically preceding instruction is in the to-delete sequence),

2) all non-sequential control flow is explicit and neither merging sequence can contain an embedded label (thus there can be at most one dynamic predecessor), and

3) neither merging sequence can contain an unconditional jump (thus there is at least one dynamic predecessor).

This dynamically preceding instruction can be a conditional jump, in which case it could cover more execution paths than the original joining jump did. This can cause a problem only if the conditional jump already had a determiner assignment for the opposing determiner. To avoid the ambiguity that would result from moving the sentry descriptors backward in this special case, the cross-jumping algorithm inserts a new null instruction on the to-remain path to keep the path determiner assignments separate. Ž

C4. No ambiguities are introduced by the path determination algorithm.

Proof. An ambiguity is introduced when a single instruction is a sentry of both merging sequences; that is, a single instruction is an immediate dynamic predecessor of an entrance of both the to-delete sequence and the to-remain sequence. By L1, the only possibility for the entrance is the top instruction in each merging sequence. All instructions that were immediate dynamic predecessors of the top instruction in the to-remain sequence will continue to be so; all instructions that were immediate dynamic predecessors of the top instruction in the to-delete sequence will become immediate dynamic predecessors of the crossing jump after the merge. Therefore, no instruction is a sentry of both sequences.

We must also ensure that repeated cross-jumping does not merge two opposing sentries without giving each of them a different path determiner use. The primary merging action of cross-jumping merges instructions from opposing sequences, after each opposing sentry descriptor gets a determiner assignment for its sequence's determiner. Thus the primary merging action keeps sentries separate. The secondary merging action "merges" the joining jump with: its immediate static predecessor, a corresponding jump on the to-remain path, or a null instruction on the to-remain path. When the joining jump is merged with its immediate static predecessor, sentry descriptors from both instructions are given to-delete determiner uses. If these descriptors include assignments for opposing determiners, the opposing determiners will be indistinguishable at runtime. This is the case in which the secondary merging action "merges" the joining jump with a null instruction on the to-remain path, so no problems arise. Ž

C5. In the absence of ambiguity, every source alternative has a different set of determiner use constraints to satisfy.

Proof. The compiler updates the source information by adding a use for determiner d1 to all source descriptors in one sequence and adding a use for the opposing determiner d1 to all source descriptors in the other sequence. Therefore all source descriptors in one sequence must satisfy

latest timestamp (d1) e latest timestamp (d1),

while all source descriptors in the other sequence must satisfy

latest timestamp (d1) e latest timestamp (d1).

This is a clear partitioning. [If latest timestamp (d1) = latest timestamp (d1) and the timestamp value is the initial value (i.e., there is no timestamp cell for the current procedure invocation), then either the determiners were never activated or they were activated after control had already entered the merged region. If the two timestamps are equal and do not have the initial value, then ambiguity has crept in. By C4, ambiguity can be caused only by later object code optimizations. See C6 below.] Before the first merge, each copy of each instruction has one source descriptor. Determiner uses are added in partitioned pairs, as above. Therefore, in the absence of ambiguity, every source alternative has a different set of determiner use constraints to satisfy. Ž

Note that the analog of C5 does not necessarily hold when a single instruction has multiple determiner assignments. When such a path determining instruction is inside a later merging sequence, all of these determiner assignments acquire an indirect determiner use for the same determiner: the determiner for that merging sequence.

C6. Other object code optimizations do not invalidate the preceding conditions.

Proof. Section 6.4.2 considers this question explicitly. To summarize, there are three ways to alter "interesting" control-flow with respect to a merged region: a transformation can insert a label before an instruction in the merged region, delete a sentry instruction, or insert an instruction between the sentry and the entrance on some execution path. Only the cross-jumping algorithm is permitted to insert labels in the code stream. For any transformation that inserts or deletes instructions, added processing ensures that all new sentries of the merged region get the proper determiner assignments.

A dynamic deletion could potentially introduce ambiguity: determiner assignments must be moved backward and can collide with another determiner for the same region. Checking for an opposing determiner assignment is not sufficient here, because there may be a complicated hierarchy of indirect determiner uses. However, if two determiners for different (non-overlapping) regions collide, no ambiguity can result.

A determiner assignment cannot be pushed backward (via instruction deletions) past any determiner assignment on which it depends. A determiner assignment d1 has an indirect use for d2 and therefore depends on determiners d2 and d2 when d1 is inside a merging sequence whose two determiners are d2 and d2. Therefore, either d2 or d2 occurs before d1 in every execution sequence that includes d1. Any kind of instruction deletion pushes all determiner assignments for that instruction backward along every affected execution path (that is, all execution paths for a physical deletion and a single execution path for a dynamic deletion). It does not push the determiner assignments back statically. Thus a determiner assignment d1 and a determiner assignment d2 on which d1 depends may end up marking the same instruction, but d1 cannot pass d2. To cater for this possibility, all determining breakpoints on an instruction should be honored before any primary breakpoints on that instruction. Ž

7.1.2 Debugger

To show that the debugger's path determination algorithm works correctly, we must prove that:

D1. Setting a breakpoint inside a merged region activates enough path determiners to discriminate among the source alternatives for that instruction.

D2. Setting a breakpoint when the execution point is already inside a merged region never causes incorrect behavior (although it may cause only truthful behavior).

D1. Setting a breakpoint inside a merged region activates enough path determiners to discriminate among the source alternatives for that instruction.

Proof. Every source descriptor in each merging sequence has a path determiner use for every merged region it participates in. As noted above, since source descriptors from each merging sequence combine to form the merged source information, each instruction therefore has a to-remain determiner use and a to-delete determiner use for each merge in the object table. When a path determining instruction appears in a merging sequence, its path determiner assignments get an indirect determiner use for its sequence; for every such indirect determiner use, the debugger activates both that determiner and its opposite. As a result, every instruction that is inside any merged region has its source descriptors and its sentry descriptors (if any) connected to the determiners for that region. [For path determining instructions, this connection can be through a chain of indirect determiner uses.] By C1, determiner assignments mark every sentry of every merged region. Thus every instruction in a merged region can be separated into the two original instructions corresponding to its to-remain and its to-delete sequences, according to the determiner assignments attached to the most recent sentry of the region. Every path determiner assignment inside a merged region can be separated similarly. By possibly recursive path determiner activation, every such separation can be made. Ž

D2. Setting a breakpoint when the execution point is already inside a merged region never causes incorrect behavior (although it may lead to only truthful behavior).

Proof. A computation reaches a sentry of a particular merge exactly one instruction before entering the merged region. Since every sentry of every merged region has a determiner assignment for one of the region's two determiners, including sentries of subregions of repeatedly merged regions, that is the last time that the determiners for that region can be activated and do their job properly. Consider the following program schema:

fig70a.press leftMargin: 3.625 in, topMargin: 1.625 in, width: 2.375 in, height: 1.3875 in

One execution path is x;c;y. The other is x;b;c;y. Suppose that execution is stopped at b via a user interrupt or a breakpoint in a procedure called from b. No path determiners for c have been activated yet. A breakpoint is then placed on statement 2, activating determiners 1 and 2. Although x (pd1) has just executed and it is therefore too late for its timestamp to be recorded, it's not too late for b (pd2). Thus path determination proceeds correctly.

Repeated merging can create situations in which some determiners that are used inside a region R have already been activated, while the determiners for R itself have not been activated. By D1, setting a breakpoint inside a merged region activates enough determiners to discriminate among the source alternatives for that instruction. That may not be enough determiners to discriminate among the source alternatives for some other nearby instruction. To see this, consider the following example.

1 SELECT x FROM 1 SELECT x FROM
#"#"#"#" "#"#"#"
10 1 => {x ← 20; 10 1 => {x ← 20;
11 P[3]; <go to L2>}; pd4
12 d ← 4};
#"#"#"#" "#"#"#"
20 3 => {y ← 10; 20 3 => {y ← 10;
21 b ← 1; B2 <go to L1>}; pd2
22 c ← 2;
23 P[3];
24 d ← 4};
#"#"#"#" "#"#"#"
30 5 => {z ← 0; 30 5 => {z ← 0; pd1
31 b ← 1; L1: b ← 1; (311, 212) primary bkpt B2
32 c ← 2; c ← 2; (321, 222) pd3
33 P[3]; L2: P[3]; (331,3, 232,3, 114)
34 d ← 4}; B3 d ← 4}; (341,3, 242,3, 124) primary bkpt B3
35 ENDCASE; 35 ENDCASE;

The determiners for the merged region "d𡤄" are determiners 3 and 4. Determiners 1 and 2, which are used inside the merged region "d𡤄", can be active when determiners 3 and 4 are not. The following scenario explains the sequence of events necessary to create this situation:

Suppose that the user requests two breakpoints before starting the program: breakpoints 1 and 2. Breakpoint 1 is placed inside the procedure P, which is called from inside the merged region at statements 11, 23 and 33. Since breakpoint 1 is in a different procedure, it activates no path determiners. Breakpoint 2 is placed at statement 21, activating path determiners 1 and 2. The user starts the program, and control enters the merged region through path determiner 1 at time t1. The primary breakpoint for breakpoint 2 is reached (control is not given to the user), and breakpoint 1 inside P is reached. While at this breakpoint, the user requests a breakpoint at statement 34, activating determiners 3 and 4 (determiners 1 and 2 were already active). When the program resumes execution, the primary breakpoint for breakpoint 3 is reached, but the timestamps for determiners 3 and 4 still have their initial value and cannot help to discriminate among the source alternatives. [Actually, there are no timestamps for pd3 and pd4 for this procedure invocation; the debugger interprets this as a timestamp value of zero.] Since determiners 1 and 2 have timestamp values [pd2 doesn't really either] and path determiner 1's timestamp is later than path determiner 2's timestamp, the debugger can eliminate statement 24 as an alternative. With this partial information, the debugger

reports that either statement 12 or statement 34 is executing. Note that if breakpoint 3 had been placed at statement 24, this partial information would have been sufficient for the debugger to resume the program's execution without notifying the user.

For situations like this to arise, the user must be able to stop somewhere inside a merged region without having activated all of the determiners for that stopping place, set a breakpoint, and resume program execution. Assuming that execution cannot be resumed after a program exception, this is possible only via a user interrupt or a merged region that contains a procedure call. In particular, as long as single-stepping is invoked while outside any merged region [and as long as a determining breakpoint placed on the current instruction (harder for after-instruction breakpoints) is handled properly], activating each successive breakpoint while at the previous one works without difficulty.

Actually, this situation does not differ substantially from setting only breakpoint 1 inside procedure P, arriving at breakpoint 1, and then setting a breakpoint 2` at statement 34, thus activating path determiners 1 through 4 simultaneously. Since the immediate dynamic predecessors of each merged region assign that region's determiners, execution is always either inside a merged region or not; it cannot be in one side of the region but not in the other. Ž

7.2 An analysis of repeated merging

Despite the conceptual simplicity of path determination for a single application of cross-jumping, repeated merging can be quite complicated and can construct very complex structures of path determiner uses and assignments. This section emphasizes (rather painfully at times) the ordered nature of cross-jumping: when three identical paths X, Y, and Z join, cross-jumping X and Y followed by X+Y and Z can give a quite different result than cross-jumping Y and Z followed by X and Y+Z.

When n paths with some common tail portions join, the cross-jumping algorithm is applied at least n—1 times to merge the entire region completely. In the process, the path determination algorithm inserts at least 2(n—1) path determiner assignments on at least n path determining instructions. [Recall that the path determining instructions are all of the sentries of the merged region; that is, they are

the immediate dynamic predecessors of the top instruction of the merged region.] However, the exact way that instructions in the n paths match and the order of application of cross-jumping to the n paths can require

" many more merges to finish merging the region,

" multiple instances of some determiner assignments: one instruction may assign many determiners and many instructions may assign the same determiner,

" many determiner uses in a single source descriptor, and

" many indirect determiner uses in a single sentry descriptor.

For the purposes of this section, the region refers to all of the instructions that will be merged by repeated cross-jumping. It includes n paths such that: all n paths join at the same label, each path has a tail segment that is identical to the tail of some other path in the region, and each matching tail segment is a legal merging sequence (that is, it has no internal labels or control flow). A single application of cross-jumping affects a subregion of the entire region, producing a merged subregion. An external sentry of the region is an instruction outside the region that is an immediate dynamic predecessor of some instruction inside the region. The unoptimized region may include some paths that match only a subtail of the longest matching tail among the n paths. When this happens, some iteration of cross-jumping will introduce a label into the region, and one or more instructions inside the region (but outside the new merged subregion) will become internal sentries of that merged subregion. Note that if an execution path has internal labels or control flow, the region must be partitioned into subregions that have no internal labels or control flow, and even more applications of cross-jumping will be required.

This section carefully examines the properties of repeated merging. It presents actual and theoretical bounds on the number of instances of path determiners for a variety of merge orderings, including storage requirements and runtime path determination speed at primary and determining breakpoints. The interesting merge orderings are

(1) all paths the same length,

(2) successively shorter paths, and

(3) successively longer paths.

For each case, several quantities will be measured:

M = the number of applications of cross-jumping required to completely merge the region,

D = the number of instances of determiner assignments in the determiner table,

IU = the number of instances of indirect determiner uses in the determiner table, and

U = the number of instances of determiner uses on source alternatives in the object table.

The number M affects the compiler's execution time. The numbers D, IU, and U affect the storage requirements of both the compiler and the debugger, as well as the debugger's execution time. For each item in each ordering, we present both the actual value of that item for the current compiler path determination algorithm and the theoretically necessary value to permit correct path determination. The difference between these two values represents the algorithm's inefficiency.

Throughout the following analysis, we will use

n = the number of paths in the region,

e = the number of external sentries of the region (e e n),

i = the number of instructions in the region before optimization (i e n), and

b = the number of object table entries needed to specify source information for the merged region in the optimized object code, or in other words, the number of statement and/or subregion boundaries in the merged region (1 d b d i/n).

Paths are numbered in the order of their merging: first, paths 1 and 2 are merged, then 1+2 and 3, and so on. The number of external sentries of the ith path is signified by ei.

To permit path determination for an arbitrary region, all external sentries of the region must assign a path determiner for the region. Of course, if a single path has several external sentries, each external sentry will be marked with a separate instance of the same determiner assignment. Also, all internal sentries of a merged subregion assign a path determiner for the subregion. Regardless of the total length of the region, no region can have more than n—2 internal sentries, because no path in the region can have internal control flow. It is theoretically possible to construct the determiner uses in the source descriptors for each instruction in such a way that only one determiner assignment will be needed at each sentry. Therefore, depending on the structure of a region, somewhere between e and e+n—2 instances of path determiner assignments are theoretically necessary to permit path determination for it. [If the region is coalesced (via equivalence classes), only e instances of path determiner assignments are needed; see Section 7.3.1.]

In most cases, n is 2 or 3. Although relatively rare, there are real programs in which n is 10 or more, typically for short regions. The actions for common table-driven parsing algorithms often have this structure.

A note on the figure representations: in the examples, both internal and external sentries are represented by double enclosing circles. A gray box encloses the merging sequences for a given merge; the number of the merge appears in the upper left corner of the box. Recall that path determiner assignments on a path determining instruction are displayed at the upper left of the instruction. Indirect determiner uses appear here as subscripts. Source information for the instruction is displayed at the lower right of the instruction. Determiner uses on the source alternatives appear here as subscripts.

7.2.1 Repeated merging: all paths the same length

This section analyzes the runtime storage requirements and execution speed for merged regions that consist of several identical paths that are all the same length. The execution speed analysis applies equally to the remaining two cases, successively longer merges and successively shorter merges, which are considered in subsequent sections.

7.2.1.1 Debugger storage requirements: path determiner table and object table

Figure 7-1 illustrates the application of cross-jumping to a program schema in which four identical paths join. The cross-jumping algorithm needs three successive merges to collapse the region completely. The first application of cross-jumping merges paths 1 and 2, creating a single path 1+2 that has two sentries. The second application of cross-jumping merges paths 1+2 and 3, creating a single path 1+2+3 that has three sentries, and so on. When all paths are the same length, the cross-jumping algorithm uses a total of

M = n—1

successive merges to collapse the region completely.

Since the path determination algorithm adds a path determiner assignment to every sentry of each sequence, the successive increase in the number of sentries described above implies that later merges create more determiner assignments than earlier merges do. (Another way to look at this

fig71.press leftMargin: 1.5625 in, topMargin: .8125 in, width: 5.8125 in, height: 6.0625 in

M n—1 When the longest path has n—1 instructions,
i = n2—n. If b = n—1, then:
Dsingle ýn2+ýn—1
sentry Utotal n3—3n + 1
n 2
Dmultiple (S j en—j+1 ) — e1
sentries j=1 Utotal/i ~ýn

IU 0 Dsingle/M ~ýn

Uper group ýn2+ýn—1 i/M n

Figure 7-1. Actual repeated cross-jumping: all paths the same length.

is that earlier paths get more determiner assignments on their sentries than later paths do.) Every determiner assignment becomes a corresponding entry in the determiner table. In this example, in which each of the four paths has one external sentry of the unoptimized region, storage for nine determiner assignments is used in the path determiner table. If d is the number of path determiner assignments on each sentry of the region, then d is 1 for each sentry of the last path, 2 for each sentry of the preceding path, and so on backward, ending with n—1 for each sentry of the second path and n—1 for each sentry of the first path. When each path has a single external sentry, the total number of path determiner assignments is

n
Dsingle = (S d ) — 1 = n2+n — 1.
sentry d=1 2

If one or more paths have several external sentries, more determiner assignments are created. Paths that are merged earlier contribute more to the number of extra determiner assignments. The generalized expression for the total number of path determiner assignments is

n
Dmultiple = (S d en—j+1 ) — e1.
sentries d=1

This case does not create any indirect determiner uses.

Similarly, every source descriptor for an instruction in a merging sequence gets a path determiner use for that sequence's determiner, in addition to its previous determiner uses. Therefore, later merges also create more determiner uses in the source descriptors. Source information and its determiner uses appear in the object table [and in the actual implementation of code streams; see Section 6.3.5] only at statement or region boundaries. We will call the set of merged instructions that all have the same source information a group of instructions; there are b groups in the region. If s is the number of determiner uses for each source descriptor, the number of determiner uses in the object table for each instruction group is

n
Uper group = (S s ) — 1 = n2+n — 1.
s=1 2

If each path contains n—1 instructions, then there are

i = (n—1) n = n2—n

total instructions in the unoptimized object code. We will use this number of instructions per path to compare relative inefficiencies with the following cases of repeated merging. If there is one instruction per source statement in each path (also useful for comparison), then b = n—1. Given these assumptions, the total number of determiner uses in the object table for the region is

Utotal = (n2+n — 1) b = n3—2n2—n + 1.
2 2

Additional sentries (if any) do not affect the composition of the determiner uses in the source descriptors.

Since the translation from the optimized program schema to the contents of the path determiner and object tables is straightforward, the remaining figures in this section omit the tables.

Considering this region theoretically, it would be possible to distinguish among the n paths if there were n distinct path determiners, where (only) the external sentries of the ith path would assign determiner di. Thus a total of e instances of path determiner assignments would be needed. Similarly, n determiner uses per instruction group would be needed in the object table, requiring a total of nb determiner uses marking source alternatives in the object table. The determiner with the most recent timestamp value among the n alternatives would indicate the correct source alternative. Figure 7-2 shows the idealized view of cross-jumping n paths of equal length.

7.2.1.2 Debugger execution speed: determining and primary breakpoints

Because the debugger uses one determining breakpoint per object location, the debugger uses only the minimal execution time at a sentry of a merged region (see Section 6.3.2). It also uses the minimal timestamp storage for active determiners. This minimality holds regardless of how many determiners use the history information from a given object location and regardless of the order in which compiler merges the paths. The minimality of the number of timestamp cells allocated for a given procedure invocation is especially important for recursive procedures and multiple processes, because there is a separate copy of each timestamp cell for each procedure invocation.

However, the execution time required at the primary breakpoint to determine the correct source alternative depends upon the size and complexity of the path determiner structures. At a primary breakpoint, two levels of timestamp comparison must occur. At the lower level, the debugger must ascertainthe determinertimestamp foreachdeterminer usedby thesource

fig72.press leftMargin: 2.5625 in, topMargin: .875 in, width: 4.0 in, height: 6.125 in

M n—1 When the longest path has n—1 instructions,
i = n2—n. If b = n—1, then:
Dsingle n
sentry Utotal n2—n

Dmultiple e
sentries Utotal/i ~1

IU 0 Dsingle/M ~1

Uper group n i/M n

Figure 7-2. Idealized repeated cross-jumping: all paths the same length.

alternatives for the primary breakpoint: this means finding the latest timestamp among all timestamps for each assignment of a given determiner. At the higher level, the debugger must find the correct source alternative: this means comparing the timestamps of a determiner and its opposing determiner for the entire list of determiner uses on each source alternative.

In a straightforward implementation, which looks adequate when only simple merges are considered, the debugger would do both levels of timestamp comparison for each determiner use in each source alternative. Thus, for paths of equal length that have a single sentry per path, the Uper group (which is O(n2)) determiner uses for each instruction would require the examination of T (O(n3)) timestamp values at a primary breakpoint, where

n—1
Tsingle = n + S j2 = 2n3—n2+4n — 1.
sentry j=2 6

To avoid this extra factor of n, the Navigator debugger caches the current timestamp value of each determiner during the timestamp evaluation for a primary breakpoint. Thus the lower level of timestamp comparison need only be performed once. There are 2M distinct determiners; the timestamp for each of the D instances of these determiners must be examined. In total, it costs an additional 2M units (D+2M << T) of both storage and speed to store the current timestamp value of each determiner at each primary breakpoint.

The higher level of timestamp comparison can also be improved over the straightforward implementation. If the source alternatives are examined in order of increasing length of determiner use lists, and if the determiner use lists on each source alternative are arranged in order of the successive merges (i.e, the determiner corresponding to the first merge of the source descriptor appears first in its determiner use list), then the correct source alternative may be found before all determiner uses for all of the source alternatives for that instruction are examined or compared. For instance, suppose that a primary breakpoint is placed on instruction b in Figure 7-1. If determiner 6's timestamp is more recent than determiner 5's timestamp, then instruction b is executing on behalf of source statement 42 and no further timestamps need be examined. Otherwise, if determiner 4's timestamp is more recent than determiner 3's timestamp, then b is executing on behalf of source statement 32 and no further timestamps need be examined; and so on. This comparison algorithm handles equal timestamps as follows: if two determiner timestamps are equal at any point in the comparison, then the current and all remaining unchecked source alternatives are the source possibilities for this execution of the instruction.

The following table describes both the actual and the theoretically necessary processing at a primary breakpoint. It shows the number of times that the value of a determiner's timestamp must be examined (E), the required number of comparisons of determiner values (Cdet values), the required number of comparisons of timestamp values for merged regions with a single sentry per path (Ctimestamp single), and the required number of comparisons of timestamp values for merged regions with multiple sentries per path (Ctimestamp multiple).

actual required

n
Esingle=IU (S j ) — 1 = ýn2+ýn—1 n
j=1

Cdet values n—1 n

n—2
Ctimestamp S j = ýn2—3/2n 0
single j=1

n
Ctimestamp (S j en—j+1 ) — e1 e
multiple j=1

7.2.2 Repeated merging: successively shorter paths

Consider the program schema in Figure 7-3: four joining paths are merged in successively shorter order. Again, for simplicity, each path has one external sentry of the unoptimized region. As before, three successive merges are required to completely collapse the region. When the paths are merged in successively shorter order, the cross-jumping algorithm needs a total of

M = n—1

successive merges to collapse the region completely.

This case has very pleasant properties: an instruction that was merged in the jth merge becomes the single sentry of one of the merging sequences in the j+1st merge. As a result, only the minimum number of path determiner assignments are inserted, each marking a different object

fig73.press leftMargin: 2.625 in, topMargin: .875 in, width: 4.1875 in, height: 5.9 in

M n—1 When the longest path has n—1 instructions,
i = ýn2+ýn—1 and b = n—1:
Dsingle 2n—2
sentry Utotal n3+3n2—4n
n 6
Dmultiple n—2 + S ej
sentries j=1 Utotal/i ~1/3 n

IU 0 Dsingle/M 2

Uper group ýp2+ýp—1 i/M ~ýn

Figure 7-3. Actual repeated cross-jumping: successively shorter paths.
(Idealized version is the same.)

location. For this example, the number is 2(n-1), or six. In general, the number of determiner assignments is

n
Dmultiple = n — 2 + S ej = n—2+e.
sentries j=1

This case does not create any indirect determiner uses.

Furthermore, this case causes only the minimum number of determiner uses in the object table. If p is the number of paths in the unoptimized subregion in which a given instruction group appeared and s is the number of determiner uses on each source alternative in the group, the number of determiner uses in the object table for the instruction group is

n
Uper group = (S s ) — 1 = p2+p—1.
s=1 2

If the longest path contains n—1 instructions (the minimum number to permit n—1 successively shorter paths), the total number of instructions in the combined region is

n
i = (S j ) — 1 = n2+n — 1,
j=1 2

and each instruction group contains exactly one instruction. [There may be more than one instruction per source statement, but subregion boundaries have been introduced between every pair of instructions.] Then the total number of determiner uses on source alternatives in the object table is

n
Utotal = S (p2+p — 1) = n3+3n2—4n.
p=2 2 6

At first glance (and in the two examples), the total number of determiner uses for successively shorter paths is smaller than the total number of determiner uses for paths of the same length. This is primarily because the unoptimized region has fewer instructions in the successively shorter case. However, the number of determiner uses could actually be larger for the successively shorter case because the different lengths of the merges can introduce additional subregion boundaries (that is, some of the subregions might contain only a portion of a source statement).

7.2.3 Repeated merging: successively longer paths

Figures 7-4a and 7-4b illustrate the application of cross-jumping to a program schema in which successively longer paths join. This schema is really the same schema as Figure 7-3, but cross-jumping is applied in the opposite order (which, as this section shows, is extremely unfortunate). This time, six successive merges are required to completely collapse the region: the first merge of each instruction creates a label immediately preceding the instruction; subsequent merges must stop at that label because a merging sequence cannot contain an embedded label. When successively longer paths are merged, the cross-jumping algorithm uses a total of

n—1
M = S s = n2—n.
s=1 2

successive merges to collapse the region completely. We call the successive merges that are stopped by a given label one stage of the entire merge. There are n—1 stages in the complete merge.

The successively longer case has terrible properties: it causes many determiner assignments to be inserted, frequently on the same object locations. Each stage merges p paths of the same length, where p decreases from one stage to the next (p = n—s+1). Therefore the analysis of Section 7.2.1 applies to each stage. This example has sixteen determiner assignments.

p
Dsingle = (S j ) — 1 = p2+p — 1
per stage j=1 2

Dmultiple p2—p — 1 + (p—1) en—p+2 ; p=3,..,n
sentries = 2
per stage e1 + en ; p=2.

In addition, this case creates many instances of indirect determiner uses on the determiner assignments. A layer of identical instructions become internal sentries in one stage and get determiner assignments. In the next stage, these instructions are merged to form a single internal sentry; indirect determiner uses on the merged list of determiner assignments distinguish among the merged sentries. By contrast, in the successively shorter ordering described in the preceding section, instructions are merged first and become internal sentries of a new subregion later. This example has fourteen indirect determiner uses. Together with the sixteen determiner assignments, thirty units of determiner storage are used in the determiner table.

n—1
IUper stage = (p—1)(p—2) + S(p—j)2 = 2p3+3p2—5p + 1
p=2 6

n
IUtotal = S (2p3+3p2—5p + 1) = n4—7n2+6n.
p=2 6 12

fig74.press leftMargin: 2 in, topMargin: .875 in, width: 5.25 in, height: 8.5 in

Figure 7-4a. Actual repeated cross-jumping: successively longer paths (part 1).

fig75.press leftMargin: 1.875 in, topMargin: .875 in, width: 4.9375 in, height: 5.1875 in

M ýn2—ýn When the longest path has n—1 instructions,
i = ýn2+ýn—1 and b = n—1:
Dsingle ýp2+ýp—1
per stage Utotal 1/6n3+1/2n2—2/3n

Dsingle 1/6n3+1/2n2—2/3n Utotal/i ~1/3 n
total
n
Dmultiple n3—n + e1—1 + S(p—1) en—p+2
total 6 p=2

IUper stage 1/3p3+1/2p2—5/6p+1 Dsingle/M ~1/3 n

IUtotal 1/12n4—7/12n2+1/2n IUtotal/M ~1/6 n2

Uper group ýp2+ýp—1 i/M ~1

Figure 7-4b. Actual repeated cross-jumping: successively longer paths (part 2).

The successively longer case creates the same number of determiner uses on source alternatives in the object table as the successively shorter case. (But note that although the number of determiner uses are the same, instructions that are merged in stages 2 through n—1 use different determiners in this case.) If the longest path contains n—1 instructions (the minimum number to permit n—1 successively longer paths), then the total number of instructions in the combined region is

n
i = (S j ) — 1 = n2+n — 1,
j=1 2

and the total number of determiner uses in the object table is

n
Utotal = S (p2+p — 1) = n3+3n2—4n.
p=2 2 6

We now consider the merging of n successively longer paths from a theoretical standpoint. Since each stage s is an instance of merging ps paths of equal length, the idealized analysis of Section 7.2.1 applies: to distinguish among the ps paths, stage s should create ps distinct determiner assignments and ps determiner uses per instruction group. If the next stage s+1 were to combine the identical sentries minimally, only ps—1=ps+1 indirect determiner uses would be needed to distinguish among them. At a primary breakpoint, the determiner with the most recent timestamp value among the p alternatives would indicate the correct source alternative. Figure 7-5 shows the idealized view of cross-jumping n successively longer paths.

7.2.4 A comparison of the successively shorter and the idealized successively longer cases

For the successively shorter case, the total storage used when the longest path has n—1 instructions is

D = n — 2 + e
IU = 0
U = 1/6 n3 + 1/2 n2 — 2/3 n
---------------------------------------------------------
1/6 n3 + 1/2 n2 + 1/3 n — 2 + e.

If the same region were merged using the idealized successively longer ordering, the total storage used would be

fig76.press leftMargin: 1.875 in, topMargin: 1.165 in, width: 5.0625 in, height: 5.50 in

M ýn2—ýn When the longest path has n—1 instructions,
i = ýn2+ýn—1 and b = n—1:
Dsingle p
per stage Utotal ýp2+ýp—1

Dsingle ýn2+ýn—1 Utotal/i 1
total

Dmultiple ýn2—ýn—1+e
total

IUper stage p ; p=2,..,n—1 Dsingle/M ~1

IUtotal ýn2—ýn—1 IUtotal/M ~1

Uper group p i/M ~1

Figure 7-5. Idealized repeated cross-jumping: successively longer paths.

D = 1/2 n2 — 1/2 n — 1 + e
IU = 1/2 n2 — 1/2 n — 1
U = 1/2 n2 + 1/2 n — 1
-----------------------------------------
3/2 n2 + 1/2 n — 3 + e.

Surprisingly, when n is five or greater, the idealized successively longer ordering uses less total storage. However, the idealized successively longer case does require more merges (as well as an as-yet-unspecified minimization step; see Section 7.4): 1/2 n2 — 1/2 n as opposed to n—1. Both cases activate the same number of determining breakpoints for a given primary breakpoint, and both cases examine the same number of timestamp values at a given primary breakpoint.

7.2.5 Repeated merging: embedded control flow

The cross-jumping algorithm considers labels for cross-jumping in successive linear passes through the abstract code stream. Therefore, the successively longer merge ordering can be doubly expensive: the first merge of each stage creates a new label, which prevents later applications of cross-jumping from merging longer paths. Furthermore, that new label is most often inserted earlier in the abstract code stream than the label currently being cross-jumped, so that another pass through the code stream is necessary. Therefore, each stage of the merge typically requires a separate pass through the entire code stream. Of course, a more complicated cross-jumping algorithm could maintain a list of labels to be considered so that an entire extra pass would not be needed.

7.3 Reducing the number of path determiners by changing the path determination algorithm

The behavior reported in the preceding section is frequently undesirable. In an attempt to improve both the number of applications of cross-jumping required to merge a region completely and the resulting path determiner structures, I experimented with the cross-jumping and path determination algorithms. I also wanted to examine other algorithms because other compilers place different restrictions on the composition of a merging sequence. Each of the modifications discussed in this section increased efficiency in several respects, but they also created problems with providing expected debugger behavior. These problems were not insoluble, but each required some special-case checking and action. The complexity of the resulting full algorithms finally caused these improvements to be rejected in favor of the conceptually simpler minimization post-pass sketched in the following section.

Algorithm NCJ2 is conceptually simple and complete. Every instruction gets a determiner use (and an indirect determiner use, if the instruction is a sentry) for every sequence it participates in. Every sentry gets a determiner assignment for every sequence whose entrance it guards. The improvements discussed in the following sections attempt to remove the path determiner redundancy [clearly demonstrated in Section 7.2] that can be caused by repeated merging. Unfortunately, removing the redundancy can make it impossible to provide expected behavior for some cases; this may be an acceptable tradeoff. Sometimes it also causes the debugger to report an incorrect source alternative; this cannot be tolerated.

7.3.1 Improvement 1: First crossing marks (no determiner use lists)

The equal length case of repeated cross-jumping [described in Section 7.2.1] uses more storage than is theoretically necessary because every time an instruction belongs to a merging sequence, every source descriptor for that instruction gets a determiner use for the sequence's determiner. Furthermore, the determiner assignments grow similarly: each merging sequence for a given merge gets a new determiner assignment at its sentry, regardless of whether that sentry was already guarding that exact instruction sequence. This explosion of the number of determiner uses and determiner assignments can be prevented by modifying the path determination algorithm as follows: Each source descriptor gets a determiner use for (only) the path determiner for the first merging sequence to which it belongs. Similarly, when a path determining instruction appears in a merging sequence, its determiner assignment(s) get an indirect determiner use (only) for the first merging sequence that contains it. As a result, when a path has been merged before, the algorithm relies upon the original determiner assignments (as well as possible chains of indirect determiner uses) to distinguish among the source alternatives and path determiner assignments. Figure 7-6 shows the application of this version of the algorithm, called the First Crossing Marks algorithm, to the successively longer paths of Figure 7-4.

This change saves storage for determiner uses in the object table directly: each source alternative contains exactly one determiner use. It also saves storage in the path determiner table (as well as processing time at a primary breakpoint) indirectly: a determiner for a merging sequence that included only previously-merged instructions (and no sentries) will not be used by any instructions nor any determiner assignments. Therefore, all assignments of that determiner can be deleted from the determiner table. [Actually, the condition under which a determiner assignment can be deleted is a bit more complicated: no source descriptor uses it, and no sentry descriptor uses it or its opposite. This distinction between source and sentry descriptors arises because it is possible that only one of two merging instructions has a sentry descriptor, but both instructions must have a source descriptor. Therefore a determiner use for each merging sequence will automatically be created for source descriptors, while it may not be for sentry descriptors.] For the equal length and the successively shorter cases, this algorithm uses only n distinct determiner assignments for n paths. Determiner assignments mark only the external sentries.

However, there is a problem with this improvement. Debugger correctness property D2 [setting a breakpoint when the execution point is already inside a merged region never causes incorrect behavior; see Section 7.1.2] may fail. If a merged region has unmarked internal sentries and each primary breakpoint activates only the determiners needed to distinguish among paths to it at runtime, then the timestamp cells can give incorrect information when multiple breakpoints are placed in a repeatedly merged region. Consider the following program fragment, in which three paths have been merged using the improved algorithm and the successively shorter ordering. As a result, control flow can enter the region through a currently inactive external sentry s. If s is still inactive when the primary breakpoint is reached, the debugger can detect this unusual condition and report the source alternative corresponding to s also. But if s is activated between the time that control entered the region and the time that the primary breakpoint is reached, the debugger can be fooled into giving an incorrect response.

1 SELECT x FROM 1 SELECT x FROM
#"#"#"#" "#"#"#"
10 1 => {x ← 20; 10 1 => {x ← 20;
11 P[3]; <go to L2>}; pd4
12 d ← 4};
#"#"#"#" "#"#"#"
20 3 => {y ← 10; 20 3 => {y ← 10;
21 b ← 1; B2 <go to L1>}; pd2
22 c ← 2;
23 P[3];
24 d ← 4};
#"#"#"#" "#"#"#"
30 5 => {z ← 0; 30 5 => {z ← 0; pd1
31 b ← 1; L1: b ← 1; (311, 212) primary bkpt B2
32 c ← 2; c ← 2; (321, 222) [pd3 - unreferenced]
33 P[3]; L2: P[3]; (331, 232, 114)
34 d ← 4}; B3 d ← 4}; (341, 242, 124) primary bkpt B3
35 ENDCASE; 35 ENDCASE;

fig76b.press leftMargin: 1.5625 in, topMargin: .875 in, width: 5.875 in, height: 8.5 in

Figure 7-6. First Crossing Marks algorithm on successively longer paths.

Referring to the program fragment, one possible faulty scenario proceeds as follows: Before starting the program, the user sets breakpoint 1 inside procedure P and breakpoint 2 at statement 21. Breakpoint 2 activates path determiners 1 and 2. After execution begins, control enters the merged region through determiner assignment 1 at time t1. The primary breakpoint for breakpoint 2 is reached (control is not passed to the user), breakpoint 1 inside P is reached, and then control leaves the merged region. Later, the merged region is entered through determiner 4 (which has not yet been activated) at time t2, and breakpoint 1 inside P is reached again. While at breakpoint 1, the user requests breakpoint 3 at statement 34, activating path determiner 4, and proceeds. (Note that procedure P has no knowledge of any determiners active in its caller.) Upon arrival at the primary breakpoint for breakpoint 3, the timestamp cells indicate that statement 34 is executing (timestamp4 = 0, timestamp2 = 0, and timestamp1 = t1). Therefore, although statement 12 is actually executing, the debugger relinquishes control to the user, saying that breakpoint 3 at statement 34 has been reached.

There are two solutions to this problem that retain the improved path determination algorithm.

One solution is to record the time that each determining breakpoint was set. The algorithm for fielding a primary breakpoint at an instruction b becomes more complicated. One pass through the determiners discovers the time t of the most recent recorded sentry of the merged region from the determination timestamps. A second pass identifies the set of possible determiners, which are all determiners referenced in the source alternatives for b that (1) have a determination timestamp equal to t, (2) have at least one assignment whose determining breakpoint was set more recently than t, or (3) have not yet been activated. When debugging is active, this solution uses more debugger storage and requires additional processing at each arrival at a primary breakpoint.

The other solution is to create equivalence classes of determiners, clustering all determiners that cover a given (possibly complex) merged region together. Then all determiners for a given region would always be activated at the same time. These equivalence classes could be created during the cross-jumping optimization or in a single post-pass through the statement map and the determiner table. This solution requires a small amount of additional compilation time to discover the equivalence classes and extra debugging table space to store them. These increases are likely to be more than offset by the reduction in compilation time (for creating and managing the path determiner structures) and path determiner storage that this version of the algorithm permits.

7.3.2 Improvement 2: Allow embedded labels and embedded unconditional jumps

The successively longer case of repeated cross-jumping (described in Section 7.2.3) is much worse than the equal length or successively shorter cases because many more applications of cross-jumping (which insert more determiner uses and assignments) are needed to collapse the region completely. The additional applications of cross-jumping are necessary because the first merge of each stage inserts a new label, thereby preventing later applications of cross-jumping from merging longer paths. If merging sequences were permitted to contain embedded labels, the later applications of cross-jumping could match the full length of the identical tails at once, and would require a total of

M = n—1

successive merges to collapse the region completely. This is the same number of merges as the equal length and successively shorter cases. This modification is completely separable from the modification discussed in the previous section. The version described in this section, called the Embedded Label algorithm, has determiner use lists as described in Chapter 6.

Very few changes to the cross-jumping and path determination algorithms are required to permit embedded labels. The cross-jumping instruction matching routine is altered to skip over embedded labels in either merging sequence. As before, the path determination algorithm gives all sentries of a merging sequence a path determiner assignment for that sequence. However, some of the sentries may now be immediate dynamic predecessors of an instruction other than the top instruction in the sequence.

For additional efficiency, two more changes are made to allow paths with internal control flow, in the form of embedded forward jumps (an IF-THEN-ELSE structure), to be matched in a single application of cross-jumping. First, the cross-jumping instruction matching routine is altered to permit embedded jumps in either merging sequence. Two unconditional jumps match if they have the same jump destination. Two conditional jumps match if they have the same opcode, the same operand(s), and the same jump destination. [Note that conditional jumps are only permitted strictly inside a merging sequence (that is, they are still not permitted as link cells).] Second, when the backward comparison scan encounters an embedded label in the to-delete sequence, it immediately redirects each jump to that label to the corresponding point in the to-remain sequence, inserting a new label there if necessary. This "instant branch chaining" step means that two "congruent" forward jumps will become identical by the time that the backward comparison scan reaches them. It does not provide a general flowgraph-structural match for jumps, however: two "congruent" backward jumps will never match. Note that these internal jumps are not sentries of the merging sequence: a sentry of a sequence is always outside that sequence. [A sentry occurring inside its sequence would overwrite its determiner's timestamp value.] When the backward match and merge occur in one backward sweep, the sentries for each embedded label receive a determiner assignment for the label's path determiner when the label is encountered. When a forward jump to such a label is encountered later in the same backward sweep, it has a determiner assignment equal to the determiner for that sequence. This determiner assignment is removed.

As shown in Figure 7-8, this improved version of the algorithm can merge the successively longer paths of Figure 7-4 in three successive applications of cross-jumping, rather than six. It can merge the complicated nested example of Figure 6-11 [see page 162] in three merges rather than four.

Two problems can arise from this improvement. The first problem is the introduction of nonessential ambiguity. That is, a single sentry z might dynamically precede a different instruction in each merging sequence, as illustrated in Figure 7-7. As a result, z is given determiner assignments for both merging sequences—whenever the merged region is entered through z, the values of the determination cells do not distinguish between the two regions. This situation is particularly unfortunate because it is possible to place more determiner assignments in this region so the source alternatives can be distinguished. If c were merged alone, followed by merging a and b as a unit, the resulting four determiners can distinguish between the two source alternatives for c.

fig711.press leftMargin: 1.875 in, topMargin: .9375 in, width: 5.1875 in, height: 2.5 in

Figure 7-7. Introduced ambiguity in the Embedded Label algorithm.

fig77.press leftMargin: 1.6875 in, topMargin: .53125 in, width: 5.5 in, height: 8.5 in

Figure 7-8. Embedded Label algorithm on sucessively longer paths.

The second problem is worse: incorrect path determination might result. This situation arises when an instruction inside a merging sequence is an immediate dynamic predecessor of an instruction in both merging sequences. This situation, depicted in Figure 7-9, can arise when conditional jumps and labels are permitted inside merging sequences. For instance, the dynamic control flow information collected at the two determiner assignments do not determine which of statements 12 and 32 are executing at instruction b.

fig712.press leftMargin: 1.5625 in, topMargin: .9375 in, width: 5.375 in, height: 2.4375 in

Figure 7-9. Incorrect path determination in the Embedded Label algorithm.

The most satisfactory solution for these problems is to detect them during the application of cross-jumping and disallow embedded labels for their particular cases.

7.3.3 Improvement 3: First crossing marks and allow embedded labels and jumps

The two improvements to the cross-jumping and the path determination algorithms can be combined to create an algorithm that merges paths in fewer iterations and uses only one determiner use per source or sentry descriptor. This algorithm, called the Combined algorithm, has the problems of the preceding two algorithms plus a few of its own. [This is the algorithm reported in my previous paper on Navigator [Zellweger83].] For comparison with the previous algorithms, Figure 7-10 shows the application of the Combined algorithm to the successively longer paths of Figure 7-4.

This combined version causes additional problems for repeated nested merging: that is, for cases in which each of two identical sequences has internal identical sequences. Because of the "first crossing marks" property, the Combined algorithm works only when the inner merge occurs first; the inner determiner assignments then depend upon the outer determiner assignments. When the outer merge occurs first, the Combined algorithm creates ambiguity (i.e., only truthful behavior). These problems do not occur for the First Crossing Marks version because embedded labels are necessary to create nested regions. Note that for either "determiner use list" version (i.e., the original cross-jumping algorithm or the Embedded Label version), it is more desirable to merge the outer region first because then the region can be merged completely with only two merges rather than three.

To solve the problems with the Combined algorithm in general, some sort of recursive cross-jumping algorithm could be used to ensure that inner merges always occur first. Another possibility is to identify these problem cases and fall back to one of the previous algorithms when one of them is discovered.

The following cases demonstrate situations in which the outer merge occurs first. These problems can be created in actual Cedar programs: the juxtaposed case can arise in SELECT statements, while the backward jump nested case can arise in loops containing conditionals. Special-case solutions are identified for each case.

Juxtaposed case (four paths to the merging label). Suppose that two paths to the merging label have identical tails and are statically juxtaposed; they thereby form one "double" sequence. Suppose further that this sequence matches an identical "double" sequence. These two "double" sequences therefore have opportunities for either two or three merges, depending on the merging order. The merging order that causes difficulty is the one in which the "double" sequences are merged first and the internal merge occurs second; Figure 7-11 illustrates the difficulty. These "double" sequences are legal merging sequences for the Combined algorithm because it permits embedded labels and unconditional jumps. In this situation, the first merge gives all of the source descriptors determiner uses for its two determiners. The second merge includes only previously-merged instructions. Since the Combined algorithm never marks such instructions, the two determiners for the second merge are neither activated nor examined to distinguish between the two inner paths—in fact, they are deleted. The final result is that the instructions that participated in the second merge can only be reported as lists of two source alternatives.

As a special-case solution for the juxtaposed case, we can disallow embedded jumps to the current merging label.

fig78.press leftMargin: 1.6875 in, topMargin: .53125 in, width: 5.5 in, height: 8.5 in

Figure 7-10. Combined algorithm on successively longer paths.

fig713.press leftMargin: 1.4375 in, topMargin: .3125 in, width: 5.75 in, height: 8.5 in

Figure 7-11. Juxtaposed case for the Combined algorithm.

Backward jump nested case (two paths to the merging label). Suppose that two identical sequences with internal cross-jumping opportunities (as a result of internal control flow) each end in a backward jump to the merging label. [This situation is similar to the nested cross-jumping depicted in Figure 6-11, page 162.] Because the merging label appears earlier in the codestream than either sequence, the outer sequences are considered for cross-jumping before the inner sequences. As in the juxtaposed case, performing the outer merge first when each source descriptor can only have one determiner use means that the path determiners of the inner merge are never used. The inner statements cannot be distinguished by the resulting path determiner structures.

As a special-case solution for the backward jump nested case, we can delay the application of cross-jumping. Rather than trigger cross-jumping on a label as soon as that label is encountered during the codestream scan, trigger it upon encountering the latest item among the label and all jumps to it. To detect this condition, a "seen-this-codestream-pass" flag can be added to each jump and label. Each successive cross-jumping pass would toggle the flag.

7.4 Reducing the number of path determiners by adding a post-pass

Rather than modifying the path determination algorithm to generate fewer determiner uses and assignments and dealing with the attendant problems, the path determiner structures produced by the unmodified Algorithm NCJ2 could be simplified directly. Path determiner minimization could be conveniently applied as a post-pass over the completed path determiner table and statement maps for a procedure. It would identify unneeded or redundant determiner uses and assignments, deleting them from all debugging tables. In some cases, it would coalesce a group of determiners into a single new determiner.

The minimization algorithm proceeds as follows:

1. For each path determiner p, construct a fully-expanded logical clause that expresses the control flow paths on which p is assigned. To do this, form a disjunction from the object locations listed for p in the path determiner table. A determiner with indirect determiner uses becomes a conjunction of its (direct) object location and all of its successive indirect object locations.

2. Minimize these logical clauses.

3. For each (possibly complex) merged region, construct a structured equivalence class of determiner uses and indirect determiner uses for each instruction in that region. Minimize this equivalence class as a unit.

4. Construct new minimized determiner uses by expanding each list of determiner uses (or indirect determiner uses) as a conjunction of the minimized path determiner disjunctions from Step 2 and minimizing the resulting clause.

5. Map the resulting minimized determiner use clauses back into path determiner identifiers. This step may require some path determiner restructuring.

6. Relabel the determiner uses as needed.

7. Map the minimized determiner assignment clauses from Step 2 back into path determiner identifiers.

8. Delete any unreferenced determiners or determiner assignments.

Figure 7-12 illustrates the operation of this algorithm on the final path determiner structure for the successively longer case of repeated merging, from Figure 7-4b. Note that the resulting structure is the idealized structure shown in Figure 7-5.

Separating the path determination algorithm from the steps needed to make it efficient to store and access at runtime solves only part of the problem. When using the original algorithm, the compiler must still introduce and store many extra determiner uses and assignments; it must also use additional merge steps because of the labels introduced by earlier merges. Path determiner minimization might also be fruitfully applied to the results of the improved versions of the algorithm—although more efficient, these versions are not guaranteed to yield minimal path determiner structures.

7.5 Summary

This chapter has demonstrated that the path determination algorithms work correctly during compilation and during debugging. It has also analyzed the path determiner structures created by several different orders of application of cross-jumping (using Algorithm NCJ2) to more than two code paths that join. The successively shorter path comparison ordering creates a minimal path determiner structure, while the successively longer ordering creates many excess path determiner uses and assignments. [But recall that these inefficient path determiner structures arise only from complex repeated merges, which probably do not occur frequently in actual programs.] The final sections have discussed several ways to speed up complex merges and to reduce the size of these complex path determiner structures. The path determiner equivalence classes suggested in Section 7.3.1 and the path determiner minimization post-pass outlined in Section 7.4 seem especially promising.

Path determiners

1: b7,9 = b 'i ((a'iw) ' ((a'iw) ( y)) = b 'i (a'iw)
2: z
3: b7,9 ( z = (b 'i (a'iw)) ( z = (b 'i (a'iw)) ( z
4: b8,9 = b 'i (y ' ((a'iw) ( y)) = b 'i y
5: b7,9 ( b8,9 ( z = (b 'i (a'iw)) ( (b 'i y) ( z
6: b10 = b 'i (a'ix)
7: a11 = a 'i w
8: y [Note: 'i indicates an indirect determiner reference]
9: a11 ( y = (a 'i w) ( y
10: a12 = a 'i x
11: w
12: x

pdToCheck Lists

1,3,5: (b 'i (a'iw)) ' ((b 'i (a'iw)) ( z) ' ((b 'i (a'iw)) ( (b 'i y) ( z)
= b 'i (a'iw) = path determiner 1

2,3,5: z ' ((b 'i (a'iw)) ( z) ' ((b 'i (a'iw)) ( (b 'i y) ( z)
= z = path determiner 2

4,5: (b 'i y) ' ((b 'i (a'iw)) ( (b 'i y) ( z)
= (b 'i y) = path determiner 4

7,9: (a 'i w) ' ((a 'i w) ( y)
= (a 'i w) = path determiner 7

8,9: y ' ((a 'i w) ( y)
= y = path determiner 8

also referenced: path determiners 6, 10, 11, 12

=> not referenced: path determiners 3, 5, 9. These determiners can be deleted.

Minimized program schema

fig710.press leftMargin: 2.9375 in, topMargin: 3.25 in, width: 3.125 in, height: 2.125 in

Figure 7-12. Path determiner minimization applied to the final form of the successively longer repeated merge from Figure 7-4.