7.2 An analysis of repeated merging
Despite the conceptual simplicity of path determination for a single application of cross-jumping, repeated merging can be quite complicated and can construct very complex structures of path determiner uses and assignments. This section emphasizes (rather painfully at times) the ordered nature of cross-jumping: when three identical paths X, Y, and Z join, cross-jumping X and Y followed by X+Y and Z can give a quite different result than cross-jumping Y and Z followed by X and Y+Z.
When n paths with some common tail portions join, the cross-jumping algorithm is applied at least n—1 times to merge the entire region completely. In the process, the path determination algorithm inserts at least 2(n—1) path determiner assignments on at least n path determining instructions. [Recall that the path determining instructions are all of the sentries of the merged region; that is, they are
the immediate dynamic predecessors of the top instruction of the merged region.] However, the exact way that instructions in the n paths match and the order of application of cross-jumping to the n paths can require
" many more merges to finish merging the region,
" multiple instances of some determiner assignments: one instruction may assign many determiners and many instructions may assign the same determiner,
" many determiner uses in a single source descriptor, and
" many indirect determiner uses in a single sentry descriptor.
For the purposes of this section, the region refers to all of the instructions that will be merged by repeated cross-jumping. It includes n paths such that: all n paths join at the same label, each path has a tail segment that is identical to the tail of some other path in the region, and each matching tail segment is a legal merging sequence (that is, it has no internal labels or control flow). A single application of cross-jumping affects a subregion of the entire region, producing a merged subregion. An external sentry of the region is an instruction outside the region that is an immediate dynamic predecessor of some instruction inside the region. The unoptimized region may include some paths that match only a subtail of the longest matching tail among the n paths. When this happens, some iteration of cross-jumping will introduce a label into the region, and one or more instructions inside the region (but outside the new merged subregion) will become internal sentries of that merged subregion. Note that if an execution path has internal labels or control flow, the region must be partitioned into subregions that have no internal labels or control flow, and even more applications of cross-jumping will be required.
This section carefully examines the properties of repeated merging. It presents actual and theoretical bounds on the number of instances of path determiners for a variety of merge orderings, including storage requirements and runtime path determination speed at primary and determining breakpoints. The interesting merge orderings are
(1) all paths the same length,
(2) successively shorter paths, and
(3) successively longer paths.
For each case, several quantities will be measured:
M = the number of applications of cross-jumping required to completely merge the region,
D = the number of instances of determiner assignments in the determiner table,
IU = the number of instances of indirect determiner uses in the determiner table, and
U = the number of instances of determiner uses on source alternatives in the object table.
The number M affects the compiler's execution time. The numbers D, IU, and U affect the storage requirements of both the compiler and the debugger, as well as the debugger's execution time. For each item in each ordering, we present both the actual value of that item for the current compiler path determination algorithm and the theoretically necessary value to permit correct path determination. The difference between these two values represents the algorithm's inefficiency.
Throughout the following analysis, we will use
n = the number of paths in the region,
e = the number of external sentries of the region (e e n),
i = the number of instructions in the region before optimization (i e n), and
b = the number of object table entries needed to specify source information for the merged region in the optimized object code, or in other words, the number of statement and/or subregion boundaries in the merged region (1 d b d i/n).
Paths are numbered in the order of their merging: first, paths 1 and 2 are merged, then 1+2 and 3, and so on. The number of external sentries of the ith path is signified by ei.
To permit path determination for an arbitrary region, all external sentries of the region must assign a path determiner for the region. Of course, if a single path has several external sentries, each external sentry will be marked with a separate instance of the same determiner assignment. Also, all internal sentries of a merged subregion assign a path determiner for the subregion. Regardless of the total length of the region, no region can have more than n—2 internal sentries, because no path in the region can have internal control flow. It is theoretically possible to construct the determiner uses in the source descriptors for each instruction in such a way that only one determiner assignment will be needed at each sentry. Therefore, depending on the structure of a region, somewhere between e and e+n—2 instances of path determiner assignments are theoretically necessary to permit path determination for it. [If the region is coalesced (via equivalence classes), only e instances of path determiner assignments are needed; see Section 7.3.1.]
In most cases, n is 2 or 3. Although relatively rare, there are real programs in which n is 10 or more, typically for short regions. The actions for common table-driven parsing algorithms often have this structure.
A note on the figure representations: in the examples, both internal and external sentries are represented by double enclosing circles. A gray box encloses the merging sequences for a given merge; the number of the merge appears in the upper left corner of the box. Recall that path determiner assignments on a path determining instruction are displayed at the upper left of the instruction. Indirect determiner uses appear here as subscripts. Source information for the instruction is displayed at the lower right of the instruction. Determiner uses on the source alternatives appear here as subscripts.
7.2.1 Repeated merging: all paths the same length
This section analyzes the runtime storage requirements and execution speed for merged regions that consist of several identical paths that are all the same length. The execution speed analysis applies equally to the remaining two cases, successively longer merges and successively shorter merges, which are considered in subsequent sections.
7.2.1.1 Debugger storage requirements: path determiner table and object table
Figure 7-1 illustrates the application of cross-jumping to a program schema in which four identical paths join. The cross-jumping algorithm needs three successive merges to collapse the region completely. The first application of cross-jumping merges paths 1 and 2, creating a single path 1+2 that has two sentries. The second application of cross-jumping merges paths 1+2 and 3, creating a single path 1+2+3 that has three sentries, and so on. When all paths are the same length, the cross-jumping algorithm uses a total of
M = n—1
successive merges to collapse the region completely.
Since the path determination algorithm adds a path determiner assignment to every sentry of each sequence, the successive increase in the number of sentries described above implies that later merges create more determiner assignments than earlier merges do. (Another way to look at this
fig71.press leftMargin: 1.5625 in, topMargin: .8125 in, width: 5.8125 in, height: 6.0625 in
M n—1 When the longest path has n—1 instructions,
i = n2—n. If b = n—1, then:
Dsingle ýn2+ýn—1
sentry Utotal n3—3n + 1
n 2
Dmultiple (S j en—j+1 ) — e1
sentries j=1 Utotal/i ~ýn
IU 0 Dsingle/M ~ýn
Uper group ýn2+ýn—1 i/M n
Figure 7-1. Actual repeated cross-jumping: all paths the same length.
is that earlier paths get more determiner assignments on their sentries than later paths do.) Every determiner assignment becomes a corresponding entry in the determiner table. In this example, in which each of the four paths has one external sentry of the unoptimized region, storage for nine determiner assignments is used in the path determiner table. If d is the number of path determiner assignments on each sentry of the region, then d is 1 for each sentry of the last path, 2 for each sentry of the preceding path, and so on backward, ending with n—1 for each sentry of the second path and n—1 for each sentry of the first path. When each path has a single external sentry, the total number of path determiner assignments is
n
Dsingle = (S d ) — 1 = n2+n — 1.
sentry d=1 2
If one or more paths have several external sentries, more determiner assignments are created. Paths that are merged earlier contribute more to the number of extra determiner assignments. The generalized expression for the total number of path determiner assignments is
n
Dmultiple = (S d en—j+1 ) — e1.
sentries d=1
This case does not create any indirect determiner uses.
Similarly, every source descriptor for an instruction in a merging sequence gets a path determiner use for that sequence's determiner, in addition to its previous determiner uses. Therefore, later merges also create more determiner uses in the source descriptors. Source information and its determiner uses appear in the object table [and in the actual implementation of code streams; see Section 6.3.5] only at statement or region boundaries. We will call the set of merged instructions that all have the same source information a group of instructions; there are b groups in the region. If s is the number of determiner uses for each source descriptor, the number of determiner uses in the object table for each instruction group is
n
Uper group = (S s ) — 1 = n2+n — 1.
s=1 2
If each path contains n—1 instructions, then there are
i = (n—1) n = n2—n
total instructions in the unoptimized object code. We will use this number of instructions per path to compare relative inefficiencies with the following cases of repeated merging. If there is one instruction per source statement in each path (also useful for comparison), then b = n—1. Given these assumptions, the total number of determiner uses in the object table for the region is
Utotal = (n2+n — 1) b = n3—2n2—n + 1.
2 2
Additional sentries (if any) do not affect the composition of the determiner uses in the source descriptors.
Since the translation from the optimized program schema to the contents of the path determiner and object tables is straightforward, the remaining figures in this section omit the tables.
Considering this region theoretically, it would be possible to distinguish among the n paths if there were n distinct path determiners, where (only) the external sentries of the ith path would assign determiner di. Thus a total of e instances of path determiner assignments would be needed. Similarly, n determiner uses per instruction group would be needed in the object table, requiring a total of nb determiner uses marking source alternatives in the object table. The determiner with the most recent timestamp value among the n alternatives would indicate the correct source alternative. Figure 7-2 shows the idealized view of cross-jumping n paths of equal length.
7.2.1.2 Debugger execution speed: determining and primary breakpoints
Because the debugger uses one determining breakpoint per object location, the debugger uses only the minimal execution time at a sentry of a merged region (see Section 6.3.2). It also uses the minimal timestamp storage for active determiners. This minimality holds regardless of how many determiners use the history information from a given object location and regardless of the order in which compiler merges the paths. The minimality of the number of timestamp cells allocated for a given procedure invocation is especially important for recursive procedures and multiple processes, because there is a separate copy of each timestamp cell for each procedure invocation.
However, the execution time required at the primary breakpoint to determine the correct source alternative depends upon the size and complexity of the path determiner structures. At a primary breakpoint, two levels of timestamp comparison must occur. At the lower level, the debugger must ascertainthe determinertimestamp foreachdeterminer usedby thesource
fig72.press leftMargin: 2.5625 in, topMargin: .875 in, width: 4.0 in, height: 6.125 in
M n—1 When the longest path has n—1 instructions,
i = n2—n. If b = n—1, then:
Dsingle n
sentry Utotal n2—n
Dmultiple e
sentries Utotal/i ~1
IU 0 Dsingle/M ~1
Uper group n i/M n
Figure 7-2. Idealized repeated cross-jumping: all paths the same length.
alternatives for the primary breakpoint: this means finding the latest timestamp among all timestamps for each assignment of a given determiner. At the higher level, the debugger must find the correct source alternative: this means comparing the timestamps of a determiner and its opposing determiner for the entire list of determiner uses on each source alternative.
In a straightforward implementation, which looks adequate when only simple merges are considered, the debugger would do both levels of timestamp comparison for each determiner use in each source alternative. Thus, for paths of equal length that have a single sentry per path, the Uper group (which is O(n2)) determiner uses for each instruction would require the examination of T (O(n3)) timestamp values at a primary breakpoint, where
n—1
Tsingle = n + S j2 = 2n3—n2+4n — 1.
sentry j=2 6
To avoid this extra factor of n, the Navigator debugger caches the current timestamp value of each determiner during the timestamp evaluation for a primary breakpoint. Thus the lower level of timestamp comparison need only be performed once. There are 2M distinct determiners; the timestamp for each of the D instances of these determiners must be examined. In total, it costs an additional 2M units (D+2M << T) of both storage and speed to store the current timestamp value of each determiner at each primary breakpoint.
The higher level of timestamp comparison can also be improved over the straightforward implementation. If the source alternatives are examined in order of increasing length of determiner use lists, and if the determiner use lists on each source alternative are arranged in order of the successive merges (i.e, the determiner corresponding to the first merge of the source descriptor appears first in its determiner use list), then the correct source alternative may be found before all determiner uses for all of the source alternatives for that instruction are examined or compared. For instance, suppose that a primary breakpoint is placed on instruction b in Figure 7-1. If determiner 6's timestamp is more recent than determiner 5's timestamp, then instruction b is executing on behalf of source statement 42 and no further timestamps need be examined. Otherwise, if determiner 4's timestamp is more recent than determiner 3's timestamp, then b is executing on behalf of source statement 32 and no further timestamps need be examined; and so on. This comparison algorithm handles equal timestamps as follows: if two determiner timestamps are equal at any point in the comparison, then the current and all remaining unchecked source alternatives are the source possibilities for this execution of the instruction.
The following table describes both the actual and the theoretically necessary processing at a primary breakpoint. It shows the number of times that the value of a determiner's timestamp must be examined (E), the required number of comparisons of determiner values (Cdet values), the required number of comparisons of timestamp values for merged regions with a single sentry per path (Ctimestamp single), and the required number of comparisons of timestamp values for merged regions with multiple sentries per path (Ctimestamp multiple).
actual required
n
Esingle=IU (S j ) — 1 = ýn2+ýn—1 n
j=1
Cdet values n—1 n
n—2
Ctimestamp S j = ýn2—3/2n 0
single j=1
n
Ctimestamp (S j en—j+1 ) — e1 e
multiple j=1
7.2.2 Repeated merging: successively shorter paths
Consider the program schema in Figure 7-3: four joining paths are merged in successively shorter order. Again, for simplicity, each path has one external sentry of the unoptimized region. As before, three successive merges are required to completely collapse the region. When the paths are merged in successively shorter order, the cross-jumping algorithm needs a total of
M = n—1
successive merges to collapse the region completely.
This case has very pleasant properties: an instruction that was merged in the jth merge becomes the single sentry of one of the merging sequences in the j+1st merge. As a result, only the minimum number of path determiner assignments are inserted, each marking a different object
fig73.press leftMargin: 2.625 in, topMargin: .875 in, width: 4.1875 in, height: 5.9 in
M n—1 When the longest path has n—1 instructions,
i = ýn2+ýn—1 and b = n—1:
Dsingle 2n—2
sentry Utotal n3+3n2—4n
n 6
Dmultiple n—2 + S ej
sentries j=1 Utotal/i ~1/3 n
IU 0 Dsingle/M 2
Uper group ýp2+ýp—1 i/M ~ýn
Figure 7-3. Actual repeated cross-jumping: successively shorter paths.
(Idealized version is the same.)
location. For this example, the number is 2(n-1), or six. In general, the number of determiner assignments is
n
Dmultiple = n — 2 + S ej = n—2+e.
sentries j=1
This case does not create any indirect determiner uses.
Furthermore, this case causes only the minimum number of determiner uses in the object table. If p is the number of paths in the unoptimized subregion in which a given instruction group appeared and s is the number of determiner uses on each source alternative in the group, the number of determiner uses in the object table for the instruction group is
n
Uper group = (S s ) — 1 = p2+p—1.
s=1 2
If the longest path contains n—1 instructions (the minimum number to permit n—1 successively shorter paths), the total number of instructions in the combined region is
n
i = (S j ) — 1 = n2+n — 1,
j=1 2
and each instruction group contains exactly one instruction. [There may be more than one instruction per source statement, but subregion boundaries have been introduced between every pair of instructions.] Then the total number of determiner uses on source alternatives in the object table is
n
Utotal = S (p2+p — 1) = n3+3n2—4n.
p=2 2 6
At first glance (and in the two examples), the total number of determiner uses for successively shorter paths is smaller than the total number of determiner uses for paths of the same length. This is primarily because the unoptimized region has fewer instructions in the successively shorter case. However, the number of determiner uses could actually be larger for the successively shorter case because the different lengths of the merges can introduce additional subregion boundaries (that is, some of the subregions might contain only a portion of a source statement).
7.2.3 Repeated merging: successively longer paths
Figures 7-4a and 7-4b illustrate the application of cross-jumping to a program schema in which successively longer paths join. This schema is really the same schema as Figure 7-3, but cross-jumping is applied in the opposite order (which, as this section shows, is extremely unfortunate). This time, six successive merges are required to completely collapse the region: the first merge of each instruction creates a label immediately preceding the instruction; subsequent merges must stop at that label because a merging sequence cannot contain an embedded label. When successively longer paths are merged, the cross-jumping algorithm uses a total of
n—1
M = S s = n2—n.
s=1 2
successive merges to collapse the region completely. We call the successive merges that are stopped by a given label one stage of the entire merge. There are n—1 stages in the complete merge.
The successively longer case has terrible properties: it causes many determiner assignments to be inserted, frequently on the same object locations. Each stage merges p paths of the same length, where p decreases from one stage to the next (p = n—s+1). Therefore the analysis of Section 7.2.1 applies to each stage. This example has sixteen determiner assignments.
p
Dsingle = (S j ) — 1 = p2+p — 1
per stage j=1 2
Dmultiple p2—p — 1 + (p—1) en—p+2 ; p=3,..,n
sentries = 2
per stage e1 + en ; p=2.
In addition, this case creates many instances of indirect determiner uses on the determiner assignments. A layer of identical instructions become internal sentries in one stage and get determiner assignments. In the next stage, these instructions are merged to form a single internal sentry; indirect determiner uses on the merged list of determiner assignments distinguish among the merged sentries. By contrast, in the successively shorter ordering described in the preceding section, instructions are merged first and become internal sentries of a new subregion later. This example has fourteen indirect determiner uses. Together with the sixteen determiner assignments, thirty units of determiner storage are used in the determiner table.
n—1
IUper stage = (p—1)(p—2) + S(p—j)2 = 2p3+3p2—5p + 1
p=2 6
n
IUtotal = S (2p3+3p2—5p + 1) = n4—7n2+6n.
p=2 6 12
fig74.press leftMargin: 2 in, topMargin: .875 in, width: 5.25 in, height: 8.5 in
Figure 7-4a. Actual repeated cross-jumping: successively longer paths (part 1).
fig75.press leftMargin: 1.875 in, topMargin: .875 in, width: 4.9375 in, height: 5.1875 in
M ýn2—ýn When the longest path has n—1 instructions,
i = ýn2+ýn—1 and b = n—1:
Dsingle ýp2+ýp—1
per stage Utotal 1/6n3+1/2n2—2/3n
Dsingle 1/6n3+1/2n2—2/3n Utotal/i ~1/3 n
total
n
Dmultiple n3—n + e1—1 + S(p—1) en—p+2
total 6 p=2
IUper stage 1/3p3+1/2p2—5/6p+1 Dsingle/M ~1/3 n
IUtotal 1/12n4—7/12n2+1/2n IUtotal/M ~1/6 n2
Uper group ýp2+ýp—1 i/M ~1
Figure 7-4b. Actual repeated cross-jumping: successively longer paths (part 2).
The successively longer case creates the same number of determiner uses on source alternatives in the object table as the successively shorter case. (But note that although the number of determiner uses are the same, instructions that are merged in stages 2 through n—1 use different determiners in this case.) If the longest path contains n—1 instructions (the minimum number to permit n—1 successively longer paths), then the total number of instructions in the combined region is
n
i = (S j ) — 1 = n2+n — 1,
j=1 2
and the total number of determiner uses in the object table is
n
Utotal = S (p2+p — 1) = n3+3n2—4n.
p=2 2 6
We now consider the merging of n successively longer paths from a theoretical standpoint. Since each stage s is an instance of merging ps paths of equal length, the idealized analysis of Section 7.2.1 applies: to distinguish among the ps paths, stage s should create ps distinct determiner assignments and ps determiner uses per instruction group. If the next stage s+1 were to combine the identical sentries minimally, only ps—1=ps+1 indirect determiner uses would be needed to distinguish among them. At a primary breakpoint, the determiner with the most recent timestamp value among the p alternatives would indicate the correct source alternative. Figure 7-5 shows the idealized view of cross-jumping n successively longer paths.
7.2.4 A comparison of the successively shorter and the idealized successively longer cases
For the successively shorter case, the total storage used when the longest path has n—1 instructions is
D = n — 2 + e
IU = 0
U = 1/6 n3 + 1/2 n2 — 2/3 n
---------------------------------------------------------
1/6 n3 + 1/2 n2 + 1/3 n — 2 + e.
If the same region were merged using the idealized successively longer ordering, the total storage used would be
fig76.press leftMargin: 1.875 in, topMargin: 1.165 in, width: 5.0625 in, height: 5.50 in
M ýn2—ýn When the longest path has n—1 instructions,
i = ýn2+ýn—1 and b = n—1:
Dsingle p
per stage Utotal ýp2+ýp—1
Dsingle ýn2+ýn—1 Utotal/i 1
total
Dmultiple ýn2—ýn—1+e
total
IUper stage p ; p=2,..,n—1 Dsingle/M ~1
IUtotal ýn2—ýn—1 IUtotal/M ~1
Uper group p i/M ~1
Figure 7-5. Idealized repeated cross-jumping: successively longer paths.
D = 1/2 n2 — 1/2 n — 1 + e
IU = 1/2 n2 — 1/2 n — 1
U = 1/2 n2 + 1/2 n — 1
-----------------------------------------
3/2 n2 + 1/2 n — 3 + e.
Surprisingly, when n is five or greater, the idealized successively longer ordering uses less total storage. However, the idealized successively longer case does require more merges (as well as an as-yet-unspecified minimization step; see Section 7.4): 1/2 n2 — 1/2 n as opposed to n—1. Both cases activate the same number of determining breakpoints for a given primary breakpoint, and both cases examine the same number of timestamp values at a given primary breakpoint.
7.2.5 Repeated merging: embedded control flow
The cross-jumping algorithm considers labels for cross-jumping in successive linear passes through the abstract code stream. Therefore, the successively longer merge ordering can be doubly expensive: the first merge of each stage creates a new label, which prevents later applications of cross-jumping from merging longer paths. Furthermore, that new label is most often inserted earlier in the abstract code stream than the label currently being cross-jumped, so that another pass through the code stream is necessary. Therefore, each stage of the merge typically requires a separate pass through the entire code stream. Of course, a more complicated cross-jumping algorithm could maintain a list of labels to be considered so that an entire extra pass would not be needed.
7.3 Reducing the number of path determiners by changing the path determination algorithm
The behavior reported in the preceding section is frequently undesirable. In an attempt to improve both the number of applications of cross-jumping required to merge a region completely and the resulting path determiner structures, I experimented with the cross-jumping and path determination algorithms. I also wanted to examine other algorithms because other compilers place different restrictions on the composition of a merging sequence. Each of the modifications discussed in this section increased efficiency in several respects, but they also created problems with providing expected debugger behavior. These problems were not insoluble, but each required some special-case checking and action. The complexity of the resulting full algorithms finally caused these improvements to be rejected in favor of the conceptually simpler minimization post-pass sketched in the following section.
Algorithm NCJ2 is conceptually simple and complete. Every instruction gets a determiner use (and an indirect determiner use, if the instruction is a sentry) for every sequence it participates in. Every sentry gets a determiner assignment for every sequence whose entrance it guards. The improvements discussed in the following sections attempt to remove the path determiner redundancy [clearly demonstrated in Section 7.2] that can be caused by repeated merging. Unfortunately, removing the redundancy can make it impossible to provide expected behavior for some cases; this may be an acceptable tradeoff. Sometimes it also causes the debugger to report an incorrect source alternative; this cannot be tolerated.
7.3.1 Improvement 1: First crossing marks (no determiner use lists)
The equal length case of repeated cross-jumping [described in Section 7.2.1] uses more storage than is theoretically necessary because every time an instruction belongs to a merging sequence, every source descriptor for that instruction gets a determiner use for the sequence's determiner. Furthermore, the determiner assignments grow similarly: each merging sequence for a given merge gets a new determiner assignment at its sentry, regardless of whether that sentry was already guarding that exact instruction sequence. This explosion of the number of determiner uses and determiner assignments can be prevented by modifying the path determination algorithm as follows: Each source descriptor gets a determiner use for (only) the path determiner for the first merging sequence to which it belongs. Similarly, when a path determining instruction appears in a merging sequence, its determiner assignment(s) get an indirect determiner use (only) for the first merging sequence that contains it. As a result, when a path has been merged before, the algorithm relies upon the original determiner assignments (as well as possible chains of indirect determiner uses) to distinguish among the source alternatives and path determiner assignments. Figure 7-6 shows the application of this version of the algorithm, called the First Crossing Marks algorithm, to the successively longer paths of Figure 7-4.
This change saves storage for determiner uses in the object table directly: each source alternative contains exactly one determiner use. It also saves storage in the path determiner table (as well as processing time at a primary breakpoint) indirectly: a determiner for a merging sequence that included only previously-merged instructions (and no sentries) will not be used by any instructions nor any determiner assignments. Therefore, all assignments of that determiner can be deleted from the determiner table. [Actually, the condition under which a determiner assignment can be deleted is a bit more complicated: no source descriptor uses it, and no sentry descriptor uses it or its opposite. This distinction between source and sentry descriptors arises because it is possible that only one of two merging instructions has a sentry descriptor, but both instructions must have a source descriptor. Therefore a determiner use for each merging sequence will automatically be created for source descriptors, while it may not be for sentry descriptors.] For the equal length and the successively shorter cases, this algorithm uses only n distinct determiner assignments for n paths. Determiner assignments mark only the external sentries.
However, there is a problem with this improvement. Debugger correctness property D2 [setting a breakpoint when the execution point is already inside a merged region never causes incorrect behavior; see Section 7.1.2] may fail. If a merged region has unmarked internal sentries and each primary breakpoint activates only the determiners needed to distinguish among paths to it at runtime, then the timestamp cells can give incorrect information when multiple breakpoints are placed in a repeatedly merged region. Consider the following program fragment, in which three paths have been merged using the improved algorithm and the successively shorter ordering. As a result, control flow can enter the region through a currently inactive external sentry s. If s is still inactive when the primary breakpoint is reached, the debugger can detect this unusual condition and report the source alternative corresponding to s also. But if s is activated between the time that control entered the region and the time that the primary breakpoint is reached, the debugger can be fooled into giving an incorrect response.
1 SELECT x FROM 1 SELECT x FROM
#"#"#"#" "#"#"#"
10 1 => {x ← 20; 10 1 => {x ← 20;
11 P[3]; <go to L2>}; pd4
12 d ← 4};
#"#"#"#" "#"#"#"
20 3 => {y ← 10; 20 3 => {y ← 10;
21 b ← 1; B2 <go to L1>}; pd2
22 c ← 2;
23 P[3];
24 d ← 4};
#"#"#"#" "#"#"#"
30 5 => {z ← 0; 30 5 => {z ← 0; pd1
31 b ← 1; L1: b ← 1; (311, 212) primary bkpt B2
32 c ← 2; c ← 2; (321, 222) [pd3 - unreferenced]
33 P[3]; L2: P[3]; (331, 232, 114)
34 d ← 4}; B3 d ← 4}; (341, 242, 124) primary bkpt B3
35 ENDCASE; 35 ENDCASE;
fig76b.press leftMargin: 1.5625 in, topMargin: .875 in, width: 5.875 in, height: 8.5 in
Figure 7-6. First Crossing Marks algorithm on successively longer paths.
Referring to the program fragment, one possible faulty scenario proceeds as follows: Before starting the program, the user sets breakpoint 1 inside procedure P and breakpoint 2 at statement 21. Breakpoint 2 activates path determiners 1 and 2. After execution begins, control enters the merged region through determiner assignment 1 at time t1. The primary breakpoint for breakpoint 2 is reached (control is not passed to the user), breakpoint 1 inside P is reached, and then control leaves the merged region. Later, the merged region is entered through determiner 4 (which has not yet been activated) at time t2, and breakpoint 1 inside P is reached again. While at breakpoint 1, the user requests breakpoint 3 at statement 34, activating path determiner 4, and proceeds. (Note that procedure P has no knowledge of any determiners active in its caller.) Upon arrival at the primary breakpoint for breakpoint 3, the timestamp cells indicate that statement 34 is executing (timestamp4 = 0, timestamp2 = 0, and timestamp1 = t1). Therefore, although statement 12 is actually executing, the debugger relinquishes control to the user, saying that breakpoint 3 at statement 34 has been reached.
There are two solutions to this problem that retain the improved path determination algorithm.
One solution is to record the time that each determining breakpoint was set. The algorithm for fielding a primary breakpoint at an instruction b becomes more complicated. One pass through the determiners discovers the time t of the most recent recorded sentry of the merged region from the determination timestamps. A second pass identifies the set of possible determiners, which are all determiners referenced in the source alternatives for b that (1) have a determination timestamp equal to t, (2) have at least one assignment whose determining breakpoint was set more recently than t, or (3) have not yet been activated. When debugging is active, this solution uses more debugger storage and requires additional processing at each arrival at a primary breakpoint.
The other solution is to create equivalence classes of determiners, clustering all determiners that cover a given (possibly complex) merged region together. Then all determiners for a given region would always be activated at the same time. These equivalence classes could be created during the cross-jumping optimization or in a single post-pass through the statement map and the determiner table. This solution requires a small amount of additional compilation time to discover the equivalence classes and extra debugging table space to store them. These increases are likely to be more than offset by the reduction in compilation time (for creating and managing the path determiner structures) and path determiner storage that this version of the algorithm permits.
7.3.2 Improvement 2: Allow embedded labels and embedded unconditional jumps
The successively longer case of repeated cross-jumping (described in Section 7.2.3) is much worse than the equal length or successively shorter cases because many more applications of cross-jumping (which insert more determiner uses and assignments) are needed to collapse the region completely. The additional applications of cross-jumping are necessary because the first merge of each stage inserts a new label, thereby preventing later applications of cross-jumping from merging longer paths. If merging sequences were permitted to contain embedded labels, the later applications of cross-jumping could match the full length of the identical tails at once, and would require a total of
M = n—1
successive merges to collapse the region completely. This is the same number of merges as the equal length and successively shorter cases. This modification is completely separable from the modification discussed in the previous section. The version described in this section, called the Embedded Label algorithm, has determiner use lists as described in Chapter 6.
Very few changes to the cross-jumping and path determination algorithms are required to permit embedded labels. The cross-jumping instruction matching routine is altered to skip over embedded labels in either merging sequence. As before, the path determination algorithm gives all sentries of a merging sequence a path determiner assignment for that sequence. However, some of the sentries may now be immediate dynamic predecessors of an instruction other than the top instruction in the sequence.
For additional efficiency, two more changes are made to allow paths with internal control flow, in the form of embedded forward jumps (an IF-THEN-ELSE structure), to be matched in a single application of cross-jumping. First, the cross-jumping instruction matching routine is altered to permit embedded jumps in either merging sequence. Two unconditional jumps match if they have the same jump destination. Two conditional jumps match if they have the same opcode, the same operand(s), and the same jump destination. [Note that conditional jumps are only permitted strictly inside a merging sequence (that is, they are still not permitted as link cells).] Second, when the backward comparison scan encounters an embedded label in the to-delete sequence, it immediately redirects each jump to that label to the corresponding point in the to-remain sequence, inserting a new label there if necessary. This "instant branch chaining" step means that two "congruent" forward jumps will become identical by the time that the backward comparison scan reaches them. It does not provide a general flowgraph-structural match for jumps, however: two "congruent" backward jumps will never match. Note that these internal jumps are not sentries of the merging sequence: a sentry of a sequence is always outside that sequence. [A sentry occurring inside its sequence would overwrite its determiner's timestamp value.] When the backward match and merge occur in one backward sweep, the sentries for each embedded label receive a determiner assignment for the label's path determiner when the label is encountered. When a forward jump to such a label is encountered later in the same backward sweep, it has a determiner assignment equal to the determiner for that sequence. This determiner assignment is removed.
As shown in Figure 7-8, this improved version of the algorithm can merge the successively longer paths of Figure 7-4 in three successive applications of cross-jumping, rather than six. It can merge the complicated nested example of Figure 6-11 [see page 162] in three merges rather than four.
Two problems can arise from this improvement. The first problem is the introduction of nonessential ambiguity. That is, a single sentry z might dynamically precede a different instruction in each merging sequence, as illustrated in Figure 7-7. As a result, z is given determiner assignments for both merging sequences—whenever the merged region is entered through z, the values of the determination cells do not distinguish between the two regions. This situation is particularly unfortunate because it is possible to place more determiner assignments in this region so the source alternatives can be distinguished. If c were merged alone, followed by merging a and b as a unit, the resulting four determiners can distinguish between the two source alternatives for c.
fig711.press leftMargin: 1.875 in, topMargin: .9375 in, width: 5.1875 in, height: 2.5 in
Figure 7-7. Introduced ambiguity in the Embedded Label algorithm.
fig77.press leftMargin: 1.6875 in, topMargin: .53125 in, width: 5.5 in, height: 8.5 in
Figure 7-8. Embedded Label algorithm on sucessively longer paths.
The second problem is worse: incorrect path determination might result. This situation arises when an instruction inside a merging sequence is an immediate dynamic predecessor of an instruction in both merging sequences. This situation, depicted in Figure 7-9, can arise when conditional jumps and labels are permitted inside merging sequences. For instance, the dynamic control flow information collected at the two determiner assignments do not determine which of statements 12 and 32 are executing at instruction b.
fig712.press leftMargin: 1.5625 in, topMargin: .9375 in, width: 5.375 in, height: 2.4375 in
Figure 7-9. Incorrect path determination in the Embedded Label algorithm.
The most satisfactory solution for these problems is to detect them during the application of cross-jumping and disallow embedded labels for their particular cases.
7.3.3 Improvement 3: First crossing marks and allow embedded labels and jumps
The two improvements to the cross-jumping and the path determination algorithms can be combined to create an algorithm that merges paths in fewer iterations and uses only one determiner use per source or sentry descriptor. This algorithm, called the Combined algorithm, has the problems of the preceding two algorithms plus a few of its own. [This is the algorithm reported in my previous paper on Navigator [Zellweger83].] For comparison with the previous algorithms, Figure 7-10 shows the application of the Combined algorithm to the successively longer paths of Figure 7-4.
This combined version causes additional problems for repeated nested merging: that is, for cases in which each of two identical sequences has internal identical sequences. Because of the "first crossing marks" property, the Combined algorithm works only when the inner merge occurs first; the inner determiner assignments then depend upon the outer determiner assignments. When the outer merge occurs first, the Combined algorithm creates ambiguity (i.e., only truthful behavior). These problems do not occur for the First Crossing Marks version because embedded labels are necessary to create nested regions. Note that for either "determiner use list" version (i.e., the original cross-jumping algorithm or the Embedded Label version), it is more desirable to merge the outer region first because then the region can be merged completely with only two merges rather than three.
To solve the problems with the Combined algorithm in general, some sort of recursive cross-jumping algorithm could be used to ensure that inner merges always occur first. Another possibility is to identify these problem cases and fall back to one of the previous algorithms when one of them is discovered.
The following cases demonstrate situations in which the outer merge occurs first. These problems can be created in actual Cedar programs: the juxtaposed case can arise in SELECT statements, while the backward jump nested case can arise in loops containing conditionals. Special-case solutions are identified for each case.
Juxtaposed case (four paths to the merging label). Suppose that two paths to the merging label have identical tails and are statically juxtaposed; they thereby form one "double" sequence. Suppose further that this sequence matches an identical "double" sequence. These two "double" sequences therefore have opportunities for either two or three merges, depending on the merging order. The merging order that causes difficulty is the one in which the "double" sequences are merged first and the internal merge occurs second; Figure 7-11 illustrates the difficulty. These "double" sequences are legal merging sequences for the Combined algorithm because it permits embedded labels and unconditional jumps. In this situation, the first merge gives all of the source descriptors determiner uses for its two determiners. The second merge includes only previously-merged instructions. Since the Combined algorithm never marks such instructions, the two determiners for the second merge are neither activated nor examined to distinguish between the two inner paths—in fact, they are deleted. The final result is that the instructions that participated in the second merge can only be reported as lists of two source alternatives.
As a special-case solution for the juxtaposed case, we can disallow embedded jumps to the current merging label.
fig78.press leftMargin: 1.6875 in, topMargin: .53125 in, width: 5.5 in, height: 8.5 in
Figure 7-10. Combined algorithm on successively longer paths.
fig713.press leftMargin: 1.4375 in, topMargin: .3125 in, width: 5.75 in, height: 8.5 in
Figure 7-11. Juxtaposed case for the Combined algorithm.
Backward jump nested case (two paths to the merging label). Suppose that two identical sequences with internal cross-jumping opportunities (as a result of internal control flow) each end in a backward jump to the merging label. [This situation is similar to the nested cross-jumping depicted in Figure 6-11, page 162.] Because the merging label appears earlier in the codestream than either sequence, the outer sequences are considered for cross-jumping before the inner sequences. As in the juxtaposed case, performing the outer merge first when each source descriptor can only have one determiner use means that the path determiners of the inner merge are never used. The inner statements cannot be distinguished by the resulting path determiner structures.
As a special-case solution for the backward jump nested case, we can delay the application of cross-jumping. Rather than trigger cross-jumping on a label as soon as that label is encountered during the codestream scan, trigger it upon encountering the latest item among the label and all jumps to it. To detect this condition, a "seen-this-codestream-pass" flag can be added to each jump and label. Each successive cross-jumping pass would toggle the flag.
7.4 Reducing the number of path determiners by adding a post-pass
Rather than modifying the path determination algorithm to generate fewer determiner uses and assignments and dealing with the attendant problems, the path determiner structures produced by the unmodified Algorithm NCJ2 could be simplified directly. Path determiner minimization could be conveniently applied as a post-pass over the completed path determiner table and statement maps for a procedure. It would identify unneeded or redundant determiner uses and assignments, deleting them from all debugging tables. In some cases, it would coalesce a group of determiners into a single new determiner.
The minimization algorithm proceeds as follows:
1. For each path determiner p, construct a fully-expanded logical clause that expresses the control flow paths on which p is assigned. To do this, form a disjunction from the object locations listed for p in the path determiner table. A determiner with indirect determiner uses becomes a conjunction of its (direct) object location and all of its successive indirect object locations.
2. Minimize these logical clauses.
3. For each (possibly complex) merged region, construct a structured equivalence class of determiner uses and indirect determiner uses for each instruction in that region. Minimize this equivalence class as a unit.
4. Construct new minimized determiner uses by expanding each list of determiner uses (or indirect determiner uses) as a conjunction of the minimized path determiner disjunctions from Step 2 and minimizing the resulting clause.
5. Map the resulting minimized determiner use clauses back into path determiner identifiers. This step may require some path determiner restructuring.
6. Relabel the determiner uses as needed.
7. Map the minimized determiner assignment clauses from Step 2 back into path determiner identifiers.
8. Delete any unreferenced determiners or determiner assignments.
Figure 7-12 illustrates the operation of this algorithm on the final path determiner structure for the successively longer case of repeated merging, from Figure 7-4b. Note that the resulting structure is the idealized structure shown in Figure 7-5.
Separating the path determination algorithm from the steps needed to make it efficient to store and access at runtime solves only part of the problem. When using the original algorithm, the compiler must still introduce and store many extra determiner uses and assignments; it must also use additional merge steps because of the labels introduced by earlier merges. Path determiner minimization might also be fruitfully applied to the results of the improved versions of the algorithm—although more efficient, these versions are not guaranteed to yield minimal path determiner structures.