GATE MATRIX ALGORITHMS

Introduction:
The gate matrix program converts a set of logic equations to a LSI layout. The program is a Chipmonk subsystem, so its output is directly available to be combined with other Chipmonk logic. The input is an ASCII text file whose detailed format is described with the Operating Instructions (another document). Basically, the input consists of NOR, PASS, and AOI equations between a set of named signals.

x=~(a+b+ ...)
x is shorted to y when z is high
x=~(axb + cxd + ...)

The name "gate matrix" comes from a supposed resemblance between the final geometry of the layout and a matrix whose rows and columns correspond to the signal names and whose gates occur at the intersections of rows and columns. The name Universal Logic Array is also sometimes applied to such a structure.

Dismissing AOI:

The program immediately converts the input equations into a set of pullups and pulldowns. AOI equations add nothing new to the set of pullups and pulldowns, and therefore introduce no new fundamentally difficulty to the problem. AOI will not be considered further here, except to mention that it is not yet implemented in the parser, and are therefore not available in the program.

AOI could introduce something new if one took the view that it was permissable to interchange the order of the pulldowns on the legs of an AOI. Then one might wish to optimize such an interchange. I reject that view on two grounds: first that such an interchange does have quite an electrical effect on the circuits, even though it has no effect on the logic equations, and second that if such interchanges are to be permitted, they ought to be recognized not in the form of the input equation but from the mess of pullups and pulldowns, so that similar interchanges among the pass transistors would be found.

Overview of the Description of the algorithm:

The description will proceed in three Sections.

> First an LSI structure will be described which is general enough to implement any set of equations. This structure will have the unfortunate property that it grows as the cube of the number of signals in the equation set.

> Next a set of reductions to that structure will be described. The reductions, it will be claimed, reduce the structure to something which is more nearly linear in the number of signals. The reductions depend upon a proper ordering of the signals.

> Since the general ordering problem seems to be of factorial order, the actual ordering is produced heuristically. The rules governing this decision make up the Third Section.

Part 1: The general Layout:

Let "n" be the number of signal names implied by our equations.

Visualize a regular grid composed of n horizontal blue wires and n vertical red wires. Imagine that we assign the signals to the wires so that each signal corresponds to one red and one blue wire. There are (n factorial) squared ways to do this, but for the moment any will do (We will be very concerned with this ordering in Part 3). To insure that the red and the blue wires are actually the same circuit we will short the two at their intersection.

This grid has the interesting property that any logic which needs only two signals can be performed locally at one of the two places where the signals intersect. In particular, a pulldown can be implemented at either intersection. However, one intersection is clearly better than the other for such a pulldown, that being the intersection where the gate of the pulldown is already running on a red wire. This is important, because the intent is to space the red wires on 8 lambda centers and the blue wires on 7 lambda centers, and that doesn’t allow much room to shift levels around.

Since we are interested in pulldowns, we must provide ground signals at each of the intersections of the grid. This can be done by spreading the red wires apart, putting a ground wire between every pair of red wires, and attaching all the ground wires to a ground bus at the bottom of the figure. It will turn out to be convenient to run the ground wires in green (not very surprising - the side of the pulldowns you want to connect to ground is green, and green grounds are good electrically).

We must also provide for pullups on every signal. Fortunately this is easy, because pullups are one signal logic rather than two. We simply run a power bus across the top of the figure, and allow for pullups where the red signals cross the power bus.

Unfortunately, not all the transistors derived from our equations have power or ground as one of their legs. Pass transistors use all three legs of a transistor, and we have as yet no mechanism for getting three signals to a single place. To facilitate that we will again spread the red wires apart, this time far enough to allow for n vertical green wires between every pair of red wires. All the pass transistors will terminate in a short green wire which will run horizontally to one of the n vertical green wires, and then up or down to the corresponding blue wire to make its connection. This might seem to involve horizontal green wires crossing vertical ones, and in fact it does if one is foolish about choosing which vertical wire to connect to, but it is fairly easy order the green wires so that this is not a problem.
(Note to me -- why isn’t it a problem? It might be in some perverse case.)

We are about to add a restriction on the equations. Instead of a general tangle of logic at each intersection, we are going to specify that there will be exactly one transistor (or none of course), with its gate controled by the red wire and a green leg controled by the blue wire. It is concievable that one might want several such transistors, with their third legs attached to different signals. Fortunately, this does not make too much sense electrically, and I have trouble imagining a case where it might come up. However, I have never convinced myself that it could not happen.

Now we are done with part 1. The structure we have described is sufficient to implement arbitrary 2 and 3 gate transistors, subject to the hypothetical restriction of the previous paragraph, and its only drawback is that it is somewhat too large.


Part 2: Getting it down to a reasonable size.

The first trick is simply to remove the unused wires. There are actually quite a few wires which are completely unused. Our layout allows for n squared pass transistors, while a typical set of equations in n signals will have about n equations overall, of which more than half will typically be NOR’s. If you believe this is a typical case, then we can remove almost all of the n squared green wires we added to facilitate the pass transistors, coming down to one green wire between every pair of red wires, on the average. This amounts to one whole factor of n, but even so is not enough.

The second trick is to remove half of the ground wires. Each ground wire can service two red wires if one orients the transistors properly, and there is no reason not to (if you don’t count program complexity a reason).

The third trick is again to remove unused wire. All the wires have been defined as extending clear across the layout, but few actually go that far. All ends can be clipped beyond their last point of use. We will except the red wires from this process, because we need to bring each signal to the edge of the figure somewhere, and the red wires are the most convenient places. Actually we will also clip the red wires if they are not external signals, but this is done to reduce capacitence rather than to aid in compression.

The third trick by itself does not result in compression, but it is a short step from there to merge any two rows whose blue wires do not overlap, and any two adjacent green columns should they not overlap. This will result in a real but modest compression. In fact, the amount of compression obtained will depend a great deal on the ordering of the signals on the red wires, and to a lessor extent to the ordering on the blue wires. For random orderings the compression obtained may be modest, but orderings which attempt to place electrically connected signals close to one another achieve dramatic gains. This leads to a clustering problem, similar to many in the literature but differing in a few key details. The heuristic solution of this problem is the subject of the next section.

Experimentally, it seems that the number of rows actually needed is a small constant, somewhere between 5 and 10. Perhaps this constant is really growing as log n; the variation from problem to problem is so large that it could mask such a trend in the data I have seen. One would expect log n on theoretical grounds. On the other hand, real sets of equations have a large number of equations with only one signal on the right hand side, and all such equations disapear in at most two rows of the layout, and provide an extraordinaty flexability in placing the remaining equations. So in fact the number of rows may approach a constant when there is a bound on the number of terms on the right hand side of the equations.

As already mentioned above, the size of a cell in this structure is 7 lambda high and 8 lambda wide(see fig xx). Crossing the cell, always in fixed position, are places for one horizontal blue wire, one vertical red wire, and one vertical green wire. In addition it is possible to run a horizontal green wire for short distances under the blue wire. One has options for contacts of various sorts where the wires cross, and for transistors where a green wire might cross a red wire. In most cases it is possible to fit all combinations into the available space, although this does require the use of three way contacts when appropiate. While it is always possible to fit everything when the blue wire has a whole row to itself, it can be awkward when the ends of different blue wires come up to adjacent cells. These cases can be eliminated by requiring a one cell gap between different blue wires when packing blue wires into rows. The program does somewhat better than this, leaving the gap only when it is needed. One real beauty of the Gate Matrix scheme is that such a small cell seems to get the job done. It is hard to imagine a cell being less than 7 by 7, since 7 is the minimum spacing for both blue and green wires (6x5 would do for the wires alone, of course, but the wires are useless without contacts and the contacts bring it up to 7x7.

When it is necessary to run one or more extra vertical green wires for the pass transistors, the program first looks to see if there is actually space available in the normal green channel. If so it simply uses that space and there is no problem. However, if more space is required the program constructs a whole extra column of 7x8 cells in order to get another green column, even though it could get by with a structure only 6 lambda wide. These extra comumns are rare enough that I forgo the optimization.

There is one further annoying problem. The cells are not quite big enough to hold a pullup. Extending them vertically is not a problem, so there is a special row of outsized cells at the top of the figure, but the cells are not quite big enough in the horizontal direction either. In fact, there is room to place a pullup in the cell, but it will short to the pullup in an adjacent cell. One can even manage to place two pullups in a row, if one of them is placed backward. But there is no way to place three in a row. There are three possible solutions to this problem, and the one chosen is probably the ugliest of the three (but the smallest, on the average).The most straightforward solution is to expand the size of the cell by 2 lambda. Another straightforward solution is to use two rows of pullups, placing alternate pullups on alternate rows. Our solution is to insert an extra column of cells whenever there are three pullups in a row. You might think this would result in a 50 percent expansion, as opposed to the 20% of the first two solutions. But in fact there are quite a few vertical wires without pullups, and as long as one is carefully arranging the order of the columns one may as well take the number of pullups in a run into account as well. Usually a quite modest movement of columns will eliminate the need for extra columns, and the program is ready to insert a few extra columns for difficuit places.

fig xx

This figure tries to represent a typical cell using only bravo characters. It is less than wonderful on that account, but perhaps one can see how two vertical wires, one red and one green, cross a horizontal blue wire.

red green
2 1 4 1 lambda

| | | | | next cell
-----------------
| | | | | empty space (3 lambda)
| | | | |
-----------------
| | | | |
| | | | | blue wire (3 lambda + 1 lambda for contacts)
| | | | |
----------------
| | | | | next cell


fig yy

This figure shows a typical pulldown. Most of the interesting stuff happens in two adjacent cells. The control for the gate comes in on the leftmost red wire, ground comes up from the bottom on the green wire, and the signal being gated comes in on the blue wire from the right. The figure shows the blue wire ending here, but of course it could continue on across the figure and out the left side.


typical pulldown


rr rr
rr rr
rr rr
ggggttttgggggbbbb
ggggttttggxxgbbbb
ggggttttggxxgbbbb
ggggttttgggggbbbb
gggg rr rr
gggg rr rr
gggg
gggg

This figure illustrates how logic often slops over into adjacent cells, but in such a way that it causes no problems. The ground wire is running in the cell to the left of the transistor, occupying the green wire slot. This is ok, because the left cell will be reversed, and if there is a pulldown there it will simply share the ground. The cell to the right of the transistor may also contain a pulldown, in which case it will need a contact in the same place as the contact for this transistor. Fortunately, it can use the very same contact, because the fact that the blue wires cannot overlap insures that the signal needed here is the same one available here.


Part 3: Ordering the Signals.

The success or failure of the program will depend on how well it can assign signal names to the rows and columns. A good assignment will produce between 5 and 10 rows, while a bad one will produce about n/2 rows. This is the difference between success and disaster.

It is important to note that there are in fact two ordering problems, namely columns and rows. The first is much harder than the second, which almost has a guaranteed solution. We will consider the easier row problem first.

Ordering the rows:

Given a column assignment, the problem is to assign signals to rows in such a way as to minimize both the number of rows used and the number of extra columns introduced in order to handle pass transistors. Minimizing both means minimizing the area of the resultant rectangle when they are in conflict. In fact, since the number of rows is expected to be small and the number of columns large, we will assume that whenever the row and column minimizations are in conflict the correct strategy is to minimize rows. So the true strategy will be to first minimize the rows and then perturb the resultant figure to minimize columns, but only allowing perterbations which do not add any extra rows.

It is straightforward to find the minimal row solution. One need only sort the signals by their left ends and then place them one at a time in any available slot in the rows, adding an extra row only when all the other rows are full. The only complexity here is determining whether one must leave a one column gap between signals or not, and that is a straightforward analysis of a finite but large number of cases. It is easy to show that if a signal cannot be placed in the existing space, then there is no other ordering of the signals which would do a better job.

The approach taken to deal with pass transistors is not at all elegant, but fairly effective. First a trial layout is constructed neglecting the pass transistors entirely, according to the previous paragraph. Second, the rows are reordered so that rows with pass transistors are near the top of the figure and rows with nors are near the bottom. This has the virtue of tending to shorten the green vertical wires, thereby making it more likely they will fit in fewer columns. The pass transistor wires shorten because the pass transistors are more clustered, while the nor wires shorten because ground comes up from the bottom of the figure. Then each remaining extra column is examined in turn. The cause of the extra column is that the third signal needed on a transistor is remote, and needs to be brought down to the proper row, but there is some other use of the green column in the intervening space. The solution is to reorder the rows, either wholly or in part. Instead of considering all possible reorderings, the program examines only the first plausable set of reorderings - namely those which involve one of the affected signals and exactly one other row or signal between the two affected signals. This number is small enough to simply try them all. Most of the reorderings will be harmful, either shorting signals or expanding the number of columns needed. But there may be one which eliminates the extra column without any harmful side effects. As one might expect, this technique quickly eliminates some of the columns, but as it continues to be applied progress gets slower and slower. The final result seldom has more that 15% extra columns, but it is annoyingly that a human can usually spot a way to eliminate just about any of the columns the program has failed on. It seems likely that a manual layout would never need an extra green column, or at least that there would never be an extra green column in those regions of the array where not all of the rows were in use. It does not seem practical to give the program the knowledge necessary to play these tricks, for they often violate the cell-like nature of the figure.

Assigning Signals to Columns:

The initial assignment of columns is based on a clustering algorithm. The idea is simple enough - often one can determine by inspection that a particular signal should be "near" another signal. This might happen if the chosen signal occurred only once, for example - it should be located near the place it is used. The algorithm simply inspects all the signals, eliminating all those which can be so trivially dealt with. Surprisingly, one often reduces the whole set of equations to a single signal. If not, one randomly selected constraint is eliminated. Reduction then proceeds as before. Eventually one is left with a single signal, which is then "placed" and the whole assignment built up using the set of nearness relations. The details of this algorithm are given below.

While the output of this algorithm is surprisingly good, it is not good enough. One problem is that the clustering algorithm is trying to minimize some average inter column distance, while the true measure of goodness is the number of rows needed to connect the signals across the worst part of the diagram. The program therefore proceeds to locate the column or columns where the most rows are used, and tries its best to change things so as to reduce this number. Its technique is to systematically ask what would happen if the first column attached to such a row were moved through the bad region. This might increase, decrease, or leave unchanged the number of wires in the bad region. If it decreases, the program notes progress and starts over again. Surprisingly, such a simple technique seems effective; theoretically, one would have to try moving whole clusters through the bad region, and there are a lot of possible clusters, but often one column does the trick. In fact, progress seems to stop not because the program cannot find any more interesting wires, but becasue the bad region becomes quite wide and the wire is used inside the region.

Initial Column Assignment:


The original form of our equations is not very convenient for addressing the layout problem. The equations tell us what is logically connected to what, but have the red/blue and blue/red junctions hopelessly muddled together. So the first step is to make sets of columns which must be connected together by a blue wire. Usually there will be about n such sets, and the sets will contain some small number of elements (theoretically again n, but practically about 3)

Consider a signal which occurs only once in the whole set of sets. It might correspond to an input control line which gates only one internal signal. It is absolutely clear that this control line wants to be placed adjacent to the gate it controls. Any other placement would needlessly lengthen a blue wire (which might or might not do harm, depending on whether this was a critical region, but cannot do any good). All that is unknown is whether to place it right or left of the gate it controls. More generally, if the signal belongs to a larger set (ie is only one of the things which might pulldown the controlled signal) it must be right to place the rest of the diagram without regard to that signal, and then to add that signal anywhere along the existing blue wire for that set. Again there is a choice of where to place it, but it is right to ignore it until the end.

The program does exactly that, pushing the signal onto a list of columns to place later, with an indication of what to place it near, and removing it from the working sets.

Next consider a set with only one element. It might corespond to a normal set from which we have deleted one or more element. It can be placed anywhere. Well not quite anywhere - it corresponds to a zero length row, so it won’t impact the rest of the diagram much, but it will prevent through rows from passing by. It chould be placed at the end of some other row. Anyway, it too can be deleted, and pushed onto the set of things to place later, since it won’t have any global effect.

The progression continues. One can cope with signals which occur only twice, where one occurrance is in a set of exactly two elements. One can cope with a signal which always occurs in conjunction with another signal. I think that’s all I recognize, but there may be others.

Eventually we want the working sets to get down to a single element which we can simply place anywhere, but of course that won’t always be possible. If it was, then the problem would not be np hard, which it almost surely is. But for a lot of practical problems the working sets do reduce fairly well. When they don’t there are two recourses: the programmer can add another case (perhaps three sets of three interlocking signals), or the program could simply delete one of the signals and proceed. The latter is obviously what I do. Each time this is done it potentially adds one unnecessary row to the final diargam. Not a terrible price, considering the other sloppy parts of the algorithm.

It is not a trivial task to construct the ordering from the final single element and the list of pushed down signals, because at each step of the way one is faced with a problem of whether to place the new element to the left or to the right of the structure one has already built, and this involves predicting which side will have the most congestion. I tried several clever rules which were quite inadequate before I hit on a good one - namely always place the new item to the right of the old one. I have no idea why that works, but it seems pretty good.

Speculations

The current program does not do a spectacular job. A better ordering algorithm might get anywhere from 20% to 50% improvement, but that is not the most serious flaw. I have experimented with quite a few of these layouts by hand, and one can always get rather impressive improvements over the mechanical layout(factors or 2 or 3). Unfortunately, the method of gaining the improvement is not a more clever ordering algorithm, but a deliberate violation of the cell regularity. It seems the human designer "recognizes" the essential problem, and then does some one thing to fix it. One example among many - When the pass transistor green wires conflict with ground, it is usually rather easy to bridge the ground wire over intervening columns to connect to a ground on a nearby column. The human recognizes this without even thinking about it, while the program would need a one page subroutine to deal with it. Unfortunately, there are scores of such special cases, and even worse once one violates the cell regularity the special cases start to interact with each other in unforseeable ways.

A second complication is the need to optomize certain frequently occurring structures. A good example of that is the clocked flip flop. It is possible to lay out a good flip flop significantly better than the program can manage it, essentially by using some of the red column space for more than one signal. Unfortunately, the program has no notion of how to deal with an intractable "lump" which sits across several rows and columns and does not admit of the usual modifications.

If one really wants to do a good job of compression in this sort of scheme, there is a further refinement needed. Most sets of equations that I have investigated have a few signals which control a large number of gates. Clock and Reset are obvious examples, but there are usually other signals with names like "ready" and "phase1" which have the same character. In the present implementation each of these signals is supposed to occupy a red column, and all the logic which are gated by the signals are brought to the column on the blue rows. This tends to use a lot of rows, which is bad. It would be far more efficient to carry these critical signals across the layout on blue wires, and transform them locally to red, introducing an extra column if necessary. A different way of saying the same thing is that for many signals it is better to put the logic at the blue red crossing rather than at the red blue crossing, even though it requires a lot more space locally to implement the transistor. It is something of a trick to recognize these signals, and a lot of work to define the off-size cells to make this all happen.

Unfortunately, the modifications of the last two paragraphs interact quite strongly. The really dense flip flops require their clock wires to run horizontally (at a pre-defined spacing).

The upshot of all this is that it seems unlikely that a program will be able to do nearly as well as a determined human, even granted that they are both going to start with the same general gate array structure. The program of course compensates for its lack of goodness by its speed and accuracy, and probably deserves a place in the designers arsenel just on that account.