[_CD4_]<datools7.0>SoftHdw>SoftHdwPgmOld071789.tioga!1

A Programmable Logic Architecture

Richard Barth

Draft of May 22, 1989 11:56:56 am PDT

1.0 Introduction

This paper describes a new implementation of RAM programmable logic. A description of the closest commercial realization of this type of logic can be found in [Xilinx].

2.0 Assembly Language

This section describes the low level abstraction used to program the devices.

2.1 Basic Programming

Figure 1 shows the low-level wires from which all larger structures are built.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 1. Basic Interconnect

Output is the wire which carries the output of a basic gate. Input carries the input of a basic gate. RightDown is a short length of uncommitted wire whose direction is fixed to right or down. LeftUp is similiar but its direction is fixed to left or up. The dots on RightDown and LeftUp indicate that the value is inverted when making the transition across the wire segment. Long is a wire which extends across the width of a chip to enhance interconnect density and performance. Its direction is not fixed.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 2. Minor Array

Figure 2 shows an array of basic interconnects organized as a repetition in the horizontal and vertical directions. The basic interconnects are numbered, increasing from left to right, and top to bottom.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 3. Programming Primitives

Figure 3 portrays the graphical symbols used to program the array. Program indicates that an input wire is connected to an output wire through a gate. Inverter indicates that a signal is inverted. FlipFlop indicates that a signal flows through an edge-triggered flip-flop. The allowable placement locations for these tiles are described later in this document.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 4. Basic Logic Function Classification

Figure 4 classifies the 16 functions of 2 variables. A single output wire can perform the nand function of 5 variables. Through the use of De Morgan's theorems, the programmable number of inputs, and the propagation of inversions, this basic capability can perform any of the function classes except xor, which requires three output wires for implementation.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 5. Example Function

Figure 5 shows the programming of the function horizontal0 ← not (vertical0 * vertical2 * vertical3), in the top row of a minor array. The remaining 3 rows are elided for clarity. In this same manner all horizontal outputs can be functions of all the vertical inputs (not the RightDown, LeftUp, or Long lines), and all vertical outputs can be functions of all horizontal inputs. In addition, the input parallel to an output and of the same index can participate in the formation of the output as illustrated in figure 6, which computes horizontal0 ← not (horizontal0 * vertical0 * vertical2 * vertical3).

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 6. Another Example Function

The input wire can be delayed by a flip-flop and/or inverted. Figure 7 shows the same function as figure 6 except that the horizontal input has been delayed and inverted.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 7. Example Function with Inversion and Delay

The interior of a chip is formed by tiling the plane with minor arrays. Figure 8 illustrates a 4 by 4 array. A realistic chip using 2 micron DLM CMOS has an array which is 16 by 16.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 8. Sample Chip Interior Tiling

In this array logical networks are constructed by connecting physical wires in straight lengths, either horizontally or vertically, and then connecting the vertical and horizontal segments together with corners formed from 1 input gates.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 9. Connection Tiles

Figure 9 displays the tiles used to connect segments of wire together. At each intersection of basic interconnects any combination of these tiles may be used except that no wire may be driven more than once. The titles indicate function as follows:

ORDToI - Output Right Down To Input

LUToLU - Left Up To Left Up

OLUToI - Output Left Up To Input

ORDToL - Output Right Down To Long

RDToRD - Right Down To Right Down

ORDToLU - Output Right Down to Left Up

LUToI - Left Up To Input

OLUToRD - Output Left Up To Right Down

LToI - Long To Input

RDToI - Right Down To Input

The tiles are shown here in the orientation used for horizontal interconnects. Each tile must be rotated 90 degrees, mirrored, and flipped before use in vertical interconnects. This cumbersome mechanism of tiles is used to simplify printing and parsing of assembly language programs. It also succinctly expresses the permissible interconnections amoung the 9 wires which participate at each junction. Only 10 connections are legal out of the 81 possible.

Figure 10 is an example of routing a logical net through several wires, utilizing 2 corners. The net begins in the top horizontal output of the left minor array, runs to the top horizontal input wire of the right minor array, goes through a corner onto the leftmost vertical output of the right minor array, is fed back to the left vertical input wire, through a corner, onto the second from top horizontal output, and finishes as the second from top horizontal LeftUp wire in the left minor array.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 10. Logical Net Example

Whenever a function requires the inverted form of an input, an Inverter symbol must be placed on the corresponding input. Figure 11 computes the OR of two variables by inverting the inputs.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 11. OR gate implementation

Some functions require an inverted output, e.g. the AND gate. This is achieved by inverting the destination inputs. Since each corner also introduces an inversion, the input inversion is actually controlled by the parity function of the input inversion requirement, the source output inversion requirement, and all of the corners between the source output and the input.

2.2 Pin Programming

Eventually the abstraction of an infinite plane of silicon must break down. At this point we introduce the traces of the printed circuit board, to which the pins of the integrated circuit are attached, into the abstraction. In this section we describe the details of connecting the inner array to the pcb traces.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 12. Minor Array with Pins

In figure 12 only one minor array is shown. In a real array, which is 16x16, there would be 64 pins on each edge instead of the 4 pins shown.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 13. Pin Connection Tiles

Figure 13 shows the tiles available to connect the pins to the array. The upper row is the set of tiles available for the left and top (when rotated 90 degrees, mirrored, and flipped) edges. The lower row is the set of tiles available for the right and bottom (when rotated 90 degrees, mirrored, and flipped) edges. When a tile appears in both rows this indicates that the tile may be used for any of the four edges. The titles indicate function as follows:

OToP - Output To Pin

RDToP - Right Down To Pin

LUToP - Left Up To Pin

LToP - Long To Pin

PToI - Pin To Input

PToRD - Pin To Right Down

PToLU - Pin To Left Up

PToL - Pin To Long

2.3 RAM and ROM Programming

The flip-flops available on each input frequently do not supply sufficient memory density. Four tiles are available, RAMEven, RAMOdd, ROMEven, and ROMOdd, each of which cover one minor array, to supply denser memory. These tiles are only available in a single orientation, i.e. read data always flows up, write data always flows down, read address always flows right, and write address always flows left.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 14. Minor Arrays with RAM and ROM Tiles

Figure 14 shows the tiles. In these tiles the RightDown and LeftUp wires are assigned RAM functions. The top row contains the RAM tiles, the bottom row the ROM tiles. The Even tiles are those in which the RE and WE signals are asserted positive true. The Odd tiles are those in which the RE and WE signals are asserted negative true

The RAM and ROM tiles are organized as 8 words, each of 4 bits. Horizontal LeftUp 3 is the write enable, WE. If it is high when the rising clock edge occurs then the value of the vertical RightDown wires will be loaded into the addressed word. Horizontal RightDown 3 is the read enable, RE. The addressed word is placed, combinatorially, on the vertical LeftUp lines when RE is high. Horizontal LeftUp 0-2 are the write address bits, WA0-WA2. They select one of the 8 words in the RAM. Similiarly, horizontal RightDown 0-2 are the read address bits, RA0-RA2. Note that none of the Program tiles may be used in a minor array which is being utilized as a RAM since they share the same RAM bits.

The mechanism for specifying the contents of a ROM, or the initial contents of a RAM has not been specified yet.

2.4 Clock Programming

Not all systems can be implemented with a single clock. Some systems are more naturally partitioned into segments which run at different clock speeds. To accommodate these requirements, tiles are available to select which clock is to be used for each minor array. The input flip-flops and the RAM tiles are both controlled with this clock selection mechanism.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 15. Clock 1 Tile

Figure 15 shows a minor array with the clock specified as clock one. The default clock is clock zero. Currently only these two clocks are supplied. Extensions to more clocks, and clocks sourced from inside the array, instead of from the pins of the chip, are possible.

2.5 Mux Programming

Multiplexing is implemented as a set of tiles, which may not be rotated. They are extensions of the ROM tiles. Instead of the data appearing, however, the data read from the ROM controls the multiplexor for the RightDown or LeftUp lines, depending on the tile instantiated.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 16. Multiplexor Tiles

The horizontal RightDown wires select the word of ROM which is used to control each bit of the multiplexor. If all zeroes and all ones are put into every other word, then a normal multiplexor, controlled by the low order address bit, is obtained. Even and Odd tiles are not needed since the programming can adjust for the polarity of the MA signals.

2.6 Rotate Programming

Rotation is implemented as a set of tiles, which may not be rotated. Figure 17 shows the rotate control tiles.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 17. Rotate Tiles

The horizontal RightDown wires of the RotateSourceUpRight tile determine the mapping of the horizontal Input wires and the bottom vertical LeftUp wires onto a special set of wires which run up. The mapping of a RotateInnerUpRight tile is the same as the mapping of the minor array to the lower left. Unlike the normal LeftUp wires, no inversion takes place at each minor array. Figure 17.1 shows the tile which forces the LU output to be the value of the special wires.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 17.1. Rotate Output Tile

Figure 18 describes the value of the special wires according to the rotate control. The lower inputs, (a, b, c, and d), and upper outputs (i, j, k, and l), are not the LeftUp wires but the "hidden" special wires. Thus, with the exception of the horizontal input wires, none of the resources of a minor array are committed by a rotate tile, so they are available for any other purpose. The RotateSourceDownLeft and RotateInnerDownLeft tiles have the obvious correspondence to the RotateSourceUpRight and RotateInnerUpRight tiles.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 18. Rotate Mapping

2.7 Tristate Programming

Tristate programming is available as a tile which enhances the ORDToL tile by using the Left-Up output to control the long line driver. Figure 18.1 illustrates the usage of the tristate tile in a situation where a common control line is running horizontally and data lines are running vertically.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 18.1. Tristate Tile

This tile allowsemulation of tristate buses that are straight and contained only within one chip. Some day there will be a pin tile that helps gang a tristate bus across chips, but that will be a directional solution, i.e. either the bus will have to be manually wired end-around (ycch) or all the receivers will have to sense a different grain so that information is collected upwards from the drivers then broadcast downwards to the receivers.

3.0 Chip Interface

In this section we describe the interface of a feasible chip which has a 16x16 array of minor arrays in it. This is equivalent to the data sheet description of a microprocessor, where the preceding section was equivalent to the architecture manual.

4.0 Programming Hints

Some of the ways in which this architecture may be used are not obvious. Here we describe how to program many of the structures which are common to ordinary hardware design.

The detailed programming which follows assumes that each primitive is to be packed into minimal space, excluding consideration of form factor requirements.

4.1 PLA

A programmable logic array is a structure which is trivially built from the available primitives. Figure 18.1 shows a finite state machine.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 18.1 Sample Finite State Machine

The custom layout of the PLA for this finite state machine is shown in figure 18.2.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 18.2 Sample FSM Layout

Figure 18.3 shows the programmable logic implementation of this FSM. All feedback paths and flip-flops are shown, unlike the custom layout shown in figure 18.2.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 18.3 Sample FSM Programming

4.2 Counters

We will specifically consider only the case of a binary up counter with an enable in the form of CarryIn. This simple example exposes the fundamental structure of a counter without the complications of the fully general binary up/down counter with enable and load. We also fail to consider simpler counter structures such as Johnson counters and linear feedback shift register counters. [Xilinx] contains a good discussion of these other counter types, although their designs are, of course, not cast in terms of the abstraction used here.

Both adders and counters have at their heart a carry propagation network. The two basic combinatorial techniques for computing carry propagation are ripple carry and carry lookahead. Figure 19 shows the basic cells required to build counters with these two types of carry propagation networks.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 19. Basic Counter Cells

Counter Bit maintains the state of the counter and contains the XOR feedback. Ripple Carry computes carries for the ripple counter and Carry Lookahead computes carries for the carry lookahead counter. Figure 20 shows an 8 bit ripple carry counter. This structure requires O(n) space and time.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 20. Ripple Carry Counter

Figure 21 shows the translation of the three low order bits of the ripple carry counter into a programmable logic program. The carry propagates from the right hand side to the left hand side in the top row of the minor arrays. Some rows and columns are used purely as routing so that both true and complement forms of both carry and q are available to compute the XOR. Each of the lower two rows computes one term in the expression CQ'+C'Q. This equasion has been transformed so that the XOR is computed as ( (C*Q')' * (C'*Q)' )' which consists of nothing but NAND gates, our basic computational element in the underlying technology.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 21. Ripple Carry Counter Programming

There are many variants of the basic carry lookahead structure depicted in figure 22. These variants generally reduce the height of the propagation tree by using gates with more inputs, or by using ripple carry at higher levels in the tree. The major advantage of the carry lookahead approach is to reduce the computation time to O(log n) at the cost of increasing the space to O(n log n).

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 22. Carry Lookahead Counter

However, the timing model which gave rise to the carry lookahead technique assumes that wires have zero delay. This assumption is not true in general, although it was effective for previous technologies. In the programmable logic technology we are using here, this assumption is not effective unless the long lines are brought into play.

Figure 23 shows a four bit counter using carry lookahead techniques. It is not clear that it is even as fast as a four bit ripple carry counter. However the top row of minor arrays demonstrates the utilization of long lines to connect the carry and propagate signals from the right hand pair of bits to the carry propagate network at the top of the left most column of minor arrays. As the size of the counter increases the performance of this carry lookahead counter will dominate the performance of the ripple carry counter.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 23. Carry Lookahead Counter Programming

Note that the middle row of minor arrays does not use the long lines since the distance between carry lookahead cells has not grown sufficiently to make this advantageous. Also note that the long lines run the entire width of the chip, not just across the number of stages required to make the connection. This means that there is a wire assignment conflict for counters of 8 bits or more. This means that there is asymptotically a factor of O(n2) space consumption. However, since the chip is of fixed width, there is not much practical significance to this asymptote.

4.3 Adder

Adder techniques are straightforward extensions of the techniques presented in the counter section. Just as we did in that section, we simplify the problem to its essentials, leaving generalization to full arithmetic logic units to the needs of specific applications. Here we consider the problem of adding two n-bit numbers, along with a carry input. We will ignore the simple ripple carry case and proceed straight to the carry lookhead case. We also do not present the detailed programming, but just the logical structure. Figure 24 shows the basic cells which implement a carry lookahead adder network.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 24. Carry Lookahead Adder Cells

These cells are combined in a fashion identical to the carry lookahead counter. Figure 25 shows an 8 bit carry lookahead adder.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 25. Carry Lookahead Adder

4.4 Alignment

In this section we describe the construction of barrel shifters from the basic rotate tiles described in section 2.6. Figure 26 shows a 12 x 12 barrel shifter.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 26. 12 x 12 Barrel Shifter

In Figure 26 the inputs are labelled I11-I0, outputs O11-O0, and rotate distance r3-r0. Only the first 12 values of r3-r0 are allowed, the others produce unknown results.

4.5 Mask Generation

A common companion to rotation is mask generation. Given two bit indicies, generate a mask which can select between two sources. In a standard instruction set processor data path, the combination of rotation and mask generation is commonly known as a field unit, since it inserts fields in arbitrary positions within the width of the data path.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 27. Mask Unit

Figure 27 shows the inputs and outputs of a mask unit. L and R specify the left and right edges of the bits of O which are to come from B, the rest coming from A. In this specification of the problem we have required L to be less than or equal to R for any bits of B to be transferred into O. Figure 28 shows the general scheme we will use to program one bit of the mask unit.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 28. Mask Bit Slice

Two bit slices fit into one minor array. Figure 29 shows the programming for a pair of bits.

[ Nectarine figure; type 'Artwork on' to a CommandTool ]

Figure 29. Mask Bit Slice Programming

The programming has been broken into two pieces purely to meet the form factor requirements of 8.5" x 11" paper. The left column is actually stacked upon the right hand column.

5.0 Conclusion

A radically different architecture for RAM programmable logic has been described.

References

[Xilinx]

Xilinx Inc., The Programmable Gate Array Design Handbook, 1986.