[Ivy]<Speech>fir>fir.bravo!4

FIR is a 32-tap finite impulse response filter designed for high-speed convolution. No data delay line is provided on chip, so the chip can be used in a variety of configurations, such as two-dimensional convolution or interleaving for higher sample rates. The tap weights are assumed to be symmetric, so an adder at the input of each of the 16 multipliers sums the two data samples that share a tap weight (see figure 1). An adder tree sums the 16 products to form the convolution output. The output of each adder in the tree is divided by two to prevent overflow of the result while making maximum use of the bits available in the data paths for the products and partial sums. A gain multiplier at the output of the adder tree scales the result.
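The symmetric-tap folding can be checked with a short behavioral model. The sketch below assumes exact integer arithmetic (no truncation, no per-stage halving, no gain) and uses hypothetical function names. It only confirms that pre-adding the mirrored samples x(k) and x(31-k), then using 16 multiplies feeding a pairwise adder tree, gives the same sum as a direct 32-multiply convolution.

```python
"""Behavioral sketch of the folded (symmetric-tap) 32-tap FIR datapath.

Assumption: the 32 taps satisfy h32[k] == h32[31 - k], so only the 16
unique weights h[0..15] are stored. Arithmetic is exact integers.
"""
from typing import List, Sequence

NTAPS = 32
NMULT = NTAPS // 2  # 16 input adders and 16 multipliers


def direct_form(window: Sequence[int], h: Sequence[int]) -> int:
    """Plain 32-multiply convolution sum using the mirrored tap set."""
    h32 = list(h) + list(reversed(h))  # h32[k] == h32[31 - k]
    return sum(w * c for w, c in zip(window, h32))


def adder_tree(values: List[int]) -> int:
    """Sum pairwise, one tree level at a time (16 -> 8 -> 4 -> 2 -> 1).

    len(values) must be a power of two.
    """
    while len(values) > 1:
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0]


def folded_form(window: Sequence[int], h: Sequence[int]) -> int:
    """Input adders pair mirrored samples; 16 products feed the adder tree."""
    assert len(window) == NTAPS and len(h) == NMULT
    pre_sums = [window[k] + window[NTAPS - 1 - k] for k in range(NMULT)]
    products = [c * s for c, s in zip(h, pre_sums)]
    return adder_tree(products)


if __name__ == "__main__":
    import random

    rng = random.Random(1)
    window = [rng.randint(-100, 100) for _ in range(NTAPS)]
    h = [rng.randint(-256, 255) for _ in range(NMULT)]
    assert folded_form(window, h) == direct_form(window, h)
    print("folded == direct:", folded_form(window, h))
```

The folding halves the number of multipliers at the cost of one adder per multiplier, which is why the chip needs only 16 multiplier slices for 32 taps.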

Filter tap weights and data are input as negative-true, two's complement, bit-serial numbers, least significant bit first. The timing signal LSBtime-IN is asserted high during the bit time of the least significant bit of the serial inputs. The arithmetic components do not constrain the number of bits in the data paths, so accuracy can be traded against throughput: if more accuracy is needed, the same chip can be used with more bit times per word. The multipliers are described in R. F. Lyon, "Two's Complement Pipeline Multipliers," IEEE Trans. on Communications, April 1976. The serial multiplier forms the full product and truncates it to the data path length. To guarantee that no internal overflow occurs while a product is computed, the data input to the multiplier must have two sign extensions (top 3 bits the same). The largest positive coefficient is 2 - 2^(-8) [0000000001 in the data format conventions above] and the most negative coefficient is -2 [1111111110]. The data input pads are pulled up on chip, so an unconnected input reads as zero.
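The pin-level number format can be sketched as follows. This assumes the 10-bit coefficients have 8 fraction bits, which is what the stated range [-2, 2 - 2^(-8)] implies, and that a pin string lists bits in time order with "1" meaning a high level. The function names are hypothetical. The sketch reproduces the two bracketed bit patterns above and shows why a pulled-up (all-high) input reads as zero.

```python
from fractions import Fraction

COEF_BITS = 10
COEF_FRAC_BITS = 8  # assumed: implied by the range [-2, 2 - 2**-8]


def to_serial(value: int, nbits: int) -> str:
    """Integer -> pin levels in time order: two's complement, LSB first,
    negative true (a logic 1 is driven as a low level, written "0")."""
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {nbits} bits")
    raw = value & ((1 << nbits) - 1)  # two's complement bit pattern
    bits_lsb_first = [(raw >> i) & 1 for i in range(nbits)]
    return "".join(str(1 - b) for b in bits_lsb_first)  # invert: negative true


def from_serial(pins: str) -> int:
    """Inverse of to_serial: undo the inversion, reassemble, sign-extend."""
    nbits = len(pins)
    raw = sum((1 - int(c)) << i for i, c in enumerate(pins))
    return raw - (1 << nbits) if raw >> (nbits - 1) else raw


def encode_coef(value) -> str:
    """Coefficient in [-2, 2 - 2**-8], a multiple of 2**-8 -> 10 pin levels."""
    scaled = Fraction(value) * (1 << COEF_FRAC_BITS)
    if scaled.denominator != 1:
        raise ValueError("coefficient is not a multiple of 2**-8")
    return to_serial(int(scaled), COEF_BITS)


def multiplier_input_ok(value: int, nbits: int) -> bool:
    """True if the top 3 bits of an nbits word agree (two sign extensions)."""
    return -(1 << (nbits - 3)) <= value < (1 << (nbits - 3))


if __name__ == "__main__":
    print(encode_coef(2 - 2 ** -8))  # 0000000001
    print(encode_coef(-2))           # 1111111110
    print(to_serial(0, 10))          # 1111111111: all high, as with pull-ups
```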

The filter taps and the output gain are loaded by holding the Load-Coef pin high and clocking them bit serially into the Coef-In pin in the order h(0), h(1), . . . h(15), Gain. The taps and Gain are 10-bit two's complement numbers. Once loaded, the taps are held on chip until a new set is loaded.
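The load order can be modeled as one long shift register. This sketch assumes the 17 ten-bit registers form a single 170-bit chain with the Gain register nearest Coef-In (as stated for the Gain register in the description of figure 3). It shows true logic values rather than negative-true pin levels, and the names are hypothetical.

```python
from typing import List, Sequence

WORD_BITS = 10
NUM_WORDS = 17  # h(0) .. h(15), then Gain


def load_coefficients(words: Sequence[int]) -> List[int]:
    """Clock words h(0)..h(15), Gain into a chain of 17 ten-bit registers,
    bit serially and LSB first. Returns the register contents, index 0 being
    the register nearest the Coef-In pin (assumed: the Gain register)."""
    assert len(words) == NUM_WORDS
    chain = [0] * (NUM_WORDS * WORD_BITS)  # chain[0] is nearest Coef-In
    for word in words:  # Load-Coef held high for all 170 bit clocks
        raw = word & ((1 << WORD_BITS) - 1)
        for k in range(WORD_BITS):  # LSB first
            chain = [(raw >> k) & 1] + chain[:-1]  # one shift per bit clock
    regs = []
    for j in range(NUM_WORDS):
        seg = chain[j * WORD_BITS:(j + 1) * WORD_BITS]
        # Each word's LSB has travelled farthest, to the end of its segment.
        raw = sum(bit << (WORD_BITS - 1 - i) for i, bit in enumerate(seg))
        regs.append(raw - (1 << WORD_BITS) if raw >> (WORD_BITS - 1) else raw)
    return regs


if __name__ == "__main__":
    taps_and_gain = list(range(-8, 8)) + [100]  # h(0)..h(15), Gain
    regs = load_coefficients(taps_and_gain)
    print("first register (Gain):", regs[0], " last register (h(0)):", regs[-1])
```

Because the word clocked in last stops in the first register, putting Gain at the head of the chain is what makes it the final word of the load sequence.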

The chip should be able to run at a 12 MHz bit rate. With 12-bit data paths the throughput would then be 12 MHz / 12 = 1 MHz, or 12 MHz / 16 = 750 kHz with 16-bit data paths. The number of bits in the data paths must be sufficient to avoid overflow, which depends on both the magnitude of the inputs and the tap weights used. Assume the data path length is N bits and the inputs are restricted to signed (N-3)-bit numbers (the top 3 bits are sign extensions). The outputs of the input adders are then N-2 bits, since a carry may be generated. If the coefficients are restricted so as not to use their most negative value (-2), the products are N-1 bits. The restriction excludes the case where the most negative number with two sign extensions, -2^(N-3), is multiplied by a coefficient of -2, which would give an N-bit positive result. The output of the first adder in the adder tree is then N bits. Dividing this result by two before the next adder stage ensures that its result cannot overflow. This add-and-divide-by-two step is performed at each stage of the adder tree. The output of the final adder is an N-bit number, which is divided by four (two sign extensions) to avoid internal overflow in the output gain multiplier.
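The throughput figures and the bit-growth argument can be checked numerically. The sketch propagates worst-case integer ranges through the datapath for an N-bit word. It assumes coefficients are integers in units of 2^(-8), that truncating a product rounds toward minus infinity (dropping low-order two's complement bits), and that, for the 16-input tree, the first three adder stages are each followed by a divide by two and the final (fourth) adder by a divide by four; that split is our reading of the text. All names are hypothetical.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]
COEF_FRAC_BITS = 8  # assumed coefficient scaling: units of 2**-8


def fits(iv: Interval, nbits: int) -> bool:
    """True if every value in the closed interval fits nbits two's complement."""
    lo, hi = iv
    return -(1 << (nbits - 1)) <= lo and hi <= (1 << (nbits - 1)) - 1


def throughput_hz(bit_rate_hz: float, nbits: int) -> float:
    """One output word every nbits bit times."""
    return bit_rate_hz / nbits


def worst_case_ranges(n: int, coef_min: int = -511,
                      coef_max: int = 511) -> Dict[str, Interval]:
    """Worst-case ranges for an n-bit datapath. The default coef_min excludes
    -2 (-512); pass coef_min=-512 to see the overflow the text warns about."""
    r: Dict[str, Interval] = {}
    x = (-(1 << (n - 4)), (1 << (n - 4)) - 1)  # signed (n-3)-bit inputs
    s = (2 * x[0], 2 * x[1])                    # input adder output
    r["pre_add"] = s
    # The product is bilinear, so its extremes lie at the corners; // floors.
    corners = [(a * c) >> COEF_FRAC_BITS for a in s for c in (coef_min, coef_max)]
    p = (min(corners), max(corners))
    r["product"] = p
    for stage in (1, 2, 3):
        p = (2 * p[0], 2 * p[1])                # adder output
        r[f"stage{stage}_sum"] = p
        p = (p[0] // 2, p[1] // 2)              # one sign extension: divide by 2
    p = (2 * p[0], 2 * p[1])
    r["final_sum"] = p
    r["to_gain"] = (p[0] // 4, p[1] // 4)       # two sign extensions: divide by 4
    return r


if __name__ == "__main__":
    print(throughput_hz(12e6, 12), throughput_hz(12e6, 16))  # 1000000.0 750000.0
    for name, iv in worst_case_ranges(12).items():
        print(name, iv)
```

For N = 12 every stage fits its stated width (input adders N-2, products N-1, adder outputs N, gain input N-2); allowing the -2 coefficient pushes the product range up to 2^(N-2), one bit too wide.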

Figure 2 illustrates how several chips can be used in parallel to obtain higher throughput for two-dimensional separable convolutions. If each convolver has throughput f, 2K of them can perform a 2-D convolution at throughput K*f. This example uses 8-tap filters interleaved to double the throughput (K=2), but the method extends easily to higher factors. The even and odd input samples X(2n) and X(2n+1) are presented simultaneously to separate delay lines built from one-sample delays. Since the filter inputs are bit serial, each such delay is a shift register with taps every N bits, where N is the number of bits in the data paths. The two X-dimension FIR filters use inputs staggered by one sample to compute the even and odd outputs of the X-dimension convolution, Y(2n) and Y(2n+1). Two more convolvers then perform the Y-dimension convolution. Each Y-dimension delay element is (line width in samples) times (bits per data word) divided by the throughput multiplier K, in bits. Two outputs result: the even and odd points of the filtered data.
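The even/odd interleaving can be illustrated in one dimension. This sketch is our reconstruction rather than a transcription of figure 2: the input is split into half-rate streams X(2n) and X(2n+1), and two filter units, each fed from both streams with a one-sample stagger, produce the even and odd outputs. It checks the result against a direct convolution and also evaluates the Y-dimension delay length stated above. Function names are hypothetical, and the data are plain integers rather than bit-serial words.

```python
from typing import List, Sequence


def fir(x: Sequence[int], h: Sequence[int], n: int) -> int:
    """Direct form y[n] = sum_k h[k] * x[n - k], with x[i] = 0 outside x."""
    return sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))


def interleaved_k2(x: Sequence[int], h: Sequence[int]) -> List[int]:
    """K = 2 interleave: x arrives as two half-rate streams, and one filter
    unit makes even outputs while a second makes odd outputs."""
    assert len(x) % 2 == 0
    xe, xo = list(x[0::2]), list(x[1::2])  # X(2n) and X(2n+1)

    def tap(stream: List[int], i: int) -> int:
        return stream[i] if 0 <= i < len(stream) else 0

    y: List[int] = []
    for n in range(len(xe)):
        # Even output: x[2n - k] is xe[n - k//2] (k even) or xo[n - 1 - k//2] (k odd).
        even = sum(h[k] * (tap(xe, n - k // 2) if k % 2 == 0
                           else tap(xo, n - 1 - k // 2)) for k in range(len(h)))
        # Odd output: the same taps, staggered by one sample.
        odd = sum(h[k] * (tap(xo, n - k // 2) if k % 2 == 0
                          else tap(xe, n - k // 2)) for k in range(len(h)))
        y += [even, odd]
    return y


def y_delay_bits(line_width: int, nbits: int, k_factor: int) -> int:
    """Bits per Y-dimension delay: line width * bits per word / K."""
    assert (line_width * nbits) % k_factor == 0
    return line_width * nbits // k_factor


if __name__ == "__main__":
    x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8]
    h = [1, 2, 3, 4, 5, 6, 7, 8]  # 8 taps, as in the figure 2 example
    assert interleaved_k2(x, h) == [fir(x, h, n) for n in range(len(x))]
    print(y_delay_bits(line_width=512, nbits=16, k_factor=2))  # 4096
```

Each unit produces one output per two input samples, so a pair of units running at rate f delivers the full-rate output 2f; the same split, applied again along Y, accounts for the 2K convolvers.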

Figure 3 is a schematic of one of the sixteen tap slices and of the timing generator that produces the timing signals for the coefficient register. Signals in the large font are bonded out; those in the smaller font are internal. The coefficient register shifts the tap weight out during the first 10 bit times of a data word. The adder tree that sums the products is shown in figure 4. Delayed versions of the multiplier's LSBtime output are used to generate timing signals for the adders in the adder tree. The output of each adder except the last is sign extended by one bit to divide the result by two. Note that the Gain register is first in the series-connected coefficient registers, so it is loaded last during the coefficient load sequence.
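Dividing by two through sign extension can be sketched on a continuous bit-serial stream. The model below assumes words are sent back to back, LSB first, with an LSBtime flag on each word's first bit. It delays LSBtime by one bit time and, at the bit time where the original LSBtime fires, repeats the previous bit (the sign) instead of passing the next word's LSB. This is our interpretation of the delayed-LSBtime scheme, shown with true logic levels; inverting every bit commutes with this operation, so negative-true data behaves the same. Names are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_bits(v: int, n: int) -> List[int]:
    """n-bit two's complement word, LSB first."""
    return [(v >> i) & 1 for i in range(n)]


def from_bits(bits: Sequence[int]) -> int:
    """LSB-first two's complement bits -> integer."""
    raw = sum(b << i for i, b in enumerate(bits))
    return raw - (1 << len(bits)) if bits[-1] else raw


def halve_stream(bits: List[int], lsb: List[bool]) -> Tuple[List[int], List[bool]]:
    """Divide every word of a back-to-back LSB-first stream by two.

    At each bit time the output is the current input bit, except while the
    input LSBtime is high, when the previous input bit (the sign of the word
    just finished) is repeated. The output LSBtime is the input LSBtime
    delayed one bit time, so each output word omits its input word's LSB.
    The returned lists begin at the first delayed LSBtime (word aligned).
    """
    out_bits: List[int] = []
    out_lsb: List[bool] = []
    prev_bit, prev_lsb = 0, False
    # One extra bit time, flagged as an LSB time, flushes the last sign bit.
    for b, l in zip(bits + [0], lsb + [True]):
        out_bits.append(prev_bit if l else b)
        out_lsb.append(prev_lsb)
        prev_bit, prev_lsb = b, l
    # Drop the first bit time, which belongs to a word before the stream.
    return out_bits[1:], out_lsb[1:]


if __name__ == "__main__":
    n = 8
    words = [-128, -37, -1, 0, 1, 55, 127]
    stream = [b for w in words for b in to_bits(w, n)]
    flags = [i % n == 0 for i in range(len(stream))]
    out, out_flags = halve_stream(stream, flags)
    print([from_bits(out[i:i + n]) for i in range(0, len(out), n)])
    # [-64, -19, -1, 0, 0, 27, 63]
```

The result is a floor division (an arithmetic right shift), which matches the truncation behavior of dropping the low-order bit of a two's complement number.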