Patent Claims for Integrated Testers
Richard Barth and James Gasbarro
Draft of March 10, 1989 5:49:10 pm PST

Test Vector Compression

The interface between the tester and the device under test (DUT) requires four bits per pin per cycle to describe the operation the tester is to perform. These bits are:

- Force: the data value driven to the DUT
- Expect: the expected response data from the DUT
- Inhibit: disables the output driver of the tester
- Mask: disables error generation when the acquired data does not match Expect

In the implementation of a single-chip tester it is impractical to store all of these bits in raw form for each pin in each cycle, because the die area available for RAM storage is limited. It is also impractical to use off-chip storage in the form of commercial RAMs because of pin bandwidth and board size limitations. The solution is to compress the storage space required for this data, using several different techniques:

- semantic combination
- mapping
- lossless compression

The Force and Expect bits are combined using the semantic combination technique. Since these bits are mutually exclusive for the most part (in a given cycle a pin is normally either driven or sensed), they can be combined into a single bit. This reduces the functionality of the tester somewhat, but separate control can be simulated by successive tester cycles. This technique is common in many testers.

The storage required for the Mask and Inhibit bits is reduced using a mapping mechanism. During test execution, the drive state and data validity of many pins of the tester are strongly correlated in both space and time. A data bus is an example of both of these points: all pins of the bus switch between drive and sense at the same time, which means that all of their Inhibit and Mask bits have the same value at any instant in time, and they all change value at the same time.
Since many of the tester pin drives are static, and there are relatively few groups of dynamic pins, all combinations of Mask and Inhibit that are actually used can be stored in a small table, as shown in Figure 1. This table is indexed with a small number of bits per vector, thereby greatly reducing the required amount of storage.

[Figure 1]
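
The semantic combination and mapping steps described above can be sketched in software. This is only an illustrative model, not the chip's logic: the names PinOp, MapEntry, expand_vector and bits_per_vector are hypothetical, the sketch assumes the single combined data bit is presented as both Force and Expect, and the two-entry bus map and the 16-entry table size in the arithmetic are made-up examples.

```python
from typing import List, NamedTuple


class PinOp(NamedTuple):
    """The four per-pin control bits the tester needs each cycle."""
    force: int
    expect: int
    inhibit: int
    mask: int


class MapEntry(NamedTuple):
    """One row of the (hypothetical) Mask/Inhibit map: one bit per pin."""
    inhibit: List[int]
    mask: List[int]


def expand_vector(data_bits: List[int], map_index: int,
                  map_table: List[MapEntry]) -> List[PinOp]:
    """Expand a stored vector (one data bit per pin plus a map index)
    back into Force/Expect/Inhibit/Mask for every pin."""
    entry = map_table[map_index]
    # Assumption: the combined bit serves as both Force and Expect.
    return [PinOp(force=d, expect=d, inhibit=i, mask=m)
            for d, i, m in zip(data_bits, entry.inhibit, entry.mask)]


def bits_per_vector(pins: int, map_entries: int) -> dict:
    """Compare raw storage (4 bits per pin) with the mapped form
    (1 data bit per pin plus an index into the map table)."""
    index_bits = max(1, (map_entries - 1).bit_length())
    return {"raw": 4 * pins, "compressed": pins + index_bits}


# Example: a 4-pin bus that is either all driving or all sensing.
bus_map = [
    MapEntry(inhibit=[0, 0, 0, 0], mask=[1, 1, 1, 1]),  # drive, ignore compare
    MapEntry(inhibit=[1, 1, 1, 1], mask=[0, 0, 0, 0]),  # sense, check compare
]
ops = expand_vector([1, 0, 1, 1], map_index=1, map_table=bus_map)
# ops[0] == PinOp(force=1, expect=1, inhibit=1, mask=0)
```

Under these assumptions, a 16-pin chip with a 16-entry map would need 20 bits per vector instead of 64, before any lossless compression is applied.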
After applying these two compression methods, there is still considerable redundancy remaining in the test vectors. To further reduce the storage requirements of the test vectors, a lossless compression mechanism is employed. This compression technique utilizes off-line compression on a host processor and real-time decompression inside the tester chip. The general technique was originally developed by Ziv and Lempel [3] and was further enhanced by Fiala and Greene [2] as a software means for compressing data. This is the first reduction to practice in hardware of the decompression algorithm.

The encoding takes runs of source symbols (vectors) of variable length and maps them into variable-length codewords. The compressed stream consists of a sequence of command words interspersed with literal data words. Two types of commands are used: Copy and Literal. The Copy command has two parameters: the length of the run of symbols to be copied, and their position in the history buffer. The Literal command needs only one field: the length of the run of symbols which follow in the compressed stream.

When Copy is interpreted, a sequence of source symbols corresponding to the number specified by the length field is transcribed from the history buffer of the decompressor to the output stream. The index of the first symbol in the buffer is indicated by the position field. At the end of each decompressor cycle, the current output symbol of the decompressor is copied into the least recently written location of the history buffer, so that the buffer always contains a history of the most recently decompressed data.

The block diagram of an architecture which implements these functions is shown in Figure 2. In order to meet the minimum data delivery requirement of one word per clock cycle, the RAM word width must be at least as great as the combined width of a single Literal command and one word of literal data.
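
The Copy and Literal interpretation described above can be modelled with a short software sketch. It is a behavioural illustration only, not the hardware of Figure 2: the tuple encoding of commands, the function name decompress, and the treatment of the position field as an absolute index that wraps around a circular history buffer are all assumptions made for the example.

```python
from typing import Iterable, List, Union

# Hypothetical stream encoding: ("literal", length) is followed by `length`
# raw symbols; ("copy", length, position) stands alone.
StreamItem = Union[tuple, int]


def decompress(stream: Iterable[StreamItem], history_size: int) -> List[int]:
    history = [0] * history_size   # circular history buffer
    write_ptr = 0                  # least recently written location
    out: List[int] = []
    items = iter(stream)

    def emit(symbol: int) -> None:
        """Output one symbol and record it in the history buffer."""
        nonlocal write_ptr
        out.append(symbol)
        history[write_ptr] = symbol
        write_ptr = (write_ptr + 1) % history_size

    for cmd in items:
        if cmd[0] == "literal":
            _, length = cmd
            for _ in range(length):
                emit(next(items))          # symbols follow in the stream
        elif cmd[0] == "copy":
            _, length, position = cmd
            for k in range(length):
                # Read before write, one symbol per "cycle".
                emit(history[(position + k) % history_size])
        else:
            raise ValueError(f"unknown command: {cmd!r}")
    return out


stream = [("literal", 3), 0xA, 0xB, 0xC, ("copy", 4, 0)]
result = decompress(stream, history_size=8)
# result == [0xA, 0xB, 0xC, 0xA, 0xB, 0xC, 0xA]
```

Because each output symbol is written back into the buffer as it is produced, a Copy may read symbols that the same Copy has just written, which is why the four-symbol copy in the example can repeat a three-symbol pattern.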
Because the Copy and Literal command words are the same length, a RAM word wide enough for a Literal command and one word of literal data also guarantees that Copy commands meet the one-word-per-cycle delivery requirement, since only the command itself is needed to produce an output symbol.

There are two outputs of the prefetcher in this scheme: the next command and a literal data word. The command is interpreted by the control logic to determine whether the next decompressed output word should come from the history buffer or from the literal data stream. The multiplexor at the output of the history buffer selects between these two alternatives. The control logic decodes the current instruction to determine where the next instruction occurs in the prefetch buffer.

Figure 2. Block diagram of decompressor

This decompression architecture has more general application beyond testers. Claims should be made in the general context of hardware decompression and then made specific to the problem of testing.

Timing Generation

See the attached paper, "Integrated Pin Electronics for VLSI Functional Testers", for a detailed description.

Figure 3 shows the circuit used for the Format generator. The Pulse and nPulse inputs are driven by the XOR output of the pulse generator (Figure 2a of the above reference). The Inhibit signal is driven by the Map discussed in the section on Test Vector Compression. Data is the Force Data bit of the test vector and corresponds to Decompressed Data Out in Figure 2. The output signals, DriveHigh and nDriveLow, drive the output transistors connected to the DUT pin. The remainder of the signals select the current output format. (This figure should be generalized.)

Figure 3. Format circuit

The Acquire portion of the pin electronics circuitry is responsible for capturing data from the DUT and determining whether it matches the desired response. Both the data value and the output voltage level must be checked.
In the Force Timing generator, both delay and width timing control are needed, but in the Acquire Timing generator only control of the sample delay timing is necessary. This allows the Acquire Timing generator to be simplified to a single delay line. The input to this delay line is the same as for the Force Timing generator. This introduces a problem: the frequency-doubling effect of the XOR gate is not realized, so the delay line output runs at only half the desired cycle frequency. Since the die area consumed by the sample and comparison circuitry is much smaller than that of a second delay line, the solution to this problem is simply to build two sampling circuits: one which operates on the positive transition of the sample timing clock and one which operates on the negative transition.

A beneficial side effect of this technique is that the data at the output of the two comparators is available for a full tester cycle. This greatly simplifies the problem of resynchronizing the data between the sample logic and the data comparison pipeline. Since the phase of the captured signal is known, an optional pipeline stage clocked at the midpoint of the cycle ensures that the data can be captured at a stable point in the cycle (Figure 4). In the data pipeline, the acquired value is compared with the Expect Data from the test vector. If the comparison fails and the Mask bit from the test vector is not asserted, the external Error bit is asserted for the cycle.

Figure 4. Acquisition and synchronization circuit

Claims:

- Pulse generation concept: two delay lines followed by XOR.
- Method for implementing a wide-range, high-resolution delay line by using three types of delay mechanisms.
- Compact implementation of the delay generator.
- Correction of pulse edge distortion by Edge Adjust units.
- Format generator circuit (Figure 3).
- Use of dual analog sampling latches to acquire data on both clock edges.
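
The acquisition scheme described in this section (two samplers on alternate edges of a half-rate sample clock, followed by the masked comparison) can be illustrated with a small behavioural sketch. It models only the logic, not the analog latches or the timing; the function name acquire_errors and the choice that even-numbered tester cycles fall on the positive edge are assumptions made for the example.

```python
from typing import List


def acquire_errors(dut_values: List[int], expect: List[int],
                   mask: List[int]) -> List[int]:
    """Return the Error bit for each tester cycle.

    The sample clock runs at half the cycle rate, so one latch captures
    on its positive edge (assumed: even cycles) and a second latch on its
    negative edge (assumed: odd cycles). Each latch therefore holds its
    value for a full tester cycle before it is overwritten.
    """
    pos_latch = 0
    neg_latch = 0
    errors = []
    for cycle, value in enumerate(dut_values):
        if cycle % 2 == 0:
            pos_latch = value
            acquired = pos_latch
        else:
            neg_latch = value
            acquired = neg_latch
        # Error is asserted only if the compare fails and Mask is clear.
        mismatch = acquired != expect[cycle]
        errors.append(int(mismatch and not mask[cycle]))
    return errors


errors = acquire_errors(dut_values=[1, 0, 1, 1],
                        expect=[1, 1, 1, 0],
                        mask=[0, 0, 0, 1])
# errors == [0, 1, 0, 0]
```

In the example, cycle 1 fails the comparison with Mask clear and so raises Error, while the mismatch in cycle 3 is suppressed because Mask is asserted.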
System Architecture

Figure 5. Board level system

Figure 5 illustrates the configuration for a 256-pin test system using the Testarossa chip. The system consists of sixteen tester chips, each providing sixteen I/O channels. The chips are arranged in a circular fashion around the central DUT, thereby equalizing the lead lengths while at the same time maintaining a total trace length of less than ten centimeters. This limits the time-of-flight of a signal between DUT and pin electronics to only a few hundred picoseconds, obviating the need for terminated transmission lines. The short trace length also keeps the stray load capacitance on the DUT outputs on the order of only a few picofarads. This ensures high-fidelity waveform transmission for both driven and sensed DUT signals with a minimum of signal loading.

As illustrated in Figure 5, the number of support components necessary to bind the system together is small, consisting of a few transceivers to buffer the host signals and the necessary chip select logic for addressing individual tester chips. The only additional logic necessary is that required to generate the reference clock.

We want to claim a system structure whereby all real-time signals are contained within a critical radius of the test device [1]. This should be one of the dependent claims for 1 and 2 together as it cannot stand alone.

References

[1] M. Barber, "Fundamental Timing Problems in Testing MOS VLSI on Modern ATE", IEEE Design and Test, pp. 482-489, August 1984.
[2] E. Fiala and D. Greene, "Data Compression with Finite Windows", to appear in CACM, spring 1989.
[3] J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression", IEEE Trans. on Information Theory, Vol. IT-23, No. 3, May 1977.