Patent Claims for Integrated Testers

Richard Barth and James Gasbarro
Draft of March 10, 1989 5:49:10 pm PST
Test Vector Compression
    The interface between the tester and the device under test (DUT) requires four bits per pin per cycle to describe the operation
    the tester is to perform.  These bits are:

    - Force: the data value driven to the DUT
    - Expect: the expected response data from the DUT
    - Inhibit: disables the output driver of the tester
    - Mask: disables error generation when the acquired data does not match Expect

    In the implementation of a single-chip tester it is impractical to store all of these bits in raw form for each pin in each cycle.  
    This is due to the limited die area available for RAM storage.  It is also impractical to use off-chip storage in the form of 
    commercial RAMs because of pin bandwidth and board size limitations.  The solution is to compress the storage space required 
    for this data.  This is achieved by using several different techniques:

    - semantic combination
    - mapping
    - lossless compression

    The Force and Expect bits are combined using the semantic combination technique.  Since these bits are mutually exclusive for the 
    most part, they can simply be combined into a single bit.  This reduces the functionality of the tester somewhat, but separate 
    control can be simulated by successive tester cycles.  This technique is common in many testers.
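    As a minimal sketch of the combination described above, the following assumes (this is not stated in the text) that the
    single stored Data bit is read as Force when the driver is enabled (Inhibit = 0) and as Expect when it is disabled
    (Inhibit = 1).  The function name and the tuple layout are hypothetical.  A request that needs both a Force and an Expect
    value is split into two successive tester cycles.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: one stored Data bit per pin per cycle.
# Assumption: Inhibit = 0 means the tester drives the pin (Data is Force),
# Inhibit = 1 means the driver is off (Data is Expect).
Cycle = Tuple[int, int]  # (data, inhibit)


def combine(force: Optional[int], expect: Optional[int]) -> List[Cycle]:
    """Return the tester cycle(s) that realize one Force/Expect request."""
    cycles: List[Cycle] = []
    if force is not None:
        cycles.append((force, 0))   # drive cycle: Data carries Force
    if expect is not None:
        cycles.append((expect, 1))  # sense cycle: Data carries Expect
    return cycles
```

    A request with only one of the two values costs one cycle; a request with both costs two, which is the loss of
    functionality the text mentions.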
    The storage required for the Mask and Inhibit bits is reduced using a mapping mechanism.  During test execution, the drive state 
    and data validity of many pins of the tester are strongly correlated in both space and time.  A data bus is an example of both 
    of these points: all pins of the bus switch between drive and sense at the same time, which means that all of the Inhibit and 
    Mask bits have the same value at any instant in time, and they all change value at the same time.  Since many of the tester pin 
    drives are static, and there are relatively few groups of dynamic pins, all combinations of Mask and Inhibit can be stored in a 
    small table as shown in Figure 1.  This table can be indexed with a small number of bits per vector, thereby greatly reducing 
    the required amount of storage.
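    The mapping idea can be sketched as follows.  This is an illustration under stated assumptions, not the table format used
    in the chip: the helper names, the (Mask, Inhibit) tuple order, and the four-pin example are all hypothetical.

```python
from typing import Dict, List, Tuple

# One pattern holds the (Mask, Inhibit) pair of every pin for one cycle.
Pattern = Tuple[Tuple[int, int], ...]


def build_map(patterns: List[Pattern]) -> Tuple[List[Pattern], List[int]]:
    """Return (table of distinct patterns, per-vector index into that table)."""
    table: List[Pattern] = []
    slot: Dict[Pattern, int] = {}
    indices: List[int] = []
    for pattern in patterns:
        if pattern not in slot:          # first time this combination is seen
            slot[pattern] = len(table)
            table.append(pattern)
        indices.append(slot[pattern])
    return table, indices


def index_bits(table_size: int) -> int:
    """Bits needed per vector to select one table entry (at least one)."""
    return max(1, (table_size - 1).bit_length())


# Hypothetical 4-pin device: pins 0-1 form a bidirectional bus, pin 2 is
# always driven by the tester, pin 3 is always sensed.
DRIVE, SENSE = (1, 0), (0, 1)            # (Mask, Inhibit)
WRITE: Pattern = (DRIVE, DRIVE, DRIVE, SENSE)
READ: Pattern = (SENSE, SENSE, DRIVE, SENSE)

table, indices = build_map([WRITE, WRITE, READ, READ, WRITE])
raw_bits_per_vector = 2 * len(WRITE)     # Mask + Inhibit for each pin
mapped_bits_per_vector = index_bits(len(table))
```

    In this example the five vectors use only two distinct combinations, so each vector stores a one-bit index instead of
    eight raw Mask and Inhibit bits.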

Figure 1.  Compression through mapping
    After applying these two compression methods, there is still considerable redundancy remaining in the test vectors.  To further
    reduce the storage requirements of the test vectors, a lossless compression mechanism is employed.  This compression technique
    utilizes off-line compression on a host processor and real-time decompression inside the tester chip.  The general technique
    was originally developed by Ziv and Lempel [3] and was further enhanced by Fiala and Greene [2] as a software means for
    compressing data.  This is the first reduction to practice in hardware of the decompression algorithm.
    The encoding takes a set of source symbols (vectors) of variable length and maps them into a set of variable-length codewords.
    The compressed stream consists of a sequence of command words interspersed with literal data words.  Two types of commands are
    used: Copy and Literal.  The Copy command has two parameters: the length of the run of symbols to be copied, and their position
    in the history buffer.  The Literal command needs only one field: the length of the run of symbols which follow in the
    compressed stream.  When Copy is interpreted, a sequence of source symbols corresponding to the number specified by the length
    field is transcribed from the history buffer of the decompressor to the output stream.  The index of the first symbol in the
    buffer is indicated by the position field.  At the end of each decompressor cycle, the current output symbol of the
    decompressor is copied into the least recently written location of the history buffer so that the buffer always contains a
    history of the most recently decompressed data.
    The block diagram of an architecture which implements these functions is shown in Figure 2.  In order to meet the minimum data
    delivery requirement of one word per clock cycle, the RAM word width must be at least as great as the number of words required
    for a single Literal command and one word of literal data.  Because the Copy and Literal command words are the same length,
    this guarantees that Copy commands can meet the minimum data delivery requirement as well, since only the command itself is
    needed to produce an output symbol.  There are two outputs of the prefetcher in this scheme: the next command and a literal
    data word.  The command is interpreted by the control logic to determine whether the next decompressed output word should come
    from the history buffer or the literal data stream.  The multiplexor at the output of the history buffer selects between these
    two alternatives.  The control logic decodes the current instruction to determine where the next instruction occurs in the
    prefetch buffer.
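    The Copy and Literal interpretation described above can be modeled in software as follows.  This is a behavioral sketch
    only, not the hardware design.  It assumes that the position field is an absolute index into a circular history buffer and
    that a Literal command is represented together with its data words (so its length field is implied); the tuple encodings,
    the function name, and the buffer size are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical command encodings; the real field layout is not given in the text.
Copy = Tuple[str, int, int]               # ("copy", length, position)
Literal = Tuple[str, Sequence[int]]       # ("literal", data words that follow)
Command = Union[Copy, Literal]


def decompress(stream: Sequence[Command], history_size: int = 8) -> List[int]:
    """Expand a Copy/Literal command stream into output symbols."""
    history = [0] * history_size          # circular history buffer
    write_ptr = 0                         # least recently written location
    out: List[int] = []

    def emit(symbol: int) -> None:
        # Every output symbol is also written back into the history buffer.
        nonlocal write_ptr
        out.append(symbol)
        history[write_ptr] = symbol
        write_ptr = (write_ptr + 1) % history_size

    for cmd in stream:
        if cmd[0] == "literal":
            for symbol in cmd[1]:         # symbols come from the compressed stream
                emit(symbol)
        elif cmd[0] == "copy":
            _, length, position = cmd
            read_ptr = position % history_size
            for _ in range(length):       # symbols come from the history buffer
                emit(history[read_ptr])
                read_ptr = (read_ptr + 1) % history_size
        else:
            raise ValueError(f"unknown command: {cmd[0]!r}")
    return out
```

    Because each output symbol is written back before the next one is read, a Copy may overlap the data it is producing: a
    Literal of 1, 2, 3 followed by a Copy of length 4 from position 1 yields 1, 2, 3, 2, 3, 2, 3.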

Figure 2.  Block diagram of decompressor

    This decompression architecture has more general application beyond testers.  Claims should be made in the general context of 
    hardware decompression and then made specific to the problem of testing.

Timing Generation
    See the attached paper, "Integrated Pin Electronics for VLSI Functional Testers", for a detailed description.

    Figure 3 shows the circuit used for the Format generator.  The Pulse and nPulse inputs are driven by the XOR output of the pulse 
    generator (Figure 2a of the above reference).  The Inhibit signal is driven by the Map discussed in the section on Test Vector 
    Compression.  Data is the Force Data bit of the test vector and corresponds to Decompressed Data Out in Figure 2.  The output 
    signals, DriveHigh and nDriveLow, drive the output transistors connected to the DUT pin.  The remainder of the signals select 
    the current output format.
    (This figure should be generalized.)
Figure 3.  Format circuit


    The Acquire portion of the pin electronics circuitry is responsible for capturing data from the DUT and determining if it matches 
    the desired response.  Both the data value and output voltage level must be checked.  In the Force Timing generator, both delay 
    and width timing control are needed, but in the Acquire Timing generator only control of the sample delay timing is necessary.  
    This allows the Acquire Timing generator to be simplified to a single delay line.  The input to this delay line is still as it
    was for the Force Timing generator.  This introduces a problem in that the frequency doubling effect of the XOR gate is
    not realized, so the delay line output is only half the desired cycle frequency.  Since the die area consumed by the sample and 
    comparison circuitry is much smaller than that of a second delay line, the solution to this problem is to simply build two 
    sampling circuits: one which operates on the positive transition of the sample timing clock and one which operates on the 
    negative transition.  A beneficial side effect of this technique is that the data at the output of the two comparators is 
    available for a full tester cycle.  This greatly simplifies the problem of resynchronizing the data between the sample logic
    and the data comparison pipeline.  Since the phase of the captured signal is known, an optional pipeline stage clocked at the 
    midpoint of the cycle ensures that the data can be captured at a stable point in the cycle (Figure 4).  In the data pipeline, 
    the acquired value is compared with the Expect Data from the test vector.  If the comparison fails and the Mask bit from the 
    test vector is not asserted, the external Error bit is asserted for the cycle.
Figure 4.  Acquisition and synchronization circuit
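
    The dual-edge sampling and the Mask/Expect comparison described above can be modeled behaviorally as follows.  This is a
    sketch under stated assumptions: the function and variable names are hypothetical, even-numbered cycles are assumed to
    fall on the positive transition of the sample clock, and the output voltage level check is not modeled.

```python
from typing import List, Sequence


def acquire(dut_samples: Sequence[int],
            expect: Sequence[int],
            mask: Sequence[int]) -> List[int]:
    """Return the Error bit for each tester cycle.

    dut_samples[k] is the DUT value at the sample point of tester cycle k.
    """
    pos_latch = 0   # sampler clocked on positive transitions of the sample clock
    neg_latch = 0   # sampler clocked on negative transitions
    errors: List[int] = []
    for cycle, value in enumerate(dut_samples):
        # The sample clock runs at half the cycle frequency, so it makes one
        # transition per tester cycle and the two samplers alternate.
        if cycle % 2 == 0:
            pos_latch = value
            acquired = pos_latch
        else:
            neg_latch = value
            acquired = neg_latch
        # Error is asserted only when the comparison fails and Mask is clear.
        mismatch = acquired != expect[cycle]
        errors.append(int(mismatch and not mask[cycle]))
    return errors
```

    For example, with samples 1, 0, 1, 1, Expect 1, 1, 1, 0 and Mask 0, 0, 0, 1, only the second cycle reports an error; the
    mismatch in the fourth cycle is masked.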


    Claims:
    Pulse generation concept: two delay lines followed by XOR.
    Method for implementing a wide-range, high-resolution delay line by using three types of delay mechanisms.
    Compact implementation of the delay generator.
    Correction of pulse edge distortion by Edge Adjust units.
    Format generator circuit (Figure 3).
    Use of dual analog sampling latches to acquire data on both clock edges.
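
    The first claim, pulse generation by two delay lines followed by an XOR, can be illustrated with a small discrete-time
    model.  This is a sketch of the general idea only; the circuit itself is described in the attached paper.  The function
    names and the integer time units are hypothetical, and the model assumes that the delay plus the width is shorter than
    one tester cycle.

```python
def ref_clock(t: int, cycle: int) -> int:
    """Square wave at half the tester cycle frequency: toggles once per cycle."""
    return (t // cycle) % 2


def pulse(t: int, cycle: int, delay: int, width: int) -> int:
    """XOR of two delayed copies of the reference clock."""
    leading = ref_clock(t - delay, cycle)            # first delay line
    trailing = ref_clock(t - delay - width, cycle)   # second, longer delay line
    return leading ^ trailing


# Time steps at which the pulse is high for a 10-unit cycle,
# a delay of 2 units and a width of 3 units.
high_times = [t for t in range(30) if pulse(t, 10, 2, 3)]
```

    Each transition of the half-frequency reference produces one pulse, so the XOR output repeats once per tester cycle; this
    is the frequency doubling effect referred to in the discussion of the Acquire Timing generator.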
System Architecture

Figure 5.  Board level system
    Figure 5 illustrates the configuration for a 256-pin test system using the Testarossa chip.  The system consists of sixteen tester 
    chips, each providing sixteen I/O channels.  The chips are arranged in a circular fashion around the central DUT, thereby 
    equalizing the lead lengths, while at the same time maintaining a total trace length of less than ten centimeters.  This limits 
    the time-of-flight of a signal between DUT and pin electronics to only a few hundred picoseconds, obviating the need for 
    terminated transmission lines.  The short trace length also provides a stray load capacitance on the DUT outputs on the order 
    of only a few picofarads.  This ensures high-fidelity waveform transmission for both driven and sensed DUT signals with a 
    minimum of signal loading.  
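    As a rough check on the time-of-flight figure above, the delay of a board trace can be estimated from its length and the
    effective dielectric constant of the board.  The value of 4.0 used here is an assumption chosen for illustration and is
    not taken from the text; the function name is hypothetical.

```python
C_M_PER_S = 299_792_458.0   # speed of light in vacuum


def time_of_flight_ps(length_m: float, eps_eff: float) -> float:
    """One-way propagation delay of a trace, in picoseconds."""
    velocity = C_M_PER_S / eps_eff ** 0.5   # signal velocity on the trace
    return length_m / velocity * 1e12


# Assumed effective dielectric constant of 4.0 (illustrative value only).
tof_10cm = time_of_flight_ps(0.10, 4.0)
tof_5cm = time_of_flight_ps(0.05, 4.0)
```

    Under this assumption a ten centimeter trace has a delay of roughly 670 picoseconds and a five centimeter trace roughly
    330 picoseconds, which is consistent with the "few hundred picoseconds" stated above.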
    As illustrated in Figure 5, the number of support components necessary to bind the system together is small, consisting of a few 
    transceivers to buffer the host signals, and the necessary chip select logic for addressing individual tester chips.  The only 
    additional logic necessary is that required to generate the reference clock.
    We want to claim a system structure whereby all real-time signals are contained within a critical radius of the test device [1].
    This should be one of the dependent claims for 1 and 2 together as it cannot stand alone. 


References

[1] M. Barber, "Fundamental Timing Problems in Testing MOS VLSI on Modern ATE", IEEE Design and Test, pp. 482-489, August 1984.

[2] E. Fiala and D. Greene, "Data Compression with Finite Windows", to appear in CACM, spring 1989.

[3] J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression", IEEE Trans. on Information Theory, Vol. IT-23, No. 3, May 1977.