Project Name: Synth
Designer: Jim Cherry
Location: Xerox PARC
Project Size (units of λ): 1920 x 963
Filed on: [ivy]<speech>Synth>Synth.bravo
Figures filed on: [ivy]<speech>Synth>Synth1,2,3,4,5,6,7,8,9,10,11.sil
Description:
Synth is a digital lattice filter that performs most of the computation required to synthesize speech using the linear predictive coding (LPC) algorithm. The illustrations Synth3, 4, 5, and 10 that accompany this document are color sil diagrams.
LPC is a technique for speech synthesis that digitally models the human speech production system. It is the same algorithm that was used in the Texas Instruments Speak and Spell talking learning aid. The block diagram of figure 1 illustrates a typical LPC speech synthesis configuration. The voiced excitation generator creates a sequence of pulses at the appropriate pitch rate, which simulates the action of the glottis in voiced sounds such as vowels. An unvoiced excitation generator (a white noise generator) simulates the excitation that results from air passing through a constriction, as in the sound /s/. The vocal tract is modeled as a time-varying all-pole linear filter. This models the vocal tract as a sequence of concatenated acoustic tubes. To reduce the bandwidth of the parameters that drive the synthesizer, the speech signal can be divided into steady-state segments (frames). The parameters for each frame are then interpolated.
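The two excitation sources and the frame interpolation described above can be sketched in a few lines of Python. This is only an illustration: the function names, the unit pulse amplitude, the uniform noise distribution, and the use of linear interpolation between frames are all assumptions, since the text does not specify them.

```python
import random


def voiced_excitation(num_samples, pitch_period):
    """Unit pulses spaced pitch_period samples apart, zeros elsewhere."""
    return [1.0 if i % pitch_period == 0 else 0.0 for i in range(num_samples)]


def unvoiced_excitation(num_samples, seed=0):
    """White noise: independent samples drawn uniformly from [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


def interpolate_frames(start, end, steps):
    """Linearly interpolate from one parameter frame toward the next.

    Linear interpolation is an assumed scheme; returns `steps` parameter
    sets, beginning at `start` and stopping one step short of `end`.
    """
    return [
        [a + (b - a) * s / steps for a, b in zip(start, end)]
        for s in range(steps)
    ]
```

For example, `voiced_excitation(10, 4)` places pulses at samples 0, 4, and 8, and `interpolate_frames([0.0], [1.0], 4)` steps a single parameter through 0.0, 0.25, 0.5, and 0.75.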
The lower block diagram in figure 1 illustrates a synthesis configuration that uses a mixed excitation source. This configuration avoids classifying a speech frame as only voiced or unvoiced; some sounds, such as /z/, have both a voiced and an unvoiced component. Separate vocal tract models may also be used for the voiced and unvoiced components. While little has been written about how one would derive the parameters to drive such a system, it is desirable that the synthesizer components constructed be compatible with such a configuration.
Synth implements a ten-pole vocal tract model, not a complete synthesizer. A lattice filter configuration was chosen because of its computational properties. Figure 2 illustrates the lattice and the notation used to label the signals. Note that the filter’s k parameters (reflection coefficients) are not labeled as in the literature; instead, they are labeled so that they more closely resemble the order in which they are used in the computation. The k parameters are those that result from using Levinson’s recursion to analyze a speech segment.
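As a rough illustration of the computation an all-pole lattice performs, the following Python sketch implements the textbook two-multiplier lattice. It is not taken from figure 2: the function name, the floating-point arithmetic, the sign convention, and the textbook ordering of k (k[0] belongs to the stage nearest the output) are all assumptions, and Synth’s own labeling of the k parameters differs from the literature as noted above.

```python
def lattice_synthesize(excitation, k, amplitude=1.0):
    """Textbook all-pole lattice filter (assumed form, not Synth's exact one).

    excitation: input samples u(i)
    k:          reflection coefficients, k[m - 1] for stage m (stage 1 is
                nearest the output); ten entries give a ten-pole model
    amplitude:  gain A applied to the excitation before filtering
    """
    order = len(k)
    b = [0.0] * (order + 1)  # backward signals saved from the previous sample
    output = []
    for u in excitation:
        f = amplitude * u  # forward signal entering the top stage
        for m in range(order, 0, -1):
            # First multiplication of the stage: forward path.
            f = f - k[m - 1] * b[m - 1]
            # Second multiplication of the stage: backward path.
            b[m] = b[m - 1] + k[m - 1] * f
        b[0] = f  # the output feeds the backward path for the next sample
        output.append(f)
    return output
```

With a single stage and k = [-0.5], a unit impulse produces 1.0, 0.5, 0.25, 0.125, the response of a one-pole filter. Each stage costs two multiplications per sample, which is consistent with the two word times per stage described in this document.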
All arithmetic, inputs, and outputs in the lattice filter are bit-serial, least-significant-bit-first two’s complement binary numbers. Associated with the serial data paths are signals denoted LSBtime, each of which is high when the least significant bit of its data path occurs. Fourteen-bit k parameters are used. Twenty-four-bit words are used for all intermediate calculations, as well as for the input and output of the filter. Twenty-two word times, each composed of 24 bit times (clock cycles) and numbered from 0 to 21, are required to obtain one output speech sample. Thus, to obtain speech samples at 10 kHz, the chip must be clocked at 5.28 MHz (22 x 24 x 10 kHz). Figure 11 shows the signal format and the timing of the filter inputs and outputs. All inputs and outputs to the filter are negative-true logic. The input amplitude [A] and excitation [u(i)] are input during word time 0. Filter parameter kn (n=0 to 9) is input during word 2(n+1). The speech sample obtained during the previous lattice calculation [y10(i-1)] is also output at this time.
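The serial number format and the clock-rate arithmetic can be checked with a short Python sketch. The function and constant names are hypothetical, and the sketch models only the logical bit order; it does not model the negative-true signaling at the pins or the exact signal format of figure 11.

```python
WORD_BITS = 24         # bit times (clock cycles) per word time
WORDS_PER_SAMPLE = 22  # word times per output speech sample


def to_serial(value, bits=WORD_BITS):
    """Two's complement bits of value, least significant bit first."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the word")
    raw = value & ((1 << bits) - 1)  # wrap negatives into two's complement
    return [(raw >> i) & 1 for i in range(bits)]


def from_serial(bit_list):
    """Inverse of to_serial: the last bit received is the sign bit."""
    raw = sum(bit << i for i, bit in enumerate(bit_list))
    if bit_list[-1]:
        raw -= 1 << len(bit_list)
    return raw


def lsb_time(bits=WORD_BITS):
    """LSBtime marker for one word: high only during the first bit time."""
    return [1] + [0] * (bits - 1)


def clock_hz(sample_rate_hz):
    """Clock frequency needed for a given output sample rate."""
    return sample_rate_hz * WORDS_PER_SAMPLE * WORD_BITS
```

Here `clock_hz(10_000)` evaluates to 5,280,000, matching the 5.28 MHz figure, and `to_serial(-1)` is 24 ones, as expected for two’s complement.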
Figure 3 is a block diagram of the hardware used to implement the lattice filter. One serial multiplier and adder are used to perform the lattice calculations for all ten stages. Documentation for these components may be found along with R. F. Lyon’s Speech project. The product obtained from the serial multiplier is the full product truncated to the 24-bit format. Figures 4 and 5 show the timing of the input and output signals for each of the major blocks of the block diagram. Two word times are used for each stage of the lattice calculation, one for each multiplication. Word times 0 and 1 are used to form the product u(i)*A. The timing for stage 2 (words 6 and 7) is typical of the rest of the stages.
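The word-time allocation implied by this description can be tabulated with a small Python sketch. The mapping of stage n to words 2(n+1) and 2(n+1)+1 is an inference from the statements that words 0 and 1 form u(i)*A and that stage 2 occupies words 6 and 7; the function names and labels are hypothetical, and figures 4 and 5 remain the authority on the actual timing.

```python
def stage_words(n):
    """Word times assumed for lattice stage n (stages numbered 0 to 9)."""
    first = 2 * (n + 1)
    return first, first + 1


def word_schedule(stages=10):
    """Map each word time to the multiplication assumed to occupy it."""
    schedule = {0: "u(i)*A", 1: "u(i)*A"}
    for n in range(stages):
        first, second = stage_words(n)
        schedule[first] = f"stage {n}, first multiplication"
        schedule[second] = f"stage {n}, second multiplication"
    return schedule
```

Under this reading, ten stages plus the amplitude product fill exactly the 22 word times numbered 0 to 21, and `stage_words(2)` returns (6, 7).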
The timing signals used to control the switches are generated using shift registers, as shown in figure 6. With LSBtime-in and Word0-in grounded, the lattice filter runs in self-timed mode. The signals LSBtime-out and Word0-out are used by the host processor to synchronize the inputs and outputs of the lattice filter. The LSBtime-out and Word0-out signals may also be used to drive the LSBtime-in and Word0-in of another Synth chip to keep the two chips synchronized. This is useful in a mixed-source synthesizer.
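A behavioral Python sketch of self-timed operation, using two circulating one-hot shift registers, is given below. It is not the circuit of figure 6: the function name, the two-register structure, and the assumption that Word0 stays high for the whole of word time 0 are illustrative guesses, and the external LSBtime-in and Word0-in inputs are not modeled.

```python
def timing_signals(num_clocks, word_bits=24, words=22):
    """Yield (lsb_time, word0) for each clock cycle in self-timed mode.

    bit_ring circulates a single 1 once per word; word_ring circulates a
    single 1 once per sample and advances at each word boundary.
    """
    bit_ring = [1] + [0] * (word_bits - 1)
    word_ring = [1] + [0] * (words - 1)
    for _ in range(num_clocks):
        yield bit_ring[0], word_ring[0]
        # Shift the bit register by one position on every clock.
        bit_ring = bit_ring[-1:] + bit_ring[:-1]
        if bit_ring[0]:
            # The 1 has come back around: a new word time begins.
            word_ring = word_ring[-1:] + word_ring[:-1]
```

Over one sample period of 528 clocks, this model raises lsb_time 22 times (once per word) and holds word0 high for the first 24 clocks only.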
A map of where the various blocks of figure 3 occur in the layout is shown in figure 10. Schematics for all of the blocks are shown in figures 6 through 9.