Dragon Architecture - SunShow 2 February 1988
DRAGON System Architecture

I Introduction
II The DynaBus
III VLSI Chip Set
IV Packaging
V Computer Architectures
VI Conclusions
I. INTRODUCTION





· MOTIVATIONS

· DRAGON TECHNOLOGY

· APPLICATIONS
Motivations
Foundations for building architectures for a wide range of document processing machines

· controllers : Network, Scanners...

· servers : Data base servers, Printers, Gateway

· workstations : mid, high & very high end

=> high bandwidth of data
Parallel processing architecture research

· Project started in 81

· Follows a first generation of shared-memory
multiprocessors on a conventional bus,
currently implemented in a Xerox product.

· New generation using more advanced concepts
& technology : VLSI, packaging

· Compatible with operating systems like Mach
or SunOS phase3 & languages like Cedar
Dragon Technology
· Communications studies : VLSI BUS
· VLSI Chip Set
· Packaging
Computer Architecture
APPLICATIONS

Applications & markets

· High-end parallel computers

· Desktop multiprocessors
· High-end workstations

· High-end printer servers
· File and database servers

· Add-in boards for standard platforms
· Industrial control
· OEM multiprocessors

· Chip sets & packaging
II. The DynaBus




· HIGH SPEED BUS ARCHITECTURE
 Pipeline
 Bus configurations


· ELECTRICAL CONSIDERATIONS
 Termination
 CK skew
 Packaging


· LOGIC OF THE BUS
 64 bits
 DBus
 Performances

· PROTOCOL
HIGH SPEED BUS ARCHITECTURE
Principle
Cycle of the bus = time to transmit information from one device to another
Tcycle = TckQ + Tprop + Tsetup + Tskew
8 ns = 1 ns + 4 ns + 1 ns + 2 ns
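The cycle-time budget above can be reproduced with a few lines of Python. This is only a sketch: the four values come from the slide, while the function names and the reading of each term (clock-to-output delay, propagation, setup, skew) are assumptions made for illustration.

```python
# Timing terms in nanoseconds, taken from the slide.
T_CKQ = 1.0    # assumed meaning: clock-to-output delay of the sending register
T_PROP = 4.0   # assumed meaning: propagation time along one bus segment
T_SETUP = 1.0  # assumed meaning: setup time of the receiving register
T_SKEW = 2.0   # clock skew between sender and receiver

def cycle_time_ns(t_ckq, t_prop, t_setup, t_skew):
    """Bus cycle as the sum of the four timing terms (hypothetical helper)."""
    return t_ckq + t_prop + t_setup + t_skew

def clock_mhz(t_cycle_ns):
    """Clock frequency in MHz for a cycle time given in nanoseconds."""
    return 1000.0 / t_cycle_ns

t_cycle = cycle_time_ns(T_CKQ, T_PROP, T_SETUP, T_SKEW)
print(t_cycle, "ns per cycle,", clock_mhz(t_cycle), "MHz")
```

With the slide's numbers the budget adds up to 8 ns, which corresponds to a 125 MHz bus clock.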
Pipeline
Only one bidirectional segment in a pipelined bus
=> the backpanel in a multi-board system
Bus Configurations
Level 1 : Mono-Board Computer
Level 2 : Multi-Board Computer
Level 3 : Multi-Board & Multi-Module Computer
ELECTRICAL CONSIDERATIONS

Bus termination is imperative to balance the lines
For the CMOS version of the BIC

· Open drain, so that power is dissipated in the resistors

· 50 Ohms at each end

· Power dissipation = U^2 / R

2 Volts swing => 80 mW / resistor
128 resistors => 10 Watts / bus
10 Watts / backpanel
15 Watts per board
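The termination figures above follow directly from U^2 / R. The sketch below redoes that arithmetic; the function names are hypothetical, and it assumes the worst case in which the full 2 V swing sits across every resistor all the time. The count of 128 resistors is taken from the slide as given (it would be consistent with, for example, 64 lines terminated at both ends, but the slide does not say so).

```python
def resistor_power_w(swing_v, resistance_ohm):
    """Power dissipated in one termination resistor: U^2 / R."""
    return swing_v ** 2 / resistance_ohm

def bus_power_w(n_resistors, swing_v=2.0, resistance_ohm=50.0):
    """Worst-case termination power for the whole bus (assumed: all resistors
    see the full swing continuously)."""
    return n_resistors * resistor_power_w(swing_v, resistance_ohm)

per_resistor = resistor_power_w(2.0, 50.0)   # 2 V swing into 50 Ohms
whole_bus = bus_power_w(128)                 # 128 resistors, from the slide
print(round(per_resistor * 1000), "mW per resistor")
print(round(whole_bus, 2), "W per bus")
```

This gives 80 mW per resistor and about 10.24 W for 128 resistors, which the slide rounds to 10 Watts.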
Clock Skew
Clock distribution is critical: low skew is crucial

Tcycle > (TckQ)max + Tprop + (Tsetup)max - Tskew
Tskew < (TckQ)min + Tprop + (Tsetup)min
CMOS chips need a huge, uncontrolled clock amplifier to drive their high-capacitance internal clock
Hierarchical clock distribution, with the BIC generating the clock of each chip
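The two skew inequalities on this slide can be checked numerically. The sketch below encodes them exactly as written, including the sign of Tskew in the first one. The function names are hypothetical, and the example reuses the nominal values from the cycle-time slide for both the min and max terms, which is an assumption: the slide gives no separate min/max figures.

```python
def cycle_constraint_ok(t_cycle, t_ckq_max, t_prop, t_setup_max, t_skew):
    """Tcycle > (TckQ)max + Tprop + (Tsetup)max - Tskew, as written on the slide."""
    return t_cycle > t_ckq_max + t_prop + t_setup_max - t_skew

def skew_constraint_ok(t_skew, t_ckq_min, t_prop, t_setup_min):
    """Tskew < (TckQ)min + Tprop + (Tsetup)min, as written on the slide."""
    return t_skew < t_ckq_min + t_prop + t_setup_min

# Nominal values in ns (assumed identical for min and max).
T_CYCLE, T_CKQ, T_PROP, T_SETUP, T_SKEW = 8.0, 1.0, 4.0, 1.0, 2.0

print(cycle_constraint_ok(T_CYCLE, T_CKQ, T_PROP, T_SETUP, T_SKEW))
print(skew_constraint_ok(T_SKEW, T_CKQ, T_PROP, T_SETUP))
```

Under these assumed values both constraints hold; raising the skew to 6 ns or more would violate the second one.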
Packaging & Transmission Lines
To obtain short cycles, lines must be balanced and act as perfect transmission lines
Standard PGA packages are difficult to use because of stubs, but SMD FQPC packages work very well
The next step is to use hybrid modules
LOGIC OF THE BUS
Minimal number of wires for a 64-bit data path. All commands are coded on 64 bits
Performances
Tcycle    Raw bandwidth    Usable bandwidth (* 4/7)
25 ns     320 MB/sec       182 MB/sec
10 ns     800 MB/sec       457 MB/sec
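The figures in this table can be recomputed from the 64-bit data path and the cycle time. The sketch below is a minimal check, with hypothetical function names; the 4/7 factor is taken from the slide as given, without assuming where it comes from.

```python
BUS_WIDTH_BYTES = 8  # 64-bit data path, one transfer per cycle

def raw_bandwidth_mb_s(t_cycle_ns):
    """Raw bandwidth in MB/sec (10^6 bytes/sec) for a cycle time in ns."""
    return BUS_WIDTH_BYTES * 1000.0 / t_cycle_ns

def usable_bandwidth_mb_s(t_cycle_ns, efficiency=4 / 7):
    """Usable bandwidth, applying the slide's 4/7 factor to the raw figure."""
    return raw_bandwidth_mb_s(t_cycle_ns) * efficiency

for t_cycle in (25.0, 10.0):
    raw = raw_bandwidth_mb_s(t_cycle)
    usable = usable_bandwidth_mb_s(t_cycle)
    print(f"{t_cycle:g} ns: raw {raw:g} MB/sec, usable {usable:.1f} MB/sec")
```

The computed usable values are about 182.9 and 457.1 MB/sec; the table lists them truncated to 182 and 457.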
DBus
Seven wires. Used for initialisation & debugging
PROTOCOL

Protocol oriented toward shared-memory multiprocessors


· Hardware data consistency

· Split-cycle for very high speed

· Supports multiple memory banks

· Bridges to industry-standard buses

· Supports multi-level caches

· Mathematical model and proof of coherency
III. VLSI Chip Set




· CURRENT FAMILY OF SEVEN CHIPS

  BIC
  Arbiter
  Small Cache
  Memory Controller
  IOBridge
  Display/Printer
  Map Cache
   
Bus Interface Chip : BIC

Contains all the electrical specifics of the bus


· Slice of the pipelined register (2 * 24 bits)

· Controls access to the bus

· Contains low-voltage drivers & receivers

· Clock skew regeneration

· Current implementation for hybrid modules
Arbiter

Controls bus access for up to 64 masters


· Distributed arbiter. Current implementation :
one arbiter chip controls 8 masters,
with up to eight arbiters connected together

· Works with all pipeline configurations

· 7 priority levels, round robin inside one level

· Hold management, for locking requestors

· System Stop generation

· DBus predecoder
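The arbitration policy listed above (7 priority levels, round robin inside one level, 8 masters per chip and up to 8 chips) can be illustrated with a small model. This is a centralized sketch under stated assumptions, not the chip's distributed logic: the class, the method names, and the convention that level 0 is the most urgent are all hypothetical.

```python
from typing import Dict, List, Optional

NUM_LEVELS = 7            # priority levels (from the slide)
MASTERS_PER_ARBITER = 8   # masters per arbiter chip (from the slide)
MAX_ARBITERS = 8          # arbiter chips that can be connected (from the slide)
NUM_MASTERS = MASTERS_PER_ARBITER * MAX_ARBITERS  # 64 masters in total

class Arbiter:
    """Toy model: strict priority between levels, round robin within a level."""

    def __init__(self) -> None:
        # Last master granted at each level; the next search starts after it.
        self.last_grant: List[int] = [-1] * NUM_LEVELS

    def grant(self, requests: Dict[int, int]) -> Optional[int]:
        """requests maps master id (0..63) to its priority level.

        Assumption: level 0 is the most urgent. Returns the winning master,
        or None when nobody is requesting the bus.
        """
        if not requests:
            return None
        level = min(requests.values())
        candidates = [m for m, lvl in requests.items() if lvl == level]
        start = self.last_grant[level] + 1
        # Pick the first requester at or after `start`, wrapping around.
        winner = min(candidates, key=lambda m: (m - start) % NUM_MASTERS)
        self.last_grant[level] = winner
        return winner

arb = Arbiter()
same_level = {3: 2, 5: 2}
print([arb.grant(same_level) for _ in range(4)])   # masters alternate
print(arb.grant({3: 2, 5: 2, 9: 0}))               # the more urgent level wins
```

Two masters at the same level are served alternately, while a request at a more urgent level always wins over them.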
Small Cache

Interface between a Processor and the Bus


· Contains the snoopy algorithm for consistency

· Fully associative dual-port memory :
virtual CAM on the processor side
& real CAM on the bus side


· Efficient first-level cache of the virtual-to-real table,
built in for free

· Implements Conditional Write, for efficient
multiprocessor locking on a split-cycle bus

· Entry point on the bus for all devices playing
the consistency game. Example : IOBridge.

· Current implementation in 2 micron : 5 KB ;
with 0.8 micron => 32 KB / chip
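The use of a conditional write for multiprocessor locking can be sketched in software. The model below assumes that Conditional Write behaves like a compare-and-swap (write the new value only if the word still holds the expected one, and return the old value); the slide does not spell out these semantics. The class, the function names, and the use of a Python lock to stand in for the atomicity of a bus transaction are all hypothetical.

```python
import threading
import time

class SharedMemory:
    """Toy word-addressed memory; _bus stands in for bus-level atomicity."""

    def __init__(self):
        self._words = {}
        self._bus = threading.Lock()

    def read(self, addr):
        with self._bus:
            return self._words.get(addr, 0)

    def write(self, addr, value):
        with self._bus:
            self._words[addr] = value

    def conditional_write(self, addr, expected, new):
        """Write `new` only if the word equals `expected`; return the old value."""
        with self._bus:
            old = self._words.get(addr, 0)
            if old == expected:
                self._words[addr] = new
            return old

FREE, HELD = 0, 1

def acquire(mem, lock_addr):
    # Spin until the conditional write observes FREE and installs HELD.
    while mem.conditional_write(lock_addr, FREE, HELD) != FREE:
        time.sleep(0)  # yield to other threads while waiting

def release(mem, lock_addr):
    mem.write(lock_addr, FREE)

def worker(mem, lock_addr, counter_addr, n):
    for _ in range(n):
        acquire(mem, lock_addr)
        mem.write(counter_addr, mem.read(counter_addr) + 1)  # critical section
        release(mem, lock_addr)

def run_demo(n_threads=4, n=200):
    """Return the final counter; it equals n_threads * n if locking works."""
    mem = SharedMemory()
    pool = [threading.Thread(target=worker, args=(mem, 0, 1, n))
            for _ in range(n_threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return mem.read(1)

print(run_demo())
```

Because the test and the update happen in a single operation, a lock word can be claimed without holding the bus between a read and a later write, which is the property that matters on a split-cycle bus.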
IOBridge

Interface between the DynaBus and an industry-standard slow bus


· Uses a Small Cache for access to the DynaBus.
In a future 1.2 micron implementation :
merging of both IOB & Cache

· Provides transparent access & maps from the slow bus to the DynaBus


· Provides transparent access from the DynaBus to the slow bus

· Choice of virtual or real addresses for IO

· Current implementation is for the PC/AT, but is easily adaptable to other standards : Micro-Channel, NuBus... or a special internal bus
Memory Controller

Controls a bank of memory from 8 MByte up to 128 MByte


· Implements the consistency algorithm

· Uses memory in nibble mode for fast access

· Implements ECC on 64 bits + 8 :
corrects one error, detects two

· Multi-bank, initialized through the DBus
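A code with 64 data bits and 8 check bits that corrects one error and detects two can be built as an extended Hamming code. The sketch below shows one such construction to make the idea concrete. It is an assumption for illustration only: the slide does not say which code the Memory Controller uses, and the bit layout and function names here are hypothetical.

```python
PARITY_POSITIONS = (1, 2, 4, 8, 16, 32, 64)
# Positions 1..71 hold the Hamming codeword; 64 of them carry data.
DATA_POSITIONS = [p for p in range(1, 72) if p not in PARITY_POSITIONS]

def encode(data):
    """Return 72 bits: index 0 is overall parity, 1..71 the Hamming codeword."""
    assert 0 <= data < 1 << 64
    word = [0] * 72
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (data >> i) & 1
    for p in PARITY_POSITIONS:
        parity = 0
        for pos in range(1, 72):
            if pos & p and pos != p:
                parity ^= word[pos]
        word[p] = parity          # makes each parity group even
    word[0] = sum(word[1:]) & 1   # makes the whole 72-bit word even
    return word

def decode(word):
    """Return (data, status); data is None when the error is not correctable."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 72):
        if word[pos]:
            syndrome ^= pos       # XOR of the positions of all set bits
    overall_odd = sum(word) & 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        if syndrome >= 72:        # cannot be a single-bit error
            return None, "uncorrectable"
        word[syndrome] ^= 1       # syndrome 0 means the overall parity bit
        status = "corrected"
    else:
        return None, "double-error"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= word[pos] << i
    return data, status

original = 0x0123456789ABCDEF
stored = encode(original)
flipped = list(stored)
flipped[37] ^= 1                  # inject a single-bit error
print(decode(stored)[1], decode(flipped)[1])
```

The seven Hamming check bits locate a single flipped bit, and the eighth (overall parity) bit distinguishes a single error from a double one.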
Display/Printer
Refreshes the display/printer from the bus in the background


· Always low priority, except when its FIFO is empty

· Fully configurable : from B&W to 8 bits/pixel
and up to 200 MByte/sec

· Up to three controllers for 24-bit color

· Architecture adapted for multiprocessors, avoiding
flushing of caches

· Perfect with a second-level cache, in which the
frame buffer is a multi-megabyte cache
Map Cache

Provides a second-level cache of the
virtual-to-real address table


· Because the first-level cache is inside the Small
Cache, this map is not used on every miss

· Replacement algorithm done completely in software
by the processor that has the map miss. Control of
victimization for IO

· Multi-Map Chip possible


· First implementation : 256 entries


IV. Packaging




· HYBRID PACKAGE

· ASSEMBLY & CONNECTOR
HYBRID PACKAGE
Principles

· Design of a "chip carrier" containing many chips
· Intermediate level between chips and boards
· Perfect integration into our architecture
· Silicon substrate for experimentation
· Low to very low cost using a large-area process
Advantages for many applications

· Increased speed
· Space savings
· Solves power dissipation problems
· Lower cost
HYBRID ASSEMBLY

V. Computer Architectures




· Multi-Board parallel Computer
using conventional packaging

· Mono-Board Computer
using conventional packaging

· Add-on Multiprocessor
using Hybrid packaging

· High performance parallel Computer
using Hybrid packaging
MultiBoard Parallel Computer
with conventional Packaging
Current implementation allows, for example, 24 processors and 8 memory banks
Add-on Multiprocessor
MultiBoard Parallel Computer
with Hybrid Modules
VI. Conclusions


· Key Features of the Architecture
· Results and Milestones
· Transfer of Research
· Future Architecture
Key Features
· Unique VLSI Bus
  => SPEED & PERFORMANCE
Order of magnitude in speed, multiprocessor oriented & a good candidate for standardization for VLSI
· Unique Chip Set implementation
  => COST & SIMPLICITY
Only seven LSI chips for the whole family; chips replace the functionality of complete boards
· Unique Packaging use
  => COST, SIZE & PERFORMANCE
Advanced packaging used at the architecture level
for mid- & low-end computers
· Unique open architecture to industrial standards
  => EASY INTEGRATION
Any standard microprocessor, bridges to existing standard buses, standard operating systems

Order of magnitude in price/performance
1-2 years ahead of the competition
Results & Milestones

(Sparc Softcard :
· Wire-wrapped prototype in March)
Chips Set :
· 5 chips will return from fab in February 88
(BIC, Arbiter, IOBridge, MemController, Simplex)

· 3 chips will be sent out in 1Q88
(Cache, Map Cache, Display/Printer)

Wire-Wrapped June87 Prototype :

· 3Q 88 with 4 Sparcs
   (depends on the Cache)
  
  
High Speed Bus Prototype & Packaging :
· 2Q 88
Transfer of Research

Try to standardize the bus :
a VLSI bus does not exist yet
Partnership

with equipment makers like :
SUN
with semiconductor companies :
Motorola, National, AMD, Cypress, ...
Good chances, because the open architecture is compatible with
industrial standards : any microprocessor, any add-in boards &
possibly multiple operating systems...
Future Directions
This architecture is only at its beginning : many evolutions are expected : progressive use of advanced packaging, a second level of cache & other topologies...
Future Directions on parallelism

Explore future parallel architectures while keeping this general model of
"Shared Memory with Caches"
Big opportunities with new operating systems like Mach or SunOS phase3, which include this model of communication in the kernel, and languages like Ada or Cedar...

A high-speed bus is crucial for supporting high-performance VLSI operators

High-speed vector processors, high-speed network controllers, graphics controllers, high-speed disk controllers, compression and decompression of images...