Dragon Architecture
2 February 1988
DRAGON System Architecture

I Introduction
II The DynaBus
III VLSI Chip Set
IV Packaging
V Computer Architectures
VI Conclusions
I. INTRODUCTION





· MOTIVATIONS

· DRAGON TECHNOLOGY

· APPLICATIONS
Motivations
Foundations for building architectures for a wide range of document processing machines

· controllers : Network, Scanners...

· servers : database servers, printers, gateways

· workstations : mid-range, high-end & very high-end

=> high data bandwidth
Parallel processing architecture research

· Project started in 1981

· Follows a first-generation shared-memory
multiprocessor built on a conventional bus,
currently implemented in a Xerox product.

· New generation using more advanced concepts
& technology : VLSI, packaging

· Compatible with operating systems such as Mach
or SunOS phase3, and with languages such as Cedar
Dragon Technology
· Communications studies : VLSI BUS
· VLSI Chip Set
· Packaging
Computer Architecture
APPLICATIONS

Applications & markets

· High-end parallel computers
· Desktop multiprocessors

· High-end workstations

· High-end printer servers
· File and database servers

· Add-in boards for standard platforms
· Industrial control
· OEM multiprocessors

· Chip set & packaging
II. The DynaBus




· HIGH SPEED BUS ARCHITECTURE
 Pipeline
 Bus configurations


· ELECTRICAL CONSIDERATIONS
 Termination
 Clock skew
 Packaging


· LOGIC OF THE BUS
 64 bits
 DBus
 Performance

· PROTOCOL
HIGH SPEED BUS ARCHITECTURE
Principle
Bus Cycle = Time to transmit information from one device to another
Tcycle = TckQ + Tprop + Tsetup + Tskew
8 ns = 1 ns + 4 ns + 1 ns + 2 ns
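The cycle-time budget above can be checked in a few lines of Python. The delay values are the ones quoted on the slide; the constant and function names are illustrative only and do not come from the Dragon design files.

```python
# Cycle-time budget for one pipelined DynaBus stage, using the delays
# quoted on the slide (all values in nanoseconds).
T_CKQ = 1.0    # clock-to-output delay of the sending register
T_PROP = 4.0   # propagation along the bus segment
T_SETUP = 1.0  # setup time of the receiving register
T_SKEW = 2.0   # clock skew budget between sender and receiver


def cycle_time(t_ckq: float, t_prop: float, t_setup: float, t_skew: float) -> float:
    """Minimum cycle time: every delay on the register-to-register path adds up."""
    return t_ckq + t_prop + t_setup + t_skew


t_cycle = cycle_time(T_CKQ, T_PROP, T_SETUP, T_SKEW)
f_max_mhz = 1000.0 / t_cycle   # 1 / ns expressed in MHz

print(f"Tcycle = {t_cycle} ns, fmax = {f_max_mhz} MHz")   # Tcycle = 8.0 ns, fmax = 125.0 MHz
```

An 8 ns cycle therefore corresponds to a 125 MHz bus clock; shrinking any single term (for example the 2 ns skew budget) shortens the cycle by the same amount.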
Pipeline
Only one bidirectional segment in a pipelined bus
=> the backpanel, in a multi-board system
Bus Configurations
Level 1 : Mono-Board Computer
Level 2 : Multi-Board Computer
Level 3 : Multi-Board & Multi-Module Computer
ELECTRICAL CONSIDERATIONS

Bus termination is required to balance the lines
For the CMOS version of the BIC :

· Open-drain outputs, so power is dissipated in the resistors

· 50 Ohms at each end

· Power dissipation = U^2 / R

2 V swing => 80 mW per resistor
128 resistors => 10 W per bus
10 W per backpanel
15 W per board
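The termination arithmetic can be reproduced directly from P = U^2 / R. This is a minimal sketch using the slide's figures (2 V across each 50 Ohm resistor, 128 resistors per bus); like the slide, it treats the full swing as sitting across every resistor, which is the worst case.

```python
SWING_V = 2.0       # voltage across one termination resistor (slide: 2 V swing)
R_TERM_OHM = 50.0   # termination resistance at each end of a line
N_RESISTORS = 128   # resistor count per bus, from the slide


def resistor_power_w(u_volts: float, r_ohms: float) -> float:
    """Power dissipated in one resistor, P = U^2 / R."""
    return u_volts ** 2 / r_ohms


p_one = resistor_power_w(SWING_V, R_TERM_OHM)   # about 0.08 W
p_bus = N_RESISTORS * p_one                     # about 10.24 W

print(f"{p_one * 1000:.0f} mW per resistor, {p_bus:.2f} W per bus")
# 80 mW per resistor, 10.24 W per bus
```

Because the outputs are open drain, this power is spent in the resistors on the board rather than inside the BIC, which is the point of the first bullet.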
Clock Skew
Clock distribution is critical, low Skew is crucial

Tcycle > (TckQ)max + Tprop + (Tsetup)max + Tskew
Tskew < (TckQ)min + Tprop - Thold
CMOS chips need a large clock amplifier, with poorly controlled delay, to drive their high-capacitance internal clock
Hierarchical Clock distribution with BIC generating the Clock of each Chip
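The two skew constraints can be expressed as setup and hold checks on one bus segment. This sketch uses the standard formulation in which skew works against the designer in both checks; the hold time and the minimum clock-to-output delay are assumptions (they do not appear on the slide), and the 0.5 ns values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SegmentTiming:
    """Delays (ns) of one register-to-register DynaBus segment."""
    t_ckq_min: float
    t_ckq_max: float
    t_prop: float
    t_setup: float
    t_hold: float   # assumption: hold time is not given on the slide


def setup_ok(s: SegmentTiming, t_cycle: float, t_skew: float) -> bool:
    # The latest data must arrive before the receiving edge,
    # even if that edge comes t_skew early.
    return t_cycle >= s.t_ckq_max + s.t_prop + s.t_setup + t_skew


def hold_ok(s: SegmentTiming, t_skew: float) -> bool:
    # The earliest new data must not reach the receiver before its hold
    # window closes, even if the receiving edge comes t_skew late.
    # (Strictly this should use the minimum propagation delay.)
    return s.t_ckq_min + s.t_prop >= s.t_hold + t_skew


def max_skew(s: SegmentTiming, t_cycle: float) -> float:
    """Largest skew that passes both checks at this cycle time."""
    setup_margin = t_cycle - (s.t_ckq_max + s.t_prop + s.t_setup)
    hold_margin = s.t_ckq_min + s.t_prop - s.t_hold
    return min(setup_margin, hold_margin)


seg = SegmentTiming(t_ckq_min=0.5, t_ckq_max=1.0, t_prop=4.0,
                    t_setup=1.0, t_hold=0.5)   # 0.5 ns values are hypothetical
print(max_skew(seg, 8.0))   # 2.0
```

With the slide's 1/4/1 ns delays and an 8 ns cycle, the setup check alone leaves exactly the 2 ns skew budget; the hold check only becomes limiting when the bus segment is very short.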
Packaging & Transmission Lines
To obtain short cycles, lines must be balanced and act as perfect transmission lines
Standard PGAs are difficult to use because of their stubs, but SMD FQPCs are very good
The next step is to use hybrid modules
LOGIC OF THE BUS
Minimal number of wires for a 64-bit data path. All commands are encoded in 64 bits
Performance
Tcycle    Raw bandwidth    Usable bandwidth (raw × 4/7)
25 ns     320 MB/sec       182 MB/sec
10 ns     800 MB/sec       457 MB/sec
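The bandwidth figures follow from moving one 64-bit word (8 bytes) per bus cycle and keeping 4/7 of the cycles as useful data, the fraction given in the table header. A minimal sketch:

```python
BUS_BYTES = 8            # 64-bit data path: one 8-byte word per cycle
USABLE_FRACTION = 4 / 7  # fraction of cycles carrying useful data (table header)


def raw_mb_per_s(t_cycle_ns: float) -> float:
    # 1 byte/ns = 1000 MB/s
    return BUS_BYTES * 1000 / t_cycle_ns


def usable_mb_per_s(t_cycle_ns: float) -> float:
    return raw_mb_per_s(t_cycle_ns) * USABLE_FRACTION


for t in (25, 10):
    print(f"{t} ns: raw {raw_mb_per_s(t):.0f} MB/s, "
          f"usable {usable_mb_per_s(t):.1f} MB/s")
# 25 ns: raw 320 MB/s, usable 182.9 MB/s
# 10 ns: raw 800 MB/s, usable 457.1 MB/s
```

The table appears to round the usable figures down (182.9 to 182, 457.1 to 457).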
DBus
Seven wires. Used for initialization & debugging
PROTOCOL

Protocol designed for shared-memory multiprocessors


· Hardware data consistency

· Split cycles for very high speed

· Supports multi-bank memory

· Bridges to industry-standard buses

· Supports Multi-level Caches

· Mathematical model and proof of coherency
III. VLSI Chip Set




· CURRENT FAMILY OF SEVEN CHIPS

  BIC
  Arbiter
  Small Cache
  Memory Controller
  IOBridge
  Display/Printer
  Map Cache
   
Bus Interface Chip : BIC

Contains all the electrical specifics of the bus


· Slice of the pipeline register (2 × 24 bits)

· Controls access to the bus

· Contains low-voltage drivers & receivers

· Clock Skew regeneration

· Current implementation for Hybrid Modules
Arbiter

Controls bus access for up to 64 masters


· Distributed arbiter. Current implementation :
one arbiter chip controls 8 masters,
up to eight arbiter chips

· Works with all pipeline configurations

· 7 priority levels, round robin within each level

· Hold management, for lock of requestors

· System Stop generation

· DBus predecoder
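The selection rule "7 priority levels, round robin within each level" can be sketched for a single arbiter chip serving 8 masters. The numbering scheme (masters 0 to 7, levels 1 to 7, higher number wins) is an assumption; hold management, system stop and the cooperation of up to eight distributed arbiters are not modeled.

```python
from typing import Dict, Optional


class Arbiter:
    """One arbiter chip: priority levels plus round robin (a sketch).

    Assumptions: masters are numbered 0..n_masters-1, levels run 1..7,
    and a higher level number wins.
    """

    LEVELS = 7

    def __init__(self, n_masters: int = 8) -> None:
        self.n_masters = n_masters
        # Last master granted at each level; the next scan starts after it.
        self.last_grant = {level: -1 for level in range(1, self.LEVELS + 1)}

    def grant(self, requests: Dict[int, int]) -> Optional[int]:
        """requests maps master number -> priority level; returns the winner."""
        if not requests:
            return None
        if any(not 1 <= lvl <= self.LEVELS for lvl in requests.values()):
            raise ValueError("priority level out of range")
        top = max(requests.values())
        candidates = {m for m, lvl in requests.items() if lvl == top}
        start = self.last_grant[top]
        # Scan masters cyclically, starting just after the previous winner.
        for offset in range(1, self.n_masters + 1):
            master = (start + offset) % self.n_masters
            if master in candidates:
                self.last_grant[top] = master
                return master
        raise ValueError("requesting master number out of range")
```

For example, if masters 2 and 5 both keep requesting at level 3, successive calls to `grant` alternate 2, 5, 2, ...; a level-7 request from master 6 wins immediately. Keeping one round-robin pointer per level means a burst at one level does not disturb fairness at another.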
Small Cache

Interface between a Processor and the Bus


· Implements the snoopy algorithm for cache consistency

· Fully associative dual-port memory :
virtual CAM on the processor side
& real CAM on the bus side


· Efficient first-level cache of the virtual-to-real table,
built in for free

· Implements Conditional Write, for efficient
multiprocessor locking on a split-cycle Bus

· Entry point to the bus for every device taking part
in the consistency protocol, e.g. the IOBridge.

· Current implementation in 2 micron : 5 KB
in 0.8 micron => 32 KB per chip
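The dual-CAM organization can be sketched as a fully associative cache in which every entry carries two tags: a virtual one matched by processor accesses and a real one matched by snooped bus traffic. Because each entry also records which real page its virtual page maps to, any cached block doubles as a translation entry, which is one way to read "first-level cache of the virtual-to-real table built in for free". Block size, page size and the FIFO victim choice are hypothetical; the consistency states and Conditional Write are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BLOCK_BITS = 2    # hypothetical: 4 words per cache block
PAGE_BITS = 10    # hypothetical: 1024 words per page


@dataclass
class Entry:
    vblock: int   # virtual block number, matched by the processor-side CAM
    rblock: int   # real block number, matched by the bus-side CAM
    data: List[int] = field(default_factory=list)


class DualCamCache:
    """Fully associative cache with one tag per port (a sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[Entry] = []

    def lookup_virtual(self, vaddr: int) -> Optional[Entry]:
        # Processor side: the processor issues virtual addresses.
        vb = vaddr >> BLOCK_BITS
        return next((e for e in self.entries if e.vblock == vb), None)

    def lookup_real(self, raddr: int) -> Optional[Entry]:
        # Bus side: snooped DynaBus transactions carry real addresses.
        rb = raddr >> BLOCK_BITS
        return next((e for e in self.entries if e.rblock == rb), None)

    def translate(self, vaddr: int) -> Optional[int]:
        """Real address of vaddr if any cached block lies in the same page."""
        shift = PAGE_BITS - BLOCK_BITS
        for e in self.entries:
            if e.vblock >> shift == vaddr >> PAGE_BITS:
                offset = vaddr & ((1 << PAGE_BITS) - 1)
                return ((e.rblock >> shift) << PAGE_BITS) | offset
        return None

    def fill(self, vaddr: int, raddr: int, data: List[int]) -> None:
        if len(self.entries) >= self.capacity:
            self.entries.pop(0)   # oldest entry is the victim (an assumption)
        self.entries.append(Entry(vaddr >> BLOCK_BITS, raddr >> BLOCK_BITS, data))
```

After `fill(0x1234, 0x8234, ...)`, a snoop on real address 0x8236 finds the same entry as a processor access to 0x1235, and a miss on 0x13FF can be translated to 0x83FF without consulting the Map Cache.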
IOBridge

Interface between the DynaBus and an industry-standard slow bus


· Uses a Small Cache to access the DynaBus.
In a future 1.2 micron implementation :
IOB & Cache merged into one chip

· Provides transparent access & mapping from the slow bus to the DynaBus


· Provides transparent access from the DynaBus to the slow bus

· Choice of virtual or real addresses for IO

· Current implementation for the PC/AT, but easily adaptable to other standards : Micro-Channel, NuBus, ... or a special internal bus
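The slow-bus-to-DynaBus mapping can be pictured as a per-page table indexed by the upper bits of the 24-bit PC/AT address, where each entry also records whether the DynaBus side is a virtual or a real address. The 4 KB page size and all names below are hypothetical; the slides do not describe the IOBridge map format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

AT_ADDR_BITS = 24     # PC/AT bus addresses are 24 bits wide
PAGE_BITS = 12        # hypothetical mapping granularity: 4 KB


@dataclass
class MapEntry:
    dyna_page: int    # page number on the DynaBus side
    virtual: bool     # True if dyna_page is a virtual page, False if real


class IOBridgeMap:
    """Per-page map from slow-bus addresses to DynaBus addresses (a sketch)."""

    def __init__(self) -> None:
        n_pages = 1 << (AT_ADDR_BITS - PAGE_BITS)
        self.entries: List[Optional[MapEntry]] = [None] * n_pages

    def set(self, at_page: int, entry: Optional[MapEntry]) -> None:
        self.entries[at_page] = entry

    def translate(self, at_addr: int) -> Optional[Tuple[int, bool]]:
        """DynaBus address and whether it is virtual, or None if unmapped."""
        if not 0 <= at_addr < (1 << AT_ADDR_BITS):
            raise ValueError("not a 24-bit PC/AT address")
        entry = self.entries[at_addr >> PAGE_BITS]
        if entry is None:
            return None
        offset = at_addr & ((1 << PAGE_BITS) - 1)
        return (entry.dyna_page << PAGE_BITS) | offset, entry.virtual
```

Keeping the virtual/real choice per entry is one way to offer the "choice of virtual or real addresses for IO" bullet: a virtual result would go through the Small Cache's translation, a real one straight onto the bus.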
Memory Controller

Controls a bank of memory from 8 MByte up to 1 GByte


· Implements the consistency algorithm

· Uses memory in nibble mode for fast access

· Implements ECC on 64 data bits + 8 check bits :
corrects one error, detects two

· Multi-bank, initialized through the DBus
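"64 bits + 8, corrects one error, detects two" is the classic single-error-correct, double-error-detect (SECDED) arrangement. The slides do not say which code the Memory Controller uses, so this sketch uses a textbook extended Hamming code: 7 Hamming check bits at the power-of-two positions of a 71-bit word, plus one overall parity bit, for 72 bits in total. The bit layout is an assumption.

```python
# Extended Hamming code: 64 data bits + 7 Hamming check bits + 1 overall
# parity bit = 72 bits. Bit i of the integer codeword is code position i.
N_POS = 72
PARITY_POS = [1, 2, 4, 8, 16, 32, 64]                    # Hamming check bits
DATA_POS = [p for p in range(1, N_POS) if p & (p - 1)]   # the 64 other positions
OVERALL = 0                                              # overall parity bit


def _popcount(x: int) -> int:
    return bin(x).count("1")


def encode(data: int) -> int:
    """Spread 64 data bits over the codeword and add the 8 check bits."""
    word = 0
    for i, pos in enumerate(DATA_POS):
        if (data >> i) & 1:
            word |= 1 << pos
    for p in PARITY_POS:
        # Check bit p makes the parity of all positions with bit p set even.
        covered = sum((word >> pos) & 1 for pos in range(1, N_POS) if pos & p)
        if covered % 2:
            word |= 1 << p
    if _popcount(word) % 2:          # make the whole 72-bit word even parity
        word |= 1 << OVERALL
    return word


def syndrome(word: int) -> int:
    """XOR of the positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, N_POS):
        if (word >> pos) & 1:
            s ^= pos
    return s


def decode(word: int):
    """Return (data, status); data is None when the error is uncorrectable."""
    s = syndrome(word)
    odd = _popcount(word) % 2 == 1
    if s == 0 and not odd:
        status = "ok"
    elif odd and s < N_POS:
        word ^= 1 << s               # single error at position s (0 = overall bit)
        status = "corrected"
    else:
        # Even parity with a nonzero syndrome: two bits flipped.
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status
```

A single flipped bit leaves the overall parity odd and the syndrome points at the faulty position; two flipped bits restore even parity but leave a nonzero syndrome, which is how the code tells "correct" from "detect only".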
Display/Printer
Refreshes the display/printer from the bus in the background


· Always low priority, except when its FIFO is empty

· Fully configurable : from B&W to 8 bits/pixel
and up to 200 MByte/sec

· Up to three controllers for 24-bit color

· Architecture adapted to multiprocessors, avoiding
cache flushing

· Well suited to a second-level cache, in which the
frame buffer is a multi-megabyte cache
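The priority rule in the first bullet amounts to a small function of the FIFO fill level. In this sketch the refill request runs at a low arbitration level while the FIFO still holds data and is raised only when it has run dry. The FIFO depth and the LOW/HIGH level numbers are hypothetical; the Arbiter slide only states that 7 levels exist.

```python
from collections import deque
from typing import Deque, Optional

LOW, HIGH = 1, 7   # hypothetical arbiter levels; higher number wins


class RefreshFifo:
    """Pixel FIFO of the Display/Printer controller (a sketch)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.words: Deque[int] = deque()

    def request_priority(self) -> Optional[int]:
        """Priority of the next refill request, or None when the FIFO is full."""
        if len(self.words) >= self.depth:
            return None
        # Background traffic normally; urgent only when the FIFO has run dry.
        return HIGH if not self.words else LOW

    def refill(self, word: int) -> None:
        # One word fetched from the frame buffer over the DynaBus.
        self.words.append(word)

    def drain(self) -> Optional[int]:
        # Video side: next word for the display, None on underrun.
        return self.words.popleft() if self.words else None
```

The design keeps refresh traffic out of the processors' way almost all the time, while still guaranteeing that the screen or printer never starves for long.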
Map Cache

Provides a second-level cache of the
virtual-to-real address table


· Because the first-level cache is inside the Small
Cache, the Map Cache is not consulted on every miss

· Replacement is done entirely in software, by the
processor that took the map miss.
Software controls victim selection for IO

· Multiple MapCache chips possible


· First implementation : 256 entries
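Software-managed replacement can be sketched as a 256-entry table that never evicts on its own: on a map miss, the processor that missed looks up its page table and names the victim itself, and pages pinned for IO are never chosen. The page-table dictionary and the `first_unpinned` policy are hypothetical stand-ins for whatever the operating system actually does.

```python
from typing import Callable, Dict, Optional, Set


class MapCache:
    """Second-level virtual-to-real page cache; software picks every victim."""

    def __init__(self, size: int = 256) -> None:   # 256 entries, as on the slide
        self.size = size
        self.table: Dict[int, int] = {}   # virtual page -> real page
        self.pinned: Set[int] = set()     # pages software keeps resident, e.g. for IO

    def lookup(self, vpage: int) -> Optional[int]:
        return self.table.get(vpage)      # None means a map miss

    def insert(self, vpage: int, rpage: int, victim: Optional[int]) -> None:
        """Install a mapping; the caller names the victim when the cache is full."""
        if vpage not in self.table and len(self.table) >= self.size:
            if victim not in self.table or victim in self.pinned:
                raise ValueError("cache full: software must name an unpinned victim")
            del self.table[victim]
        self.table[vpage] = rpage


def first_unpinned(mc: MapCache) -> Optional[int]:
    """A trivial software replacement policy (hypothetical)."""
    return next((v for v in mc.table if v not in mc.pinned), None)


def translate(mc: MapCache, vpage: int, page_table: Dict[int, int],
              policy: Callable[[MapCache], Optional[int]] = first_unpinned) -> int:
    rpage = mc.lookup(vpage)
    if rpage is None:
        # Map miss: the processor that missed walks the page table in software
        # and refills the MapCache itself.
        rpage = page_table[vpage]
        mc.insert(vpage, rpage, policy(mc))
    return rpage
```

Leaving the victim choice to software keeps the chip simple and lets the operating system protect the mappings that IO devices depend on.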


IV. Packaging




· HYBRID PACKAGE

· ASSEMBLY & CONNECTOR
HYBRID PACKAGE
Principles

· Design of a "chip carrier" containing many chips
· Intermediate level between chips and boards
· Perfect integration into our architecture
· Silicon substrate for experimentation
· Low to very low cost using large-area processes
Advantages for many applications

· Increased speed
· Gain in space
· Solves power dissipation problems
· Gain in cost
HYBRID ASSEMBLY

V. Computer Architectures




· Multi-Board parallel Computer
using conventional packaging

· Mono-Board Computer
using conventional packaging

· Add-on Multiprocessor
using Hybrid packaging

· High performance parallel Computer
using Hybrid packaging
MultiBoard Parallel Computer
with conventional Packaging
For example, the current implementation allows 24 processors and 8 memory banks
Add-on Multiprocessor
MultiBoard Parallel Computer
with Hybrid Modules
VI. Conclusions


· Key Features of the Architecture
· Results and Milestones
· Transfer of Research
· Future Architecture
Key Features
· Unique VLSI Bus
  => SPEED & PERFORMANCE
an order of magnitude in speed, multiprocessor oriented & well suited to standardization as a VLSI bus
· Unique Chip Set implementation
  => COST & SIMPLICITY
only seven LSI chips for the whole family; single chips replace the functions of complete boards
· Unique use of Packaging
  => COST, SIZE & PERFORMANCE
Advanced packaging used at the architecture level
for mid- & low-end computers
· Unique architecture open to industry standards
  => EASY INTEGRATION
Any standard microprocessor, bridges to existing standard buses, standard operating systems

an order of magnitude in price/performance
1-2 years ahead of the competition
Results & Milestones

(Sparc Softcard :
· Wire-wrapped prototype in March)
Chip Set :
· 5 chips will return from fab in February 88
(BIC, Arbiter, IOBridge, MemController, Simplex)

· 3 chips will be sent to fab in 1Q88
(Cache, MAP Cache, Display/Printer)

Wire-Wrapped June87 Prototype :

· 3Q 88 with 4 Sparcs
   (depends on the Cache)
  
  
High Speed Bus Prototype & Packaging :
· 2Q 88
Transfer of Research

Try to standardize the bus :
no VLSI bus standard exists yet
Partnership

with equipment makers such as :
SUN
with semiconductor companies :
Motorola, National, AMD, Cypress, ...
good chances because the architecture is open and compatible with
industry standards : any microprocessor, any add-in boards &
possibly multiple operating systems...
Future Directions
This architecture is only at its beginning; many evolutions are expected : progressive use of advanced packaging, second-level caches & other topologies...
Future Directions on parallelism

Explore future parallel architectures while keeping this general model of
"Shared Memory with Caches"
Big opportunities with new operating systems such as Mach or SunOS phase3, which include this model of communication in the kernel, and languages such as Ada or Cedar...

A high-speed bus is crucial for supporting high-performance VLSI operators :

high-speed vector processors, high-speed network controllers, graphics controllers, high-speed disk controllers, image compression/decompression...