<<DisplayControllerStrawMan.tioga
Hoel, June 9, 1986 6:31:07 pm PDT>>
<<>>
<<Hoel, June 7, 1986 11:38:54 am PDT>>
Display Controller Straw Man
for June87 Dragon
Introduction
    For many years, the Dragon project in CSL has been working on defining and implementing the next generation of personal workstation 
    for use within PARC.  The workstation incorporates multiple Dragon processors, implemented with custom VLSI circuits, 
    interconnected in a shared memory architecture.  It is intended to be much more powerful than the Dorado workstation it will 
    replace, and will serve as a vehicle for research in multiprocessing software for years to come.
    The Dragon project had been attempting to become and remain bus-compatible with the Dragonfly project in Xerox's Electronics 
    Division, and had been relying on the Dragonfly project to provide many components of a Dragon workstation (chassis, backplane, 
    memory subsystem, I/O subsystem, etc.), so that CSL could concentrate on the components unique to Dragon and/or of particular 
    research value.
    But recently, the Dragonfly project was cancelled.  This caused some Dragon people to ask whether the Dragon project would benefit 
    by revisiting the choices of packaging, bus protocol, etc. to see if better alternatives could be found.
    On May 23, 1986, a subcommittee of people working on the Dragon project was selected to propose a "straw man" Dragon computer which 
    could be operational by June of 1987 and would meet our requirements for a first generation Dragon.
Purpose
    The subcommittee decided that various individuals would go off and write proposals for various subsystems of Dragon, to serve as a 
    starting point for further discussion.  I was assigned to write about the display controller.
    The purpose of this document is to propose a feasible plan for providing the June87 Dragon (we really need a better name) with a 
    satisfactory display controller, and to discuss and analyze the alternatives in order to motivate the one proposed.
    This document will be discussed by the aforementioned Dragon subcommittee, along with other proposals for other aspects of Dragon, 
    and then the straw man proposal we were chartered to produce will be written.
References
[FCJH]  Frank Crow and Jeff Hoel, The Dragon Display, CSL Notebook Entry 85CSLN-0002, May 14, 1985,
/Indigo/CSL-Notebook/Entries/85CSLN-0002.tioga
[JHRE]  Jeff Hoel, Display Controller Real Estate, June 6, 1986,
/Nimitz/Cedar/Users/Hoel.PA/Dragon/June87/DisplayControllerRealEstate.tioga
[PS]  Pradeep Sindhu, The DynaBus Logical Specifications, June 2, 1986,
/Indigo/Dragon/Documentation/DynaBusLSpecs.tioga
Context
    Before considering specific proposals for display controllers, perhaps it would be good to consider the goals of the Dragon 
    project, both short-term and long-term.  Then we can evaluate the various proposals in terms of how well they meet the goals of 
    the project.
    (By the way, I don't mean to suggest that the personal goals of the project members shouldn't be considered.  But let's start by 
    considering the goals we think we share or ought to share project-wide.)
    By considering project goals, perhaps we can better understand the following:
o  how the June87 Dragon straw man fits within the overall Dragon plan;
o  how the June87 display controller straw man fits within the overall Dragon display plan; and
o  how the June87 display controller straw man fits within the June87 Dragon straw man.
    In the following sections, I've proposed some goals I think we probably share.  Please have a look, to see if you agree.  Please 
    feel free to propose modifications, additions, and/or deletions.
    Ultimate Dragon Goals
        In the long run, here's what we hope the Dragon project will do:
        o  Provide a vehicle for software research into multiprocessing using a shared memory architecture.
        o  Provide a state-of-the-art computing environment for researchers within PARC.  (Lots of CSL folks are depending on this.)
        o  Test and demonstrate the validity of the hardware architecture concepts.  These concepts include the idea of snoopy caches; the 
        DragOps instruction set; and the notion of multi-level busses, to name but a few.
        o  Provide the flexibility to integrate new hardware subsystems easily.  We want to build both workstations and servers.  We want 
        to plug in things we haven't even thought of yet.
        o  Provide a forcing function for research into DA tools.
        o  Gain experience in VLSI design.
        o  Gain experience in other technologies appropriate for making state-of-the-art hardware systems.
        o  Transfer Dragon technology to the wider Xerox community.
    June87 Dragon Goals
        For the June87 Dragon, here's what we think we want to do:
        o  Minimize the risk of not getting done by June 1987.  Actually, I'm not sure why it's June 1987 -- perhaps because it's close 
        enough to instill the right kind of immediacy and urgency.  What should we worry about if we don't make it?  (It's been 
        said that the difference between research and engineering is the level of acceptable risk.  What are we doing here?)
        o  Provide a vehicle for Dragon software development.  The Cedar port will be a big deal, and we won't have a state-of-the-art 
        computing environment until it's done.  So it's best to get started as soon as possible.  This will require a substantial 
        number of copies of the Dragon hardware.  But how many?  I've heard we might make as many as 100 June87 Dragons, but that 
        sounds high if Cedar porters are the only customers.  What other customers do we have in mind, if any, and what are their 
        requirements?  How soon after June 1987 do we intend to provide the next generation of Dragons?  What then happens to the 
        old ones?  What kind of budget do we need to do that?
        o  Maximize the opportunity for subsequent generations of Dragons.  We don't want to paint ourselves into a corner, by designing a 
        dead-end June87 Dragon.  For example, we're designing the bus protocol to work for multiple bus levels, although we'll 
        propose implementing a single bus level for June87.
        o  Minimize "throw-away" engineering.  In a way, this is a continuation of the previous point.  After June87, what will have to be 
        completely redesigned for the next generation?  Functions within chips?  Chips?  Chip carriers?  Boards?  Backplane?  
        Chassis?  Or do we want to gain the experience and throw away all the artifacts?  I'll bet there's no consensus yet.
        o  Maximize "spiffiness," while still minimizing risk.
        o  Demonstrate viability of the Dragon concept.  I'm not sure this is really a goal, per se, but I thought I'd mention it.  Should 
        the June87 Dragon compare favorably with its competition?  What will the competition be?  What will the basis of comparison 
        be?
        o  Live within budget.  It seems prudent to assume that headcount is frozen indefinitely.  How much can we spend for non-recurring 
        engineering?  How much can we spend per June87 Dragon?  Can we spend money to reduce risk?  To accelerate the schedule?  To 
        make something spiffy?
        o  Have fun.  Actually, I'm not sure what to say here.  There's a time to ****
        <Must be fun??  -- some of us have been having fun longer than others....>
            <community service or self-expression?>
    Ultimate Display Controller Goals
        In the long run, here's what we hope to do with display controllers for Dragons.  Let me limit the list to features I think have 
        system implications.
        o  Provide the flexibility to integrate new display controllers easily.
        o  Provide sufficient packaging real estate to do interesting things.
        o  Support multiple displays.
        o  Support highest quality displays.  (CRT technology is likely to predominate for many years, although other technologies should 
        not be precluded.)
        -  Both monochromatic and color displays.
        -  Ultra-high resolution, e.g., 2048 x 2048 pixels.  Resolution should not be limited by the bandwidth of the main bus, which 
        implies that the display ought not to have to depend on the main bus for refresh.
        -  Flicker-free image.  This requires a refresh rate of at least 60 Hz, non-interlaced.
        o  Support up to at least 24 bits per pixel.
        o  Have a memory mapped frame buffer.  This architecture provides convenient, high-bandwidth access to the frame buffer for Dragon 
        and other processors.  In a research environment, it seems appropriate to rely on the general-purpose Dragon processors to 
        provide the horsepower required for most graphics applications; but special-purpose processors should not be precluded.  
        Within CSL the idea of supporting video over high-speed local area networks is popular, so perhaps a special-purpose local 
        area network processor for video would make sense.  In any case, a mechanism is required for assuring data consistency of 
        the frame buffer.
    June87 Display Controller Goals
        For the June87 Dragon, here's what we think we need to do:
        o  Minimize the risk of not getting done by June 1987.  This applies both to the display controller hardware and to the display 
        head software.
        o  Provide a display adequate for software development.
June87 Pre Straw Men
    At the Dragon Meeting of May 23, 1986, Jean Gastinel presented his ideas of what the 1987 Dragon straw man might turn out to be.  
    (These ideas were also discussed at subsequent subcommittee meetings.  I may be a little confused about what I heard where, but 
    it shouldn't really matter.)  The Dragon subcommittee was not required to accept any of what was presented as cast in stone.  
    But Ed McCreight recommended that we consider it as a place to start.  Let me call these ideas "pre straw men."
June87 Dragon Pre Straw Man
    Here's a sketch of the features of the June87 Dragon pre straw man:
    o  High-speed bus:  The main bus proposed has a raw bandwidth of 200 MB/s, five times as much as its predecessor, the M bus.  It is 
    64 bits wide (not 32) and runs at 25 MHz (not 10 MHz).  Due to bus protocol overhead, the maximum effective data bandwidth, 
    assuming 100% bus utilization, is 114 MB/s.  (The estimate of 25 MHz is said to be "conservative;" more optimistic estimates 
    have ranged as high as 100 MHz.)
    o  One-board implementation:  To achieve the proposed bus speed, it is necessary for the physical extent of the bus to be rather 
    small.  Putting the whole Dragon on one 10.9 x 16 inch board addresses that requirement.  (An optional two-board system was 
    discussed, but the main bus was confined to one of the two boards, and an extended memory system was proposed for the other.)  
    A case can be made that a one-board system is simpler to make than a multi-board system, since the backplane need only supply power.
    o  Chip carrier packaging:  To fit everything on one board seems to require more exotic packaging technology than we had previously 
    been considering.  Rather than package chips individually in pin grid arrays, it was proposed that Dragon clusters (consisting 
    of EU, IFU, and two caches -- no floating point) be mounted on 3 x 6 cm chip carrier modules.  Also, it was proposed that 
    memory be packaged as nine-chip hybrid SIPs, using 1M DRAMs.
    o  Processors:  It was proposed to put four processor modules on the system board, since they will fit and won't swamp the bus.
    o  Chassis:  It was proposed to use a 6085 (Daybreak) chassis, because it's done and is reasonably clever.  The system board would 
    be allowed to occupy two slots, since some of its components are rather tall.
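    As a check on the bus figures in the first bullet, here is a minimal sketch of the arithmetic.  It assumes that raw
    bandwidth is simply bus width times clock rate and that 1 MB means 10^6 bytes; the function name is made up for
    illustration.

```python
def raw_bandwidth_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Raw bus bandwidth in MB/s: bytes per cycle times millions of cycles per second."""
    return (width_bits / 8) * clock_mhz


m_bus = raw_bandwidth_mb_per_s(32, 10)     # predecessor M bus: 32 bits at 10 MHz
main_bus = raw_bandwidth_mb_per_s(64, 25)  # proposed main bus: 64 bits at 25 MHz
speedup = main_bus / m_bus                 # ratio of raw bandwidths
efficiency = 114 / main_bus                # effective / raw, using the 114 MB/s figure

print(m_bus, main_bus, speedup, round(efficiency, 2))
```

    Under these assumptions the protocol overhead leaves a little over half of the raw bandwidth for data.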
June87 Dragon Display Controller Pre Straw Man
    Here's a sketch of the features of the display controller proposed for the June87 Dragon pre straw man:
    o  Display:  There need be only one display.  Color would be awfully nice.  Refreshed at 60 Hz, non-interlaced.  Reasonably high 
    resolution, by today's commercial standards:  1000 x 1000 pixels.  Pixels can be 8, 4, 2, or 1 bits.  There's a colormap.
    o  Screen Refresh:  The frame buffer is kept in main memory, and accessed over the main bus, even for refreshing the screen.  This 
    solves the data consistency problem.  But the display requires 105 MB/s of raw bus bandwidth -- more than half of the total 
    available bus bandwidth.  A case was made that this bandwidth can be had almost for free, provided the display controller is 
    designed normally to use bus cycles not usable by other processors.  To keep the average bus latency for processors small, the 
    processor utilization of the available bus bandwidth should not exceed, say, 50%, so the other 50% is available for use by the 
    display.
    o  Simple:  The whole display controller consists of one custom chip, to inhale the pixels from the main bus, and an off the shelf 
    colormap-and-DAC chip, plus incidentals.  This is consistent with putting the whole Dragon system on one board.
Discussion of Pre Straw Men
June87 Dragon Pre Straw Man
Packaging
    (Please excuse my intruding on the domain of the packaging subsubcommittee.  I expect their report to be much more thorough than 
    this is.  I just wanted to be sure the following points were raised somewhere, because I think they affect the display 
    controller.)
    The packaging technology proposed for the June87 Dragon pre straw man is substantially more aggressive than what we had previously 
    been planning to use in the Dragonfly-compatible days.  It's neat stuff, of course.  But do we need exotic packaging?
    One result of using exotic packaging is that the main bus bandwidth is vastly increased.  Let's compare exotic packaging with 
    conventional packaging, with the main bus on a backplane and, say, four boards plugged into it.  Let's assume the number of 
    data wires and the bus protocol efficiency are fixed, so the performance differences are due only to bus cycle time.  Suppose 
    the conventional packaging's cycle time is between 80 ns and 100 ns.  (I think that's conservative, but I don't have the 
    evidence.)  Suppose the exotic packaging's cycle time is between 40 ns and 20 ns.  So the exotic packaging is between twice and 
    five times as fast as conventional packaging.  That's neat.  But do we need it for the June87 Dragon?
    The one real-time requirement placed on the main bus is the requirement of refreshing the display.  (Let's assume the disk and 
    ethernet are sufficiently buffered on the slow I/O bus side so that they don't present a real-time requirement to the main bus. 
     Let's further assume that tasks running on Dragon processors don't have real-time requirements.)  But it's possible to reduce 
    the display's required bus bandwidth by a factor of eight by demoting it to a monochromatic binary pixel display.  (For Dragons 
    after June87, displays will probably rely on frame buffer caching, so they won't require extreme bus bandwidths; but even if they 
    did, the backplane bus, now a second-level bus, could have a bandwidth as high as a first-level bus with exotic packaging.)
    In terms of the goals for the June87 Dragon, sacrificing a color display would not be "spiffy" and wouldn't be as much fun 
    (perhaps).  But it would be adequate for software development and it might reduce the risk a great deal.
    The same is true of Dragon processors.  They wouldn't be as "spiffy" if they were slower, but they would be adequate for software 
    development.
    Another reason to prefer exotic packaging technology is that it's more compact.  It gives us the opportunity to make a one-board 
    (or perhaps two-board) system.  But do we need a one-board system for June87 Dragon?
    The June87 Dragon pre straw man proposes to have four Dragon processors.  Suppose, as a worst case, that conventional packaging 
    could support only two processors.  Would that be adequate for software development?  How about three processors?
    Exotic packaging is likely to be expensive.  When we visited Raychem, we got a very rough estimate that each design iteration of a 
    Dragon processor chip carrier might cost about $100,000.  In terms of risk, the Raychem chip carrier program is still in the 
    research phase.  They won't have a production facility until late this year at the earliest, and Richard Bruce guessed they 
    might not be in routine production for at least a year.  To meet June87 schedules, we'd be sharing their research risk.  Jean 
    points out that conventional (thick-film ceramic) hybrid technology has been around for years, used mostly by the military.  
    True, but I'm not sure how expensive it is.
    Perhaps the goal of demonstrating the viability of the Dragon concept is a bigger deal than I had realized.  For June87, who will 
    the competition be and how will we have to compare with it?  In particular, will we have to compare favorably in terms of speed 
    and size?  And will the competition be using exotic packaging?  (It is rumored that DEC has just bought Trilogy.)
    I guess there are no easy answers.

    <Do we need "exotic" packaging?  Why?>
        <Do we need more bus bandwidth than conventional packaging can provide?  Why?>
            <(A factor of between 2x and 4x?)>
            <Support display refresh -- but requirement could be more modest>
                <both for June87 (fewer bits per pixel) and beyond (frame buffer caching)>
            <Keeping Up With The Joneses -- who's the competition?  (DEC bought Trilogy)>
        <Do we need a one-board (or two-board) design?>
            <Propose four-board design -- here or elsewhere?>
        <Do we need a 6085 chassis?>
DA Tools
    In a recent discussion, Jean and I considered one reasonable development schedule for getting a design done in a year:
    Schematics:        3.5 months
    Layout:            3.5 months
    Fab & Test 1:    2.5 months
    Fab & Test 2:    2.5 months
    To meet such a schedule, it's clear that the DA tools to be used must be selected and provided to the environment very soon.  A 
    workable approach must be recommended for each of the following:
    Custom ICs
    Standard Cell ICs
    Chip Carriers
    Printed Circuit Boards
June87 Dragon Display Controller Pre Straw Man
    The June87 Dragon Display Controller pre straw man is not at all "spiffy," which
Display
    A single display is proposed.  That's fine, given that the goal is software development.  But is a color display needed or even 
    desirable?  Many people think that for reading text, a monochromatic display causes less eye fatigue.  I'm not sure whether 
    that's because a monochromatic display assures that garish colors are not used or because the color tube's shadowmask limits 
    effective screen resolution.  Some folks who saw Versatec's 1280 x 1024-pixel color display, refreshed at 60 Hz, had the 
    impression that eye fatigue wouldn't be a problem.  I guess it's no big deal to design for color and then plug in monochromatic 
    at the last minute.  But it might entail somewhat less risk to design for monochromatic from the beginning.
    The proposed display resolution is 1000 x 1000 pixels.  (How about 1280 x 960 pixels instead?  Some people like the 4:3 aspect 
    ratio, which is said to conform to available monitor shapes.)  The current LF display is 1024 x 800 pixels; this, I feel, 
    should be regarded as an absolute minimum for June87.  (Since we know how to do resolution-independent software, I guess 
    "spiffiness" is the main reason.  Of course, lower resolution would be more painful to use for software development -- or 
    anything else.)
    The proposed refresh rate, 60 Hz non-interlaced, is fine.  To do less than this would be an embarrassment.
    Supporting 8, 4, 2, and 1-bit pixels seems fine.  Beyond June87, extension to 24 bits is an open question.  Perhaps the extension 
    should be to 32 bits.  (Can we say memory is cheap enough to waste 8 bits out of every 32?  Can we find a use of them?)  
    Alternatively, perhaps 24 bits should be divided into three groups of 8 bits, with each group stored separately.  Anyhow, I 
    guess the scheme proposed for June87 doesn't preclude different schemes for subsequent generations.
Screen Refresh
    Refreshing the display over the main bus is fine, if the bus can support it.  It obviates having to manage a local frame buffer 
    memory, so it makes the display controller simpler and reduces risk.  I'm not convinced it points the way to future 
    generations, but perhaps it doesn't have to.
    If the display is 1280 x 960 8-bit pixels, refreshed at 60 Hz non-interlaced, the required raw bus bandwidth is 130 MB/s (not the 
    proposed 105 MB/s).
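    The two bandwidth figures can be reproduced with a short calculation.  This is a sketch under stated assumptions, not
    a statement of how the original numbers were derived: it assumes the raw requirement is the pixel data rate scaled up
    by the ratio of raw to effective bus bandwidth (200/114), and it ignores blanking intervals.  The function and
    constant names are illustrative only.

```python
RAW_BUS_MB_S = 200.0        # raw main bus bandwidth
EFFECTIVE_BUS_MB_S = 114.0  # maximum effective data bandwidth


def refresh_bandwidth(width, height, bits_per_pixel, refresh_hz=60):
    """Return (pixel data MB/s, raw bus MB/s) needed to refresh the screen."""
    data = width * height * bits_per_pixel / 8 * refresh_hz / 1e6
    # Assumption: protocol overhead applies uniformly to display traffic.
    raw = data * RAW_BUS_MB_S / EFFECTIVE_BUS_MB_S
    return data, raw


for w, h, bpp in [(1000, 1000, 8), (1280, 960, 8), (1280, 960, 1)]:
    data, raw = refresh_bandwidth(w, h, bpp)
    print(f"{w} x {h} x {bpp} bits: {data:.1f} MB/s data, {raw:.1f} MB/s raw")
```

    Under these assumptions 1000 x 1000 8-bit pixels come to about 105 MB/s raw and 1280 x 960 8-bit pixels to about
    129 MB/s raw, consistent with the figures quoted; the 1-bit case needs one eighth of the 8-bit bandwidth.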
    The proposal hopes that the display's required bus bandwidth can be had almost for free, because the display controller will 
    normally use bus cycles no other processor can use.  To meet this requirement, the display controller must be able to request 
    bus transfers at a priority lower than that of any other processor.  But to guarantee that the screen is refreshed in real 
    time, the display must also be able to request bus transfers at a priority higher than that of any other processor.  The 
    display processor must keep a queue of pixels to be displayed.  Whenever the queue is fuller than a certain critical amount, it 
    makes low priority requests, but whenever the queue is less full than the critical amount, it makes high priority requests.  
    (This priority scheme must be implemented by the arbiter.  It is different from the one proposed in Pradeep's DynaBus spec 
    [PS], which I understand was going to be revised anyhow.  Multi-priority requests can be implemented with multiple request 
    wires or with a single wire, presenting a sequence of bits to the arbiter.  Details later.)
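    The two-level request rule can be sketched as follows.  This is an illustrative model, not the arbiter or controller
    design: the class, its method names, and the queue sizes are hypothetical, and the text does not say what happens when
    the queue sits exactly at the critical amount (the sketch asks at high priority in that case, to be safe).

```python
from collections import deque

LOW, HIGH = "low", "high"


class PixelQueue:
    """Hypothetical model of the display controller's pixel queue."""

    def __init__(self, capacity, critical):
        self.capacity = capacity  # queue size, in bus transfers
        self.critical = critical  # at or below this fill level, refresh is at risk
        self.items = deque()

    def request_priority(self):
        """Priority of the next bus request, or None if the queue is full."""
        if len(self.items) >= self.capacity:
            return None
        # Fuller than the critical amount: be a good citizen.
        return LOW if len(self.items) > self.critical else HIGH

    def fill(self, data):
        self.items.append(data)     # a bus reply arrived

    def drain(self):
        return self.items.popleft() # pixels sent to the screen


queue = PixelQueue(capacity=8, critical=3)
priorities = []
while queue.request_priority() is not None:
    priorities.append(queue.request_priority())
    queue.fill(0)
print(priorities)
```

    Starting from an empty queue, the sketch issues high priority requests until the queue rises above the critical
    amount and low priority requests from then on, until the queue is full.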
    The critical amount is directly related to the maximum number of bus cycles which can elapse between the time the display 
    controller asks for a transfer at high priority until the data is received.   One estimate of the critical amount follows.  
    (You may not agree with the numbers, but it doesn't matter; the idea is the main point.  I'm assuming 40-ns cycles.)
    Item                              Time      Cycles
    Completion of current request:    360 ns     9
    Main memory refresh:              200 ns     5
    Request:                                     2
    Memory latency:                   360 ns     9
    Reply:                                       6
    Misc pipelining:                             2

    TOTAL                                       33 cycles
    Whenever the pixel queue is fuller than the critical amount, the display can be a good citizen and make low priority requests.  
    Clearly, the larger the queue the better, but we really don't have any statistics to say how much better.  The question is how 
    badly "bursty" bus traffic from other processors deviates from average traffic.  For example, when a Dragon processor switches 
    tasks, it's likely that the caches contain nothing of relevance to the new task, so the processor will present a heavy load to 
    the bus until its working sets are established in caches.  (A single Dragon processor, running at 5% miss rate, is said to 
    require 16 MB/s, so at a 100% miss rate, it can swamp the bus.)  How long after such a task switch must the display controller 
    wait to find a free bus cycle?  We don't know.  If we assume just one Dragon processor on the bus, filling its caches at top 
    speed until full, then the display controller might like to have a queue with as much capacity as two Dragon caches.  That 
    seems to me an impractically large size.  More realistically, there are multiple processors on the bus, which makes things even 
    worse, but the peak cache activity profiles are likely to be less severe.  Pradeep has volunteered to do some simulations.  I 
    suspect that we'll have to take what we get in any case.  That is, if the display controller turns out not to be such a good 
    citizen on average, then so be it.
    By the way, implementing a huge pixel queue in custom silicon seems to me out of balance.  It costs a lot, and all you get is a 
    reduction in bus latency for Dragon processors.
    Also by the way, there has been some talk of "improving" the bus protocol for future generations of Dragons by allowing a higher 
    priority request to truncate the current packet to gain access to the bus.  I suppose that this would complicate life for the 
    sender and receiver of the truncated packet.  The purpose seems to be to reduce latency of high priority things.  Why are we so 
    concerned about doing this?
Area
    The pre straw man proposes to put the display controller on a small corner of the one printed circuit board.  I have the feeling 
    that this approach is risky.
    Last week I prepared an estimate of area required for the pre straw man display controller [JHRE].  Although conceptually the 
    design is just a custom chip plus a DAC chip plus incidentals, I counted upwards of 30 components altogether!  The footprint 
    area alone was estimated at about 13.5 square inches, which I doubled, to 27 square inches, to account for the fact that the 
    footprints can't be densely packed into a given rectangle.  Printed circuit routing requirements were not considered.  The 
    display controller has to connect to the main bus and the slow I/O bus, and perhaps also a remote video oscillator.  The coax 
    video connectors must be on the edge of the board.  These constraints affect whether or not the display controller can 
    effectively use any given shape of printed circuit board real estate.  The component list is absolutely guaranteed to change.  
    (For example, should I be using Alfred's bus interface chips?  Probably so, but I assumed otherwise for the estimate.)  What we 
    really need, to verify that the design will fit in a given area, is to do a detailed layout, not only of the display controller 
    but of the whole board.  Do we really want to commit to a one-board design before doing this?  I have the further concern that 
    renegotiating space among subsystems could turn out to consume a lot of time and energy and might result in compromises nobody 
    really wants.
    There has been some talk of perhaps putting the display controller on a chip carrier module too, perhaps with some other subsystem.  
    I think mixing in another subsystem at that level is asking for trouble.  What's our non-recurring engineering budget for chip 
    carrier modules?  How many different types can we afford?
Miscellaneous
    The idea of limiting the design to using at most one custom chip is prudent under the circumstances.
    It has been suggested that a hardware cursor would be nice.  Perhaps so, but I wouldn't feel too badly about leaving it out, given 
    that the goal is software development.  The current display software, dealing with the Dorado color display, already knows how 
    to do software cursors.  (Sure it's a pain, but so what?)  Display controllers for generations beyond June87 may have entirely 
    different ideas about how to implement things like cursors, so it's not clear a June87 implementation of hardware cursors would 
    be of lasting value.
Discussion of Other Alternatives
Future Dragon Display Controller
    Here's a list of some of the features a future Dragon display controller might have.  The list is limited to features that affect 
    system integration.
    Packaged on its own printed circuit board:  This will be desirable if we want to include a local frame buffer with the display 
    controller, which I think is inevitable.  The area could be used for all sorts of other things too.  This plan provides the 
    flexibility required to plug in new display controllers as they are developed.
    Attaches to second-level main bus:  This bus, because it is short (say five connections?), is said to be potentially as fast as 
    first-level busses.
    Has its own frame buffer:  Ultra-high resolution displays require more bandwidth for screen refresh than is reasonable to expect 
    from the fastest main busses imaginable.  But we want to retain a memory mapped architecture, so that Dragon processors can 
    treat the frame buffer as main memory.  What is needed is a mechanism for assuring data consistency.  We imagine that the 
    display controller works something like a secondary cache, in that it maintains a valid bit for each oct of frame buffer 
    memory.  For example, for a 32 MB frame buffer, 1 Mb of valid bits would be needed.  Unlike the secondary cache, the valid bits 
    must be stored externally in static RAM chips.  This raises a concern that the display controller be as fast as the secondary 
    cache in responding to bus packets that affect valid bits; we assume that a satisfactory solution can be found.  According to 
    the present bus protocol, after a Dragon processor reads an oct from memory, it is obliged to issue a write single when it 
    first writes any word of the oct, so that a secondary cache (or frame buffer) can mark the oct invalid.  When an oct is written 
    to the frame buffer, its valid bit can be set.  When the display processor does screen refresh from the local frame buffer, it 
    requests of the main bus only the octs which are marked invalid.  This scheme offers a viable alternative to the previous plan, 
    flushing and laundering caches, which has fallen into disfavor.  By the way, unlike a secondary cache, the frame buffer memory 
    needn't be backed up by an equal amount of main memory.
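    The valid-bit bookkeeping described above can be sketched in a few lines.  This is a toy model under stated
    assumptions: an oct is taken to be 32 bytes (eight 32-bit words, which is what makes 32 MB come out to 1 Mb of valid
    bits), all octs start out invalid, and the class and method names are hypothetical.  For simplicity it spends a whole
    byte on each valid bit.

```python
OCT_BYTES = 32  # assumption: one oct = eight 32-bit words


class FrameBufferValidBits:
    """Hypothetical model: one valid bit per oct of local frame buffer."""

    def __init__(self, frame_buffer_bytes):
        self.valid = bytearray(frame_buffer_bytes // OCT_BYTES)  # 0 = invalid

    def on_write_single(self, address):
        # A processor has written a word of a cached oct; the local copy is stale.
        self.valid[address // OCT_BYTES] = 0

    def on_oct_written(self, address):
        # A whole oct was written to the frame buffer; the local copy is current.
        self.valid[address // OCT_BYTES] = 1

    def octs_to_fetch(self, start, length):
        """Oct addresses that screen refresh must request from the main bus."""
        first = start // OCT_BYTES
        last = (start + length - 1) // OCT_BYTES
        return [i * OCT_BYTES for i in range(first, last + 1) if not self.valid[i]]


fb = FrameBufferValidBits(32 * 2**20)   # 32 MB frame buffer
print(len(fb.valid))                    # number of valid bits required

for address in range(0, 128, OCT_BYTES):
    fb.on_oct_written(address)          # four octs written back
fb.on_write_single(70)                  # a processor dirties the oct at address 64
print(fb.octs_to_fetch(0, 128))
```

    In this example only the one oct that was marked invalid has to be fetched over the main bus during refresh; the
    other three are served from the local frame buffer.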
Previous Display Controller Proposal
    Perhaps it would be interesting to compare the current display processor proposals with a proposal made last year, described in CSL 
    Notebook Entry 85CSLN-0002 [FCJH].  That proposal was definitely more ambitious than now seems appropriate for June87.
    The central concept of this proposal was that it could do "thing composition on the fly."  That is, it could compose each scan 
    line, in real time, from a set of rectangular pixelmaps stored in the frame buffer, as directed by a display list.  The 
    composition was performed by painting pixel spans into the scan line in back-to-front order, with pixels in front overwriting 
    pixels behind; however, pixels were permitted to have a "transparent" value.  This paradigm served as an implementation of 
    cursors, overlays, and multiple windows.
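    The back-to-front painting rule can be illustrated with a short sketch.  It is not the design from [FCJH]: it assumes
    the display list has already been reduced to the spans that intersect one scan line, and the function name, the span
    representation, and the use of None as the "transparent" value are all made up for illustration.

```python
TRANSPARENT = None  # hypothetical encoding of the "transparent" pixel value


def compose_scan_line(width, spans, background=0):
    """Paint spans into one scan line in back-to-front order.

    spans: list of (x_start, pixels) pairs, ordered from back to front.
    """
    line = [background] * width
    for x_start, pixels in spans:
        for offset, value in enumerate(pixels):
            x = x_start + offset
            if 0 <= x < width and value is not TRANSPARENT:
                line[x] = value  # pixels in front overwrite pixels behind
    return line


window = (2, [5, 5, 5, 5])            # a window's span on this scan line
cursor = (3, [9, TRANSPARENT, 9])     # a cursor with a transparent middle pixel
print(compose_scan_line(8, [window, cursor]))
```

    The cursor is painted last, so it lands on top of the window, and the window shows through wherever the cursor pixel
    is transparent.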
    The proposed packaging was an entire printed circuit board.  (Actually, the proposed 24-bit implementation required three such 
    boards.  Details of inter-board synchronization were not entirely worked out.)
        There was a local frame buffer, implemented with dual port dynamic RAMs.  An internal bus, 256 bits wide and operating at 20 MHz, 
        connected the DRAMs' second port to the pixel composer.  This large bandwidth was thought to be necessary for "thing 
        composition on the fly," because the pixel painting rate can be many times the screen refresh rate.  The system relied on 
        flushing and laundering caches to assure data consistency.
        A double-buffered scan line (of, say, 1280 pixels each -- the details were never really frozen) served not only as a place to 
        compose scan lines but as a pixel output queue.  The pixel painter painted one scan line while the pixel reader 
        asynchronously output the other scan line to the screen.  Once per scan line, during horizontal retrace, resynchronization 
        occurred so that the painter and the reader could swap scan lines.  In this way, the memory for scan lines could be single 
        port, rather than the dual port memory common in FIFOs.  To handle the large number of pins of the internal bus (256) and the silicon 
        complexity of the double-buffered scan line, the composition function was split among four identical chips, each processing 
        two bits of each eight-bit pixel.
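        The swap-at-retrace scheme and the four-way bit slicing can be sketched as follows.  This is only a model of the idea: the 
        class and function names are hypothetical, and the assignment of bit pairs to chips (chip 0 takes the two low-order bits) is 
        an assumption, since the proposal's actual assignment is not given here.

```python
# Hypothetical model of the double-buffered scan line and the 2-bit slicing.
# Names and the bit-pair-to-chip assignment are assumptions for illustration.


class DoubleBufferedScanLine:
    """Two line buffers: the painter owns one, the reader owns the other."""

    def __init__(self, width=1280):
        self._buffers = [[0] * width, [0] * width]
        self._paint_index = 0  # index of the buffer the painter owns

    def paint_buffer(self):
        return self._buffers[self._paint_index]

    def read_buffer(self):
        return self._buffers[1 - self._paint_index]

    def swap_at_horizontal_retrace(self):
        # Painter and reader exchange buffers once per scan line, so
        # neither buffer is ever accessed by both at the same time.
        self._paint_index = 1 - self._paint_index


def split_pixel(pixel):
    """Split an 8-bit pixel into four 2-bit slices, one per chip."""
    return [(pixel >> (2 * chip)) & 0b11 for chip in range(4)]


def join_slices(slices):
    """Reassemble an 8-bit pixel from the four 2-bit slices."""
    pixel = 0
    for chip, bits in enumerate(slices):
        pixel |= (bits & 0b11) << (2 * chip)
    return pixel


scan = DoubleBufferedScanLine(width=4)
scan.paint_buffer()[:] = [1, 2, 3, 4]   # painter composes the next line
scan.swap_at_horizontal_retrace()       # resynchronize during retrace
shown = list(scan.read_buffer())        # reader now outputs that line
```

        Because each buffer has exactly one user between swaps, a single port per buffer suffices in this model.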
June87 Dragon Alternative
    I'd like the Dragon subcommittee to consider the possibility of proposing for the June87 Dragon a four-board system (processor, 
    memory, I/O, display) that uses conventional printed circuit board technology and pin grid arrays of the kind already developed 
    by Bill Gunning.
    I'm not saying I'm convinced it's the best alternative.  But I think it has the advantage of having lower risk than the June87 
    Dragon pre straw man.  (If I'm wrong, convince me.)  We might decide that such a system wouldn't meet the goals for a June87 
    Dragon.  But then I'd like to understand better what those goals are.
    As discussed in a previous section (Discussion of Pre Straw Men), such a system would be much slower than the pre straw man.
June87 Dragon Display Controller Alternative A
    Following from the previous section, I'd like the Dragon subcommittee to consider putting the display subsystem on its own printed 
    circuit board, interfacing to the main bus on the backplane via a conventional two-piece edge connector.  Ideally, I'd like to 
    see the slow I/O bus on the backplane too, since that's the proposed vehicle for loading the colormap.
    For a June87 display controller, I'd propose to implement only the functionality proposed by the pre straw man.  That would mean 
    that most of the printed circuit board would be wasted.  A 10.9 x 16 inch printed circuit board has about 175 square inches of 
    gross area, before subtracting forbidden areas around the edges.  By one aggressive estimate, a display controller might need 
    only 27 square inches.  But so what?  Printed circuit board real estate can be relatively inexpensive.
    Anyhow, if the real estate is there, we might find a use for it.
June87 Dragon Display Controller Alternative B
    One of the things that worries me about putting the bus interface and the DAC interface on the same custom chip is that the two 
    interfaces are asynchronous.  Testing such a part could be a real problem.  Discussions with Bill Gunning have reminded me of 
    how much respect asynchronous interfaces should be given.  He points out, for example, that the theoretical formula for 
    estimating the probability of metastability (inability to get a digital result when sampling an asynchronous signal) fails to 
    take into account physical factors.  For example, an event on the bus side, say, could cause the power distribution wires to 
    bounce, which could affect the timing of the DAC side in such a way that a metastable event is much more likely than one might 
    suppose.
    Given that printed circuit board real estate is cheap, I had considered the possibility of putting the pixel queue into 
    off-the-shelf FIFO parts.  Given that state-of-the-art CMOS FIFOs run at 35 MHz maximum, a width of 64 bits is apparently needed to 
    keep up with the bus.  On the DAC side, this must be multiplexed down to 32 wires.  (The scheme had somewhat more appeal when I 
    mistakenly thought a 32-bit FIFO would be wide enough.)  Anyhow, since such a FIFO would have a capacity of only, say, 512 
    8-bit pixels, its ability to outlast the "bursty" bus traffic due to Dragon processors and hence to avoid making high priority 
    bus requests might be rather limited.  Before the simulations are in, it's hard to be sure.  (There's also a detail of knowing 
    when the FIFO is critically full to be worked out.)  In the end, I'm not sure I like the idea of actually using commercially 
    available FIFOs, but I think I like the idea of having the printed circuit board real estate available to think about it.
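    To put rough numbers on the FIFO figures above, here is a small back-of-the-envelope calculation in Python.  The FIFO width, 
    capacity, 32-wire DAC side, and 35 MHz limit are the figures from this section; the 10.9 ns pixel period is borrowed from the 
    1280 x 960, 60 Hz discussion elsewhere in this memo and is an assumption here.  It says nothing about bus behavior, which is 
    what the simulations are needed for.

```python
# Back-of-the-envelope FIFO arithmetic.  The 10.9 ns pixel period is an
# assumption taken from the 1280 x 960, 60 Hz discussion in this memo.

FIFO_WIDTH_BITS = 64
FIFO_CAPACITY_PIXELS = 512
BITS_PER_PIXEL = 8
DAC_SIDE_WIRES = 32
FIFO_MAX_MHZ = 35.0
PIXEL_PERIOD_NS = 10.9  # assumed

# Pixels held in one 64-bit FIFO word, and the resulting depth in words.
pixels_per_fifo_word = FIFO_WIDTH_BITS // BITS_PER_PIXEL          # 8
fifo_depth_words = FIFO_CAPACITY_PIXELS // pixels_per_fifo_word   # 64

# How long a completely full FIFO can feed the display with no refill.
drain_time_ns = FIFO_CAPACITY_PIXELS * PIXEL_PERIOD_NS            # about 5.58 us

# Rates on the output side: one FIFO word is read every 8 pixels, and
# one 32-wire transfer to the DAC carries 4 pixels.
fifo_read_mhz = 1000.0 / (pixels_per_fifo_word * PIXEL_PERIOD_NS)
pixels_per_dac_transfer = DAC_SIDE_WIRES // BITS_PER_PIXEL        # 4
dac_transfer_mhz = 1000.0 / (pixels_per_dac_transfer * PIXEL_PERIOD_NS)

print(f"FIFO depth: {fifo_depth_words} words of {FIFO_WIDTH_BITS} bits")
print(f"Full FIFO lasts {drain_time_ns / 1000.0:.2f} microseconds")
print(f"FIFO read rate {fifo_read_mhz:.1f} MHz, DAC-side rate {dac_transfer_mhz:.1f} MHz")
```

    Under these assumptions a full FIFO covers only about 5.6 microseconds of display time, which is the margin the bus 
    simulations would have to show is enough.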
    Similarly, if there were space available, it might make sense to implement the display timing generator outside the custom chip.  
    One approach that seems interesting is to use a Xilinx Logic Cell Array, sort of a programmable logic device with downloadable 
    static RAM "fuses."  There's a 33 MHz version, on paper at least, which is adequate.  I don't yet know whether it's even 
    feasible, let alone desirable.
June87 Dragon Display Controller Alternative C
    If building a custom chip with an asynchronous interface seems risky and using off-chip standard FIFOs is undesirable, then what 
    about defining the bus clock and pixel clock to be synchronously related?  To match the proposed 40 ns bus clock, we might 
    choose a 10 ns pixel clock.  (For a 1280 x 960 pixel display, refreshed at 60 Hz, a 10.9 ns pixel would be better, but a 10 ns 
    pixel would work.)
        The difficulty is that over time we may wish to change the speed of one of these clocks without changing the other.
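    The clock arithmetic behind this alternative can be checked with a short Python calculation.  It uses only the figures given 
    above (40 ns bus clock, 1280 x 960 pixels, 60 Hz refresh, and the two candidate pixel periods); the function name is 
    hypothetical, and the blanking requirement of any particular monitor is not modeled.

```python
# Compare the two candidate pixel periods for a 1280 x 960, 60 Hz display.
# Only the figures quoted in the text are used; monitor blanking
# requirements are not modeled.

ACTIVE_WIDTH = 1280
ACTIVE_HEIGHT = 960
REFRESH_HZ = 60
BUS_CLOCK_NS = 40.0


def active_fraction(pixel_period_ns):
    """Fraction of each frame time spent on visible pixels."""
    frame_time_ns = 1e9 / REFRESH_HZ
    total_pixel_slots = frame_time_ns / pixel_period_ns
    return (ACTIVE_WIDTH * ACTIVE_HEIGHT) / total_pixel_slots


for period_ns in (10.0, 10.9):
    active = active_fraction(period_ns)
    pixels_per_bus_clock = BUS_CLOCK_NS / period_ns
    print(f"{period_ns} ns pixel: {active:.1%} active, "
          f"{1 - active:.1%} left for retrace, "
          f"{pixels_per_bus_clock:.2f} pixels per bus clock")
```

    With a 10 ns pixel there are exactly 4 pixels per 40 ns bus clock and about 26% of the frame time is left over for retrace; 
    with a 10.9 ns pixel about 20% is left over, but the ratio to the bus clock is no longer an integer.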
June87 Dragon Display Controller Alternative D
    For the June87 Dragon, the possibility of purchasing a commercially available display controller, with a standard bus interface 
    (e.g., VME, Multibus, etc.), deserves to be mentioned.  This is the approach we always used to say we would take.  (The latest 
    such candidate was the Dragonfly's monochromatic display.)  This approach has the advantage that the "real" display can be 
    developed on its own schedule without impacting the schedule of the rest of the machine.
    As I understand it, the June87 Dragon will have a slow I/O bus of some kind implemented, but we don't yet know what kind.  Does 
    this mean that it will have a standard bus interface such as VME or Multibus?
    One problem with this alternative is that it will not be a memory mapped architecture.  Rather, it will have its own local frame 
    buffer, which must be written as if it were I/O.  So it will probably be more work to convert existing software to use such a 
    display.  This software work is not to be taken lightly, as it also has an impact on the overall Dragon schedule.
    The packaging of such a display controller within the June87 Dragon is another problem.
Final Remarks
    What display controller to propose for the June87 Dragon depends on lots of factors, which the subcommittee should further 
    consider.  It is important to understand the goals for June87 Dragon.  And having understood them, I think it's important to 
    write them down.
    If we decide to implement a one-board solution, using exotic packaging and allocating the display controller about 30 square inches 
    of real estate, then it's hard to see an option other than the one proposed by the pre straw man, unless we were willing to 
    settle for a commercially available display (Alternative D above).
    A minimalist approach ought to suffice.  I put a hardware cursor at low priority.  It might turn out that we want a monochromatic 
    display, but for now I like the idea of a color DAC chip, with its entourage of support components (as an area placeholder if 
    nothing else).
    The asynchronous nature of the proposed custom chip continues to make me nervous.
    I'd definitely like some simulation data to support a choice of FIFO size.
    If we decide to implement a four-board solution, using more or less conventional packaging, then I think the risk of making the 
    display controller will be reduced substantially.  Again, I don't know if that makes sense in terms of the goals for the June87 
    Dragon.
    I'd be very interested in seeing if we can agree on a plan for interfacing future display controllers.