Number: 344

Date: 28-Mar-84 10':28':43

Submitter: Sannella.PA

Source: Masinter.pa

Subject: Run BITBLT Benchmark

Lisp Version: 

Description: '
Date': 27 Mar 84 23':14 PST'
From': Masinter.pa'
Subject': [masinter.pa': [Deutsch.pa': BitBlt timings]]'
To': LispSupport'
'
AR, category Benchmarks (I know we don''t have that category, maybe we should start marking the ones that don''t have a good category with something that our converter can recognize?)'
'
Anyway, the AR is to please run these benchmarks, distribute the results to LispCore↑ first for comment (e.g., if not better than Smalltalk figures, there is something wrong because ST-80 has more overhead getting in and out.)'
'
'
     ----- Forwarded Messages -----'
'
Date': Wed, 21 Mar 84 21':36':09 PST'
From': Deutsch.pa'
Subject': BitBlt timings'
To': Wyatt, Masinter, Trow'
cc': Adele, Ingalls'
'
Folks,'
'
I would appreciate it if you would run Rob''s timing measurements on, respectively, Mesa (Cedar or not, your choice) on the Dorado and Dolphin, Interlisp on the Dorado and Dolphin, and Smalltalk on the Dolphin.  The Smalltalk code I used for my measurements is on [Filene]<Deutsch>BitBlt-class-timing.st.'
'
I consider the Smalltalk measurements non-proprietary, since the essential information about the performance of Smalltalk BitBlt on the Dorado and Dolphin has already been published (in the second Smalltalk book).  However, if you feel more proprietary about the other systems, feel free to decline.  Also, if you feel it would be appropriate to run timings on the Dandelion for Mesa and Interlisp, please feel free to do so.'
'
Please send the results directly to Rob, cc to me.'
'
------------------------------'
'
Date': Wed, 20 Jan 15 0':16':00 EST'
From': rob.btl@csnet-relay.arpa'
To': deutsch%parc-maxc.csnet-relay@csnet-relay.arpa'
Received': from CSNET-RELAY.ARPA by PARC-MAXC.ARPA; 21 MAR 84 15':55':03 PST, by csnet-relay via btlpob;  21 Mar 84 5':05 EST'
'
Dear Peter, BitBlt Aficionado':'
'
I am working on a bitmap graphics course for SIGGRAPH this year, and would'
like to include in the course notes a sort of catalogue of bitblt'
implementations.  I would greatly appreciate timing figures on the'
following uses of BitBlt':'
'
	scroll 800wide x 1024high bitmap 1 pixel horizontally'
	scroll 800wide x 1024high bitmap 1 pixel vertically'
	draw 8wide x 7high character in XOR mode at random bitmap positions'
	texturing in XOR mode a 40x40 square at random bitmap position'
'
For the last two, I want the time averaged over all distinguishable x'
positions, to remove any artifacts due to word boundaries.  The 8x7'
character is chosen by the minimum enclosing rectangle of our ''a''.'
These tests are, like most benchmarks, quite arbitrary, but I am hoping'
to get enough different machines to make comparisons interesting.'
Note that I am after BitBlt timings, not equivalent timings for special-'
purpose primitives or what you can achieve by careful hand-coding for the'
examples.  Basically, how fast can BitBlt do these tasks?'
'
On the phone you mentioned the Dolphin and Dorado as subjects.'
'
Many thanks for your time.'
'
				-rob'
'
------------------------------'
'
Date': Wednesday, 21 March 1984, 9':27':45 pm'
From': Deutsch.pa'
To': rob.btl@csnet-relay'
In-Reply-To': "rob.btl@csnet-relay.arpa''s message of Wed, 20 Jan 15 0':16':00 EST"'
sent-from': Chelsea'
'
Rob,'
'
Here are the results for the Dorado running Smalltalk.  The scrolling numbers were obtained by taking the numbers for an 800 wide x 512 high bitmap and doubling, since the Dorado''s display is 1024 wide x 808 high.  The numbers appear to be repeatable to within about 2%.'
'
	scroll 800wide x 1024high bitmap 1 pixel horizontally'
		23.8 ms'
	scroll 800wide x 1024high bitmap 1 pixel vertically'
		23.3 ms'
	draw 8wide x 7high character in XOR mode at random bitmap positions'
		51.2 microseconds'
	texturing in XOR mode a 40x40 square at random bitmap position'
		156 microseconds'
'
By "texturing" I assume you mean tiling with a non-white, non-black bitmap (on the Dorado the tile size is 16 x 16).'
'
Note that for the last two tests I used uniformly distributed but not random bitmap positions, namely, for i = 0 to 15 do test(i * 247 rem 400).  This makes the Dorado look a little worse than necessary, because it tends to produce more faults in the memory cache.  On the second of these tests (texturing), this appears to be significant (the time was only 133 microseconds when I used consecutive coordinates).'
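As an illustration (not part of the original message), the coordinate sequence described above can be sketched in Python. The function name `test_positions` is hypothetical, and the 16-bit word size used in the final check is an assumption about the machine. The sketch suggests why the stride works: 400 is a multiple of 16 and 247 is odd, so the 16 positions land on 16 different bit offsets within a word.

```python
# Hypothetical helper reproducing "for i = 0 to 15 do test(i * 247 rem 400)".
def test_positions(count=16, stride=247, modulus=400):
    # Smalltalk evaluates binary operators left to right: (i * 247) rem 400.
    return [(i * stride) % modulus for i in range(count)]

positions = test_positions()

# Bit offset of each position within an assumed 16-bit word.
word_offsets = sorted(p % 16 for p in positions)

print(positions)
print(word_offsets)
```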
'
I''ll ask someone else to run it on the Dolphin, and perhaps on the Dorado in Mesa and/or Lisp.'
'
Please send us a copy of the complete set of timings from all the machines you find out about.'
'
Have fun,'
'
P.'
'
-----'
'
Date': 22 MAR 84 09':59 PST'
From': MASINTER.PA'
Subject': BITBLT timings'
To':   deutsch'
cc':   wyatt, masinter, trow, adele, ingalls'
'
Sometimes it isn''t that the figures are proprietary, but merely that it is in bad form to distribute them. We''ve been cautious about benchmarks of Interlisp-D coming from Xerox, because, while Xerox customers are free to perform timings and publish them, if XEROX publishes them, there is some implication that we think that they are meaningful, complete, consistent with the methodology of other benchmarks, etc.'
'
For example, in Interlisp-D one CAN construct ahead of time the bitblt table, and then just measure the time of the BITBLT opcode itself; alternatively, one can include the setup overhead. '
'
While setup overhead should be included because it is incurred by most users, I wonder about the methodology others might use in generating the same figures for his tables. Distributed benchmarking performed by lots of people (some more gung-ho vendors than others) can have a lot of variation in the method of measurement. (As opposed, say, to the Smalltalk benchmarks which have some hope of being the same program.)'
'
With all of that said, I would be interested in contributing Interlisp-D timings if in exchange I could get a copy of all of his course materials. (This is what is known as Scientific Interchange.)'
'
-----'
'
Date': Thu, 22 Mar 84 11':17':30 PST'
From': Deutsch.pa'
Subject': Re': BITBLT timings'
To': MASINTER'
cc': wyatt, trow, adele, ingalls'
In-Reply-To': "MASINTER''s message of Thu, 22 Mar 84 9':59':00 PST"'
'
I''m sure Rob would be happy to send us a copy of the full course materials.  I understand your concerns about benchmarks, and I think it would be appropriate for you to communicate them to Rob when you send him the numbers.'
'
Just for your information, the measurements I did from Smalltalk did NOT include user-level setup overhead.  In other words, they were the best that a user can get out of the system, by calling the BitBlt method with an already set-up table.  (Since Smalltalk doesn''t have anything corresponding to the LL level, you still have to pay some lower-level overhead for testing the BitBlt table arguments for being SmallIntegers, extracting some parameters from the bitmap objects, etc.)  I believe that this is the most appropriate measure of BitBlt''s performance for Rob''s purposes, since some Smalltalk system code actually approaches this level of performance pretty closely.'
'
-----'
'
Date': 30 Apr 84 18':57 PDT'
From': JonL.pa'
Subject': BitBLt timings in Interlisp-D'
To': Rob.btl@csnet-relay'
cc': Deutsch, Wyatt, Masinter, Trow, Adele, Ingalls, JonL, Sheil'
'
'
Your request to Peter Deutsch for bitblt timings on Xerox''s D machines ultimately came to me, to do them for Interlisp-D running on a Dorado.'
'
I must first mention that there are essentially two interfaces to BitBlt in Interlisp-D -- one at a very-low system level, and one at the full bells-and-whistles, documented, user-interface level.  I''m going to report times for both levels since the former will more accurately reflect the load that BitBlt lays on system code, and the latter will reflect that on "naive" user code.  I''ll also report a more significant difference with respect to the effect of the Dorado cache.'
'
The very-low-level interface to bitblt takes a PilotBitBlt table as an argument, and to varying degrees the Interlisp system''s usage of BITBLT falls into a paradigm of (1) building such a table and caching it in some data structure, and then (2) modifying a couple of fields in this table followed by an invocation of the bitblt opcode on that table.  The following timings were obtained by a "differential" technique, and for the system-level entry they represent the actual time spent in the bitblt opcode.  Peter''s Smalltalk timings are essentially comparable when you consider that he timed a low-level but documented interface which does a small amount of extra work beyond running the bitblt opcode.  There is also the difference that Interlisp-D uses the newer Pilot format bitblt, whereas Smalltalk uses the older Alto one, but I suspect that this difference is in the noise.'
'
The tests are':'
  BBT.SCROLLRIGHT -- scroll a 800wide by 1024high bitmap 1 pixel horizontally'
                     (as Peter did, I used an 800x512 bitmap and multiplied'
                     the resulting times by 2)'
  UBBT.SCROLLRIGHT -- same, except user-level entry through function BITBLT'
  BBT.SCROLLUP  -- scroll a 800wide by 1024high bitmap 1 pixel vertically'
  UBBT.SCROLLUP -- same, except user-level entry through function BITBLT'
  BBT.BLTCHAR -- draw a 8wide by 7high character in XOR mode at *random*'
                 bitmap positions (500 trials, with X and Y coordinates'
                 selected by a RANDom function, in the same bitmap as for'
                 scroll right and up tests).'
  UBBT.BLTCHAR -- same, except user-level entry through function BLTCHAR'
  BBT.TEXTURE -- texturing a 40x40 square at random bitmap positions (used'
                 the same bitmap as above, with X coordinate chosen by a '
                 RANDom function over the interval 0 to BitMapWidth-40, and'
                 similarly for the Y coordinate).  I used a motley shade of'
                 gray for the texture pattern.'
  UBBT.TEXTURE -- same, except user-level entry through function BITBLT'
'
The timings are (ms = milliseconds, us = microseconds)'
  BBT.SCROLLRIGHT    24.2ms'
  UBBT.SCROLLRIGHT   25.2ms'
  BBT.SCROLLUP       23.0ms'
  UBBT.SCROLLUP      23.8ms'
  BBT.BLTCHAR        33.3us'
  UBBT.BLTCHAR      336. us'
  BBT.TEXTURE       153.8us   (but for "same" place -- 90.0us)'
  UBBT.TEXTURE      478. us'
'
I ran the BBT.TEXTURE once defeating the randomization of the X and Y coordinates  -- the result was a 70% speedup! -- (153.8 - 90.0)/90.0 --'
which implies that the Dorado cache can be a very important factor in '
these timings.'
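As an illustration (not part of the original message), the arithmetic behind the 70% figure can be checked in a few lines of Python. The variable names are hypothetical. The same two measurements are expressed two ways here, because "speedup relative to the faster time" and "fraction of time saved" give quite different percentages.

```python
# BBT.TEXTURE timings in microseconds, taken from the table above.
random_us = 153.8   # random X and Y coordinates
fixed_us = 90.0     # same place every time

# The figure quoted in the message: extra time relative to the faster run.
relative_to_fast = (random_us - fixed_us) / fixed_us

# Alternative view: fraction of the slower run that is saved.
fraction_saved = (random_us - fixed_us) / random_us

print(f"{relative_to_fast:.1%} relative to the fixed-position time")
print(f"{fraction_saved:.1%} of the random-position time saved")
```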
'
I hope that you can send a summary of your results to me, in addition to sending them to Peter; I''ll re-circulate them among the Lisp development'
group here at Xerox.'
'
-- Jon L White --'
'
-----'
'
Date': 30 Apr 84 19':21 PDT'
From': JonL.pa'
Subject': BitBlt timings in Interlisp-D -- the DLion story'
To': Deutsch'
cc': Wyatt, Masinter, Trow, Adele, Ingalls, LispCore↑.pa'
'
The bitblt timings mentioned in the message to Rob.btl@csnet-relay were obtained by running functions of the same name as the test, to be found on the file [Phylum]<Gabriel>BBTEST (and .DCOM).  Loading in this file sets up the requisite bitmaps, windows, etc.  Each test will call TIMEALL twice -- once to obtain the overhead of the testing code and once "for real" (this is the so-called "differential technique").  Each of the scroll functions does its work exactly once; the smaller ones for which the test requires "random positionings" do 500 bitblts in "random" positions.'
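As an illustration (not part of the original message, and not the contents of BBTEST), the differential technique might be sketched in Python as follows. The names `time_all`, `differential_time`, `overhead_only`, and `work` are hypothetical stand-ins; in particular `time_all` does not reproduce the signature of the Interlisp TIMEALL, and the "work" here is an arbitrary computation rather than a bitblt.

```python
import time

def time_all(fn, trials):
    """Return the elapsed seconds for calling fn(i) `trials` times."""
    start = time.perf_counter()
    for i in range(trials):
        fn(i)
    return time.perf_counter() - start

def differential_time(work, overhead_only, trials=500):
    """Time the harness alone, then the harness plus the real work,
    and return the per-call difference in seconds."""
    overhead = time_all(overhead_only, trials)
    total = time_all(work, trials)
    return (total - overhead) / trials

calls = {"work": 0, "overhead": 0}

def overhead_only(i):
    # Same bookkeeping as the real test, minus the operation being measured.
    calls["overhead"] += 1

def work(i):
    calls["work"] += 1
    sum(range(200))  # stand-in for the operation being measured

per_call = differential_time(work, overhead_only)
print(f"about {per_call * 1e6:.2f} us per call after subtracting overhead")
```

On a noisy machine the difference can come out slightly negative for very cheap operations, which is one reason to repeat the measurement over many trials.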
'
I ran these tests on a DLion also, but Beau suggested holding back the numbers from the outside world in order to minimize potential conflict with OPD about DLion timings.  They are reproduced herein with the understanding that they *may* be Xerox confidential data.'
'
The timings are (ms = milliseconds, us = microseconds)'
                     Dorado      DLion        Ratio'
  BBT.SCROLLRIGHT    24.2ms      191.6ms       7.9 '
  UBBT.SCROLLRIGHT   25.2ms      194.4ms       7.7'
  BBT.SCROLLUP       23.0ms      187.6ms       8.2'
  UBBT.SCROLLUP      23.8ms      190.6ms       8.0'
  BBT.BLTCHAR        33.3us      137. us       4.1'
  UBBT.BLTCHAR      336. us     1220. us       3.6'
  BBT.TEXTURE       153.8us      885. us       5.8'
  UBBT.TEXTURE      478. us     1918. us       4.0'
'
As one can see, the data-movement-intensive tests give the Dorado a factor of 8 edge over the DLion; the computation-intensive tests give it only a factor of 4.  I may have verbally suggested a better showing for the DLion at the Lisp group meeting two weeks ago -- I was in error then.'
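As an illustration (not part of the original message), the Ratio column can be recomputed from the Dorado and DLion columns with a short Python sketch. The dictionary layout and variable names are hypothetical; each row keeps its own unit (ms for the scroll tests, us for the others), which cancels in the ratio.

```python
# (Dorado, DLion) timings copied from the table above; units match within a row.
timings = {
    "BBT.SCROLLRIGHT":  (24.2, 191.6),
    "UBBT.SCROLLRIGHT": (25.2, 194.4),
    "BBT.SCROLLUP":     (23.0, 187.6),
    "UBBT.SCROLLUP":    (23.8, 190.6),
    "BBT.BLTCHAR":      (33.3, 137.0),
    "UBBT.BLTCHAR":     (336.0, 1220.0),
    "BBT.TEXTURE":      (153.8, 885.0),
    "UBBT.TEXTURE":     (478.0, 1918.0),
}

# DLion time divided by Dorado time, rounded as in the table.
ratios = {name: round(dlion / dorado, 1)
          for name, (dorado, dlion) in timings.items()}

for name, ratio in ratios.items():
    print(f"{name:18s} {ratio:4.1f}")
```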
'
-- JonL --'
'
-----'
'
Date': Tue, 1 May 84 10':51':09 PDT'
From': Deutsch.pa'
Subject': BitBlt timings on the Sun'
To': JonL, Wyatt, Masinter, Trow, Adele, Ingalls, LispCore↑.pa'
cc': McCullough, MRoberts'
'
I ran Rob''s timings on our current implementation of Smalltalk-80 on the Sun.  The current implementation works in the following simple-minded way': the system keeps a copy of the display bitmap in main memory.  It always performs the BitBlt operation memory-to-memory.  *Then*, if the destination is the display, it either copies the target rectangle into the hardware display bitmap (if a source was involved), or does the operation a second time directly on the display bitmap (if the operation only involved a texture).  By changing over to doing the operation directly on the display bitmap, we can probably speed up the normal cases of texturing and painting by about a factor of 2, and speed up scrolling by somewhat less, at the expense of more complexity and also perhaps slowing down operations where the display is the source.'
'
Anyway, here are the timings.  They should not be disseminated, given possible sensitivities on all sides (us, XSIS, Sun Microsystems, ...).  For comparison, JonL''s Lisp timings are given below.'
'
Scroll right		579.2 ms'
Scroll left		325.2 ms'
Scroll down		315.4 ms'
Scroll up		312.8 ms'
Character		769 us'
Texture			1675 us'
'
The slow scroll-right time results from the fact that the blt has to be done right-to-left in this case, and in my implementation this uses the slowest and most general loops.'
'
The slow character time results from a large overhead in calling the Smalltalk BitBlt primitive -- it has to unbox about 15 arguments, among other things.  Smalltalk has a string display primitive that goes much faster per character.'
'
The Interlisp-D timings are (ms = milliseconds, us = microseconds)'
                     Dorado      DLion        Ratio'
  BBT.SCROLLRIGHT    24.2ms      191.6ms       7.9 '
  UBBT.SCROLLRIGHT   25.2ms      194.4ms       7.7'
  BBT.SCROLLUP       23.0ms      187.6ms       8.2'
  UBBT.SCROLLUP      23.8ms      190.6ms       8.0'
  BBT.BLTCHAR        33.3us      137. us       4.1'
  UBBT.BLTCHAR      336. us     1220. us       3.6'
  BBT.TEXTURE       153.8us      885. us       5.8'
  UBBT.TEXTURE      478. us     1918. us       4.0'
'
(The last 3 DLion numbers were ms rather than us in JonL''s message, I assume this was a typo.)'
[[I corrected this in the AR text -- JonL, 11-May-84]]'
'
These comparisons actually make the Sun look quite respectable compared to the DLion.  This is somewhat surprising given that the Sun also has no barrel shifter (although it can shift a 32-bit quantity N bits in 800+200N ns) and its memory system is just about the same speed (400 ns for a 16-bit access vs. 411 -- but I guess on the DLion accesses to the display bitmap run slower).  I don''t think there''s a bug in my timing program, since I ran the identical program on the Dorado and got numbers very close to JonL''s.'
'
Incidentally, these tests were run on a Sun-1 with a 10 MHz 68000 with no wait states.  The times should be just a little faster with a 10 MHz 68010.'
'
     ----- Next Message -----'
'
Date':  1 May 84 11':34 PDT'
From': masinter.pa'
Subject': Re': BitBlt timings on the Sun'
In-reply-to': Deutsch.pa''s message of Tue, 1 May 84 10':51':09 PDT'
To': Deutsch.pa'
cc': JonL.pa, Wyatt.pa, Masinter.pa, Trow.pa, Adele.pa, Ingalls.pa, LispCore↑.pa, McCullough.pa, MRoberts.pa'
'
Yes, the DLion has a fairly heavy performance hit when doing BITBLT to the display (as opposed to memory-to-memory) because the DLion display bank is dual ported, and you don''t get all the available memory references.'
'
JonL, is it possible to retry the timings to non-display bitmaps? I don''t think it will make any difference on the Dorado, but would on the DLion.'
'
-----'
'
Date': Wed, 2 May 84 0':24':15 PDT'
From': Deutsch.pa'
Subject': More on Sun BitBlt'
To': JonL, Wyatt, Masinter, Trow, Adele, Ingalls, LispCore↑.pa'
cc': McCullough, MRoberts'
'
By breaking out vertical scrolling as a special case I was able to reduce the vertical scrolling times on the Sun to 204.8 and 202.2 ms respectively (from 315.4 and 312.8 ms), making them approximately 7% slower than the faster of the two Lisp times on the DLion.  The times could be reduced by approximately another 5% by unrolling a loop a few more times.  (Remember that these times still actually do the blt twice, once memory-to-memory, once memory-to-screen.)'
'
I''m tempted to draw a moral': the DLion memory system is only about as good as the Sun''s, the DLion wins on computation mostly because its instruction memory is 3 times as fast, but this doesn''t help it on data-movement-intensive tasks.'
'
-----'
'
Date':  4 May 84 14':16 PDT'
From': JonL.pa'
Subject': Re': More on Sun BitBlt -- and real time animation'
In-reply-to': Deutsch.pa''s message of Wed, 2 May 84 0':24':15 PDT'
To': Deutsch.pa'
cc': JonL.pa, Wyatt.pa, Masinter.pa, Trow.pa, Adele.pa, Ingalls.pa, LispCore↑.pa, McCullough.pa, MRoberts.pa, Tong.pa'
'
The DLion times for memoryBandwidth-limited BitBlt''ing would look modestly better if the destination memory weren''t the screen bitmap -- this memory appears to be slower, partially due to the contention with the display task (Don Charnley': any other opinions?).  '
'
Indeed, a year ago, from our comparisons of the asymptotic per-pixel times between the DLion, Dolphin, and Dorado, we conjectured that the DLion would be generally faster than the Dolphin, except for large-area BitBlts; I believe I volunteered these data to the Xerox-internal person who requested BitBlt timings about six months ago (mostly, he wanted to evaluate micro effects, and our results were "asymptotic", so Ed Fiala''s exposition was probably the best answer).'
'
However, just the other day I saw a VERY INTERESTING demonstration of real-time animation from within LOOPS -- Chris Tong had a "cute" little hack, for which there ought to be an interesting question about the help that LOOPS provides in organizing such tasks.  However, I can''t believe that anything other than a Dorado, as of this point in time, could handle the BitBlt load.  A factor of 8 over the DLion has got to be insurmountable, considering that the "think time" to shuffle bitmaps is probably so much less than the bit-movement time.'
'
-- JonL --'


Workaround: 

Test Case: 

Edit-By: masinter.PA

Edit-Date: 17-Jul-84 17':19':56

Attn: 

Assigned To: JonL

In/By: 

Disposition: Code to run these benchmarks is in the file {Phylum}<Gabriel>BBTEST.  Results were discussed at the Lisp group meetings on 16-Apr-84 and 23-Apr-84, and were communicated to the original interested parties on 30-Apr-84.

System: Windows and Graphics

Subsystem: Other

Machine: 

Disk: 

Microcode Version: 

Memory Size: 

File Server: 

Server Software Version: 

Difficulty: 

Frequency: 

Impact: Minor

Priority: 

Status: 

Problem Type: 

Source Files: