TimerLoopDoc.tioga
Copyright Ó 1992 by Xerox Corporation. All rights reserved.
Bier, December 9, 1992 11:36 am PST
TimerLoop
CEDAR 10.1
TimerLoop
Timing Short Code Fragments
Eric A. Bier
Ó Copyright 1992 Xerox Corporation. All rights reserved.
Abstract: While the CodeTimer package is useful for timing routines of reasonably long duration (at least several milliseconds), it doesn't have the resolution for shorter code fragments. The well-known way to time a short fragment is to put that fragment in a loop, run it for many iterations, record the total ellapsed time, subtract the overhead for the loop, and divide by the number of iterations. In a nutshell, that's what TimerLoop does. You can use it to time very simple constructs, like x ← 2*y. At this point, TimerLoop assumes that the fragment to be timed takes less than 5 milliseconds on average. WARNING: These statistics are far from perfect. They will be affected by garbage collections, other threads or processes using the CPU, and other factors. TimerLoop does run some simple tests to see if the CPU load is obviously fluctuating wildly, but it won't always detect problems.
Created by: Bier:PARC
Maintained by: Eric A. Bier [Bier:PARC]
Keywords: performance, tuning, timing
XEROX  Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, California 94304

1. TimerLoop Demos
There are several demonstrations provided with TimerLoop. To use them type the given command to a Cedar Commander. See TimerLoopImpl to see how they are implemented.
TimeArithmetic
Evaluates a number of expressions such as, x ← 2*x, or x ← RealFns.SqRt[x] and prints out the time taken by each in a human-readable format to the Commander typescript.
TimeIO
Performs calls to IO.GetCedarToken and IO.GetCedarTokenRope. Note that these statistics are much less reliable than those for TimeArithmetic. This is probably due to the fact that GetCedarTokenRope allocates storage.
Sample Statistics
Here are some results I got on my SPARC-2 on December 9, 1992:
% TimeArithmetic
All times in microseconds. Spending at least 1 second on each test.
x ← 3: 0.27 avg, 10.15 worst avg over 516 calls, 764 trials (init: 0.52 avg)
x ← x+1: 0.97 avg, 135.02 worst avg over 455 calls, 514 trials (init: 0.67 avg)
x←x+1; x←x+1: 1.75 avg, 143.94 worst avg over 428 calls, 453 trials (init: 0.68 avg)
x ← x*2: 1.00 avg, 18.33 worst avg over 457 calls, 459 trials (init: 0.70 avg)
x ← RealFns.SqRt[x]: 5.34 avg, 194.40 worst avg over 322 calls, 315 trials (init: 0.72 avg)

n ← 3: 0.34 avg, 17.66 worst avg over 508 calls, 708 trials (init: 0.52 avg)
n ← n+1: 0.28 avg, 10.64 worst avg over 496 calls, 641 trials (init: 0.63 avg)
n ← n*2: 0.27 avg, 6.12 worst avg over 493 calls, 474 trials (init: 0.64 avg)
% TimeIO
All times in microseconds. Spending at least 3 seconds on each test.
GetCedarTokenRope: 149.95 avg, 5819.42 worst avg over 19 calls, 340 trials (init: 5.79 avg)
GetCedarToken: 80.96 avg, 1326.78 worst avg over 49 calls, 522 trials (init: 3.94 avg)
SkipWhitespace: 37.74 avg, 718.97 worst avg over 91 calls, 469 trials (init: 3.37 avg)
There are a few disturbing results here. Floating point assignment appears faster than natural number assignment. x←x+1; x←x+1 does not take twice as long as x ← x+1 (although it is not far off).
These results also point out that GetCedarTokenRope, GetCedarToken, and SkipWhitespace are all painfully slow.
2. Performing Your Own TimerLoop Tests
The TimerLoop Interface
To use TimerLoop for your own code fragments, use the routines in TimerLoop.mesa. TimeForAPeriod and ReportForAPeriod are probably the most generally useful. See TimerLoopImpl.TimeArithmetic and TimerLoopImpl.TimeIO for examples of init and doit routines.
TimeForAPeriod and ReportForAPeriod assume that the code fragments you are testing take a good deal less than 5 milliseconds. Contact the implementor if you need to test longer routines, or use the CodeTimer package.
Hints for Better Statistics
Keep your init routines short. If your init routines are longer than your doit routines, there will be a lot of noise in the measurements.
Avoid allocating storage in the init routines. If you get the garbage collector going, your statistics become much less deterministic.
Run your statistics several times to see if you are getting consistent results.
ReportForAPeriod will retry a trial if it determines that the statistics are varying wildly. As a result, if other threads or UNIX processes are making CPU times unpredictable, fewer usable trials will be run in a given period of time. Looks at the "trials" numbers to see how many trials were actually run before you put much faith in the statistics.
Try running your tests in a fresh Cedar world. Older worlds are likely to have more threads interrupting the test and longer garbage collections.
Try running both your code fragment and your code fragment repeated twice as separate statistics. For example, you might run x←x+1 and then run x←x+1; x←x+1. If the second takes twice as long as the first, you will have more confidence in the statistic.