SpyDoc.tioga
Copyright Ó 1985, 1986, 1987 by Xerox Corporation. All rights reserved.
Rick Beach, March 12, 1987 6:58:04 pm PST
Russ Atkinson (RRA) May 14, 1986 5:12:50 pm PDT
Mike Spreitzer September 24, 1986 12:49:34 pm PDT
Eric Nickell, December 1, 1986 12:38:39 pm PST
Doug Wyatt, February 10, 1987 11:39:50 am PST
SPY — A CEDAR PERFORMANCE TOOL
CEDAR 7.0 — FOR INTERNAL XEROX USE ONLY
Spy
-- A Cedar performance tool
Russ Atkinson
© Copyright 1985, 1986, 1987 Xerox Corporation. All rights reserved.
Abstract: The Cedar Spy is a tool for monitoring the performance of programs. It can measure several different aspects of performance: CPU usage, page faults, allocations, or process time. Hopefully a programmer will find that the Spy is the only tool he needs for the bulk of his performance analysis.
Created by: John Maxwell
Maintained by: Russ Atkinson <Atkinson.pa>
Keywords: performance measurement, tuning, page faults, memory allocations, timing, CPU usage
XEROX  Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, California 94304

For Internal Xerox Use Only
Introduction.
The Cedar Spy is designed to be the main tool for analyzing the performance of programs in Cedar. It consists of of an array of means for viewing the execution of a program all presented in a single, consistent form. With the Spy, the programmer can see which procedures are consuming CPU cycles, which are causing page faults, which are using the allocator, or which are calling a particular procedure. When the programmer narrows his focus to just one process, the Spy will tell him where that process is spending its time, where it is waiting on page faults, where it is waiting on monitor locks, where it is waiting on condition variables, and when it is preempted by other processes. In addition, the programmer can measure precisely what he is interested in since the Spy provides a facility for setting breakpoints to determine where the Spy should start and stop its measurements. Hopefully a programmer will find that the Spy is the only tool he needs for the bulk of his performance analysis.
The main paradigm of the Spy is to record the call stack of an interesting process. The definition of "interesting" varies according to what the Spy is measuring. If the Spy is measuring page faults, it records the stack of the process that just page faulted. If the Spy is measuring CPU usage, it records the stack of the top-most active process at regular intervals. The only things recorded are the procedures in the call stack; parameters and local variables are not recorded.
All of the functionality provided by the Spy viewer can also be accessed programmatically through the interface "SpyClient". For documentation, see that interface.
Obtaining the Spy.
The Spy is normally part of the Cedar command environment (brought over by Environment.df into the ///Commands/ directory). The released copy of the Spy can be obtained with the command:
Bringover -p [Cedar]<CedarChest6.0>Top>Spy.df
to the CommandTool.
Issuing the "Spy" command in a CommandTool will cause the Spy package to be loaded and a Spy viewer to appear at the bottom of the screen. (The Spy viewer looks like a magnifying glass laying on top of the word "Spy"). If you destroy the Spy viewer, you can obtain another one by typing another "Spy" command to the CommandTool.
Preparing the Spy for its Mission.
The Spy provides the programmer with a number of parameters to specify what it will measure. These parameters allow the programmer to tailor the Spy somewhat to his particular needs. The parameters are changed by clicking them with the mouse; repeated clickings will cycle a parameter through all of the possible values. Most people will find that the default parameters provide a good starting point.
Watching: {CPU, process number, breakProcess, pagefaults, wordsAllocated, allocations, user breaks}
The "Watching" parameter determines what the Spy should measure.
* If "CPU" is chosen, the Spy will wake up 80 times a second (10 on machines other than the Dorado) and record the call stack of the process that is currently running. This frequency can be modified --- see `Frequency Divisor' below.
* If "process number: " is chosen, the Spy will monitor the process given by the user. It will keep track of what the process was doing: whether it was currently running, currently pre-empted by another process, waiting on a condition variable, waiting on a monitor lock, or waiting on a page fault.
* If "breakProcess" is chosen, the Spy will monitor the first process that causes a start break. This is useful when the user doesn't know which process is going to run. Otherwise, this is just like the previous mode.
* If "pagefaults" is chosen, the Spy will record the call stack of the process that pagefaulted.
* The difference between "allocations" and "wordsAllocated" is that in the former 1 count is recorded for each allocation and in the second 1 count is recorded for each word allocated, which would be the same a recording n counts for the allocation, where n is the number of words allocated.
* If "user breaks" is chosen, the Spy will record data each time it encounters a breakpoint set using "SetUserBreak!".
Frequency Divisor
The frequency divisor is used to reduce the amount of data recorded, so that you can run the Spy over a long period of time, and yet not take so many samples that it takes another long period of time to analyze them.
When watching processor usage (i.e., the "Watching" parameter is "CPU", "process number: ", or "breakProcess"), the Spy works by waking up every N*FrequencyDivisor vertical retraces and looking at the stack. For a Dorado, N is 1; for lesser machines, N is 8. When watching allocator activity (i.e., the "Watching" parameter is "allocations" or "wordsAllocated"), the Spy is notified of every invocation of the allocator. The Spy pays attention to only one out of every FrequencyDivisor of these notifications. For the other things you might watch (i.e., "pagefaults" and "user breaks"), the FrequencyDivisor has no effect.
Left-clicking the `Frequency Divisor' button will multiply the frequency divisor by 5. Right-clicking will divide it by 5. Middle-click restores it to its default value. It is presented in an editable text viewer, so the user can also edit it directly.
SetUserBreak! ClearUserBreaks!
These two commands allow the user to specify breakpoints that will cause the Spy to log data. The breaks are only enabled while the Spy is running and the "Watching" mode is "user-defined". There are a number of ways of specifying a breakpoint:
* Selection in an Impl module. The selection does not have to be a procedure name, it can be anywhere in the code (just like the Interpreter).
* Selection of a rope of the form "FooImpl.Proc". The selection can be anywhere, even the Spy's output.
You should know that the Spy will not be able to set a breakpoint if a) the module has not been loaded yet, or b) the symbols are not on the disk.
If "ClearUserBreaks!" is bugged, ALL of the breakpoints will go away.
SetTrace!
Whenever a trace breakpoint is encountered, the Spy will put an entry in its log saying when and where it was. The traces are only enabled while the Spy is running and the "Watching" mode is "user-defined". Trace breaks are set just like user breaks. They are cleared with `ClearUserBreaks!'.
Starting and Stopping the Spy.
There are two main ways of starting and stopping the Spy: by bugging "Spy: {on}" and "Spy: {off}", or by setting breakpoints that determine programmatically where the Spy is to start and stop. Start and Stop breakpoints are set in exactly the same manner as given above for user breakpoints. The only difference is that if a Stop breakpoint is set on a the beginning procedure, the break will occur at the exit of the procedure rather than its entry.
Setting a Start or Stop breakpoint does not actually enable the Spy to run. Instead, the "Spy: {off}" switch will be changed to say "Break: {off}". The Start and Stop breaks are only enabled when the programmer has set the switch to "Break: {on}". This allows the programmer to specify all of the breaks he wants before the Spy starts measuring. Changing the switch to "Break: {off}" doesn't remove the breaks, it just disables them. Thus the user can reuse the breaks he has already set.
The breakpoint facility is designed so that the Spy is recording data as long as the number of Start breaks exceeds the number of Stop breaks. This allows breaks to be set in recursive procedures.
When "ClearBreaks!" is bugged, all of the Start and Stop breaks are removed and the on/off switch is changed to say "Spy: {off}".
Displaying the Data.
Clicking "DisplayData!" will cause the data to be displayed in a new typescript. It will also automatically turn off the Spy if it is running. "DisplayData!" does not destroy the data stored. If some of the symbols were missing, the user can fetch them and click "DisplayData!" again. The second invocation will take advantage of the new symbols.
To the left of the "DisplayData!" button is a button called "cutoff". The "cutoff" button lets you specify how much information you want to be displayed. The number after the cutoff is the percentage of times a procedure must be in the call stack in order to be displayed. If a procedure is in the call stack fewer times than this, then it will not be displayed.
Interpreting the Results.
The Spy's output is divided into three main parts: a header, a breakdown of the processes, and a breakdown of the procedures. The header gives overall statistics about the execution of the Spy while the two breakdown sections are focussed on detailed data. The three sections are separated by double rows of tildas (~~~).
==================================
Cedar Spy of: 29-Apr-82 9:19:41.
Executed at: 29-Apr-82 9:20:15.
.
.
.
The header is pretty much self-evident. At the top it records the parameters that the programmer specified for this particular run: what was being measured, where breaks were set, etc. Below this it gives overall statistics such as how long the Spy ran, how many wakeups were recorded, what priorities did things run at, how many page faults occurred overall, and so on. At the bottom is a section called "Statistics on the execution of the Spy" which gives some data about the running of the Spy itself. This is only for debugging purposes.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Breakdown of interesting processes.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The second section gives a list of the interesting processes that ran and some data about them. A typical entry might look like this:

Process 220B running at priority [normal, pagefaultLow] (39 page faults) = 609 (32.6%).
BBStart.EvalBase = 608 (32.5%).
CachedRegionImplB = 1 (0.1%).
The first line says that process 220B ran at two different priorities (normal and pagefaultLow) and that 39 page faults occurred as an immediate consequence of its work. It also says that 609 counts were recorded for this process, and that this accounted for 32.6% of the time.
The second and third lines says that of these 609 counts, 608 counts were in BBStart.EvalBase and 1 count was in CachedRegionImplB. These counts corresponded to 32.5% and 0.1% of all of the counts respectively.
Process 220B has two entries because it was shared by more than one logical process. Most processes will only have one entry unless a lot of processes are being forked.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Breakdown of interesting procedures.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The next section gives a list of modules and procedures indented so as to indicate precedence relationships. (Module information is only printed when the Spy is unable to obtain the symbols necessary to print out the procedures.) Each module or procedure that occurs in more than 3.0% of the call stacks will be printed exactly once somewhere in this section. A typical entry might look like this:

Procedure StackSpy.AddSon (2 refs) = 93, 2794 (2.9%) (92.6%).
(20154) StackSpy.AddProcedure = 1946 (62.4%).
(20277) RTPrefAllocImpl.NewPrefixedObject = 78 (2.5%).
(20348) StackSpy.AddProcedure = 632 (20.2%).
(20398) RTPrefAllocImpl.NewPrefixedObject = 60 (1.9%).
(20444) waiting.pagefault = 59 (1.9%).
The first line indicates that the procedure StackSpy.AddSon was called from two places (2 refs) and was in the call stack a total of 92.6% of the time. This 92.6% of the time was divided into 93 counts when the CPU was executing code in the procedure itself, and 2794 counts when the CPU was executing code in procedures called from AddSon. The 2.9% is only there because the number of counts that were in the procedure itself was great enough (>1.0%) to force it to print a percentage for just those counts. Such heavily used procedures are identified in bold type to help locate them. Note that percentages are always for the amount of time that a procedure is in the call stack. This means that percentages may add up to more than 100% since most call stacks have more than one procedure. The meanings of these numbers are printed in the header of the output for easy reference.
The number of refs to a procedure is not the same as the number of calls; it indicates the number of different places that a call to this procedure occurs in the static text of the code. In this case, AddSon was called from two different places in the procedure StackSpy.EnterStack.
The next five lines give the list of procedures that occur in the call stack under AddSon more than 1.0%. These are the significant procedures that AddSon calls. They are sorted by PC, and so they are listed in the order that they appear in the code listings. The numbers in parentheses at the beginning are the source indexes for the calls. Selecting one of the numbers and clicking "Position" in the file StackSpy.mesa will show you where the call was made. If a procedure is called from more than one place (as StackSpy.AddProcedure is) then it will appear in the list more than once. Since the Spy does not print procedures that are called less than 1.0%, the sum of the counts for the procedures listed (1946+78+632+60+59 => 2775) may be less than the total number of calls given in the procedure header (2794).
The count that appears after each procedure listed is the number of times that that procedure was in the call stack under AddSon. The percentage is the ratio of that count to the total number of counts. The last entry, waiting.pagefault, is an entry that only appears when exactly one process is being observed. It says that AddSon was waiting on a pagefault for 1.9% of the time. (Other "waiting" procedures are ML (monitor lock), CV (condition variable) and preempted. Preempted means that the process was ready to run, but another process got to run instead.)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Breakdown of highly active procedures.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The last section profiles where (referring to the pc) in a procedure the Spy found execution taking place, for those procedures which the Spy found were actually being executed (i.e. not just in the call stack) better than 1% of the time. These are the procedures printed in bold face in the section before. A typical entry might look like this:
Waterlily.PruneList (2) = 984, 5 (31.8%) (32.0%).
(5835) = 8 (0.3%).
(5917) = 2 (0.1%).
(5950) = 3 (0.1%).
(5984) = 1 (0.1%).
(6004) = 2 (0.1%).
(6022) = 196 (6.3%).
(6048) = 214 (6.9%).
(6066) = 70 (2.3%).
(6093) = 482 (15.6%).
(6150) = 6 (0.2%).
The first line is exactly the same information found in the previous section (sans indentation) provided again here for convenience. In this case, we can see that the procedure Waterlily.PruneList was at the top of the call stack 31.8% of the time.
The next ten lines break out where the pc was for all 984 times that Waterlily.PruneList was at the top of the call stack (8+2+3+1+2+196+214+70+482+6=984). Each line is of the form:
(<Tioga position in mesa file>) = <number of refs> (<percentage of total cpu>%)
Thus, selecting "6093" and clicking "Position" in the file Waterlily.mesa will show you the line which consumed 15.6% of the total cpu usage on this run of the Spy.
Interpreting Indentation.
The procedures as a whole are indented according to the following rule: A procedure is indented immediately under the deepest procedure that completely contains all of the calls to it. Usually this just means that a procedure is indented under the procedure that calls it, but it means something different if a procedure is called from more than one place. Indentation can be used to eliminate whole sub-trees of the output. If you aren't interested in a procedure, you can ignore all of the procedures indented below it.
Occasionally, a procedure will be indented under a procedure that doesn't call it directly. (The names of such procedures are printed in italics for easy identification.) For instance, if A calls B which calls C, and A calls D which calls C, then C is indented under A rather than B or D. This is because only A completely contains all of the calls to C. One consequence of this scheme is that utility procedures that are called from many places will appear high in the hierarchy even though they are always low in the call stack. I consider this a feature, since if you are spending 50% of your time in the allocator, you want that fact to jump out at you, especially if it is buried in hundreds of little calls.
Procedures that are all at the same level of indentation are grouped into sets of disjoint procedures using exclamation points (!). All of the procedures that are connected by a line of exclamation points are disjoint. That is, none of the procedures in the set ever calls any of the other procedures in the set, either directly or indirectly. Sets of disjoint procedures are connected with periods (.) to help the reader keep track of indentation. Disjoint sets are just a hint. Procedures from different sets may be disjoint even though they are not indicated as such. However, procedures in the same disjoint set are guaranteed to be disjoint.
This indentation scheme may not be the best, but it does not hurt since it can always be ignored. The indentations do not add any information, they just summarize information already given by the calls from other procedures. However I will be glad to entertain suggestions for a better scheme.
Using the Spy Effectively.
1) First use `Watching: {CPU}'. This will tell you which processes and procedures are consuming CPU cycles. Look at `Total page faults' and `Total words allocated' in the header to see if there are a lot of page faults or allocations. If there are, you might want to run the Spy with `Watching: {pagefaults}' or `Watching: {wordsAllocated}'.
2) If the Spy can't find the symbols it needs when it prints its log, it will print the data with numbers instead of procedure names. You do not need to run your experiment again to get the procedure names -- just acquire the symbols (by adding appropriate debug search rules, for instance) and click 'DisplayData!' again.
3) If the processor hog is a single process, use 'Watching: {breakProcess}' to gather more information about that process. Select the name of a top level procedure in the Spy's typescript and then click 'SetStartBreak!' and 'SetStopBreak!'. Then run your experiment again. When the first break is hit, the Spy will start monitoring the process that was calling the procedure.
4) If you suspect that things are running slowly because a procedure is being called too often, set a user break (click 'SetUserBreak!') on the procedure and run the experiment again. This will show how many times the procedure gets called and who called it.
How the Spy Works.
We know from Heisenberg that no tool could measure an event without influencing it. However, if you have an idea of how the tool measures, you may be able to factor out its influence. Most programmers should be able to use the Spy without understanding how it gathers data. But those who are really particular about accurate data (or where the data is likely to be inaccurate) should read this section.
The Spy works by a) choosing a process, b) making up a "stack" of GFI's and PC's for that process, and c) writing the stack into a special log which interfaces to the disk. A process is chosen according to the parameters set by the programmer at the beginning of the session. If the programmer tells the Spy to watch pagefaults then processes that cause pagefaults will be logged. If the programmer tells the Spy to watch user-defined breaks, then processes that cause breaks will be logged.
If the programmer tells the Spy to watch the CPU, then the Spy will wake up at regular intervals and log the top process on the ready list. It finds the "top process" by checking priority levels from top to bottom looking for a process that is ready and able to run (A process would not be able to run if there were no state vector at its level). There are two ways to wake up at regular intervals: use a special IntervalTimer or wake up on the vertical retrace. The IntervalTimer the best way to wakeup but it isn't always available. So sometimes the Spy wakes up on the vertical retrace.
There are some bad side effects of waking up on the vertical retrace. Since timeouts are all synchronized to the vertical retrace, actions that are triggered by timeouts may appear to be running far more than they actually are. To counter this, the Spy tries to guess which processes run off of the vertical retrace. When it is first run, it scans the process table looking for processes that have very short timeouts. If it finds any, it saves them in a table. When it scans the ready list it skips over those processes if they are still at the pc locations that they were when they were waiting. This heuristic is only partically effective, but it is better than nothing.
Once the Spy has decided which process to record, it builds a stack by chasing up the frame links and writing down the GFI's and PC's. If the stack has more that 75 frames, it aborts its work and writes an error entry in the log (if this ever occurs, it is noted under "Statistics on the execution of the Spy"). Otherwise it writes the stack it constructed into the log.
The log consists of two pinned buffers backed up by a file named "Spy.log". The Spy will write in one buffer until it is full and then write in the other. As it writes into the second, a background process writes the first one to the disk and remaps it to the next section of the file. A background process is used since the Spy cannot page fault. The background process may appear in the Spy's output. If it does, it will have the name "SpyLogImpl.NewBuffer".
Occasionally, the disk will be so busy that the first buffer will not be ready for the Spy when it has finished filling the second. When this happens the Spy just hangs until the first buffer is ready. If the Spy is watching the CPU, this means that it may miss a few ticks of the vertical retrace. You can get some idea of how many ticks were missed by looking at the average number of counts per second that is given at the beginning of the output. If the average is less that 80, some information was lost. If you suspect that this is a problem, set SpyKernelImpl.monitor to TRUE and SpyLogReaderImpl.recordSpyTime to TRUE. From then on the Spy will record the distribution of delays between counts under the header "Statistics on the execution of the Spy".