DOCUMENTATION FOR THE CEDAR SPY last edited by: John Maxwell on: November 12, 1982 3:07 pm -- INTRODUCTION -- The Cedar Spy is designed to be the main tool for analyzing the performance of programs in Cedar. It consists of of an array of means for viewing the execution of a program all presented in a single, consistent form. With the Spy, the programmer can see which procedures are consuming CPU cycles, which are causing page faults, which are using the allocator, or which are calling a particular procedure. When the programmer narrows his focus to just one process, the Spy will tell him where that process is spending its time, where it is waiting on page faults, where it is waiting on monitor locks, where it is waiting on condition variables, and when it is preempted by other processes. In addition, the programmer can measure precisely what he is interested in since the Spy provides a facility for setting breakpoints to determine where the Spy should start and stop its measurements. Hopefully a programmer will find that the Spy is all he needs for the bulk of his performance analysis. The main paradigm of the Spy is to record the call stack of an interesting process. The definition of "interesting" varies according to what the Spy is measuring. If the Spy is measuring page faults, it records the stack of the process that just page faulted. If the Spy is measuring CPU usage, it records the stack of the top-most active process at the time of a vertical retrace. The only things recorded are the procedures in the call stack; parameters and local variables are not saved. All of the functionality provided by the Spy viewer can also be accessed programmatically through the interface "SpyClient". For documentation, see that interface. -- OBTAINING THE SPY -- The Spy can be obtained by bringing over the appropriate files and loading "Spy.bcd". To bring over the files, type Bringover /a /p [Indigo]<Cedar>Top>Spy.df to the user executive. Typing "Run Spy" will cause a Spy viewer to appear at the bottom of the screen. (The Spy viewer looks like a magnifying glass laying on top of the word "Spy"). If you destroy the Spy viewer, you can obtain another one by typing "Spy" to the UserExec. -- PREPARING THE SPY FOR ITS MISSION -- The Spy provides the programmer with a number of parameters that determine exactly what it will measure. These parameters allow the programmer to tailor the Spy somewhat to his particular needs. The parameters are changed by clicking them with the mouse; repeated clickings will cycle a parameter through all of the possible values. Most people will find that the default parameters provide a good starting point. Watching: {CPU, pagefaults, wordsAllocated, allocations, spaces, user breaks} The "Watching" parameter determines what the Spy should measure. * If "CPU" is chosen, the Spy will wake up 80 times a second (10 on a Dolphin) and record the call stack of the process that is scheduled to run next. (If the Spy is to watch exactly one process, the data will be recorded for that process no matter what it is doing.) * If "pagefaults" is chosen, the Spy will record the call stack of the process that pagefaulted. * The difference between "allocations" and "wordsAllocated" is that in the former 1 count is recorded for each ALLOCATION and in the second 1 count is recorded for each WORD ALLOCATED, which would be the same a recording n counts for the allocation, where n is the number of words allocated. * If "spaces" is chosen, the spy will count the number of spaces created and deleted. This is useful for looking for space leaks. Note that you cannot just compare creations with deletions: if subspaces are created, deleting the parent space will also delete its children. * If "user breaks" is chosen, the Spy will record data each time it encounters a breakpoint set using "SetUserBreak!". SetUserBreak! ClearUserBreaks! These two commands allow the user to specify breakpoints that will cause the Spy to log data. The breaks are only enabled while the Spy is running and the "Watching" mode is "user-defined". There are a number of ways of specifying a breakpoint: * Selection in an Impl module. The selection does not have to be a procedure name, it can be anywhere in the code (just like Bugbane). * Selection of a procedure name in an Interface module. This method will only work if the implementation module of Foo is FooImpl. * Selection of a rope of the form "FooImpl.Proc". The selection can be anywhere, even the User Exec. You should know that the Spy will not be able to set a breakpoint if a) the module has not been loaded yet, or b) the symbols are not on the disk. If "ClearUserBreaks!" is bugged, ALL of the breakpoints will go away. Watch {all processes, this process: , break process} [ ] This parameter is used only in conjunction with "watching: {CPU}". If "all processes" is specified, then the Spy will record the call stack of the next process to run. If "this process: " is specified, then the Spy will record the call stack of the process given in the box that follows. If "break process" is specified, the Spy will watch the process of the first start break encountered. -- STARTING AND STOPPING THE SPY -- There are two main ways of starting and stopping the Spy: by bugging "Spy: {on}" and "Spy: {off}", or by setting breakpoints that determine programmatically where the Spy is to start and stop. Start and Stop breakpoints are set in exactly the same manner as given above for user breakpoints. The only difference is that if a Stop breakpoint is set on a procedure, the break will occur at the EXIT of the procedure rather than its entry. Setting a Start or Stop breakpoint does not actually enable the Spy to run. Instead, the "Spy: {off}" switch will be changed to say "Break: {off}". The Start and Stop breaks are only enabled when the programmer has set the switch to "Break: {on}". This allows the programmer to specify all of the breaks he wants before the Spy starts measuring. Changing the switch to "Break: {off}" doesn't remove the breaks, it just disables them. Thus the user can reuse the breaks he has already set. The breakpoint facility is designed so that the Spy is recording data as long as the number of Start breaks exceeds the number of Stop breaks. This allows breaks to be set in recursive procedures. When "ClearBreaks!" is bugged, ALL of the Start and Stop breaks are removed and the on/off switch is changed to say "Spy: {off}". -- DISPLAYING THE DATA -- Clicking "DisplayData!" will cause the data to be displayed in a new typescript. It will also automatically turn off the Spy if it is running. "DisplayData!" does not destroy the data stored. If some of the symbols were missing, the user can fetch them and run the command again. The next invocation will take advantage of the new symbols. -- INTERPRETING THE RESULTS -- The Spy's output is divided into three main parts: a header, a breakdown of the processes, and a breakdown of the procedures. The header gives overall statistics about the execution of the Spy while the two breakdown sections are focussed on detailed data. The three sections are separated by double rows of tildas (~~~). ================================== Cedar Spy of: 29-Apr-82 9:19:41. Executed at: 29-Apr-82 9:20:15. . . . The header is pretty much self-evident. At the top it records the parameters that the programmer specified for this particular run: what was being measured, where breaks were set, etc. Below this it gives overall statistics such as how long the Spy ran, how many wakeups were recorded, what priorities did things run at, how many page faults occurred overall, and so on. At the bottom is a section called "Statistics on the execution of the Spy" which gives some data about the running of the Spy itself. This is only for debugging purposes. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Breakdown of interesting processes. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The second section gives a list of the interesting processes that ran and some data about them. A typical entry might look like this: Process 220B running at priority [normal, pagefaultLow] (39 page faults) = 609 (32.6%). BBStart.EvalBase = 608 (32.5%). CachedRegionImplB = 1 (0.1%). The first line says that process 220B ran at two different priorities (normal and pagefaultLow) and that 39 page faults occurred as an immediate consequence of its work. It also says that 609 counts were recorded for this process, and that this accounted for 32.6% of the time. The second and third lines says that of these 609 counts, 608 counts were in BBStart.EvalBase and 1 count was in CachedRegionImplB. These counts corresponded to 32.5% and 0.1% of all of the counts respectively. Process 220B has two entries because it was shared by more than one logical process. Most processes will only have one entry unless a lot of processes are being forked. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Breakdown of interesting procedures. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The last section gives a list of modules and procedures indented so as to indicate precedence relationships. (Module information is only printed when the Spy is unable to obtain the symbols necessary to print out the procedures.) Each module or procedure that occurs in more than 3.0% of the call stacks will be printed exactly once somewhere in this section. A typical entry might look like this: Procedure StackSpy.AddSon (2 refs) = 93, 2794 (2.9%) (92.6%). (20154) StackSpy.AddProcedure = 1946 (62.4%). (20277) RTPrefAllocImpl.NewPrefixedObject = 78 (2.5%). (20348) StackSpy.AddProcedure = 632 (20.2%). (20398) RTPrefAllocImpl.NewPrefixedObject = 60 (1.9%). (20444) waiting.pagefault = 59 (1.9%). The first line indicates that the procedure StackSpy.AddSon was called from two places (2 refs) and was in the call stack a total of 92.6% of the time. This 92.6% of the time was divided into 93 counts when the CPU was executing code in the procedure itself, and 2794 counts when the CPU was executing code in procedures called from AddSon. The 2.9% is only there because the number of counts that were in the procedure itself was great enough (>1.0%) to force it to print a percentage for just those counts. Note that percentages are always for the amount of time that a procedure is IN THE CALL STACK. This means that percentages may add up to more than 100% since most call stacks have more than one procedure. The meanings of these numbers are printed in the header of the output for easy reference. The number of refs to a procedure is not the same as the number of calls; it indicates the number of different places that a call to this procedure occurs in the static text of the code. In this case, AddSon was called from two different places in the procedure StackSpy.EnterStack. The next five lines give the list of procedures that occur in the call stack under AddSon more than 1.0%. These are the significant procedures that AddSon calls. They are sorted by PC, and so they are listed in the order that they appear in the code listings. The numbers in parenthesis at the beginning are the source indexes for the calls. Selecting one of the numbers and clicking "Position" in the file StackSpy.mesa will show you where the call was made. If a procedure is called from more than one place (as StackSpy.AddProcedure is) then it will appear in the list more than once. Since the Spy does not print procedures that are called less than 1.0%, the sum of the counts for the procedures listed (1946+78+632+60+59 => 2775) may be less that the total number of calls given in the procedure header (2794). The count that appears after each procedure listed is the number of times that that procedure was in the call stack under AddSon. The percentage is the ratio of that count to the total number of counts. The last entry, waiting.pagefault, is an entry that only appears when exactly one process is being observed. It says that AddSon was waiting on a pagefault for 1.9% of the time. (Other "waiting" procedures are ML (monitor lock), CV (condition variable) and preempted. Preempted means that the process was ready to run, but another process got to run instead.) -- INTERPRETING INDENTATION -- The procedures as a whole are INDENTED according to the following rule: A procedure is indented immediately under the deepest procedure that completely contains all of the calls to it. Usually this just means that a procedure is indented under the procedure that calls it, but it means something different if a procedure is called from more than one place. Indentation can be used to eliminate whole sub-trees of the print out. If you aren't interested in a procedure, you can ignore all of the procedures indented below it. Occassionally, a procedure will be indented under a procedure that doesn't call it directly. (The names of such procedures are printed in italics for easy identification.) For instance, if A calls B which calls C, and A calls D which calls C, then C is indented under A rather than B or D. This is because only A completely contains all of the calls to C. One consequence of this scheme is that utility procedures that are called from many places will appear high in the hierarchy even though they are always low in the call stack. I consider this a feature, since if you are spending 50% of your time in the allocator, you want that fact to jump out at you, especially if it is buried in hundreds of little calls. Procedures that are all at the same level of indentation are grouped into sets of disjoint procedures using exclamation points (!). All of the procedures that are connected by a line of exclamation points are disjoint. That is, none of the procedures in the set ever calls any of the other procedures in the set, either directly or indirectly. Sets of disjoint procedures are connected with periods (.) to help the reader keep track of indentation. Disjoint sets are just a hint. Procedures from different sets may be disjoint even though they are not indicated as such. However, procedures in the same disjoint set are guaranteed to be disjoint. This indentation scheme may not be the best, but it does not hurt since it can always be ignored. The indentations do not add any information, they just summarize information already given by the calls from other procedures. However I will be glad to entertain suggestions for a better scheme. -- HOW THE SPY WORKS -- Heisenburg indicated that no tool could measure an event without influencing it. However, if you have an idea of how the tool measures, you may be able to factor out its influence. Most programmers should be able to use the Spy without understanding how it gathers data. But those who are really particular about accurate data (or where the data is likely to be inaccurate) should read this section. The Spy works by a) choosing a process, b) making up a "stack" of GFI's and PC's for that process, and c) writing the stack into a special log which interfaces to the disk. A process is chosen according to the parameters set by the programmer at the beginning of the session. If the programmer tells the Spy to watch pagefaults then processes that cause pagefaults will be logged. If the programmer tells the Spy to watch user-defined breaks, then processes that cause breaks will be logged. If the programmer tells the Spy to watch the CPU, then the Spy will wake up on the vertical retrace and log the top process on the ready list. It finds the "top process" by checking priority levels from top to bottom for a process that is ready and able to run (A process would not be able to run if there were no state vector at that level). Since timeouts are all synchronized to the vertical retrace, actions that are triggered by timeouts may appear to be running far more than they actually are. To counter this, the Spy skips over a process if it appears at one of the several well-known timeout locations (such as WAIT tty.hasChar in TTYSWsB). This allows the Spy to accurately measure the rest of the processes. Note, however, that there is no way to accurately measure a process that is itself driven by timeouts. (Ed Taft is looking into means of breaking up the synchonization by providing some sort of random wakeup in the microcode. Until that happens, we shall have to live with the problem.) Once the Spy has decided on a process, it builds a stack by chasing up the frame links and writing down the GFI's and PC's. If the stack has more that 75 frames, it aborts its work and writes an error entry in the log (if this ever occurs, it is noted under "Statistics on the execution of the Spy"). Otherwise it writes the stack it constructed into the log. The log consists of two pinned buffers backed up by a file named "Trace.data". The Spy will write in one buffer until it is full and then write in the other. As it writes into the second, a background process writes the first one to the disk and remaps it to the next section of the file. A background process is used since the Spy CANNOT page fault. The background process may appear in the Spy's output. If it does, it will have the name "SpyLogImpl.NewBuffer". Occasionally, the disk will be so busy that the first buffer will not be ready for the Spy when it has finished filling the second. When this happens the Spy just hangs until the first buffer is ready. If the Spy is watching the CPU, this means that it may miss a few ticks of the vertical retrace. You can get some idea of how many ticks were missed by looking at the average number of counts per second that is given at the beginning of the output. If the average is less that 80, some information was lost. If you suspect that this is a problem, set the debugger's context to "SpyKernel" (in the configuration "Spy") and set the local variable "monitor" to TRUE. From then on the Spy will record the distribution of delays between counts under the header "Statistics on the execution of the Spy".