PerftoolsSummary.tioga -- Foote, June 14, 1991 6:15 pm PDT

Here's a summary of the performance monitoring tools available in PCedar:

Spy

What it does:
The Spy is designed to be the main tool for analyzing the performance of programs in PCedar. With the Spy, the programmer can see which procedures are consuming CPU cycles, which are using the allocator, or which are calling a particular procedure. The main paradigm of the Spy is to record the call stack of an interesting process. The definition of "interesting" varies according to what the Spy is measuring. If the Spy is measuring CPU usage, it records the stack of the top-most active process at regular intervals. The only things recorded are the procedures in the call stack; parameters and local variables are not recorded.

Strategy you should use:
Get your data, then fire up a Spy on it. Undo the node merging. Follow the yellow (brick?) path to the right. Stay with the tallest yellow box. When you get down a few nodes, pick one, do the "ctrl-shift-click" thing, and set it as the new root. Now look for bottlenecks.

Documentation:
SpyDoc.tioga -- Describes the data-collection part of the PCedar Spy.
SpyToolDoc.tioga -- Describes the PCedar-based SpyTool used to analyze data created with the SpyStart/SpyStop commands.

Requirements:
You need a PCedar commander running in your PCR-based runtime to run the data collection tools. (It would be an easy hack to add a couple of buttons to BWS to eliminate this requirement.)
You have to be running PCR 3_1.X or later.
You don't have to recompile your code - it works fine on optimized code.
You don't need to be running any special kind of kernel.

Editorial:
You really need a color workstation to browse (not collect) the data effectively. This might as well be a requirement.
The definition of "interesting" isn't quite right for some things (like looking into PCR). As I understand it, samples are taken on thread switches, which means that PCR calls will never be leaves at sample time.
The SpyTool documentation is good, but could be stronger. I wish it described the grey color coding. I wish it gave some hints on user strategies.
This should be the first tool used.

SparcAids

What it does:
SparcAids is really two tools: an object code disassembler (useful for getting into the inner loops) and a register window stat package. There's not much to say about the object code disassembler (it works; there are some limitations on disassembling floating-point ops). The register window stat package is useful for examining the code locality of your application. Lots of register window overflows and underflows => poor code locality (i.e., lots of stack frames). The strategy is to get the kernel to keep these stats by running a modified SunOS kernel; this tool just allows you to grab those stats from the modified kernel.

Strategy you should use:
You probably want to use the register window stat package in conjunction with the other perf tools. When you can't find a hot spot, the reason may be that your call stacks are deep. You can use the register window stat package to find out. Use the object code disassembler to get into the inner loops of frequently called primitives.

Documentation:
SparcAidsDoc.tioga

Requirements:
You need a PCedar commander running in your PCR-based runtime to run these tools.
You don't have to recompile your code - it works fine on optimized code.
You need to be running a modified kernel.

Editorial:
The documentation is good. But I wish it would describe the kernel mods in more detail (like what they are and how to apply them).
If you're using the disassembler you're probably missing the big picture, unless you're working on something like BitBlt.
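For readers who haven't met SPARC register windows: each non-leaf procedure call takes a fresh hardware window, and once the small fixed set of windows on the chip (often seven or eight usable ones) is exhausted, further calls trap into the kernel to spill a window and the matching returns trap again to reload one; that trap traffic is what the overflow and underflow counts measure. Below is a hedged C illustration of the code shape that drives those counts up. It is not part of SparcAids, and the function names are made up.

    /* Illustration only: deep call chains versus a flat loop.
     * On a SPARC, sumRecursive at a depth much greater than the number of
     * hardware register windows forces an overflow trap on the way down and
     * an underflow trap on the way back up for most of the calls, while
     * sumIterative stays within a single window. */
    #include <stdio.h>

    static long sumRecursive(long n)    /* one register window per activation */
    {
        return (n == 0) ? 0 : n + sumRecursive(n - 1);
    }

    static long sumIterative(long n)    /* a single frame, no window traffic */
    {
        long total = 0;
        while (n > 0)
            total += n--;
        return total;
    }

    int main(void)
    {
        printf("%ld %ld\n", sumRecursive(10000), sumIterative(10000));
        return 0;
    }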
Leafy

What it does:
Leafy is a simple command that takes leaf-level samples of control flow and provides a brief summary of those samples sorted by frequency. It requires a SunOS 4.1 kernel with a modified profil call.

Strategy you should use:
The samples taken by Leafy are more frequent and regular than those taken by the Spy - so Leafy is very good for looking at lower-level primitives like PCR. Use Leafy like you'd use the Spy.

Documentation:
LeafySampleDoc.tioga
This should be in LeafySampleDoc: Leafy only works with VP=1. It does not check this; it simply gives results that may be very wrong.

Kernel mods:
To use LeafySample you need to be running on a kernel which has had the following modifications:
1. In the kernel build area, replace .../sun4c/OBJ/kern_clock.o with /net/gharlane/cherokee/SunOSNetInstall/sun4c-fcs/kernel/profiling/kern_clock.o (from kern_clock.c in the same directory).
2. In the kernel build area, replace .../sun4c/OBJ/addupc.o with /net/gharlane/cherokee/SunOSNetInstall/sun4c-fcs/kernel/profiling/addupc.o (from addupc.s in the same directory).

Requirements:
You need to run your PCR with VP=1.
You need a PCedar commander running in your PCR-based runtime to run these tools.
You don't have to recompile your code - it works fine on optimized code.
You do need to be running a modified kernel.

Editorial:
The documentation is very weak. I wish it would describe the kernel mods in more detail (like what they are and how to apply them). It doesn't give any hints on strategies for using the tool.

PerformanceMon

What it does:
PerformanceMon is the package that allows use of traditional Unix-style profiling in PCR-based worlds. If a count of procedure entries is needed, or if calling graphs are desired, PerformanceMon depends on recompiling the application to allow the compiler to insert extra code at procedure entries. PerformanceMon allocates a vector of size proportional to the size of the code address space. With the advent of the Spy this code is obsolete.

Documentation:
PerformanceMonDoc.tioga

Requirements:
You need a PCedar commander running in your PCR-based runtime to run these tools.
You have to recompile your code with profiling.
You don't need to be running a modified kernel.

Editorial:
This tool was just the thing when the Spy wasn't available. It can still be useful if you don't like the Spy's idea of when samples should be taken. Better yet, use Leafy, which has the same sampling pattern as PerformanceMon.
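The Leafy and PerformanceMon descriptions above both come down to clock-tick pc sampling through the kernel's profil mechanism (Leafy via a modified profil call, PerformanceMon via traditional Unix profiling): on every tick the kernel bumps a counter in a user-supplied histogram indexed by the interrupted pc. Here is a minimal, hedged C sketch of what that looks like from an ordinary Unix program; the exact prototype and pc-to-bin scaling differ between SunOS 4.x and other systems, and work() and NBINS are placeholders rather than anything from either tool.

    /* A hedged sketch of profil(2)-style pc sampling: the kernel increments
     * one 16-bit bin per clock tick, chosen from the interrupted pc.
     * The prototype and header vary by system; this follows the common BSD form. */
    #include <stdio.h>
    #include <stddef.h>
    #include <unistd.h>                 /* profil() on BSD-derived systems */

    #define NBINS 65536                 /* placeholder size; the real tools size the
                                           vector from the code address space */
    static unsigned short bins[NBINS];
    static volatile double sink;

    static void work(void)              /* placeholder workload to be sampled */
    {
        long i;
        for (i = 0; i < 50000000L; i++)
            sink += (double)i * 0.5;
    }

    int main(void)
    {
        size_t base = (size_t)work;     /* start of the address range to profile */
        int i;

        /* scale 0x10000 asks for a one-to-one mapping of pc values to bins;
           see profil(2) for the exact fixed-point scaling rules */
        profil(bins, sizeof bins, base, 0x10000);
        work();
        profil(NULL, 0, 0, 0);          /* a zero scale turns sampling off */

        for (i = 0; i < NBINS; i++)     /* a Leafy-style summary would rank
                                           these bins by count */
            if (bins[i] != 0)
                printf("bin %d: %u ticks\n", i, bins[i]);
        return 0;
    }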
CodeTimer

What it does:
Whereas the Spy can tell you what fractions of the CPU time spent between line A and line B went to which routines, CodeTimer tells you how long it took to get from A to B. Together, they tell you what to fix and how well you did. To use CodeTimer, you add a call to StartInterval before line A and a call to StopInterval after line B. CodeTimer maintains several tables of named intervals (e.g., one per application). For each interval, it remembers the minimum, maximum, and average time taken to execute that interval since the last time the interval counter was reset. The StartInterval and StopInterval calls are intended to be fast enough that you can leave them in your code. Currently CodeTimer prints the statistics onto a STREAM in human-readable format. If there is a demand, I (Eric Bier) will add procedures to query the intervals procedurally.

Strategy you should use:
Here's how the Gargoyle project uses CodeTimer: Gargoyle has many timed intervals in its code. After large changes to Gargoyle, the implementors run a standard set of Gargoyle scripts. Each script resets the CodeTimer tables, runs the benchmark, opens a Typescript, and prints the times for all timed intervals that were exercised. The results for intervals of interest are copied into a large Tioga file, GGPerformance, which contains a history of measurements.

Documentation:
CodeTimerDoc.tioga

Requirements:
You need a PCedar commander running in your PCR-based runtime to run these tools.
You have to edit and recompile your code, but it works fine on optimized code.
You don't need to be running a modified kernel.

Editorial:
The documentation is very good. This is the kind of thing you'd want to routinely use in test suites to notify you when some underlying software update hosed your performance.

DeltaResource

What it does:
DeltaResource is a simple command to measure resource usage during the execution of other commands. DeltaResource is also a simple interface that provides a thin veneer over the Unix rusage call.

Strategy you should use:
You might want to use DeltaResource to help you look at the system resource usage of a particular application.

Documentation:
DeltaResourceDoc.tioga

Requirements:
You need a PCedar commander running in your PCR-based runtime to run these tools.
You don't have to recompile your code - it works fine on optimized code.
You don't need to be running a modified kernel.

Editorial:
The documentation is very good. Think of it as an interface to the rusage() system call.
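Since DeltaResource is described as a thin veneer over rusage, the underlying idea is easy to show in ordinary C with the standard getrusage() call: snapshot the counters, run the work, and report the differences. This is a sketch of the system call itself, not of DeltaResource's interface; runStuff() stands in for whatever command you would measure.

    /* A sketch of the rusage-delta idea behind DeltaResource: sample
     * getrusage() before and after some work and print the differences. */
    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    static double seconds(struct timeval tv)
    {
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    static void runStuff(void)          /* placeholder for the command being measured */
    {
        volatile long x = 0;
        long i;
        for (i = 0; i < 20000000L; i++)
            x += i;
    }

    int main(void)
    {
        struct rusage before, after;

        getrusage(RUSAGE_SELF, &before);
        runStuff();
        getrusage(RUSAGE_SELF, &after);

        printf("user cpu:      %.3f s\n",
               seconds(after.ru_utime) - seconds(before.ru_utime));
        printf("system cpu:    %.3f s\n",
               seconds(after.ru_stime) - seconds(before.ru_stime));
        printf("page faults:   %ld\n", after.ru_majflt - before.ru_majflt);
        printf("page reclaims: %ld\n", after.ru_minflt - before.ru_minflt);
        return 0;
    }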
ExamineStorage

What it does:
ExamineStorage provides a command that enumerates the heap and prints out statistics on the N most memory-consuming TYPEs of objects.

Strategy you should use:
You might want to use ExamineStorage in conjunction with the other perf tools to track down particular storage leaks or frequent allocations.

Documentation:
ExamineStorageDoc.tioga

Requirements:
You need a PCedar commander running in your PCR-based runtime to run ExamineStorage.
You don't have to recompile your code - it works fine on optimized code.
You don't need to be running a modified kernel.

Editorial:
The documentation is very good.

Druid

What it does:
Druid is a package that can be used to count how many times control passes through given points in the system. Although much more limited than the Spy, it is also more interactive and perturbs the system less. Counting breakpoints can be set on high-priority processes or even in code as sensitive as SafeStorage allocation. The Druid interface provides procedures for setting, clearing, and querying counting breakpoints. Counting breakpoints can be either monitored or unmonitored.

Strategy you should use:
You can use Druid to interactively browse calling sequences.

Documentation:
DruidDoc.tioga

Requirements:
Currently you can only run Druid from within a Viewers-based world.
Depends on Cirio for symbol lookup.
You can only set counting breakpoints in unoptimized code.
You don't need to be running a modified kernel.

Editorial:
The documentation is good. The support is likely to be spotty, as there isn't anybody in CSL currently maintaining this tool. It shouldn't be too hard to build a command-line U/I for this tool so that the Viewers world requirement could be relaxed.

TimeZone

What it does:
TimeZone is a package that can be used to measure how long it takes to execute a piece of code. The TimeZone interface provides procedures for setting, clearing, and querying timing breakpoints. Timing breakpoints come in two flavors (though more are possible). Interval breakpoints record how long it takes to get from one point in the program to another point; use these, for example, to measure how long it takes to execute a block. IterationTiming breakpoints record how long it takes to get from a point in the program back to that same point. Either type of timing breakpoint can be either monitored or unmonitored.

Strategy you should use:
You might want to use TimeZone to help you look at the performance of a particular stretch of code.

Documentation:
TimeZoneDoc.tioga

Requirements:
Currently you can only run TimeZone from within a Viewers-based world.
You can only set timing breakpoints in unoptimized code.
You don't need to be running a modified kernel.

Editorial:
The documentation is good. The support is likely to be spotty, as there isn't anybody in CSL currently maintaining this tool. It shouldn't be too hard to build a command-line U/I for this tool so that the Viewers world requirement could be relaxed.

FullStatsPackage (tentative name)

What it does:
We have been focusing on the threads behavior of PCR and have added monitoring code to a copy of PCR that provides information about thread switching behavior, monitor lock usage, and condition variable usage. The information can be gathered in one or both of two ways: via a hash table that gathers information for each thread, monitor lock, and condition variable in the system, and via two trace buffers that gather information for individual events, such as thread switches or monitor entries. One of the trace buffers is the SunOS vtrace buffer in the Unix kernel. The other trace buffer resides at the PCR level.

Strategy you should use:
Wait until this is released.

Documentation:
None yet.

Requirements:
You need to be running the modified PCR.
You need to be running a modified kernel.
It works fine with optimized code.

Editorial:
I asked Marvin for this brief description just so that you'd have some idea of what they're working on. He'd like me to restate: "Let me finish by reiterating that things are in flux and that we're not sure how much of our code should make its way into any 'released' version of PCR. That's something for discussion, I suppose."
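FullStatsPackage isn't released or documented yet, so nothing below is its interface; but to make the "trace buffer that gathers information for individual events" idea concrete, here is a generic, hedged C sketch of the fixed-size event ring that such monitoring code typically keeps. All type and procedure names are invented for illustration.

    /* A generic illustration of the trace-buffer idea mentioned above:
     * a fixed-size ring of per-event records, cheap enough to leave enabled.
     * These names are invented; this is not the FullStatsPackage interface. */
    #include <stdio.h>

    typedef enum { EV_THREAD_SWITCH, EV_MONITOR_ENTRY } EventKind;

    typedef struct {
        EventKind kind;
        unsigned long timestamp;   /* whatever cheap clock is available */
        unsigned long id;          /* e.g., a thread id or monitor lock address */
    } TraceEvent;

    #define TRACE_SIZE 4096        /* power of two so wraparound is a mask */
    static TraceEvent trace[TRACE_SIZE];
    static unsigned long traceNext = 0;

    static void traceRecord(EventKind kind, unsigned long timestamp, unsigned long id)
    {
        TraceEvent *e = &trace[traceNext++ & (TRACE_SIZE - 1)];
        e->kind = kind;
        e->timestamp = timestamp;
        e->id = id;
    }

    /* Dump the retained events, oldest first, allowing for wraparound. */
    static void traceDump(void)
    {
        unsigned long i;
        unsigned long start = (traceNext >= TRACE_SIZE) ? traceNext - TRACE_SIZE : 0;
        for (i = start; i < traceNext; i++) {
            TraceEvent *e = &trace[i & (TRACE_SIZE - 1)];
            printf("%lu %s %lu\n", e->timestamp,
                   e->kind == EV_THREAD_SWITCH ? "switch" : "enter", e->id);
        }
    }

    int main(void)
    {
        unsigned long t;
        for (t = 0; t < 10; t++)   /* fake a few events to show the record format */
            traceRecord(t % 2 ? EV_MONITOR_ENTRY : EV_THREAD_SWITCH, t, 100 + t % 3);
        traceDump();
        return 0;
    }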