PerftoolsSummary.tioga -- Foote, June 14, 1991 6:15 pm PDT

Here's a summary of the performance monitoring tools available in PCedar:

Spy

What it does:
The Spy is designed to be the main tool for analyzing the performance of programs in PCedar. With the Spy, the programmer can see which procedures are consuming CPU cycles, which are using the allocator, or which are calling a particular procedure. The main paradigm of the Spy is to record the call stack of an interesting process. The definition of "interesting" varies according to what the Spy is measuring. If the Spy is measuring CPU usage, it records the stack of the top-most active process at regular intervals. The only things recorded are the procedures in the call stack; parameters and local variables are not recorded.

Strategy you should use:
Get your data, then fire up a Spy on it. Undo the node merging. Follow the yellow (brick?) path to the right. Stay with the tallest yellow box. When you get down a few nodes, pick one, do the "ctrl-shift-click" thing, and set it as the new root. Now look for bottlenecks.

Documentation:
SpyDoc.tioga -- Describes the data-collection part of the PCedar Spy.
SpyToolDoc.tioga -- Describes the PCedar-based SpyTool used to analyze data created with the SpyStart/SpyStop commands.

Requirements:
You need a PCedar commander running in your PCR-based runtime to run the data collection tools. (It would be an easy hack to add a couple of buttons to BWS to eliminate this requirement.)
You have to be running PCR 3_1.X or later.
You don't have to recompile your code - it works fine on optimized code.
You don't need to be running any special kind of kernel.

Editorial:
You really need a color workstation to browse (not collect) the data effectively. This might as well be a requirement.
The definition of "interesting" isn't quite right for some things (like looking into PCR). As I understand it, samples are taken on thread switches, which means that PCR calls will never be leaves at sample time.
The SpyTool documentation is good, but could be stronger. I wish it described the grey color coding. I wish it gave some hints on user strategies.
This should be the first tool used.

SparcAids

What it does:
SparcAids is really two tools: an object code disassembler (useful for getting into the inner loops) and a register window stat package. There's not much to say about the object code disassembler (it works; there are some limitations on disassembling floating-point ops). The register window stat package is useful for examining the code locality of your application. Lots of register window overflows and underflows => poor code locality (i.e., lots of stack frames). The strategy is to get the kernel to keep these stats by running a modified SunOS kernel; this tool just allows you to grab those stats from the modified kernel.

Strategy you should use:
You probably want to use the register window stat package in conjunction with the other perf tools. When you can't find a hot spot, the reason may be that your call stacks are deep. You can use the register window stat package to find out. Use the object code disassembler to get into the inner loops of frequently called primitives.

Documentation:
SparcAidsDoc.tioga

Requirements:
You need a PCedar commander running in your PCR-based runtime to run these tools.
You don't have to recompile your code - it works fine on optimized code.
You need to be running a modified kernel.

Editorial:
The documentation is good. But I wish it would describe the kernel mods in more detail (like what they are and how to apply them).
If you're using the disassembler you're probably missing the big picture, unless you're working on something like BitBlt.
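For readers who haven't met SPARC register windows: each non-leaf procedure call takes a fresh hardware window, and once the small fixed set of windows on the chip (often seven or eight usable ones) is exhausted, further calls trap into the kernel to spill a window and the matching returns trap again to reload one; that trap traffic is what the overflow and underflow counts measure. Below is a hedged C illustration of the code shape that drives those counts up. It is not part of SparcAids, and the function names are made up.

    /* Illustration only: deep call chains versus a flat loop.
     * On a SPARC, sumRecursive at a depth much greater than the number of
     * hardware register windows forces an overflow trap on the way down and
     * an underflow trap on the way back up for most of the calls, while
     * sumIterative stays within a single window. */
    #include <stdio.h>

    static long sumRecursive(long n)    /* one register window per activation */
    {
        return (n == 0) ? 0 : n + sumRecursive(n - 1);
    }

    static long sumIterative(long n)    /* a single frame, no window traffic */
    {
        long total = 0;
        while (n > 0)
            total += n--;
        return total;
    }

    int main(void)
    {
        printf("%ld %ld\n", sumRecursive(10000), sumIterative(10000));
        return 0;
    }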
Leafy

What it does:
Leafy is a simple command that takes leaf-level samples of control flow and provides a brief summary of those samples sorted by frequency. It requires a SunOS 4.1 kernel with a modified profil call.

Strategy you should use:
The samples taken by Leafy are more frequent and regular than those taken by the Spy - so Leafy is very good for looking at lower-level primitives like PCR. Use Leafy like you'd use the Spy.

Documentation:
LeafySampleDoc.tioga
This should be in LeafySampleDoc: Leafy only works with VP=1. It does not check this; it simply gives results that may be very wrong.

Kernel mods:
To use LeafySample you need to be running on a kernel which has had the following modifications:
1. In the kernel build area, replace .../sun4c/OBJ/kern_clock.o with /net/gharlane/cherokee/SunOSNetInstall/sun4c-fcs/kernel/profiling/kern_clock.o (from kern_clock.c in the same directory).
2. In the kernel build area, replace .../sun4c/OBJ/addupc.o with /net/gharlane/cherokee/SunOSNetInstall/sun4c-fcs/kernel/profiling/addupc.o (from addupc.s in the same directory).

Requirements:
You need to run your PCR with VP=1.
You need a PCedar commander running in your PCR-based runtime to run these tools.
You don't have to recompile your code - it works fine on optimized code.
You do need to be running a modified kernel.

Editorial:
The documentation is very weak. I wish it would describe the kernel mods in more detail (like what they are and how to apply them). It doesn't give any hints on strategies for using the tool.

PerformanceMon

What it does:
PerformanceMon is the package that allows use of traditional Unix-style profiling in PCR-based worlds. If a count of procedure entries is needed, or if calling graphs are desired, PerformanceMon depends on recompiling the application to allow the compiler to insert extra code at procedure entries. PerformanceMon allocates a vector of size proportional to the size of the code address space. With the advent of the Spy this code is obsolete.

Documentation:
PerformanceMonDoc.tioga

Requirements:
You need a PCedar commander running in your PCR-based runtime to run these tools.
You have to recompile your code with profiling.
You don't need to be running a modified kernel.

Editorial:
This tool was just the thing when the Spy wasn't available. It can still be useful if you don't like the Spy's idea of when samples should be taken. Better yet, use Leafy, which has the same sampling pattern as PerformanceMon.
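The Leafy and PerformanceMon descriptions above both come down to clock-tick pc sampling through the kernel's profil mechanism (Leafy via a modified profil call, PerformanceMon via traditional Unix profiling): on every tick the kernel bumps a counter in a user-supplied histogram indexed by the interrupted pc. Here is a minimal, hedged C sketch of what that looks like from an ordinary Unix program; the exact prototype and pc-to-bin scaling differ between SunOS 4.x and other systems, and work() and NBINS are placeholders rather than anything from either tool.

    /* A hedged sketch of profil(2)-style pc sampling: the kernel increments
     * one 16-bit bin per clock tick, chosen from the interrupted pc.
     * The prototype and header vary by system; this follows the common BSD form. */
    #include <stdio.h>
    #include <stddef.h>
    #include <unistd.h>                 /* profil() on BSD-derived systems */

    #define NBINS 65536                 /* placeholder size; the real tools size the
                                           vector from the code address space */
    static unsigned short bins[NBINS];
    static volatile double sink;

    static void work(void)              /* placeholder workload to be sampled */
    {
        long i;
        for (i = 0; i < 50000000L; i++)
            sink += (double)i * 0.5;
    }

    int main(void)
    {
        size_t base = (size_t)work;     /* start of the address range to profile */
        int i;

        /* scale 0x10000 asks for a one-to-one mapping of pc values to bins;
           see profil(2) for the exact fixed-point scaling rules */
        profil(bins, sizeof bins, base, 0x10000);
        work();
        profil(NULL, 0, 0, 0);          /* a zero scale turns sampling off */

        for (i = 0; i < NBINS; i++)     /* a Leafy-style summary would rank
                                           these bins by count */
            if (bins[i] != 0)
                printf("bin %d: %u ticks\n", i, bins[i]);
        return 0;
    }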
CodeTimer

What it does:
Whereas the Spy can tell you what fractions of the CPU time spent between line A and line B went to which routines, CodeTimer tells you how long it took to get from A to B. Together, they tell you what to fix and how well you did. To use CodeTimer, you add a call to StartInterval before line A and a call to StopInterval after line B. CodeTimer maintains several tables of named intervals (e.g., one per application). For each interval, it remembers the minimum, maximum, and average time taken to execute that interval since the last time the interval counter was reset. The StartInterval and StopInterval calls are intended to be fast enough that you can leave them in your code. Currently CodeTimer prints the statistics onto a STREAM in human-readable format. If there is a demand, I (Eric Bier) will add procedures to query the intervals procedurally.

Strategy you should use:
Here's how the Gargoyle project uses CodeTimer: Gargoyle has many timed intervals in its code. After large changes to Gargoyle, the implementors run a standard set of Gargoyle scripts. Each script resets the CodeTimer tables, runs the benchmark, opens a Typescript, and prints the times for all timed intervals that were exercised. The results for intervals of interest are copied into a large Tioga file, GGPerformance, which contains a history of measurements.

Documentation:
CodeTimerDoc.tioga

Requirements:
You need a PCedar commander running in your PCR-based runtime to run these tools.
You have to edit and recompile your code, but it works fine on optimized code.
You don't need to be running a modified kernel.

Editorial:
The documentation is very good. This is the kind of thing you'd want to routinely use in test suites to notify you when some underlying software update hosed your performance.

DeltaResource

What it does:
DeltaResource is a simple command to measure resource usage during the execution of other commands. DeltaResource is also a simple interface that provides a thin veneer over the Unix rusage call.

Strategy you should use:
You might want to use DeltaResource to help you look at the system resource usage of a particular application.

Documentation:
DeltaResourceDoc.tioga

Requirements:
You need a PCedar commander running in your PCR-based runtime to run these tools.
You don't have to recompile your code - it works fine on optimized code.
You don't need to be running a modified kernel.

Editorial:
The documentation is very good. Think of it as an interface to the rusage() system call.
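Since DeltaResource is described as a thin veneer over rusage, the underlying idea is easy to show in ordinary C with the standard getrusage() call: snapshot the counters, run the work, and report the differences. This is a sketch of the system call itself, not of DeltaResource's interface; runStuff() stands in for whatever command you would measure.

    /* A sketch of the rusage-delta idea behind DeltaResource: sample
     * getrusage() before and after some work and print the differences. */
    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    static double seconds(struct timeval tv)
    {
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    static void runStuff(void)          /* placeholder for the command being measured */
    {
        volatile long x = 0;
        long i;
        for (i = 0; i < 20000000L; i++)
            x += i;
    }

    int main(void)
    {
        struct rusage before, after;

        getrusage(RUSAGE_SELF, &before);
        runStuff();
        getrusage(RUSAGE_SELF, &after);

        printf("user cpu:      %.3f s\n",
               seconds(after.ru_utime) - seconds(before.ru_utime));
        printf("system cpu:    %.3f s\n",
               seconds(after.ru_stime) - seconds(before.ru_stime));
        printf("page faults:   %ld\n", after.ru_majflt - before.ru_majflt);
        printf("page reclaims: %ld\n", after.ru_minflt - before.ru_minflt);
        return 0;
    }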
ExamineStorage

What it does:
ExamineStorage provides a command that enumerates the heap and prints out statistics on the N most memory-consuming TYPEs of objects.

Strategy you should use:
You might want to use ExamineStorage in conjunction with the other perf tools to track down particular storage leaks or frequent allocations.

Documentation:
ExamineStorageDoc.tioga

Requirements:
You need a PCedar commander running in your PCR-based runtime to run ExamineStorage.
You don't have to recompile your code - it works fine on optimized code.
You don't need to be running a modified kernel.

Editorial:
The documentation is very good.

Druid

What it does:
Druid is a package that can be used to count how many times control passes through given points in the system. Although much more limited than the Spy, it is also more interactive and perturbs the system less. Counting breakpoints can be set on high-priority processes or even in code as sensitive as SafeStorage allocation. The Druid interface provides procedures for setting, clearing, and querying counting breakpoints. Counting breakpoints can be either monitored or unmonitored.

Strategy you should use:
You can use Druid to interactively browse calling sequences.

Documentation:
DruidDoc.tioga

Requirements:
Currently you can only run Druid from within a Viewers-based world.
Depends on Cirio for symbol lookup.
You can only set counting breakpoints in unoptimized code.
You don't need to be running a modified kernel.

Editorial:
The documentation is good. The support is likely to be spotty, as there isn't anybody in CSL currently maintaining this tool. It shouldn't be too hard to build a command-line U/I for this tool so that the Viewers world requirement could be relaxed.

TimeZone

What it does:
TimeZone is a package that can be used to measure how long it takes to execute a piece of code. The TimeZone interface provides procedures for setting, clearing, and querying timing breakpoints. Timing breakpoints come in two flavors (though more are possible). Interval breakpoints record how long it takes to get from one point in the program to another point; use these, for example, to measure how long it takes to execute a block. IterationTiming breakpoints record how long it takes to get from a point in the program back to that same point. Either type of timing breakpoint can be either monitored or unmonitored.

Strategy you should use:
You might want to use TimeZone to help you look at the performance of a particular stretch of code.

Documentation:
TimeZoneDoc.tioga

Requirements:
Currently you can only run TimeZone from within a Viewers-based world.
You can only set timing breakpoints in unoptimized code.
You don't need to be running a modified kernel.

Editorial:
The documentation is good. The support is likely to be spotty, as there isn't anybody in CSL currently maintaining this tool. It shouldn't be too hard to build a command-line U/I for this tool so that the Viewers world requirement could be relaxed.

FullStatsPackage (tentative name)

What it does:
We have been focusing on the threads behavior of PCR and have added monitoring code to a copy of PCR that provides information about thread switching behavior, monitor lock usage, and condition variable usage. The information can be gathered in one or both of two ways: via a hash table that gathers information for each thread, monitor lock, and condition variable in the system, and via two trace buffers that gather information for individual events, such as thread switches or monitor entries. One of the trace buffers is the SunOS vtrace buffer in the Unix kernel. The other trace buffer resides at the PCR level.

Strategy you should use:
Wait until this is released.

Documentation:
None yet.

Requirements:
You need to be running the modified PCR.
You need to be running a modified kernel.
It works fine with optimized code.

Editorial:
I asked Marvin for this brief description just so that you'd have some idea of what they're working on. He'd like me to restate: "Let me finish by reiterating that things are in flux and that we're not sure how much of our code should make its way into any 'released' version of PCR. That's something for discussion, I suppose."
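FullStatsPackage isn't released or documented yet, so nothing below is its interface; but to make the "trace buffer that gathers information for individual events" idea concrete, here is a generic, hedged C sketch of the fixed-size event ring that such monitoring code typically keeps. All type and procedure names are invented for illustration.

    /* A generic illustration of the trace-buffer idea mentioned above:
     * a fixed-size ring of per-event records, cheap enough to leave enabled.
     * These names are invented; this is not the FullStatsPackage interface. */
    #include <stdio.h>

    typedef enum { EV_THREAD_SWITCH, EV_MONITOR_ENTRY } EventKind;

    typedef struct {
        EventKind kind;
        unsigned long timestamp;   /* whatever cheap clock is available */
        unsigned long id;          /* e.g., a thread id or monitor lock address */
    } TraceEvent;

    #define TRACE_SIZE 4096        /* power of two so wraparound is a mask */
    static TraceEvent trace[TRACE_SIZE];
    static unsigned long traceNext = 0;

    static void traceRecord(EventKind kind, unsigned long timestamp, unsigned long id)
    {
        TraceEvent *e = &trace[traceNext++ & (TRACE_SIZE - 1)];
        e->kind = kind;
        e->timestamp = timestamp;
        e->id = id;
    }

    /* Dump the retained events, oldest first, allowing for wraparound. */
    static void traceDump(void)
    {
        unsigned long i;
        unsigned long start = (traceNext >= TRACE_SIZE) ? traceNext - TRACE_SIZE : 0;
        for (i = start; i < traceNext; i++) {
            TraceEvent *e = &trace[i & (TRACE_SIZE - 1)];
            printf("%lu %s %lu\n", e->timestamp,
                   e->kind == EV_THREAD_SWITCH ? "switch" : "enter", e->id);
        }
    }

    int main(void)
    {
        unsigned long t;
        for (t = 0; t < 10; t++)   /* fake a few events to show the record format */
            traceRecord(t % 2 ? EV_MONITOR_ENTRY : EV_THREAD_SWITCH, t, 100 + t % 3);
        traceDump();
        return 0;
    }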