{Begin SubSec Performance Considerations}
{Title Performance Considerations}
{Text


Most Interlisp-D users will have experience using Interlisp-10.  Although Interlisp-D is completely upward compatible with Interlisp-10, there are differences in the exact implementation which may influence the performance of applications programs.  This chapter contains a collection of notes which may help the user improve the performance of Interlisp-D programs.


{Begin SubSec Variable Bindings}
{Title Variable Bindings}
{Text

{Tag GlobalvarsPerformance}


A major difference between Interlisp-10 and Interlisp-D is the method of accessing free variables.  Interlisp-10 uses what is called "shallow" binding.   Interlisp-D uses what is called "deep" binding.

The binding of variables occurs when a function or a {fn PROG} is entered.  For example, if the function {lisp FOO} has the definition {lisp (LAMBDA (A B) {arg BODY})}, the variables {lisp A} and {lisp B} are bound so that any reference to {lisp A} or {lisp B} from {arg BODY} or any function called from {arg BODY} will refer to the arguments to the function {lisp FOO} and not to the value of {lisp A} or {lisp B} from a higher level function.  All variable names (atoms) have a top level value cell which is used if the variable has not been bound in any function.   In discussions of variable access, it is useful to distinquish between three types of variable access: local, special and global.  Local variable access is the use of a variable that is bound within the function from which it is used.  Special variable access is the use of a variable that is bound by another function.  Global variable access is the use of a variable that has not been bound in any function.  We will often refer to a variable all of whose accesses are local as a "local variable."  Similarly, a variable all of whose accesses are global we call a "global variable."

In a "deep" bound system, a variable is bound by saving on the stack the variable's name together with a value cell which contains that variable's new value.  When a variable is accessed, its value is found by searching the stack for the most recent binding (occurrence) and retrieving the value stored there.  If the variable is not found on the stack, the variable's top level value cell is used.

In a "shallow" bound system, a variable is bound by saving on the stack the variable name and the variable's old value and putting the new value in the variable's top level value cell.  When a variable is accessed, its value is always found in its top level value cell.

The deep binding scheme has one disadvantage: the amount of cpu time required to fetch the value of a variable depends on the stack distance between its use and its binding.  The compiler can determine local variable accesses and compiles them as fetches directly from the stack.  Thus this computation cost only arises in the use of variable not bound in the local frame ("free" variables).  The process of finding the value of a free variable is called free variable lookup.

In a shallow bound system, the amount of cpu time required to fetch the value of a variable is constant regardless of whether the variable is local, special or global.  The disadvantages of this scheme are that the actual binding of a variable takes longer (thus slowing down function call), the cells that contain the current in use values are spread throughout the space of all atom value cells (thus increasing the working set size of functions) and context switching between processes requires unwinding and rewinding the stack (thus effectively prohibiting the use of context switching for many applications).

A deep binding scheme was choosen for Interlisp-D because of the working set considerations and the speed of context switching, which we expected to use heavily when processes were added.  The free variable lookup routine was microcoded, thus greatly reducing the search time.  In the benchmarks we performed, the largest percentage of free variable lookup time was 20 percent of the total ellapsed time; the normal time was between 5 and 10 percent.

One consequence of Interlisp-D's deep binding scheme is that users may significantly improve performance by declaring global variables in certain situations.  If a variable is declared global, the compiler will compile an access to that variable as a retrieval of its top level value, completely bypassing a stack search.  This should be done only for variables that are never bound in functions such as global databases and flags.

Global variable declarations should be done using the {filecom GLOBALVARS} file package command ({PageRef FileCom GLOBALVARS}).  Its form is {lisp (GLOBALVARS  {arg VAR{sub 1}} {ellipsis} {arg VAR{sub N}})}.

Another way of improving performance is to declare variables as local within a function.  Normally, all variables bound within a function have their names put on the stack, and these names are scanned during free variable lookup.  If a variable is declared to be local within a function, its name is not put on the stack, so it is not scanned during free variable lookup, which may increase the speed of lookups.  The compiler can also make some other optimizations if a variable is known to be local to a function.

A variable may be declared as local within a function by including the form
{lisp (DECLARE (LOCALVARS {arg VAR{sub 1}} {ellipsis} {arg VAR{sub N}}))} following the argument list in the definition of the function.  Note: local variable declarations only effect the compilation of a function.  Interpreted functions put all of their variable names on the stack, regardless of any declarations.


}{End SubSec Variable Bindings}



{Begin SubSec Garbage Collection}
{Title Garbage Collection}
{Text

{note I know that this starts a bit too simple --- but I wanted to emphasize the problem of data fragmentation, so that I could later point out that frequent incremental GC will help with this by keeping the working set small (and so that I could point out that just having a large address space won't help with this problem).  I think that if this section is simple and clear that it might become something you could show customers, rather than having to repeatedly make the same arguments about reference counting vs. mark-and-sweep GC  ---mjs}


As an Interlisp-D applications program runs, it creates data structures (allocated out of free storage space), manipulates them, and then discards them.  If there were no way of reclaiming this space, over time the Interlisp-D memory (both the physical memory in the machine and the virtual memory stored on the disk) would get filled up, and the computation would come to a halt.  Actually, long before this would happen the system would probably become intolerably slow, due to "data fragmentation", which occurs when the data currently in use are spread over many virtual memory pages, so that most of the computer time must be spent swapping disk pages into physical memory.  This problem ("fragmentation") will occur in any situation where the virtual memory is significantly larger than the real, physical memory. To reduce swapping, it is desirable to keep the "working set" (the set of pages containing actively referenced data) as small as possible.

It is possible to write programs that don't generate much "garbage" data, or which recycle data, but such programs tend to be overly complicated and frought with pitfalls.  Spending effort writing such programs defeats the whole point of using a system with automatic storage allocation.  An important part of any Lisp implementation is the "garbage collector" which identifies discarded data and reclaims its space.  There are several well-known approaches to garbage collection.  Interlisp-10 uses the traditional mark-and-sweep garbage collection algorithm, which identifies "garbage" data by "walking" through and "marking" all accessible data structures, and then sweeping through the data spaces to find all unmarked objects (i.e., not referenced by any other object).   Although this method is guaranteed to reclaim all garbage, it takes time proportional to the number of allocated objects, which may be very large.  (Some allocated objects will have been marked during the "mark" phase, and the remainder will be collected during the "sweep" phase; so all will have to be touched in some way.)  Also, the time that a mark-and-sweep garbage collection takes is independent of the amount of garbage collected; it is possible to sweep through the whole virtual memory, and only recover a small amount of garbage.

For interactive applications, it is simply not acceptable to have long interruptions in a computation for the purpose of garbage collection.  Interlisp-D solves this problem by using a reference-counting garbage collector.  With this scheme, there is a table containing counts of how many times each object is referenced.  This table is incrementally updated as pointers are created and discarded, incurring a small overhead distributed over the computation as a whole.  (Note: References from the stack are not counted, but are handled separately at "sweep" time; thus the vast majority of data manipulations do not cause updates to this table.)  At opportune moments, the garbage collector scans this table, and reclaims all objects that are no longer accessible (have a reference count of zero).  The time for scanning the reference count tables is very nearly constant (about 0.2 seconds on the Xerox 1100); the sweep time then is this small value, plus time proportional to the amount of garbage that has to be collected (typically less than a second).  "Opportune" times occur when a certain number of cells have been allocated or when the system has been waiting for the user to type something for long enough.  The frequency of garbage collection is controlled by the functions and variables described on {PageRef Fn RECLAIMMIN}.  For the best system performance, it is desirable to adjust these parameters for frequent, short garbage collections, which will not interrupt interactive applications for very long, and which will have the added benefit of reducing data fragmentation, keeping the working set small.

{note
    There seems to be a great deal of confusion as to the advantages of the Interlisp-D garbage collection scheme.  Please note that it's "sweep" phase is *** isomorphic *** to that of the standard mark-and-sweep algorithm; the latter merely scans the non-zero "bit tables", collecting the unmarked items. The "Collecting" is of course exactly the same in each scheme; and since the Interlisp-D scheme requires scanning the hashtable *** which grows in size as the number of allocated objects grows *** then this is entirely equivalent to scanning the "bit tables" (of, say, PDP10 MacLisp -- I'm confident that PDP10 Interlisp is done the same way).  There are no "bit tables" allocated for virtual memory which hasn't been "cons'd into" yet!  In fact, a typical PDP10 MacLisp application has about 3% of its in-use address space devoted to "bit tables";  I've observed the same figure for the size of the gc hashtables in Interlisp-D.
    Thus, the real advantage is the incremental nature of "marking" -- the only time that Interlisp-D is "catatonic" is during the sweep phase, whereas Interlisp-10 will be "catatonic" during the entire mark-and-sweep phase; and it is the mark phase that generally consumes the lion's share of GC time.
    JonL  8/10/83
}

One problem with the Interlisp-D garbage collector is that not all garbage is guaranteed to be collected.  Circular data structures, which point to themselves directly or indirectly, are never reclaimed, since their reference counts are always at least one.  With time, this unreclaimable garbage may increase the working set to unacceptable levels.  Some users have worked with the same Interlisp-D virtual memory for a very long time, but it is a good idea to occasionally save all of your functions in files, reinitialize Interlisp-D, and rebuild your system.  Many users end their working day by issuing a command to rebuild their system and then leaving the machine to perform this task in their absence.  If the system seems to be spending too much time swapping (an indication of fragmented working set), this procedure is definitely recommended.


}{End SubSec Garbage Collection}



{Begin SubSec Datatypes}
{Title Datatypes}
{Text

If an applications program uses data structures that are large (more than 8 fields) and that are used a lot, there are several advantages to representing them as user {lisp DATATYPE}s rather than as {lisp RECORD}s.  The primary advantage is increased speed: accessing and setting the fields of a {lisp DATATYPE} can be significantly faster than walking through a {lisp RECORD} list with repeated {lisp CAR}s and {lisp CDR}s.  Also, compiled code for referencing user {lisp DATATYPE}s is usually smaller.  Finally, by reducing the number of objects created (one {lisp DATATYPE} object against many {lisp RECORD} list cells), this can reduce the expense of garbage collection.

For code that has been written using the record package's {lisp fetch}, {lisp replace}, and {lisp create} operations, changing from {lisp RECORD}s to {lisp DATATYPE}s only requires editing the record declaration (using {fn EDITREC}) to replace declaration type {lisp RECORD} by {lisp DATATYPE}, and recompiling.

{Note There are some minor disadvantages with using {lisp DATATYPE}s:  First, there is an upper limit on the number of {lisp DATATYPE}s which can exist.  Also, space for {lisp DATATYPE}s is allocated a page at a time, so each {lisp DATATYPE} has at least one page assigned to it, which may be wasteful of space if there are only a few examples of a given {lisp DATATYPE}.  These problems should not effect most applications programs.
}

}{End SubSec Datatypes}




{Begin SubSec Incomplete Filenames}
{Title Incomplete Filenames}
{Text

There is a significant problem in Interlisp-D (and in Interlisp-10) with respect to using incomplete filenames.  Whenever an I/O function is given an incomplete filename (one which doesn't have the device/host, directory, name, extension, and version number all supplied), the system has to convert it to a complete filename, by supplying defaults and searching through directories (which may be on remote file servers).  Currently, work is being done on speeding up the filename-completion process, but in any case it is much faster to convert an incomplete filename once, and use the complete filename from then on.  For example, suppose a file is opened with {lisp (SETQ FULLNAME (OPENFILE 'MYNAME 'INPUT))}.  After doing this, {lisp (READC 'MYNAME)} and {lisp (READC FULLNAME)} would both work, but {lisp (READC 'MYNAME)} would take longer (sometimes orders of magnitude longer).  This could seriously effect the performance if a program which is doing many I/O operations.

}{End SubSec Incomplete Filenames}


{Begin SubSec Turning Off the Display}
{Title Turning Off the Display}
{Text

Maintaining the video image on the screen uses about 30% of the cpu cycles (on the Xerox 1100), so turning off the display will improve the speed of compute-bound tasks.  When the display is off, the screen will be white but any printing or displaying that the program does will be visible when the display is turned back on.  Note:  Breaks and {fn PAGEFULLFN} waiting turn the display on, but users should be aware that it is possible to have the system waiting for a response to a question printed or a menu displayed on a non-visible part of the screen.  The following functions are provided to turn the display off:

{FnDef {Name SETDISPLAYHEIGHT} {Args NSCANLINES}
{Text
Sets the display to only show the top {arg NSCANLINES} of the screen.
If {arg NSCANLINES} is {lisp T}, resets the display to show the full screen.
Returns the previous setting.
}}


{FnDef {Name DISPLAYDOWN} {Args FORM NSCANLINES}
{Text
Evaluates {arg FORM} (with the display set to only show the top {arg NSCANLINES} of the screen), and returns the value of {arg FORM}.
It restores the screen to its previous setting.
If {arg NSCANLINES} is not given, it defaults to 0.
}}


{note
add info about: (WITHOUT-INTERRUPTS ---)?

Date:  6-Oct-82 17:08:37 PDT (Wednesday)
From: Masinter.PA
I think WITHOUT-INTERRUPTS should just get strong warnings: do SAVEVM before timing anything because if it doesn't work, the only way out is to boot.

Date:  6-Oct-82 16:12:16 PDT (Wednesday)
From: vanMelle.PA
WITHOUT-INTERRUPTS in its current state is far too dangerous to document.
}

{Begin Note}
(WITHOUT-INTERRUPTS FORM)			[NLAMBDA function]
WITHOUT-INTERRUPTS is a special form for running real-time critical sections of code, or for doing benchmarks, since it filters out extraneous computations.  It evaluates FORM with the display refresh, keyboard sampling, ethernet process, etc. all turned off. [Note: There is no way of recovering from an error underneath WITHOUT-INTERRUPTS except by booting!]
{End Note}

}{End SubSec Turning Off the Display}



{Begin SubSec Gathering Statistics}
{Title Gathering Statistics}
{Text

Interlisp-D has an extended set of statistics-gathering tools.  An extended version of the {fn TIME} function is provided:

{FnDef {FnName TIMEALL} {FnArgs TIMEFORM #TIMES TIMEWHAT INTERPFLG {anonarg}}
{Type NLAMBDA}
{Text
Largely subsumes the function {fn TIME}.  Evaluates the form {arg TIMEFORM} and prints statistics on time spent in various categories (elapsed, keyboard wait, swapping time, gc) and datatype allocation.

For more accurate measurement on small computations, {arg #TIMES} may be specified (its default is 1) to cause {arg TIMEFORM} to be executed {arg #TIMES} number of times.  To improve the accuracy of timing open-coded operations in this case, {fn TIMEALL} compiles a form to execute {arg TIMEFORM} {arg #TIMES} number of times (unless {arg INTERPFLG} is non-{lisp NIL}), and then times the execution of the compiled form.

Note:  If {fn TIMEALL} is called with {arg #TIMES}>1, the dummy form is compiled with compiler optimizations on.  This means that it is not meaningful to use {fn TIMEALL} with very simple forms that are optimized out by the compiler.  For example, {lisp (TIMEALL '(IPLUS 2 3) 1000)} will time a compiled function which simply returns the number 5, since {lisp (IPLUS 2 3)} is optimized to the integer 5.

{arg TIMEWHAT} exists largely for compatibility with {fn TIME}; it restricts the statistics to specific categories.  It can be an atom or list of datatypes to monitor, and/or the atom {Lisp TIME} to monitor time spent.  Note that ordinarily, {fn TIMEALL} monitors all time and datatype usage, so this argument is rarely needed.  

The value of {fn TIMEALL} is the value of the last evaluation of {arg
TIMEFORM}.
}}

}{End SubSec Gathering Statistics}


}{End SubSec Performance Considerations}