[qv]<IDL>History>DesignMemos.dm!1>Design.PPLSemantics

To simplify the task of reading the PPL code that will form the basis of the design discussions and the implementation effort, this memo presents a guide to some of the more subtle aspects of PPL’s behavior. Most of PPL’s syntax and semantics are fairly straightforward given any exposure to conventional procedural languages. We mention here only those aspects that are not self-evident and that are heavily relied on by the IDL-PPL code.

The only syntax for grouping computations within a numbered statement is parentheses. Thus (expressiona; expressionb; ...; expressionn) is an expression with value expressionn. This construct is heavily used to delimit the range of conditional and repeat clauses (which otherwise extend through to the end of the statement!) as there is no other syntax with which to do this.
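
As a rough analogy only (Python, not PPL), a parenthesized group can be thought of as a helper that evaluates each sub-expression in order and yields the value of the last one. The helper name seq is hypothetical, and the sub-expressions are wrapped in lambdas purely to delay their evaluation.

```python
def seq(*thunks):
    """Evaluate zero-argument callables in order; return the last one's value."""
    value = None
    for thunk in thunks:
        value = thunk()
    return value


log = []
# Models (expression_a; expression_b; expression_n): the two appends run
# for their side effects, and the group as a whole has the value 42.
result = seq(lambda: log.append("a"), lambda: log.append("b"), lambda: 42)
```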

GOTOs are used sparingly (but are needed, as there is no other way to escape several levels of loop). Their targets are expressions of the form %n, a relocatable statement number. This form is used so that, when the function is edited and its statements are renumbered, the GOTO is altered to continue referencing the same statement rather than whichever statement has taken the old number in the new numbering.
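
The effect of a relocatable target can be modelled in Python as follows. This is a sketch of the observable behavior only; how PPL actually records and updates %n is not described in this memo. The Statement class and the number_of helper are hypothetical names, and the GOTO is assumed to hold on to the target statement itself rather than to its number.

```python
class Statement:
    def __init__(self, text, target=None):
        self.text = text
        self.target = target  # another Statement, playing the role of %n


def number_of(body, stmt):
    """Current 1-based statement number of stmt within body."""
    for i, s in enumerate(body, start=1):
        if s is stmt:
            return i
    raise ValueError("statement is not in this body")


loop_exit = Statement("done")
jump = Statement("GOTO", target=loop_exit)
body = [Statement("start"), jump, loop_exit]

before = number_of(body, jump.target)       # target is statement 3
body.insert(1, Statement("new statement"))  # an edit renumbers what follows
after = number_of(body, jump.target)        # same statement, now number 4
```

A GOTO that stored the bare number 3 would, after the edit, land on the GOTO statement itself instead of on the intended target.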

PPL does not type its variables: all variables are represented internally as pointers to the data objects which are their values. Consequently, in the IDL-PPL code, names are sometimes bound to values of quite different types at different points in a function. This reuse of names was done to limit the size of the stack frame to be allocated for locals, but it is bad programming practice and should not be transferred to the LISP system.

One of the major weaknesses of PPL considered as a systems programming language (which it was never meant to be) is that it does not have a pointer data type. Because almost all PPL objects are nevertheless represented internally as pointers, this gives rise to some persistent ambiguities in the behavior of copying and sharing. Simple PPL avoids these by using copy semantics for assignment and argument passing (i.e. A←B means that A gets a copy of B, rather than shared use of the same storage). The impracticality of this for large objects means that some amount of guile is used in the IDL-PPL code to prevent excess copying. The primary tools of guile are:

1. The use of argument passing by reference rather than by value. This is indicated by the use of parameters in the function header of the form $ARG as opposed to ARG. This invariably indicates a large data object that is being passed by reference to avoid copying. On the few occasions that the object is being side effected by the function this is stated with very emphatic comments!

2. PPL has two methods of returning values from functions. One is to assign into the function name and then let control fall out of the body (either by sequentially reaching the end of the function body, or by the unusual construction of branching to statement 0). This assignment can be either copy or non-copy (with the attendant problems raised below). Alternatively, the result can be returned using the RETURN expression construction. This has the important property of returning the value of expression by reference. Hence, be aware that a RETURN could indicate either a desire to avoid copying or a possible sharing relationship. The difference is not always clearly commented in the code.

3. The most important consequence of the lack of pointer data types is the presence of two different types of assignment. Regular assignment (←) copies its right hand argument (i.e. is equivalent to (SETQ LHS (COPY RHS)) ) whereas noncopy assignment (←←) copies the (implicit) pointer only. The latter is used extensively in the code to factor the unpacking of data structures. As structure references in PPL are all interpreted, an expression like A.B.C.D takes three structure references, each of which involves looking up the selector name in the data definition block of the appropriate type. Consequently, when repeated references are being made to the subfields of a structure, you will find code like
X ←← A.B.C.D
... X[2] .... X ... X[1] ... X
which exists solely to eliminate the repeated computation of the address of the B.C.D component of structure A. In the LISP system, the compiler can worry about this as, even if the elimination of common subexpressions does not take place, the offsets can be precomputed by the compiler and are not all that costly.
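
The difference between the two assignments, and the hoisting idiom just described, can be sketched in Python rather than PPL. This is an analogy under stated assumptions: nested dicts stand in for PPL structures, copy.deepcopy stands in for the copy made by ← (the depth of PPL's copy is an assumption here), and plain name binding stands in for ←←. The names a, x and x_copy are illustrative, and Python indexes from 0.

```python
import copy

a = {"B": {"C": {"D": [10, 20]}}}

# Regular assignment (role of ←): the left-hand side gets its own copy.
x_copy = copy.deepcopy(a["B"]["C"]["D"])
x_copy[0] = 99  # changes the copy only; a is unaffected

# Noncopy assignment (role of ←←): only the implicit pointer is copied,
# so the three selector lookups are paid for once.
x = a["B"]["C"]["D"]
total = x[0] + x[1]  # later uses go straight to the component

shares_storage = x is a["B"]["C"]["D"]
```

Because x shares storage with the D component, any later assignment through x would also be visible through a, which is the source of the subtleties discussed next.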

Much more subtle, however, is that regular assignment is often used specifically to force copying of its RHS, and that noncopy assignment is sometimes used to get a window onto some data structure that is to be smashed (for the same reasons given above). Both also have other uses: ←← to obtain pointer-like semantics for reasons of efficiency (both space and time), and ← for simple integer/real assignment, where the objects appear on the stack and are therefore copied in any case (e.g. in FOR loop index calculations!). It is therefore sometimes unclear exactly what effect is required by any given use of either construction. One must be aware of the issues involved.
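
The three intents that have to be told apart can be lined up side by side in a Python analogy. As before, this is a sketch and not PPL: copy.deepcopy plays the role of a ← used to force a copy, name binding plays the role of a ←← used as a window, and the names table, saved and window are hypothetical.

```python
import copy

table = {"rows": [[1, 2], [3, 4]]}

# Intent 1: force a private copy before the structure is altered.
saved = copy.deepcopy(table["rows"])

# Intent 2: obtain a window through which the structure is smashed.
window = table["rows"][0]
window[0] = -1  # visible through table as well

# Intent 3: scalar assignment, where copying and sharing cannot be
# told apart because the value itself is what gets stored.
i = 5
j = i
j += 1  # i is still 5
```

Reading such code, only the surrounding context says whether a given assignment was written for intent 1, 2 or 3.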

Another aspect of the argument passing by reference/RETURN semantics that should not be overlooked is that this mechanism can be used to pass or return objects that cannot be assigned in PPL. Thus, data types, functions, and selectors can be passed in this manner. This behavior is occasionally made use of in IDL-PPL. However, a crucial weakness in this scheme is that it does not allow such arguments to be passed to variadic functions. The arguments of a variadic function must be passed as a tuple (either by the [ARG syntax (which copies!), or by the explicit preparation of a non-copy tuple in the code, e.g. with the BINDn function), and these objects cannot be assigned into data structures.
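
The by-value and by-reference mechanisms that this paragraph builds on can be modelled in Python as below. It is a sketch of the semantics only and does not attempt to model unassignable objects or the variadic restriction. The helpers call_by_value and call_by_reference are hypothetical, copy.deepcopy is assumed to stand in for PPL's argument copy, and a Python return is assumed to behave like RETURN in handing back the object itself.

```python
import copy


def call_by_value(fn, arg):
    """Model of a plain ARG parameter: the callee works on a copy."""
    return fn(copy.deepcopy(arg))


def call_by_reference(fn, arg):
    """Model of a $ARG parameter: the callee sees the caller's object."""
    return fn(arg)


def smash_first(vec):
    vec[0] = 0
    return vec  # like RETURN: no copy is made of the result


data = [1, 2, 3]
r1 = call_by_value(smash_first, data)      # data is left untouched
snapshot = list(data)
r2 = call_by_reference(smash_first, data)  # data is side effected
```

After the second call, r2 and data are the same object, which is the kind of sharing relationship a RETURN may or may not have been intended to create.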