[qv]<IDL>History>DesignMemos.dm!1>Design.Declarations

Heading:

IDL DESIGN: Declarations

Page Numbers: Yes X: 527 Y: 10.5"

XEROX Palo Alto Research Center 17 February 1977
Inter-Office Memorandum

ToIDL GroupIDL Design Note Number: 5

FromBeau SheilFile:<IDL>Design.Declarations

Subject Declaration Mechanisms: Examples

As input to the discussions on the syntax and semantics of the notation to be used to inform the LISP compiler how to compile our code in an efficient manner, I offer an analysis of the declaration needs of three of the functions from the PPL system. The three are of very different types - one is a system function (AT.LINLOC) which calculates the offset within an array (which may be virtual) of the element addressed by a given subscript vector; one (GENVEC) is the function that constructs linear sequences; and one (NORM) normalizes an array using floating point operations.

Although not necessary for discussion of the declaration needs, some comments on other issues raised by these functions are included. Finally, a very brief proposal summarizing the suggestions scattered through the memo is attached.

This memo should be read after (despite the contradictory numbering) Design Note 7 on the interpretation of PPL code. As mentioned there, indentation is used to indicate logical blocks of statements (in particular, scopes of compound IFs and FORs). At certain points, however, the layout used here was changed from that used in the PPL listings to accomodate the narrower page width.

For ease of reference, each function analysis begins a new page.

... AT.LINLOC TAKES AN ARRAY AND A SHAPE ROWINT AND RETURNS A LINEAR
... INDEX THAT PUTELT OR GETELT CAN USE. LINLOC KNOWS ABOUT VIRTUAL
... ARRAYS WHICH AT.LINFAST DOES NOT.

$AT.LINLOC($A,$S);D,I,J,K,SH,SL,T,TT,IDX,X
[1] IF NOT(GET.VIRTP(A)) THEN
AT.LINLOC←AT.LINFAST(GET.SHAPE(A),GET.FORMAT(A),S);
GOTO %0
[2] X←←GET.BASE(A); SH←←GET.SHAPE(X); TT←←GET.TTAB(A)
[3] IF LENGTH(GET.SHAPE(A))=0 THEN
AT.LINLOC←AT.LINFAST(SH,GET.FORMAT(X),TT);
GOTO %0
[4] SL←LENGTH(SH)
[5] IDX←SH; J←1
[6] FOR I←1:SL DO
T←←TT[I];
D←(IF T==ARRAY THEN GET.NDIMS(T) ELSE IF T=0 THEN 1 ELSE 0);
IDX[I]←(IF D=0 THEN T
ELSE
IF D=1 THEN IF T==INT THEN S[J] ELSE AT.GETELT(T,S[J])
ELSE
X←←MAKE(ROWINT,D,0);
(FOR K←1:D DO X[K]←S[J+K-1]);
AT.GETELT(T,AT.LINFAST(GET.SHAPE(T),GET.FORMAT(T),X)));
J←J+D
[7] AT.LINLOC←AT.LINFAST(SH,GET.FORMAT(GET.BASE(A)),IDX)
$

The variable AT.LINLOC, which appears at [1], [3] and [7] is being used to return the function’s results. The function name variable is sometimes used also for local storage within a function, but not in this case. Note that, as the result returned is an integer, conventional assignment is used.

Argument decls:

A:ARRAY
S:ROWINT st LENGTH(S)=LENGTH(SHAPE(A))

Local decls:

The variables X, SH, and TT in [2] are being used as structure dereferences. Their data types are

X:ARRAY
SH:ROWINT
TT:TUPLE (a set of selectors)

Various predicates can be asserted about them from the context, but I feel that this would be (a) redundant given the context and (b) not of any great use as I expect this use of local variables to be less common in LISP.

The variables I, J, K, D, and SL in [4-6] are all integers restricted to be between one and the maximum dimensionality of some array. Other variables (not in this program) are restricted to be less than the maximum extent on some dimension. While we could decl these directly as small-ints and half word ints respectively, I suggest instead that we decl them in terms of their semantics, and substitute appropriate LISP objects during decl translation. Hence, I propose

I, J, K, D, SL:DIMENSION

where DIMENSION will map (probably) into a small-int, and an analogous "data type" EXTENT (which will probably map into half words) be used for integers constrained to be less than the maximum value of subscripts.

IDX (which clearly has the same decl as SH) is initialized in [5]. Note that ← is used to force a copy of the substructure from A so that IDX can be smashed in the code which follows.

[6] has instances of two other interesting constructions. The first is the use of T as a holder for the ith element of the selector tuple. If we reflected that factoring in the LISP code it should be decled

T:ARRAY or INT

Alternative data types like this are quite common for the reason it is used here - the selector may be either and you can’t tell which until you grab it. The second point of interest is the assignment to X. This illustrates a persistent problem in the code - the use of one name for two quite different purposes in a function. This use of X should be decled

X:ROWINT

in contrast to the ARRAY decl in [2]. Note that this form of data type uncertainty is quite different from that of T above. This instance comes from the second use of the variable being for a quite different purpose than the first, although each use is itself of determinate data type. It seems that, in the LISP system, this would be a foolish economy.

Decls tend to be complex in proportion to their "height" - or, alternatively, to their closeness to user supplied input - in the system. Contrast the much simpler function that is one level deeper in the system.

... AT.LINFAST IS A FAST, NO-CHECKING FUNCTION FOR CONVERTING A REFERENCE
... (IDX), FOR SHAPE (SHP), AND FORMAT (FMT) INTO A GETELT/PUTELT INDEX.
... [THIS FUNCTION SLIGHTLY CHANGED FROM THE PPL VERSION]

$AT.LINFAST($SHP,$FMT,$IDX);I,J,T
[1]IF FMT=SYMMETRIC THEN
I←IDX[1];
J←IDX[2];
(IF I<J THEN T←I; I←J; J←T);
RETURN J+(I*(I-1)/2)
[2]T←IDX[1]
[3]FOR I←2:LENGTH(SHP) DO T← ((T-1)*SHP[I])+IDX[I]
$

Here, the decls are completely determined and corresponding efficiencies can be gained. We have

SHP, IDX:ROWINT

I, J, T:EXTENT

Although FMT could also be decled as EXTENT (it is a bounded integer code), an obvious generalization suggests itself

FMT:FORMAT

where FORMAT can be expanded by the decl translator in some appropriate fashion.

... GENVEC IS THE USER ENTRY TO THE VECTOR GENERATOR. ALL LINEAR
... SEQUENCES OF EITHER INTS OR REALS MAY BE OBTAINED FROM GENVEC.

$GENVEC($X,$Y);A,B,I,N,S,D
[1] IF UNASSIGNED(X) ! UNASSIGNED(Y) THEN NOBIND; BREAK()
[2] A←←CONV.ROWARITH(X)
[3] I←LENGTH(A)
[4] IF I=0 THEN "GENVEC: EMPTY INITIAL SEQUENCE"; BREAK()
[5] B←←CONV.ARITH(Y)
[6] S←A[1]
[7] D←(IF I=2 THEN A[2]-S ELSE IF B>S THEN 1 ELSE -1)
[8] N←INT((B-S)/D)
[9] IF N<0 THEN "GENVEC: INCONSISTENT INCREMENT AND BOUNDS"; BREAK()
[10] N←N+1
[11] GENVEC←←MAKE(IDL.ROWMODE(S),N,S)
[12] FOR I←2:N DO GENVEC[I]←(S←S+D)
[13] GENVEC←←CONV.SARRAY(GENVEC)
$

The two major points of interest here are the distinction between the argument decls of user and system functions, and polymorphic arithmetic.

Argument decls:

Note the check in [1]. This comes about because we have used PPL call by reference (to avoid copying) but, in the process, have allowed all types of garbage to be passed in. This will presumably not be an issue in LISP. However, it raises the interesting point that this type of error reflects a user’s mistake which should be sent back to him/her, in contrast to declaration failure of a system function, which indicates internal malfunction. Ideally, this should all be done from the decls of the formals of the function without explicit code, the response being to flush the stack to the entry point of the function and then break (in a similar fashion as that in which the LISP system would respond to an uba). Some notation should be found to express the difference between these two situations.

To further complicate matters, it is clear from the code which follows that the actual types of the arguments are less important than that they be coerceable to ROWARITH and ARITH respectively. Ideally, one would like the argument decls to generate all this automatically (but still return any error to the user)

In any event, the decls for the arguments are
X:ROWARITH (an alternate data type) st LENGTH(X)<3
Y:ARITH (also alternate)

Local decls:

The issue in the local decls is that this function can produce either REAL or INT results depending on the type of its arguments. Thus polymorphic arithmetic is used throughout. This only shows in the use of alternate data types for the locals (i.e. hardly at all in the PPL)

The decls are (easily)

S, D:ARITH (depends on the type of the input)
I, N:EXTENT

... NORM1 NORMALIZES A MATRIX BY DIVIDING EACH ENTRY BY THE PRODUCT OF THE
... SQRTS OF ITS DIAGONAL ELEMENTS. THE USER ENTRY FN (NORM) IS NOT SHOWN.

$NORM1($A);I,J,L,SHP,ATYP,IDX,DIAG,T
[1] SHP←←GET.SHAPE(A)
[2] ATYP←GET.ELEMTYPE(A)(0)+0.
[3] IF LENGTH(SHP)#2 THEN "NORM: INPUT NOT A MATRIX"; BREAK()
[4] J←MIN.MIN(SHP[1],SHP[2])
[5] L←0; IDX←←MAKE(IDL.ROWMODE(ATYP),J,ATYP)
[6] FOR I←1:J DO
T←←AT.GETELT(A,AT.LINLOC2(A,I,I));
IF NOT(T==NONE) THEN
IF T>ATYP THEN IDX[I]←T; L←L+1
[7] NORM1←←ALLOC.RESULT(ROWINT(L,L),GET.FORMAT(A),ATYP)
[8] DIAG←←MAKE(IDL.ROWMODE(ATYP),L,ATYP); SHP←←MAKE(ROWINT,L,0)
[9] L←0; FOR I←1:J DO
IF IDX[I]#ATYP THEN DIAG[L←L+1]←SQRT(IDX[I]); SHP[L]←I
[10] FOR I←1:L DO
FOR J←1:(IF GET.FORMAT(A)=SYMMETRIC THEN I ELSE L) DO
T←AT.PUTELT(NORM1,AT.LINLOC2(NORM1,I,J),
AT.GETELT(A,AT.LINLOC2(A,SHP[I],SHP[J]))
/ (DIAG[I]*DIAG[J]))
[11] IF GET.LABELF()[1]>0 & GET.LABELP(A) THEN
SHP←←CONV.SARRAY(SHP); T←←AT.FAST(A,BIND2(SHP,SHP));
J←←LAB.LABSIZE(T)[1];
(FOR I←1:2 DO IF J[I]=3 THEN J[I]←2);
NORM1←←ALLOC.LARRAY(NORM1,J,J); LAB.COPYALL(T,NORM1);
T←←GET.TITLE(NORM1);
IF LENGTH(T)>0 THEN PUT.TITLE(NORM1,CONCAT("NORMALIZED ",T))
$

NORM requires dealing both with the issue of polymorphic arithmetic in a major way, and also with the problem of missing values. The current system has lapses in the latter area because of the inconvenience of dealing with the missing values at each point. In general, the missing value semantics are that any arithmetic operation which has a missing value operand returns a missing value result. This is ok as long as the missing values are destined for a user object, as they almost always are. The arrays which are used internally (for selectors etc) are always checked to make sure that they do not contain missing values. The issue arises in the decls from the distinction between an ARITH and a SCALAR (= ARITH or missing value). My feeling is that we should use specialized functions, rather than the system SUBRs, for performing arithmetic operations on arguments that might include missing values.

Argument decls:

A:ARRAY st number of dimensions (LENGTH(SHAPE(A)) ) is 2 (aka MATRIX ?).

Local decls:

Polymorphic arithmetic at its slimiest arises in statement [2], where ATYP is set to 0 in the datatype of the elements of the input, provided only that it not be INT (i.e. REAL or DBL). If the input is INT, the result is forced to be REAL by the addition. The decl, at least, is clearly ATYP: ARITH.

The other decls are straightforward except for SHP, T (twice!), and NORM1 being reused for different datatypes during label processing [11].

I, J, L:EXTENT (throughout the function), and we have one new data type

T:STRING

Summary

As an initial proposal, let me summarize the suggestions made in the preceding discussions, and add some more global details.

I suggest the following syntax:

(DECL declaration-list prog-body)

where declaration-list is of the form (decl decl decl), and

where decl is of the form (name type initial-value form), and

where name is an atom, type is a defined IDL data type (an initial list of which follows), initial-value is either an expression (whose value is of type) to which name is to be bound on entry to the block or the distinguished atom ARG or USER, and form is a form which the decl translator will cause to be evaluated, and will insist returns T, whenever name is rebound.

If the initial-value is ARG, the decl translator will not allocate new storage within the block, but will accept both the storage and initial value which is already associated with name, after having evaluated form.

If the initial-value is USER, the translator will treat it like ARG, except that the form will be evaluated only on block entry and, in the event that it fails, a graceful return to user level will be made. As the intent is for this facility to provide checking of user arguments to top level functions, USER code from the decl translator will remain in the system past debug time, unlike the code from other DECL forms.

A tentative list of basic IDL types might include:

INT, REAL, ARRAY, STRING, LIST, ROWINT, ROWREAL.

Some types which are derivative from these but useful are:

EXTENT, DIMENSION, FORMAT, MATRIX, ARITH, ROWARITH.

There must be many more.