Reflections on the design meeting of February 10, 1977 11:58 PM
Three topics discussed - generators; sharing/copying; labels. All of these are from design memo 4 - "Deep Issues".
Generators: We went around on this for a long time - trying out coroutines, generators for enumerating an array inside a function (to avoid the transparency cost that is currently paid for modularity in AT.LINLOC etc), and generators across MAPed functions. It would seem that coroutines are necessary to deal with streaming functions which have substantial amounts of state to preserve across returns: e.g. a function that takes a vector and produces its cumulative sum must preserve at least two state variables across consecutive returns. The only way to do this seems to involve coroutining, and it would be silly to duplicate the LISP facilities for this. However, coroutines are too expensive to use in the parts of the system that are called often (cost is ~2 mill per return).
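A rough sketch of the cumulative-sum case, in modern Python (purely illustrative, obviously not the system's own notation): the producer must keep a running total and its position alive across consecutive returns, which is exactly what coroutining/generating buys you.

    def cumulative_sum(vector):
        """Stream the running sum of vector, one element per resumption.

        The running total and the position in vector are the two pieces of
        state preserved across consecutive returns.
        """
        total = 0
        for x in vector:
            total += x
            yield total          # suspend here; state survives to the next call

    sums = cumulative_sum([3, 1, 4, 1, 5])
    print(next(sums), next(sums), next(sums))   # 3 4 8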
Generators for enumerating an array inside a function are more straightforward and might be a good idea even inside the PPL version. The idea would be that you pack up all the context used by AT.LINLOC, GETELT etc that is now recalculated from one call to the next, and pass that context around in a block that is given back to the enumerator fn at each call. Thus you would have two fns - a set-iterator which sets up the block (including the direction of traversal desired by the caller), and a kick fn you call each time you need a new element, avoiding the subscript manipulations entirely. This seems like a fine idea, and might even remove some of the hackery in functions like MOMENTS.
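A minimal sketch of the two-fn scheme, again in illustrative Python; the names IterBlock, set_iterator and next_element are mine, not the memo's.

    class IterBlock:
        """The packed-up addressing context that would otherwise be
        recalculated by AT.LINLOC/GETELT on every call."""
        def __init__(self, data, reverse=False):
            self.data = data                      # array being enumerated
            self.step = -1 if reverse else 1      # traversal direction chosen by caller
            self.pos = len(data) - 1 if reverse else 0

    def set_iterator(array, reverse=False):
        """Set up the context block once."""
        return IterBlock(array, reverse)

    def next_element(block):
        """Kick the block: return the next element, or None when exhausted."""
        if not (0 <= block.pos < len(block.data)):
            return None
        elt = block.data[block.pos]
        block.pos += block.step                   # no per-call subscript recomputation
        return elt

    blk = set_iterator([10, 20, 30], reverse=True)
    while (e := next_element(blk)) is not None:
        print(e)                                  # 30, 20, 10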
Another, more exciting, idea was the observation that one could effectively do generators without a full coroutine structure if and only if the calling function had no state to save across returns. (This is the general principle which accounts for the feasibility of the enumerator idea above - the only state that needs to be saved is the state vars of the address-calculating fns, and this can be special-cased.) However, this property also holds for one of the worst sources of large intermediate array results, i.e. mapped operations. One rarely composes functions all that deeply which generate large objects, but one can build quite large trees of arith ops without much effort, and these kill you in proportion to the size of the objects involved. Furthermore, the objects are quite large quite often. This suggested the technique whereby MAP produces generators which contain its state variables (the function name, and generators for each of its operands, including some MAP control info if they are not simple arrays). Thus MAP of an expression like CHI SQUARE could produce just a tree structure of generators which could be evaluated element by element as needed.
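A sketch of what such a tree of generators might look like, again in illustrative Python with hypothetical names (leaf, map_node); the expression below is only chi-square-like, not the system's CHI SQUARE.

    def leaf(array):
        """A plain array at the bottom of the generator stack."""
        return iter(array)

    def map_node(fn, *operand_gens):
        """MAP's generator: pull one element from each operand, apply fn."""
        for elts in zip(*operand_gens):
            yield fn(*elts)

    # (OBS - MEAN)^2 / VAR, element by element, with no intermediate arrays
    obs, mean, var = [4.0, 9.0, 1.0], [5.0, 5.0, 5.0], [2.0, 2.0, 2.0]
    diff  = map_node(lambda o, m: o - m, leaf(obs), leaf(mean))
    sq    = map_node(lambda d: d * d, diff)
    chisq = map_node(lambda s, v: s / v, sq, leaf(var))
    print(list(chisq))   # [0.5, 8.0, 8.0]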
The next observation was that this object in fact functioned very much like an array and could actually be thought of as a virtual array of a certain type. As with virtual arrays, not all of its properties are calculable from the description more easily than by calculating the object itself, but some are. SHAPE, for example, could be computed provided only that the declarative info stored with the mapping info for a function specified the size of the object returned; whether this is possible in general, at the cost of excluding only bizarre instances (e.g. letting the user map fns whose result shapes depend on the contents of their inputs, such as the vector of 5 5's), remains to be thought about. Some fanciful speculation about how far one could allow these virtual objects to wander followed. It would seem that there are problems with sharing and side effects to the copy of the array that is at the bottom of the generator stack if the generator is allowed to wander around the world too long. This is essentially the same problem that led IDL PPL to devirtualize a virtual array on assignment (to prevent the virtual array from existing across any point at which the user could get in and assign into the base array). The need for a better solution to this problem led to the second discussion.
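One way SHAPE could fall out of such declarative info, sketched with hypothetical names (Leaf, Virtual, shape_rule): the declared rule maps operand shapes to a result shape, so the question is answered from the tree alone, without evaluating a single element.

    class Leaf:
        def __init__(self, data):
            self.data = data
        def shape(self):
            return (len(self.data),)

    class Virtual:
        def __init__(self, shape_rule, operands):
            self.shape_rule = shape_rule     # declared: operand shapes -> result shape
            self.operands = operands
        def shape(self):
            return self.shape_rule(*[op.shape() for op in self.operands])

    elementwise = lambda *shapes: shapes[0]  # e.g. +, -, * preserve shape
    v = Virtual(elementwise, [Leaf([1, 2, 3]), Leaf([4, 5, 6])])
    print(v.shape())   # (3,) - computed from the declarations alone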
Copying/sharing: A long discussion of this problem can be summarized very simply - there are two consistent points of view: assignment always copies; assignment always shares. The two points of view have symmetric defects. If assignment always copies, then assignment can be a storage-expensive operation, there is no way for the user to have two different names for the same (or overlapping) objects, and thus multiple updating is very difficult. The best that can be done is to have multiple windows onto an object which can be used as selectors on the left hand side of an assignment to store through (e.g. X@Y←whatever for different Y). If assignment always shares, however, there is no way to consistently implement storing into an aggregate unless all aggregates are considered to contain pointer values (as the unqualified names in LISP are all pointer values, and therefore obey pointer assignment semantics). This latter strategy is considered unacceptably inefficient. A good compromise is to treat the world as if it always obeys copy semantics but to do this by sharing internally, using reference counts, and copying objects only when they are written into. This is in general a good solution; it has only two drawbacks. (1) There is no way for the user to force sharing so as to provide multiple updating - this may be added by a special purpose hack, perhaps (requires some thought). (2) It is not clear how, when you copy on write, you attach the new copy back to the global binding in terms of which the request to write was made, i.e. FOO[3]←8 for a shared array FOO requires the sharing to be broken and the new copy to be bound to FOO. The problem is that if FOO is the result of an expression (e.g. (if A=B then FOO else BAR)) then you may not have the name whose binding requires correction at the time you have to do the copy. This may be a fatal flaw.
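A sketch of the compromise and of drawback (2), in illustrative Python with hypothetical names (Box, Handle, assign, store); the extra Handle indirection here is one possible way to rebind after the copy, not a claim about how the system would do it.

    class Box:
        """Shared storage with a reference count."""
        def __init__(self, data):
            self.data = list(data)
            self.refs = 1

    class Handle:
        """What a user-level name is bound to."""
        def __init__(self, box):
            self.box = box

    def assign(src):
        """X <- Y: share the storage, just bump the count."""
        src.box.refs += 1
        return Handle(src.box)

    def store(handle, i, value):
        """X[i] <- v: break the sharing first if anyone else can see the box."""
        if handle.box.refs > 1:
            handle.box.refs -= 1
            handle.box = Box(handle.box.data)   # the copy-on-write step
        handle.box.data[i] = value

    FOO = Handle(Box([1, 2, 3, 4]))
    BAR = assign(FOO)                           # shares FOO's storage
    store(FOO, 3, 8)                            # FOO[3] <- 8 copies, then writes
    print(FOO.box.data, BAR.box.data)           # [1, 2, 3, 8] [1, 2, 3, 4]
    # The unsolved part: if the write goes through the value of an expression,
    # e.g. (if A = B then FOO else BAR)[3] <- 8, the store sees only a Box and
    # has no Handle (no name) whose binding it can correct.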
Labels: This was the last topic of discussion for the meeting and was discussed primarily at a meta-level. The technical discussion concerned how one maps from arrays to names and back without creating self-justifying patterns of linkages. No real consensus was arrived at, as the discussion moved to the question of whether it was worth debating in greater detail the user function specifications that required this bidirectional link. The two sides of this argument are, on the one hand, that the linkage should be put in at whatever cost (e.g. preventing any garbage collections, using disk storage, explicit free calls, or whatever) until sufficient experimentation has taken place to determine whether the facility is worth it, and, on the other hand, that such a large scale implementation decision, which may have ramifications on other aspects of the system, should not be taken without some thought as to the exact scope of facilities that should be provided and some tradeoffs as to the cost/value of these. No definite conclusions were reached.