Design for Multiple Values in Interlisp-D Bill van Melle, 30 Apr 85 Introduction Common Lisp supports the notion that a function (or form) can return ``multiple values'', rather than the usual one value. The specification is based largely on experience with ZetaLisp, with a couple of spots cleaned up. Multiple values are produced by a special form (values v1 v2 ...). There are several forms that ``receive'' multiple values and spread them in some way that they can be manipulated as regular values: e.g., the receiver can bind one or more variables to the values, can set variables to the values, or can build a list of the values. For consistency, (values) returns zero values, although some receivers cannot distinguish this from the single value NIL. If the receiver is not expecting multiple values, then exactly one value (the first) is returned. If the receiver is expecting multiple values, but only a conventional single value is returned, that value is treated as one ``multiple value''; in fact, one ``multiple value'' is completely indistinguishable from a conventional single value from the program's point of view. Thus, roughly speaking, you only get multiple values if you are expecting them, and as a creator of multiple values you need not care whether your caller actually wanted multiple values. This is a nice feature that Common Lisp makes use of in many places: functions that principally return one value are defined to actually return several values in case the caller is interested in them all. An example of this might be a division function, which returns the quotient as its (first) value, but also returns the remainder as a second value, since it had to compute it anyway, and some callers actually do want the remainder (in Common Lisp, the function floor is so defined). The main complication in multiple values arises from the fact that most subforms that appear to transparently pass thru a value in the conventional way are also required to transparently pass thru multiple values as well. Roughly speaking, progn and anything that is in some way an implicit progn must return multiple values if its final form returns multiple values. Multiple values cannot be assigned to a variable, nor passed as arguments to functions; such cases always act as single-valued. There exist special forms for passing multiple values thru a prog1, and for spreading multiple values as arguments to a function. Unfortunately, this transparency requirement violates the notion that you can always tell by looking at a piece of code in isolation, whether multiple values are involved. It means the implementation needs to be clever enough that ordinary call and return are not substantially penalized by needing to account for the possibility that multiple values might somehow get involved. It also means that certain compiler optimizations cannot be made, although I believe this has relatively minor impact. Outline of Design The design presented here is based on the implementation for Spice Lisp, another microcoded implementation. The general idea is that multiple values appear on the stack spread as several values, with the top of stack being a distinguished marker that identifies the values as multiple and says how many there are. Forms that create multiple values push the values and then this marker. Forms that accept multiple values have special code that tests for the marker, push or pop values to get the stack into the state where it has the number of values it is expecting, then does something with the values. Everything is so arranged that only the few instructions that ``know about'' multiple values are ever executed when the top of stack might be a multiple values marker. The Multiple Values Marker. In a tagged architecture, the marker is easy to define: the marker is an immediate datatype whose tag is ``Multiple Values Marker'' and whose immediate datum is the number of values. We do not have the luxury of tags in the present implementation (though we may wish to plan on them for future implementations). Hence, we need to pick a reserved address or range of addresses not otherwise used. Thus, the choices are (a) choose a single address (say, all ones) and implement the marker as two cells, the top one being this distinguished address and the next being the number of values, as a smallp; or (b) choose a range of addresses, the low (or high) bits of which encode the number of values. For (b) the most likely choice is to choose the address {377,177400}+nvalues, since that page of virtual memory is currently forbidden. Unfortunately, the test for multiple-value-markerp is non-trivial. Encoding the number of values in the high half, with a constant low half, would be a possibility but for the problem that there exist some high halves (litatom, smallpos, smallneg) for which any low half is a reasonable value. In what follows, I will use MVMarker(n) to refer to a Multiple Values Marker for n values, independent of the final representation we choose, be it one cell or two. My current preference is for the one-celled {377,177400}+n representation, which yields a multiple-values-limit of 255. Multiple Values Calling Forms. Every end receiver of multiple values is known at compile time, by virtue of being one of the special receiving forms. A receiver makes known its desire to receive multiple values from a function call by using instructions that set a bit (in somebody's frame) signifying that it is willing to accept multiple values (the bit is tested by return). On return from the call, the caller executes instructions that coerce the top of stack to multiple values (treating no multiple values as exactly one), adjust the stack if necessary (popping excess values, pushing NILs), and finally perform the appropriate operation, depending on the calling form. There are several ways of calling functions (direct function call, apply, apply*, eval, and potentially the ufn's for instructions that could produce multiple values), so either there need to be multiple-valued counterparts to each such instruction (e.g., MVFNX, MVAPPLYFN, MVEVAL), or there needs to be an instruction to compound with the conventional instructions to obtain the desired effect. Open issue: where is the ``Multiple Values Acceptable'' (MVA) flag stored? The choices appear to be as follows: (a) Set the bit in the callee's frame; in other words, treat the bit as ``it is ok for me to return multiple values''. This, I believe, is how it is done in SpiceLisp. It is appealing in that it would set a flag in a location that function call has to write anyway (flag word of the new frame), and would not require any explicit action to clear the bit following the return. However, it might not make sense in the spaghetti stack architecture. In particular, we would have to prohibit spaghetti constructs (e.g., RETEVAL) from returning multiple values, since they could have no way of knowing whether the frame being returned to is willing to accept multiple values. This is a restriction I would not like to arbitrarily impose. (b) Set the bit in the caller's frame; in other words, treat the bit as ``I am prepared to have multiple values returned to me''. This seems to make the most logical sense, works with spaghetti without any changes in the current code, and in addition allows the possibility of having a special instruction to set the flag, rather than require several new instructions that act like old instructions but also set the flag. Its principal drawbacks are the slight performance hit (writing a word that otherwise wouldn't need it) and the fact that explicit code is then needed to turn off the bit in the code following the function call (impact: one extra instruction for multiple-value-prog1). In what follows, I shall assume the flag is in the caller's frame. I ask the microcoders to evaluate the expense. The design is easily changed to allow flag in callee's frame instead if we decide we prefer that. The opcode sequences do not change substantially, except for the ones that would set or clear the flag. Transparent Calling Forms. There is only one way to handle multiple values without explicitly being aware of it: returning directly, without any intermediate processing, the result of a function call. The ``returning'' part can be either a return from the entire function, or a return (via unbinding) from an internal binding form (LET or PROG). In terms of the code executed, the former is a RETURN, the latter an UNBIND. To handle these cases, all tail-recursive function calls (those whose values are to be returned from the current function or to a multiple-value receiver; affected opcodes are FNx, APPLYFN and EVAL) must be compiled as an instruction or instructions that set the MVA flag, in case multiple values are to be transmitted. In addition, the returning and unbinding instructions need to be aware of the possible existence of multiple values. Compiler Architecture. I believe the major constraints for the compiler can be handled by introducing a new compilation context. It is currently the case, I believe, that the compiler has a notion of ``compile expression for value''. This notion can be more finely discriminated. One special kind of value is ``compile expression for return value'', i.e., the value of the expression is to be returned from the function; such a context might even already exist as a means of detecting tail recursion to be eliminated. I propose to add another kind of value context, ``compile expression for multiple values'', meaning that multiple values are explicitly requested from the expression being compiled. Multiple value receiving forms compile one or more expressions in value context MULTIPLE. I shall write ``WithMV[form]'' to mean ``compile form in context MULTIPLE''. In terms of implementation, if there is currently a compile-for-value argument to the expression compiler, its range of values could be expanded to include RETURN and MULTIPLE; both imply for value. RETURN and MULTIPLE are mutually exclusive (MULTIPLE can only arise from a receiver within the function, and RETURN is never transmitted through the receiving form). In addition, transmission of MULTIPLE value context is largely like the transmission of RETURN context. The top-level expression of a function is compiled in RETURN context. The final subform of a progn (or implicit progn) is compiled in the same context as its parent. A return in a prog is compiled in the same context as the prog. Those are all true for both MULTIPLE and RETURN context. There are some specific cases where RETURN context is not transmitted. If a prog1 is compiled in RETURN context, then its first subform will indeed be the value returned from the function, but I do not consider this RETURN context, because this is not tail-recursive (a function call here could not be turned into a jump). Similarly, only the final subform of and or or can be compiled in RETURN context, and a singleton cond clause can never be compiled in RETURN context. Fortunately, these rules agree exactly with the rules regarding when multiple values can be transmitted. With all this in hand, I shall define an abstract compilation context ``multiple values possible'' (MVP), which means ``the form I am compiling is permitted to produce multiple values''. MVP is true when the compilation context is MULTIPLE or RETURN. The compiler's new job is to treat any form of function call in MVP context specially (issuing the instructions that set the MVA flag) In addition, any RETURN or UNBIND in MVP context is potentially special, as it must do the right thing if the top of stack is a set of multiple values. In the case of RETURN, all values must get returned. In the case of UNBIND, the compiler does not know what the actual stack depth is, although it knows the logical stack depth (treating the potentially multiple values as one); whatever does the unbinding has to move potentially multiple values, while decrementing the stack depth the logical amount required. These can be handled either by defining new opcodes MVRETURN and MVUNBIND, which are identical to their companions except for doing something special when the top of stack is MVMarker(n), or by extending the existing RETURN or UNBIND to do the right thing in those cases. For the remainder of the discussion, I shall assume that the compiler emits the MVRETURN and MVUNBIND opcodes in MVP context, although the choice of new opcodes vs. extending the old ones is arbitrary and has no effect on the rest of the design. Specific constructs for producing Multiple Values (values v1 ... vn) This is the basic multiple-value returning form. It compiles as a push of its arguments (as with any other function), then push MVMarker(n). This must be followed by an MVRETURN, for the usual case of a function returning multiple values, or an MVUNBIND, if the form is returning multiple values within the same function, e.g., from an internal LET or PROG to an enclosing multiple-value catcher. If values is compiled in other than MVP context (does not make much sense as a program, but each case must be covered) then instead of pushing MVMarker(n), it compiles as a pop of the n-1 excess values, or a push of NIL if n = 0. (values-list list) This is the only other way to return multiple values; it is the same as (apply (function values) list). It can be compiled open, in much the same form as we compile the central part of EVAL. The expression compiles as something like '0; BIND[temp]; list; lp: LISTP; NFJUMP exit; COPY; CAR; SWAP; CDR; temp; '1; IPLUS2; POP; JUMP lp; exit: temp; MVMARKER with a suitable MVRETURN or MVUNBIND following as needed. To protect against circularity, one could replace the POP with a range check against the constant multiple-values-limit. It is likely that we want the function values-list itself to compile in this fashion, and internal uses of values-list in the interpreter surely want to compile open with no range check (it is convenient to pass values around using multiple-value-list and values-list, and the value of multiple-value-list is guaranteed to be in range). However, it is an open question whether user calls should compile open or closed. The MVMARKER opcode This opcode is used to perform the ``push MVMarker(n)'' operation described above. Definition: If TOS is not a smallposp less than multiple-values-limit (255 following the current proposal), then punt; else replace TOS with MVMarker(TOS). The MVRETURN opcode MVRETURN is the same as RETURN except that it has to notice whether it is returning multiple values. So extend RETURN as follows: if top of stack is MVMarker(n), then check MVA flag (in returnee). Two cases: If flag is off, then multiple values are not acceptable. If n = 0, replace TOS with NIL. Otherwise, pop the MVMarker and n-1 values off of returner's stack, leaving TOS containing the first of the n values. Proceed with normal return. If flag is on, then multiple values are acceptable. Proceed with normal return, but remember where TOS-n, the location of the first of the multiple values, is stored. At end of return, go into a block transfer loop, transferring n values, in order, from returner's stack to top of returnee's stack, finally pushing the same MVMarker(n) as the returnee's new TOS (thus, if TOS is a register, it is preserved, just as it is with normal return). Remember than n can be zero, in which case the block transfer loop terminates without doing anything. Note that the MVA flag is never tested unless actually returning multiple values. No special action is required if returning a single value to a multiple-value receiver. If MVRETURN = RETURN, then the minimum microcode requirement for the first implementation is that RETURN test for TOS = MVMarker(n), and punt if it is. A second stopping point could be to handle the flag is off case in microcode, where all the microcode needs to do is adjust the returner's stack before doing the normal return. The MVUNBIND opcode Similarly, MVUNBIND has to notice whether the top of stack is a set of multiple values, and if so, preserve more than one value as it pops items from the stack. This seems fairly unpleasant, but intrinsically necessary to compile such forms as (multiple-value-bind & (let -- (foo)) --). Thus, MVUNBIND is the same as UNBIND unless TOS = MVMarker(n), in which case it needs to note TOS-n, the location of the first of the multiple values, is stored. It then does a regular UNBIND, but at the end, goes into a block transfer loop, transferring n values, in order, from the old position on the stack to the frame's new top of stack, finally pushing the same MVMarker(n) as the frame's new TOS. Specific constructs for receiving Multiple Values A receiver of multiple values generally does one of two things: (1) It expects exactly n values (n known at compile time), and it will spread them one at a time into some other structure, e.g., into local variables. In this case, the receiver needs to first coerce the stack into having exactly n multiple values, either by popping excess values or by pushing extra NILs. (2) It expects an arbitrary number of values, and merely needs to know how many it is about to do something with. The following new opcodes are defined: MVPREPARE Sets the MVA flag in the current frame. This instruction is issued immediately preceding any calling instruction (FNx, APPLYFN, EVAL) in MVP context. It is desirable that this be a fast, single-byte opcode, since it needs to be issued for all tail-recursive calls, even for code that never heard of multiple values. In a sampling from a random full.sysout with only a few extra modules loaded, approximately 8% of all calling instructions were tail-recursive; MVPREPARE before each of those opcodes would have added some 3800 bytes (7.5 pages) to that sysout. This opcode is not necessary, of course, if we instead define new calling opcodes. MVCLEAR Clear the MVA flag in the current frame. Used only for multiple-value-prog1. MVFORCE ``Tell me how many values I have.'' If TOS = MVMarker(n), then replace it with the smallposp n; else push the small constant 1. In either case, clear the MVA flag in the current frame. STACKADJUST TOS is a small integer. Pop it off stack and call it n. If n is positive, pop stack n more times; if n is negative, push n NILs. Both MVFORCE and STACKADJUST (not to mention multiple-value function calls themselves) have an indeterminate affect on stack depth, although they are always used in sequences whose net effect is determinate. If necessary, introduce composite opcodes or pseudo-opcodes to help the compiler keep track. With these opcodes in hand, the various receivers are described as follows: (multiple-value-setq v1 ... vn form) This sets the n variables v1 thru vn to the values returned by form and returns as value the first of the multiple values (what v1 gets set to). It compiles as WithMV[form]; MVFORCE; SIC n; IDIFFERENCE; STACKADJUST; SETQ vn; POP; SETQ vn-1; POP; ... SETQ v1 Net stack level effect: one push. The sequence of instructions MVFORCE; SIC n; IDIFFERENCE; STACKADJUST has stack level effect of +n-1 (if you assume that the mv-producing form itself contributed exactly 1 to stack depth, as normal forms do). (multiple-value-bind (v1 ... vn) form . other-forms) This evaluates (progn . other-forms) with variables v1 thru vn bound to the values returned by form. It compiles as WithMV[form]; MVFORCE; SIC n; IDIFFERENCE; STACKADJUST; BIND [v1; ...; vn]; (progn . other-forms) followed by the appropriate unbind or return. Of course, the compiling context of the multiple-value-bind is transmitted to the final subform of the inside progn. (multiple-value-prog1 form . other-forms) This is the same as conventional prog1 except that it can return multiple values if form does. In other words: (a) if the multiple-value-prog1 is not in MVP context, then compile as prog1. (b) if the multiple-value-prog1 is in MVP context, then compile form in context MULTIPLE, the remaining forms for effect. In addition, if form's value is computed by function call, the call must be followed by MVCLEAR to clear the MVA flag (which ordinarily would be cleared by MVFORCE, not used in this case). Of course, any return or unbind following the multiple-value-prog1 must be of the mv variety (since you're in MVP context). Note that the ordinary function prog1 never transmits its compile context to its first subform; i.e., prog1 never returns multiple values. For an inefficient implementation, multiple-value-prog1 could be compiled as (values-list (prog1 (multiple-value-list form) . other-forms)). (multiple-value-list form) This evaluates form and returns a list of all the values. It compiles as WithMV[form]; MVFORCE; 'LIST; APPLYFN i.e., call the function LIST with the n arguments that come back from form. SpiceLisp has a LIST opcode that does this, but I don't think we want that. (multiple-value-call fn form1 ... formn) This is effectively an apply* (i.e., funcall) of fn with arguments being the result of spreading all the multiple values returned by form1 thru formn. It compiles as '0; BIND[temp]; WithMV[form1]; MVFORCE; temp; IPLUS2; SETQ temp; POP; WithMV[form2]; ... ; SETQ temp; POP; WithMV[formn]; MVFORCE; temp; IPLUS2; fn; CHECKAPPLY*; APPLYFN followed by a suitable unbind or return. The temporary variable keeps track of how many values are accumulating on the stack. Eventually, APPLYFN is called to do the function call with the total number of arguments. The temporary binding is not needed if n = 1, in which case multiple-value-call can be compiled more simply as WithMV[form1]; MVFORCE; fn; CHECKAPPLY*; APPLYFN Of course, in either case, if fn is a side-affectable form instead of a constant, a temporary binding will be needed for it as well. And if the multiple-value-call form itself is in MVP context, then the CHECKAPPLY* must be preceded by MVPREPARE. For an inefficient implementation, multiple-value-call can be compiled as (apply fn (append (multiple-value-list form1) (multiple-value-list form2) . . . (multiple-value-list formn))). Interpreter Of course, interpreted versions need to be written for all of the multiple value forms described above. In addition, the macros for SPREADAPPLY*, SPREADAPPLY, .CALLAFTERPUSHINGNILS., and .EVALFORM., which hand-compile special versions of function call and APPLYFN in the interpreter, may need to be modified to transmit MVP context to their call or APPLYFN if the compiler doesn't catch those automatically. The function RETURN needs to be an NLAMBDA that manually evaluates its argument and saves the potentially multiple values that come back, before smashing its return link to the appropriate PROG frame. The function \INTERPRETER1, which implements calls to interpreted fns, needs to be careful to return the evaluation of its final subform directly, without setting an intermediate value. There is a similar problem with \EVPROGN and interpreted OR (AND and COND are already written correctly, though). The ufn for the opcode CHECKAPPLY*, which intercepts compiled calls to APPLY* of NLAMBDA nospreads, which currently returns `(LAMBDA NIL (QUOTE ,(SPREADAPPLY* FN (\CKAPPLYARGS)))), needs to instead return `(LAMBDA NIL (VALUES-LIST (QUOTE ,(MULTIPLE-VALUE-LIST (SPREADAPPLY* FN (\CKAPPLYARGS)))))), and similarly for its FAULTAPPLY (udf) case. The PROG1 in EVALHOOK needs to be a MULTIPLE-VALUE-PROG1. Actually, we might want to rip out the current EVALHOOK, since it does not meet the CommonLisp spec. Its inclusion in the Harmony release (and notes) was a violation of our usual policy on new features. The spaghetti functions (RETEVAL, ENVEVAL, etc) appear to be okay as they stand. Break and Advise Advising a function should, of course, not interfere with the transmission of multiple values. The simplest way to insure this is for the advising form to wrap a (multiple-value-list &) around the value coming back, set a new variable !VALUES to this list, and finally return (values-list !VALUES) following any ``after'' advice. For backward compatibility, also set !VALUE to (CAR !VALUES). The same is true for the break package, except it is now BREAK1 and its underlings that must gather up all the values into a list and spread the list on exit. The break command VALUE could be extended to print the values in some nice fashion when the number of values is not 1. Additional Compiler Considerations Stack Depth. While multiple values are on the stack, the compiler does not know the absolute stack depth. This is usually unimportant, since multiple values do not usually remain on the stackthey are usually immediately returned, or passed up within a function by doing an unbind. So I believe the only effect is to prohibit the compiler from using COPY.N and STORE.N optimizations while compiling: (a) the final n-1 subforms in a multiple-value-prog1; (b) all but the first of the forms in a multiple-value-call (and, of course, the temporary variable introduced must be a pvar); In addition, the compiler cannot do an unbind by performing one or more POPs in a place where MVUNBIND is called for (also, any variables to be unbound in MVP context must be pvars). Nasty question: what do you do with a GO inside of one of the arguments to multiple-value-call, or in the extra subforms of a multiple-value-prog1? I think these are the only places you could write GO where the stack depth is indeterminate, and I would not be at all disturbed to forbid it. In the case of multiple-value-prog1, you could handle it by popping everything you know to be on the stack after the first value, then do MVFORCE; STACKADJUST. I can't think of a way to handle the multiple-value-call case, however, unless you really do want to rely on UNBIND blowing away the stack up to the bind mark, something we were trying to get away from. Tail Recursion Removal. I believe this is unaffected. If the call to function FOO permits multiple values to be returned, then whether you perform a tail-recursive call to FOO as a function call with MVA set, or as a jump, the right thing happens. Other Compiler Optimizations. Larry suggests a possible problem with optimizing away the binding of a variable to the value of a function call by replacing the single use of that variable with the call. For example, transforming (prog ((a (foo))) (return a)) into (foo). The latter might return multiple values, whereas the semantics of the first dictate only a single value. Larry: is this an important transformation, and is the transformation at the source level or at the intermediate code level? If this optimization occurs after the first pass compilation of this expression, then there is no problemthe call to foo is not in MVP context, so is compiled as a regular call, and will not return multiple values then, nor will it when it is rearranged to remove the extraneous bind. Larry, are there any other problem optimizations? Summary of Requirements Microcode. Instructions are MVRETURN, MVUNBIND, MVPREPARE, MVCLEAR, MVMARKER, MVFORCE, STACKADJUST. The instructions have been designed so that an initial (very inefficient) implementation is possible with no microcode support at all. Of course, some of the ufns are rather unpleasant; most will require the same ugly trick as we use for the unboxed floating ufnsgrab caller's stack pointer and explicitly manipulate the stack. Most of them need to set NoPush bit so that the ufn returns no value (we can't return a value if it is the MVMarker, since this would confuse RETURN (if MVRETURN is considered the same as RETURN)). Popping caller's stack is much easier than pushing, since the latter bumps into the next frame (usually the ufn itself!). Breakdown as follows: MVUNBIND simulates UNBIND and copies values downward on stack. Net pop, so not too ugly. MVRETURN requires pushing onto returnee, so have to make sure there is space. If returner is what follows returnee, we can smash it out of the way; if free block of sufficient size follows, shrink the free block; else have to move returnee, much as \HARDRETURN does. I'm inclined to have MVRETURN ufn to \HARDRETURN and fold all this garbage into one place. MVPREPARE grabs caller's FX and sets the MVA flag in it; MVCLEAR clears the flag. (Do these confuse the compiler? They are zero-argument opcodes that return zero values.) MVMARKER is fairly easy, since it just requires smashing contents of TOS. MVFORCE either smashes TOS or pushes a value. Do this by having zero-argument ufn (as all the others are), which examines caller's TOS. If it is the MVMarker, smash it and set NoPush bit as you return; if it is not, return constant 1 in conventional manner. STACKADJUST can actually have a ufn of one argument (n). If n is positive, there is a net pop, which is relatively easy; if n is negative, need to push NILs, which gets into the same hairiness as MVRETURN and would probably require doing something in the misc context if there is a frame blocking the way. This is perhaps the least pleasant opcode. Compiler. Compiler needs to add notion of Multiple Values Possible context, Lap code for each of the constructs described above. Need to inhibit certain optimizations (COPY.N and STORE.N in certain situations). Interpreter. Need to implement interpreted versions of all of the Multiple Values forms listed above, and make sure the existing interpreted fns transmit multiple values when required to.  0000GACHA TIMESROMAN TIMESROMAN GACHA TIMESROMAN  TIMESROMAN  TIMESROMAN * / \w.@@!   6S tq?53P+933A"4( $#q!t0Tf x2 D\-  HT- M)' 9 r s* OH R! 1=K&h~^ {WN  ) %W@p 2@ Fr' sJ [R*. %\ 6 L =  @  o  !   W3!. ( ! 3D=I: A #(7 KR  o  q)  #     ;VB ) %$6 @  /(2"G 9s`# V  (EHQ&# 5kg (5X:[I^2    A   + 0mB ) sz