TJaMDoc.tioga
Copyright Ó 1987 by Xerox Corporation. All rights reserved.
Michael Plass, September 24, 1991 12:10 pm PDT
TJaM
CEDAR 7.0 —
TJaM
JaM interpreter for Tioga styles
Michael F. Plass
© Copyright 1987 Xerox Corporation. All rights reserved.
Abstract: Tioga uses a reverse-Polish language called JaM as the basis for it's style machinery. TJaM is the JaM interpreter that Tioga uses.
Created by: Duog Wyatt
Maintained by: TiogaImplementors^.pa
Keywords: interpreter, JaM language, JaM, stack-based language, styles, Tioga
XEROX  Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, California 94304

Introduction
JaM is a simple stack-based language. JaM is intended to be a flexible system, giving the user rather direct control over the primitives. It is not intended to be a fault tolerant system for beginning users. This manual is written in the same spirit: the intent is to explain only those aspects of JaM which are not properties of programming languages in general. It gives explanations of all but the most obscure intrinsic functions and a sampling of the most useful external utilities.
All functions in JaM make use of the operand stack. This stack contains objects: integers, reals, booleans, character strings (ropes), identifiers (atoms), commands, dictionaries, arrays, and special stack markers. Execution proceeds by getting a token from the input stream, converting it into an object, checking whether or not it is executable and executing it. Non-executable objects are pushed onto the stack. During execution, operands may be retrieved from the stack and results are returned on the stack. This means that all input is given in postfix notation.
Basic Operation
JaM execution proceeds simply by transforming the input stream into a series of objects and processing them sequentially. There are two types of objects: nouns which are automatically placed on the operand stack, and verbs which are immediately executed. The JaM scanner parses the input stream into a series of tokens separated by tabs, carriage returns, spaces and commas. There are three types of tokens: numbers, strings and identifiers. Numbers and strings are converted directly into the corresponding objects and placed on the operand stack. Identifiers are looked up in the current dictionary (explained later) and the resulting object is processed according to whether it is a noun or a verb.
Syntactically, a string is a sequence of characters enclosed in parentheses. It may contain anything except unbalanced parentheses (even carriage returns). Any token which is neither a number nor a string is an identifier. Alternatively, strings may be enclosed in double quote marks, as in Cedar.
The system has a total of three major stacks: the operand stack, the dictionary or context stack, and the execution stack. The dictionary stack is used for keeping track of identifiers during execution. This stack functions as a series of nested contexts, much like the blocks of a block structured language. A dictionary is essentially a table for associating identifiers with their values. When the scanner comes across an identifier, it tries to look it up in the dictionary on the top of the dictionary stack. If the lookup is successful, the value found in the dictionary is the next object returned by the scanner. Otherwise, the identifier is looked up in each of the other dictionaries on the dictionary stack in sequence, until a value is found.
The execution stack is used for keeping track of nested function calls. It contains the information necessary to implement recursive function invocations. It need not be the direct concern of the user since he cannot refer to it directly.
Lexical syntax
String literals are defined in either of two ways: by enclosing them in parentheses or using double-quotes, Cedar-style:
(This is a string)
"This is a string"
(This is a string (with balanced parens inside))
"This is a string (with balanced parens inside)"
(This is a string with "quotes" in it)
"This is a string with \"quotes\" in it"
"This string has an unbalanced '(' and cannot be directly entered with the paren notation"
Numeric literals are pretty much standard fare:
1 -20 177B
1.0 6.9E+10
Curly braces are used for defining special arrays of quoted literals. Example:
(add1) {1 .add}.cvx .def
defines an op "add1" that adds one to the top element on the stack. The way this works is
(add1) leaves a string on the stack.
{1 .add} makes a special array of quoted literal and leaves in on the stack.
.cvx takes the array and makes it executable.
.def takes the name and the body and puts the pair into the current dictionary.
Percent sign causes the remainder of the line to be ignored (you can use tioga comments for whole-node comments):
1 1 .add = % this is a comment
Any other consecutive sequence of printing characters constitutes a name (atom).
Commands and Utilities
This discussion is organized by function. All commands relating to a particular topic are given together along with an explanation of the particular aspect of the system to which they relate. The exact placement of arguments and results of each function on the stack is given by a diagram. For example:
.add<x><y> b <x+y> õ Add two real numbers.
The symbols in angle brackets, <x>, <y>, <x+y>, represent objects on the stack. An arrow (b) separates the condition of the operand stack before the command is executed from the condition after. Only the top-most objects on the stack are shown, the others are assumed to remain unchanged. A comment may describe the command following the symbol õ. A double vertical bar (||) will be used to mean the bottom of the stack. There are various error conditions which can occur when trying to execute a command. These are discussed in a separate section (see Error Handling) because there are special functions relating to this topic.
Arithmetic and Bit Manipulation
There are three types of numeric objects: reals, integers, and long integers. Integers and reals are 32 bits. Implicit type conversion is performed on numeric objects, but it is also possible to do these conversions explicitly (see Type Conversion).
.add<x><y> b <x+y> õ Add two real numbers.
.sub<x><y> b <xy> õ Subtract two real numbers.
.mul<x><y> b <xy> õ Multiply two real numbers.
.div<x><y> b <x/y> õ Real division.
.idiv<x><y> b <x DIV y> õ Integer division.
.neg<x> b <—x> õ
.cos<x> b <cos(x)> õ (x in degrees)
.sin<x> b <sin(x)> õ
.atan<y><x> b <p=atan(y/x)> õ (—180 < p d 180)
.exp<b><e> b <be> õ (result always of real type)
.log<b><v> b <logbv> õ
.sqrt<x> b <x> õ
Boolean and Relational Commands
The following commands deal with boolean objects. They have two possible values represented here by .true'' and .false''. For integer comparisons, implicit type conversion occurs before comparison. If one argument is integer and the other is long integer or real, the integer is converted to that type. Similarly, long integers may be converted to real type. Strings may also be compared using lexicographic ordering. It is not legal to compare integers with strings.
.trueb .true õ
.falseb .false õ
.eq<x><y> b <.true> if x = y, otherwise <.false> õ
.gt<x><y> b <.true> if x > y numerically or lexicographically, else <.false> õ
.lt<x><y> b <.true> if x < y numerically or lexicographically, else <.false> õ
.not<x> b <~ x> õ
.and<x><y> b <x ' y> õ
.or<x><y> b <x ( y> õ
.xor<x><y> b <x y> õ
Stack Manipulation Commands
It is often useful to manipulate the operand stack. It is probably not good practice, however, to try to use the stack for all temporary storage. Define local variables instead (see Dictionary Related Commands). Overuse of stack manipulation makes programs hard to read and difficult to debug. With this warning in mind, the following diagrams should make these commands clear.
.pop<x> b õ
.copy<x1><x2> ... <xi><i> b <x1><x2> ... <xi><x1><x2> ... <xi> õ (i must be of type integer)
.index<xn><xn—1> ... <x1><i> b <xi> õ (<i> is an integer less than n)
.cntstk||<x1><x2> ... <xi> b <x1><x2> ... <xi><i> õ
.roll <xi> <xi-1> ... <xj+1> <xj> ... <x2> <x1> <i> <j> b
<xj> <xj-1> ... <x1> <xi> ... <xj+1> õ (i and j are of type integer.)
.dup<x> b <x><x> õ
.exch<x><y> b <y><x> õ
.clrstk||<x1><x2> ... <xi> b || õ clears the stack
Stack Marking and Mark Manipulation Commands
There is a special object called a stack mark. Its main purpose is for keeping variable numbers of arguments on the stack. The .loop and .rept commands (see below) use this concept internally to allow arbitrary execution inside the loop without losing track of the original condition of the operand stack.
.markb <mark> õ (mark type object put on stack)
.cnttomrk<mark><x1><x2> ... <xi> b <mark><x1><x2> ... <xi><i> õ
.clrtomrk<mark><x1><x2> ... <xi> b <mark> õ
Execution Control Commands
JaM has much of the execution control machinery found in Algol-like languages. This includes looping'', if-else'' and select'' constructions. There is no go to'' command, however, as this would not easily fit into the stack oriented structure of JaM. The constructions just mentioned do fit in with the stack oriented structure because they can operate on executable objects in the operand stack. For example, the .if command expects a boolean object and any other object on the operand stack. If the boolean equals .true, then the other object is executed, otherwise the .if command pops the object from the stack.
.exec<x> b õ Remove x from the operand stack and execute it as if it just came from the input stream. However x got on the stack, it is the result of some kind of execution. This means that x is effectively evaluated twice as with the eval'' function in Lisp. To reverse this effect, see Type Conversion
.if<b><x> b õ if b = .true then execute x
/if<b><x> b õ utility equivalent to: .cvx .if''
.ifelse<b><x><y> b õ if b = .true then execute x else execute y
/ifelse<b><x><y> b õ utility equivalent to: .ifelse .cvx .exec''
.rept<i><x> b õ execute x i times
.loop<x> b õ execute x forever
.for<i><j><k><x> b õ Execute xÐ(ki)/j+1 times, with i on top of the stack the first time and i+j, i+2j, ... on top thereafter.
.exitb õ Exit from current .rept, .loop, .dictforall, or .arrayforall loop. This clears the operand and execution stack down to the marks placed upon entering the current loop.
.stopb õ Clears the execution stack, terminating all unfinished execution. To see how this affects error conditions, refer to Error Handling.
.interruptb õ same as .stop
Dictionary Related Commands
Variables are stored in special objects called dictionaries. Dictionary objects are general symbol tables useful for all kinds off storage. They have a size hint specified at the time of their creation, but may grow to any size.
As mentioned earlier, there is a stack of dictionaries maintained by the system. The dictionaries in this stack are used like levels of static nesting for variable definitions in an Algol-like language. Dictionaries may be named and retrieved for later use just like other objects in JaM.
In the following table, <k> refer to keys or variables and <v> refers to the corresponding values.
.curdictb <current dictionary> õ pushes the top of the dictionary stack onto the operand stack
.dict<i> b <d> õ creates new dictionary d with capacity of i entries
.def<k><v> b õ Associate the value v with the key k in the current dictionary. To define a function, make v executable.
/def<k><s><v> b õ Equivalent to: <k><v>.def $help<k><s>.put''. The $help dictionary contains informative messages about certain functions. It provides a convenient way to document programs. To access this dictionary, see Input/Output Commands. This utility resides in the file basic.jam
/xdef<k><s><v> b õ Equivalent to: .cvx /def''.
.del<d><k> b õ deletes the key k from the dictionary d
.load<k> b <v> õ Retrieve the value associated with the key k in the current dictionary. If v is an executable string,this allows its definition to be printed. (See Type Conversion and Input/Output Commands).
.store<k><v> b õ Finds a definition of k in the current dictionary and replaces that definition with the value v. If no definition of k exists, this functions as .def
.put<d><k><v> b õ associates value v with key k in dictionary d
.get<d><k> b <v> õ retrieves the value v associated with k in dictionary d
.known<d><k> b õ <.true> if key k is in dictionary d, <.false> otherwise
.where<k> b õ <d><.true> if k is found in some dictionary d, <.false> otherwise
/dir<d> b õ Utility from basic.jam which prints all key, value pairs in the dictionary d
/kdir<d> b õ utility which prints all the keys in the dictionary d
??<d> b õ This is a utility from basic.jam equivalent to $help /kdir. This prints all functions on which help is available. This usually includes most of the external utility functions which have been loaded.
.clrdict<d> b õ Clear all entries from dictionary d
.dictforall<d><x> b õ Put <k><v> on the stack, and then execute <x>. This is done for every k, v pair in dictionary d
.begin<d> b õ Push d on the top of the dictionary stack. (This makes d the current dictionary)
.end.end b õ Pop current dictionary off the top of the dictionary stack.
.sysdictb <system dictionary> õ
.length<d> b <i> õ the current number of entries in the dictionary d
.attachdict<d1><d2> b õ attaches dictionary d2 to dictionary d1, so that searches within dictionary d1 will find elements of dictionary d2.
.detachdict<d1><d2> b õ detaches dictionary d2 from dictionary d1.
.attachedforall<d><x> b õ Puts dictionary di on the stack and executes x for each dictionary di attached to dictionary d.
.detachall<d> b õ Removes all attached dictionaries to dictionary d.
Array Related Commands
Array objects are linear arrays of objects indexed starting at zero. They have fixed lengths determined when they are created. There are commands for creating arrays, storing into them, retrieveing objects from them, etc. Most commands either expect array objects on the operand stack or return array objects. Arrays can also be made executable. (See Type Conversion)
.array<i> b <a> õ Creates a new array a of i objects.
[b <mark> õ Mark the operand stack for use by the ]'' utility.
]b <a> õ Put all the objects down to the first stack mark into an array a and return it on the operand stack. [ obj1 obj2 ... obji ]'' forms an i-element array.
.subarray<a><i><j> b <a`> õ a` is the subarray of a starting at position i and of length j. If i+j > length(a) this causes a range error.
.aput<a><i><v> b õ Store v in the i-th position of a, if 0 d i < length(a).
.aget<a><i> b <v> õ Get v from the i-th position of a, if 0 d i < length(a).
.aload<a> b <x1><x2> ... <xi—1><a> õ (where a is of length i)
.astore<x0><x1> ... <xi—1><a> b <a with x0, x1, ... , xi—1 stored in it> õ (a is of length i)
.arrayforall<a><x> b õ Puts a[i] on the stack and executes x for each object a[i] in a.
.abind<d><a> b õ Binds names in array a with values in dictionary d.
.afind<a><x> b <i><ok> õ Searches array a for element x and returns ok as .true if ai = x, otherwise ok is .false and <i> does not appear on the stack.
.length<a> b <i> õ Replaces array a with its length.
Input/Output Commands
TJaM does not provide a set of commands for I/O, since it is meant to be used with the style machinery. Just one input-related command is provided:
.run<file name> b õ The file is read as a byte stream and executed. This is how to read files of JaM function definitions.
Type Conversion Commands
There are several different types of objects in JaM. It is possible to change existing strings (see Scanner and String Manipulation Commands) but their length remains constant. There are two numeric types: integer and real. When the scanner finds a string without any decimal point, it tries to make it an integer, then if it's too big, a long integer. Strings too long to be long integers are converted to reals. Strings containing decimal points naturally become real type objects. It is not possible to enter numbers in exponential notation; however, one can type 6.7 10 -11 .exp .mul to get 6.7 10—11.
Type conversion commands allow the user to determine the types of objects and convert from one object type to another. Type mismatches cause run-time errors. These errors cause special error routines to be executed, which are user definable.
.type<x> b <Name of Type> õ Deliver the type of <x> on top of the operand stack. Current types include: .nil, .int (32 bits), .real (32 bits), .atom, .rope, .stream, .cmd, .op, .array, .dict, .mark, .other.
.length<x> b <i> õ Length i of: string (in characters), array (in elements), dictionary (in entries).
.cvs<x> b <s> õ Convert to string equivalent. Applies to numbers, strings.
.cvi<x> b <i> õ (converts numbers to type integer)
.cvr<i> b <x> õ (converts numbers to real type)
.cvx<s> b <s`> õ Converts strings or arrays to executable form. They become verbs. This is the way to get an executable string or array on the operand stack as is required by the execution control commands.
.cvlit<x> b <l> õ This converts strings and arrays into literal form (makes them nouns). This undoes the effect of .cvx.
.commandname<x> b <s> õ Returns the name of command x as a string s.
The most important type conversion is the conversion to executable. This is necessary every time a function is defined. There are two main executable forms in JaM: strings and arrays. When a string is expected, the scanner must extract the objects from the string and each identifier must be looked up in the dictionary stack. When arrays are executed, they are already sequences of objects. In addition, all the identifiers have been looked up ahead of time.
Executable strings are simpler to use because it is easier to change the definitions of the functions they call and they are easier to read and print. The use of the functions .cvx, .load, and .exec when defining executable arrays can be very confusing. Recursive functions are also more difficult with arrays. The following examples illustrate the difference in format:
(average) (.add 2 .div) .cvx .def
(average) [ (.add) .load 2 (.div) .load ] .cvx .def
There is a special syntax to combine the advantages of both. If you enclose the body in curly braces, JaM implicitly quotes everything inside and makes an array. Thus the preferred way to write the above definition is
(average) {.add 2 .div} .cvx .def
which will have the same effect as the second example above.
Occasionally it is desirable to define a very long body that is too big for the "{. . .}" notation to handle; in this case revert to the string representation, which can handle very large objects.
String Manipulation Commands
Strings in TJaM are represented by Cedar ROPEs.
.length<s> b <i> õ Replaces the string s with its length in characters.
.substring<s><i><j> b <s`> õ <s`> is the j character substring of s starting at position i. We must have 0 d i < length(s).
.search<t><s> b õ <a><s><b><.true> if there is a substring of t matching s, with b and a respectively the parts of t before and after the first occurrence of s, and <t><.false> if no substring of t matches s.
.asearch<t><s> b õ <a><s><.true> if a starting substring of t matches s -- a is the rest of t, <t><.false> otherwise.
.ropeconcat<s><t> b õ <r> where r is the concatenation of s and t
Error Handling
Run time error handling routines in JaM are user definable. When such an error is detected, the appropriate function (identifier, whose value is to be executed) is called. The default definitions of these functions are found in the file errordefs.jam, but those for Tioga are in TiogaUtils.jam. Not having these defaults loaded will cause an infinite string of errors when JaM tries to call the error handling routines. These defaults may be changed, but typically they stop execution and print the stack after displaying a short message regarding the type of error.
When writing new error handling routines one must keep in mind that the operand stack and execution stack must remain intact when the error handling routine is called. The error handling routines are:
.stkundflow  b õ Called when there is an attempt to get something from an empty stack.
.undefkey  b õ Called when an identifier cannot be found in a dictionary.
.longname  b õ A rare error caused by excessively huge file names.
.badname  b õ Called when a string which was supposed to be a file name cannot be found in the directory.
.typecheck  b õ Called when some argument of a function is of the wrong type. The offending primitive command is left on the stack along with its other arguments.
.dictfull  b õ Called when an attempt is made to define something into a full dictionary. This may mean a system utility has tried to define its own temporary variable into a full user dictionary.
.syntaxerr  b õ Called when the scanner finds an unmatched )''.
.overflow  b õ Arithmetic overflow.
.stkovrflow  b õ The stack filled up. Before the error is raised, the stack contents are packaged into a single array, which is left as the only element on the stack. Look for infinite recursion, if you get a stack overflow.
.rangechk  b õ Something out of range, usually in an array or string manipulation command.
.sizechk  b õ Occurs only in the .cvis and .cvirs commands (see Type Conversion). Warning -- this routine has been neglected in some versions of the file errordefs.jam.
Programming and Use of JaM
Because JaM is quite different from other programming languages, it is appropriate to give hints on how to program and put JaM to good use.
The JaM input language has very little syntax. This attribute is both good and bad. The lack of syntax is good because a uniform representation is attained. The lack of syntax is bad because the code is hopelessly unreadable. In JaM it is almost true that any line of code can do anything (given the appropriate redefinitions of identifiers). Because of this property of JaM code, it is desirable to do several things when programming. First, document each function as it is written (use /def and /xdef under Dictionary Related Commands). Second, use naming conventions for functions that all belong to a given class (for example, intrinsic commands are started with .''). The latter convention will allow a person reading the code to tell what category a function is in by its name and may help him avoid accidentally redefining intrinsic functions.
Since JaM is a stack oriented machine, the user must mentally keep track of the contents of the stack. It is easier to program JaM if each routine performs only one function and is very short. Normally a JaM routine should be at most one or two lines long. For example, suppose we wish to use the absolute value of a given number. Then, rather than introduce the code in line, it is desirable to write an "abs" function and then use that function e.g.
(abs)(.dup 0 .lt (.neg) .cvx .if) .cvx .def
Also if complicated parameters are being passed to JaM control statements, it is better to name the parameters rather than have lines and lines full of parentheses.