IODoc.tioga
Copyright © 1987, 1991 by Xerox Corporation. All rights reserved.
Written by: Mark R. Brown
Last Edited by: MBrown.pa on December 12, 1983
Last Edited by: Ed Fiala, May 18, 1984
Last Edited by: Subhana, May 30, 1984 3:56:51 pm PDT
Willie-Sue, November 25, 1987 4:12:47 pm PST
Rick Beach, March 13, 1987 4:21:19 pm PST
Michael Plass, August 5, 1991 5:37 pm PDT
Doug Wyatt, August 9, 1991 6:15 pm PDT
Willie-s, September 2, 1992 3:31 pm PDT
THE IO AND CONVERT INTERFACES
CEDAR 7.0 — FOR INTERNAL XEROX USE ONLY
The IO and Convert Interfaces
Byte stream I/O and I/O conversions in Cedar
Mark R. Brown
© Copyright 1984, 1987, 1991 Xerox Corporation. All rights reserved.
Abstract: The IO interface implements a byte stream abstraction with a variety of friendly services.
Created by: Mark R. Brown
Maintained by: CedarSupport^.pa
Keywords: Cedar language, conversions, input/output, ROPE, stream
XEROX   Xerox Corporation
    Palo Alto Research Center
    3333 Coyote Hill Road
    Palo Alto, California 94304

For Internal Xerox Use Only
1. Introduction
Streams
A stream (an instance of type IO.STREAM) is a producer or consumer of a byte sequence. An input stream is a producer; an output stream is a consumer. Cedar's notion of stream is similar to that of Unix and many other systems. The stream abstraction is important for two related reasons.
The first reason is that most of the data storage and communication devices that are part of our computers can be modeled as streams. Disks and tapes store byte sequences, and the Ethernet transmits them. Streams do not represent the capabilities of any of these devices perfectly, and streams model several other devices even less perfectly: consider bitmap displays and unencoded keyboards. But each of these devices can be represented as a stream.
The second reason is that streams are an adequate form of communication for a significant number of applications. For example, many programs are organized in a "read - eval - print" structure. The program parses an input stream and produces values (which are probably not simply character strings). These values are manipulated in some way, and then they are printed (i.e. converted to character strings and sent to the output stream). Such a program can be structured as a procedure taking an input stream and an output stream as its parameters, and is oblivious to the source of its input and the destination of its output. As long as one application can read the printed output of a second application, the two can be coupled in this way and avoid the difficulty of agreeing upon a single internal data format. Thus the stream abstraction forms a "lowest common denominator" for communication with applications and devices.
Cedar makes nearly all devices available as streams. If a device has capabilities that are not adequately modelled by a stream, its streams simply implement some extra procedures that apply only to streams for that device (or for a set of similar devices). For instance, a stream for reading a disk file implements a SetIndex operation that reflects the random-access capability of the disk. By presenting a file as an upward-compatible extension of a stream, the same basic operations can be used by programs that use files as streams and programs that use files as random-access devices.
We call each stream implementation a stream class; a stream class is named by an ATOM. For example, a stream on a file has class $File, while a stream on a ROPE has class $ROPE. For emphasis we sometimes call a particular stream a stream instance. Each stream instance of a particular class and "direction" (input or output) implements the same set of procedures.
Interfaces involving streams
IO is a large interface that contains three major components in addition to the generic operations on streams: procedures for creating streams from a few important data types, procedures for reading from a stream and performing input conversion, and procedures performing output conversion and writing the results to a stream. The stream creation procedure for a typical stream class is defined in an interface associated with the class, not in IO; the EditedStream and IOClasses interfaces define some useful stream classes. IOUtils contains procedures that are only of interest to the implementor of an unusually complex stream class.
Why is the IO interface so large? The usual rule is "one abstraction, one interface", and by this rule the IO interface would contain only the generic operations on streams. Cedar has chosen to provide many stream procedures in a single large interface as a convenience to programmers: most of the commonly-used procedures that deal in streams can be found in IO, and these procedures can often be called using object notation because they are part of IO.STREAM's cluster.
The IO and IOUtils interfaces contain a certain amount of documentation, but this is structured for use while programming and does not attempt to explain underlying concepts or give full detail. The EditedStream and IOClasses interfaces contain full documentation.
The Convert interface provides conversions from character strings into Cedar values (numbers, times, etc.) and from Cedar values into character strings. The Convert interface does not really involve streams, but it is often used in conjunction with streams, so we document it here.
The remainder of this document is organized as follows:
Section 2 gives an informal semantics of the procedures that apply to all streams. Everyone should read this section once; later, you can use it for reference when you need to recall exactly what UnsafeGetBlock does.
Section 3 describes the stream-creation procedures that are part of the IO interface, and points to the documentation for several others: file streams, viewer streams, and the like. If you want to create a stream for reading from a ROPE, look in Section 3.
Section 4 describes the Convert interface. To turn a ROPE into an INT or vice-versa, consult Section 4.
Section 5 describes facilities for formatted output: the PutF and Put procedures. These procedures convert Cedar values into byte strings and transmit them to an output stream; they are an ordinary stream client, but so commonly used that they deserve to be included in the IO interface.
Section 6 describes facilities for formatted input. These procedures read from an input stream and construct Cedar values. Just as in the PutF case, they are ordinary stream clients but are included in IO for their ubiquity.
Section 7 describes how to implement a stream class. This can be surprisingly simple: if you are not too concerned about performance, you can let many of the stream procedures assume default values.
Section 8 contains miscellaneous observations on topics like synchronization of stream access, efficiency of stream procedures, and the relationship between streams, files and ropes. It also describes known shortfalls in the IO interfaces, in the implementation, and in this document.
See Section 3.1.1 of the Cedar Language Reference Manual for a description of the notation used for grammars below.
2. Generic stream procedures
Stream varieties
The fourteen generic stream procedures fall into three groups:
Input: GetChar, GetBlock, UnsafeGetBlock, EndOf, Backup, PeekChar, CharsAvail
Output: PutChar, PutBlock, UnsafePutBlock, Flush, EraseChar
Control: Reset, Close
There are three stream varieties: input stream, output stream, and input/output stream. All streams implement the Control procedures, but other procedures are implemented or not as a function of the stream variety. The Cedar type system does not enforce this stream taxonomy; a stream class implementor is responsible for documenting the variety of stream he is implementing, and for providing all of the necessary procedures.
An input stream implements the Input procedures and none of the Output procedures. Informally, an input stream produces a sequence of bytes (returned by calls to GetChar, GetBlock, and UnsafeGetBlock). The end of the byte sequence is signalled by (1) EndOf returning TRUE, (2) GetChar raising EndOfStream, or (3) GetBlock or UnsafeGetBlock returning zero bytes when a nonzero number of bytes was requested. Backup and PeekChar provide a look-ahead capability for use by scanners. CharsAvail allows the caller to estimate the real-time response of the stream to later requests. Reset skips to the end of the input sequence, and Close releases resources that were reserved by the stream (such as a file opened for reading).
An output stream implements the Output procedures and none of the Input procedures. Informally, an output stream consumes a sequence of bytes (passed in calls to PutChar, PutBlock, and UnsafePutBlock). Flush forces the output bytes to be transmitted immediately to their destination. EraseChar "erases" the most-recently written character (most useful for display-like output devices). Reset has no uniformly-defined effect, while Close terminates the output sequence and releases resources that were reserved by the stream (such as a file opened for writing).
An input/output stream implements both the Input and Output procedures. It is not clear that input/output streams are a good idea, and this stream variety may be eliminated in a later version of IO. At present there are two input/output stream classes: read/write file streams, and Pup byte streams. These two have little in common: for file streams, the input and output halves are closely coupled (there is a single "stream index" into the file), while the input and output halves of a Pup byte stream are largely independent. In both cases, the semantics of the input and output halves are similar, and in both cases it seems to make sense for the Control procedures to operate on both halves simultaneously. In contrast, creating a typescript viewer produces two streams, one input stream and one output stream, because the keyboard and the display have very different properties.
Exceptions
EndOfStream: ERROR [stream: STREAM]
Error: ERROR [ec: ErrorCode, stream: STREAM]
ErrorCode: TYPE = { ..., NotImplementedForThisStream, StreamClosed, Failure, ... }
Only two errors are raised by generic stream procedures: EndOfStream and Error. Error passes an element of an enumerated type to describe the specific error. Each stream class maps its errors into this fixed set. Clearly this set cannot truly describe the totality of errors that may arise; Error[$Failure, stream] is raised for any error that cannot be represented explicitly. If a client needs more information on the particular error, it must call a class-specific procedure to get the information. The complete error information can be stored in the stream, which is a parameter to Error.
Input
A simple model of an input stream is that the stream contains a sequence of bytes, the input sequence, as part of its state. GetChar removes the first byte from the input sequence, Backup adds a byte to the front of the sequence, and EndOf tests the sequence for emptiness. GetBlock, UnsafeGetBlock, and PeekChar are expressed in terms of GetChar and Backup.
The validity of this model is not diminished by the fact that for several stream classes, the entire input sequence for a stream is not known at the time a stream is created. The stream must behave, except for performance, as though the input sequence were known in advance and only modified by GetChar and Backup. In particular, once the input sequence is empty (as signified by stream.GetChar[] raising EndOfStream), it remains empty unless stream.Backup[char] is performed. The CharsAvail procedure provides information on how much of the input sequence is actually known to a stream.
GetChar: PROC [self: STREAM] RETURNS [CHAR]
Raises ERROR IO.EndOfStream[self] if the input sequence is empty; otherwise consumes and returns the next byte in the input sequence.
Backup: PROC [self: STREAM, char: CHAR]
Makes char the "next byte" of the input sequence. That is, after self.Backup[char], self.GetChar[] = char, and once that GetChar has been performed the input sequence of self equals the input sequence of self as it was before the Backup call. If char is not the byte produced by the last self.GetChar[], then ERROR IO.Error[$IllegalPutBack, self] may optionally be raised. (The only reason for the char parameter is to allow a default implementation of Backup, for stream classes that choose not to implement Backup directly.) If a large number (thousands) of Backup calls are made without corresponding GetChar calls, then ERROR IO.Error[$BufferOverflow, self] may optionally be raised.
EndOf: PROC [self: STREAM] RETURNS [BOOL]
Returns TRUE if and only if calling self.GetChar[] raises ERROR IO.EndOfStream[self].
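The input-sequence model described above can be sketched in a few lines. What follows is a toy model written in Python, not Cedar code: the names InputSequence and EndOfStream and the method names are illustrative stand-ins for the IO procedures, and nothing here reflects how a real stream class is implemented.

```python
from collections import deque


class EndOfStream(Exception):
    """Stand-in for ERROR IO.EndOfStream (hypothetical Python analogue)."""


class InputSequence:
    """Toy model of an input stream: a sequence of bytes held as state."""

    def __init__(self, data: bytes):
        self._seq = deque(data)

    def get_char(self) -> int:
        # GetChar: remove and return the first byte, or raise at the end.
        if not self._seq:
            raise EndOfStream()
        return self._seq.popleft()

    def backup(self, char: int) -> None:
        # Backup: make char the next byte of the input sequence.
        self._seq.appendleft(char)

    def end_of(self) -> bool:
        # EndOf: true exactly when get_char would raise EndOfStream.
        return not self._seq


s = InputSequence(b"ab")
first = s.get_char()   # consumes the first byte
s.backup(first)        # the input sequence is b"ab" again
```

In this model PeekChar needs no primitive of its own: it is a get_char followed by a backup of the same byte, exactly as in the example at the end.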
UnsafeGetBlock: UNSAFE PROC [self: STREAM, block: IO.UnsafeBlock] RETURNS [nBytesRead: INT]
IO.UnsafeBlock is equal to Basics.UnsafeBlock, declared as TYPE = RECORD [base: LONG POINTER TO PACKED ARRAY [0..0) OF CHAR ← NIL, startIndex: INT ← 0, count: INT ← 0].
Is equivalent to (but for many stream classes is faster than)
IF block.startIndex < 0 OR block.count < 0 THEN
  ERROR RuntimeError.BoundsFault;
nBytesRead ← 0;
FOR i: INT IN [0 .. block.count) DO
  block.base^[block.startIndex+i] ← self.GetChar[ ! EndOfStream => EXIT];
  nBytesRead ← nBytesRead + 1;
  ENDLOOP;
RETURN[nBytesRead];
GetBlock: PROC [self: STREAM, block: REF TEXT, startIndex: NAT ← 0, count: NAT ← NAT.LAST]
RETURNS [nBytesRead: NAT]
Is equivalent to (but for many stream classes is faster than)
nBytesRead ← self.UnsafeGetBlock[[
  base: LOOPHOLE[block,LONG POINTER]+TEXT[0].SIZE,
  startIndex: startIndex,
  count: MAX[MIN[INT[count], INT[block.maxLength]-startIndex], 0] ]];
block.length ← startIndex + nBytesRead;
RETURN[nBytesRead];
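The way GetBlock clamps its count and stops early at the end of input can be illustrated with a small Python model. This is a sketch under stated assumptions, not the Cedar implementation: get_block, make_get_char and EndOfStream are hypothetical names, len(block) plays the role of block.maxLength, and the model does not maintain a separate block.length.

```python
class EndOfStream(Exception):
    """Stand-in for ERROR IO.EndOfStream (hypothetical Python analogue)."""


def get_block(get_char, block: bytearray, start_index: int = 0, count=None) -> int:
    """Fill block from start_index using get_char; return the bytes read."""
    if count is None:
        count = len(block)  # stand-in for the NAT.LAST default
    # Same clamp as the Cedar text: MAX[MIN[count, maxLength-startIndex], 0].
    count = max(min(count, len(block) - start_index), 0)
    n_read = 0
    for i in range(count):
        try:
            block[start_index + i] = get_char()
        except EndOfStream:
            break  # end of input: return the short count
        n_read += 1
    return n_read


def make_get_char(data: bytes):
    """Build a get_char function over a fixed byte string."""
    it = iter(data)

    def get_char() -> int:
        try:
            return next(it)
        except StopIteration:
            raise EndOfStream() from None

    return get_char


buf = bytearray(4)
n = get_block(make_get_char(b"hello!"), buf, start_index=1)
```

The request is cut down to the three bytes that fit after start_index, and a start_index beyond the capacity of the block yields a count of zero rather than a fault.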
PeekChar: PROC [self: STREAM] RETURNS [char: CHAR]
Is equivalent to
c: CHAR = self.GetChar[];
self.Backup[c];
RETURN [c];
CharsAvail: PROC [self: STREAM, wait: BOOL ← FALSE] RETURNS [INT]
Returns an estimate of the number of characters from the input sequence that can be delivered very quickly (without waiting for user input or for network transmission). The end of the input sequence counts as a "character" for the purposes of CharsAvail. A stream class that never waits may return a large constant value (INT.LAST), while a stream class that does wait should return either an exact value or an underestimate: it is acceptable to return 1 representing "some bytes are available or end of input sequence" and 0 representing "no bytes are available". If wait, then does not return until a nonzero value can be returned. CharsAvail allows a single process to consume a byte stream efficiently, without being suspended for reading past the available input.
Output
A simple model of an output stream is that the stream connects to a sequential output device of some kind. PutChar transmits a single byte to this device. PutBlock and UnsafePutBlock are expressed in terms of PutChar.
In practice, of course, an output stream may contain buffering. The Flush procedure forces a stream to flush its buffers to the underlying device. Some important sequential output devices allow PutChar to be "undone": erase the last character painted to the display, truncate the last character appended to a file. The EraseChar procedure provides this function for devices that support it, and has a simple default behavior for devices that do not.
PutChar: PROC [self: STREAM, char: CHAR]
Appends the given byte to the output sequence.
UnsafePutBlock: PROC [self: STREAM, block: IO.UnsafeBlock]
Is equivalent to (but for many stream classes is faster than)
IF block.startIndex < 0 OR block.count < 0 THEN
  ERROR RuntimeError.BoundsFault;
FOR i: INT IN [block.startIndex .. block.startIndex+block.count) DO
  self.PutChar[block.base^[i]];
  ENDLOOP;
PutBlock: PROC [self: STREAM, block: REF READONLY TEXT,
  startIndex: NAT ← 0, count: NAT ← NAT.LAST]
Is equivalent to
stopIndexPlusOne: INT ← INT[startIndex]+count;
IF stopIndexPlusOne > block.maxLength THEN stopIndexPlusOne ← block.length;
self.UnsafePutBlock[[
  base: LOOPHOLE[block,LONG POINTER]+TEXT[0].SIZE,
  startIndex: startIndex,
  count: MAX[stopIndexPlusOne-startIndex, 0] ]]
Flush: PROC [self: STREAM]
Transmits all stream writes that have taken place since self was created, or the preceding Flush was done on self. The meaning of "transmits" depends upon the class of the stream self, and even upon the instance self. For instance, Flush applied to a file stream generally means to record all stream writes stably on disk storage. If the underlying file system supports transactions, and self was created with a specific option, then Flush commits the current transaction and creates a new one.
EraseChar: PROC [self: STREAM, char: CHAR]
If self is a stream on a display, erases the last character written to the display, which must have been char. If self is not a display stream, but is a stream of some class whose PutChar can be undone (for instance, a stream appending to a file), then undoes the last PutChar. Otherwise, performs
self.PutChar['\\]; self.PutChar[char]
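The difference between the default EraseChar and a class that can really undo PutChar can be shown with a toy Python model. The class names OutputSequence and AppendingFile are hypothetical, and the choice to ignore a mismatched char in the undoable case is an assumption of this sketch, since the text only says the erased character "must have been char".

```python
class OutputSequence:
    """Toy output stream that records every character it is given."""

    def __init__(self):
        self.written = []

    def put_char(self, char: str) -> None:
        self.written.append(char)

    def erase_char(self, char: str) -> None:
        # Default EraseChar for a device that cannot undo PutChar:
        # emit a backslash followed by the erased character.
        self.put_char("\\")
        self.put_char(char)


class AppendingFile(OutputSequence):
    """Toy stream whose PutChar can be undone, like appending to a file."""

    def erase_char(self, char: str) -> None:
        # Undo the last PutChar by dropping one character (assumption:
        # a char that does not match the last one written is ignored).
        if self.written and self.written[-1] == char:
            self.written.pop()


plain = OutputSequence()
undoable = AppendingFile()
for stream in (plain, undoable):
    for c in "cat":
        stream.put_char(c)
    stream.erase_char("t")
```

After these calls the plain stream has grown by two characters (a backslash and a second t), while the undoable stream holds only the first two characters.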
Control
Close: PROC [self: STREAM, abort: BOOL ← FALSE]
The detailed effect of Close is stream class dependent, but one effect is common to all streams: it makes the stream unusable for further procedures. All procedures on self other than Flush, Reset, and Close will raise ERROR IO.Error[$StreamClosed, self].
The abort parameter has the following meaning for file output streams: if abort, do not Flush the stream as part of the Close. If the underlying file system supports transactions, and self was created with a specific option, then abort the current transaction.
Reset: PROC [self: STREAM]
For input streams, discard all data from the input sequence that was generated at a time earlier than the call to Reset. For "real-time" input streams, such as keyboard and network, this means to discard the buffered input. For input streams whose input sequence is determined at stream creation time, discard the entire input sequence (so that self.EndOf[] = TRUE after the Reset.) For output and input/output streams, some class-specific effect. For instance, reset on a display stream might clear the display; reset on a file output stream might reposition the stream to the beginning and truncate the file to zero length.
Other
General information
GetInfo: PROC [self: STREAM] RETURNS [variety: StreamVariety, name: ATOM];
Returns the stream's variety (input, output, inputOutput) and its class name ($File, $ROPE, $TEXT, etc.)
Stream property list
A stream instance has an associated property list containing stream-specific information that is orthogonal to the stream's class. The property list is a public component of the stream; if s is a stream, s.propList is its property list. The usage of stream property lists is established by convention among the stream applications. For instance, the "print formatted" package uses the key $SetPFCode on the property list to associate a "print formatted context" with a stream, and file and viewer streams use the key $Name to store a rope, the name of the file or viewer connected to the stream.
The following three procedures are provided by IOUtils as a convenience:
IOUtils.StoreData: PROC [self: STREAM, key: ATOM, data: REF ANY]
Is equivalent to
self.propList ← Atom.PutPropOnList[self.propList, key, data]
IOUtils.LookupData: PROC [self: STREAM, key: ATOM] RETURNS [REF ANY]
Is equivalent to
RETURN[Atom.GetPropFromList[self.propList, key]]
IOUtils.RemoveData: PROC [self: STREAM, key: ATOM]
Is equivalent to
self.propList ← Atom.RemPropFromList[self.propList, key]
3. Stream classes
File (class $File)
See the FS interface documentation for basic information on files. FS.StreamOpen creates a stream, given the string name of a file; FS.StreamFromOpenFile creates a stream, given an FS.OpenFile.
The FS interface documentation describes the detailed semantics of generic stream operations as they apply to files; we provide a summary here. A file is a sequence of bytes (numbered from zero), and a stream on a file defines a stream index into this sequence. GetChar on an input stream raises EndOfStream when the stream index equals the file length, and otherwise returns the file byte at the stream index and advances it one byte. PutChar on an output stream extends the file by one byte when the stream index equals the file length, and otherwise replaces the file byte at the stream index and advances it one byte. An input/output file stream has a single stream index, so GetChar and PutChar interact. Several input streams may be opened on a single file. It is also possible to open an output stream and several input streams on a single file (if this works out, we may get rid of input/output file streams).
The following operations apply to file streams (and streams for other file-like devices):
GetIndex: PROC [self: STREAM] RETURNS [index: INT]
Returns the stream index.
SetIndex: PROC [self: STREAM, index: INT]
Sets the stream index. Raises EndOfStream if index > number of bytes in the file; does not extend the file.
GetLength: PROC [self: STREAM] RETURNS [length: INT];
Returns the number of bytes in the file.
SetLength: PROC [self: STREAM, length: INT];
Sets the number of bytes in the file. Then sets the stream index to be MIN[stream index, length]. The contents of the file bytes IN [previous file length .. length) are undefined. Does not apply to input streams.
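The stream-index rules for file streams can be summarized in a short Python model. This is a sketch only: FileStreamModel and EndOfStream are hypothetical names, the file is an in-memory bytearray, and filling newly added bytes with zeros is this model's choice (the text leaves their contents undefined).

```python
class EndOfStream(Exception):
    """Stand-in for ERROR IO.EndOfStream (hypothetical Python analogue)."""


class FileStreamModel:
    """Toy input/output file stream with a single stream index."""

    def __init__(self, contents: bytes = b""):
        self.file = bytearray(contents)
        self.index = 0

    def get_char(self) -> int:
        if self.index == len(self.file):
            raise EndOfStream()
        byte = self.file[self.index]
        self.index += 1
        return byte

    def put_char(self, byte: int) -> None:
        if self.index == len(self.file):
            self.file.append(byte)        # extend the file by one byte
        else:
            self.file[self.index] = byte  # replace the byte at the index
        self.index += 1

    def set_index(self, index: int) -> None:
        if index > len(self.file):
            raise EndOfStream()           # never extends the file
        self.index = index

    def set_length(self, length: int) -> None:
        if length < len(self.file):
            del self.file[length:]
        else:
            # Contents of new bytes are undefined; zeros are a model choice.
            self.file.extend(bytes(length - len(self.file)))
        self.index = min(self.index, length)


f = FileStreamModel(b"abc")
f.set_index(1)
f.put_char(ord("X"))  # replaces the second byte
f.set_index(3)
f.put_char(ord("d"))  # index equals the length, so the file grows
```

Because there is a single index, a get_char issued right after a put_char reads the byte that follows the one just written, which is the interaction between GetChar and PutChar mentioned above.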
ROPE (class $ROPE)
See the Rope interface documentation for basic information on ROPE.
RIS: PROC [rope: ROPE, oldStream: STREAM ← NIL] RETURNS [stream: STREAM];
The ROPE input stream behaves much like a file input stream, but gets characters from the input ROPE instead of from a file. GetIndex, SetIndex, and GetLength apply to a ROPE input stream. Reset is equivalent to self.SetIndex[self.GetLength[]]. If oldStream # NIL then RIS attempts to re-use it and avoid storage allocation expense.
ROS: PROC [oldStream: STREAM ← NIL] RETURNS [stream: STREAM];
A ROPE is immutable, so it does not make sense to open an output stream on a particular ROPE. Instead, ROS creates an output stream that buffers the output sequence and makes it available to the client via RopeFromROS. Reset causes the output sequence to be discarded. GetLength and GetIndex apply to a ROPE output stream. If oldStream # NIL then ROS attempts to re-use it and avoid storage allocation expense.
It is a good idea to close a ROPE output stream when it is no longer of use. The default call to RopeFromROS takes care of this, so the client needs to pay special attention only when RopeFromROS is being called repeatedly on the same stream.
The following operation applies to ROPE output streams:
RopeFromROS: PROC [self: STREAM, close: BOOL ← TRUE] RETURNS [ROPE];
Returns the entire output sequence as a rope. If close, then call self.Close[] before returning.
REF TEXT (class $TEXT)
TIS: PROC [text: REF READONLY TEXT, oldStream: STREAM ← NIL] RETURNS [stream: STREAM]
The TEXT input stream behaves much like a file input stream, but gets characters from the input REF TEXT instead of from a file. GetIndex, SetIndex, and GetLength apply to a TEXT input stream. Reset is equivalent to self.SetIndex[self.GetLength[]]. If oldStream # NIL then TIS attempts to re-use it and avoid storage allocation expense.
TOS: PROC [text: REF TEXT ← NIL, oldStream: STREAM ← NIL] RETURNS [stream: STREAM];
If text = NIL, set text ← a newly-allocated TEXT; set text.length ← 0. The TEXT output stream appends characters to text using RefText.AppendChar. Hence if text overflows, a larger text will be allocated, up to the limit on TEXT length (32k bytes). When this limit is reached, PutChar raises Error[$BufferOverflow, self]. The TEXT buffer is available at any time via TextFromTOS. Reset causes the output sequence, but not the buffer, to be discarded. GetLength and GetIndex apply to a TEXT output stream. If oldStream # NIL then TOS attempts to re-use it and avoid storage allocation expense.
The following operation applies to TEXT output streams:
TextFromTOS: PROC [self: STREAM] RETURNS [REF TEXT]
Returns the entire output sequence as a REF TEXT. Does not close the stream, so may be called repeatedly, but the same REF TEXT may be returned several times (and will be modified if PutChar is called).
Typescript Viewers (classes $ViewersInput, $ViewersOutput, $EditedViewer)
See the ViewerIO interface documentation. ViewerIO.CreateViewerStreams creates a pair of streams, one for keyboard input and one for display output, coupled so that input is echoed as it is read.
Edited Input (class $Edited)
See the documentation contained in the EditedStream interface. An edited input stream provides simple interactive editing of an underlying input stream. Typically the editing is teletype-style (BS, ^W), but features such as default values and input completion can be implemented by the edited stream client.
Interactive editing is also provided by editable typescripts, and these are generally preferred to edited streams for applications that "live above" Tioga.
Pipe (class $Pipe)
IOClasses.CreatePipe[bufferByteCount] creates a pipe, that is, an output stream push and an input stream pull such that the byte sequence written to push can be read from pull. Closing push terminates the sequence. The parameter bufferByteCount specifies the (approximate) maximum number of bytes that can be written to push and not yet read from pull at any instant of time. Typically, one process does the pushing and another process does the pulling.
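The behavior of a pipe with a bounded buffer and one process at each end can be modelled with Python's standard queue and threading modules. This is an analogy, not the IOClasses implementation: create_pipe and its three returned functions are hypothetical names, and the end-of-sequence sentinel occupying one buffer slot is an artifact of this sketch.

```python
import queue
import threading

_END = object()  # sentinel marking that the push end has been closed


def create_pipe(buffer_byte_count: int):
    """Return (push, close, pull) functions sharing one bounded buffer."""
    buf = queue.Queue(maxsize=buffer_byte_count)

    def push(byte: int) -> None:
        buf.put(byte)         # blocks while the buffer is full

    def close() -> None:
        buf.put(_END)         # terminates the byte sequence

    def pull():
        """Yield bytes until the writer closes its end."""
        while True:
            item = buf.get()  # blocks while the buffer is empty
            if item is _END:
                return
            yield item

    return push, close, pull


push, close, pull = create_pipe(buffer_byte_count=2)


def producer():
    for b in b"stream":
        push(b)
    close()


t = threading.Thread(target=producer)
t.start()
received = bytes(pull())  # the main thread does the pulling
t.join()
```

With a buffer of two bytes the producer cannot run far ahead of the consumer, which is the role bufferByteCount plays in the description above.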
Concatenated Input (class $Concatenated)
IOClasses.CreateCatInputStream[input1, input2] creates an input stream s such that the byte sequence produced by s is the byte sequence produced by input1, followed by the byte sequence produced by input2. For instance, CreateCatInputStream[IO.RIS[rope1], IO.RIS[rope2]] is equivalent to IO.RIS[rope1.Cat[rope2]].
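The equivalence stated above can be checked on a toy model. The following Python sketch uses io.StringIO objects in place of ROPE input streams; CatInput and read_char are hypothetical names, and an empty string stands in for the end of the input sequence.

```python
import io


class CatInput:
    """Toy input stream that reads input1 to exhaustion, then input2."""

    def __init__(self, input1, input2):
        self._inputs = [input1, input2]

    def read_char(self) -> str:
        """Return the next character, or "" at the end of both inputs."""
        while self._inputs:
            c = self._inputs[0].read(1)
            if c:
                return c
            self._inputs.pop(0)  # the current input is exhausted; move on
        return ""


s = CatInput(io.StringIO("ab"), io.StringIO("cd"))
combined = "".join(iter(s.read_char, ""))
direct = io.StringIO("ab" + "cd").read()
```

Here combined and direct hold the same characters, mirroring the claim that concatenating the streams is equivalent to opening one stream on the concatenated rope.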
Dribble Output (class $Dribble)
IOClasses.CreateDribbleOutputStream[output1, output2] creates an output stream s such that each operation performed on s will be performed in turn on output1 and output2. For instance, s.PutChar[c] is equivalent to { output1.PutChar[c]; output2.PutChar[c] }.
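A dribble stream is easy to model: every operation is forwarded to both underlying streams in order. In the Python sketch below, DribbleOutput is a hypothetical name and io.StringIO objects stand in for the two Cedar output streams.

```python
import io


class DribbleOutput:
    """Toy output stream that forwards every operation to two streams."""

    def __init__(self, output1, output2):
        self._outputs = (output1, output2)

    def put_char(self, c: str) -> None:
        for out in self._outputs:  # output1 first, then output2
            out.write(c)

    def flush(self) -> None:
        for out in self._outputs:
            out.flush()


screen, log = io.StringIO(), io.StringIO()
s = DribbleOutput(screen, log)
for c in "ok":
    s.put_char(c)
s.flush()
```

A typical use of this arrangement is to keep a log of everything written to a display, which is where the name "dribble" comes from.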
Comment-filtered Input (class $CommentFilter)
IOClasses.CreateCommentFilterStream[stream] creates an input stream s such that comments contained in stream are filtered out. The comments in stream must follow Cedar syntax, and stream must also follow Cedar conventions for string and character literals. For more detail see documentation in the IOClasses interface.
Message Window Output (class $MessageWindow)
See the ViewerIO interface documentation. ViewerIO.CreateMessageWindowStream creates an output stream to the Viewers message window.
Null (class $Null)
noWhereStream: STREAM;
An output stream that simply discards its characters.
noInputStream: STREAM;
An input stream such that GetChar raises EndOfStream, EndOf returns TRUE, and CharsAvail returns INT.LAST.
4. Convert
This interface contains two sections: procedures for parsing character strings to produce Cedar values (INT, REAL, BasicTime.GMT, and so on), and procedures for printing Cedar values to produce character strings.
Procedures in the Convert interface raise a single error:
Error: ERROR [reason: ErrorType, index: INT];
ErrorType: TYPE = {
syntax, overflow, empty, -- parse errors, index gives error location
invalidBase, unprintableAtom -- print errors, index = 0
};
Parsing
Each procedure in this section takes r: ROPE as a parameter, but is perfectly happy to have the client use RefText.TrustTextAsRope to loophole a REF TEXT into ROPE to pass in. This is ok because these procedures do not retain the ROPE or test its type at runtime.
Each procedure in this section raises the following errors:
! Error[$empty, 0] (the input string is empty or contains only whitespace characters; the client might wish to supply a default value in this case)
! Error[$syntax, index] (syntax error detected in the input string at the given location)
Each "CardFrom..." or "IntFrom" or "TimeFrom" or "CharFrom" proc raises the following error:
! Error[$overflow, index] (input string cannot be expressed as a CARD or INT or GMT or CHAR)
In general, the parsing procedures do not consume the entire input rope, but are willing to stop by reaching the end of the input rope or by seeing whitespace after a valid token.
IntFromRope: PROC [r: ROPE, defaultBase: Base ← 10] RETURNS [INT];
Base: TYPE = [2..36];
If r (or a prefix of r) consists of whitespaceChar...?(+|-)num?whitespaceChar where num is digit!... and digit is a numeric or alphabetic character that is meaningful in the default base, then convert r according to this base. If r contains a character that is not meaningful in the default base (e.g. 'H in a "decimal" literal), then accept whitespaceChar...?(+|-)wholeNum where wholeNum is a string acceptable to CardFromWholeNumberLiteral below.
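The first case of this grammar (optional whitespace, optional sign, digits in the default base) can be sketched as follows. This is a Python model under several assumptions, not the Convert implementation: int_from_rope and ConvertError are hypothetical names, letters are accepted in either case, no overflow check is made, and the fallback to a whole-number literal is not modelled (the sketch reports a syntax error instead).

```python
import string

DIGITS = string.digits + string.ascii_uppercase  # digit values 0..35


class ConvertError(Exception):
    """Stand-in for Convert.Error[reason, index] (hypothetical analogue)."""

    def __init__(self, reason: str, index: int):
        super().__init__(f"{reason} at {index}")
        self.reason, self.index = reason, index


def int_from_rope(r: str, default_base: int = 10) -> int:
    """Parse leading whitespace, an optional sign, then digits in default_base."""
    i = 0
    while i < len(r) and r[i].isspace():
        i += 1
    if i == len(r):
        raise ConvertError("empty", 0)
    sign = 1
    if r[i] in "+-":
        sign = -1 if r[i] == "-" else 1
        i += 1
    start, value = i, 0
    while i < len(r) and not r[i].isspace():
        d = DIGITS.find(r[i].upper())
        if d < 0 or d >= default_base:
            # The Cedar procedure would retry here as a whole-number literal.
            raise ConvertError("syntax", i)
        value = value * default_base + d
        i += 1
    if i == start:
        raise ConvertError("syntax", i)  # a sign with no digits
    return sign * value
```

As in the description above, parsing stops at the first whitespace after a valid token, so trailing text after the number is ignored rather than rejected.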
CardFromRope: PROC [r: ROPE, defaultBase: Base ← 10] RETURNS [LONG CARDINAL];
Like IntFromRope, but leading minus is not allowed.
RealFromRope: PROC [r: ROPE] RETURNS [REAL];
Accepts whitespaceChar...?(+|-)realNum where realNum is a string acceptable to RealFromLiteral below.
TimeFromRope: PROC [r: ROPE] RETURNS [BasicTime.GMT];
Accepts time in a wide variety of formats (same formats as IO.GetTime).
UnpackedTimeFromRope: PROC [r: ROPE] RETURNS [BasicTime.Unpacked];
Accepts time in a wide variety of formats (same formats as IO.GetTime).
BoolFromRope: PROC [r: ROPE] RETURNS [BOOL];
Accepts if r (or a prefix of r delimited by white space) matches (ignoring case) a prefix of "true", "yes" (returning TRUE), "false", or "no" (returning FALSE).
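The prefix-matching rule can be made concrete with a short Python model. The function name bool_from_rope is hypothetical, and ValueError stands in for Convert.Error; the sketch assumes that the token to match is the first run of non-whitespace characters.

```python
def bool_from_rope(r: str) -> bool:
    """Model of BoolFromRope: match a prefix of true/yes/false/no."""
    tokens = r.split()  # split on runs of white space
    if not tokens:
        raise ValueError("empty")   # stand-in for Convert.Error[$empty, 0]
    token = tokens[0].lower()       # the match ignores case
    for word, value in (("true", True), ("yes", True),
                        ("false", False), ("no", False)):
        if word.startswith(token):
            return value
    raise ValueError("syntax")      # stand-in for Convert.Error[$syntax, index]
```

The four words begin with different letters, so a single character such as "T" or "n" is already an unambiguous input under this rule.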
AtomFromRope: PROC [r: ROPE] RETURNS [ATOM];
Derives a print name by eliminating the leading '$ of r (if any), then making an atom with that print name.
The following procedures accept the same literal formats accepted by the compiler. This means, for instance, that the octal and hex routines demand that a radix character ('B or 'b for octal, 'H or 'h for hex) be present. Numeric literals are always non-negative, and these procedures do not recognize a leading plus or minus sign. The numeric procedures are willing to skip over leading white space, and will terminate the literal on end of string or on trailing white space.
CardFromDecimalLiteral: PROC [r: ROPE, start: INT ← 0] RETURNS [LONG CARDINAL];
Accepts whitespaceChar...num?((D|d)?num)?whitespaceChar
CardFromOctalLiteral: PROC [r: ROPE, start: INT ← 0] RETURNS [LONG CARDINAL];
Accepts whitespaceChar...num(B|b)?num?whitespaceChar
CardFromHexLiteral: PROC [r: ROPE, start: INT ← 0] RETURNS [LONG CARDINAL];
Accepts whitespaceChar...num(H|h)?num?whitespaceChar
CardFromWholeNumberLiteral: PROC [r: ROPE, start: INT ← 0] RETURNS [LONG CARDINAL];
Accepts any of the above formats, but is less efficient than any of the above procedures.
RealFromLiteral: PROC [r: ROPE, start: INT ← 0] RETURNS [REAL];
Accepts both whitespaceChar...num exponent?whitespaceChar and whitespaceChar... ?num.num?exponent?whitespaceChar, where exponent is (E|e)?(+|-)num
RopeFromLiteral: PROC [r: ROPE, start: INT ← 0] RETURNS [ROPE];
Accepts "extendedChar..."?(L|l)
CharFromLiteral: PROC [r: ROPE, start: INT ← 0] RETURNS [CHAR];
Accepts both 'extendedChar and digit!...(C|c)
Printing
There are two parallel sets of procs for printing. A proc in the first set returns a rope, while the proc in the second set appends its result to an existing REF TEXT (usually avoiding additional storage allocation) and returns a REF TEXT (usually but not always the same REF as the original).
Each "RopeFrom..." proc raises the following error:
! BoundsFault (length of result rope would exceed NAT.LAST characters)
Each "Append..." proc raises the following errors:
! PointerFault (to = NIL)
! BoundsFault (length of result text would exceed NAT.LAST characters)
Other errors are identical for the two variants of each procedure.
RopeFromInt: PROC [from: INT, base: Base ← 10, showRadix: BOOL ← TRUE]
RETURNS [Rope.Text];
! Error[$invalidBase, 0]: base is statically invalid (not IN [2 .. 36]), or literal = TRUE and base is not = 8, 10, or 16.
If showRadix = TRUE, returns a rope including a radix character ('B or 'H, not 'D) to make the result a valid Cedar literal (perhaps with a unary minus operator in front).
RopeFromCard: PROC [from: LONG CARDINAL, base: Base ← 10, showRadix: BOOL ← TRUE]
RETURNS [Rope.Text];
! Error[$invalidBase, 0]: base is statically invalid (not IN [2 .. 36]), or showRadix = TRUE and base is not 8, 10, or 16.
If showRadix = TRUE, returns a rope including a radix character ('B or 'H, not 'D) to make the result a valid Cedar literal.
RopeFromReal: PROC [from: REAL,
precision: RealPrecision ← Real.DefaultSinglePrecision, useE: BOOL ← FALSE]
RETURNS [Rope.Text];
If useE, prints in "scientific notation". <I cannot explain how "precision" is interpreted, yet, because I do not know. Something to do with number of digits printed ...>
RopeFromTime: PROC [from: BasicTime.GMT,
start: TimePrecision ← years, end: TimePrecision ← minutes,
includeDayOfWeek: BOOL ← FALSE, useAMPM: BOOL ← TRUE, includeZone: BOOL ← TRUE]
RETURNS [Rope.Text];
TimePrecision: TYPE = { years, months, days, hours, minutes, seconds, unspecified };
Returns a rope representing a time, for instance "September 16, 1983 7:50 pm PDT". If includeDayOfWeek then prefixes this with "Friday, ". If NOT useAMPM then renders the time of day as "19:50" instead of "7:50 pm". If NOT includeZone then omits the " PDT". If end = $seconds then includes more precision, as in "7:50:05 pm". <As of October 5, 1983, time zone printing is not implemented.>
To obtain the weekday only, pass includeDayOfWeek = TRUE and pass an empty [start .. end] interval (e.g. start = $months, end = $years). To obtain the month only, pass includeDayOfWeek = FALSE and start = end = $months.
RopeFromUnpackedTime: PROC [from: BasicTime.Unpacked,
start: TimePrecision ← years, end: TimePrecision ← minutes,
includeDayOfWeek: BOOL ← FALSE, useAMPM: BOOL ← TRUE, includeZone: BOOL ← TRUE]
RETURNS [Rope.Text];
Returns a rope representing a time, in the same format as RopeFromTime, but from an unpacked time.
RopeFromBool: PROC [from: BOOL] RETURNS [Rope.Text];
Returns "TRUE" or "FALSE".
RopeFromAtom: PROC [from: ATOM, quote: BOOL ← TRUE] RETURNS [ROPE];
! Error[$unprintableAtom, 0]: quote = TRUE, and either from = NIL or from's print name is not a valid Cedar identifier.
Returns the atom's print name ("<NIL>" is generated for from = NIL). If quote then inserts a leading quote ($) and checks that the print name is a valid Cedar identifier.
RopeFromRope: PROC [from: ROPE, quote: BOOL ← TRUE] RETURNS [Rope.Text];
Returns a rope that represents "from" as a literal acceptable to the compiler, i.e. inserts backslash escapes as necessary. If quote then adds an outer pair of quotes ("").
RopeFromChar: PROC [from: CHAR, quote: BOOL ← TRUE] RETURNS [Rope.Text];
Returns a rope that represents "from" as a literal acceptable to the compiler, i.e. adds backslash escape as necessary. If quote then inserts a leading quote (').
AppendInt: PROC [to: REF TEXT, from: INT, base: Base ← 10, showRadix: BOOL ← TRUE]
RETURNS [REF TEXT];
! Error[$invalidBase, 0]: base is statically invalid (not IN [2 .. 36]), or showRadix = TRUE and base is not 8, 10, or 16.
AppendCard: PROC [to: REF TEXT, from: LONG CARDINAL, base: Base ← 10, showRadix: BOOL ← TRUE]
RETURNS [REF TEXT];
! Error[$invalidBase, 0]: base is statically invalid (not IN [2 .. 36]), or showRadix = TRUE and base is not 8, 10, or 16.
AppendReal: PROC [to: REF TEXT, from: REAL,
precision: RealPrecision ← Real.DefaultSinglePrecision, useE: BOOL ← FALSE]
RETURNS [REF TEXT];
AppendTime: PROC [to: REF TEXT, from: BasicTime.GMT,
start: TimePrecision ← years, end: TimePrecision ← minutes,
includeDayOfWeek: BOOL ← FALSE, useAMPM: BOOL ← TRUE, includeZone: BOOL ← TRUE]
RETURNS [REF TEXT];
AppendUnpackedTime: PROC [to: REF TEXT, from: BasicTime.Unpacked,
start: TimePrecision ← years, end: TimePrecision ← minutes,
includeDayOfWeek: BOOL ← FALSE, useAMPM: BOOL ← TRUE, includeZone: BOOL ← TRUE]
RETURNS [REF TEXT];
AppendBool: PROC [to: REF TEXT, from: BOOL] RETURNS [REF TEXT];
AppendAtom: PROC [to: REF TEXT, from: ATOM, quote: BOOL ← TRUE] RETURNS [REF TEXT];
! Error[$unprintableAtom, 0]: quote = TRUE, and either from = NIL or from's print name is not a valid Cedar identifier.
AppendRope: PROC [to: REF TEXT, from: ROPE, quote: BOOL ← TRUE]
RETURNS [REF TEXT];
AppendChar: PROC [to: REF TEXT, from: CHAR, quote: BOOL ← TRUE]
RETURNS [REF TEXT];
5. Printing (output conversion)
PutF
Introduction
The PutF package provides a convenient means of printing Cedar values. The package is similar in concept and spirit to both the Unix (tm) procedure PrintF and the FORTRAN format interpreter.
Each call to PutF specifies a format (a ROPE containing text and conversion specifications) and a sequence of Cedar values. PutF replaces each conversion specification in the format with a printed representation of the corresponding Cedar value. The printed representation chosen for the value is a function of the conversion specification (it may specify field width, left-justification within the field, and so forth). The leftmost conversion specification is replaced by the (printed) first value, the second conversion specification from the left is replaced by the (printed) second value, and so on.
Basic Usage
The most commonly used procedures are PutF and PutFR:
PutF: PROC [stream: STREAM, format: ROPE ← NIL, v1, v2, v3, v4, v5: Value ← [null[]]];
PutFR: PROC [format: ROPE ← NIL, v1, v2, v3, v4, v5: Value ← [null[]]] RETURNS [ROPE];
The difference between the two procedures is in how the character sequence derived from the format and values is returned: PutF transmits it to an output stream, while PutFR creates a ROPE and returns it. (It might seem to make more sense to locate PutFR in the Convert interface, but its parameters and errors have so much in common with PutF that it belongs in IO.) The remainder of the discussion refers to PutF, but all comments apply to both procedures.
The format parameter is a ROPE in which the character '% is interpreted specially. Note that the backslash character ('\\) has no special significance within a format: backslash escape sequences are interpreted by the compiler.
In a format, '% introduces a conversion specification. A conversion specification is a conversion string (zero or more non-alphabetic characters), followed by a conversion code (a single alphabetic character). PutF replaces the '% and the conversion specification with a new string that is a function of the conversion code, the conversion string, and the value to be printed. To include '% in a format literally, write it twice, as in "100%%".
A large percentage of PutF usage can be satisfied with the conversion code 'g (analogous to G-format in FORTRAN.) A conversion string for this code has the syntax ?(0|-)fieldWidth, where fieldWidth is a number. If the value to be printed exceeds fieldWidth characters, it is printed in full; if it is smaller, it is printed right-justified in the field, with blank-filling on the left. A leading '- causes printing to be left-justified in the field; a leading '0 causes printing to be right-justified with 0-filling.
To pass a Cedar value to PutF, a client must convert it to type IO.Value. For each popular value type, the IO interface includes a procedure for converting values of that type, for instance IO.rope for ROPE and IO.int for INT. (Each procedure is an inline that simply constructs a variant record; writing the procedure call requires fewer brackets than writing the variant record constructor. While you might think that defining Value: TYPE = REF ANY would be superior to this scheme involving variant records, consider how clumsy and expensive it is to convert an INT, say, to a REF ANY.)
Examples:
IO.PutFR["This is %g in a 5 position field: |%5g|", IO.rope["an integer"], IO.int[17]] =
"This is an integer in a 5 position field: |   17|"
IO.PutFR["This is %g in a 5 position field: |%05g|", IO.rope["an integer"], IO.int[17]] =
"This is an integer in a 5 position field: |00017|"
IO.PutFR["This is %g in a 5 position field: |%-5g|", IO.rope["an integer"], IO.int[17]] =
"This is an integer in a 5 position field: |17   |"
Defaulting a Value to [null[]] causes the Value to be ignored (it is not matched against a conversion specification in the format). Defaulting the format to NIL is like supplying the format "%g%g...%g", with the number of "%g" repetitions equal to the number of non-null Values passed.
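The field-width rules for %g can be restated as a short function. The following Python sketch (not Cedar, and not the PutF implementation) takes an already printed value and a conversion string of the form ?(0|-)fieldWidth and applies the justification and fill rules described above. The name pad_field is hypothetical.

```python
def pad_field(printed, conversion=""):
    """Apply a %g-style conversion string such as "5", "05" or "-5"."""
    fill, left = " ", False
    if conversion.startswith("-"):
        left, conversion = True, conversion[1:]      # left-justify, blank fill
    elif conversion.startswith("0"):
        fill, conversion = "0", conversion[1:]       # right-justify, zero fill
    width = int(conversion) if conversion else 0
    if len(printed) >= width:
        return printed            # value fills or exceeds the field: print in full
    pad = fill * (width - len(printed))
    return printed + pad if left else pad + printed
```

Applied to the printed value "17", the conversion strings "5", "05", and "-5" give the three field contents shown in the examples above.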
Other conversions
While %g-conversion handles 90% of all applications, on occasion it does not give the desired effect. For instance, %g prints INTs in decimal, while some applications call for octal or hex. In this section we describe the range of output formats that are possible with the standard conversion codes. We also describe the behavior of %g-conversion in more detail.
The conversion performed by PutF is a function of (1) the Value passed (both its type and value), (2) the conversion code, and (3) the conversion string. A Value.refAny is treated somewhat specially by PutF: if its value is a REF TEXT, REF BOOL, REF CHAR, REF CARDINAL, REF LONG CARDINAL, REF INTEGER, REF INT, REF NAT, or REF REAL, it is converted to a Value of the more specific type (Value.text, Value.boolean, etc.) by PutF. This changes the Value as seen by the conversion. Note that in some circumstances it is useful to LOOPHOLE a value into one of the specific Value types (e.g. one way to print a REF is to LOOPHOLE it into a LONG CARDINAL and pass it to IO.card.)
The standard conversion string has the form ?(0|-)fieldWidth?(.numDigits), where fieldWidth and numDigits are numbers. If the value to be printed exceeds fieldWidth characters, it is printed in full; if it is smaller, it is printed right-justified in its field, with blank-filling on the left. A leading '- causes printing to be left-justified; a leading '0 causes printing to be right-justified with 0-filling. numDigits controls the number of significant digits printed for floating-point numbers, and is ignored when printing other types.
The following conversions use the standard conversion string and apply to fixed-point numbers. A value of type INT or LONG CARDINAL is used directly. A value of type REAL is rounded to the nearest INT. A value of type BOOL is converted to LONG CARDINAL using the mapping TRUE -> 1, FALSE -> 0.
%b: Print number in octal, without trailing 'B.
%d: Print number in decimal.
%x: Print number in hex, without trailing 'H.
The following conversions use the standard conversion string and apply to floating-point numbers. A value of type REAL is used directly. A value of type INT or LONG CARDINAL is converted to REAL.
%e: Print number in scientific notation (mantissa and exponent). Analogous to FORTRAN E format.
%f: Print number in fixed-point notation. Analogous to FORTRAN F format.
The following conversions use the standard conversion string and apply to character strings. A value of type ROPE or REF TEXT is used directly. A value of type ATOM is converted into its PName, a ROPE. A value of type CHAR is converted into a one-character ROPE. A value of type BOOL is converted to a ROPE using the mapping TRUE -> "TRUE", FALSE -> "FALSE". A value of type INT or LONG CARDINAL is converted to a ROPE in decimal notation, and a value of type REAL is converted to a ROPE in fixed-point notation. A value of type BasicTime.GMT is converted to a ROPE in standard date format (produced by Convert.AppendTime). A value of type REF ANY is converted to a ROPE using AMIO.PrintRefAny (PrintProcs and AMTypes).
%g, %a, %c, %s: Print the string (four codes are assigned for historical reasons; %g is preferred).
%h: Print the string, but print control characters (char < 40C) as "^<char + '@>" (for instance, '\n prints as "^M").
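The %h rule amounts to adding the character code of '@ to each control character. A minimal Python sketch of that mapping follows; it is an illustration, not the IO implementation, and the name show_controls is hypothetical. Note that Cedar's '\n denotes CR (15C), which is written "\r" in Python.

```python
def show_controls(text):
    """Render characters below 40 octal as '^' followed by char + '@'."""
    out = []
    for ch in text:
        if ord(ch) < 0o40:
            out.append("^" + chr(ord(ch) + ord("@")))
        else:
            out.append(ch)
    return "".join(out)
```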
The following conversion uses the standard conversion string and applies to fixed-point numbers. A value of type INT or LONG CARDINAL is used directly. A value of type REAL is rounded to the nearest INT.
%r: Print the number as a time interval in seconds, with format HH:MM:SS (exactly two digits, with zero fill, in the minutes and seconds fields; two or more digits in the hours field.)
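The HH:MM:SS layout used by %r can be sketched as follows. This Python fragment is an illustration only. It assumes a non-negative whole number of seconds (the text above does not describe negative intervals), and the name interval is hypothetical.

```python
def interval(seconds):
    """Format a non-negative number of seconds as HH:MM:SS."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    # Minutes and seconds are exactly two digits; hours are at least two.
    return "%02d:%02d:%02d" % (hours, minutes, secs)
```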
The following conversion uses the standard conversion string and applies to time and fixed-point numbers. A value of type BasicTime.GMT is used directly. A number (INT, LONG CARDINAL, or REAL) is interpreted as a number of seconds after BasicTime.earliestGMT.
%t: Print the time in standard date format.
The following conversions ignore the conversion string and apply to ROPE only.
%l: For Viewer streams, change looks for subsequent output; for other streams, no effect. The rope is interpreted as a sequence of looks; a lower-case character means "add the look", an upper-case character means "remove the look", and space means "remove all looks". For instance, h.PutF["Make this %lbold%l for emphasis", IO.rope["b"], IO.rope["B"]] will print "Make this bold for emphasis" on a Viewer stream.
%q: Print the literal representation of the rope, i.e. the form that would be coded in a program text to produce the given rope. For instance, the character CR in the rope is printed as backslash-n, and the character '" in the rope prints as backslash-". Does not print '" surrounding the rope, but these can be included in the format string if desired. For instance, IO.PutFR["%q", IO.rope["abc\ndef"]] = "abc\\ndef", while IO.PutFR["%g", IO.rope["abc\ndef"]] = "abc\ndef".
A client with unusual needs can (1) write a PrintProc to change the way that REF ANYs are printed, or (2) write a conversion proc for PutF. See the section "Advanced Usage" below.
Exceptions
PutF attempts to keep going in the face of errors, printing whatever information is possible. When it encounters an error it prints "#####".
A client that needs more specific error reporting can get it. See the section "Advanced Usage" below.
Printing more than five values
PutF and PutFR are limited to printing five values in each call. The following procedures escape that limitation:
PutFL: PROC [stream: STREAM, format: ROPE ← NIL, list: LIST OF Value];
PutFLR: PROC [format: ROPE ← NIL, list: LIST OF Value] RETURNS [ROPE];
PutFL is seldom used because the same effect can be achieved by making a sequence of calls to PutF, and in most cases the format becomes unreadable when more than five values are printed in a single call.
Efficiency
A call to PutF is relatively expensive, largely due to the runtime interpretation of the format string. A call to Put (below) is considerably faster and more compact, but does not offer the same formatting options. If only one value needs to be printed, use PutF1 instead of PutF. PutFR is implemented using ROS and PutF.
Advanced Usage: Writing a conversion proc
The description above is written as if the interpretation of a string like "%g" were an immutable part of the PutF package. In fact the interpretation of conversion specifications is determined by information contained in the stream, and the conversions described above are simply the defaults provided by the standard stream classes.
Conversion is performed by a PFCodeProc:
IOUtils.PFCodeProc: TYPE = PROC [stream: STREAM, val: Value, format: Format, char: CHAR];
Format: TYPE = RECORD [form: ROPE, first: INT];
Here val is the Value to print according to conversion code char (always IN ['a .. 'z]), sending the results to stream. The format.form is the entire format rope (never empty, even if an empty format was passed to PutF), and format.first is the index of the first character of the conversion string (the character following the '%, which might be the conversion code).
A PFCodeProc raises IO.Error with ec IN [$PFFormatSyntaxError .. $PFUnprintableValue]. The codes are interpreted as follows:
$PFFormatSyntaxError: the PFCodeProc cannot interpret the chars between '% and the conversion code.
$PFTypeMismatch: the PFCodeProc is not prepared for the supplied Value type.
$PFUnprintableValue: the PFCodeProc is not able to print the supplied value (for instance, overflow occurred in real -> int conversion).
As a convenience, a PFCodeProc may also call PutF, which raises IO.Error with ec IN [$PFCantBindConversionProc .. $PFUnprintableValue]. The errors raised by PutF and not described above are:
$PFCantBindConversionProc: PutF could not find a PFCodeProc to call.
$PFFormatSyntaxError: in addition to the errors (described above) that are detected by a PFCodeProc, PutF detects errors such as too many or not enough conversion specifications in format string, and no conversion code following the initial '% of a conversion specification.
The set of PFCodeProcs associated with a stream is represented by a value of type PFProcs:
IOUtils.PFProcs: TYPE = REF PFProcsRecord; PFProcsRecord: TYPE;
This may be used in several ways. A stream class implementation may create a single PFProcs value and associate it with all stream instances of that class. A stream client may add PFCodeProcs to a particular stream, and if the stream is shared, restore the stream to its original state when finished.
In both cases the same coding pattern applies: first, the client calls CopyPFProcs to obtain a new PFProcs value. (Passing stream = NIL to CopyPFProcs obtains a copy of the global default PFProcs, useful during initialization of a stream class when no stream instances are in sight.) This PFProcs value is modified using SetPFCodeProc. Then the new PFProcs value is associated with the stream instance, using SetPFProcs.
IOUtils.CopyPFProcs: PROC [stream: STREAM] RETURNS [new: PFProcs];
IOUtils.SetPFCodeProc: PROC [pfProcs: PFProcs, char: CHAR, codeProc: PFCodeProc]
RETURNS [previous: PFCodeProc];
IOUtils.SetPFProcs: PROC [stream: STREAM, pfProcs: PFProcs] RETURNS [previous: PFProcs];
SetPFCodeProc raises IO.Error[$PFInvalidCode, NIL] if char is not IN ['A..'Z] or ['a..'z], and raises IO.Error[$PFInvalidPFProcs, NIL] if pfProcs = NIL. (CopyPFProcs never returns NIL.)
When PutF requires a PFCodeProc, it interrogates its stream. (PutFR uses a rope output stream obtained by calling IO.ROS[], and interrogates that stream.) If a PFProcs was associated with the stream using SetPFProcs, this PFProcs is searched to find a PFCodeProc for the given character. Otherwise the global default PFProcs (which currently includes a PFCodeProc for 'a, 'b, 'c, 'd, 'e, 'f, 'g, 'h, 'l, 'q, 'r, 's, 't, and 'x) is searched. If the search fails, PutF raises IO.Error[$PFCantBindConversionProc, stream].
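The copy, modify, and install pattern and the lookup rule can be modeled compactly. The Python sketch below is a model for illustration, not the IOUtils implementation. It makes these assumptions: a PFProcs is modeled as a dictionary from code character to proc, procs are simplified to one-argument callables, upper-case codes are folded to lower case, and Python exceptions stand in for IO.Error. All names are hypothetical.

```python
import string

DEFAULT_PF_PROCS = {}          # models the global default PFProcs


class Stream:
    """Stand-in for a stream; only the PFProcs association is modeled."""
    def __init__(self):
        self.pf_procs = None   # None means "use the global default"


def copy_pf_procs(stream=None):
    """Model of CopyPFProcs: never returns None, always a fresh table."""
    if stream is None or stream.pf_procs is None:
        return dict(DEFAULT_PF_PROCS)
    return dict(stream.pf_procs)


def set_pf_code_proc(pf_procs, char, proc):
    """Model of SetPFCodeProc: install proc, return the previous one."""
    if pf_procs is None:
        raise ValueError("PFInvalidPFProcs")
    if len(char) != 1 or char not in string.ascii_letters:
        raise ValueError("PFInvalidCode")
    key = char.lower()         # assumption: codes are case-insensitive
    previous = pf_procs.get(key)
    pf_procs[key] = proc
    return previous


def set_pf_procs(stream, pf_procs):
    """Model of SetPFProcs: associate the table, return the previous one."""
    previous, stream.pf_procs = stream.pf_procs, pf_procs
    return previous


def find_code_proc(stream, char):
    """Search the stream's table if it has one, else the global default."""
    table = stream.pf_procs if stream.pf_procs is not None else DEFAULT_PF_PROCS
    if char not in table:
        raise LookupError("PFCantBindConversionProc")
    return table[char]
```

Because set_pf_procs returns the previous association, a client of a shared stream can restore the original state when it is finished, as recommended above.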
Advanced Usage: Writing an error proc
PutF's behavior when it detects an error is determined by the PFErrorProc associated with the stream:
IOUtils.PFErrorProc: TYPE = PROC [error: ErrorCode, stream: STREAM];
The PFErrorProc is called when PutF detects an error, either an error noticed by PutF itself or an error by a PFCodeProc called from PutF. The error parameter to the PFErrorProc describes the error, and the stream parameter is the stream passed to PutF. The default PFErrorProc, IOUtils.PFErrorPrintPounds, performs stream.PutRope["#####"] and returns. Two other PFErrorProcs are predefined: IOUtils.PFErrorNoop just returns, and IOUtils.PFErrorError raises IO.Error[error, stream]. Other PFErrorProcs can be written.
Calling SetPFErrorProc associates a PFErrorProc with a PFProcs in a manner analogous to SetPFCodeProc:
IOUtils.SetPFErrorProc: PROC [pfProcs: PFProcs, errorProc: PFErrorProc]
RETURNS[previous: PFErrorProc];
Wizard Usage: Setting the global PrintRefAny proc and the global default PFProcs
For structural reasons, the IO package does not import code that depends on the abstract machine. Hence the implementation of PrintRefAny is quite basic: print the ref in octal. During system initialization the following procedure should be called by the implementor of PrintRefAny:
IOUtils.RegisterPrintRefAny: PROC [printRefAnyProc: PrintRefAnyProc];
PrintRefAnyProc: TYPE = PROC [stream: STREAM,
refAny: REF READONLY ANY, depth: INT, width: INT, verbose: BOOL];
The global default PF code and error procs may be modified, perhaps to plug-in procs that rely on the abstract machine. They should probably only be modified during system initialization.
IOUtils.SetDefaultPFCodeProc: PROC [char: CHAR, codeProc: PFCodeProc]
RETURNS [previous: PFCodeProc];
IOUtils.SetDefaultPFErrorProc: PROC [errorProc: PFErrorProc]
RETURNS[previous: PFErrorProc];
Put
The Put procedure provides a convenient means of printing Cedar values. It is both faster and less flexible than PutF, because it does not accept a format string in addition to the sequence of Cedar values to be printed.
Put: PROC [stream: STREAM, v1, v2, v3: Value ← [null[]]]
PutR: PROC [v1, v2, v3: Value ← [null[]]] RETURNS [ROPE]
PutL: PROC [stream: STREAM, list: LIST OF Value]
PutLR: PROC [list: LIST OF Value] RETURNS [ROPE]
Character and character string (ROPE and REF TEXT) values are simply transmitted to the stream. REF ANY values are converted by calling PrintRefAny[stream, refAny.value, 4, 32, FALSE]. Other values are converted to character strings using standard calls to Convert, then transmitted to the stream:
atom: Convert.RopeFromAtom[from: atom.value, quote: FALSE]
bool: Convert.RopeFromBool[bool.value]
card: Convert.RopeFromCard[card.value]
int: Convert.RopeFromInt[int.value]
real: Convert.RopeFromReal[real.value]
time: Convert.RopeFromTime[time.value]
(The implementation actually uses the "Append" variants of these Convert calls, to reduce allocations, but the effect is the same.)
6. Scanning
Introduction
Scanning is the process of dividing an input stream into character subsequences or tokens. These tokens may then be converted into values of specific type, e.g. numbers or atoms.
Scanning routines that return a result are generally provided in two versions. The first version takes a REF TEXT buffer as a parameter, and may return its result in this buffer, but will allocate a larger one if the buffer fills up. Hence if each token is smaller than the buffer, only a single allocation is required to scan a sequence of tokens. The second version returns its result in a ROPE; hence at least one byte of storage is allocated for each byte of token scanned. In either case, a token cannot exceed NAT.LAST bytes in length.
Here is a sample program fragment to illustrate one style of use for the REF TEXT version of a typical scanning procedure. Note that the value of buffer below is never changed, even though a call to GetCedarToken may return a different REF TEXT.
buffer: REF TEXT = RefText.ObtainScratch[RefText.line];
tokenKind: IO.TokenKind;
DO
tokenText: REF TEXT;
[tokenKind: tokenKind, token: tokenText] ← IO.GetCedarToken[stream, buffer];
SELECT tokenKind FROM
tokenEOF => EXIT;
tokenID => -- do something with tokenText -- ... ;
ENDCASE;
ENDLOOP;
RefText.ReleaseScratch[buffer];
Scanning according to Cedar syntax
GetCedarToken: PROC [stream: STREAM, buffer: REF TEXT, flushComments: BOOL ← TRUE]
RETURNS [tokenKind: TokenKind, token: REF TEXT, charsSkipped: INT, error: TokenError];
! (none)
Consumes chars from stream, looking for next Cedar token. Returns the kind of token found, the characters of the token, and the number of white space characters discarded before reaching the token. If flushComments then the characters of a comment are treated as white space. The error returned is # none only for tokenKind = tokenERROR.
GetCedarToken makes no attempt to detect range errors; these are detected when the result token is converted from text to a value. So, for instance, GetCedarToken will happily return the token "'\999" (but not "'\1111") as a tokenCHAR, "999B" as a tokenOCTAL, and "9E999999999999999" as a tokenREAL.
TokenKind: TYPE = {
tokenERROR, -- token.error describes the scanning error
tokenID, -- an identifier or reserved word
tokenDECIMAL, -- a whole number literal expressed in decimal
tokenOCTAL, -- a whole number literal expressed in octal
tokenHEX, -- a whole number literal expressed in hexadecimal
tokenREAL, -- a REAL literal
tokenROPE, -- a ROPE, REF TEXT, or STRING literal
tokenCHAR, -- a CHAR literal
tokenATOM, -- an ATOM literal
tokenSINGLE, -- a single-character token, such as "("
tokenDOUBLE, -- a double-character token, such as "~="
tokenCOMMENT, -- a comment
tokenEOF -- the end-of-file marker
};
TokenError: TYPE = {
none, -- no error
extendedChar, -- error following backslash in char or string literal
numericLiteral, charLiteral, stringLiteral, atomLiteral, -- error in parsing indicated type
singleChar -- first non-whitespace char is not legal as first char of token
};
GetCedarTokenRope: PROC [stream: STREAM, flushComments: BOOL ← TRUE]
RETURNS [tokenKind: TokenKind, token: ROPE, charsSkipped: INT];
! EndOfStream
! Error[$SyntaxError]
Calls GetCedarToken. If token returned is tokenEOF or tokenERROR, raises an appropriate signal. Otherwise converts the token into a ROPE and returns it.
The following convenience procedures generally call GetCedarToken (with flushComments = TRUE), check that the tokenKind is as expected, and convert the token to a Cedar value of the indicated type. GetInt, GetCard, and GetReal accept an optional leading plus or minus sign. GetReal accepts REAL literals, but also accepts anything that GetInt or GetCard would accept, converting it to REAL. GetTime and GetUnpackedTime are not implemented as just described because there is no standard Cedar syntax for time; they simply consume whatever looks like a time.
! EndOfStream (tokenKind is tokenEOF)
! Error[$SyntaxError] (tokenKind is not the one expected, including tokenERROR)
! Error[$Overflow] (overflow in a conversion: GetInt, GetCard, GetCharLiteral, GetTime)
GetInt: PROC [stream: STREAM] RETURNS [INT];
GetCard: PROC [stream: STREAM] RETURNS [LONG CARDINAL];
GetReal: PROC [stream: STREAM] RETURNS [REAL];
GetBool: PROC [stream: STREAM] RETURNS [BOOL];
GetAtom: PROC [stream: STREAM] RETURNS [ATOM];
GetRopeLiteral: PROC [stream: STREAM] RETURNS [ROPE];
GetCharLiteral: PROC [stream: STREAM] RETURNS [CHAR];
GetID: PROC [stream: STREAM] RETURNS [ROPE];
GetTime: PROC [stream: STREAM] RETURNS [BasicTime.GMT];
GetUnpackedTime: PROC [stream: STREAM] RETURNS [BasicTime.Unpacked];
GetRefAny: PROC [stream: STREAM] RETURNS [REF ANY];
! EndOfStream
! Error[$SyntaxError]
Calls GetCedarToken (with flushComments = TRUE) to parse input stream, then converts the resulting token to a REF to a value of the appropriate type. GetRefAny recognizes no tokens of type tokenDOUBLE, and only a few tokens of type tokenSINGLE:
'( starts a list (LIST OF REF ANY), and ') terminates a list,
'+ and '- are unary operators that may precede a numeric literal,
', is ignored between elements of a list,
'^ is always ignored.
GetRefAnyLine: PROC [stream: STREAM] RETURNS [LIST OF REF ANY];
! EndOfStream
! Error[$SyntaxError]
Calls GetRefAny repeatedly until the token found (1) is not an element of a list, and (2) is immediately followed by a CR. Creates a LIST OF REF ANY to hold the sequence of values returned, and returns this list. It is intended for use in command interpreters.
Ad-hoc scanning
The procedures described in this section are designed to tokenize an input stream according to criteria defined by client-supplied procedures.
SkipWhitespace: PROC [stream: STREAM, flushComments: BOOL ← TRUE]
RETURNS [charsSkipped: INT];
The effect is to read and discard characters from stream until a non-whitespace character is read (and put back using Backup). If flushComments, treats comments as whitespace. Returns the number of characters skipped.
GetToken: PROC [stream: STREAM, breakProc: BreakProc ← TokenProc, buffer: REF TEXT]
RETURNS [token: REF TEXT, charsSkipped: INT];
! EndOfStream (stream.EndOf[] AND token.IsEmpty[] when GetToken is about to return)
BreakProc: TYPE = PROC [char: CHAR] RETURNS [CharClass];
CharClass: TYPE = {break, sepr, other};
The result token is the first sequence of characters in stream that is either a run of consecutive other characters, or a single break character. All chars preceding token, and token itself, are removed from stream. Raises EndOfStream if token would be empty and stream is empty. charsSkipped is the number of chars skipped before reaching the first char of the token.
The simplest BreakProc is a pure function of its char parameter; it does not keep state across calls. But it is perfectly ok for a BreakProc supplied to GetToken to keep state across calls to it from GetToken, knowing that the BreakProc will be called exactly once for each character read from the stream by a particular call to GetToken.
The following BreakProcs are defined in IO:
TokenProc: BreakProc;
Is equivalent to {RETURN[SELECT char FROM
IN [NUL .. SP], ',, ':, '; => sepr,
'[, '], '(, '), '{, '}, '", '+, '-, '*, '/, '@, '← => break,
ENDCASE => other]};
IDProc: BreakProc;
Is equivalent to {RETURN[SELECT char FROM
IN [NUL .. SP], ',, ':, '; => sepr,
ENDCASE => other]};
s.GetToken[breakProc: TokenProc] approximates the behavior of the Cedar scanner, but discards commas, colons, and semicolons, does not handle real numbers, rope literals, two-character operators, etc. s.GetToken[IDProc] does not recognize single-character tokens, hence accepts "/indigo/cedar/top/io.df" or "Rovner.pa" as a single token.
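The GetToken rules and the two predefined BreakProcs can be modeled in a few lines. The Python sketch below illustrates the behavior described above and is not the IO implementation. It assumes a seekable text stream such as io.StringIO so that the terminating character can be put back, and it uses EOFError in place of EndOfStream. All names are hypothetical.

```python
import io

BREAK, SEPR, OTHER = "break", "sepr", "other"


def token_proc(ch):
    """Model of TokenProc."""
    if ch <= " " or ch in ",:;":
        return SEPR
    if ch in "[](){}\"+-*/@←":
        return BREAK
    return OTHER


def id_proc(ch):
    """Model of IDProc: no break characters at all."""
    return SEPR if ch <= " " or ch in ",:;" else OTHER


def get_token(stream, break_proc=token_proc):
    """Return (token, chars_skipped) following the GetToken description."""
    token, skipped = "", 0
    while True:
        pos = stream.tell()
        ch = stream.read(1)
        if ch == "":                      # end of stream
            break
        kind = break_proc(ch)             # called once per character read
        if kind == OTHER:
            token += ch
        elif token:                       # sepr or break ends a run of others
            stream.seek(pos)              # put the terminator back
            break
        elif kind == BREAK:
            token = ch                    # a single break char is a token
            break
        else:
            skipped += 1                  # leading separator
    if not token:
        raise EOFError("EndOfStream")
    return token, skipped
```

With id_proc, a file name containing slashes and periods comes back as one token; with token_proc, each slash is returned as a separate single-character token.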
GetTokenRope: PROC [
stream: STREAM, breakProc: BreakProc ← TokenProc]
RETURNS [token: ROPE, charsSkipped: INT];
! EndOfStream (stream.EndOf[] AND token.IsEmpty[] when GetTokenRope is about to return)
Calls GetToken, converts token to ROPE, returns it.
GetLine: PROC [stream: STREAM, buffer: REF TEXT] RETURNS [line: REF TEXT];
! EndOfStream (stream.EndOf[] when GetLine is called)
A line is a sequence of characters that does not contain a CR. GetLine simply returns the first line that it finds in the stream. If the line was delimited by CR, the CR is removed from stream but not included in the result line. Raises EndOfStream if the input stream is empty on entry.
GetLineRope: PROC [stream: STREAM] RETURNS [line: ROPE];
! EndOfStream (stream.EndOf[] when GetLineRope is called)
Calls GetLine, converts line to ROPE, returns it.
Character constants
Several character constants from the Ascii interface are re-defined in the IO interface for convenience:
BS: CHAR = Ascii.BS; -- '\b
TAB: CHAR = Ascii.TAB; -- '\t
LF: CHAR = Ascii.LF; -- '\l
FF: CHAR = Ascii.FF; -- '\f
CR: CHAR = Ascii.CR; -- '\n
NUL: CHAR = Ascii.NUL;
ControlA: CHAR = Ascii.ControlA;
BEL: CHAR = Ascii.BEL;
ControlX: CHAR = Ascii.ControlX;
ESC: CHAR = Ascii.ESC;
SP: CHAR = Ascii.SP;
DEL: CHAR = Ascii.DEL;
7. Implementing a stream class
Overview
We represent a stream as a REF to a STREAMRecord:
STREAMRecord: TYPE = RECORD [
streamProcs: REF StreamProcs, -- procedures for the stream class
streamData: REF ANY, -- instance data, type is specific to the stream class
propList: Atom.PropList, -- instance data, type is independent of the stream class
backingStream: STREAM -- special instance data, used to implement layered streams
];
The streamProcs field of a stream points to an immutable record that can be shared by any number of stream instances. A typical stream class implementation builds this record during module initialization. In contrast, the streamData field of a stream points to a mutable record that is not shared across streams. A stream class implementation builds this record in its stream creation procedure.
Many stream class implementations do not use the propList and backingStream fields of the STREAMRecord. Such a stream class implementation is a module of the form:
DIRECTORY IO, XxxOps;
XxxStreamImpl: CEDAR PROGRAM IMPORTS IO EXPORTS XxxOps = {
Data: TYPE = RECORD [ -- Xxx-specific contents -- ];
DataHandle: TYPE = REF Data;
XxxGetChar: PROC [self: STREAM] RETURNS [CHAR] = {
selfData: DataHandle = NARROW[self.streamData];
c: CHAR ← -- Xxx-specific computation using selfData -- ;
RETURN [c]
};
-- ... other stream proc implementations --
Create: PUBLIC PROC [ -- Xxx-specific parameters -- ] RETURNS [IO.STREAM] = {
data: DataHandle = NEW [Data ← [ -- Xxx-specific initialization -- ]];
RETURN [IO.CreateStream[procs, data]];
};
procs: REF IO.StreamProcs = IO.CreateStreamProcs [
variety: input, class: $Xxx,
getChar: XxxGetChar,
-- ... other stream procs --
];
}.
Note that IO.CreateStreamProcs allocates and initializes storage for the StreamProcs record, and that IO.CreateStream allocates and initializes storage for the STREAMRecord. Also note that there is no need to monitor accesses to the global variable procs above, because it is constant after initialization and the record it points to is immutable.
A stream class implementation may vary from this template along several dimensions. We summarize them here, and describe them more fully in later sections:
A stream class might use the backingStream field of the STREAMRecord. This is designed to encourage the implementation of stream classes that "inherit" most of their behavior from another class.
A stream class might use the default implementations provided by CreateStreamProcs. This is especially handy when a backing stream is used or when high performance is not required.
A stream class might implement non-generic stream procedures. There are two possible implementation schemes, one that applies when the new procedure is specific to a single stream class, and another when the procedure is implemented by several stream classes.
Backing stream
If s is a stream, s.backingStream is called its backing stream. Backing streams are intended to serve a specific purpose: to make it convenient to implement one stream class in terms of another. As mentioned above, CreateStreamProcs supplies default implementations of generic stream procedures -- even the most fundamental ones such as GetChar and EndOf. These default implementations all check to see if a backing stream is present, and if so they "pass the buck" to it. This means that you can implement a stream class by implementing a few procedures, obtaining the rest from CreateStreamProcs, and supplying a backing stream to CreateStream (via its backingStream parameter, which we defaulted to NIL in the template above). The backing stream of an input stream should be an input stream, and the backing stream of an output stream should be an output stream.
CreateStreamProcs
CreateStreamProcs: PROC[
variety: StreamVariety,
class: ATOM,
getChar: PROC [self: STREAM] RETURNS [CHAR] ← NIL,
-- many others --
]
RETURNS [REF StreamProcs];
On a call to CreateStreamProcs, any procedure parameter for which you pass NIL (the default) is assigned a specific value by CreateStreamProcs. So to understand CreateStreamProcs it is sufficient first to understand what happens when you supply non-NIL values for all procedure parameters, and then to understand the defaults that CreateStreamProcs uses when NIL is supplied.
To first order, all you can do with a REF StreamProcs is to insert it into the streamProcs field of a stream, e.g. by passing it to CreateStream. So the semantics of CreateStreamProcs are understood in terms of the behavior of the resulting stream. Consider the following code fragment:
myGetChar: PROC [self: STREAM] RETURNS [CHAR] = { ... };
myProcs: REF StreamProcs ← IO.CreateStreamProcs[getChar: myGetChar, ...];
s: STREAM ← IO.CreateStream[streamProcs: myProcs, ...];
[] ← s.GetChar[];
The call "s.GetChar[]" in the final line is equivalent to "myGetChar[s]". At a more primitive level, the call "s.GetChar[]" translates to "call the procedure supplied as the getChar parameter to the CreateStreamProcs call that created s.streamProcs". This applies uniformly to all of the procedure-valued parameters to CreateStreamProcs, not just to getChar.
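The dispatch just described can be pictured as follows. This is a sketch under stated assumptions, not the actual IO source: it assumes that the StreamProcs record has a field named getChar holding the procedure that was passed as the getChar parameter.

```cedar
-- sketch only: assumes StreamProcs has a field named getChar --
GetChar: PROC [self: STREAM] RETURNS [CHAR] = {
  -- fetch the class's procedure from the shared record and apply it to this instance --
  RETURN[self.streamProcs.getChar[self]];
};
```

Because the procedure is fetched from self.streamProcs on each call, replacing the streamProcs field of a stream changes the behavior of every subsequent generic call on that stream.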
CreateStreamProcs does not allow its caller to specify an implementation of PeekChar. This reflects the current implementation, in which PeekChar is implemented for all stream classes as a call to GetChar, followed by a call to Backup.
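A minimal sketch of PeekChar along these lines is shown here. It is illustrative, not the actual source. The second call is written as Backup, the name under which the list of defaults below describes this procedure, and Backup is assumed to take the character that is to be re-read.

```cedar
-- sketch only: Backup is assumed to take the character to be re-read --
PeekChar: PROC [self: STREAM] RETURNS [char: CHAR] = {
  char ← self.GetChar[];  -- consume the next character
  self.Backup[char];      -- then return it to the stream
};
```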
Now for the defaults:
GetChar, EndOf, PutChar, GetIndex, SetIndex, GetLength, SetLength: If a backing stream is present, call the corresponding procedure in the backing stream; otherwise raise ERROR Error[$NotImplementedForThisStream, self].
GetBlock, UnsafeGetBlock, PutBlock, UnsafePutBlock: Implement these in terms of other procedures for the same stream (not by calling down to the backing stream). For instance, if an implementation of UnsafeGetBlock is provided but an implementation of GetBlock is not, implement GetBlock by calling UnsafeGetBlock. If both GetBlock and UnsafeGetBlock are unimplemented, implement them by calling GetChar.
CharsAvail: If a backing stream is present, call CharsAvail in the backing stream; otherwise return INT.LAST.
Backup: Replace the stream's original StreamProcs by a different set of procedures in which GetChar, GetBlock, UnsafeGetBlock, EndOf, CharsAvail, and Reset know that the stream is in the "backed up" state. When the backed-up character is consumed, the stream's original procedures are restored. Up to 2^15 characters may be backed up in this way; an attempt to back up more than this raises Error[$BufferOverflow, self].
Flush: If a backing stream is present, call Flush in the backing stream; otherwise no effect.
Reset: If a backing stream is present, call Reset in the backing stream; otherwise no effect.
Close: If abort then Reset the stream, otherwise Flush the stream. If a backing stream is present, call Close in the backing stream. Then replace the stream procs with "closed stream" procs (IOUtils.closedStreamProcs).
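As an illustration of the "pass the buck" rule in the first group of defaults, a default GetChar might look like the sketch below. The name DefaultGetChar is hypothetical; the procedures that CreateStreamProcs actually installs are not shown in this document.

```cedar
-- sketch only: DefaultGetChar is a hypothetical name --
DefaultGetChar: PROC [self: STREAM] RETURNS [CHAR] = {
  IF self.backingStream # NIL
    THEN RETURN[self.backingStream.GetChar[]]  -- pass the buck to the backing stream
    ELSE ERROR IO.Error[$NotImplementedForThisStream, self];
};
```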
Example: defining a stream to the Message Window
Consider the problem of implementing a stream class that writes to the Viewer Message Window. The primitives on this window are MessageWindow.Append and MessageWindow.Clear. We want the stream to buffer characters until a Flush is executed, then send them to the window. Stream Reset should clear the buffer and the window.
This implementation demonstrates the full power of backing streams and CreateStreamProcs defaulting. We use a rope output stream to implement the character buffering between Flush calls.
DIRECTORY IO, MessageWindow, Rope;
MessageWindowStreamImpl: CEDAR PROGRAM IMPORTS IO, MessageWindow, Rope = {
STREAM: TYPE = IO.STREAM;
ROPE: TYPE = Rope.ROPE;
MessageWindowStreamFlush: PROC [self: STREAM] = {
-- get characters from backing stream --
r: ROPE ← self.backingStream.GetOutputStreamRope[];
-- change crs to spaces (since message window is only one line high!) --
i: INT ← 0;
WHILE (i ← Rope.Find[s1: r, s2: "\n", pos1: i]) # -1 DO
r ← Rope.Replace[base: r, start: i, len: 1, with: " "];
ENDLOOP;
-- display the rope --
MessageWindow.Append[message: r, clearFirst: TRUE];
-- clear buffer in backing stream --
self.backingStream.Reset[];
};
MessageWindowStreamReset: PROC [self: STREAM] = {
self.backingStream.Reset[];
MessageWindow.Clear[];
};
MessageWindowStreamProcs: REF IO.StreamProcs = IO.CreateStreamProcs[
variety: output, class: $MessageWindow,
flush: MessageWindowStreamFlush, reset: MessageWindowStreamReset];
CreateMessageWindowStream: PROC RETURNS [STREAM] = {
RETURN[IO.CreateStream[
streamProcs: MessageWindowStreamProcs, streamData: NIL,
backingStream: IO.ROS[]]];
};
}.
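The same technique applies when a class overrides a byte-transmission procedure instead of Flush and Reset. The sketch below defines an output stream that converts characters to upper case and passes them to a backing stream supplied by the client. It is a sketch under assumptions: the module name, the procedure names, and the class atom $UpperCase are hypothetical; Ascii.Upper is assumed to map a character to its upper-case equivalent; and the class atom is passed with the keyword class, as in the template in the Overview.

```cedar
DIRECTORY Ascii, IO;
UpperCaseStreamImpl: CEDAR PROGRAM IMPORTS Ascii, IO = {
  STREAM: TYPE = IO.STREAM;
  UpperCasePutChar: PROC [self: STREAM, char: CHAR] = {
    -- translate the character, then hand it to the backing stream --
    self.backingStream.PutChar[Ascii.Upper[char]];
  };
  upperCaseProcs: REF IO.StreamProcs = IO.CreateStreamProcs[
    variety: output, class: $UpperCase,
    putChar: UpperCasePutChar];
  CreateUpperCaseStream: PROC [backing: STREAM] RETURNS [STREAM] = {
    -- no instance data is needed; all state lives in the backing stream --
    RETURN[IO.CreateStream[
      streamProcs: upperCaseProcs, streamData: NIL,
      backingStream: backing]];
  };
}.
```

Only PutChar is implemented here. By the defaults listed above, PutBlock and UnsafePutBlock are built from this stream's own PutChar, so block output is translated as well, while Flush, Reset, and Close are passed to the backing stream.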
Implementing non-generic stream procedures
Non-generic stream procedures fall into two classes. The first type is exemplified by IO.RopeFromROS, which applies only to rope output streams (created by IO.ROS) and returns a rope containing the bytes sent to the stream. RopeFromROS simply checks that the stream it has been passed is a rope output stream, computes the result rope, and returns it.
The second type is exemplified by IO.GetLength, which applies to any "file like" stream, including file streams, rope streams, and ref text streams, but does not apply to all streams. Rather than have the GetLength procedure "know" about all of these stream classes, GetLength searches the StreamProcs property list of the stream it has been passed for a $GetLength procedure, then calls it. The $GetLength procedure for the class is added to the property list by CreateStreamProcs when the class is created. If CreateStreamProcs did not take a getLength parameter, it would still be possible to attach a getLength procedure to a StreamProcs property list using IOUtils.StoreProc.
We use the module below, a simplified version of the module that implements rope output streams, to illustrate these ideas.
DIRECTORY IO, IOUtils, RuntimeError;
RopeOutputStreamImpl: CEDAR PROGRAM IMPORTS IO, IOUtils, RuntimeError EXPORTS IO = {
Data: TYPE = RECORD [ -- ROS-specific data -- ];
DataHandle: TYPE = REF Data;
OutputRopeStreamPutChar: PROC [self: STREAM, char: CHAR] = {
data: DataHandle = NARROW[self.streamData];
-- ROS-specific computation using data -- ;
};
-- ... other stream proc implementations --
RopeFromROS: PUBLIC PROC [self: STREAM, close: BOOL] RETURNS [ROPE] = {
data: DataHandle = NARROW[self.streamData !
RuntimeError.NarrowRefFault => ERROR IO.Error[$NotImplementedForThisStream, self]];
RETURN[ -- ROS-specific computation using data -- ];
};
TypeOfGetLength: TYPE = PROC [self: STREAM] RETURNS [length: INT];
GetLength: PUBLIC PROC [self: STREAM] RETURNS [length: INT] = {
proc: REF ANY;
DO
IF self.streamProcs.class = $Closed THEN ERROR IO.Error[$StreamClosed, self];
proc ← IOUtils.LookupProc[self, $GetLength];
IF proc # NIL THEN RETURN[(NARROW[proc, REF TypeOfGetLength])^ [self] ]
ELSE IF self.backingStream # NIL THEN self ← self.backingStream
ELSE ERROR IO.Error[$NotImplementedForThisStream, self];
ENDLOOP;
};
OutputRopeStreamGetLength: PROC [self: STREAM] RETURNS [length: INT] = {
data: DataHandle = NARROW[self.streamData];
RETURN[ -- ROS-specific computation using data -- ];
};
ROS: PUBLIC PROC [oldStream: STREAM] RETURNS [stream: IO.STREAM] = {
data: DataHandle = NEW [Data ← [ -- ROS-specific initialization -- ]];
RETURN [IO.CreateStream[procs, data]];
};
procs: REF IO.StreamProcs = IO.CreateStreamProcs [
variety: output, class: $ROPE,
putChar: OutputRopeStreamPutChar,
-- ... other stream procs --
];
-- for example's sake, attach $GetLength proc using basic method --
-- (would normally let CreateStreamProcs do this for GetLength) --
IOUtils.StoreProc[streamProcs: procs, key: $GetLength,
procRef: NEW[TypeOfGetLength ← OutputRopeStreamGetLength]];
}.
Note that GetLength is implemented just once, regardless of the number of classes that implement a procedure like OutputRopeStreamGetLength. The price is a higher per-call overhead than RopeFromROS: each call to GetLength involves an extra procedure call and a property list search.
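From the client's side the two kinds of non-generic procedure are called in the same way. The fragment below is a sketch: the procedure name Demo is hypothetical, IO.ROS is called with no arguments as in the Message Window example, and the close parameter of RopeFromROS is passed explicitly.

```cedar
-- sketch only: a client of the rope output stream class --
Demo: PROC RETURNS [r: Rope.ROPE, n: INT] = {
  s: IO.STREAM = IO.ROS[];
  s.PutChar['h];
  s.PutChar['i];
  n ← s.GetLength[];  -- found by a property list search on s.streamProcs
  r ← IO.RopeFromROS[self: s, close: TRUE];  -- class-specific: NARROWs s.streamData itself
};
```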
8. Other issues
Ropes, files, and streams
A rope is an immutable byte sequence that lives in (and dies with) a virtual memory. A rope may be derived from an immutable file, but the file and the rope are still different objects.
A file is a (possibly mutable) byte sequence that lives outside of virtual memory. A file is accessed through an open file, which lives in virtual memory, but the file and the open file are still different objects.
An input stream can be a pointer into an existing byte sequence, such as a file or a rope. It can also deal with byte sequences that are not manifest in the way ropes and files are, e.g. a sequence of keystrokes. An output stream can be used to produce a rope or a file.
Concurrency
Individual streams are independent objects, so there is never any need to synchronize calls made on different streams. If two processes are to share a single stream, they must synchronize their accesses to that stream at a level above the stream procedures; individual stream procedures (PutChar, PutBlock, etc.) are not guaranteed to be atomic. A stream class may use a monitor (probably either a module monitor on the implementation or an object monitor on the instance data) to provide atomicity at this level, but would do so to protect its own integrity, not to synchronize its callers.
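One way for clients to synchronize above the stream procedures is to put all access to the shared stream behind the entry procedures of a monitor. The module below is a sketch: the names SharedLogImpl, SetLog, and WriteLine are hypothetical, and IO.PutRope is assumed to write the characters of a rope to an output stream.

```cedar
DIRECTORY IO, Rope;
SharedLogImpl: CEDAR MONITOR IMPORTS IO = {
  log: IO.STREAM ← NIL;  -- the shared stream, touched only inside entry procedures
  SetLog: ENTRY PROC [stream: IO.STREAM] = {
    log ← stream;
  };
  WriteLine: ENTRY PROC [line: Rope.ROPE] = {
    ENABLE UNWIND => NULL;  -- release the monitor lock if a stream procedure raises an error
    IF log = NIL THEN RETURN;
    log.PutRope[line];
    log.PutChar['\n];
  };
}.
```

Since both output calls happen while the monitor lock is held, a line written by one process cannot be interleaved with a line written by another, even though PutRope and PutChar make no atomicity guarantee themselves.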
A stream may be used to access a system resource that has its own concurrency control. For instance, FS provides locks that control concurrent access to local files, and Alpine maintains a separate locking facility for its files.
Efficiency
The basic byte-transmission procedures are GetChar for input streams and PutChar for output streams. In some circumstances the overhead of one procedure call per byte transmitted is not acceptable. In this case a client should use the corresponding block-transmission procedures GetBlock and PutBlock. These traffic in REF TEXT buffers, TEXT being a standard safe type for representing mutable byte sequences. (Note that TEXT is limited to 32k bytes.) Not all stream classes implement the block procedures more efficiently than the byte procedures, but several important ones do, including file streams and Pup streams.
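The following sketch copies one stream to another through a REF TEXT buffer. It rests on assumptions not stated in this section: that GetBlock and PutBlock accept the keyword parameters block, startIndex, and count; that GetBlock returns the number of bytes it read; and that it returns zero only at the end of the stream. The procedure name Copy and the buffer size are arbitrary.

```cedar
-- sketch only: parameter names and end-of-stream behavior are assumptions --
Copy: PROC [from, to: IO.STREAM] = {
  buffer: REF TEXT = NEW[TEXT[512]];
  DO
    nBytes: NAT = from.GetBlock[block: buffer, startIndex: 0, count: buffer.maxLength];
    IF nBytes = 0 THEN EXIT;  -- nothing more to read
    to.PutBlock[block: buffer, startIndex: 0, count: nBytes];
  ENDLOOP;
};
```

The loop makes one stream procedure call per buffer in each direction, not one per byte. Whether this is faster in practice depends on the stream classes involved.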
An application that transmits non-text data (such as a MACHINE DEPENDENT record that must be stored on a file for later use) will find it more convenient to use UnsafeGetBlock and UnsafePutBlock, which traffic in LONG POINTERs and byte offsets. A common error in using these procedures is to pass the SIZE of a datum (a word count) rather than its BYTES (a byte count). Since UnsafeGetBlock stores bytes through an untyped LONG POINTER, it has the potential for unbounded destruction of the garbage collector invariants if misused; inspect each use of UnsafeGetBlock carefully as you wrap it in a TRUSTED block.
A particular stream class may have its own efficiency considerations, which should be discussed in the documentation for the class.
IO procedures that depend upon AM facilities (not available on non-PrincOps machines)
Some procedures in the IO interface may call Cedar abstract machine (AM) interfaces. In some cases clients may wish to avoid this, because it causes a circularity. Here are the IO procedures that may call AM:
Any procedure that takes a Value parameter, when the Value has variant refAny and the REF ANY that is passed does not NARROW to one of the basic types (ATOM, BOOL, CHAR, LONG CARDINAL, INT, REAL, ROPE, REF TEXT, BasicTime.GMT). <Since NARROW of REF Opaque does not work today, the AM is actually called for printing a REF ANY that turns out to be an ATOM, ROPE, or REF BasicTime.GMT, but this is a bug>
Known shortfalls in the interfaces, the implementation, and this document
This document should contain more information on the efficiency of the basic stream primitives when applied to commonly-used stream classes, and on the efficiency of the scanning and printing packages (Taft).
The list of procedures in this document is not up-to-date with what is actually in the interfaces (Plass).