The IO and Convert interfaces
CEDAR 5.0

Byte stream I/O and I/O conversions in Cedar
By Mark R. Brown
Last edit by MBrown.pa on December 12, 1983
Release as [Indigo]Documentation>IODoc.Tioga

XEROX
Xerox Corporation
Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, California 94304

1. Introduction

Streams

A stream (an instance of type IO.STREAM) is a producer or consumer of a byte sequence. An input stream is a producer; an output stream is a consumer. Cedar's notion of stream is similar to that of Unix and many other systems.

The stream abstraction is important for two related reasons. The first reason is that most of the data storage and communication devices that are part of our computers can be modeled as streams. Disks and tapes store byte sequences, and the Ethernet transmits them. Streams do not represent the capabilities of any of these devices perfectly, and streams model several other devices even less perfectly: consider bitmap displays and unencoded keyboards. But each of these devices can be represented as a stream.

The second reason is that streams are an adequate form of communication for a significant number of applications. For example, many programs are organized in a "read - eval - print" structure. The program parses an input stream and produces values (which are probably not simply character strings). These values are manipulated in some way, and then they are printed (i.e. converted to character strings and sent to the output stream). Such a program can be structured as a procedure taking an input stream and an output stream as its parameters, and is oblivious to the source of its input and the destination of its output. As long as one application can read the printed output of a second application, the two can be coupled in this way and avoid the difficulty of agreeing upon a single internal data format.
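A program with this read - eval - print structure might look like the following minimal sketch. It is written in Python rather than Cedar, uses Python's io.StringIO as a stand-in for an in-memory stream, and the procedure name sum_each_line is hypothetical. The procedure parses integers from its input stream, evaluates their sum, and prints the result. It never learns where its input comes from or where its output goes.

```python
import io
from typing import TextIO


def sum_each_line(inp: TextIO, out: TextIO) -> None:
    """Read-eval-print: parse each input line, sum it, print the result.

    The procedure sees only two streams, so the same code works whether
    inp and out are files, a terminal, or in-memory buffers.
    """
    for line in inp:
        values = [int(token) for token in line.split()]  # read (parse)
        total = sum(values)                               # eval
        out.write(f"{total}\n")                           # print


if __name__ == "__main__":
    result = io.StringIO()
    sum_each_line(io.StringIO("1 2 3\n10 20\n"), result)
    print(result.getvalue(), end="")
```

Passing sys.stdin and sys.stdout instead of the two StringIO objects would run the same procedure interactively, with no change to its code.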
Thus the stream abstraction forms a "lowest common denominator" for communication with applications and devices. Cedar makes nearly all devices available as streams. If a device has capabilities that are not adequately modelled by a stream, its streams simply implement some extra procedures that apply only to streams for that device (or for a set of similar devices). For instance, a stream for reading a disk file implements a SetIndex operation that reflects the random-access capability of the disk. Because a file is presented as an upward-compatible extension of a stream, the same basic operations serve both programs that use files as streams and programs that use files as random-access devices.

We call each stream implementation a stream class; a stream class is named by an ATOM. For example, a stream on a file has class $File, while a stream on a ROPE has class $ROPE. For emphasis we sometimes call a particular stream a stream instance. Each stream instance of a particular class and "direction" (input or output) implements the same set of procedures.

Interfaces involving streams

IO is a large interface that contains three major components in addition to the generic operations on streams: procedures for creating streams from a few important data types, procedures for reading from a stream and performing input conversion, and procedures for performing output conversion and writing the results to a stream. The stream creation procedure for a typical stream class is defined in an interface associated with the class, not in IO; the EditedStream and IOClasses interfaces define some useful stream classes. IOUtils contains procedures that are only of interest to the implementor of an unusually complex stream class.

Why is the IO interface so large? The usual rule is "one abstraction, one interface", and by this rule the IO interface would contain only the generic operations on streams.
Cedar has chosen to provide many stream procedures in a single large interface as a convenience to programmers: most of the commonly used procedures that deal in streams can be found in IO, and these procedures can often be called using object notation because they are part of IO.STREAM's cluster.

The IO and IOUtils interfaces contain a certain amount of documentation, but this is structured for use while programming and does not attempt to explain underlying concepts or give full detail. The EditedStream and IOClasses interfaces contain full documentation.

The Convert interface provides conversions from character strings into Cedar values (numbers, times, etc.) and from Cedar values into character strings. The Convert interface does not really involve streams, but it is often used in conjunction with streams, so we document it here.

The remainder of this document is organized as follows:

Section 2 gives an informal semantics of the procedures that apply to all streams. Everyone should read this section once; later, you can use it for reference when you need to recall exactly what UnsafeGetBlock does.

Section 3 describes the stream-creation procedures that are part of the IO interface, and points to the documentation for several others: file streams, viewer streams, and the like. If you want to create a stream for reading from a ROPE, look in Section 3.

Section 4 describes the Convert interface. To turn a ROPE into an INT or vice versa, consult Section 4.

Section 5 describes facilities for formatted output: the PutF and Put procedures. These procedures convert Cedar values into byte strings and transmit them to an output stream; they are ordinary stream clients, but are so commonly used that they deserve to be included in the IO interface.

Section 6 describes facilities for formatted input. These procedures read from an input stream and construct Cedar values. Just as in the PutF case, they are ordinary stream clients but are included in IO because they are so widely used.
Section 7 describes how to implement a stream class. This can be surprisingly simple: if you are not too concerned about performance, you can let many of the stream procedures assume default values.

Section 8 contains miscellaneous observations on topics like synchronization of stream access, efficiency of stream procedures, and the relationship between streams, files and ropes. It also describes known shortfalls in the IO interfaces, in the implementation, and in this document.

See Section 3.1.1 of the Cedar Language Reference Manual for a description of the notation used for grammars below.

2. Generic stream procedures

Stream varieties

The fourteen generic stream procedures fall into three groups:
    Input: GetChar, GetBlock, UnsafeGetBlock, EndOf, Backup, PeekChar, CharsAvail
    Output: PutChar, PutBlock, UnsafePutBlock, Flush, EraseChar
    Control: Reset, Close

There are three stream varieties: input stream, output stream, and input/output stream. All streams implement the Control procedures, but other procedures are implemented or not as a function of the stream variety. The Cedar type system does not enforce this stream taxonomy; a stream class implementor is responsible for documenting the variety of stream he is implementing, and for providing all of the necessary procedures.

An input stream implements the Input procedures and none of the Output procedures. Informally, an input stream produces a sequence of bytes (returned by calls to GetChar, GetBlock, and UnsafeGetBlock). The end of the byte sequence is signalled by (1) EndOf returning TRUE, (2) GetChar raising EndOfStream, or (3) GetBlock or UnsafeGetBlock returning zero bytes when a nonzero number of bytes was requested. Backup and PeekChar provide a look-ahead capability for use by scanners. CharsAvail allows the caller to estimate the real-time response of the stream to later requests.
Reset skips to the end of the input sequence, and Close releases resources that were reserved by the stream (such as a file opened for reading).

An output stream implements the Output procedures and none of the Input procedures. Informally, an output stream consumes a sequence of bytes (passed in calls to PutChar, PutBlock, and UnsafePutBlock). Flush forces the output bytes to be transmitted immediately to their destination. EraseChar "erases" the most recently written character (most useful for display-like output devices). Reset has no uniformly defined effect, while Close terminates the output sequence and releases resources that were reserved by the stream (such as a file opened for writing).

An input/output stream implements both the Input and Output procedures. It is not clear that input/output streams are a good idea, and this stream variety may be eliminated in a later version of IO. At present there are two input/output stream classes: read/write file streams and Pup byte streams. These two have little in common: for file streams, the input and output halves are closely coupled (there is a single "stream index" into the file), while the input and output halves of a Pup byte stream are largely independent. In both cases, the semantics of the input and output halves are similar, and in both cases it seems to make sense for the Control procedures to operate on both halves simultaneously. In contrast, creating a typescript viewer produces two streams, one input stream and one output stream, because the keyboard and the display have very different properties.

Exceptions

    EndOfStream: ERROR [stream: STREAM]
    Error: ERROR [ec: ErrorCode, stream: STREAM]
    ErrorCode: TYPE = { ..., NotImplementedForThisStream, StreamClosed, Failure, ... }

Only two errors are raised by generic stream procedures: EndOfStream and Error. Error passes an element of an enumerated type to describe the specific error. Each stream class maps its errors into this fixed set.
Clearly this set cannot describe every error that may arise; Error[$Failure, stream] is raised for any error that cannot be represented explicitly. If a client needs more information on the particular error, it must call a class-specific procedure to get the information. The complete error information can be stored in the stream, which is a parameter to Error.

Input

A simple model of an input stream is that the stream contains a sequence of bytes, the input sequence, as part of its state. GetChar removes the first byte from the input sequence, Backup adds a byte to the front of the sequence, and EndOf tests the sequence for emptiness. GetBlock, UnsafeGetBlock, and PeekChar are expressed in terms of GetChar and Backup.

The validity of this model is not diminished by the fact that for several stream classes, the entire input sequence for a stream is not known at the time the stream is created. The stream must behave, except for performance, as though the input sequence were known in advance and only modified by GetChar and Backup. In particular, once the input sequence is empty (as signified by stream.GetChar[] raising EndOfStream), it remains empty unless stream.Backup[char] is performed. The CharsAvail procedure provides information on how much of the input sequence is actually known to a stream.

    GetChar: PROC [self: STREAM] RETURNS [CHAR]
Raises ERROR IO.EndOfStream[self] if the input sequence is empty; otherwise consumes and returns the next byte in the input sequence.

    Backup: PROC [self: STREAM, char: CHAR]
Makes char the "next byte" of the input sequence. That is, after self.Backup[char], self.GetChar[] returns char, and after that GetChar the input sequence of self is the same as it was before the Backup call. If char is not the byte produced by the last self.GetChar[], then ERROR IO.Error[$IllegalPutBack, self] may optionally be raised.
(The only reason for the char parameter is to allow a default implementation of Backup, for stream classes that choose not to implement Backup directly.) If a large number (thousands) of Backup calls are made without corresponding GetChar calls, then ERROR IO.Error[$BufferOverflow, self] may optionally be raised.

    EndOf: PROC [self: STREAM] RETURNS [BOOL]
Returns TRUE if and only if calling self.GetChar[] would raise ERROR IO.EndOfStream[self].

    UnsafeGetBlock: UNSAFE PROC [self: STREAM, block: IO.UnsafeBlock] RETURNS [nBytesRead: INT]
IO.UnsafeBlock is equal to Basics.UnsafeBlock, declared as
    TYPE = RECORD [
        base: LONG POINTER TO PACKED ARRAY [0..0) OF CHAR _ NIL,
        startIndex: INT _ 0,
        count: INT _ 0];
UnsafeGetBlock is equivalent to (but for many stream classes is faster than)
    IF block.startIndex < 0 OR block.count < 0 THEN ERROR RuntimeError.BoundsFault;
    nBytesRead _ 0;
    FOR i: INT IN [0 .. block.count) DO
        block.base^[block.startIndex+i] _ self.GetChar[ ! EndOfStream => EXIT];
        nBytesRead _ nBytesRead + 1;
        ENDLOOP;
    RETURN[nBytesRead];

    GetBlock: PROC [self: STREAM, block: REF TEXT, startIndex: NAT _ 0, count: NAT _ NAT.LAST] RETURNS [nBytesRead: NAT]
Is equivalent to (but for many stream classes is faster than)
    nBytesRead _ self.UnsafeGetBlock[[
        base: LOOPHOLE[block, LONG POINTER] + TEXT[0].SIZE,
        startIndex: startIndex,
        count: MAX[MIN[INT[count], INT[block.maxLength]-startIndex], 0] ]];
    block.length _ startIndex + nBytesRead;
    RETURN[nBytesRead];

    PeekChar: PROC [self: STREAM] RETURNS [char: CHAR]
Is equivalent to
    c: CHAR = self.GetChar[];
    self.Backup[c];
    RETURN [c];

    CharsAvail: PROC [self: STREAM, wait: BOOL _ FALSE] RETURNS [INT]
Returns an estimate of the number of characters from the input sequence that can be delivered very quickly (without waiting for user input or for network transmission). The end of the input sequence counts as a "character" for the purposes of CharsAvail.
A stream class that never waits may return a large constant value (INT.LAST), while a stream class that does wait should return either an exact value or an underestimate: it is acceptable to return 1 representing "some bytes are available or end of input sequence" and 0 representing "no bytes are available". If wait is TRUE, CharsAvail does not return until it can return a nonzero value. CharsAvail allows a single process to consume a byte stream efficiently, without being suspended for reading past the available input.

Output

A simple model of an output stream is that the stream connects to a sequential output device of some kind. PutChar transmits a single byte to this device. PutBlock and UnsafePutBlock are expressed in terms of PutChar. In practice, of course, an output stream may contain buffering; the Flush procedure forces a stream to flush its buffers to the underlying device. Some important sequential output devices allow PutChar to be "undone": erase the last character painted to the display, or truncate the last character appended to a file. The EraseChar procedure provides this function for devices that support it, and has a simple default behavior for devices that do not.

    PutChar: PROC [self: STREAM, char: CHAR]
Appends the given byte to the output sequence.

    UnsafePutBlock: PROC [self: STREAM, block: IO.UnsafeBlock]
Is equivalent to (but for many stream classes is faster than)
    IF block.startIndex < 0 OR block.count < 0 THEN ERROR RuntimeError.BoundsFault;
    FOR i: INT IN [block.startIndex .. block.startIndex+block.count) DO
        self.PutChar[block.base^[i]];
        ENDLOOP;

    PutBlock: PROC [self: STREAM, block: REF READONLY TEXT, startIndex: NAT _ 0, count: NAT _ NAT.LAST]
Is equivalent to
    stopIndexPlusOne: INT _ INT[startIndex]+count;
    IF stopIndexPlusOne > block.length THEN stopIndexPlusOne _ block.length;
    self.UnsafePutBlock[[
        base: LOOPHOLE[block, LONG POINTER] + TEXT[0].SIZE,
        startIndex: startIndex,
        count: MAX[stopIndexPlusOne-startIndex, 0] ]];

    Flush: PROC [self: STREAM]
Transmits all stream writes that have taken place since self was created or since the preceding Flush on self, whichever is later. The meaning of "transmits" depends upon the class of the stream self, and even upon the instance self. For instance, Flush applied to a file stream generally records all stream writes stably on disk storage. If the underlying file system supports transactions, and self was created with a specific option, then Flush commits the current transaction and creates a new one.

    EraseChar: PROC [self: STREAM, char: CHAR]
If self is a stream on a display, erases the last character written to the display, which must have been char. If self is not a display stream, but is a stream of some class whose PutChar can be undone (for instance, a stream appending to a file), then undoes the last PutChar. Otherwise, performs
    self.PutChar['\\]; self.PutChar[char]

Control

    Close: PROC [self: STREAM, abort: BOOL _ FALSE]
The detailed effect of Close is stream class dependent, but one effect is common to all streams: it makes the stream unusable for further procedures. After Close, all procedures on self other than Flush, Reset, and Close raise ERROR IO.Error[$StreamClosed, self]. The abort parameter has the following meaning for file output streams: if abort, Close does not Flush the stream. If the underlying file system supports transactions, and self was created with a specific option, then Close with abort also aborts the current transaction.
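The input and output models described above can be made concrete with a small executable sketch. This is a hypothetical Python model, not the Cedar IO implementation: Python methods stand in for the generic procedures (get_char for GetChar, backup for Backup, and so on), integers 0..255 stand in for CHAR, and a small Text class stands in for REF TEXT. Only get_char, backup, and put_char are primitive. end_of, peek_char, get_block, put_block, and the default erase_char are written in terms of them, following the equivalences given for each procedure.

```python
class EndOfStream(Exception):
    """Stands in for IO.EndOfStream."""


NAT_LAST = 32767  # Cedar NAT.LAST, the default count for GetBlock/PutBlock


class Text:
    """Stands in for REF TEXT: max_length bytes of storage, length in use."""

    def __init__(self, max_length: int) -> None:
        self.chars = bytearray(max_length)
        self.length = 0

    @property
    def max_length(self) -> int:
        return len(self.chars)


class InputStream:
    """Input-sequence model: get_char takes from the front, backup pushes."""

    def __init__(self, data: bytes) -> None:
        self._seq = list(data)  # the input sequence, next byte first

    def get_char(self) -> int:
        if not self._seq:
            raise EndOfStream()
        return self._seq.pop(0)

    def backup(self, char: int) -> None:
        self._seq.insert(0, char)

    def end_of(self) -> bool:
        # TRUE iff get_char would raise EndOfStream; the sequence is unchanged
        try:
            c = self.get_char()
        except EndOfStream:
            return True
        self.backup(c)
        return False

    def peek_char(self) -> int:
        c = self.get_char()
        self.backup(c)
        return c

    def get_block(self, block: Text, start_index: int = 0,
                  count: int = NAT_LAST) -> int:
        # clamp count to the room left in block, as GetBlock does
        count = max(min(count, block.max_length - start_index), 0)
        n = 0
        while n < count:
            try:
                block.chars[start_index + n] = self.get_char()
            except EndOfStream:
                break
            n += 1
        block.length = start_index + n
        return n


class OutputStream:
    """Output model: put_char appends to an output sequence (a bytearray)."""

    def __init__(self) -> None:
        self.sequence = bytearray()

    def put_char(self, char: int) -> None:
        self.sequence.append(char)

    def put_block(self, block: Text, start_index: int = 0,
                  count: int = NAT_LAST) -> None:
        # write chars[start_index .. MIN[start_index+count, block.length])
        stop = min(start_index + count, block.length)
        for i in range(start_index, stop):
            self.put_char(block.chars[i])

    def erase_char(self, char: int) -> None:
        # default for a stream whose PutChar cannot be undone
        self.put_char(ord("\\"))
        self.put_char(char)
```

Because end_of and peek_char are built only from get_char and backup, any class supplying those two primitives gets the derived procedures for free. This is in the spirit of letting a stream class leave many procedures at their default values.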
    Reset: PROC [self: STREAM]
For input streams, discards all data from the input sequence that was generated at a time earlier than the call to Reset. For "real-time" input streams, such as keyboard and network, this means discarding the buffered input. For input streams whose input sequence is determined at stream creation time, discards the entire input sequence (so that self.EndOf[] = TRUE after the Reset). For output and input/output streams, Reset has some class-specific effect. For instance, Reset on a display stream might clear the display; Reset on a file output stream might reposition the stream to the beginning and truncate the file to zero length.

Other

General information

    GetInfo: PROC [self: STREAM] RETURNS [variety: StreamVariety, name: ATOM];
Returns the stream's variety (input, output, inputOutput) and its class name ($File, $ROPE, $TEXT, etc.).

Stream property list

A stream instance has an associated property list containing stream-specific information that is orthogonal to the stream's class. The property list is a public component of the stream; if s is a stream, s.propList is its property list. The usage of stream property lists is established by convention among the stream applications. For instance, the "print formatted" package uses the key $SetPFCode on the property list to associate a "print formatted context" with a stream, and file and viewer streams use the key $Name to store a rope, the name of the file or viewer connected to the stream. The following three procedures are provided by IOUtils as a convenience:

    IOUtils.StoreData: PROC [self: STREAM, key: ATOM, data: REF ANY]
Is equivalent to self.propList _ Atom.PutPropOnList[self.propList, key, data]

    IOUtils.LookupData: PROC [self: STREAM, key: ATOM] RETURNS [REF ANY]
Is equivalent to RETURN[Atom.GetPropFromList[self.propList, key]]

    IOUtils.RemoveData: PROC [self: STREAM, key: ATOM]
Is equivalent to self.propList _ Atom.RemPropFromList[self.propList, key]

3. Stream classes

See the FS interface documentation for basic information on files. FS.StreamOpen creates a stream, given the string name of a file; FS.StreamFromOpenFile creates a stream, given an FS.OpenFile. The FS interface documentation describes the detailed semantics of generic stream operations as they apply to files; we provide a summary here.

A file is a sequence of bytes (numbered from zero), and a stream on a file defines a stream index into this sequence. GetChar on an input stream raises EndOfStream when the stream index equals the file length, and otherwise returns the file byte at the stream index and advances the index by one byte. PutChar on an output stream extends the file by one byte when the stream index equals the file length, and otherwise replaces the file byte at the stream index; in either case it advances the index by one byte. An input/output file stream has a single stream index, so GetChar and PutChar interact.

Several input streams may be opened on a single file. It is also possible to open an output stream and several input streams on a single file (if this works out, we may get rid of input/output file streams).

The following operations apply to file streams (and streams for other file-like devices):

    GetIndex: PROC [self: STREAM] RETURNS [index: INT]
Returns the stream index.

    SetIndex: PROC [self: STREAM, index: INT]
Sets the stream index. Raises EndOfStream if index > number of bytes in the file; does not extend the file.

    GetLength: PROC [self: STREAM] RETURNS [length: INT];
Returns the number of bytes in the file.

    SetLength: PROC [self: STREAM, length: INT];
Sets the number of bytes in the file, then sets the stream index to MIN[stream index, length]. The contents of the file bytes IN [previous file length .. length) are undefined. Does not apply to input streams.

See the Rope interface documentation for basic information on ROPE.
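The stream-index behavior summarized above for files carries over to the ROPE streams declared next, since the ROPE input stream is described as behaving much like a file input stream. The following hypothetical Python sketch models an indexed input stream over immutable bytes, together with a ROS-like output stream that buffers its output sequence. The class and method names are illustrative stand-ins, not Cedar procedures, and rejecting negative indices is an assumption of the sketch.

```python
class EndOfStream(Exception):
    """Stands in for IO.EndOfStream."""


class StreamClosed(Exception):
    """Stands in for IO.Error[$StreamClosed, self]."""


class IndexedInputStream:
    """A file-like (or RIS-like) input stream: fixed data plus a stream index."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._index = 0

    def get_char(self) -> int:
        if self._index == len(self._data):
            raise EndOfStream()
        char = self._data[self._index]
        self._index += 1
        return char

    def get_index(self) -> int:
        return self._index

    def get_length(self) -> int:
        return len(self._data)

    def set_index(self, index: int) -> None:
        if index > len(self._data):
            raise EndOfStream()  # SetIndex never extends the data
        if index < 0:
            raise ValueError("negative index")  # assumption of this sketch
        self._index = index

    def reset(self) -> None:
        # for a ROPE input stream, Reset is equivalent to SetIndex[GetLength[]]
        self.set_index(self.get_length())

    def end_of(self) -> bool:
        return self._index == len(self._data)


class RopeOutputStream:
    """A ROS-like stream: buffers the output sequence until it is requested."""

    def __init__(self) -> None:
        self._chars = []  # one-character strings, in output order
        self._closed = False

    def put_char(self, char: str) -> None:
        if self._closed:
            raise StreamClosed()
        self._chars.append(char)

    def reset(self) -> None:
        self._chars.clear()  # Reset discards the output sequence

    def rope_from_ros(self, close: bool = True) -> str:
        rope = "".join(self._chars)
        if close:
            self._closed = True
        return rope
```

With close left at its default, rope_from_ros also closes the stream, so a later put_char fails. A client that wants to keep writing and collect the output more than once passes close=False.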
    RIS: PROC [rope: ROPE, oldStream: STREAM _ NIL] RETURNS [stream: STREAM];
The ROPE input stream behaves much like a file input stream, but gets characters from the input ROPE instead of from a file. GetIndex, SetIndex, and GetLength apply to a ROPE input stream. Reset is equivalent to self.SetIndex[self.GetLength[]]. If oldStream # NIL then RIS attempts to re-use it and avoid storage allocation expense.

    ROS: PROC [oldStream: STREAM _ NIL] RETURNS [stream: STREAM];
A ROPE is immutable, so it does not make sense to open an output stream on a particular ROPE. Instead, ROS creates an output stream that buffers the output sequence and makes it available to the client via RopeFromROS. Reset causes the output sequence to be discarded. GetLength and GetIndex apply to a ROPE output stream. If oldStream # NIL then ROS attempts to re-use it and avoid storage allocation expense.

It is a good idea to close a ROPE output stream when it is no longer of use. The default call to RopeFromROS takes care of this, so the client needs to pay special attention only when RopeFromROS is being called repeatedly on the same stream. The following operation applies to ROPE output streams:

    RopeFromROS: PROC [self: STREAM, close: BOOL _ TRUE] RETURNS [ROPE];
Returns the entire output sequence as a rope. If close, then calls self.Close[] before returning.

    TIS: PROC [text: REF READONLY TEXT, oldStream: STREAM _ NIL] RETURNS [stream: STREAM]
The TEXT input stream behaves much like a file input stream, but gets characters from the input REF TEXT instead of from a file. GetIndex, SetIndex, and GetLength apply to a TEXT input stream. Reset is equivalent to self.SetIndex[self.GetLength[]]. If oldStream # NIL then TIS attempts to re-use it and avoid storage allocation expense.

    TOS: PROC [text: REF TEXT _ NIL, oldStream: STREAM _ NIL] RETURNS [stream: STREAM];
If text = NIL, allocates a new TEXT for text; then sets text.length _ 0.
The TEXT output stream appends characters to text using RefText.AppendChar. Hence if text overflows, a larger text will be allocated, up to the limit on TEXT length (32k bytes). When this limit is reached, PutChar raises Error[$BufferOverflow, self]. The TEXT buffer is available at any time via TextFromTOS. Reset causes the output sequence, but not the buffer, to be discarded. GetLength and GetIndex apply to a TEXT output stream. If oldStream # NIL then TOS attempts to re-use it and avoid storage allocation expense.

The following operation applies to TEXT output streams:

    TextFromTOS: PROC [self: STREAM] RETURNS [REF TEXT]
Returns the entire output sequence as a REF TEXT. It does not close the stream, so it may be called repeatedly, but the same REF TEXT may be returned several times (and will be modified if PutChar is called).

(Soon to be made obsolete by Ed Taft's new FTP implementation: STP.CreateRemoteStream. Used by FS only.)

    noWhereStream: STREAM;
An output stream that simply discards its characters.

    noInputStream: STREAM;
An input stream such that GetChar raises EndOfStream, EndOf returns TRUE, and CharsAvail returns INT.LAST.

4. Convert

    Error: ERROR [reason: ErrorType, index: INT];
    ErrorType: TYPE = {
        syntax, overflow, empty, -- parse errors, index gives error location
        invalidBase, unprintableAtom -- print errors, index = 0
        };

    IntFromRope: PROC [r: ROPE, defaultBase: Base _ 10] RETURNS [INT];
    Base: TYPE = [2..36];
    CardFromRope: PROC [r: ROPE, defaultBase: Base _ 10] RETURNS [LONG CARDINAL];
    RealFromRope: PROC [r: ROPE] RETURNS [REAL];
    TimeFromRope: PROC [r: ROPE] RETURNS [BasicTime.GMT];
    UnpackedTimeFromRope: PROC [r: ROPE] RETURNS [BasicTime.Unpacked];
    BoolFromRope: PROC [r: ROPE] RETURNS [BOOL];
    AtomFromRope: PROC [r: ROPE] RETURNS [ATOM];

    CardFromDecimalLiteral: PROC [r: ROPE, start: INT _ 0] RETURNS [LONG CARDINAL];
    CardFromOctalLiteral: PROC [r: ROPE, start: INT _ 0] RETURNS [LONG CARDINAL];
    CardFromHexLiteral: PROC [r: ROPE, start: INT _ 0] RETURNS [LONG CARDINAL];
    CardFromWholeNumberLiteral: PROC [r: ROPE, start: INT _ 0] RETURNS [LONG CARDINAL];
    RealFromLiteral: PROC [r: ROPE, start: INT _ 0] RETURNS [REAL];
    RopeFromLiteral: PROC [r: ROPE, start: INT _ 0] RETURNS [ROPE];
    CharFromLiteral: PROC [r: ROPE, start: INT _ 0] RETURNS [CHAR];

    RopeFromInt: PROC [from: INT, base: Base _ 10, showRadix: BOOL _ TRUE] RETURNS [Rope.Text];
        ! Error[$invalidBase, 0]: base is statically invalid (not IN [2 .. 36]), or showRadix = TRUE and base is not 8, 10, or 16.
    RopeFromCard: PROC [from: LONG CARDINAL, base: Base _ 10, showRadix: BOOL _ TRUE] RETURNS [Rope.Text];
        ! Error[$invalidBase, 0]: base is statically invalid (not IN [2 .. 36]), or showRadix = TRUE and base is not 8, 10, or 16.
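The Convert error convention above pairs a reason with an index that locates a parse error. A small parser can illustrate it. The following Python sketch is hypothetical and is not the Convert implementation. It accepts an optional sign, digits, and an optional trailing radix letter in the style of Cedar whole-number literals (B for octal, D for decimal, H for hexadecimal). It assumes that a trailing B, D, or H is always a radix letter, and it checks the 32-bit range of a Cedar INT. ConvertError and int_from_rope are illustrative names.

```python
INT_FIRST, INT_LAST = -2**31, 2**31 - 1  # range of a 32-bit Cedar INT
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
RADIX_LETTERS = {"B": 8, "D": 10, "H": 16}  # assumed literal suffixes


class ConvertError(Exception):
    """Stands in for Convert.Error[reason, index]."""

    def __init__(self, reason: str, index: int) -> None:
        super().__init__(f"{reason} at index {index}")
        self.reason = reason
        self.index = index


def int_from_rope(r: str, default_base: int = 10) -> int:
    """Hypothetical IntFromRope-like parser (a sketch, not Convert's code)."""
    if not 2 <= default_base <= 36:
        raise ValueError("Base is [2..36]")
    if r == "":
        raise ConvertError("empty", 0)
    negative = r[0] == "-"
    pos = 1 if r[0] in "+-" else 0
    end, base = len(r), default_base
    # a trailing radix letter overrides default_base (assumption)
    if end - pos > 1 and r[-1].upper() in RADIX_LETTERS:
        base = RADIX_LETTERS[r[-1].upper()]
        end -= 1
    if pos == end:
        raise ConvertError("syntax", pos)  # a sign with no digits
    limit = INT_LAST + 1 if negative else INT_LAST
    value = 0
    for i in range(pos, end):
        digit = DIGITS.find(r[i].upper())
        if digit < 0 or digit >= base:
            raise ConvertError("syntax", i)  # index locates the bad char
        value = value * base + digit
        if value > limit:
            raise ConvertError("overflow", i)
    return -value if negative else value
```

Because the suffix check runs before digit validation, a rope such as "1B" is read as octal even when default_base is 16. How Convert itself resolves that ambiguity is not stated here.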
    RopeFromReal: PROC [from: REAL, precision: RealPrecision _ Real.DefaultSinglePrecision, useE: BOOL _ FALSE] RETURNS [Rope.Text];
    RopeFromTime: PROC [from: BasicTime.GMT, start: TimePrecision _ years, end: TimePrecision _ minutes, includeDayOfWeek: BOOL _ FALSE, useAMPM: BOOL _ TRUE, includeZone: BOOL _ TRUE] RETURNS [Rope.Text];
    TimePrecision: TYPE = { years, months, days, hours, minutes, seconds, unspecified };
    RopeFromUnpackedTime: PROC [from: BasicTime.Unpacked, start: TimePrecision _ years, end: TimePrecision _ minutes, includeDayOfWeek: BOOL _ FALSE, useAMPM: BOOL _ TRUE, includeZone: BOOL _ TRUE] RETURNS [Rope.Text];
    RopeFromBool: PROC [from: BOOL] RETURNS [Rope.Text];
    RopeFromAtom: PROC [from: ATOM, quote: BOOL _ TRUE] RETURNS [ROPE];
        ! Error[$unprintableAtom, 0]: quote = TRUE, and either from = NIL or from's print name is not a valid Cedar identifier.
(... is generated for from = NIL.) If quote, then inserts a leading quote ($) and checks that the print name is a valid Cedar identifier.
    RopeFromRope: PROC [from: ROPE, quote: BOOL _ TRUE] RETURNS [Rope.Text];
    RopeFromChar: PROC [from: CHAR, quote: BOOL _ TRUE] RETURNS [Rope.Text];
    AppendInt: PROC [to: REF TEXT, from: INT, base: Base _ 10, showRadix: BOOL _ TRUE] RETURNS [REF TEXT];
        ! Error[$invalidBase, 0]: base is statically invalid (not IN [2 .. 36]), or showRadix = TRUE and base is not 8, 10, or 16.
    AppendCard: PROC [to: REF TEXT, from: LONG CARDINAL, base: Base _ 10, showRadix: BOOL _ TRUE] RETURNS [REF TEXT];
        ! Error[$invalidBase, 0]: base is statically invalid (not IN [2 .. 36]), or showRadix = TRUE and base is not 8, 10, or 16.
    AppendReal: PROC [to: REF TEXT, from: REAL, precision: RealPrecision _ Real.DefaultSinglePrecision, useE: BOOL _ FALSE] RETURNS [REF TEXT];
    AppendTime: PROC [to: REF TEXT, from: BasicTime.GMT, start: TimePrecision _ years, end: TimePrecision _ minutes, includeDayOfWeek: BOOL _ FALSE, useAMPM: BOOL _ TRUE, includeZone: BOOL _ TRUE] RETURNS [REF TEXT];
    AppendUnpackedTime: PROC [to: REF TEXT, from: BasicTime.Unpacked, start: TimePrecision _ years, end: TimePrecision _ minutes, includeDayOfWeek: BOOL _ FALSE, useAMPM: BOOL _ TRUE, includeZone: BOOL _ TRUE] RETURNS [REF TEXT];
    AppendBool: PROC [to: REF TEXT, from: BOOL] RETURNS [REF TEXT];
    AppendAtom: PROC [to: REF TEXT, from: ATOM, quote: BOOL _ TRUE] RETURNS [REF TEXT];
        ! Error[$unprintableAtom, 0]: quote = TRUE, and either from = NIL or from's print name is not a valid Cedar identifier.
    AppendRope: PROC [to: REF TEXT, from: ROPE, quote: BOOL _ TRUE] RETURNS [REF TEXT];
    AppendChar: PROC [to: REF TEXT, from: CHAR, quote: BOOL _ TRUE] RETURNS [REF TEXT];

5. Printing (output conversion)

"This is an integer in a 5 position field: |   17|"
"This is an integer in a 5 position field: |00017|"
"This is an integer in a 5 position field: |17   |"

... TRUE -> 1, FALSE -> 0.
%b: Print number in octal, with trailing 'B.
%d: Print number in decimal.
%x: Print number in hex, with trailing 'H.
%e: Print number in scientific notation (mantissa and exponent). Analogous to FORTRAN E format.
%f: Print number in fixed-point notation. Analogous to FORTRAN F format.
... TRUE -> "TRUE", FALSE -> "FALSE". A value of type INT or LONG CARDINAL is converted to a ROPE in decimal notation, and a value of type REAL is converted to a ROPE in fixed-point notation. A value of type BasicTime.GMT is converted to a ROPE in standard date format (produced by Convert.AppendTime).
A value of type REF ANY is converted to a ROPE using AMIO.PrintRefAny (PrintProcs and AMTypes).
%g, %a, %c, %s: Print the string (four codes are assigned for historical reasons; %g is preferred).
%h: Print the string, but print each control character (char < 40C) as "^" followed by the corresponding letter (for instance, '\n prints as "^M").
%r: Print the number as a time interval in seconds, with format HH:MM:SS (exactly two digits, with zero fill, in the minutes and seconds fields; two or more digits in the hours field).
%t: Print the time in standard date format.
%l: For Viewer streams, change looks for subsequent output; for other streams, no effect. The rope is interpreted as a sequence of looks; a lower-case character means "add the look", an upper-case character means "remove the look", and space means "remove all looks". For instance, h.PutF["Make this %lbold%l for emphasis", IO.rope["b"], IO.rope["B"]] will print "Make this bold for emphasis" on a Viewer stream, with "bold" in bold.
%q: Print the literal representation of the rope, i.e. the form that would be coded in a program text to produce the given rope. For instance, the character CR in the rope is printed as backslash-n, and the character '" in the rope prints as backslash-". Does not print '" surrounding the rope, but these can be included in the format string if desired. For instance, IO.PutFR["%q", IO.rope["abc\ndef"]] = "abc\\ndef", while IO.PutFR["%g", IO.rope["abc\ndef"]] = "abc\ndef".

$PFFormatSyntaxError: the PFCodeProc cannot interpret the chars between '% and the conversion code.
$PFTypeMismatch: the PFCodeProc is not prepared for the supplied Value type.
$PFUnprintableValue: the PFCodeProc is not able to print the supplied value (for instance, overflow occurred in real -> int conversion).

$PFCantBindConversionProc: PutF could not find a PFCodeProc to call.
$PFFormatSyntaxError: in addition to the errors (described above) that are detected by a PFCodeProc, PutF detects errors such as too many or too few conversion specifications in the format string, and no conversion code following the initial '% of a conversion specification.

    IOUtils.PFProcs: TYPE = REF PFProcsRecord;
    PFProcsRecord: TYPE;

    IOUtils.CopyPFProcs: PROC [stream: STREAM] RETURNS [new: PFProcs];
    IOUtils.SetPFCodeProc: PROC [pfProcs: PFProcs, char: CHAR, codeProc: PFCodeProc] RETURNS [previous: PFCodeProc];
    IOUtils.SetPFProcs: PROC [stream: STREAM, pfProcs: PFProcs] RETURNS [previous: PFProcs];

    IOUtils.SetPFErrorProc: PROC [pfProcs: PFProcs, errorProc: PFErrorProc] RETURNS [previous: PFErrorProc];

    IOUtils.RegisterPrintRefAny: PROC [printRefAnyProc: PrintRefAnyProc];
    PrintRefAnyProc: TYPE = PROC [stream: STREAM, refAny: REF READONLY ANY, depth: INT, width: INT, verbose: BOOL];

    IOUtils.SetDefaultPFCodeProc: PROC [char: CHAR, codeProc: PFCodeProc] RETURNS [previous: PFCodeProc];
    IOUtils.SetDefaultPFErrorProc: PROC [errorProc: PFErrorProc] RETURNS [previous: PFErrorProc];

    Put: PROC [stream: STREAM, v1, v2, v3: Value _ [null[]]]
    PutR: PROC [v1, v2, v3: Value _ [null[]]] RETURNS [ROPE]
    PutL: PROC [stream: STREAM, list: LIST OF Value]
    PutLR: PROC [list: LIST OF Value] RETURNS [ROPE]

    atom: Convert.RopeFromAtom[from: atom.value, quote: FALSE]
    bool: Convert.RopeFromBool[bool.value]
    card: Convert.RopeFromCard[card.value]
    int: Convert.RopeFromInt[int.value]
    real: Convert.RopeFromReal[real.value]
    time: Convert.RopeFromTime[time.value]

(The implementation actually uses the "Append" variants of these Convert calls, to reduce allocations, but the effect is the same.)

6. Scanning

    buffer: REF TEXT = RefText.ObtainScratch[RefText.line];
    tokenKind: IO.TokenKind;
    DO
        tokenText: REF TEXT;
        [tokenKind: tokenKind, token: tokenText] _ IO.GetCedarToken[stream, buffer];
        SELECT tokenKind FROM
            tokenEOF => EXIT;
            tokenID => -- do something with tokenText -- ... ;
            ENDCASE;
        ENDLOOP;
    RefText.ReleaseScratch[buffer];

    GetCedarToken: PROC [stream: STREAM, buffer: REF TEXT, flushComments: BOOL _ TRUE] RETURNS [tokenKind: TokenKind, token: REF TEXT, charsSkipped: INT, error: TokenError];

    TokenKind: TYPE = {
        tokenERROR, -- the error result describes the scanning error
        tokenID, -- an identifier or reserved word
        tokenDECIMAL, -- a whole number literal expressed in decimal
        tokenOCTAL, -- a whole number literal expressed in octal
        tokenHEX, -- a whole number literal expressed in hexadecimal
        tokenREAL, -- a REAL literal
        tokenROPE, -- a ROPE, REF TEXT, or STRING literal
        tokenCHAR, -- a CHAR literal
        tokenATOM, -- an ATOM literal
        tokenSINGLE, -- a single-character token, such as "("
        tokenDOUBLE, -- a double-character token, such as "~="
        tokenCOMMENT, -- a comment
        tokenEOF -- the end-of-file marker
        };
    TokenError: TYPE = {
        none, -- no error
        extendedChar, -- error following backslash in char or string literal
        numericLiteral, charLiteral, stringLiteral, atomLiteral, -- error in parsing indicated type
        singleChar -- first non-whitespace char is not legal as first char of token
        };

    GetCedarTokenRope: PROC [stream: STREAM, flushComments: BOOL _ TRUE] RETURNS [tokenKind: TokenKind, token: ROPE, charsSkipped: INT];

    GetInt: PROC [stream: STREAM] RETURNS [INT];
    GetCard: PROC [stream: STREAM] RETURNS [LONG CARDINAL];
    GetReal: PROC [stream: STREAM] RETURNS [REAL];
    GetBool: PROC [stream: STREAM] RETURNS [BOOL];
    GetAtom: PROC [stream: STREAM] RETURNS [ATOM];
    GetRopeLiteral: PROC [stream: STREAM] RETURNS [ROPE];
    GetCharLiteral: PROC [stream: STREAM] RETURNS [CHAR];
    GetID: PROC [stream: STREAM] RETURNS [ROPE];
    GetTime: PROC [stream: STREAM]
RETURNS [BasicTime.GMT]; GetUnpackedTime: PROC [stream: STREAM] RETURNS [BasicTime.Unpacked]; GetRefAny: PROC [stream: STREAM] RETURNS [REF ANY]; <> <> <> <<'( starts a list (LIST OF REF ANY), and ') terminates a list,>> <<'+ and '- are unary operators that may precede a numeric literal,>> <<', is ignored between elements of a list,>> <<'^ is always ignored.>> GetRefAnyLine: PROC [stream: STREAM] RETURNS [LIST OF REF ANY]; <> <> <> <> <> SkipWhitespace: PROC [stream: STREAM, flushComments: BOOL _ TRUE] RETURNS [charsSkipped: INT]; <> GetToken: PROC [stream: STREAM, breakProc: BreakProc _ TokenProc, buffer: REF TEXT] RETURNS [token: REF TEXT, charsSkipped: INT]; <> BreakProc: TYPE = PROC [char: CHAR] RETURNS [CharClass]; CharClass: TYPE = {break, sepr, other}; <> <> <> TokenProc: BreakProc; <> < sepr,>> <<'[, '], '(, '), '{, '}, '", '+, '-, '*, '/, '@, '_ => break,>> < other]};>> IDProc: BreakProc; <> < sepr,>> < other]};>> <> GetTokenRope: PROC [ stream: STREAM, breakProc: BreakProc _ TokenProc] RETURNS [token: ROPE, charsSkipped: INT]; <> <> GetLine: PROC [stream: STREAM, buffer: REF TEXT] RETURNS [line: REF TEXT]; <> <> GetLineRope: PROC [stream: STREAM] RETURNS [line: ROPE]; <> <> <> Several character constants from the Ascii interface are re-defined in the IO interface for convenience: BS: CHAR = Ascii.BS; -- '\b TAB: CHAR = Ascii.TAB; -- '\t LF: CHAR = Ascii.LF; -- '\l FF: CHAR = Ascii.FF; -- '\f CR: CHAR = Ascii.CR; -- '\n NUL: CHAR = Ascii.NUL; ControlA: CHAR = Ascii.ControlA; BEL: CHAR = Ascii.BEL; ControlX: CHAR = Ascii.ControlX; ESC: CHAR = Ascii.ESC; SP: CHAR = Ascii.SP; DEL: CHAR = Ascii.DEL; 7. 
Implementing a stream class Overview We represent a stream as a REF to a STREAMRecord: STREAMRecord: TYPE = RECORD [ streamProcs: REF StreamProcs, -- procedures for the stream class streamData: REF ANY, -- instance data, type is specific to the stream class propList: Atom.PropList, -- instance data, type is independent of the stream class backingStream: STREAM -- special instance data, used to implement layered streams ]; The streamProcs field of a stream points to an immutable record that can be shared by any number of stream instances. A typical stream class implementation builds this record during module initialization. In contrast, the streamData field of a stream points to a mutable record that is not shared across streams. A stream class implementation builds this record in its stream creation procedure. Many stream class implementations do not use the propList and backingStream fields of the STREAMRecord. Such a stream class implementation is a module of the form: DIRECTORY IO, XxxOps; XxxStreamImpl: CEDAR PROGRAM IMPORTS IO EXPORTS XxxOps = { Data: TYPE = RECORD [ -- Xxx-specific contents -- ]; DataHandle: TYPE = REF Data; XxxGetChar: PROC [self: STREAM] RETURNS [CHAR] = { selfData: DataHandle = NARROW[self.streamData]; c: CHAR _ -- Xxx-specific computation using selfData -- ; RETURN [c] }; -- ... other stream proc implementations -- Create: PUBLIC PROC [ -- Xxx-specific parameters -- ] RETURNS [IO.STREAM] = { data: DataHandle = NEW [Data _ [ -- Xxx-specific initialization -- ]]; RETURN [IO.CreateStream[procs, data]]; }; procs: REF IO.StreamProcs = IO.CreateStreamProcs [ variety: $input, class: $Xxx, getChar: XxxGetChar, -- ... other stream procs -- ]; }. Note that IO.CreateStreamProcs allocates and initializes storage for the StreamProcs record, and that IO.CreateStream allocates and initializes storage for the STREAMRecord. 
Also note that there is no need to monitor accesses to the global variable procs above, because it is constant after initialization and the record it points to is immutable.
A stream class implementation may vary from this template along several dimensions. We summarize them here and describe them more fully in later sections:
A stream class might use the backingStream field of the STREAMRecord. This is designed to encourage the implementation of stream classes that "inherit" most of their behavior from another class.
A stream class might use the default implementations provided by CreateStreamProcs. This is especially handy when a backing stream is used or when high performance is not required.
A stream class might implement non-generic stream procedures. There are two possible implementation schemes: one for a procedure that is specific to a single stream class, and another for a procedure that is implemented by several stream classes.
Backing stream
If s is a stream, s.backingStream is called its backing stream. Backing streams serve a specific purpose: to make it convenient to implement one stream class in terms of another. As mentioned above, CreateStreamProcs supplies default implementations of the generic stream procedures, even the most fundamental ones such as GetChar and EndOf. Each of these default implementations checks whether a backing stream is present, and if so "passes the buck" to it. This means that you can implement a stream class by implementing a few procedures, obtaining the rest from CreateStreamProcs, and supplying a backing stream to CreateStream (via its backingStream parameter, which the template above leaves defaulted to NIL). The backing stream of an input stream should be an input stream, and the backing stream of an output stream should be an output stream.
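The division of labor described above (one shared, immutable procedures record per class, one mutable data record per instance, and defaults that pass the buck to a backing stream) can be modeled outside Cedar. What follows is a minimal Python sketch, not Cedar and not the IO implementation: Stream, create_stream_procs, the operation names, the list-backed class, and the upper-casing class are all hypothetical, and Python exceptions stand in for IO.Error and end of stream.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Callable, Mapping, Optional


class NotImplementedForThisStream(Exception):
    """Plays the role of IO.Error[$NotImplementedForThisStream, self]."""


@dataclass
class Stream:
    """Analogue of STREAMRecord: shared procs, private data, optional backing stream."""
    procs: Mapping[str, Callable[..., Any]]
    data: Any = None
    backing_stream: Optional["Stream"] = None

    def call(self, op: str, *args: Any) -> Any:
        return self.procs[op](self, *args)


GENERIC_OPS = ("get_char", "end_of", "put_char", "flush", "reset")


def _pass_the_buck(op: str) -> Callable[..., Any]:
    """Default procedure: forward op to the backing stream, if there is one."""
    def default(self: Stream, *args: Any) -> Any:
        if self.backing_stream is None:
            raise NotImplementedForThisStream(op)
        return self.backing_stream.call(op, *args)
    return default


def create_stream_procs(**supplied: Callable[..., Any]) -> Mapping[str, Callable[..., Any]]:
    """Fill in a default for every generic op the class does not supply.

    The result is a read-only mapping, so one record can be shared safely
    by every instance of the class.
    """
    procs = {op: _pass_the_buck(op) for op in GENERIC_OPS}
    procs.update(supplied)
    return MappingProxyType(procs)


# Class 1: an input stream over a list of characters; each instance owns its list.
def _list_get_char(self: Stream) -> str:
    if not self.data:
        raise EOFError("end of stream")
    return self.data.pop(0)


LIST_PROCS = create_stream_procs(get_char=_list_get_char,
                                 end_of=lambda self: not self.data)


def create_list_stream(text: str) -> Stream:
    return Stream(LIST_PROCS, data=list(text))


# Class 2: upper-cases its backing stream. It supplies only get_char and
# inherits end_of (and every other op) from the pass-the-buck defaults.
UPPER_PROCS = create_stream_procs(
    get_char=lambda self: self.backing_stream.call("get_char").upper())


def create_upper_stream(backing: Stream) -> Stream:
    return Stream(UPPER_PROCS, backing_stream=backing)
```

Two list streams created this way share one procs mapping but have separate data lists, and an upper-casing stream layered on one of them answers end_of by asking its backing stream.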
CreateStreamProcs
CreateStreamProcs: PROC [
  variety: StreamVariety, class: ATOM,
  getChar: PROC [self: STREAM] RETURNS [CHAR] _ NIL,
  -- many others --
  ] RETURNS [REF StreamProcs];
DIRECTORY IO, MessageWindow, Rope;
MessageWindowStreamImpl: CEDAR PROGRAM
  IMPORTS IO, MessageWindow, Rope
  = {
  STREAM: TYPE = IO.STREAM;
  ROPE: TYPE = Rope.ROPE;
  MessageWindowStreamFlush: PROC [self: STREAM] = {
    -- get characters from backing stream, leaving it open --
    r: ROPE _ IO.RopeFromROS[self: self.backingStream, close: FALSE];
    -- change CRs to spaces (since the message window is only one line high!) --
    i: INT _ 0;
    WHILE (i _ Rope.Find[s1: r, s2: "\n", pos1: i]) # -1 DO
      r _ Rope.Replace[base: r, start: i, len: 1, with: " "];
      ENDLOOP;
    -- display the rope --
    MessageWindow.Append[message: r, clearFirst: TRUE];
    -- clear buffer in backing stream --
    self.backingStream.Reset[];
    };
  MessageWindowStreamReset: PROC [self: STREAM] = {
    self.backingStream.Reset[];
    MessageWindow.Clear[];
    };
  MessageWindowStreamProcs: REF IO.StreamProcs = IO.CreateStreamProcs[
    variety: output, class: $MessageWindow,
    flush: MessageWindowStreamFlush,
    reset: MessageWindowStreamReset];
  CreateMessageWindowStream: PROC RETURNS [STREAM] = {
    RETURN[IO.CreateStream[
      streamProcs: MessageWindowStreamProcs, streamData: NIL,
      backingStream: IO.ROS[]]];
    };
  }.
Implementing non-generic stream procedures
Non-generic stream procedures fall into two kinds. The first kind is exemplified by IO.RopeFromROS, which applies only to rope output streams (created by IO.ROS) and returns a rope containing the bytes sent to the stream. RopeFromROS simply checks that the stream it has been passed is a rope output stream, computes the result rope, and returns it. The second kind is exemplified by IO.GetLength, which applies to any "file like" stream, including file streams, rope streams, and ref text streams, but does not apply to all streams.
Rather than have the GetLength procedure "know" about all of these stream classes, GetLength searches the StreamProcs property list of the stream it has been passed for a $GetLength procedure, then calls it; if the class has no $GetLength procedure, GetLength repeats the search on the stream's backing stream, if there is one. The $GetLength procedure for a class is added to the property list by CreateStreamProcs when the class is created. If CreateStreamProcs did not take a getLength parameter, it would still be possible to attach a GetLength procedure to a StreamProcs property list using IOUtils.StoreProc. The module below, a simplified version of the module that implements rope output streams, illustrates these ideas.
DIRECTORY IO, IOUtils, Rope, RuntimeError;
RopeOutputStreamImpl: CEDAR PROGRAM
  IMPORTS IO, IOUtils, RuntimeError
  EXPORTS IO
  = {
  STREAM: TYPE = IO.STREAM;
  ROPE: TYPE = Rope.ROPE;
  Data: TYPE = RECORD [ -- ROS-specific data -- ];
  DataHandle: TYPE = REF Data;
  OutputRopeStreamPutChar: PROC [self: STREAM, char: CHAR] = {
    data: DataHandle = NARROW[self.streamData];
    -- ROS-specific computation using data -- ;
    };
  -- ... other stream proc implementations --
  RopeFromROS: PUBLIC PROC [self: STREAM, close: BOOL] RETURNS [ROPE] = {
    data: DataHandle = NARROW[self.streamData
      ! RuntimeError.NarrowRefFault => ERROR IO.Error[$NotImplementedForThisStream, self]];
    RETURN[ -- ROS-specific computation using data -- ];
    };
  TypeOfGetLength: TYPE = PROC [self: STREAM] RETURNS [length: INT];
  GetLength: PUBLIC PROC [self: STREAM] RETURNS [length: INT] = {
    proc: REF ANY;
    DO
      IF self.streamProcs.class = $Closed THEN ERROR IO.Error[$StreamClosed, self];
      -- look for a class-specific $GetLength on the StreamProcs property list --
      proc _ IOUtils.LookupProc[self, $GetLength];
      IF proc # NIL THEN RETURN[(NARROW[proc, REF TypeOfGetLength])^[self]]
      -- otherwise pass the buck to the backing stream, if any --
      ELSE IF self.backingStream # NIL THEN self _ self.backingStream
      ELSE ERROR IO.Error[$NotImplementedForThisStream, self];
      ENDLOOP;
    };
  OutputRopeStreamGetLength: PROC [self: STREAM] RETURNS [length: INT] = {
    data: DataHandle = NARROW[self.streamData];
    RETURN[ -- ROS-specific computation using data -- ];
    };
  ROS: PUBLIC PROC [oldStream: STREAM] RETURNS [stream: IO.STREAM] = {
    data: DataHandle = NEW [Data _ [ -- ROS-specific initialization -- ]];
    RETURN [IO.CreateStream[procs, data]];
    };
  procs: REF IO.StreamProcs = IO.CreateStreamProcs [
    variety: $output, class: $ROPE,
    putChar: OutputRopeStreamPutChar,
    -- ... other stream procs --
    ];
  -- for example's sake, attach $GetLength proc using basic method --
  -- (would normally let CreateStreamProcs do this for GetLength) --
  IOUtils.StoreProc[streamProcs: procs, key: $GetLength,
    procRef: NEW[TypeOfGetLength _ OutputRopeStreamGetLength]];
  }.
We wish to emphasize that GetLength is implemented just once, regardless of how many classes implement a procedure like OutputRopeStreamGetLength. Note that GetLength has a higher per-call overhead than RopeFromROS, because each call performs an extra procedure call and a property list search.

8. Other issues
Ropes, files, and streams
A rope is an immutable byte sequence that lives in (and dies with) a virtual memory. A rope may be derived from an immutable file, but the file and the rope are still different objects.
A file is a (possibly mutable) byte sequence that lives outside of virtual memory. A file is accessed through an open file, which lives in virtual memory, but the file and the open file are still different objects.
An input stream can be a pointer into an existing byte sequence, such as a file or a rope. It can also deal with byte sequences that are not manifest in the way ropes and files are, e.g. a sequence of keystrokes. An output stream can be used to produce a rope or a file.
Concurrency
Individual streams are independent objects, so calls on different streams never need to be synchronized with one another. If two processes share a single stream, however, they must synchronize their accesses to that stream at a level above the stream procedures; individual stream procedures (PutChar, PutBlock, etc.) are not guaranteed to be atomic. A stream class may use a monitor (probably either a module monitor on the implementation or an object monitor on the instance data) to provide atomicity at this level, but it would do so to protect its own integrity, not to synchronize its callers.
A stream may be used to access a system resource that has its own concurrency control. For instance, FS provides locks that control concurrent access to local files, and Alpine maintains a separate locking facility for its files.
Efficiency
The basic byte-transmission procedures are GetChar for input streams and PutChar for output streams. In some circumstances the overhead of one procedure call per byte transmitted is not acceptable. In this case a client should use the corresponding block-transmission procedures GetBlock and PutBlock. These traffic in REF TEXT buffers, TEXT being a standard safe type for representing mutable byte sequences. (Note that TEXT is limited to 32k bytes.) Not all stream classes implement the block procedures more efficiently than the byte procedures, but several important ones do, including file streams and Pup streams.
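The cost difference between byte and block transfer can be made concrete by counting calls. The sketch below is Python, not Cedar, and only models the idea: io.BytesIO stands in for a stream, read(1) plays the role of GetChar, readinto with one reused bytearray plays the role of GetBlock with a single REF TEXT buffer, and the 4096-byte block size is an arbitrary assumption chosen to stay under the 32k TEXT limit.

```python
import io


def copy_by_char(src: io.BufferedIOBase, dst: io.BufferedIOBase) -> int:
    """Copy one byte per call, like a GetChar/PutChar loop.

    Returns the number of read calls, including the final one that hits EOF.
    """
    calls = 0
    while True:
        b = src.read(1)
        calls += 1
        if not b:
            return calls
        dst.write(b)


def copy_by_block(src: io.BufferedIOBase, dst: io.BufferedIOBase,
                  block_size: int = 4096) -> int:
    """Copy one block per call through a single reused buffer,
    like a GetBlock/PutBlock loop with one REF TEXT buffer.

    Returns the number of read calls, including the final one that hits EOF.
    """
    buf = bytearray(block_size)
    view = memoryview(buf)
    calls = 0
    while True:
        n = src.readinto(buf)
        calls += 1
        if not n:
            return calls
        dst.write(view[:n])  # write only the bytes actually read
```

For a 10,240-byte source, copy_by_char makes 10,241 read calls, while copy_by_block makes 4 (two full blocks, one partial block of 2,048 bytes, and one call that returns 0). Whether the saving matters in practice depends on how cheaply a given stream class implements its block procedures.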
An application that transmits non-text data (such as a MACHINE DEPENDENT record that must be stored on a file for later use) will find it more convenient to use UnsafeGetBlock and UnsafePutBlock, which traffic in LONG POINTERs and byte offsets. A common error in using these procedures is to pass the SIZE of a datum (a word count) rather than the SIZE times PrincOps.BytesPerWord (a byte count). Since UnsafeGetBlock stores bytes through an untyped LONG POINTER, it can do unbounded damage to the garbage collector's invariants if misused; inspect each use of UnsafeGetBlock carefully as you wrap it in a TRUSTED block.
A particular stream class may have its own efficiency considerations, which should be discussed in the documentation for the class.
IO procedures that depend upon AM facilities
Some procedures in the IO interface may call Cedar abstract machine (AM) interfaces. Some clients may wish to avoid this because it causes a circularity. Here are the IO procedures that may call AM:
Any procedure that takes a Value parameter, when the Value has variant refAny and the REF ANY that is passed does not NARROW to one of the basic types (ATOM, BOOL, CHAR, LONG CARDINAL, INT, REAL, ROPE, REF TEXT, BasicTime.GMT).
Known shortfalls in the interfaces, the implementation, and this document
This document should contain more information on the efficiency of the basic stream primitives when applied to commonly-used stream classes, and on the efficiency of the scanning and printing packages (Taft).
IOUtils.SetDefaultPFCodeProc and IOUtils.SetDefaultPFErrorProc do not exist (Spreitzer).
It seems advisable to define inlines for the unsafe block operations that take word counts rather than byte counts, because this is a better match to the language's type.SIZE operator (Taft).
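The word-count inlines suggested in the last item would multiply by the bytes per word before calling the byte-oriented procedure, which also removes the SIZE-versus-byte-count error described under Efficiency. Below is a minimal Python model of that idea, not Cedar: get_block_words is hypothetical, BYTES_PER_WORD = 2 assumes 16-bit PrincOps words, and the three-field big-endian record layout is invented for illustration.

```python
import io
import struct

# Assumption: PrincOps machines have 16-bit words, i.e. 2 bytes per word.
BYTES_PER_WORD = 2

# Hypothetical MACHINE DEPENDENT record of three 16-bit fields; the
# big-endian layout is chosen only for illustration.
RECORD = struct.Struct(">3H")
SIZE_IN_WORDS = RECORD.size // BYTES_PER_WORD  # plays the role of SIZE[Record]


def get_block_words(stream: io.BufferedIOBase, n_words: int) -> bytes:
    """Hypothetical word-count wrapper: convert words to bytes, then read."""
    n_bytes = n_words * BYTES_PER_WORD
    data = stream.read(n_bytes)
    if len(data) != n_bytes:
        raise EOFError(f"wanted {n_bytes} bytes, got {len(data)}")
    return data
```

Reading through get_block_words with SIZE_IN_WORDS recovers the whole 6-byte record, whereas passing SIZE_IN_WORDS directly as a byte count (the common error) reads only 3 of its 6 bytes.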