{Begin SubSec Random Access File Operations} {Title Random Access File Operations} {Text {index *PRIMARY* Randomly accessible files} {Tag RandomIO} For most applications, files are read starting at their beginning and proceeding sequentially, i.e., the next character read is the one immediately following the last character read. Similarly, files are written sequentially. However, for files on some devices, it is also possible to read/write characters at arbitrary positions in a file, essentially treating the file as a large block of auxiliary storage. For example, one application might involve writing an expression at the {it beginning} of the file, and then reading an expression from a specified point in its {it middle}. This particular example requires the file be open for {it both} input and output. However, random file input or output can also be performed on files that have been opened for only input or only output. {index *PRIMARY* File pointers} Associated with each file is a "file pointer" that points to the location where the next character is to be read from or written to. The file position of a byte is the number of bytes that precede it in the file, i.e., 0 is the position of the beginning of the file. The file pointer to a file is automatically advanced after each input or output operation. This section describes functions which can be used to {it reposition} the file pointer on those files that can be randomly accessed. A file used in this fashion is much like an array in that it has a certain number of addressable locations that characters can be put into or taken from. However, unlike arrays, files can be enlarged. For example, if the file pointer is positioned at the end of a file and anything is written, the file "grows." It is also possible to position the file pointer {it beyond} the end of file and then to write. (If the program attempts to {it read} beyond the end of file, an {lisp END OF FILE}{index END OF FILE Error} error occurs.) 
When the program writes beyond the end of file in this way, the file is enlarged, and a "hole" is created, which can later be written into. Note that this enlargement only takes place at the {it end} of a file; it is not possible to make more room in the middle of a file. In other words, if expression {lisp A} begins at position 1000, and expression {lisp B} at 1100, and the program attempts to overwrite {lisp A} with expression {lisp C}, whose printed representation is 200 bytes long, part of {lisp B} will be altered. Warning: File positions are always in terms of bytes, not characters. The user should thus be very careful about computing the space needed for an expression. In particular, {index NS characters}{index NS character I/O} NS characters may take multiple bytes (see {PageRef Term NS character I/O}). Also, the {index End-of-line character}end-of-line character (see {PageRef Term End-of-line character}) may be represented by a different number of characters in different implementations. Output functions may also introduce end-of-lines as a result of {fn LINELENGTH} considerations. Therefore {fn NCHARS} ({PageRef Fn NCHARS}) does {it not} specify how many bytes an expression takes to print, even ignoring line length considerations. {FnDef {FnName GETFILEPTR} {FnArgs FILE} {Text Returns the current position of the file pointer for {arg FILE}, i.e., the byte address at which the next input/output operation will commence. }} {FnDef {FnName SETFILEPTR} {FnArgs FILE ADR} {Text Sets the file pointer{index File pointers} for {arg FILE} to the position {arg ADR}; returns {arg ADR}. The special value {arg ADR}={lisp -1} is interpreted to mean the address of the end of file. Note: If a file is opened for output only, the end of file is initially zero, even if an old file by the same name had existed (see {fn OPENSTREAM}, {PageRef fn OPENSTREAM}).
If a file is opened for both input and output, the initial file pointer is the beginning of the file, but {lisp (SETFILEPTR {arg FILE} -1)} sets it to the end of the file. If the file had been opened in append mode by {lisp (OPENSTREAM {arg FILE} 'APPEND)}, the file pointer right after opening would be set to the end of the existing file, in which case a {fn SETFILEPTR} to position the file at the end would be unnecessary. }} {FnDef {FnName GETEOFPTR} {FnArgs FILE} {Text Returns the byte address of the end of file, i.e., the number of bytes in the file. Equivalent to performing {lisp (SETFILEPTR {arg FILE} -1)} and returning {lisp (GETFILEPTR {arg FILE})} except that it does not change the current file pointer. }} {FnDef {FnName RANDACCESSP} {FnArgs FILE} {Text Returns {arg FILE} if {arg FILE} is randomly accessible, {lisp NIL} otherwise. The file {lisp T} is not randomly accessible, nor are certain network file connections in Interlisp-D. {arg FILE} must be open or an error is generated, {lisp FILE NOT OPEN}{index FILE NOT OPEN Error}. }} {FnDef {FnName COPYBYTES} {FnArgs SRCFIL DSTFIL START END} {Text Copies bytes from {arg SRCFIL} to {arg DSTFIL}, starting from position {arg START} and up to but not including position {arg END}. Both {arg SRCFIL} and {arg DSTFIL} must be open. Returns {lisp T}. If {arg END}={lisp NIL}, {arg START} is interpreted as the number of bytes to copy (starting at the current position). If {arg START} is also {lisp NIL}, bytes are copied until the end of the file is reached. Warning: {fn COPYBYTES} does not take any account of multi-byte NS characters ({PageRef Term NS Characters}). {fn COPYCHARS} (below) should be used whenever copying information that might include NS characters. 
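For example, the three argument conventions might be used as sketched below. This fragment is illustrative only and is not taken from the system sources: SRC and DST are hypothetical variables, assumed to be bound to open streams, with SRC randomly accessible.

```lisp
(* SRC and DST are hypothetical variables bound to open streams)

(* START and END both given - copy the 100 bytes at positions 100 through 199 of SRC)
(COPYBYTES SRC DST 100 200)

(* END is NIL so START is a byte count - copy 50 bytes starting at the current position of SRC)
(COPYBYTES SRC DST 50)

(* START and END both NIL - copy from the current position to the end of SRC)
(COPYBYTES SRC DST)
```

In each case the positions and counts are in bytes, so the warning about NS characters applies.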
}} {FnDef {FnName COPYCHARS} {FnArgs SRCFIL DSTFIL START END} {Text Like {fn COPYBYTES} except that it copies {index NS characters}NS characters ({PageRef Term NS Characters}), and performs the proper conversion if the end-of-line conventions of {arg SRCFIL} and {arg DSTFIL} are not the same (see {PageRef Term End-of-line character}). {arg START} and {arg END} are interpreted the same as with {fn COPYBYTES}, i.e., as byte (not character) specifications in {arg SRCFIL}. The number of bytes actually output to {arg DSTFIL} might be more or less than the number of bytes specified by {arg START} and {arg END}, depending on what the end-of-line conventions are. In the case where the end-of-line conventions happen to be the same, {fn COPYCHARS} simply calls {fn COPYBYTES}. }} {index *PRIMARY* Searching files} {FnDef {FnName FILEPOS} {FnArgs PATTERN FILE START END SKIP TAIL CASEARRAY} {Text Analogous to {index STRPOS FN}{fn STRPOS} ({PageRef Fn STRPOS}), but searches a file rather than a string. {fn FILEPOS} searches {arg FILE} for the string {arg PATTERN}. Search begins at {arg START} (or the current position of the file pointer, if {arg START}={lisp NIL}), and goes to {arg END} (or the end of {arg FILE}, if {arg END}={lisp NIL}). Returns the address of the start of the match, or {lisp NIL} if not found. {arg SKIP} can be used to specify a character which matches any character in the file. If {arg TAIL} is {lisp T}, and the search is successful, the value is the address of the first character {it after} the sequence of characters corresponding to {arg PATTERN}, instead of the starting address of the sequence. In either case, the file is left so that the next i/o operation begins at the address returned as the value of {index FILEPOS FN}{fn FILEPOS}. {arg CASEARRAY} should be a "case array"{index Case arrays} that specifies that certain characters should be transformed to other characters before matching. Case arrays are returned by {fn CASEARRAY} or {fn SEPRCASE} below. 
{arg CASEARRAY}={lisp NIL} means no transformation will be performed. A case array is an implementation-dependent object that is logically an array of character codes with one entry for each possible character. {fn FILEPOS} maps each character in the file "through" {arg CASEARRAY} in the sense that each character code is transformed into the corresponding character code from {arg CASEARRAY} before matching. Thus if two characters map into the same value, they are treated as equivalent by {fn FILEPOS}. {fn CASEARRAY} and {fn SETCASEARRAY} provide an implementation-independent interface to case arrays. For example, to search without regard to upper and lower case differences, {arg CASEARRAY} would be a case array where all characters map to themselves, except for lower case characters, whose corresponding elements would be the upper case characters. To search for a delimited atom, one could use " {arg ATOM} " as the pattern, and specify a case array in which all of the break and separator characters mapped into the same code as space. }} For applications calling for extensive file searches, the function {fn FFILEPOS} is often faster than {fn FILEPOS}. {FnDef {FnName FFILEPOS} {FnArgs PATTERN FILE START END SKIP TAIL CASEARRAY} {Text Like {fn FILEPOS}, except much faster in most applications. {fn FFILEPOS} is an implementation of the Boyer-Moore fast string searching algorithm{index Boyer-Moore fast string searching algorithm}. This algorithm preprocesses the string being searched for and then scans through the file in steps usually equal to the length of the string. Thus, {fn FFILEPOS} speeds up roughly in proportion to the length of the string, e.g., a string of length 10 will be found twice as fast as a string of length 5 in the same position. Because of certain fixed overheads, it is generally better to use {fn FILEPOS} for short searches or short strings. 
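As an illustration of the arguments shared by {fn FILEPOS} and {fn FFILEPOS}, the following sketch searches a file with and without a case array. It is a minimal example under stated assumptions, not a definitive usage: STRM is a hypothetical variable assumed to be bound to an open, randomly accessible stream, FOLD is a hypothetical variable introduced here, and the pattern string is arbitrary.

```lisp
(* STRM is a hypothetical variable bound to an open randomly accessible stream)

(* Exact search from the beginning of the file - value is the byte position of the match or NIL)
(FFILEPOS "DEFINEQ" STRM 0)

(* Same search ignoring case - with TAIL = T the value is the position just after the match)
(FFILEPOS "DEFINEQ" STRM 0 NIL NIL T UPPERCASEARRAY)

(* Build a similar case array by hand for the letters a through z only)
(SETQ FOLD (CASEARRAY))
(for C from (CHARCODE a) to (CHARCODE z) do (SETCASEARRAY FOLD C (IDIFFERENCE C 32)))
(FILEPOS "DEFINEQ" STRM 0 NIL NIL NIL FOLD)
```

The pattern is written in upper case because the characters of the file are mapped through the case array before matching.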
}} {FnDef {Name CASEARRAY} {Args OLDARRAY} {Text Creates and returns a new case array, with all elements set to themselves, to indicate the identity mapping. If {arg OLDARRAY} is given, it is reused. }} {FnDef {Name SETCASEARRAY} {Args CASEARRAY FROMCODE TOCODE} {Text Modifies the case array {arg CASEARRAY} so that character code {arg FROMCODE} is mapped to character code {arg TOCODE}. }} {FnDef {Name GETCASEARRAY} {Args CASEARRAY FROMCODE} {Text Returns the character code that {arg FROMCODE} is mapped to in {arg CASEARRAY}. }} {FnDef {FnName SEPRCASE} {FnArgs CLFLG} {Text Returns a new case array suitable for use by {fn FILEPOS} or {fn FFILEPOS} in which all of the break/separators of {var FILERDTBL} are mapped into character code zero. If {arg CLFLG} is non-{lisp NIL}, then all CLISP characters are mapped into this character as well. This is useful for finding a delimited atom in a file. For example, if {arg PATTERN} is {lisp " FOO "}, and {lisp (SEPRCASE T)} is used for {arg CASEARRAY}, then {fn FILEPOS} will find {lisp "(FOO_"}. }} {VarDef {Name UPPERCASEARRAY} {Text Value is a case array in which every lowercase character is mapped into the corresponding uppercase character. Useful for searching text files. }} }{End SubSec Random Access File Operations} {Begin SubSec Input/Output Operations with Characters and Bytes} {Title Input/Output Operations with Characters and Bytes} {Text {index *PRIMARY* Character I/O} {index *PRIMARY* NS character I/O} Interlisp-D supports the 16-bit NS character set (see {PageRef Term NS characters}). All of the standard string and print name functions accept litatoms and strings containing NS characters. In almost all cases, a program does not have to distinguish between NS characters or 8-bit characters. The exception to this rule is the handling of input/output operations. {index Character sets} {index *PRIMARY* Run-encoding of NS characters} Interlisp-D uses two ways of writing 16-bit NS characters on files. 
One way is to write the full 16 bits (two bytes) every time a character is output. The other way is to use "run-encoding." Each 16-bit NS character can be decoded into a character set (an integer from 0 to 254 inclusive) and a character number (also an integer from 0 to 254 inclusive). In run-encoding, the byte 255 (illegal as either a character set number or a character number) is used to signal a change to a given character set, and the following bytes are all assumed to come from the same character set (until the next change-character set sequence). Run-encoding can reduce the number of bytes required to encode a string of NS characters, as long as there are long sequences of characters from the same character set (usually the case). {it Note that characters are not the same as bytes.} A single character can take anywhere from one to four bytes, depending on whether it is in the same character set as the preceding character, and whether run-encoding is enabled. Programs which assume that characters are equal to bytes must be changed to work with NS characters. The functions {fn BIN} ({PageRef Fn BIN}) and {fn BOUT} ({PageRef Fn BOUT}) should only be used to read and write single eight-bit bytes. The functions {fn READCCODE} ({PageRef Fn READCCODE}) and {fn PRINTCCODE} ({PageRef Fn PRINTCCODE}) should be used to read and write single character codes, interpreting run-encoded NS characters. {fn COPYBYTES} ({PageRef Fn COPYBYTES}) should only be used to copy blocks of 8-bit data; {fn COPYCHARS} should be used to copy characters. Most I/O functions ({fn READC}, {fn PRIN1}, etc.) read or write 16-bit NS characters. {index File pointers} The use of NS characters has serious consequences for any program that uses file pointers to access a file in a random access manner. At any point when a file is being read or written, it has a "current character set."
If the file pointer is changed with {fn SETFILEPTR} ({PageRef Fn SETFILEPTR}) to a part of the file with a different character set, any characters read or written may have the wrong character set. The current character set can be accessed with the following function: {FnDef {Name CHARSET} {Args STREAM CHARACTERSET} {Text Returns the current character set of the stream {arg STREAM}. If {arg CHARACTERSET} is non-{lisp NIL}, the current character set for {arg STREAM} is set. Note that for output streams this may cause bytes to be written to the stream. If {arg CHARACTERSET} is {lisp T}, run encoding for {arg STREAM} is disabled: both the character set and the character number (two bytes total) will be written to the stream for each character printed. }} }{End SubSec Input/Output Operations with Characters and Bytes}