{Begin SubSec Strings} {Title Strings} {Text {index *PRIMARY* strings} A string is an object that represents a sequence of characters. Interlisp provides functions for creating strings, concatenating strings, and creating substrings of a string. The input syntax for a string is a double quote ({lisp "}),{index *PRIMARY* "} followed by a sequence of any characters except double quote and {lisp %}, terminated by a double quote. The {lisp %} and double quote characters may be included in a string by preceding them with the escape character {lisp %}. Strings are printed by {fn PRINT} and {fn PRIN2} with initial and final double quotes, and with {lisp %}s{index % (escape character)} inserted where necessary so that the string reads back in properly. Strings are printed by {fn PRIN1} without the delimiting double quotes and extra {lisp %}s. A "null string"{index null string} containing no characters is input as {lisp ""}. The null string{index null string} is printed by {fn PRINT} and {fn PRIN2} as {lisp ""}. {lisp (PRIN1 "")} does not print anything. Strings are created by {fn MKSTRING}, {fn ALLOCSTRING}, {fn SUBSTRING}, and {fn CONCAT}. Internally, a string is stored in two parts: a "string pointer"{index *PRIMARY* string pointer} and the sequence of characters. Several string pointers may reference the same character sequence, so a substring can be made by creating a new string pointer without copying any characters. It is not possible to access a character sequence directly, so functions that refer to "strings" actually manipulate string pointers. In most cases, the user does not have to be aware of string pointers, but there are some situations where it is important to understand them. For example, suppose that {arg X} is a string pointer to a sequence of characters, and {arg Y} is another string pointer to a substring of {arg X}'s characters. 
If the characters of {arg Y} are modified (with {fn RPLSTRING} or {fn RPLCHARCODE}), the corresponding characters of {arg X} are modified too. {note almost all of the string functions have special behavior wrt string ptrs vs character sequences. All of this should be explained better.} {note max length of string? --> big (many K); don't need to document.} {index *BEGIN* string functions} {FnDef {FnName STREQUAL} {FnArgs X Y} {Text Returns {lisp T} if {arg X} and {arg Y} are both strings and they contain the same sequence of characters; otherwise returns {lisp NIL}. {fn EQUAL} uses {fn STREQUAL}. Note that strings may be {fn STREQUAL} without being {fn EQ}. For instance, {lispcode (STREQUAL "ABC" "ABC") => T} {lispcode (EQ "ABC" "ABC") => NIL} {fn STREQUAL} returns {lisp T} if {arg X} and {arg Y} are the same string pointer, two different string pointers that point to the same character sequence, or two string pointers that point to different character sequences containing the same characters. Only in the first case are {arg X} and {arg Y} {fn EQ}. }} {FnDef {FnName ALLOCSTRING} {FnArgs N INITCHAR OLD} {Text Creates and returns a string of {arg N} characters, each of which is {arg INITCHAR} (either a character code or something coercible to a character). If {arg INITCHAR} is {lisp NIL}, it defaults to character code 0. If {arg OLD} is supplied, it must be a string pointer, which is reused rather than allocating a new one. }} {FnDef {FnName MKSTRING} {FnArgs X FLG RDTBL} {Text If {arg X} is a string, returns {arg X}. Otherwise, creates and returns a string containing the print name of {arg X}. Examples: {lispcode (MKSTRING "ABC") => "ABC"} {lispcode (MKSTRING '(A B C)) => "(A B C)"} {lispcode (MKSTRING NIL) => "NIL"} Note that the last example returns the string {lisp "NIL"}, not the atom {atom NIL}. If {arg FLG} is {lisp T}, then the {fn PRIN2}-name of {arg X} is used, computed with respect to the readtable {arg RDTBL}. 
For example, {lispcode (MKSTRING "ABC" T) => "%"ABC%""} }} {Begin Note} strings can be handed to PRINT and READ ---lmm --- PRINT does *not* accept a string as a file argument -- JonL 8/10/83 --- OPENFILE using a string as the "file" argument does quite different things depending on whether it is for INPUT or OUTPUT; the former seems to treat it as though it were a LITATOM (and hence a file **name**), whereas the latter seems always to generate an error. -- JonL 8/10/83 {End Note} {FnDef {FnName SUBSTRING} {FnArgs X N M OLDPTR} {Text Returns the substring of {arg X} consisting of the {arg N}th through {arg M}th characters of {arg X}. If {arg M} is {lisp NIL}, the substring contains the {arg N}th character through the end of {arg X}. {arg N} and {arg M} can be negative numbers, which are interpreted as counts back from the end of the string, as with {fn NTHCHAR} ({PageRef Fn NTHCHAR}). {fn SUBSTRING} returns {lisp NIL} if the substring is not well defined, e.g., if {arg N} or {arg M} specifies a character position outside of {arg X}, or if {arg N} corresponds to a character in {arg X} to the right of the character indicated by {arg M}. Examples: {lispcode (SUBSTRING "ABCDEFG" 4 6) => "DEF"} {lispcode (SUBSTRING "ABCDEFG" 3 3) => "C"} {lispcode (SUBSTRING "ABCDEFG" 3 NIL) => "CDEFG"} {lispcode (SUBSTRING "ABCDEFG" 4 -2) => "DEF"} {lispcode (SUBSTRING "ABCDEFG" 6 4) => NIL} {lispcode (SUBSTRING "ABCDEFG" 4 9) => NIL} If {arg X} is not a string, it is converted to one. For example, {lispcode (SUBSTRING '(A B C) 4 6) => "B C"} {fn SUBSTRING} does not actually copy any characters; it simply creates a new string pointer to the characters in {arg X}. If {arg OLDPTR} is a string pointer, it is modified and returned. {note {fn SUBSTRING} does not have to actually make the string if {arg X} is a litatom.} }} {FnDef {FnName GNC} {FnArgs X} {Text "Get Next Character." 
Returns the next character of the string {arg X} (as an atom), and removes that character from the string by advancing the string pointer. Returns {lisp NIL} if {arg X} is the null string.{index null string} If {arg X} is not a string, it is converted to one. {fn GNC} is used for sequential access to the characters of a string. Example: {lispcode ←(SETQ FOO "ABCDEFG") "ABCDEFG" ←(GNC FOO) A ←(GNC FOO) B ←FOO "CDEFG"} Note that if {lisp A} is a substring of {lisp B}, {lisp (GNC A)} does not remove the character from {lisp B}: {fn GNC} does not physically change the sequence of characters, only the string pointer.{index string pointers} {note would you ever want to use this with a non-string, since you can't grab the string again to get more than the first char? no! bad feature --lmm} }} {FnDef {FnName GLC} {FnArgs X} {Text "Get Last Character." Returns the last character of the string {arg X} (as an atom), and removes that character from the string. Similar to {fn GNC}. Example: {lispcode ←(SETQ FOO "ABCDEFG") "ABCDEFG" ←(GLC FOO) G ←(GLC FOO) F ←FOO "ABCDE"} }} {FnDef {FnName CONCAT} {FnArgs X{SUB 1} X{SUB 2} {ellipsis} X{SUB N}} {Type NOSPREAD} {Text Returns a new string that is the concatenation of (copies of) its arguments. Arguments that are not strings are converted to strings. Examples: {lispcode (CONCAT "ABC" 'DEF "GHI") => "ABCDEFGHI"} {lispcode (CONCAT '(A B C) "ABC") => "(A B C)ABC"} {lisp (CONCAT)} returns the null string, {lisp ""}.{index null string} {note copies all strings} }} {FnDef {FnName CONCATLIST} {FnArgs X} {Text {arg X} is a list of strings and/or other objects; elements that are not strings are converted to strings. Returns a new string that is the concatenation of the resulting strings. Example: {lispcode (CONCATLIST '(A B (C D) "EF")) => "AB(C D)EF"} }} {FnDef {FnName RPLSTRING} {FnArgs X N Y} {Text Replaces the characters of string {arg X} beginning at character position {arg N} with string {arg Y}. 
{arg X} and {arg Y} are converted to strings if they are not strings already. {arg N} may be positive or negative, as with {fn SUBSTRING}. The characters of {arg Y} are smashed into the (converted) {arg X}, and the string {arg X} is returned. Examples: {lispcode (RPLSTRING "ABCDEF" -3 "END") => "ABCEND"} {lispcode (RPLSTRING "ABCDEFGHIJK" 4 '(A B C)) => "ABC(A B C)K"} Generates an error if there is not enough room in {arg X} for {arg Y}, i.e., if the new string would be longer than the original. If {arg Y} was not a string, {arg X} will already have been modified when the error occurs, since {fn RPLSTRING} does not know whether {arg Y} will "fit" without actually attempting the transfer. Note that if {arg X} is a substring of {lisp Z}, {lisp Z} will also be modified by the action of {index RPLSTRING FN}{fn RPLSTRING}. Example: {lispcode ← (SETQ FOO "ABCDEFG") "ABCDEFG" ← (SETQ BAR (SUBSTRING FOO 4 6)) "DEF" ← (RPLSTRING BAR 2 "XY") "DXY" ← FOO "ABCDXYG"} {note if not enough room in X for Y, error is just "ILLEGAL ARG". There should be a better error msg} {note also generates an error if N is out of bounds of X} {note in Interlisp-D, X is not modified if Y is a non-string that is too large} }} {note should RPLCHARCODE be moved near NTHCHARCODE ??} {FnDef {FnName RPLCHARCODE} {FnArgs X N CHARCODE} {Text Replaces the {arg N}th character of the string {arg X} with the character whose code is {arg CHARCODE}. {arg N} may be positive or negative. Returns the modified {arg X}. Similar to {fn RPLSTRING}. Example: {lispcode (RPLCHARCODE "ABCDE" 3 (CHARCODE F)) => "ABFDE"} }} {index *BEGIN* searching strings} {FnDef {FnName STRPOS} {FnArgs PAT STRING START SKIP ANCHOR TAIL} {Text {fn STRPOS} searches one string for another. {arg PAT} and {arg STRING} are both strings (or else they are converted automatically). {fn STRPOS} searches {arg STRING}, beginning at character number {arg START} (or 1 if {arg START} is {lisp NIL}), for a sequence of characters equal to {arg PAT}. 
If a match is found, the character position of the first matching character in {arg STRING} is returned; otherwise, {lisp NIL} is returned. Examples: {lispcode (STRPOS "ABC" "XYZABCDEF") => 4} {lispcode (STRPOS "ABC" "XYZABCDEF" 5) => NIL} {lispcode (STRPOS "ABC" "XYZABCDEFABC" 5) => 10} {arg SKIP} can be used to specify a character in {arg PAT} that matches any character in {arg STRING}. Examples: {lispcode (STRPOS "A&C&" "XYZABCDEF" NIL '&) => 4} {lispcode (STRPOS "DEF&" "XYZABCDEF" NIL '&) => NIL} If {arg ANCHOR} is {lisp T}, {fn STRPOS} compares {arg PAT} only with the characters beginning at position {arg START} (or 1 if {arg START} is {lisp NIL}). If that comparison fails, {fn STRPOS} returns {lisp NIL} without searching any further down {arg STRING}. Thus {arg ANCHOR} can be used to compare one string with some {it portion} of another string. Examples: {lispcode (STRPOS "ABC" "XYZABCDEF" NIL NIL T) => NIL} {lispcode (STRPOS "ABC" "XYZABCDEF" 4 NIL T) => 4} Finally, if {arg TAIL} is {lisp T}, the value returned by {index STRPOS FN}{fn STRPOS} on success is not the starting position of the characters matching {arg PAT}, but the position of the first character after them, i.e., the starting position plus {lisp (NCHARS {arg PAT})}. Examples: {lispcode (STRPOS "ABC" "XYZABCDEFABC" NIL NIL NIL T) => 7} {lispcode (STRPOS "A" "A" NIL NIL NIL T) => 2} If {arg TAIL}={lisp NIL}, {fn STRPOS} returns either {lisp NIL} or a character position within {arg STRING}, which can be passed to {fn SUBSTRING}. In particular, {lisp (STRPOS "" "") => NIL}. However, if {arg TAIL}={lisp T}, {fn STRPOS} may return a character position outside of {arg STRING}. For instance, the second example above returns 2, even though {lisp "A"} has only one character. }} {FnDef {FnName STRPOSL} {FnArgs A STR START NEG} {Text {arg STR} is a string (or else it is converted automatically to a string), and {arg A} is a list of characters or character codes. 
{fn STRPOSL} searches {arg STR}, beginning at character number {arg START} (or 1 if {arg START}={lisp NIL}), for one of the characters in {arg A}. If one is found, {fn STRPOSL} returns its character position; otherwise, it returns {lisp NIL}. Example: {lispcode (STRPOSL '(A B C) "XYZBCD") => 4} If {arg NEG}={lisp T}, {fn STRPOSL} searches for a character {it not} in {arg A}. Example: {lispcode (STRPOSL '(A B C) "ABCDEF" NIL T) => 4} If any element of {arg A} is a number, it is assumed to be a character code. Otherwise, it is converted to a character code via {fn CHCON1}. Therefore, it is more efficient to call {fn STRPOSL} with {arg A} a list of character {it codes}. If {arg A} is a bit table, it is used to specify the characters (see {fn MAKEBITTABLE} below). }} {fn STRPOSL} uses a "bit table"{index *PRIMARY* bit tables} data structure to search efficiently. If {arg A} is not a bit table, it is converted to one using {fn MAKEBITTABLE}. If {fn STRPOSL} is to be called frequently with the same list of characters, considerable savings can be achieved by converting the list to a bit table {it once} and then passing the bit table to {fn STRPOSL} as its first argument. {FnDef {FnName MAKEBITTABLE} {FnArgs L NEG A} {Text Returns a bit table suitable for use by {fn STRPOSL}. {arg L} is a list of characters or character codes, and {arg NEG} has the same meaning as for {fn STRPOSL}. If {arg A} is a bit table, {fn MAKEBITTABLE} modifies and returns it; otherwise, it creates and returns a new bit table. }} Note: if {arg NEG}={lisp T}, {fn STRPOSL} must call {fn MAKEBITTABLE} whether {arg A} is a list {it or} a bit table. To obtain bit table efficiency with {arg NEG}={lisp T}, call {fn MAKEBITTABLE} with {arg NEG}={lisp T} and give the resulting "inverted" bit table to {fn STRPOSL} with {arg NEG}={lisp NIL}. {note in Interlisp-10, a bit table is just a regular array. In Interlisp-D, a bit table is a CHARTABLE data type. 
Therefore, all references to arrays have been removed from this doc. There is no reason for users to know anything about the implementation of bit tables, right?} {index *END* searching strings} {index *END* string functions} }{End subsec Strings}
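The string-pointer semantics described in the Strings subsection above can be illustrated with a small model. The sketch below is written in Python rather than Interlisp, and everything in it is hypothetical. The names {lisp CharSeq}, {lisp StrPtr}, {lisp mkstring}, {lisp substring}, {lisp rplstring}, {lisp gnc} and {lisp strequal} are illustrative stand-ins. The model assumes that a string pointer is simply an offset and a length into a shared, mutable sequence of characters. This is not necessarily how any Interlisp implementation lays out strings. The model only shows four things:

- why {fn SUBSTRING} copies nothing;
- why {fn RPLSTRING} on a substring is visible through the original string;
- why {fn GNC} affects only its own pointer;
- why two strings can be {fn STREQUAL} without being {fn EQ}.

```python
# Hypothetical Python model of Interlisp string pointers. It is NOT the real
# Interlisp representation; it only illustrates the sharing semantics.


class CharSeq:
    # A mutable character sequence that several string pointers may share.
    def __init__(self, text):
        self.chars = list(text)


class StrPtr:
    # A string pointer: a shared CharSeq plus a 0-based offset and a length.
    def __init__(self, seq, offset, length):
        self.seq = seq
        self.offset = offset
        self.length = length

    def text(self):
        return "".join(self.seq.chars[self.offset:self.offset + self.length])


def mkstring(text):
    # A fresh pointer to a fresh character sequence.
    return StrPtr(CharSeq(text), 0, len(text))


def _index(ptr, n):
    # Map a 1-based position (negative counts back from the end) to a
    # 0-based index relative to PTR, or None if it lies outside PTR.
    if n < 0:
        n = ptr.length + n + 1
    if 1 <= n <= ptr.length:
        return n - 1
    return None


def substring(ptr, n, m=None):
    # New pointer into the SAME character sequence; nothing is copied.
    i = _index(ptr, n)
    j = _index(ptr, ptr.length if m is None else m)
    if i is None or j is None or i > j:
        return None
    return StrPtr(ptr.seq, ptr.offset + i, j - i + 1)


def rplstring(ptr, n, new):
    # Smash the characters of NEW into the shared sequence starting at N.
    i = _index(ptr, n)
    if i is None or i + len(new) > ptr.length:
        raise ValueError("not enough room")  # Interlisp generates an error here
    for k, ch in enumerate(new):
        ptr.seq.chars[ptr.offset + i + k] = ch
    return ptr


def gnc(ptr):
    # Return the first character and advance THIS pointer only.
    if ptr.length == 0:
        return None
    ch = ptr.seq.chars[ptr.offset]
    ptr.offset += 1
    ptr.length -= 1
    return ch


def strequal(x, y):
    # Same characters, whether or not the pointers or sequences are shared.
    return x.text() == y.text()


if __name__ == "__main__":
    foo = mkstring("ABCDEFG")
    bar = substring(foo, 4, 6)   # "DEF", sharing FOO's characters
    rplstring(bar, 2, "XY")
    print(bar.text())            # DXY
    print(foo.text())            # ABCDXYG
```

In this model, {lisp substring} returns a new pointer over the same character list. Writing through one pointer therefore changes what every other pointer into that sequence sees, while {lisp gnc} only adjusts the offset and length of the pointer it is given. The sketch deliberately omits the argument conversions, the {arg OLDPTR} reuse, and the exact error behavior described for the real functions.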