{Begin Chapter Strings}
{Title Strings}
{Text

{index *PRIMARY* Strings}


A string is an object which represents a sequence of characters.  Interlisp provides functions for creating strings, concatenating strings, and creating sub-strings of a string.

The input syntax for a string is a double quote {index *PRIMARY* " (string delimiter)}({lisp "}), followed by a sequence of any characters except double quote and {lisp %}, terminated by a double quote.  The {lisp %} and double quote characters may be included in a string by preceding them with the character {lisp %}.

Strings are printed by {fn PRINT} and {fn PRIN2} with initial and final double quotes, and with {lisp %}s{index %} inserted where necessary for the string to read back in properly.  Strings are printed by {fn PRIN1} without the delimiting double quotes and extra {lisp %}s.


{index *PRIMARY* Null strings}

A "null string" containing no characters is input as {lisp ""}.  The null string is printed by {fn PRINT} and {fn PRIN2} as {lisp ""}.  {lisp (PRIN1 "")} doesn't print anything.


{index *PRIMARY* String pointers}

Internally a string is stored in two parts: a "string pointer" and the sequence of characters.  Several string pointers may reference the same character sequence, so a substring can be made by creating a new string pointer, without copying any characters.  Functions that refer to "strings" actually manipulate string pointers.  Some functions take an "old string" argument, and re-use that string pointer instead of creating a new one.
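
The two-part arrangement can be pictured with a small model.  The following Python sketch is only an illustration, not the Interlisp implementation: the names {lisp StringPointer}, {lisp make_string} and {lisp substring_pointer} are hypothetical, and the real layout of a string pointer is not specified here.  It shows a substring being made by creating a second pointer onto the same characters.

```python
from dataclasses import dataclass


@dataclass
class StringPointer:
    """Hypothetical model: a view (offset, length) onto shared characters."""
    chars: list      # the shared character sequence (never copied below)
    offset: int = 0  # index of the first character this pointer covers
    length: int = 0  # number of characters this pointer covers

    def text(self) -> str:
        return "".join(self.chars[self.offset:self.offset + self.length])


def make_string(s: str) -> StringPointer:
    """Create a fresh character sequence and a pointer covering all of it."""
    return StringPointer(list(s), 0, len(s))


def substring_pointer(p: StringPointer, n: int, m: int) -> StringPointer:
    """New pointer to characters n..m (1-based, inclusive), sharing p.chars."""
    return StringPointer(p.chars, p.offset + n - 1, m - n + 1)


whole = make_string("ABCDEFG")
part = substring_pointer(whole, 4, 6)
print(part.text())                # DEF
print(part.chars is whole.chars)  # True: no characters were copied
```

In this model two pointers are distinct objects even when they cover the same characters, which mirrors the distinction between {fn EQ} and {fn STREQUAL} described below.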

{note almost all of the string functions have special behavior wrt string ptrs vs character sequences.  All of this should be explained better.}



{note max length of string?  --> big (many K); don't need to document.}





{index *PRIMARY* STRINGP Fn}
{FnDef {FnName STRINGP} {FnArgs X}
{Text
Returns {arg X} if {arg X} is a string,{index strings} {lisp NIL} otherwise.
}}


{FnDef {FnName STREQUAL} {FnArgs X Y}
{Text
Returns {lisp T} if {arg X} and {arg Y} are both strings and they contain the same sequence of characters, otherwise {lisp NIL}.  {fn EQUAL} uses {fn STREQUAL}.  Note that strings may be {fn STREQUAL} without being {fn EQ}.  For instance,

{lispcode (STREQUAL "ABC" "ABC")  =>  T}

{lispcode (EQ "ABC" "ABC")  =>  NIL}

{fn STREQUAL} returns {lisp T} if {arg X} and {arg Y} are the same string pointer, or two different string pointers which point to the same character sequence, or two string pointers which point to different character sequences which contain the same characters.  Only in the first case would {arg X} and {arg Y} be {fn EQ}.
}}


{FnDef {FnName STRING-EQUAL} {FnArgs X Y}
{Text
Returns {lisp T} if {arg X} and {arg Y} are either strings or litatoms, and they contain the same sequence of characters, ignoring case.  For instance,

{lispcode (STRING-EQUAL "FOO" "Foo")  =>  T}

{lispcode (STRING-EQUAL "FOO" 'Foo)  =>  T}

This is useful for comparing names that should be considered "equal" even when they are not all litatoms or not in a consistent case, such as file names and user names.
}}


{FnDef {FnName ALLOCSTRING} {FnArgs N INITCHAR OLD FATFLG}
{Text
Creates a string of {arg N} characters, each initialized to {arg INITCHAR} (which can be either a character code or something coercible to a character).  If {arg INITCHAR} is {lisp NIL}, it defaults to character code 0.  If {arg OLD} is supplied, it must be a string pointer, which is modified and returned.

If {arg FATFLG} is non-{lisp NIL}, the string is allocated using full 16-bit {index NS Characters}NS characters (see {PageRef Term NS Characters}) instead of 8-bit characters.  This can speed up some string operations if NS characters are later inserted into the string.  This has no other effect on the operation of the string functions.
}}




{FnDef {FnName MKSTRING} {FnArgs X FLG RDTBL}
{Text
If {arg X} is a string, returns {arg X}.  Otherwise, creates and returns a string containing the print name of {arg X}.  Examples:

{lispcode (MKSTRING "ABC")  =>  "ABC"}

{lispcode (MKSTRING '(A B C))  =>  "(A B C)"}

{lispcode (MKSTRING NIL)  =>  "NIL"}

Note that the last example returns the string {lisp "NIL"}, not the atom {atom NIL}.


If {arg FLG} is {lisp T}, then the {fn PRIN2}-name of {arg X} is used, computed with respect to the readtable {arg RDTBL}.  For example,

{lispcode (MKSTRING "ABC" T)  =>  "%"ABC%""}
}}


{FnDef {FnName NCHARS} {FnArgs X FLG RDTBL}
{Text
Returns the number of characters in the print name of {arg X}.  If {arg FLG}={lisp T}, the {index PRIN2-names}{fn PRIN2}-name is used.  For example,

{lispcode (NCHARS 'ABC)  =>  3}

{lispcode (NCHARS "ABC" T)  =>  5}

Note:  {fn NCHARS} works most efficiently on litatoms and strings, but can be given any object.
}}
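
The difference between the print name and the {fn PRIN2}-name of a string can be sketched as follows.  This Python model is an assumption-laden illustration: {lisp prin1_name} and {lisp prin2_name} are hypothetical helper names, only strings are handled, and the escaping rule is the one stated at the start of this chapter (a {lisp %} before each double quote or {lisp %}), ignoring any effect of {arg RDTBL}.

```python
def prin1_name(s: str) -> str:
    """Print name of a string as PRIN1 shows it: just the characters."""
    return s


def prin2_name(s: str) -> str:
    """PRIN2-name sketch: delimiting double quotes, % before each " or %."""
    escaped = "".join("%" + c if c in '"%' else c for c in s)
    return '"' + escaped + '"'


print(len(prin1_name("ABC")))  # 3
print(len(prin2_name("ABC")))  # 5, the two delimiting quotes are counted
print(prin2_name('A"B'))       # "A%"B"
```

The second result corresponds to {lisp (NCHARS "ABC" T)} returning 5 in the example above.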


{FnDef {FnName SUBSTRING} {FnArgs X N M OLDPTR}
{Text
Returns the substring of {arg X} consisting of the {arg N}th through {arg M}th characters of {arg X}.  If {arg M} is {lisp NIL}, the substring contains the {arg N}th character through the end of {arg X}.  {arg N} and {arg M} can be negative numbers, which are interpreted as counts back from the end of the string, as with {fn NTHCHAR} ({PageRef Fn NTHCHAR}).  {fn SUBSTRING} returns {lisp NIL} if the substring is not well defined (e.g., {arg N} or {arg M} specifies a character position outside of {arg X}, or {arg N} corresponds to a character in {arg X} to the right of the character indicated by {arg M}).  Examples:

{lispcode (SUBSTRING "ABCDEFG" 4 6)  =>  "DEF"}

{lispcode (SUBSTRING "ABCDEFG" 3 3)  =>  "C"}

{lispcode (SUBSTRING "ABCDEFG" 3 NIL)  =>  "CDEFG"}

{lispcode (SUBSTRING "ABCDEFG" 4 -2)  =>  "DEF"}

{lispcode (SUBSTRING "ABCDEFG" 6 4)  =>  NIL}

{lispcode (SUBSTRING "ABCDEFG" 4 9)  =>  NIL}


If {arg X} is not a string, it is converted to one.  For example,

{lispcode (SUBSTRING '(A B C) 4 6)  =>  "B C"}


{fn SUBSTRING} does not actually copy any characters, but simply creates a new string pointer to the characters in {arg X}.   If {arg OLDPTR} is a string pointer, it is modified and returned.


{note {fn SUBSTRING} does not have to actually make the string if {arg X} is a litatom.}
}}
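
The position rules for {fn SUBSTRING} can be summarized in a short model.  The Python sketch below is an illustration under stated assumptions: {lisp resolve} and {lisp substring} are hypothetical names, {lisp None} stands for {lisp NIL}, position 0 is treated as outside the string (the text above does not say how it is handled), and the slice copies characters, which the real {fn SUBSTRING} does not do.

```python
from typing import Optional


def resolve(pos: int, length: int) -> Optional[int]:
    """Map a positive or negative position to 1..length, or None if outside."""
    if pos < 0:
        pos = length + 1 + pos  # -1 denotes the last character
    return pos if 1 <= pos <= length else None


def substring(x: str, n: int, m: Optional[int] = None) -> Optional[str]:
    """Characters n..m of x (inclusive), or None where the result is NIL."""
    start = resolve(n, len(x))
    end = len(x) if m is None else resolve(m, len(x))
    if start is None or end is None or start > end:
        return None  # position outside x, or n to the right of m
    return x[start - 1:end]


print(substring("ABCDEFG", 4, -2))  # DEF
print(substring("ABCDEFG", 6, 4))   # None
```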


{FnDef {FnName GNC} {FnArgs X}
{Text
"Get Next Character."  Returns the next character of the string {arg X} (as an atom); also removes the character from the string, by changing the string pointer.  Returns {lisp NIL} if {arg X} is the null string.  If {arg X} isn't a string, a string is made.  Used for sequential access to characters of a string.  Example:

{lispcode
←(SETQ FOO "ABCDEFG")
"ABCDEFG"
←(GNC FOO)
A
←(GNC FOO)
B
←FOO
"CDEFG"}

Note that if {lisp A} is a substring of {lisp B}, {lisp (GNC A)} does not remove the character from {lisp B}, because only the string pointer {lisp A} is changed.

{note would you ever want to use this with a non-string, since you can't grab the string again to get more than the first char?
no! bad feature --lmm}
}}


{FnDef {FnName GLC} {FnArgs X}
{Text
"Get Last Character."  Returns the last character of the string {arg X} (as an atom); also removes the character from the string.  Similar to {fn GNC}.  Example:

{lispcode
←(SETQ FOO "ABCDEFG")
"ABCDEFG"
←(GLC FOO)
G
←(GLC FOO)
F
←FOO
"ABCDE"}
}}
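
Because {fn GNC} and {fn GLC} work by changing a string pointer, they leave other pointers onto the same characters untouched.  The Python sketch below models this; it is not the Interlisp implementation.  The class {lisp StringPointer} and the functions {lisp gnc} and {lisp glc} are hypothetical, and returning {lisp None} from {lisp glc} on an empty string is an assumption made by analogy with {fn GNC}.

```python
class StringPointer:
    """Hypothetical model: offset and length into a shared character list."""

    def __init__(self, chars, offset, length):
        self.chars, self.offset, self.length = chars, offset, length

    def text(self):
        return "".join(self.chars[self.offset:self.offset + self.length])


def gnc(p):
    """Return the first character and advance the pointer; None if empty."""
    if p.length == 0:
        return None
    c = p.chars[p.offset]
    p.offset += 1
    p.length -= 1
    return c


def glc(p):
    """Return the last character and shorten the pointer; None if empty."""
    if p.length == 0:
        return None
    p.length -= 1
    return p.chars[p.offset + p.length]


buf = list("ABCDEFG")
b = StringPointer(buf, 0, 7)  # the whole string
a = StringPointer(buf, 0, 3)  # a substring "ABC" sharing the same characters
print(gnc(a), a.text(), b.text())  # A BC ABCDEFG
print(glc(b), b.text())            # G ABCDEF
```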


{FnDef {FnName CONCAT} {FnArgs X{SUB 1} X{SUB 2} {ellipsis} X{SUB N}}
{Type NOSPREAD}
{Text
Returns a new string which is the concatenation of (copies of) its arguments.  Any arguments which are not strings are transformed to strings.  Examples:

{lispcode (CONCAT "ABC" 'DEF "GHI")  =>  "ABCDEFGHI"}

{lispcode (CONCAT '(A B C) "ABC")  =>  "(A B C)ABC"}

{lisp (CONCAT)} returns the null string, {lisp ""}.

{note copies all strings}
}}


{FnDef {FnName CONCATLIST} {FnArgs L}
{Text
{arg L} is a list of strings and/or other objects.  The objects are transformed to strings if they aren't strings.  Returns a new string which is the concatenation of the strings.  Example:

{lispcode (CONCATLIST '(A B (C D) "EF"))  =>  "AB(C D)EF"}
}}



{FnDef {FnName RPLSTRING} {FnArgs X N Y}
{Text
Replaces the characters of string {arg X} beginning at character position {arg N} with string {arg Y}.  {arg X} and {arg Y} are converted to strings if they aren't already.  {arg N} may be positive or negative, as with {fn SUBSTRING}.  Characters are smashed into (converted) {arg X}.  Returns the string {arg X}.  Examples:

{lispcode (RPLSTRING "ABCDEF" -3 "END")  =>  "ABCEND"}

{lispcode (RPLSTRING "ABCDEFGHIJK" 4 '(A B C))  =>  "ABC(A B C)K"}

Generates an error if there is not enough room in {arg X} for {arg Y}, i.e., the new string would be longer than the original.  If {arg Y} was not a string, {arg X} will already have been modified since {fn RPLSTRING} does not know whether {arg Y} will "fit" without actually attempting the transfer.


Warning:  In some implementations of Interlisp, if {arg X} is a substring of {lisp Z}, {lisp Z} will also be modified by the action of {fn RPLSTRING} or {fn RPLCHARCODE}.  However, this is not guaranteed to be true in all cases, so programmers should not rely on {fn RPLSTRING} or {fn RPLCHARCODE} altering the characters of any string other than the one directly passed as argument to those functions.
}}
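
The "enough room" rule for {fn RPLSTRING} can be modelled as follows.  This Python sketch is an illustration only: {lisp rplstring} is a hypothetical name, {arg X} is represented as a mutable list of characters, and the room check is made before anything is changed, whereas the real function may already have modified {arg X} when it discovers that a non-string {arg Y} does not fit.

```python
def rplstring(x: list, n: int, y: str) -> list:
    """Overwrite characters of x in place, starting at position n."""
    start = n if n > 0 else len(x) + 1 + n  # negative n counts from the end
    if not 1 <= start <= len(x) or start - 1 + len(y) > len(x):
        raise ValueError("not enough room in x for y")
    x[start - 1:start - 1 + len(y)] = y  # same length, so x keeps its size
    return x


print("".join(rplstring(list("ABCDEF"), -3, "END")))           # ABCEND
print("".join(rplstring(list("ABCDEFGHIJK"), 4, "(A B C)")))   # ABC(A B C)K
```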



{note should RPLCHARCODE be moved near NTHCHARCODE ??}

{FnDef {FnName RPLCHARCODE} {FnArgs X N CHAR}
{Text
Replaces the {arg N}th character of the string {arg X} with the character code {arg CHAR}.  {arg N} may be positive or negative.  Returns the new {arg X}.  Similar to {fn RPLSTRING}.  Example:

{lispcode (RPLCHARCODE "ABCDE" 3 (CHARCODE F))  =>  "ABFDE"}
}}



{index *PRIMARY* Searching strings}

{FnDef {FnName STRPOS} {FnArgs PAT STRING START SKIP ANCHOR TAIL CASEARRAY BACKWARDSFLG}
{Text 
{fn STRPOS} searches one string for an occurrence of another.  {arg PAT} and {arg STRING} are both strings (or else they are converted automatically).  {fn STRPOS} searches {arg STRING} beginning at character number {arg START} (or 1 if {arg START} is {lisp NIL}) and looks for a sequence of characters equal to {arg PAT}.  If a match is found, the character position of the first matching character in {arg STRING} is returned, otherwise {lisp NIL}.  Examples:

{lispcode (STRPOS "ABC" "XYZABCDEF")  =>  4}

{lispcode (STRPOS "ABC" "XYZABCDEF" 5)  =>  NIL}

{lispcode (STRPOS "ABC" "XYZABCDEFABC" 5)  =>  10}

{arg SKIP} can be used to specify a character in {arg PAT} that matches any character in {arg STRING}.  Examples:

{lispcode (STRPOS "A&C&" "XYZABCDEF" NIL '&)  =>  4}

{lispcode (STRPOS "DEF&" "XYZABCDEF" NIL '&)  =>  NIL}


If {arg ANCHOR} is {lisp T}, {fn STRPOS} compares {arg PAT} with the characters beginning at position {arg START} (or 1 if {arg START} is {lisp NIL}).  If that comparison fails, {fn STRPOS} returns {lisp NIL} without searching any further down {arg STRING}.  Thus it can be used to compare one string with some {it portion} of another string.  Examples:

{lispcode (STRPOS "ABC" "XYZABCDEF" NIL NIL T)  =>  NIL}

{lispcode (STRPOS "ABC" "XYZABCDEF" 4 NIL T)  =>  4}

If {arg TAIL} is {lisp T}, the value returned by {fn STRPOS} if successful is not the starting position of the sequence of characters corresponding to {arg PAT}, but the position of the first character after that, i.e., the starting position plus {lisp (NCHARS {arg PAT})}.  Examples:

{lispcode (STRPOS "ABC" "XYZABCDEFABC" NIL NIL NIL T)  =>  7}

{lispcode (STRPOS "A" "A" NIL NIL NIL T)  =>  2}

If {arg TAIL}={lisp NIL}, {fn STRPOS} returns {lisp NIL}, or a character position within {arg STRING} which can be passed to {fn SUBSTRING}.  In particular, {lisp (STRPOS "" "")  =>  NIL}.  However, if {arg TAIL}={lisp T}, {fn STRPOS} may return a character position outside of {arg STRING}.  For instance, note that the second example above returns 2, even though {lisp "A"} has only one character.

If {arg CASEARRAY} is non-{lisp NIL}, it should be a casearray like that given to {fn FILEPOS} ({PageRef Fn FILEPOS}).  The casearray is used to map the characters of {arg STRING} before comparing them to the search string.

If {arg BACKWARDSFLG} is non-{lisp NIL}, the search is done backwards from the end of the string.
}}
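
The interaction of {arg START}, {arg SKIP}, {arg ANCHOR} and {arg TAIL} can be summarized in a small model.  The Python sketch below is not the Interlisp implementation: {lisp strpos} is a hypothetical function, {lisp None} stands for {lisp NIL}, only positive {arg START} values are handled, and {arg CASEARRAY} and {arg BACKWARDSFLG} are left out.  Its behavior for an empty {arg PAT}, other than the documented case {lisp (STRPOS "" "")}, is a guess.

```python
from typing import Optional


def strpos(pat: str, string: str, start: Optional[int] = None,
           skip: Optional[str] = None, anchor: bool = False,
           tail: bool = False) -> Optional[int]:
    """1-based position of pat in string, or None where STRPOS gives NIL."""
    first = 1 if start is None else start
    last = len(string) - len(pat) + 1  # last position where pat can fit
    candidates = [first] if anchor else range(first, last + 1)
    for pos in candidates:
        if pos < 1 or pos > last or pos > len(string):
            continue
        window = string[pos - 1:pos - 1 + len(pat)]
        # A pattern character equal to skip matches any character.
        if all(p == skip or p == c for p, c in zip(pat, window)):
            return pos + len(pat) if tail else pos
    return None


print(strpos("A&C&", "XYZABCDEF", skip="&"))       # 4
print(strpos("ABC", "XYZABCDEF", anchor=True))     # None
print(strpos("ABC", "XYZABCDEFABC", tail=True))    # 7
```

With {lisp tail=True} the result can be one past the end of the string, as in the {lisp (STRPOS "A" "A" NIL NIL NIL T)} example.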



{FnDef {FnName STRPOSL} {FnArgs A STRING START NEG BACKWARDSFLG}
{Text
{arg STRING} is a string (or else it is converted automatically to a string), {arg A} is a list of characters or character codes.  {fn STRPOSL} searches {arg STRING} beginning at character number {arg START} (or else 1 if {arg START}={lisp NIL}) for one of the characters in {arg A}.  If one is found, {fn STRPOSL} returns as its value the corresponding character position, otherwise {lisp NIL}.  Example:

{lispcode (STRPOSL '(A B C) "XYZBCD")  =>  4}

If {arg NEG}={lisp T}, {fn STRPOSL} searches for a character {it not} on {arg A}.  Example:

{lispcode (STRPOSL '(A B C) "ABCDEF" NIL T)  =>  4}

If any element of {arg A} is a number, it is assumed to be a character code.  Otherwise, it is converted to a character code via {fn CHCON1}.  Therefore, it is more efficient to call {fn STRPOSL} with {arg A} a list of character {it codes.}

If {arg A} is a bit table, it is used to specify the characters (see {fn MAKEBITTABLE} below).

If {arg BACKWARDSFLG} is non-{lisp NIL}, the search is done backwards from the end of the string.
}}


{index *PRIMARY* Bit tables}

{fn STRPOSL} uses a "bit table" data structure to search efficiently.  If {arg A} is not a bit table, it is converted to a bit table using {fn MAKEBITTABLE}.  If {fn STRPOSL} is to be called frequently with the same list of characters, considerable savings can be achieved by converting the list to a bit table {it once}, and then passing the bit table to {fn STRPOSL} as its first argument.


{FnDef {FnName MAKEBITTABLE} {FnArgs L NEG A}
{Text
Returns a bit table suitable for use by {fn STRPOSL}.  {arg L} is a list of characters or character codes, {arg NEG} is the same as described for {fn STRPOSL}.  If {arg A} is a bit table, {fn MAKEBITTABLE} modifies and returns it.  Otherwise, it will create a new bit table.

Note:  if {arg NEG}={lisp T}, {fn STRPOSL} must call {fn MAKEBITTABLE} whether {arg A} is a list {it or} a bit table.  To obtain bit table efficiency with {arg NEG}={lisp T}, {fn MAKEBITTABLE} should be called with {arg NEG}={lisp T}, and the resulting "inverted" bit table should be given to {fn STRPOSL} with {arg NEG}={lisp NIL}.
}}
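
The idea behind a bit table, and the advice about building an "inverted" table once, can be sketched as follows.  This Python model is an assumption: the actual representation of an Interlisp bit table is not described here, the table below has one flag per 8-bit character code only, and {lisp make_bit_table} and {lisp strposl} are hypothetical names.

```python
from typing import Iterable, List, Optional, Union


def make_bit_table(chars: Iterable[Union[str, int]],
                   neg: bool = False) -> List[bool]:
    """One flag per 8-bit character code; neg=True builds the inverted table."""
    table = [neg] * 256
    for c in chars:
        code = c if isinstance(c, int) else ord(c)  # accept codes or characters
        table[code] = not neg
    return table


def strposl(table: List[bool], string: str, start: int = 1) -> Optional[int]:
    """1-based position of the first character whose flag is set, else None."""
    for pos in range(start, len(string) + 1):
        if table[ord(string[pos - 1])]:
            return pos
    return None


abc = make_bit_table("ABC")                # built once, reused for each search
not_abc = make_bit_table("ABC", neg=True)  # inverted table, searched normally
print(strposl(abc, "XYZBCD"))      # 4
print(strposl(not_abc, "ABCDEF"))  # 4
```

Each search then costs one table lookup per character examined, rather than a scan of the character list.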



}{End Chapter Strings}