TEXT FILE TRANSLATOR

Ron Fischer
Xerox AI Systems
3/25/87 (last revised 10/2/87)

This document describes a Text File and File Manager File importer and exporter. It covers the file loading and making process, and the static format of text files and File Manager descriptions. The File Manager description is discussed in increasing detail until its programmability is covered. An appendix contains a query of potential users and their responses.

The Text File Translator changes source code format; it does not alter the semantics of code in any fashion (i.e., this is not an Interlisp to Common Lisp translator).

Overview

The Text File Translator supports the development of portable Common Lisp source code in the Xerox Lisp Environment. It brings portable Common Lisp sources into the File Manager without losing any of their contents. It also makes new text files based on the File Manager's description of the text file contents. The original file's function and ordering are retained, but exact formatting is not. The pretty printer causes all comments and expressions on the text file to be uniformly formatted.

There is no support for the use of "text" as a File Manager source file format.

All symbols described in this document are in the TEXTMODULES package, nicknamed TM.

Dependencies

The Text File Translator requires a patch to the File Manager to be loaded. The patch is in the file EVAL-WHEN-PATCH, and it adds an eval-when filecom to the File Manager.

Special support for editing and printing of comments is also required. This is provided by the SEDIT-COMMONLISP file. Some caveats on the editing of presentations are mentioned below.

The patch and editing support files are automatically loaded by the TEXTMODULES file.

Programmer's Interface

load-textmodule pathname &optional options    [Function]

(See below under "Text file format" for a description of the format of text files.)
Like lisp:load, the file indicated by pathname is loaded, but in addition File Manager descriptions of its contents (filecoms) are created.

Local bindings of reader-affecting variables are established and set to Common Lisp defaults, except for the readtable. A special readtable is used which creates internal representations for objects normally lost during reading (see below under "Presentation objects").

If there are some simple forms to set up the read environment at the front of the file, they are recognized and used to create a makefile-environment. The forms are then discarded.

Each form is read from the file, one at a time. If the form is recognized, a description of it is given to the File Manager and its definition is installed. If the form is not recognized, it is wrapped in a "top level form" descriptor and then installed by stripping presentation objects and evaluating. No forms after the read environment forms should change the reader's environment.

When the file has been completely read, its content description is given to the File Manager. Also added to the content description are properties declaring its filetype as :compile-file and its makefile-environment as that of the text file (whether given by setting forms at the front of the file or by default).

make-textmodule name &optional options    [Function]

(See below under "File Manager description of contents" for a summary of what may be written out.)

The File Manager's description of the file name is used to create a text file. options are as for il:makefile. name must have been loaded with load-textmodule.

Local bindings of printer-affecting variables are established and set to Common Lisp defaults, except for the readtable.

The file's environment is written out, based on its makefile-environment property (see below under "File Manager description of contents" for ways of expressing the environment).
The specially made description of the file's contents (from the File Manager) is iterated over to write out each form in the file.

Text file format

Text files created by the translator contain a Symbolics ZMACS-style mode line comment at the front, based on the makefile-environment of the file. The makefile-environment is actually instated using expressions directly following the mode line.

    ; -*- Mode: LISP; Package: (FOO GLOBAL 1000); Base:10 -*-

mode
    For some versions of the EMACS editor this will declare the major mode, which arranges key-to-command bindings for LISP instead of documents.

package
    Name, used packages and initial space for symbols.

base
    Numeric "ibase".

The mode line is provided purely as a convenience in transporting Common Lisp code to other systems. It has no effect on the Xerox Lisp programming environment (File Manager).

The Common Lisp community has agreed that plain text files will use only one reader environment and hence not switch packages or alter the readtable partway through. That reader environment is set up by the seven (plus two) standard environment modifying forms. These forms are recognized by the TextModules translator only if they appear at the front of the file and in order (comments being ignored):

    Put          provide
    In           in-package
    Seven        shadow
    Extremely    export
    Random       require (or il:filesload)
    User         use-package
    Interface    import
    ...          shadowing-import
    ...          setf *read-base*
    Commands     Contents of module

Our extensions add optional *read-base* setting and shadowing-import calls. Also, il:filesload may be used in place of require when a Xerox Lisp file (not containing a provide form) must be loaded. Any one of these forms is optional, but they must appear first in the file (and in order) to be parsed into a makefile-environment when load-textmodule is called.

The contents of a text file are a sequence of forms. The Xerox programming environment (File Manager) handles top level defining forms specially.
Other forms are just saved and evaluated.

File Manager description of contents

The File Manager maintains a description of the format (makefile-environment) and contents (filecoms) of a file. The makefile-environment of the file is used to note the package the file is to be read in, and inter-file dependencies. The filecoms maintain a description of the top level defining forms in a file, like function and variable definitions. They also store plain forms, which are to be evaluated. Within either of these types of forms there appear descriptions of individual objects, like vectors or "number in a radix." These are called presentation types (described separately; see below).

Makefile environment

The makefile-environment property of the file is used to set the read/print environment of a file. Three forms of this property are recognized:

    a string or symbol naming a package
    a defpackage statement
    a let statement

A string or symbol is simply taken to name a package which should be used to read the file.

A defpackage statement will have its portable parts translated into a let statement as described below.

A let used for the makefile-environment should bind *package* and contain some form of the standard seven package and module setup forms (see above under "Text file format"). It should finally return the altered value of *package*. For example:

    (let ((*package* *package*))
      ...environment setup forms.
      *package*)

The forms in this expression must be written in a standard, pre-existing package, such as USER or XCL-USER. This is to break the circularity of writing a package defining form in the package it defines.

Filecoms

Top level forms are then either recognized or wrapped in a "top level form" filecom. All defdefiner-defined top level forms are recognized (a list is available above under "Text file format"). In addition there are:

(il:* type string)
    Contains a comment string. Type is a symbol of one, two, three or four semicolons, or a vertical bar.
    This handles top level single, double or triple semicolon comments, as well as balanced comments.

(eval-when when . coms)
    Wrapper with an evaluation time and more specifiers.
    [INTERNAL] This is provided via a patch made from Pavel's eval-when com in the Motown source code for the File Manager. The patch file is called filepkg-patch.

(il:p (il:translate-presentations form))
    Contains an expression which is evaluated at load time. Also handles top level occurrences of conditional read and read time evaluations (hash comma and hash dot).
    This is a general purpose catch-all. Before evaluating these forms, any presentation objects in them are stripped out (as for comments) or changed. This is done by the translation functions for the particular presentation objects. For example, this allows comments to appear anywhere in the text and not affect evaluation.

Making new specifiers

New specifiers should be added to the list specifier-types. This list is searched linearly; its order is significant.

make-specifier-type &key name specifierp identifier add installer printer    [Function]

name
    A string naming the specifier.

specifierp
    Predicate on FORM (a content specifier) which recognizes the specifier in the contents description of a file.

identifier
    Predicate on FORM (a form from the text file); answers true if this is the specifier for the definition in FORM.

add
    Function of FORM and CONTENTS which makes the definition in FORM available (editable) in the programming environment (File Manager). It should make the definition editable and add a specifier for FORM to the file CONTENTS description. It should return the new contents description. To add runs of subforms use add-type and form-specifier-type (see below).
    Care should be used when making a definition editable. The simplest instance of this occurs when the FORM's definition is a definer.
    In this case its evaluation may be wrapped in a binding of il:dfnflg to il:prop to ensure that the definition form goes into the table of current definitions without being evaluated.
    Adding a specifier to the contents description should be done in a way that preserves ordering. The simplest way to do this is to append the new specifier to the end of the current CONTENTS description. See below the installer function for a comment on this strategy.

installer
    Function of a FORM which makes the definition in FORM the current one to be used in execution. If the defining mode flag indicates that the file is being loaded for editing only, this function will not be called during loading of the form (il:dfnflg is set by the :install option to load-textmodule). To install runs of subforms use install-type and form-specifier-type (see below).
    Care should be used when making a definition executable. The simplest instance of this occurs when the FORM's definition is a definer. In this case its evaluation may be wrapped in a binding of il:dfnflg to t to ensure that the definition form is actually evaluated.
    The semantics of the add and installer functions remove confusion between loading a definition into memory for editing and installing that definition as the currently executable one. The add function makes the definition editable and the installer makes it executable. The text file converter separates these ideas more completely than the File Manager, in which the il:getdef and il:putdef methods must recognize the il:dfnflg mode.

printer
    Function of FORM and STREAM which prints the former to the latter.

The following functions are used to handle subforms; for example, in the eval-when specifier there are subforms that need to be parsed.

form-specifier-type form    [Function]

Recognizes the form and returns a type specifier for it. If none is found, a warning is signalled and a "do nothing" specifier is returned. This causes the unrecognized form to be lost.
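The lookup performed by form-specifier-type can be pictured with a small, self-contained sketch in portable Common Lisp. It is not the TEXTMODULES implementation: the structure toy-specifier, the variables *toy-specifier-types* and *do-nothing-specifier*, and the functions head-is and toy-form-specifier-type are hypothetical names invented for illustration. The sketch only shows why the order of a linearly searched list matters and what falling back to a "do nothing" specifier looks like.

```lisp
;;; Hypothetical model of a linearly searched specifier list.
;;; None of these names belong to TEXTMODULES.

(defstruct toy-specifier
  name         ; string naming the specifier
  identifier)  ; predicate on a form read from the text file

(defun head-is (symbol)
  "Return a predicate that is true of lists whose first element is SYMBOL."
  (lambda (form) (and (consp form) (eq (first form) symbol))))

(defparameter *toy-specifier-types*
  (list (make-toy-specifier :name "function" :identifier (head-is 'defun))
        (make-toy-specifier :name "variable" :identifier (head-is 'defvar))
        ;; The catch-all must come last: earlier entries win the search.
        (make-toy-specifier :name "top level form" :identifier #'consp)))

(defparameter *do-nothing-specifier*
  (make-toy-specifier :name "do nothing" :identifier (constantly nil)))

(defun toy-form-specifier-type (form)
  "Return the first specifier whose identifier accepts FORM.
If none matches, signal a warning and return the do-nothing specifier."
  (or (find-if (lambda (spec)
                 (funcall (toy-specifier-identifier spec) form))
               *toy-specifier-types*)
      (progn (warn "Unrecognized form: ~S" form)
             *do-nothing-specifier*)))
```

With these definitions, (toy-specifier-name (toy-form-specifier-type '(defun f () 1))) returns "function" and the same call on '(print 42) returns "top level form". Moving the catch-all entry to the front of the list would make every list form resolve to "top level form", which is the sense in which the order is significant.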
add-type type form contents    [Function]

Adds the form to the contents description based on the type in type. Returns the new contents description. nil is used as an empty contents description.

install-type type form    [Function]

Installs the form as current and executable based on the type in type. This function is sensitive to the current definition mode.

Presentation objects

Presentation objects represent things that normally disappear during reading, like comments or numbers written in a particular base. Each presentation object must be capable of being read from a text file, edited with SEdit, installed as it would be when normally read, and printed to a text file in its original form.

Default presentations

This table describes the Common Lisp presentations and their support by the Text File Translator.

; comment
;; comment
;;; comment
;;;; comment
    Single through quadruple semicolon comments. Read and printed by the translator. Supported by SEdit. Installed (stripped) with il:remove-comments.
    Internal formatting is preserved, including CRs, tabs, etc. Adjacent comments are not smashed together, so that line breaks are preserved. A single leading space in a comment is ignored, since comments are always printed with a single leading space.
    These comment types are represented internally in the same way as in Xerox Lisp, i.e., as a list beginning with the symbol IL:*, followed by a symbol (interned in the INTERLISP package) containing one through four semicolons.

#|comment|#
    Balanced comment, possibly containing commented-out code. Read and printed by the translator. Not supported by SEdit. SEdit support could be built from the comment nodetype.
    This comment type is represented internally in a manner similar to semicolon comments, but the symbol containing semicolons is replaced by a symbol whose name is the vertical bar character.

#+feature form
#-feature form
    Read time conditional. Read and printed by the translator. Supported by SEDIT-COMMONLISP.
    Installation requires rereading the expression and evaluating it.
    All feature expressions are preserved: they are written to the File Manager source file and loaded as per *features* by il:load. Expressions must be preserved "as is" because they may contain, for example, numbers of higher precision, symbols in unknown packages, or unknown reader macros.
    A conditional expression is either read or left unread, depending on the truth of its feature expression and its polarity. Read conditional expressions are stored as structure. Unread conditional expressions are stored as strings. These are read by remembering the file position, doing a read-suppressed read, then backing up to the original position and saving all the characters in between in a string. This means that streams from which conditional read presentations are read must be capable of random access (the TTY is not).
    These are represented by the hash-plus and hash-minus presentations.

#.form
    Read time evaluation. Read and printed by the translator. Supported by SEDIT-COMMONLISP. Installation requires evaluating the form.
    Hash dot is represented by the hash-dot presentation.

#,form
    Load time evaluation. Read and printed by the translator. Not supported by SEdit. Installation requires evaluating the form. The compiler must understand the presentation and cause it to be evaluated at compiled file load time.
    Hash comma is represented by the hash-comma presentation.

#\c
    Character object. Common Lisp read/print convention. Fully supported. Does not require a presentation.

#:name
    Uninterned symbol. Common Lisp read/print convention. Fully supported. Does not require a presentation.

#'function
    cl:function abbreviation. Common Lisp read/print convention. Fully supported. Does not require a presentation.

#S(name slot1 value1 ...)
    Structure. Not supported by SEdit. Does not require a presentation.

#Orational
#Xrational
#Brational
#radixRrational
    Rational representations (Octal, heXadecimal, Binary, Radix provided).
    Supported by SEDIT-COMMONLISP (except for "hash R").
    These are represented internally by the hash-o, hash-x and hash-b presentations.

#length*bitstring
    Bit-vector. Common Lisp read/print convention. Not well supported by SEdit (but can be inspected there). Does not require a presentation type.

#length(element ...)
    Simple vector. Not directly supported by SEdit (can be inspected). Does not require a presentation type.

#rankAcontent
    Array. Not supported by SEdit (but can be inspected there). Does not require a presentation type.

#n=object
#n#
    Object tag and reference to tagged object. Not supported by SEdit. No presentation is provided, but one would be required.

Not yet addressed is the issue of symbol translation on lookup. In essence, if a symbol explicitly qualified with a particular package is looked up, and the in-memory package inherits the symbol from somewhere else, there is a "translation" of the explicitly qualified symbol into the inherited package. The same sort of translation applies when a symbol written as internal is actually found to be external in a package; it is translated to be external. A solution to this would be a mechanism in the symbol lookup routines which signalled a condition whenever such a translation was about to take place (signalling would normally be disabled). A handler could then replace the symbol with a presentation type that preserved the original text form. This is similar to the treatment of broken atoms in SEdit.

Making new presentations

The hash-bar and semicolon comment presentations are currently handled as special cases. The presentation code otherwise depends on representation as structure with unique datatypes.

defpresentation name &key fields include print-function read-macro translator    [Function]

Creates a new type of presentation, which is represented as a structure. The standard structure predicates may be used to identify instances of a presentation type.

name
    A symbol naming the presentation.
    This will be used as the name of the new structure, and must not conflict with others.

fields
    A list of structure fields that instances of this presentation will have.

include
    (Optional; defaults to presentation.) The name of a structure to inherit fields from. Useful when defining a subclass of presentations. All presentations must ultimately inherit from the structure presentation. The full include syntax may be used here to override default field values.

print-function
    Prints the presentation object as plain text.

read-macro
    A list that describes the character syntax which indicates that the object has been found in text. The list contains one or two characters followed by a function that creates the presentation. If one character is provided, it is set up as a macro character with the given function. If two are provided, the first is made a dispatching macro character and the given function is installed under it for the second character. The read macro is placed in the LISP-FILE readtable.

translator
    Either the flag :delete, which causes the presentation to be removed from the containing form, or a function on the presentation which returns what is normally read. The :delete flag is typically used with comments. The function usually converts the presentation to its as-read form; for example, #o10 would be converted to the number 8 and returned by the translator function.

Editing presentations

All necessary Common Lisp presentations are supported (except for circular structure). However, a fairly large number of these presentations cannot be edited in SEdit. Sadly, there was no time to provide EditNoteTypes for vectors, arrays and structures, since this is (currently) a rather involved programming task.

Editing of hash-plus and hash-minus presentations is prone to problems. Never change the sign of the conditional read to something other than a plus or a minus (do not delete it either).
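The effect of the translator argument to defpresentation, described above under "Making new presentations", can be illustrated with a self-contained sketch in portable Common Lisp. It does not use defpresentation or any other TEXTMODULES code: the structures toy-presentation, toy-comment and toy-hash-o, and the functions toy-translator and toy-translate, are hypothetical stand-ins. The sketch assumes only what the text states, namely that a translator is either the flag :delete or a function returning the as-read value.

```lisp
;;; Hypothetical model of presentation stripping; not TEXTMODULES code.

(defstruct toy-presentation)            ; root of all toy presentations

(defstruct (toy-comment (:include toy-presentation))
  text)                                 ; comment text, without semicolons

(defstruct (toy-hash-o (:include toy-presentation))
  digits)                               ; octal digits as written, e.g. "10"

(defun toy-translator (presentation)
  "Return :DELETE, or a function of PRESENTATION giving its as-read value."
  (etypecase presentation
    (toy-comment :delete)
    (toy-hash-o
     (lambda (p) (parse-integer (toy-hash-o-digits p) :radix 8)))))

(defun toy-translate (form)
  "Copy FORM, deleting or replacing any toy presentations inside it.
A second value of T means FORM itself was a deleted presentation."
  (cond ((toy-presentation-p form)
         (let ((translator (toy-translator form)))
           (if (eq translator :delete)
               (values nil t)
               ;; VALUES drops any extra values (PARSE-INTEGER returns two),
               ;; so the second value cannot be mistaken for the delete flag.
               (values (funcall translator form)))))
        ((consp form)
         (multiple-value-bind (head deleted) (toy-translate (car form))
           (let ((tail (toy-translate (cdr form))))
             (if deleted tail (cons head tail)))))
        (t form)))
```

For example, translating (list '+ (make-toy-comment :text "add") (make-toy-hash-o :digits "10") 1) yields (+ 8 1): the comment is removed from the containing form and the octal presentation is replaced by the number it denotes, so the result can be evaluated as if the text had been read normally.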
Suggested functionality

Patch option
    Prettyprints a text file containing only changed definitions. Friendly fixes to sources being maintained primarily in text oriented environments.

Convert defmacros to defdefiners
    Done with user confirmation during read. Easy bridge to use of defdefiner features, like File Manager support.

Convert comments to doc strings
    Using external comments and comments in doc string positions as documentation is a common idiom at Xerox AIS. Limited usefulness. One problem is that large doc strings are rather ugly and undesirable. They would need to be reconverted on the way out as well (back to an external comment). The comment would need to be keyed on some indentation level. This sort of feature would be disabled by default.