DocumentRepresentation.tioga
Rick Beach, May 22, 1986 4:50:39 pm PDT
Rick Beach, January 2, 1987 3:42:06 pm PST
JOLOBOFF
TRENDS AND STANDARDS IN DOCUMENT REPRESENTATION
SIGGRAPH '87 TUTORIAL COURSE NOTES
DOCUMENTATION GRAPHICS
Trends and Standards in Document Representation
[Republished from Text Processing and Document Manipulation,
Copyright © 1986, Cambridge University Press]
Vania Joloboff
Bull Research Center
BP 68-38402 Saint Martin d'Heres Cedex. France
ABSTRACT: This paper starts by tracing the architecture of document preparation systems. Two basic types of document representation appear: at the page level or at the logical level. The paper then focuses on logical level representations and surveys three existing formalisms: SGML, Interscript and ODA.
1. Introduction
Document preparation systems may now be the most commonly used computer systems, ranging from stand-alone text processing machines to highly sophisticated systems running on mainframe computers. All of those systems internally use a more or less formal system for representing documents. Document representation formalisms differ widely according to their goals. Some of them define the interface with the printing device; they are oriented towards a precise geometric description of the contents of each page in a document. Others are used internally in systems as a memory representation. Yet others have to be learned by users; they are symbolic languages used to control document processing.
The trouble is that there are today nearly as many representation formalisms as document preparation systems. This makes it nearly impossible, first, to interchange documents among heterogeneous systems and, second, to have standard programming interfaces for developing systems. Standardization organizations and large companies are now trying to establish standards in the field in order to stop the proliferation of formalisms and facilitate document interchange.
In its last sections, this paper focuses on three document representation formalisms often called 'revisable formats', namely SGML [SGML], ODA [ODA], and Interscript [Ayers & al.], [Joloboff & al.]. In order to better understand what a revisable format is, the paper starts with a look at the evolution of the architecture of document preparation systems.
2. Architecture of Document Preparation Systems
Document preparation systems appeared as soon as computer printing devices were able to output documents of typewriter-like quality. Although the evolution of printing technology has been the major factor, several others have influenced the architecture of document preparation systems: low cost computing power, distributed systems, and the simple maturation of ideas in the field. The evolution of printing technology has led to the digital representation of documents ready to be printed, called final form representation. The evolution of software techniques has principally led to representations capturing the logical structure, that is, the structure perceived by the author when the document is revised, i.e. constructed or modified.
2.1 Final form representation
On early document preparation systems, printing devices were basically typewriter-like terminals directly connected in character mode to the unique processing computer. Those devices were driven by sequences of control characters inserted in the data stream they received in order to produce layout rendition (underlining, overstriking). A formatting system basically had to translate the formatting commands into printer control sequences.
As printers from different vendors had different control sequences, device independent formats were needed in order to print the same document at different sites with different printers. Final form representation thus appeared, that is, the final digital representation of a document before it is printed. The main property of a final form representation is that the number of pages in the document has already been computed. The way each object (character string or graphics) should appear on the page is totally determined.
On non-impact printers, virtually any image is reproducible: characters in any alphabet, graphics and images as well. There is no longer a specific set of imaging functions available from the hardware. The limit to the expressive power of the page creator is then set by the software interface.
This fundamental change brought by technology has implied a fundamental change in the design of final form representations for non-impact printing. A final form representation is no longer a sequence of characters; it has to be an organized structure. A formal method must be used to describe the page layout, offering maximum expressiveness to the page creator. Such formalisms theoretically allow for the description of any page for any printer.
They divide into static formats and the more recent dynamic formats. In a static format the page layout is described as a static data structure. The standard CCITT T73 [T73] is a typical example of such formats. In dynamic formats, also referred to as procedural page description languages, a page description actually describes how to compute the layout.
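The contrast between static and dynamic formats can be sketched in a few lines of Python. This is only an illustration: the dictionary layout, the coordinate values and the function name dynamic_page are hypothetical and are not taken from T73 or from any page description language.

```python
# Static format: the layout is stored as finished data (hypothetical fields).
STATIC_PAGE = [
    {"text": "Title", "x": 72, "y": 720},
    {"text": "First line", "x": 72, "y": 700},
    {"text": "Second line", "x": 72, "y": 680},
]


# Dynamic format: the description is a procedure that computes the layout.
def dynamic_page(lines, x=72, top=720, leading=20):
    """Return one placed item per line, each `leading` units below the last."""
    return [
        {"text": line, "x": x, "y": top - index * leading}
        for index, line in enumerate(lines)
    ]
```

Both forms describe the same page here; the procedural one can also produce other pages by changing its parameters.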
Brian Reid's paper [Reid86] in this conference discusses procedural page description languages, such as PostScript [PostScript], more extensively. The point we want to emphasize here is that the architecture (figure 1) of document preparation systems now has a clean interface with printing devices: a system generates a final form representation of documents in terms of a structured page description formalism.
2.2 Revisable form representation
A document has to undergo many additions or modifications before it is ready to be printed. Working on a page based representation when editing a document would be tedious and cumbersome both for users and the editing system. An unformatted representation of documents is necessary. This representation typically is the output of the editing system and the input of the formatting system. Figure 1 shows the three basic components of a document preparation system: editing, formatting and printing. Revisable form and final form are the two representations interfacing these components.
===
Figure 1. Typical architecture of a document preparation system.
===
The first document preparation systems naturally imitated the method used in the publishing industry for typesetting: additional information is interspersed among the document contents to produce a data stream directly processed by the typesetting device. On those early systems the revisable form representation simply consists of a text file containing control sequences, directly keyed in by the user from a standard terminal.
Control sequences consist of a series of markup signs. That was the beginning of so-called procedural markup languages, since those markup signs were interpreted as instructions controlling subsequent processing in the formatting system.
Procedural markup has well known drawbacks:
• The logical structure of a document is no longer evident once the document is marked up. For example, if chapter titles have been marked with a centering command, it does not appear clearly that what follows a centering command is a title. If someone later wants to flush all titles right, changing all centering commands into flush commands will probably not give the expected result.
• The style of the resulting documents, i.e. the aspect of the document layout, is determined by the user who placed the markup signs. A good layout style, if any style at all, requires some typographic knowledge from the user. The lack of this knowledge is responsible for all of the ugly documents produced on procedural markup systems. It also makes it difficult to output the same document in a different style.
The disadvantages of procedural markup have been avoided with a new method, known as declarative markup. The standpoint in declarative markup is that the user should describe the logical structure of a document: what is to be processed rather than how the document content is to be processed. A user enters markup signs indicating logical properties of data, for example paragraph or heading. Such markup expresses the logical structure, which is more familiar to the user, and does not imply any particular processing. The responsibility for making styles consistent, or for applying specific functions, is left to the system. GML [Goldfarb] and Scribe [Reid83] are two examples of declarative markup systems; the reader is referred to [Furuta & al.] for an extensive survey of such formatting systems.
The SGML formalism is essentially an international standard defined by ISO to cover these systems. Although an SGML entity may refer to non-character data, as shown in the next section, SGML has been designed in the spirit of all markup systems. As the standard says (page 3):
``The millions of existing text entry devices must be supported. SGML documents can easily be keyboarded and understood by humans.''
A user does not need a specific editor to build a marked-up document. As long as there are only characters, any editor will do on any standard terminal. The revisable form representation of a document in a markup system, be it declarative or procedural, is (or should be) fully known to users, since they have to key it in.
More recent approaches have a different viewpoint. They assume the revisable form representation is not directly accessed by users, but solely by the editing system. Thus a specific editor is needed, which generates that representation. It is intended that such editors will not expose users to the revisable representation; they will actually hide the internal representation of documents from the user, constructing this representation themselves from the user input.
These editors are expected to provide a friendlier user interface. Most of the editors from this new generation, for example Grif, presented in this conference [Quint & Vatton], do not run on standard terminals. They rather use bitmap display terminals, a window system and a pointing device.
The new type of document representation used in this approach may then be designed to be quite complex, nearly unmanageable by human beings, but very suitable to be handled by computers. Graphics and images may be directly inserted in documents more easily than with markup formats. Graphics may rely on existing standard graphics representations and images may be stored through specific data compression techniques, while the user only sees a real layout on the screen.
Interscript and ODA both belong to this new generation of formalisms. They assume more computing power from the editing system and they lose the possibility of being directly entered from a standard terminal, but they promise many more possibilities.
3. Generalized Markup Language
SGML stands for Standard Generalized Markup Language. It is essentially a declarative markup language, which has inherited mainly from its ancestor GML. However it includes a lot of new interesting features.
A first difference from its predecessors is that markup is defined rigorously. It is possible from the SGML standard definition to build a general syntactic parser that will not give rise to ambiguities. Thanks to this rigorous syntax, SGML documents may be processed very much like programs by a compiler. A document may be parsed to build an abstract syntax tree together with its attributes.
The semantics of that tree may be evaluated by semantic functions according to the attribute values. Thus, SGML can be used for tasks other than formatting. The semantics of markup tags and attributes might be used for machine translation, automatic indexing or any other process that needs to parse documents.
A markup sign in SGML is named a tag. Any element which needs to be tagged starts with a start-tag and ends with an end-tag. Any tag is delimited by the characters < and >. A tag is defined by an identifier, which appears first in the start-tag. An end-tag repeats the same identifier preceded by /. Note that all of these mark characters are redefinable for each document.
End tags may also be omitted under conditions specified in the standard. For example, a paragraph will appear as:
<p>This is a short paragraph.</p>
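As a rough illustration of the tagging convention above, the following Python sketch parses a fragment written with the default delimiters <, > and / into a tree. The names Element and parse are hypothetical, and the sketch assumes that no end-tag is omitted, that tags carry no attributes, and that the delimiters have not been redefined; it is far from a conforming SGML parser.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)  # Element or str items


# A start-tag, an end-tag, or a run of character data.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)")


def parse(text):
    """Return the list of top-level items found in text."""
    root = Element("#root")
    stack = [root]
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        closing, name, data = match.groups()
        if data is not None:
            stack[-1].children.append(data)
        elif closing:
            # An end-tag must repeat the identifier of the open element.
            if len(stack) == 1 or stack[-1].name != name:
                raise ValueError(f"unmatched end-tag </{name}>")
            stack.pop()
        else:
            element = Element(name)
            stack[-1].children.append(element)
            stack.append(element)
        pos = match.end()
    if len(stack) != 1:
        raise ValueError(f"missing end-tag for <{stack[-1].name}>")
    return root.children
```

Calling parse on the paragraph example yields a single Element named p whose only child is the paragraph text.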
A drawback of usual declarative markup systems is that one is forced to use the catalog of markup tags which is offered by the system. Since markup tags express the logical structure of documents, it means one cannot define the logical structure in other terms than the general tags set up once for all by the system.
A property of SGML is that tags are themselves described through a formal language: the SGML meta-language, which may be used within SGML documents to dynamically define new symbols. The syntax to introduce a meta-language construct simply follows < by !.
The SGML meta-language allows for the definition of complex constructs, named elements. An element declaration defines a class of objects, i.e. an element type. Subsequent objects in the document may be tagged with the element name. Elements may have a hierarchical structure, and each element in the hierarchy may have its own attributes. Element types may be used to facilitate the interactive creation of documents, to check the validity of a document structure, or to associate a layout style with a particular document type.
For example, one might define a document type for a conference paper as follows:
<!ELEMENT
1 paper (title abstract body)
language CHARS
2 title (#CDATA)
3 abstract (p)
4 body (p*)
>
This document type declaration specifies that a paper has a title, an abstract, and a body. The title consists of characters, the abstract is one paragraph and the body is a sequence of paragraphs (the * indicator allows zero or more). A paper has a language attribute to indicate in which language it is written. More complex combinations can be designed to define document types that have some commonality.
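To suggest how such a declaration can be used to check the validity of a document structure, here is a minimal Python sketch. It follows the prose description (a paper made of a title, an abstract and a body) and encodes each content model as a regular expression over child names. The names CONTENT_MODELS and is_valid, and the encoding itself, are assumptions made for the illustration; they are not part of SGML.

```python
import re

# Hypothetical encoding of the declaration above: each element name maps to
# a regular expression over the space-terminated names of its children.
CONTENT_MODELS = {
    "paper": r"title abstract body ",
    "abstract": r"p ",
    "body": r"(p )*",
}


def is_valid(name, children):
    """Check an element against its model; children is a list of
    (child_name, grandchildren) pairs."""
    model = CONTENT_MODELS.get(name)
    if model is not None:
        sequence = "".join(child + " " for child, _ in children)
        if re.fullmatch(model, sequence) is None:
            return False
    # Elements without a model (title, p) are treated as character data.
    return all(is_valid(child, sub) for child, sub in children)
```

A structure is accepted only if every element, at every level of the hierarchy, matches the model declared for its type.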
The facility to define new elements causes difficulties when laying out those elements, because the formatting system does not know how to format such constructs. SGML provides two ways of handling that situation. The first one is naturally to add to the SGML system a procedure to take care of the new tags. This requires a good knowledge of the system and prevents further interchange of documents containing such tags with systems which do not have this procedure. The second one is to use a LINK tag. A LINK tag tells the system that a construct should be handled as another one, presumably known to the system, with possible modifications of attributes. For example, <!LINK abstract paragraph indent=5> means that an abstract has to be formatted like a paragraph, but with a different indentation value.
It is often necessary in a document to refer to other parts of the document. Some binding mechanism is needed in the formalism to attach a value to an identifier, much like a programming language variable. Binding is achieved in SGML through entity declarations and entity references. An entity (a value, a character string or any valid SGML constituent) may be bound to a name by the notation <!ENTITY name entity value>. From then on, that entity may be referenced by its name, either to set an attribute value or to be included in the running text. Entities also provide a means to handle non-character data. An external entity is declared <!ENTITY name SYSTEM system information>. It is then known that this entity is not in the document stream. The processing system will find in the system information how to access that content.
If the document is to be interchanged among different computers with different operating systems, this system information is specific to each system. SGML provides an IGNORE/INCLUDE mechanism for that purpose. Information relative to some particular system, let us say osx, has to be encoded within the magic declaration <![osx;[<?commands for osx system>]]>. A user then only needs to set a switch at the beginning of the document to the local system for the document to be processed correctly.
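The entity binding described above can be sketched in Python as a simple substitution. The sketch assumes that a reference is written &name; in the running text, which is the default notation of the SGML standard but is not shown in this paper. The function name expand, the dictionary of entities and the nesting limit are hypothetical.

```python
import re

# Assumed reference notation: an ampersand, the entity name, a semicolon.
REFERENCE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand(text, entities, depth=0):
    """Replace each &name; in text by the value bound to name."""
    if depth > 20:
        raise RecursionError("entity references nested too deeply")

    def substitute(match):
        name = match.group(1)
        if name not in entities:
            raise KeyError(f"undeclared entity: {name}")
        # An entity value may itself contain references.
        return expand(entities[name], entities, depth + 1)

    return REFERENCE.sub(substitute, text)
```

For instance, with org bound to "ISO" and std bound to "the &org; standard", the text "See &std;." expands to "See the ISO standard.".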
4. Interscript
As mentioned previously, Interscript is a representation formalism of a new generation. Interscript, which was originally designed at Xerox PARC, starts from the idea that a document representation should be suited to processing by computers, not by the humans who manipulate documents.
Traversing trees, evaluating expressions and searching for the values of variables within contexts are among the things computers can easily do. Thus, a fundamental notion in Interscript is to rely on a formal language to describe document constructs: not only the logical structure of a document, but all formal constructs that could be necessary in a document representation. These abstract constructs may be data structures such as paragraphs, fonts or geometric shapes, but they may also represent computations, like setting a context or evaluating expressions within some context.
The Interscript approach is very much like the approach used in software engineering: general programming languages are used by people to build abstract constructs and procedures to solve their particular problem. A document representation problem should be solved using a document representation language. The Interscript base language is simple (around 25 grammar rules) and powerful. Its semantics are well defined, but its syntax rapidly leads to documents that cannot be managed by humans.
A document encoded in the Interscript base language is called a script. A script is very much like a program. The processing paradigm (figure 2) is that a script should first be internalized by a system. Internalizing a script implies the execution of computations, which are dictated only by the semantics of the Interscript base language, and results in the construction of another representation available to the client process. This simply means that a standard disk representation is translated into a non-standard memory representation, with computations being carried out along the way.
Computations are necessary in the internalizing process because the base language includes a binding mechanism and the evaluation of expressions within hierarchical contexts. For example, evaluating the expression:
rightmargin = leftmargin + linelength
needs to obtain the values bound to the variable names.
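The lookup of bound values within hierarchical contexts can be sketched as follows in Python. The class Context and the function bind_rightmargin are hypothetical names chosen for the illustration; the actual internalizing process of Interscript is considerably richer.

```python
class Context:
    """A set of bindings that falls back on an enclosing context."""

    def __init__(self, parent=None, **bindings):
        self.parent = parent
        self.bindings = dict(bindings)

    def lookup(self, name):
        # Search this context first, then the enclosing ones.
        context = self
        while context is not None:
            if name in context.bindings:
                return context.bindings[name]
            context = context.parent
        raise NameError(f"unbound name: {name}")


def bind_rightmargin(context):
    """Evaluate rightmargin = leftmargin + linelength within context."""
    context.bindings["rightmargin"] = (
        context.lookup("leftmargin") + context.lookup("linelength")
    )
```

In this sketch, a paragraph context that rebinds leftmargin still obtains linelength from the enclosing page context.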
===
Figure 2. Interscript processing model.
===
We will not go into the details of the internalizing process in this paper, since it resembles the evaluation of any interpreted programming language; we focus instead on the central concepts of node and tag.
A script is a hierarchy of nodes. Nodes have contents and tags. The authors have compared an Interscript node to a bottle of wine. The contents of the bottle are qualified by several tags on the bottle: a price tag, a product number tag. Interscript tags similarly qualify the node contents. To some extent an Interscript tag is similar to an SGML tag: it introduces an element, it has attributes, and it denotes structural properties of the contents.
The difference is that, first, an Interscript node may have several tags simultaneously and, second, attributes of a tag may be bound to an expression which must be evaluated. For example, a figure caption could carry both a CAPTION and a PARAGRAPH tag. The PARAGRAPH tag says that the caption text has to be laid out as a paragraph; the CAPTION tag restricts the placement of that paragraph relative to the figure picture. The leftmargin attribute of the paragraph might be set to be equal to the margin of some object X. The node hierarchy is then searched for that X.
Interscript syntax denotes nodes between curly braces. Tags are character strings followed by a dollar sign. A typical node is:
{ PARAGRAPH$ PARAGRAPH.leftmargin = 10
{CHARS$ <paragraph text content> }}
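A node carrying several tags at once can be modelled in Python as shown below. This is a sketch under assumptions: the class Node and the function render are hypothetical, and render only imitates the brace-and-dollar notation of the example above without claiming to follow the full Interscript grammar.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tags: set = field(default_factory=set)
    attributes: dict = field(default_factory=dict)
    contents: list = field(default_factory=list)  # child Nodes or strings


def render(node):
    """Write a node in a notation imitating the example above."""
    if isinstance(node, str):
        return node
    parts = [tag + "$" for tag in sorted(node.tags)]
    parts += [f"{key} = {value}" for key, value in node.attributes.items()]
    parts += [render(child) for child in node.contents]
    return "{ " + " ".join(parts) + " }"
```

A figure caption tagged with both CAPTION and PARAGRAPH is then a single Node whose tags set holds the two names.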
4.1 The pouring process
Markup languages do not provide good support for describing layout. They start from the idea that a user should hardly be able to specify layout, in order to enforce style discipline. It is true that users of a document preparation system are usually not interested in setting line and page breaks, selecting fonts, positioning titles, etc. However, they are often concerned with the placement of logos and page numbers, or with whether there are one or more columns: what we might call macroscopic layout.
Interscript provides for that purpose a comprehensive mechanism that we shall name descriptive layout. Descriptive layout does not prohibit the use of styles; it rather encourages their use, while also allowing for the specification of high level layout. Tags have been defined which symbolically represent the layout process. By placing those tags at appropriate places and specifying attribute values, a user may indicate to a formatting process how layout should be achieved.
All of those specifications appear as parameters of the Interscript pouring process. The Interscript metaphor for this process is that the document content is poured into some liquid layout, resulting in a solid layout. Liquid layout basically serves as a template which guides the pouring process in its actions. The pouring process is naturally described by means of constructs expressed in the base language. The fundamental pattern for invoking a pouring process is:
{ POUR$
POUR.template = {TEMPLATE$ -- template -- }
contents to be poured }
A template is basically a hierarchy of boxes. A box defines a rectangular area on which constraints apply to locate it relative to other boxes. Assume a user wants a page layout as shown in figure 3. That page has a header at the top and a logo below the header. The content should be laid out in an area on the right of the page, leaving some room on the left for margin comments.
When the content is poured into the page, the pouring process must not pour any content into the heading or the logo box, nor may it pour text content into the margin comment box. This correct placement of data is ensured by the MOLD mechanism.
When a box is to receive data, it carries a MOLD tag accompanied by a label. The pouring process does not try to pour content into boxes that do not have a MOLD tag; it places them directly within the page box. If a box is a mold, the pouring process looks in the node contents for nodes with a matching label. All content portions with a matching label are to be poured into that box.
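The matching of labelled content against molds can be sketched in Python as follows. The class Box, its capacity field (counted here in content items rather than in real geometric terms) and the function pour are hypothetical simplifications, not the Interscript definition of the pouring process.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    name: str
    mold: str = None   # label accepted by this box, or None if not a mold
    capacity: int = 0  # hypothetical: number of content items that fit
    poured: list = field(default_factory=list)


def pour(boxes, contents):
    """Pour (label, item) pairs into matching molds; return the leftovers."""
    remaining = list(contents)
    for box in boxes:
        if box.mold is None:
            continue  # fixed boxes (heading, logo) never receive content
        leftover = []
        for label, item in remaining:
            if label == box.mold and len(box.poured) < box.capacity:
                box.poured.append(item)
            else:
                leftover.append((label, item))
        remaining = leftover
    return remaining
```

Content that is left over after all molds of a page have been considered would have to be poured into a further page.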
===
Figure 3. A page layout.
===
It may be the case that there is too much content to be poured to fit on a single page. A template may specify that it is a sequential template, an iterative or an alternative one. In a sequential template, the pouring process will consider all boxes in the hierarchy sequentially. If a template is full, or no more matching content exists, it considers the next mold. An iterative template will repeat itself until all matching content has been poured.
An alternative template specifies different possibilities for pouring the content; the layout process is responsible for choosing one. One possibility is to try all of them and pick the best according to its own criteria. Another possibility is to have additional tags indicating when or how to select an alternative. For example, one might indicate one template to be used on screens and another one for paper.
Templates may also be combined. Figure 4 shows a typical example, a paper like this one. The first page has a particular layout showing the title, the authors and an abstract of the paper; the body of the paper starts below the abstract. All subsequent pages are in the same format, showing only a page heading and text.
===
Figure 4. Two different page layouts in a single template.
===
The template for that document is a sequential one. It contains the first page as a single box (a page is a particular box), followed by an iterative page template.
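The combination of a first page with an iterative page template can be imitated by the short Python sketch below. The function paginate and its capacities, expressed as numbers of paragraphs per page, are hypothetical; a real pouring process would work from box geometry rather than from counts.

```python
def paginate(paragraphs, first_page_capacity, page_capacity):
    """Pour paragraphs into a first page, then into repeated regular pages."""
    if first_page_capacity < 0 or page_capacity < 1:
        raise ValueError("capacities must allow the iteration to terminate")
    # Sequential template: the first page is considered once.
    pages = [("first", paragraphs[:first_page_capacity])]
    rest = paragraphs[first_page_capacity:]
    # Iterative template: repeated until all content has been poured.
    while rest:
        pages.append(("regular", rest[:page_capacity]))
        rest = rest[page_capacity:]
    return pages
```

With five paragraphs, one fitting on the first page and two on each regular page, the sketch produces one first page followed by two regular pages.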
Many other possibilities are offered by this descriptive layout process. We have only described its general properties in this paper.
5. Office Document Architecture
Office Document Architecture is a standard developed within ISO for the data structures used in the digital representation of documents. The most distinctive property of ODA is that it does not fit into the traditional architecture scheme: ODA defines simultaneously the logical structure and the layout structure of a document, i.e. it is both a logical and a page description format.
The argument in favour of that single representation seems to be that most editing systems have to manage both structures. The standard says (part 2 - page 75):
``In a text processing system with separate editing and formatting subsystems, the specific layout is created after any changes to the specific logical structure and content have been made. In a word processor type editor, small editing changes may be incorporated directly into existing specific layout structure after every command, without recreating the entire specific layout structure.''
This issue is discussed in the conclusion. This section only tries to present the ODA formalism. Figure 5 shows the main constituents of a document. It has six parts, a document profile, a document style, a generic and a specific logical structure, a generic and a specific layout structure.
A document may actually contain only one of the structures. This is indicated in the document profile when the document is transmitted. The document profile also contains data related to the whole document: creation date, last alteration date, originators, status, etc.
Generic and specific are to be interpreted respectively as class and instance. For example, a generic structure named conference paper will describe the general structure and properties of a conference paper, as shown in the SGML section. A particular instance of a conference paper will be described by its specific structure. Attributes defined in the generic structure will be given values in the specific structure, possibly a default value specified in the generic part. The specific structure should be consistent with the generic one, and will probably inherit properties from the generic structure. The specific logical and layout structures are trees whose leaf nodes are named basic objects and whose other nodes are named composite objects. Any node may carry attributes.
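The class and instance reading of generic and specific structures, with default values inherited from the generic part, can be sketched in Python as follows. The classes GenericObject and SpecificObject and the attribute names are hypothetical and do not reproduce the ODA data structures.

```python
class GenericObject:
    """An object class: declares attributes together with default values."""

    def __init__(self, name, defaults):
        self.name = name
        self.defaults = dict(defaults)


class SpecificObject:
    """An instance: gives values to attributes declared by its class."""

    def __init__(self, generic, **attributes):
        unknown = set(attributes) - set(generic.defaults)
        if unknown:
            raise ValueError(
                f"attributes not defined in generic part: {sorted(unknown)}"
            )
        self.generic = generic
        self.attributes = attributes

    def get(self, attribute):
        # Use the specific value if present, else the generic default.
        if attribute in self.attributes:
            return self.attributes[attribute]
        return self.generic.defaults[attribute]
```

The consistency check in the constructor is one simple reading of the requirement that the specific structure be consistent with the generic one.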
The specific logical structure expresses the structure of the document in, e.g. paragraphs, chapters, titles, etc. The specific layout structure is a tree of page sets (a set of pages identified as a single entity), pages, frames and blocks. Blocks and frames are rectangular areas located within frames and pages.
===
Figure 5. ODA document components.
===
Blocks and basic logical objects both refer to the document content. This content is divided in content portions. A content portion is governed by a content architecture, which basically defines the content type (characters, images, graphics) and its encoding mechanism.
The logical and layout structures are clearly not independent: they refer to the same document content and they may have reciprocal pointers.
Figure 6 (reproduced from the standard) shows that the coexistence of the two structures implies particular constraints. A paragraph which spans two pages has to be split into two content portions.
===
Figure 6. Simultaneous layout and logical structure.
===
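The sharing of content portions between the two structures can be sketched in Python as follows. All names (ContentPortion, LogicalObject, Block, split_across_pages, recollect) are hypothetical; the sketch only shows a paragraph split into two portions that are referenced both by the logical object and by the blocks of two consecutive pages.

```python
from dataclasses import dataclass, field


@dataclass
class ContentPortion:
    text: str


@dataclass
class LogicalObject:  # for example a paragraph
    portions: list = field(default_factory=list)


@dataclass
class Block:  # rectangular area on one page
    page: int
    portions: list = field(default_factory=list)


def split_across_pages(paragraph_text, split_at, first_page):
    """Split a paragraph into two portions laid out on consecutive pages."""
    head = ContentPortion(paragraph_text[:split_at])
    tail = ContentPortion(paragraph_text[split_at:])
    logical = LogicalObject([head, tail])
    blocks = [Block(first_page, [head]), Block(first_page + 1, [tail])]
    return logical, blocks


def recollect(logical):
    """Rebuild the paragraph text from its content portions."""
    return "".join(portion.text for portion in logical.portions)
```

The same ContentPortion objects appear in both structures, so a change of layout that moves the page break requires the portions to be recollected and split again.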
A style is simply a named set of attributes, which can be referenced from other components in the document. Styles divide into layout styles and presentation styles. A presentation style is attached to a basic object and depends upon the nature of that object. For characters, it would indicate font information; for images it would probably indicate colors or half-toning. A layout style defines global style information. It can be referenced only from logical objects.
The standard is somewhat fuzzy about generic structures. There is no clause devoted to the description of generic structures, while there is one for each specific structure. It says that object class descriptions are used by the editing process to construct a specific logical structure, but it does not say much about such descriptions. Part 3 of the standard, which describes the layout process, indicates how the generic layout structure should be used, and hence gives a clearer idea.
The generic layout structure contains common content portions, for example logos or headings that should be used in many places in the document. It also serves as a guide for the layout process.
6. Conclusion
We have focused in this paper on three representation formalisms considered as revisable form representations, namely SGML, ODA and Interscript. Of these three formalisms, SGML and ODA have reached the status of ISO draft proposal, which means they will become definitive standards with very few modifications.
SGML results from more than ten years of accumulated experience and current practice in the field of markup languages. The standard has a precise definition, which makes it possible to parse a document rigorously. Document markup tags may induce hierarchical structures expressing the logical content of a document. A simple binding system among entities has been introduced, which allows for cross-referencing among entities.
Knowing that SGML has been running on many machines and that high quality text books have been produced through an SGML system, a vendor commercializing a markup document system would probably do better to adopt SGML than to invent a new formalism.
ODA did not follow the same standardization process as SGML. ODA is an attempt by an ISO working group, consisting mostly of representatives of text processing system vendors, to define a standard before there are hundreds of incompatible representation formalisms around the world. Hence there is no current practice of ODA and it is only expected that most new systems will use ODA. However, there are a few objections to actually using ODA.
The choice of two coexisting structures, layout and logical, might lead to implementation problems. The content portions which have to be split to satisfy layout constraints will have to be recollected when the layout is modified.
Now that most printing device vendors have upgraded their machines to have a procedural page description language, the design of the layout structure in terms of frames and blocks looks old fashioned and contradictory with the fact that ODA claims to be a standard for future systems.
One might also fear with ODA that vendors will actually offer ``ODA subsets''. In particular, SGML can be considered as such a subset. The standard explicitly states, probably for reasons of compatibility among ISO standards, that an SGML document may be transmitted within an ODA document (part 5 - page 4):
``Any subdocument within a (ODA) document may be represented either by descriptors and text units or by an SGML entity set. An SGML entity set is a self contained unit of SGML information, which is denoted by the term document in the SGML standard.''
A vendor may actually sell an SGML system as an ODA subset system. If each vendor offers a subset of ODA, it might be the case that all of those systems will be actually incompatible, which is not desirable for an interchange standard. This argument is naturally true for all standards, but ODA design makes it easier to have closed subsets.
For example, it is possible to design an ODA editing system that would not take into consideration all of the layout part and restrict to the logical structures and styles. This editing system will output only documents with logical structures and styles. These documents may be interchanged using the ODA format.
It is possible too, in terms of delimiting an ODA subset, to design a simple word processing machine that would use no logical structure at all to produce office documents with only a layout structure. But both of those ODA systems will not be able to interchange a single document.
Moreover, given the complexity of ODA, it seems that a complete ODA system can hardly be implemented on a small word processing workstation. Implementors of these relatively small workstations who want to manage documents with both logical and layout structure will probably have to define a subset in order to maintain satisfactory performance.
Though SGML has been designed so that human beings may enter markup tags into a document, it might well be used as an internal representation for an editor that would not appear to the user as a markup system. The structuring possibilities offered by SGML could then be used by the implementors to represent complex internal structures, providing facilities equivalent to those of ODA. Documents produced by such an editor could hardly be revised by humans from a standard terminal, but they could still be output with the high quality of an SGML formatting system. Thus a vendor who wants to implement a document preparation system has to choose between two international standards.
Interscript is not an international standard, and it seems it will not become one. The reason might be that the Interscript design departs too far from existing formalisms to be accepted by most vendors, who are mostly interested in standards. Remember that bitmap displays were developed at Xerox PARC in 1975. In 1985, still very few vendors offer text processing systems with a bitmap display and a pointing device. Interscript was also born at Xerox PARC in 1983 [Ayers & al], as a result of several years of experience with powerful text processing systems running on bitmap displays...
Although Interscript will not be a standard in the eighties, it has introduced two important ideas: the notion of a base language associated with an internalization process, and descriptive layout. Both should be retained by those who are participating in the design of a new generation of document preparation systems.
Interscript proves that a base language can be defined which encompasses all the abstractions found in the document preparation world. It can describe equally well a document's logical structure, the properties and structure of various kinds of entities (font, paragraph, etc.), and functional symbolisms like the pouring operation.
Once implemented, a base language considerably simplifies the software development of systems, but above all it gives cleanness to the systems and clarity to the concepts. The Interscript base language is certainly not perfect. It can be improved, and it might actually be too powerful for its goals.
Similarly, the idea of a layout process formally described and specified by abstract constructs can be expressed in terms other than the particular Interscript pouring process. But both concepts have opened a direction for present research in the field.
References
[1] Adobe Systems Incorporated (1984). PostScript language manual. 1870 Embarcadero Road, Palo Alto, California.
[2] Ayers, R.M., Horning, J.T., Lampson, B.W., Mitchell, J.G. (1984). Interscript: A Proposal for a Standard for the Interchange of Editable Documents. Xerox Palo Alto Research Center. 3333 Coyote Hill Road, Palo Alto, California.
[3] CCITT (1984). Recommendation T73. Document interchange protocol for the telematics services.
[4] Furuta, R., Scofield, J., Shaw, A. (1982). ``Document Formatting Systems: Survey, Concepts, and Issues.'' ACM Computing Surveys, 14, 3, 417-472.
[5] Goldfarb, C. (1978). ``Document Composition Facility: Generalized Markup Language (GML) User's Guide''. Technical report SH20-9160-0. IBM General Products Division.
[6] International Standard Organization/ TC 97/ SC 18 (1985). Information Processing. Text and Office Systems—Document Structures. Draft Proposal 8613
[7] International Standard Organization/ TC 97/ SC 18 (1985). Information Processing. Text Preparation and Interchange. Processing and Markup Languages. Draft Proposal 8879
[8] Joloboff, V., Pierce, R., Schleich, T. (1985). Document Interchange Standard ``Interscript''. International Standard Organization/ TC 97/ SC 18. Information Processing. Document N439R
[9] Reid, B.K. (1983). Scribe: Histoire et evaluation. Actes des journees sur la manipulation de documents, INRIA/IRISA, Rennes, France.
[10] Reid, B.K. (1986). ``Procedural Page Description Languages''. Text Processing and Document Manipulation, Proceedings of the international conference, University of Nottingham, 14-16 April, 1986, Cambridge University Press, 1986.
[11] Quint, V., Vatton, I. (1986). ``GRIF: An interactive system for structured document manipulation''. Text Processing and Document Manipulation, Proceedings of the international conference, University of Nottingham, 14-16 April, 1986, Cambridge University Press, 1986.