Chapter2.Tioga
Last edited by Rick Beach, October 30, 1984 10:51:18 am PST
¶2 Document Composition
2 DOCUMENT COMPOSITION
CONTENTS
¶2.1 Traditional Document Production Techniques
¶2.1.1 How do books get produced?
¶2.1.2 Roles involved in producing a book
¶2.2 Concept of Style
¶2.2.1 Style as design choices
¶2.2.2 What Do Styles Affect?
¶2.2.3 Media-Specific Styles
¶2.3 Early Typesetting Systems
¶2.4 Document compilers
¶2.4.1 troff
¶2.4.2 Scribe
¶2.4.3 TEX
¶2.5 Integrated composition systems
Etude (became Interleaf)
Janus (became IBM product)
Star
Tioga
WYSIWYG (or WYSIAlmostWYG or WYSIAllThatYG)
¶2.6 Document models and views of documents (mumble ...)
--
[Dear Reader: This draft of Chapter 2 is much rougher and took much longer than I wished. In the interests of progress, I am sending it in this form rather than delay any longer. RJB]
This chapter surveys the techniques for producing documents beginning with the traditional graphic arts process for turning a manuscript into a finished book. The survey investigates the concept of document style that arises from the graphic design discipline and that pervades modern electronic composition systems. The use of computers and electronics in document composition is surveyed next, first examining the early typesetting systems, then document compilers such as troff, Scribe and TEX, and finally integrated document composition systems such as Etude, Janus, and the Xerox Star. The survey of document composition techniques concludes with a discussion of several issues concerning the structure of information in documents and a description of some models of structured documents.
¶2.1 Traditional Document Production Techniques
Researchers make substantial use of books and journals in their everyday work. However, few people understand how these documents are produced. Only when they decide to write their own book or to edit a scholarly journal do they become involved in the mysterious world of the graphic arts. This survey will help the reader to understand document production, to appreciate the roles and skills necessary, and to realize the vast number of details and parameters involved in producing high-quality documents.
¶2.1.1 How do books get produced?
An interesting review of how books are produced is provided in One Book/Five Ways [One Book/Five Ways]. Five university presses participated in a comparative publishing experiment in which each press prepared the same book for publication. The scholarly presses that participated were the University of Chicago Press, the MIT Press, the University of North Carolina Press, the University of Texas Press, and the University of Toronto Press.
The procedures used in each press were remarkably similar. Although the approaches varied somewhat, all involved the stages of acquisition, market and preliminary cost estimation, editorial, design and production, sales and promotion. Each of these presses documented its procedures, its forms, and the guidelines it applied to the various processes. One Book/Five Ways contains a rich collection of raw material for anyone interested in the publishing process.
In particular, style guidelines from each of the presses are included. These guidelines establish the house style of the publisher and govern editorial, graphic design, illustration, composition and typesetting decisions. Perhaps the most well-known style guideline for scholarly documents is the University of Chicago Manual of Style, which was referenced by several presses in this experiment although most have their own refinements and special instructions.
An important aspect of the book production process is the parallelism achieved. When a manuscript arrives at the press for consideration, it is quickly copied and sent out for two or more independent reviews. Once the decision to publish is made and the completed manuscript arrives, it is again copied and a copy sent to the production editor who establishes the docket to track all the various stages of the publication, to the copy editor for editorial revisions, and to the graphic designer to design the book and handle the illustrations. Figure 2-1 shows this parallel process in a simplified and hypothetical publication process.
--------------------
ThesisFigure2-1.press
leftMargin: 150 pt, topMargin: 172 pt, width: 324 pt, height: 234 pt
Figure 2-1. Parallelism in a simplification of the traditional graphic arts process for publishing a manuscript. The author's manuscript is copied and sent to the production editor, the copy editor, and the design/illustration department. Edited pages along with the design of the document are typeset by the composition staff. The typeset manuscript and the illustrations are then assembled into pages in preparation for printing.
--------------------
Other parts of the document publication process also involve parallelism. If the book is to have a jacket or cover illustration, then that illustration is undertaken while the insides of the book are prepared. The table of contents and Library of Congress submission are prepared as soon as the book enters production to ensure that the imprint page and the front matter of the book can be ready for printing.
The index is often on the critical path near the end of the document production cycle. Since index entries must have the correct page numbers, the index cannot be fully completed until the pages have been assembled. Typically the index entries are compiled in parallel with the book composition. When the page numbers are assigned on the reproduction pages (or page repros), the index manuscript is completed in parallel with the final proofreading of the book pages.
Even with the use of electronic composition tools, preparation of the back matter is on the critical path and may end up with inconsistent page numbering. Such problems appear in the appendices of Introduction to Computer Graphics [Newman & Sproull], where the reference citations refer to a preliminary draft version since the authors forgot to revise the references in the appendices.
An important question to ask about the traditional graphic arts process is `What are the difficult parts and how are they handled?' Typically the difficult parts involve tables, mathematical or chemical notation, and illustrations.
There were only a small number of tables in the One Book/Five Ways experiment but they were treated separately from the main body of text. Many publishers rely on the skill of the compositor or typesetter to handle tables:
"A good composing room can translate almost any tabular copy in a reasonably clear and presentable example of tabular composition" [Williamson, Methods of Book Design, p 160]
The Chicago Manual of Style guides authors about the do's and don'ts of preparing tables in manuscripts. In particular, authors are to prepare tables on separate pages since they will be composed separately. There are some cautions also. For instance, Chicago no longer prefers vertical rules in tables because hot metal Monotype composition, which could prepare a vertical rule easily, is no longer economic, and with phototypesetters vertical rules are difficult and expensive:
"In line with a nearly universal trend among scholarly and commercial publishers, the University of Chicago Press has given up vertical rules as a standard feature of tables in the books and journals that it publishes. The handwork necessitated by including vertical rules is costly no matter what mode of composition is used, and in the Press's view the expense of it can no longer be justified by the additional refinement it brings." [The Chicago Manual of Style, 13th Edition, 1982, p 325-6]
Although there was no mathematics in this experiment, math notation is treated very specially by publishers. Kernighan and Cherry note this difficulty in their paper on typesetting mathematics [Kernighan & Cherry], quoting The Chicago Manual of Style:
"Mathematics is known in the trade as difficult, or penalty, copy because it is slower, more difficult and more expensive to set in type than any other kind of copy normally occurring in books and journals." [The Chicago Manual of Style, 12th edition, 1969, p 295]
Some publishers specialize in mathematical and scientific documents and have both skilled copy editors on their staffs and special suppliers to handle the difficult mathematical material. In my experience, I have encountered North American publishers that send mathematics copy to the Far East where hot metal composition and cheap labour rates prevail.
The treatment of illustrations varied widely in the publishing experiment described in One Book/Five Ways. In one instance a publisher chose to have an artist prepare line drawings rather than include halftone photographs since there were no convenient local suppliers. Another publisher in contrast planned photographs for each chapter opening as well as for the illustrations. Generally, illustrations are prepared while the book is being copy edited and are manually assembled onto the completed pages.
Examining the book design and page layout used by most publishers reveals mainly the results rather than the design process itself. Page dummies and sample pages are the usual products of the design process. Page dummies are sketches of the page layouts prepared by the graphic designer for approval. Sample pages are pages typeset and assembled by the composition supplier. Both techniques may require several iterations between designer, supplier, and publisher to make certain that the publisher is satisfied and that all the style guidelines are followed. Unfortunately, this iterative design process generally means that the publisher's guidelines have never been complete, to the frustration of those attempting to become a supplier with new technology.
An area of great concern to the publisher is administration of the production process. Publishers must have several projects underway at the same time because of the delays involving revisions and approvals from the author of a single project. The production editor controls the document publication process for the publisher, determining time and cost estimates for the publication, selecting and contracting with suppliers, tracking the parallel stages of the composition process, and keeping records of deadlines and expenses. In a journal publishing situation, the problem is compounded by the dual pressures of multiple authors and the frequent publication deadlines for each issue.
These process control functions are the most important contributions of publishers. Some publishing companies are little more than the production editors (except perhaps for marketing), subcontracting most of the skilled jobs such as copy editing, design, illustration, composition, printing, and so on. In the electronic publishing or self-publishing process these subcontracted jobs are conducted by the manuscript author and are the jobs that electronic document production tools will have to successfully complete.
¶2.1.2 Roles involved in producing a book
As I have outlined, the document production process is complex. To help understand the process better, this section examines the individual roles of the people involved in producing a published document. Anthropomorphism, or the application of human behaviour to some problem, has proven beneficial in trying to understand complex parallel processes that are modelled by computer programs [Dyment] [Booth & Gentleman]. The author of this thesis used anthropomorphism to design the multiple processes of a complex interactive paint program [Beach]. Through cataloguing the roles involved in document production, the structure of the problem becomes apparent and targets of opportunity reveal themselves in the context of electronic or automated document production environments.
--------------------
ThesisFigure2-2.press
leftMargin: 78 pt, topMargin: 156 pt, width: 372 pt, height: 222 pt
Figure 2-2. A hypothetical publishing process indicating the roles and their interactions at various stages. The horizontal axis represents elapsed time and the thin vertical lines join activities that begin or end at the corresponding point in time. Delays or inactivity are not shown but may exist at many places in the process.
--------------------
An important thing to remember while reading this categorization of roles is that they describe activities and not people. Sometimes people will fulfill several roles at once, such as an author who types and composes the manuscript, or a graphic designer who does the layout, illustration, and paste-up. The use of document composition tools in universities and research labs has tended to encourage (or force) authors to take on multiple roles. From this experience, people may falsely conclude that each job is easier than it really is; the full scope of a specialist's job becomes clear only when all aspects of that job are considered. We need to concentrate on each role separately in order to understand the process.
· Author of the manuscript
The author creates the original manuscript. Generally, the manuscript is textual material, although for some subject areas there will be vast quantities of mathematical notation, computer programs, tables, line drawings or photographs. The author may produce several draft manuscripts with the assistance of a typist. Some authors now do their own typing with word processors or text editors. Sophisticated editorial tools, such as the diction and writing style analysis tools offered in the UNIX Writer's Workbench [Cherry, Writing Tools] [Macdonald, Writer's Workbench] and in other commercial editing systems [Seybold report], may be used by an author to improve the quality of the writing.
A draft manuscript is submitted by an author to an acquisitions editor or journal editor for consideration. With a favourable publishing decision, the author completes the manuscript and adds the front matter, which may include a preface, an introduction, acknowledgements, etc. If the document is to be indexed or have other reference material, the author may have to prepare this material also. The completed manuscript is sent to the production editor who begins the publication process. Some publishers will now accept manuscript submission in electronic form, such as word processor diskettes or magnetic tape.
The author may be involved in reviewing decisions made by the publisher. The copy editor will mark the manuscript with suggested changes and questions to be dealt with by the author. The graphic designer or illustrator may send drafts of the book design and illustration artwork for review and approval. There may also be an indexer involved who may send the preliminary index entries to the author for review. The author will also have to check the composition process by first looking at the galleys and later proofs of the assembled pages.
· Typist
The typist prepares the draft manuscript for the author using a typewriter, word processor or text editor program.
· Acquisitions Editor or Journal Editor
The acquisitions editor solicits and reviews new manuscripts from authors. Opinions of reviewers are sought to determine if the manuscript should be published. The publishing decision is made by a publication board or a committee of journal editors and is concluded by the signing of a publication contract or agreement between the publisher and the author.
· Reviewer or Referee
A manuscript reviewer may be asked by a publisher to give one of several opinions. Book publishers refer to these people as reviewers and journal editors refer to them as referees. Reviews made early in the process seek to establish the marketability of a manuscript or the appropriateness of a journal article. Later more comprehensive reviews seek to assess the subject coverage, research contributions, and technical accuracy of the manuscript. Reviewers are generally most concerned with the document content, although in some special cases they may also consider the format or style of a manuscript.
Some reviewers of technical material may use their own typesetting capabilities to capture their comments in the complex notation of their subject area, such as mathematics or computer programming. In some cases, the reviews may be transmitted electronically, especially when journal editors correspond via electronic mail networks.
· Production Editor
The production editor controls the document production process that turns a complete manuscript into its published form. Initially the production editor deals with the author to ensure that the manuscript is complete, that all the necessary illustrations are available, that all the sections of the manuscript are included, and that any special permissions to reproduce items are sought. The completed manuscript is copied and sent in parallel to the copy editor for editorial revisions and to the graphic designer for book design and illustration. Production editors contact and select appropriate suppliers for graphic arts services when those services are not available within the publisher. To help manage and track the various stages of several publications going on simultaneously, the production editor maintains a production database, either a paper one on the job docket or a computer one.
· Graphic Designer
The graphic designer provides the book design and layout guidelines. This design can only be effectively done when the entire manuscript is available, although some designs are attempted with incomplete information and later revised during publication. These design guidelines are written on a specification sheet or in a style sheet to be sent to the compositor with the copy edited manuscript.
Graphic designers may handle difficult typographic situations not covered in the general scheme, such as designing the layout for tables, and specifying typography for nested lists of material or for foreign language extracts.
Artwork for the illustrations may or may not be the additional responsibility of a graphic designer, depending on their talent or interest. Jacket or cover designs may also be the graphic designer's responsibility.
· Copy Editor
The copy editor ensures that the manuscript meets the publisher's house style for language usage, grammar, spelling, citations, references, illustration captions, table arrangements, headings, lists of items, foreign language, et cetera, ad nauseam. The copy editor essentially deals with all the troublesome details that would annoy the reader if they were not treated consistently. For example, the copy editor checks all the cross references to other chapter or section numbers for completeness, and checks that all the captions, footnotes, and citations are numbered sequentially. Any missing information or references and any questionable corrections are sent to the author for action.
Obviously electronic editing tools greatly assist the copy editor to accomplish these consistency checks. Multiple windows on a manuscript are helpful in checking cross references; pattern-matching search operations permit quick global checks; style and diction analysis tools mentioned earlier may be of assistance to check the grammar, spelling and language usage criteria of the publisher.
The copy editor also marks the manuscript for composition by identifying the logical parts of the document, such as the chapter openings, the various levels of section headings, the types of lists of items, the captions for tables and illustrations, etc. The treatment of these logical parts is the responsibility of the graphic designer who specifies the typography for each of these parts to the compositor or typesetting program.
· Indexer
The indexer prepares the index entries for a manuscript, assigns page or reference numbers to each entry, sorts them, and creates an index manuscript. The indexing job may or may not be done by the author, although the author usually must approve the index manuscript. The indexer works with the manuscript in two stages: the copy edited manuscript prior to composition to determine the index entries, and the page proofs to assign the correct page numbers to the sorted index entries. The requirement for correct page numbers places the index on the critical path for publication and sometimes publications will not have an index to reduce the delay.
Electronic aids for indexing have not proven to be a panacea. Winograd and Paxton report the most interesting indexing tools that I have found [Winograd & Paxton] and yet they still required hand editing and fine tuning. The difficulty of preparing an index is the proper selection and cross referencing of index entry terms or phrases. Skilled indexers still produce better indices than most computer-generated ones. My experience in producing the indices for the Computing texts [WATFIV-S, PASCAL] proved the value of an iterative approach. Index terms were highlighted in the draft manuscript, entered into the manuscript files, and collected automatically by the formatting program. The collected entries were sorted with a keyword-in-context package, and from the expanded list the indexers prepared cross references and new index phrases. Sorting index entries is straightforward, assuming the availability of a dictionary-order sorting package and the ability to handle multiple levels of major and minor index phrases. Assigning page numbers at the last composition pass is reasonable, although this requires that all the index entries are entered into the manuscript file, a tedious job not tolerable under tight time constraints.
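The sorting and page-assignment steps described above can be sketched in a few lines of Python. This is only a minimal illustration and not the formatting program used for the Computing texts: the function names dictionary_key and build_index, the entry format, and the layout of the output lines are all assumptions made for the example. Dictionary order is approximated by ignoring case, spaces and punctuation, and a major phrase is assumed to be spelled identically wherever it occurs.

```python
from collections import defaultdict


def dictionary_key(phrase):
    """Approximate dictionary order: ignore case, spaces, and punctuation."""
    return "".join(ch for ch in phrase.lower() if ch.isalnum())


def build_index(entries):
    """Turn (major, minor, page) entries into sorted index lines.

    `minor` is None for an entry that has no minor phrase.
    Page numbers for repeated entries are merged and sorted.
    """
    pages = defaultdict(set)
    for major, minor, page in entries:
        pages[(major, minor)].add(page)

    def sort_key(item):
        (major, minor), _ = item
        # Major phrase first; a bare major entry sorts before its minor phrases.
        return (dictionary_key(major), minor is not None,
                dictionary_key(minor or ""))

    lines = []
    previous_major = None
    for (major, minor), numbers in sorted(pages.items(), key=sort_key):
        refs = ", ".join(str(n) for n in sorted(numbers))
        if minor is None:
            lines.append(f"{major}, {refs}")
        else:
            if major != previous_major:
                # Major phrase with no page references of its own.
                lines.append(major)
            lines.append(f"    {minor}, {refs}")
        previous_major = major
    return lines


# Hypothetical entries collected from a manuscript file (page numbers invented).
sample = [
    ("tables", "vertical rules", 12),
    ("index", None, 30),
    ("tables", None, 10),
    ("index", None, 7),
]
for line in build_index(sample):
    print(line)
```

Because the page numbers are carried with each entry, the same routine can be rerun unchanged after the final composition pass, which is the step that places the index on the critical path.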
· Illustrator, Draftsman, Graphic Artist
The illustrations are prepared from initial artwork provided by the author. The range of illustrations spans fine art produced by a graphic artist, engineering drawings prepared by a draftsman, and photographs supplied by the author or a photographic service. Often illustrations are produced by tracing the author's sketches, which results in revision cycles as the author more clearly indicates the correct intentions. [Scientific American selects illustrators by an iterative process; difficulty getting consistent and accurate illustrations for the Computing book led to the graphical style research]
The graphic designer may produce the illustration artwork directly or may establish the guidelines for artwork size, reduction factors, line weight, typography, shading textures, materials, and so on. Reducing the original artwork improves the quality of the line drawings by making the line weights appear more consistent or by sharpening the contrast in the image. Careful coordination of dimensions and text size on the original is necessary to ensure that the reduced artwork suits the surrounding typography when assembled on the page.
· Keyboarder, Coder, Inputter
The composition of a document is accomplished in two stages: entering the marked-up manuscript into a typesettable file, and then producing the type on a typesetting device. The manuscript entry job may be further subdivided into several phases: designing format codes, assigning codes to the copy editor's marks, and inputting the manuscript codes and text. All of these jobs may be accomplished by the same person or may be delegated to people with appropriate skills.
The typesettable files may be entered directly on some less expensive and slow typesetting devices, or kept on some storage medium, perhaps paper tape, floppy diskettes, rigid disks or magnetic tape, for more expensive and high-speed typesetters. Corrections to the typeset galleys are most often made by typesetting corrected pieces of the manuscript. In the case of large documents, the management of the corrections is a concern and poses difficulties for subsequent uses of the document, but the expense and delay of retypesetting the large document is often prohibitive.
· Compositor, Typesetter
The format codes for a document are created by a skilled compositor from the graphic designer's guidelines and the copy editor's marked-up manuscript. Typically there is one format code for each logical part of the document marked by the copy editor. For example, there might be a code for the chapter opening, for each level of section heading, to begin an indented list of items, and for a line of a table. The compositor must have the skill to enter specific typographic codes for unusual or difficult typesetting jobs, such as for mathematics, tables, illustration labels, copy fit text that must fit certain dimensions, and so on. The compositor runs the typesettable file through the typesetting device and produces the typeset galleys or pages.
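The correspondence between the copy editor's marks and the compositor's format codes can be pictured as a lookup table. The Python sketch below is purely illustrative: the mark names, the typographic parameters, and the bracketed code syntax are invented for this example and do not correspond to any particular typesetting device.

```python
# Hypothetical type specification: one entry per logical part
# marked by the copy editor. All values are invented.
TYPE_SPEC = {
    "chapter-opening": {"face": "Times Bold", "size": 18, "leading": 22},
    "heading-1": {"face": "Times Bold", "size": 12, "leading": 14},
    "list-item": {"face": "Times Roman", "size": 10, "leading": 12},
}


def format_code(mark):
    """Build the (made-up) typographic code for one logical part."""
    spec = TYPE_SPEC[mark]  # an unknown mark raises KeyError
    return f"[face={spec['face']};size={spec['size']};lead={spec['leading']}]"


def encode(marked_manuscript):
    """Turn (mark, text) pairs into lines of a typesettable file."""
    return [format_code(mark) + text for mark, text in marked_manuscript]


manuscript = [
    ("chapter-opening", "Document Composition"),
    ("heading-1", "Roles involved in producing a book"),
]
for line in encode(manuscript):
    print(line)
```

A mark with no entry in the table raises an error in this sketch, which corresponds to the unusual or difficult cases where the compositor must enter specific typographic codes by hand.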
· Paste-up Artist
Most documents are typeset in galley form and later cut and pasted into page assemblies. The paste-up artist collects all the pieces of the manuscript in their final form: the typeset text, the running heads and page numbers, the mechanical artwork for the illustrations, and all the photographs. Pages are assembled by cutting out the parts that will be placed on each individual page and pasting them onto page layout forms. These layout forms are typically printed with light blue lines that will not reproduce when photographic negatives for printing are taken.
In foundry type, the assembly process involved moving metal type slugs into place and performing craft operations like surrounding type slugs with furniture to provide the spacing for page layout, or kerning individual letter slugs by cutting off the corners to make them fit together better. In phototypesetting shops, the paste-up process requires a sharp knife and a waxing machine that makes the photopaper adhere to the layout forms.
The graphic designer may paste up a document, especially if the manuscript requires frequent design decisions. In such cases it is difficult to determine the rules and logic that were applied to accomplish some of these layouts.
· Process Camera Operator, Stripper
After the page assembly stage, the completed pages are ready for printing. Depending on the printing process, it may be necessary to use a large-format graphic arts process camera to prepare photographic negatives of each page. The negatives are in turn used to expose printing plates. Text and line art illustrations are photographed directly on very high contrast film, whereas photographs are screened or halftoned to provide the tonal variations on the high contrast film. If the printer is capable of printing several pages in one pass, then the stripper must prepare an imposition of several pages into one printing signature.
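The stripper's imposition task can be illustrated for the simplest possible case. The Python sketch below assumes a signature made of sheets that are each folded once and nested one inside the next, so that every sheet carries four pages; signatures with more pages per sheet use more elaborate layouts that are not covered here. The function name impose_nested_folios is an invention for this example.

```python
def impose_nested_folios(page_count):
    """Return the page pairs printed on each sheet of a nested signature.

    Each sheet is folded once and carries four pages. The result lists
    sheets from outermost to innermost as ((front_left, front_right),
    (back_left, back_right)).
    """
    if page_count <= 0 or page_count % 4 != 0:
        raise ValueError("page count must be a positive multiple of 4")
    sheets = []
    for k in range(page_count // 4):
        front = (page_count - 2 * k, 2 * k + 1)
        back = (2 * k + 2, page_count - 2 * k - 1)
        sheets.append((front, back))
    return sheets


for front, back in impose_nested_folios(8):
    print("front:", front, "back:", back)
```

For an eight-page signature the outer sheet carries pages 8 and 1 on one side and 2 and 7 on the other, and the inner sheet carries 6 and 3, then 4 and 5, so that the pages read in sequence once the sheets are folded and nested.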
The graphic arts process of producing printing plates from master images is equivalent to the concept of rendering device-independent image masters like Interpress from Xerox [Interpress] and PostScript from Adobe Systems [PostScript].
· Printer
Printing processes vary depending on the number of copies or impressions required. Short-run printing, say up to 50 copies, can be printed cost-effectively with a Xerox copier from a paper original. Medium-run printing, say from 50 to 1,000 copies, can be printed with an offset duplicator using an inexpensive paper-based printing plate. Long-run printing, say from 1,000 to 10,000 copies, is generally done with high-speed offset printing presses in signatures containing several pages and using metal printing plates.
If the document requires colour, then there must be separate impressions made for each printing ink colour. These colour separations may be prepared by an outside supplier working from a slide transparency of the coloured image, or the separations may be made by the process camera operator from colour-keyed parts of the original document.
· Binder
The printed pages must be collated and bound together to form a completed document. The bindery specializes in taking the bulk pages, possibly in signature form, folding them, collating them together in the correct sequence, sewing or otherwise fastening the pages together, and trimming the pages to finished size. The cover, whether a cloth-covered hard-cardboard case or a strong paper back, is attached around the document. Any printing on the cover or jacket will be designed and printed in time for binding.
¶2.2 Concept of Style
It is important to observe that there are an incredible number of choices of design parameters that go into producing a document. How do these people make the choices? What controls their choices? How are their choices communicated when they are made?
¶2.2.1 Style as Design Choices
Many design choices are involved in the process of producing a document. For example, the copy editor chooses names for the logical parts of the document and communicates them to the graphic designer and compositor on the marked-up manuscript. The graphic designer chooses the typographical parameters for these marked parts of the manuscript and communicates them to the compositor on type-specification sheets. The compositor acts on the mark-up codes with the type specifications and enters typographical formatting codes in the typesettable file.
All of these choices influence the publishing style of the organization. The dictionary definitions of `style' and `style book' help clarify what style means and how it can be used:
"style n. 1. The way something is said or done, as distinguished from its substance. ... 7. A customary manner of presenting printed material, including usage, punctuation, spelling, typography, and arrangement." [American Heritage Dictionary]
"style book n. 1. A book giving rules and examples of usage, punctuation, and typography, used in the preparation of copy for publication." [American Heritage Dictionary]
Each publisher develops its own house style, a way of doing things that will provide a distinctive flavour to documents from that publisher. Perhaps one of the most well known of these style books is The Chicago Manual of Style [The Chicago Manual of Style] from the University of Chicago Press, although most publishers have their own variation. In the publishing experiment described in One Book/Five Ways, the University of Toronto Press provided the most concise set of composition style guidelines, which covered the following topics:
text composition: word spacing, word division, letterspacing, paragraphs, leading, small capitals, figures (numerals).
punctuation: dashes, periods, apostrophes, colons, semi-colons, exclamations, question marks, ellipsis, quotations.
special settings: capitals, tables (avoid vertical rules like Chicago), footnotes, extracts, quotations.
page makeup: facing pages, widows.
Who contributes to the publisher's distinctive style? The editorial staff establishes the guidelines for authors and copy editors, such as recommended forms of presentation, spelling and language usage, or the avoidance of vertical rules in tables. Graphic designers have a direct impact on the publisher's style by selecting the typography for book designs. The composition staff frequently determine the final typesetting choices through an iterative process whereby sample pages are modified and approved by the publisher. My experience in providing sample pages to a publisher revealed how significant this iterative design process was; many of the guidelines are left unwritten since other compositors have satisfied the publisher without specifying all the details. This situation existed because ``it was the way it was always done,'' or because their equipment never gave them that choice to make. Authors may affect the style of their documents by writing in a particular way for certain audiences, such as for management versus technical readers, or by organizing the manuscript and illustrations in a particular manner.
¶2.2.2 What Do Styles Affect?
Style may seem to affect or control more than just the appearance of a document. For instance, consider the choice between Canadian and American spelling, something that might be treated as a style choice. Clearly different spellings contain different letters, such as in words like colour vs color, labelling vs labeling, or the same letters in a different order, such as in words like centre vs center. How can the concept of style accommodate these apparent changes in substance?
We need to realize that style can accomplish changes at many different levels. The change in spelling does not change the meaning of the sentence containing those words; the substance therefore remains constant while the spelling varies. In fact, many Canadian and American readers easily pass over these different spellings. The style may have changed the characters but not the semantics.
Extending the language processing trichotomy of lexical, syntactic, and semantic analysis, style can be seen to affect primarily the first two stages of analysis. Style at the lexical level affects the token's appearance, such as the choice of spelling of the concept `to place something midway between two limits' as either `centre' or `center.' More common lexical style changes are the use of distinctive typefaces for section headings, the inclusion of whitespace above and below section headings, etc. In fact, most typographic parameters fall into this lexical category of style.
Style at the syntactic level affects the order of information in the document. One example is the order of names in a bibliographic citation; one style places the surname before initials, while another style places initials before the surname. Another syntactic style example is the placement of parts of a document during page layout, such as locating figures at the top or bottom of a page and collecting all footnotes at the bottom of each column.
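The difference between lexical and syntactic style can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the style tables, the Name record, and the function names are hypothetical and are not taken from any system discussed in this chapter. The lexical style changes how a token is spelled; the syntactic style changes the order in which the same parts of a name appear in a citation.

```python
from dataclasses import dataclass

# Hypothetical lexical style tables: each maps a word to the spelling
# preferred by that style. Words not listed are left unchanged.
LEXICAL_STYLES = {
    "canadian": {"color": "colour", "center": "centre", "labeling": "labelling"},
    "american": {"color": "color", "center": "center", "labeling": "labeling"},
}


@dataclass
class Name:
    initials: str
    surname: str


def spell(word: str, style: str) -> str:
    """Lexical style: change the token's appearance, not its meaning."""
    return LEXICAL_STYLES[style].get(word, word)


def cite(name: Name, order: str) -> str:
    """Syntactic style: present the same information in a different order."""
    if order == "surname-first":
        return f"{name.surname}, {name.initials}"
    return f"{name.initials} {name.surname}"
```

In this sketch, changing the style argument never alters the underlying word or name, which is the sense in which the substance stays constant while the style varies.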
Style is possible at the semantic level by providing different readers with different views of the document. For instance, a document on how to use the Cedar mail system on a new kind of file server [van Leunen] was prepared for readers with different backgrounds. The document was written as modules of information for one of three kinds of audiences: those who had never used the mail system, those who had used the mail system but stored their files locally, and those who had used the mail system and had some experience with the new file server. A text map was used to compile three versions of the document from the various modules. However, three styles could have been used to let each type of reader view the appropriate set of modules.
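The idea of compiling several versions of a document from shared modules can be sketched as follows. This is a minimal illustration, not a description of the actual text map used for the mail system document: the module names, module contents, and audience labels are invented placeholders.

```python
# Hypothetical modules of information; the contents are placeholders.
MODULES = {
    "mail-basics": "How to read and send mail.",
    "server-files": "How files are stored on the file server.",
    "moving-mail": "How to move local mail files to the file server.",
}

# Hypothetical text map: for each audience, the modules to include, in order.
TEXT_MAP = {
    "never-used-mail": ["mail-basics", "server-files"],
    "local-files": ["server-files", "moving-mail"],
    "server-experience": ["moving-mail"],
}


def compile_version(audience: str) -> list:
    """Assemble one version of the document for the given audience."""
    return [MODULES[name] for name in TEXT_MAP[audience]]
```

A style-based approach would keep the same table but apply it when the document is viewed rather than when it is compiled.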
¶2.2.3 Media-Specific Styles
Another dimension for style is the differentiation in media. Traditional printing processes provide some variation in colours and papers, but other reproduction technology and electronic documents span a broader range of devices. Documents that become projection slides, posters, or video displays represent some of these possibilities.
The device independence notion from computer graphics can be carried over into document formatting. The survey article on document formatting by Furuta et al. presents the notion of a `view' of a document being a device-independent post-processing of a formatted document for a particular device. However, the device capabilities may influence the appearance and readability of the information in the document. In this case, device independence is less desirable; rather we wish to reformat the document to take advantage of the device characteristics or to change the style to suit the media in which the information will be presented.
Low-resolution devices without colour must obviously use different techniques than high-resolution colour laser printers. Type families are hard to distinguish on low-resolution devices; 8 point Times Roman is difficult to distinguish from any other serif typeface, such as Garamond or Baskerville, because there are so few bits available to display the subtle differences. A colour image may lose much of the information when viewed in black and white, especially on low-resolution devices that display only a few grey levels.
In these cases, style must be capable of capturing the device specifics of the range of possible devices. Cargill presents several interesting ideas for managing different views of software [Cargill] that apply to the management of different styles of documents. More about document models and views is discussed later in this chapter. [what about Engelbart's notion of views? to be determined...]
¶2.3 Early Typesetting Systems
The early use of computers in graphic arts typesetting systems has been chronicled in several interesting books. One report of a computer composition system is Barnett's Computer Typesetting [Barnett] that describes his work at MIT in the early 1960's. Arthur Phillips' compendium Computer Peripherals and Typesetting [Phillips] describes the computing and typesetting technologies that were being applied in the graphic arts industry up to the late 1970's. Seybold's classic book, Fundamentals of Modern Photocomposition [Seybold], includes both surveys of the first three photocomposition generations and the state of document preparation systems, as well as his seminal thoughts on the problems of area composition (page layout), computer-generated halftones, and integrated system solutions. Phillips' later book, Handbook of Computer-Aided Composition [Phillips], describes the evolution of electronic tools in the publishing and printing industries. Berg's Electronic Composition [Berg] provides a complete assessment of the issues in composition systems, much in the style of a consultant's report on the options available and pitfalls to avoid.
Most of the early graphic arts systems used rather small resources and simple approaches to the rather complex problem of producing typeset documents: Barnett used the IBM 709 at MIT; Seybold describes the composition software run on an IBM 1130; and the first stand-alone typesetter at Waterloo, a Photon 737 Econosetter, had only a 4K program memory. These computer programs accepted typographic codes that mimicked the manual actions of a typographer using a hot-metal type-casting machine. The coding structure intermixed the action codes with the text character codes, and due to the use of shift-codes, super-shift codes, and even upper- and lower-rail shift codes, the text was often inscrutable for editing purposes.
Table formatting was an early application of computers in typesetting. The earliest such publication that I am aware of was the 1962 NBS Monograph 53, Experimental Transition Probabilities for Spectral Lines of Seventy Elements, by Corliss and Bozman [Corliss and Bozman, 1962]. Since computers were generating numeric data and since typesetting equipment was being driven from magnetic tape, it was natural to combine the two. This monograph contained only a single table and the table formatting was accomplished by a special purpose program.
Another class of document composition systems evolved from the text formatting programs developed on general purpose computer systems. The evolution of such formatters from Saltzer's RUNOFF document formatter [Saltzer] is chronicled in Brader's thesis, An Incremental Formatter [Brader], and in the Computing Surveys article by Furuta et al., ``Document Formatting Systems'' [Furuta, Schofield and Shaw]. Documents for such formatters were presented as a stream of characters that included embedded control codes. The earliest systems used the period character at the beginning of a line of input, an unlikely occurrence in normal written material, to indicate the presence of a formatting command. Later systems escaped from the `line of input per command' restriction by designating command delimiters from infrequently used characters like braces, backslashes or at-signs. Macro and conditional execution facilities for commands extended the range of document formatting possibilities. One tenet of documentation folklore at that time was that if you could make writing a document more like programming, then programmers would take the time to prepare documentation for their work, something which proved difficult to ensure.
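The convention of marking a command with a period at the beginning of an input line can be sketched in a few lines of Python. This is an illustration only, not the parser of RUNOFF or of any other formatter: the function name and the shape of the returned items are assumptions, and any command names a caller passes in are treated as opaque.

```python
def parse_stream(text: str) -> list:
    """Split an input stream into command items and text items.

    A line beginning with a period is treated as a command: the first word
    after the period is the command name and the remaining words are its
    arguments. Every other line is ordinary text.
    """
    items = []
    for line in text.splitlines():
        if line.startswith("."):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            items.append(("command", name, parts[1:]))
        else:
            items.append(("text", line))
    return items
```

The sketch also shows the weakness of the convention: a text line that legitimately begins with a period would be mistaken for a command, which is one reason later systems chose other delimiters.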
The model of a document as a stream of text with embedded commands survives today as a prevalent document formatting model. One consequence of the stream document model in both the early graphic arts systems and the early document formatters is the need to accept the document stream as an abstraction of the formatted document. One early system provided an alternative document model and several alternative views of the document.
The editing and formatting part of Engelbart's augmented human intellect system, NLS [Engelbart], provided a concrete view of formatted documents as they would appear when printed, without the intrusion of formatting commands. The NLS system was the original `what you see is what you get' document formatting system and Engelbart coined the phrase WYSIWYG (pronounced whizy-wig) to describe it. Note that due to the limitations of the display and printing devices, NLS was exactly a WYSIWYG system, unlike later systems that also claim to be WYSIWYG but cannot claim to render printed output exactly on the display.
In a further departure from the stream of text and embedded commands model, the NLS system represented the document contents in a tree-structured hierarchy of text blocks, such as the common hierarchy of chapters, sections, subsections, and paragraphs. The reader of the on-line document could display one of several views. For example, one viewing parameter controlled whether the structure labelling was visible or not, another parameter controlled the number of hierarchy levels displayed, and yet another controlled the number of lines displayed of each text block. NLS could also incorporate line drawings within documents by allowing a graphical object to take the place of a paragraph.
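A tree of text blocks with view parameters of this kind can be sketched as follows. The sketch is not based on the NLS implementation: the Block class, the view function, and the decimal label format are all hypothetical, chosen only to show how one stored structure can yield several views.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    lines: List[str]
    children: List["Block"] = field(default_factory=list)


def view(block, max_levels, max_lines, show_labels=True, label="1", level=1):
    """Return the display lines for one view of the tree.

    max_levels limits how deep the hierarchy is shown, max_lines limits how
    many lines of each block are shown, and show_labels controls whether a
    structure label precedes the first line of each block.
    """
    if level > max_levels:
        return []
    out = []
    for i, line in enumerate(block.lines[:max_lines]):
        prefix = f"{label} " if (show_labels and i == 0) else ""
        out.append("  " * (level - 1) + prefix + line)
    for n, child in enumerate(block.children, start=1):
        out.extend(view(child, max_levels, max_lines, show_labels,
                        f"{label}.{n}", level + 1))
    return out
```

Calling view with two levels and one line per block gives an outline; calling it with all levels and many lines gives the full text, without any change to the stored document.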
Many early graphic arts typesetting systems did not attempt to deal with page layout but only produced typeset galleys to be pasted up manually in the normal process. However, the RUNOFF-style formatters provided some limited page breaking capabilities and they could print running heads and footnotes. Such formatters relied on the simple and easily-handled dimensions of fixed-width characters on a line printer or teletype page to make the algorithms workable.
Typesetting document formatters that extended the RUNOFF model to produce output for typesetters paginated documents by executing page breaking algorithms coded as macros. Some early typesetting work with PROFF [PROFF], a RUNOFF-like formatter for the University of Waterloo's Photon Econosetter, used simple page depth measurements to break large documents into pages. This was done mainly to avoid the manual paste-up stage due to a lack of available manpower to handle the number of pages produced. Seybold [Seybold] outlines many of the concerns for page layout or area composition addressed by commercial typesetting suppliers.
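Page breaking by simple depth measurement can be sketched as a greedy loop. The Python below is an assumption-laden illustration, not the PROFF macros: it takes the depth of each typeset line in points and starts a new page whenever the next line would exceed the page depth. It ignores widows, footnotes, and figures.

```python
def paginate(line_depths, page_depth):
    """Group line indices into pages using a running depth total.

    line_depths: depth of each typeset line, for example in points.
    page_depth: maximum total depth available on a page.
    A line deeper than the page is placed alone on its own page.
    """
    pages, current, used = [], [], 0.0
    for i, depth in enumerate(line_depths):
        if current and used + depth > page_depth:
            pages.append(current)
            current, used = [], 0.0
        current.append(i)
        used += depth
    if current:
        pages.append(current)
    return pages
```

Because each decision is made as soon as a page fills, the loop needs no lookahead, which suits a one-pass macro implementation but cannot avoid a poor break later in the document.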
More complex typesetting systems for high speed typesetters, like the Page-1 composition language [Pierson] for the RCA Videocomp, required page breaking logic to better utilize the typesetter for very large documents. Page-1 is one of the few early composition systems with a widely available published description. A programmer would write a page breaking algorithm and style handling routines in the Page-1 language, have that compiled, and then execute the resulting composition program against the document input data.
¶2.4 Document compilers
A significant stage in the evolution of document formatters occurred when the embedded commands in the document began to describe the logical content of the document. Now one could specify that this part of a document was a heading by including a tag like .heading rather than a sequence of detailed commands like leave 24 points of whitespace, select Times Roman bold typeface, 14 point type size, unjustified line ending, and so on. Such a tag model requires a level of indirection to associate the detailed commands with the tag. Initially this association was provided by a macro processor treating the tag as a user-supplied macro, while later systems like Scribe provided a document formatting language to define how to handle the tag and a document compiler to produce the formatted document.
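The level of indirection between a tag and its detailed commands can be sketched as a table lookup. In the Python below, only the heading parameters of the first style (24 points of whitespace, Times Roman bold, 14 point, unjustified) come from the text above; the paragraph parameters, the second style, the manuscript content, and all names are hypothetical.

```python
# Style tables map a logical tag to detailed formatting parameters.
REPORT_STYLE = {
    "heading": {"space_above_pt": 24, "typeface": "Times Roman",
                "weight": "bold", "size_pt": 14, "justified": False},
    "paragraph": {"space_above_pt": 0, "typeface": "Times Roman",
                  "weight": "regular", "size_pt": 10, "justified": True},
}

# A second, invented style for the same tags.
MEMO_STYLE = {
    "heading": {"space_above_pt": 12, "typeface": "Times Roman",
                "weight": "bold", "size_pt": 12, "justified": False},
    "paragraph": {"space_above_pt": 0, "typeface": "Times Roman",
                  "weight": "regular", "size_pt": 10, "justified": False},
}

# The manuscript records only logical tags, never detailed commands.
MANUSCRIPT = [
    ("heading", "Document compilers"),
    ("paragraph", "A tag names the logical role of the text that follows."),
]


def compile_document(manuscript, style):
    """Resolve each logical tag to the parameters the style assigns to it."""
    formatted = []
    for tag, text in manuscript:
        if tag not in style:
            raise KeyError(f"style does not define tag {tag!r}")
        formatted.append({"text": text, **style[tag]})
    return formatted
```

Compiling MANUSCRIPT once with REPORT_STYLE and once with MEMO_STYLE yields two designs from an unchanged manuscript, which is the separation that the tag model makes possible.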
The use of document tags involved more processing than handling simple typesetting commands. As a consequence the development of document compilers was restricted to larger general purpose computer systems while commercial graphic arts systems lacked these features and remained on less expensive and smaller mini-computers.
This evolution of document compilers provided the means to accomplish a document style. Document design was separated from the coding of a manuscript to use the elements of that design. A document style could be designed once and shared among a set of documents, such as all the chapters of a book, all the theses written at a university, or all journal articles submitted to a particular journal. Then authors, who generally lacked the skills of document design and who were in much greater numbers, could produce good-looking documents by choosing and inserting the appropriate tags. While this strategy did not handle all the author's desires, it did extend the range of documents the author might produce.
The separation of design and coding provides important leverage to use the document content in different situations by changing the style definition. At Bell Laboratories while troff was being developed, a manuscript would often be published in three forms: first as an internal memorandum circulated within the lab, second as a technical report cleared for external review, and third as a journal article. A single set of macros could format the same manuscript for each of those three designs by substituting different style parameters. In fact, there were several journal styles developed for JACM, CACM and ACM conference paper formats [Steve Johnson, private communication].
However, the notion of a document compiler carries connotations of a massive undertaking. Indeed, problems with compiling monolithic documents occur frequently. Large documents often grow from smaller ones rather than being planned, resulting in large computing requirements, long turnaround, and delays in producing drafts of the formatted document. There is a tension between making the document out of smaller modules and managing the pieces. Even simple tasks like numbering the pages sequentially can be a problem with document compilers.
The use of a document compiler presents debugging problems similar to those of programming languages. Debugging tools are necessary. My first typeset pages produced on a high-speed typesetter were fifty pages with a single column 1.5 inches wide; the error was in the logic of my width specifications caused by unexpected interactions of my macros. Debugging documents involves syntax checkers, simulators of the final output device on less expensive or faster devices, and perhaps previewers to display the typeset document on an interactive graphics terminal. The complexity of writing document format designs in the language of the document compiler leads to the designation of gurus, experts and wizards.
Document compilers have not solved the entire document production problem. Some aspects of documents may be handled poorly or not at all, such as tabular composition, line drawings, and almost certainly raster images from scanned or continuous-tone originals. The lack of integration of all aspects of the document may require special handling or processing.
Final corrections and revisions are frequently done by manual cut and paste methods because reprocessing the document would take too long or create new problems. Certainly, some kinds of changes are much easier to make to a compiled document, such as correcting a chapter heading in one place and having that correction automatically affect the chapter opening, the table of contents, and the running heads for all the pages in the chapter. However, some algorithms in the document compiler may not handle all the situations or may not produce acceptable results in some situations. Avoiding widows, hyphenation problems, and rivers of whitespace may be things that are too difficult or costly to program and may have to be handled by intervention in the automatic techniques.
The following sections describe aspects of three document compilers, troff, Scribe, and TEX. Of special interest will be the way these systems handle style, mathematics composition, illustration, table formatting, and page layout.
¶2.4.1 troff
troff is a document formatting machine language
its strength is that document tools are implemented as preprocessors:
tbl for tables
eqn for mathematics notations
pic and ideal for line drawing illustrations
refer for bibliographic references
filter/pipe model, tools are executed sequentially, in one pass only
diagram of the model, determine the order
curious situation where document command is treated differently at different stages in the pipeline
.TS and .EQ are commands to tbl and eqn but simply macro names to troff
strength of the model is its simplicity
you can create your own toolets to solve difficult problems
style mechanism
macro packages provide style -ms and -me
two schemes
parameterize the macros to produce different styles, e.g. .RP and .TM produce different title pages from the same macros
replace different macro definitions to create different effects, e.g. JACM, CACM, and conference format sets of -ms macros or writing -ms macro definitions to produce a book design
mathematics composition
special language parser, generated from yacc productions, generates troff code; troff does the actual positioning and character output
uses boxes model, like Shaw's picture definition language, defines head and tail to have anchor points to talk about
knows no math concepts; you supply your own spacing because operators are context sensitive and language parser is context free
illustration
lack of style indirection
simple line drawings
table formatter
very comprehensive facility, can do most any table design
sometimes used to produce boxed illustrations [Rosenfield's GIIT paper]
awkward to provide table style, can't use macros since wrong stage of the pipeline
layout mechanism
diversions implied multiple streams
recombine diversions as big boxes
pipe model limitations
no recursion
no tables of tables
awkward to format text within illustrations
illustrations without tables if you want tables with illustrations
implementation limitations influence view of document formatting
troff is an old program, facelift when device independent ditroff
collision of two-character register names in macro packages and use by preprocessors
space limitations on ???
author built system to alleviate many of the shortcomings of troff
TYPE
macro programming language facilities, conditional execution
[Macros have similar syntax to TRAC or GPM]
register names real symbol table
data structures grow dynamically
math matrices and alignment
tables: no preprocessor, but more typographic facilities
pagination
list of users: publishers (PH, Reston, UT Press, UW Press, UW CS Dept, SIAM, SIGGRAPH, Honeywell, UW Solid Mechanics Division)
¶2.4.2 Scribe
Reid's thesis [Reid] and Unilogic manual [Scribe]
form vs content made explicit
possibly difficult to override separation
document compilation made explicit
database of style environments
global solution to document composition
provide for iterative solutions through symbol table
lack of preprocessors
no math, little table composition
good bibliographic stuff built into Scribe
check with Brian about tbl clone
¶2.4.3 TEX
Knuth made document formatting legitimate computer science topic [Knuth TeXBook, Digital Press book, AMS papers]
boxes and glue model and algebraic treatment of things
mathematics composition very important
special fonts
special operators that imbed math notation knowledge
general operations that work well for mathematics
global solution in one pass
notion of dynamic programming line breaking [Knuth & Plass]
dynamic programming pagination [Plass thesis]
can do everything although language is complex
penalties are indirect specification of desired results
example of leaders automatic folding of entries in the TeXBook
language promotes document formatting sublanguages
LaTeX [LaTeX reference?]
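The dynamic programming approach to line breaking noted above can be sketched in a greatly simplified form. The Python below is not the Knuth and Plass algorithm and not what TEX does: it has no glue, no hyphenation, and no penalties. It assumes fixed-width characters, charges each line the square of its unused width, and lets the last line go free. The function name and cost function are assumptions made for illustration.

```python
def break_lines(words, width):
    """Choose line breaks that minimize the total squared trailing space.

    The whole paragraph is considered at once, so an early line may be set
    short if that makes the later lines better.
    """
    n = len(words)
    inf = float("inf")
    best = [inf] * n + [0.0]   # best[i]: minimum cost of setting words[i:]
    nxt = [n] * (n + 1)        # nxt[i]: index of the word starting the next line
    for i in range(n - 1, -1, -1):
        length = -1            # cancels the space counted before the first word
        for j in range(i, n):
            length += len(words[j]) + 1
            if length > width:
                break
            cost = 0.0 if j == n - 1 else float((width - length) ** 2)
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    if n and best[0] == inf:
        raise ValueError("a word is wider than the line")
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines
```

For the words aaa, bb, cc, ddddd and a width of 6, a greedy first-fit breaker would set "aaa bb" and leave "cc" alone on a line; the sketch instead sets "aaa", "bb cc", "ddddd", trading a slightly loose first line for a much better second one.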
¶2.5 Integrated composition systems
Etude (became Interleaf)
[MIT Reports, Shaw, Brader]
Janus (became IBM product)
now IBM product
[IBM Systems Journal article]
Star
office documents major focus [Xerox reports on Star, Seybold Report]
integrated several classes of objects
property sheets vs style sheets (attribute specifications, lack of indirection or naming, lack of scoping)
interactive user interface
Tioga
interactive, structured documents, WYSIWYG for display or printer
limited typesetting services (no footnotes, no floating figures)
extensible: user interface, client objects (artwork, photographs, tables)
WYSIWYG or WYSIAlmostWYG or WYSIAllThatYG (anything else is too hard)
Kelly's comment: how do I change style here?
How to control all this complexity?
Style specifications
abstract the attributes and parameterize the algorithms
supply extensible specifications
future rule specifications could provide algorithms
--
¶2.6 Document models and views of documents (mumble ...)
flat vs structured
similar to batch versus integrated dichotomy
tree versus DAG
Engelbart NLS, Nelson Xanadu, van Dam Hypertext
distributed documents
data files
column order, sort row values to provide organization
my query formatter idea
rewriting rules for query matches to provide structure and formatting information
index generator
extract index entries and positions during formatting pass
Scribe symbol table approach
troff index file approach
my index tool for Tioga approach
node properties for index entry, additional properties for location and formatted location
operations on structured documents
replicate columns for finding things/viewing purposes [Phillips, Tabular Composition]
tick mark problem, adding finders (rules or space) every so many rows or entries [Malcolm, tick mark problem]
Cargill's notions of views
multiple views of information stored in one structure [Cargill]
providing redundant information (lister) for finding things (headers, contents, cross reference)
program visualization
Baecker & Marcus typesetting of C programs
other pretty printing examples
--
¶2.7 How to control complexity?
document production
laying out 2-dimensional information
how far can we push style mechanism?
to illustrations?
--