4 TABULAR COMPOSITION
====================

4.1 What is a table?

The previous chapter concentrated on illustrations and provided a style mechanism for the graphical information within documents. We now turn our attention to the problem of laying out information in two dimensions. Table formatting presents a concentrated form of this problem. It has a strong two-dimensional nature because tables are composed of entries arranged into rows and columns.

This chapter defines a general notion of a `table' and surveys early work in typesetting tables. It then describes the typography of tables and tabular composition. The final section reviews the capabilities for typesetting tables in existing electronic document composition systems. A good summary of tabular formatting in the graphic arts is contained in Phillips's article ``Tabular Composition'' published in The Seybold Report [Phillips, Tabular Composition].

A table is an orderly arrangement of information. Tables are defined to be `rectangular arrays exhibiting one or more characteristics of designated entities or categories' [, Dictionary]. Tables may be less structured than this, simply serving to present a list of entries. However, in most cases, tables have some structure that is relevant to the presentation of information.

We will take a fairly general view of tables, encompassing a broad range of layout possibilities. Within this view, the layout of mathematical notation might be considered a small instance of table formatting and page layout might be considered a large instance of table formatting. These comparisons will be elaborated in Chapter 6, where we will argue that the table formatting framework presented in Chapter 5 can be extended to deal with both of these other layout problems. The succinctness of a table aids in revealing and understanding complex relationships within the information.
``Tables offer a useful means of presenting large amounts of detailed information in small space. A simple table can give information that would require several paragraphs to present textually and can do so with greater clarity. Tabular presentation is often not simply the best but the only way that large quantities of individual, similar facts can be arranged. Whenever [the] bulk of information to be conveyed threatens to bog down a textual presentation, an author should give serious consideration to use of a table.'' [, The Chicago Manual of Style, 1982, p 321] Designing table typography is a hard problem. There are many formatting details to get right and there is only a small amount of space with which to work. The two-dimensional nature of tables requires alignment in both directions at the same time. It is very important to maintain control over placement because the organization of information in tables is part of the message. Juxtaposition and other spatial relationships within tables have an important impact on the way in which tables convey information. ``The principles of table making involve matters of taste, convention, typography, aesthetics, and honesty, in addition to the principles of quantification.'' [Davis, Tabular Presentation, p 497] There seem to be great opportunities for applying electronic techniques to typesetting tables of information: ``Tabular setting has proved both the easiest and the most difficult form of composition to bring under computer control. Because tabular setting is mainly for numeric data, it might seem strange that there should be any difficulty in providing computer-generated [typeset] tables.'' [Phillips, Handbook, p 189] However, tables in technical documents contain a wider variety of information than the traditional mathematical tables of roots, logarithms, and trigonometric functions. 
``While many tables of physical and scientific data are being compiled by computer, there is still a requirement to include these data in technical publications because they are considered of interest to the reader who may not have access to the generating algorithms even if he is a computer user. The publication of such data in printed form may also be considered necessary to establish the status of the author! It would appear that the need for tabular composition in general bookwork will continue for some time.'' [Phillips, Tabular Composition] The sources and purposes of tables in documents span a broad range of information. Some examples include computed data from mathematical algorithms, statistical data from scientific experiments, financial data and spreadsheets, taxonomies of observed data, extracts from databases of information, or just about anything else an author might wish to convey to a reader. 4.2 Early Table Formatting Systems As mentioned in Chapter 2, several early composition systems could produce typeset tables. Computers were heavily involved in numeric computations at that time. Publishing tables of numeric results by traditional methods required the error-prone transcription of line printer or punched card data by keyboard operators of typesetting devices. Because photomechanical typesetting devices used electronic input data that were compatible with computer systems (mainly punched paper tape and occasionally magnetic tape), it was natural to conceive of a computer program that would convert the numeric data directly to the formatting commands suitable for driving the typesetting devices. These commands could then be passed directly between the computer system and the phototypesetter. 
Several reviewers [Barnett, Computer Typesetting] [Stevens, NBS99] [Phillips, Handbook] have reported that the earliest book of computer typeset tables was the monograph produced at the National Bureau of Standards by Corliss and Bozman in 1962 [Corliss&Bozman, NBS53]. Those tables of numeric calculations were formatted by a special program running on an IBM 7090 computer and were output onto tape for a Linofilm phototypesetter. The tables included column heads in bold type centered over numeric data aligned on decimal points. Reportedly, this monograph contained only a single tabular format throughout [Stevens, NBS99, p 6]. Another pioneering effort in typesetting tables was TABPRINT [Barnett, Computer Typesetting] developed by Barnett at MIT in the early 1960's. TABPRINT ran on an IBM 7090 computer and prepared tapes for a Photon 560 phototypesetter. The tabular data was input in a fixed format typical of numeric computations in that era. Typographic specifications for each table preceded the set of data records and provided rudimentary style capabilities. Several typographic refinements to the program were proposed as future developments, including such features as folding long column heads over narrow data fields, introducing blank lines every 5 or 10 lines, and grouping digits of long numeric values for readability. These programs for formatting tables of numeric data were relatively simple. ``The significance of this early work in tabular composition is that all the typographic parameters were defined by program.'' [Phillips, Handbook, p 195] However, tables of numeric data constitute only one aspect of table formatting problems: ``But there are really two very different categories of tabular composition: One comprises a book of similar tables in which the values shown can be calculated by program algorithms from the minimum of data input, and the other consists of the tables appearing in technical texts. 
In the first case the style is similar for many consecutive pages, but in the second case each table, and there are sometimes several tables on the same page, has different column widths, different numbers of columns, and also ranges the entries differently, both vertically and horizontally; in addition, each table may have different complex box headings.'' [Phillips, Handbook, p 189] To format the more general table designs required in technical publications, we need effective interactive design tools that can handle a wide range of typographic requirements. Interactive tools seem preferable because many table designs are unique. The variety of table designs limits the amortization period for the time invested in programming a table formatter with sufficient specifications to accomplish each arrangement. Phillips expected interactive table composition programs to be necessary because of the typographic complexity of tables: ``These complications will tend to keep interactive terminals employed for page make-up and with soft-copy proofs on page view terminals.'' [Phillips, Tabular Composition, pg. 23-11] The next section investigates these complications and the typographic requirements for formatting aesthetic tables. 4.3 Typographic Requirements for Tables 4.3.1 Tables are Two-Dimensional Tables have a two-dimensional structure because of the organization of the table into rows and columns. These row and column structures intersect to identify the characteristics of the table entry at the intersection. The layout of a table must simultaneously align table entries horizontally in a row and vertically in a column. The table width is determined by accumulating the widths of each column. In turn, column widths are determined by the widths of the entries in the column. Similarly, the table and row depth are determined by the depths of the entries in each row. The arrangement of table entries can be expressed separately from the actual widths of rows and columns. 
This separation of the topology (ordering) of entries from the geometry (positioning) of entries will be exploited in Chapter 5. The two-dimensional nature of tables differentiates table formatting from simpler text formatting. Tables deal with areas and graphical relationships, both of which have two degrees of freedom. Lines and paragraphs of text have only one degree of freedom (where to break the line), although even then a complex algorithm may be necessary to produce aesthetic line breaks [Knuth, Line Breaking]. The conventional two-dimensional structure of tables is illustrated in Figure 4-1. The rows and columns intersect to form the table entries in the panel which is the main body of the table. The area with row identifications at the left of the panel is called the stub. The column headings in the area along the top of the panel are called the box head because when a table is fully outlined the column heads are completely boxed. Some headings group several columns together and are referred to as spanning heads or spanning subheads, depending on the depth at which they occur in the box head. Spanning row headings for several rows are also possible. As an example, the box head in the table of Figure 4-1 has completely determined the width of each column because the headings themselves are wider than the information in the columns. In other tables, the table entries may be wider than the headings above them, and they would then determine the column widths. 
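The dependence of table width on column widths, and of column widths on entry widths, can be illustrated with a minimal sketch. It assumes that entries are plain strings and that a character count stands in for a typeset width; the names column_widths and table_width and the bearoff parameter are hypothetical and do not come from any system discussed in this chapter.

```python
def column_widths(rows, bearoff=1):
    """Return one width per column: the widest entry plus a bearoff on each side.

    Assumption: len(entry) stands in for the typeset width of an entry.
    """
    ncols = max(len(row) for row in rows)
    widths = [0] * ncols
    for row in rows:
        for col, entry in enumerate(row):
            widths[col] = max(widths[col], len(entry))
    return [w + 2 * bearoff for w in widths]


def table_width(widths):
    """The table width accumulates the widths of its columns."""
    return sum(widths)


rows = [
    ["Stub Head", "Col. Head", "Col. Head"],
    ["Row Head", "xxx", "xxx"],
    ["Total line", "xxx", "xxx"],
]
widths = column_widths(rows)
print(widths, table_width(widths))
```

In this sample the column headings, not the xxx entries, fix the widths of the second and third columns, which is the situation described for Figure 4-1.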
====================

+------------+---------------------------------------------------+------------+
|            |                   Spanning Head                   |            |
|            +------------+-------------------------+------------+            |
| Stub Head  |            |    Spanning Subhead     |            | Col. Head  |
|            | Col. Head  +------------+------------+ Col. Head  |            |
|            |            | Col. Head  | Col. Head  |            |            |
+------------+------------+------------+------------+------------+------------+
| Row Head   |    xxx     |    xxx     |    xxx     |    xxx     |    xxx     |
| Row Head   |    xxx     |    xxx     |    xxx     |    xxx     |    xxx     |
+------------+------------+------------+------------+------------+------------+
| Total line |    xxx     |    xxx     |    xxx     |    xxx     |    xxx     |
+------------+------------+------------+------------+------------+------------+

Figure 4-1.
THE TWO-DIMENSIONAL STRUCTURE OF A TABLE includes the arrangement of its entries into rows and columns. Here the parts of a table have been shaded for easy identification. The light grey area is the box head that contains all of the column headings. The dark grey area is the stub that contains all of the row identifications. The remaining white area is the panel containing the actual table entries.

====================

Various graphic embellishments to the basic row and column structures help convey the table information. Dividing lines called rules help separate dissimilar parts of the table. The box head and stub in Figure 4-1 are completely outlined; all possible horizontal and vertical rules are present in the headings. Some table designers prefer only horizontal rules (see below for more discussion of table rules).

The word `rules' will appear frequently in this and the next chapter, in relation to the typographic lines (rulings) drawn in a table to separate rows or columns. This use of the word is traditional in the graphic arts. However, it may be confused with the notion of `style rules' from the previous chapter. Throughout this thesis, the word `rule' by itself refers to a typographic line and the terms `style rule' and `formatting rule' refer to a way of doing things.

The content of table entries may vary considerably. Certainly textual and numeric information are commonly organized into tables. Other types of information often included in tables are pictures, illustrations, mathematical equations, and even other tables.

In most table designs, the table entries are fully contained within the row and column intersection. More general table designs permit the content of one table entry to flow into another. Connected entries would be necessary when folding a long table entry of text into two column entries, or when flowing a caption around several illustration entries. This capability is necessary to extend table formatting to full page layout requirements.
4.3.2 Typographic Treatment

This section discusses the wide range of typographic details required to format tables. Careful text placement is an obvious requirement stemming directly from the two-dimensional structure of tables. Alignment choices to guide the placement are also needed. Formatting attributes can be applied to different parts of the table structure. The treatment of whitespace, typographic rules, and rows of dots between table entries are devices for guiding the eye along rows or columns. Footnotes on table entries must be referenced and positioned appropriately. Finally, readability concerns in table formatting are important.

Fine Resolution Placement

Compared to the line lengths in normal text, table entries are formatted to a relatively short line length within each column. These short line lengths force more hyphenation and line breaks in text entries. Placing several narrow columns side-by-side requires inserting some whitespace between each column to improve readability. Centering table entries or balancing space between entries also requires fine control of their position. These small distances must be chosen carefully since the human visual system perceives patterns and groupings, whether intentional or not, and bad choices may dramatically affect the way the table is interpreted by a reader. Typesetting devices often provide resolutions in very small units, a common one being 1/10 of a printer's point (about 1/720 of an inch).

Formatting a table with the coarse positioning typical of a fixed-pitch printer or typewriter is a much easier task. The results are normally not very aesthetic, since the fixed-size units are quite large, but they eliminate many choices and decisions: ``Tabular material is always difficult to typeset, much more so than to compose on the typewriter. This is true even though figures have a `monospaced' value.
Letters do not, and therefore it is more difficult to align material or even to determine what will fit in a given space . . . The monospaced typewriter, where you can actually visualize what you are setting, is certainly the simplest way for the novice to proceed. And it will not be an easy task for the typesetter to imitate what the typist has done.'' [Seybold, Fundamentals]

Alignment within Tables

The alignment choices within tables correspond to the two-dimensional nature of table layouts. Horizontal row and vertical column alignments predominate. However, other alignments for spanned headings, equal-width rows or columns, and balancing the extra whitespace between columns are common.

Column entries are vertically aligned with each other in various ways, as seen in Figure 4-2. (Note that one must adjust an entry horizontally in order to align it vertically with another above or below it; such distinctions are made carefully in the remainder of the thesis.) The three most frequent choices for vertical alignment are flush to the left (generally for textual material), flush to the right (generally for numeric material), or centered within the column (generally for headings and textual material).

====================

+-------------+-------------+-------------+----------------+
| FlushLeft   |   Center    |  FlushRight | Decimal.Align  |
+-------------+-------------+-------------+----------------+
| xxxxxx      |   xxxxxx    |      xxxxxx |  000000        |
| xxxxxxxxxxx | xxxxxxxxxxx | xxxxxxxxxxx |      00.000000 |
| xxxx        |    xxxx     |        xxxx |        .000    |
| xxxxxxxx    |  xxxxxxxx   |    xxxxxxxx |    0000.0      |
+-------------+-------------+-------------+----------------+

Figure 4-2. VERTICAL ALIGNMENT WITHIN A COLUMN of table entries is commonly flush left, flush right, centered, or decimal aligned.

====================

Numeric data with a varying number of decimal digits require another type of vertical alignment where the data items align on the decimal point. Numeric entries without decimal points must have one inferred, usually after the last decimal digit. The alignment on decimal points can be generalized to alignment on any character. For example, mathematical equations are often aligned on their equality signs.
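Alignment on a chosen character can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any formatter discussed here: entries are plain strings in a fixed-pitch font, and the function name align_on_char is hypothetical. An entry that lacks the alignment character is treated as if the character followed its last digit, as described above.

```python
def align_on_char(entries, char="."):
    """Pad entries so that the first occurrence of `char` falls in the same column.

    Entries without `char` are treated as if it followed their last character.
    Assumption: fixed-pitch output, so padding with spaces is sufficient.
    """
    parts = []
    for entry in entries:
        head, sep, tail = entry.partition(char)
        parts.append((head, sep + tail))
    left = max(len(head) for head, _ in parts)
    right = max(len(tail) for _, tail in parts)
    return [head.rjust(left) + tail.ljust(right) for head, tail in parts]


for line in align_on_char(["000000", "00.000000", ".000", "0000.0"]):
    print("|" + line + "|")
```

Passing char="=" aligns a list of simple equations on their equality signs in the same way.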
More complex alignment possibilities arise when multiple alignment points are needed, such as aligning the terms of polynomials in a system of equations where each of the additive and subtractive operations requires alignment (although the unary minus sign does not):

\[
\begin{array}{rcrcrcr}
10x_1 & - & 7x_2 &   &      & = & 7, \\
-3x_1 &   &      & + & 6x_3 & = & 4, \\
 5x_1 & - &  x_2 & + & 5x_3 & = & 6.
\end{array}
\]

Just as for columns, row entries are horizontally aligned with each other in various ways, as shown in Figure 4-3, again with three frequent choices: flush to the top, flush to the bottom, or centered. (Again, note that an item is adjusted vertically to accomplish horizontal alignment.)
====================

[Table: a fully ruled grid of three rows and four columns. The stub labels the rows Flush Top, Center, and Flush Bottom. In each row, the second column holds the multi-line text entry "xxx xxxxxxxx xxxx", the third column holds the multi-line text entry "xxxxxxx xxxxxx", and the fourth column holds the single-line entry "xxx+yyy+zz". Each row positions its entries according to its stub label.]

Figure 4-3. HORIZONTAL ALIGNMENT WITHIN A ROW of table entries is commonly flush top, centered, and flush bottom as indicated by the stub labels on each row. The entries in the second and third columns have multiple lines of text. The last column contains entries with superscripts and subscripts that affect the height and depth of the text entry. Alignment without regard to baselines produces unaesthetic results, especially when centering an even and an odd number of lines, and when aligning entries with different heights and depths. A fine point: note that the capitalization of the stub labels affects their position when aligned.

====================

Row entries possess an additional characteristic similar to decimal-point alignment: the baseline on which successive characters are aligned.
The rightmost column of the table in Figure 4-3 contains entries with baselines different from the other three columns. Without a horizontal alignment choice for baseline alignment, table entries with different baselines will not be arranged in a visually pleasing manner. This problem is addressed in Chapter 5. Spanned headings are aligned within a set of columns, or set of rows if the heading spans several rows. The set of columns spanned by the heading determines the aggregate dimensions of the spanned heading. Should the heading exceed this size, perhaps because it is longer than the narrow columns it spans, then the heading may be folded to make it shorter, or the columns spaced out to accommodate the long heading. Spanned row headings have similar needs. Equal widths of columns (or equal heights of rows) may be called for. In some cases the precise size will be specified by the designer and applied to the table. In other cases, the size can be determined automatically by the largest entry in the set of rows or columns. Formatting Styles Tables are often formatted with a different (but related) set of attributes to those used for normal text. Frequently tables are typeset in the same typeface but in a smaller point size, both to attract less attention to the table and to include more information. These changes in formatting attributes promote the use of a separate formatting environment or set of style rules for tables. Further specification of formatting attributes is necessary when rows or columns are to be distinguished. For instance, a row of totals may be the most important aspect of the table and therefore should be set in a bolder type face, or one column of information may be exceptional and thus be distinguished in an italic type face. Finally, individual table entries may be distinguished with special formatting attributes such as highlights. 
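The second remedy for an oversized spanned heading mentioned earlier in this section, spacing out the spanned columns, can be sketched as follows. The even apportioning of the excess, the integer width units, and the name fit_spanning_head are all assumptions made for illustration; a designer might prefer to distribute the space in some other proportion.

```python
def fit_spanning_head(widths, first, last, head_width):
    """Widen columns first..last (inclusive) until they can hold a spanning head.

    Assumptions: widths are integers in some fixed unit, and any excess is
    shared as evenly as possible, with leftover units going to the leftmost
    spanned columns.
    """
    span = widths[first:last + 1]
    excess = head_width - sum(span)
    if excess <= 0:
        return list(widths)  # the heading already fits; nothing to do
    share, leftover = divmod(excess, len(span))
    out = list(widths)
    for i in range(first, last + 1):
        out[i] += share + (1 if i - first < leftover else 0)
    return out


print(fit_spanning_head([5, 5, 5, 8], first=1, last=2, head_width=15))
```

Columns outside the span are left untouched, so the heading affects only the aggregate width of the columns it covers.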
Whitespace Treatment

The treatment of whitespace between table entries is more complicated than between paragraphs of text because there are more relationships for each table entry. The space between two columns of text is called the gutter in normal formatting, while the space between table entries is generally referred to as the bearoff or bearoff distance. The separation of rows or columns with whitespace helps to establish the apparent grouping of data. The introduction of rules into a table permits the physical separation to be reduced or eliminated since the grouping is provided by the rule. Some strategies for compacting large tables to fit a page (discussed later) involve shrinking the bearoff space.

The bearoff may provide a place for a footnote reference or gloss marker to intrude between table entries without expanding the column width. These markers do not participate in the alignment of table entries and therefore need not be separated with the same bearoff distance. Excess whitespace due to a large spanned heading requires apportioning the space among bearoffs for the spanned rows or columns.

Rules and Decorations

The use of dividing rules within tables to separate rows or columns is a traditional practice. Rules run along the row or column boundaries in either the horizontal or vertical direction. In large tables with narrow columns, vertical rules are often indispensable in maintaining order among the vast quantity of data.

The preference for horizontal rules is a recent phenomenon due in part to faddish design preference and in part to harsh economic reality. Consider the experience of the University of Chicago Press by comparing the statements from the 1969 and 1982 editions of The Chicago Manual of Style: ``Ruled tables, for example, are usual in the publications of this press, in part because Monotype composition has always been readily available.
For a publisher who is restricted to Linotype, open tables or tables with horizontal rules alone may be the only practical way tabular matter can be arranged.'' [, A Manual of Style, 1969, p 273] ``In line with a nearly universal trend among scholarly and commercial publishers, the University of Chicago Press has given up vertical rules as a standard feature of tables in the books and journals that it publishes. The handwork necessitated by including vertical rules is costly no matter what mode of composition is used, and in the Press's view the expense can no longer be justified by the additional refinement it brings.'' [, The Chicago Manual of Style, 1982, p 326] The difficulty with inserting vertical rules stems from the mechanical properties of photocomposition devices. With manual makeup of pages from metal type, inserting rules involved laying down a thin metal strip. High-speed phototypesetting devices that have only a narrow aperture across the page are strongly biased towards the horizontal, both for typesetting text and for drawing typographic rules. This same bias towards the horizontal is reflected in the composition software that supports these devices. Newer typesetting devices with more accurate positioning of laser beams can print in both orientations with equal ease and eliminate this restriction. There are several distinguished rules that frequently occur in tables: the head rule above the box head, the cutoff rule below the box head, the spanner rule below a spanning head, the foot rule below the table, and the total rule above the total row. These rules may be of different thicknesses, with the outermost head and foot rules generally drawn thicker than rules inside the table. Rules come in a variety of shapes, sizes, and patterns. Different thicknesses or weights of rules provide appropriate emphasis. 
A common design is to use medium-weight rules for the head and foot rules above and below the table, and fine hairline rules for the cutoff rules between the column headings and the table entries [Williamson, Book Design, p 159]. Double rules or combinations of thick and thin rules are sometimes used to provide emphasis and closure to a table. The intersection of these patterned rules is a complicated affair. Braces that group table entries are sometimes required within tables. The brace is placed in the space between two rows or columns, sometimes requiring extra space to accommodate its curly shape. Braces are frequently added by hand from transfer lettering sheets because they are not supported by table formatters and their positions are awkward to specify and align properly. Ornaments, such as flowers or other interesting designs, are inserted at the corners or along the outer border of a table. They are old fashioned and used mainly as a decoration for the purpose of catching the reader's attention. Background tints were used in Figure 4-1 to highlight the different parts of the table. Traditionally, tints would be added by hand at the page makeup or camera stage since they involved halftone screens. Phototypesetters and laser printers can produce screens automatically by shading the area of the table before the content is typeset. Leaders Various graphic techniques, such as dot leaders, help the reader capture the content and meaning of the table. Leaders are the dot patterns that guide your eye from an item at one side of a table to the related item at the other side of a table. Headings in tables of contents are often connected with dot leaders to the page numbers on the right. Typically, leaders are formed from dots although dashes or rules are sometimes used. Dot leaders are positioned congruently so that successive rows of leaders all have the dots in the same horizontal position. 
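Congruent positioning of dot leaders can be sketched by placing dots only at fixed multiples of a period measured from the left edge of the line, so that the dots in every row fall in the same columns regardless of where the text ends. This is a minimal sketch for fixed-pitch output; the name leader_line and the width and period parameters are hypothetical, and the combined length of the two texts is assumed not to exceed the line width.

```python
def leader_line(left, right, width, period=2):
    """Join `left` and `right` with dot leaders in a line of `width` columns.

    Dots are placed only in columns that are multiples of `period`, counted
    from the left edge, so successive lines have their dots in the same
    columns. One blank column is kept next to each piece of text.
    """
    start = len(left) + 1           # first column a dot may occupy
    end = width - len(right) - 1    # first column a dot may not occupy
    fill = "".join(
        "." if start <= col < end and col % period == 0 else " "
        for col in range(len(left), width - len(right))
    )
    return left + fill + right


print(leader_line("Tables", "12", 30))
print(leader_line("Illustrations", "7", 30))
```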
The harmony of the aligned dots enhances their purpose of guiding the reader without distraction. Leaders cross through column gutters and possibly vertical rules, although rules are ill-advised when leaders are used. Footnotes within Tables Footnotes within tables pose an interesting layout problem. As in page layout, footnotes for table entries are collected and placed at the bottom of the table within the page area allocated to the table. This means that for the table formatter to accommodate footnotes, it must be at least as powerful as the page formatter. Most table formatters only handle footnotes placed manually within the table. By convention, footnote references are separately marked or numbered for each table. Typically, footnote references within tables use letters or symbols rather than superscript numbers to avoid confusion with numeric exponents in the data. Should footnote references be numbered, they usually are sequenced independently from any text footnotes. Readability Issues Tables of numeric information have been published for many years and there are classic methods for making tables more readable [Knott, Napier commemorative]. For example, long columns of numbers are separated with extra whitespace or with thin rules every 5 or 10 entries to provide `chunks' that help the human visual system scan the long columns. Background tints behind rows of a table are another technique to improve readability in long tables. Grouping digits in threes with commas or extra whitespace provides the same chunking for long decimal expansions of logarithms or trigonometric functions. 4.3.3 Large Tables are Awkward Tables tend to be awkward to handle in page composition. They must be treated separately from the running text because they contain separate information. However, the tables may be too wide for the page width or too long for the remaining space on the page, or even too long for the page height. 
Following are some of the problems and solutions for dealing with large tables.

Common Strategies for Large Tables

Tables are commonly formatted in a smaller type size to reduce the impact of the table on the reader. This choice also helps fit more information in a table. Reducing the point size to 70% or 80% of the text size reduces the character height and width proportionately. Common sizes for text are 10-point type on 12-point leading. Tables often use 8-point type on 9-point leading or even 7-point on 8-point.

Compressed typefaces have the same height but a reduced width, which permits more text in the same horizontal space. For example, Helvetica Light Condensed is a narrow font commonly used in tables.

The bearoff distances between table entries can be reduced to eliminate whitespace and thereby reduce the width and height of a large table.

Transposing rows into columns and vice versa [Williamson, Book Design, p 159] may make a large table fit the page. Wide tables with many columns are transposed into longer tables with fewer columns, and long tables with few columns are transposed into wider tables with many columns. A table and its transpose are shown in Figure 4-4. Note that the stub heads and spanning heads have been transposed in a nontrivial matrix transposition that preserves the column heading relationships. One must be careful about transposing statistical tables that might imply an incorrect cause-and-effect relationship [Zeisel, Figures, p 41].

====================
[Figure: a wide table of 5 rows by 6 columns above its transpose of 7 rows by 4 columns; each has a stub head, a spanning head, column heads, and row heads.]
Figure 4-4. TRANSPOSING A TABLE may help make a table fit on the page. The top table is wide with more columns than rows. The bottom table is the transpose of the top table and is narrower with fewer columns than rows.
====================

Long Tables

Some tables can be made shorter by folding a long column into multiple columns. For instance, one long list of names in a single column would become two or more lists of names. This folding trades shorter table length for increased table width.

Long tables that exceed the page height must be broken into smaller tables. Breaking a table is similar to breaking lines of text at page boundaries, and similar algorithms [Plass, Optimal Pagination] can be applied. However, broken tables must introduce continuation headings in the second and subsequent parts of the table.
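The breaking of a long table into parts with continuation headings can be sketched as follows. This is a simplified illustration under stated assumptions, not the pagination algorithm cited above: the table is assumed to have one label column and one integer amount column, each part holds a fixed number of rows (`rows_per_page`), and the names `break_table` and `continued_suffix` are hypothetical. Each part after the first receives a continuation heading and a brought-forward total, and each part except the last ends with a carried-forward total.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, int]  # (label, amount); the amount column is totaled


def break_table(rows: Sequence[Row], heading: str, rows_per_page: int,
                continued_suffix: str = " (continued)") -> List[List[str]]:
    """Split `rows` into parts of at most `rows_per_page` body rows.

    Returns one list of text lines per part. This sketch ignores the space
    taken by the heading and total lines themselves.
    """
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    parts: List[List[str]] = []
    running = 0  # total of all rows formatted so far
    for start in range(0, len(rows), rows_per_page):
        first = start == 0
        last = start + rows_per_page >= len(rows)
        # Continuation heading: a variant of the regular heading.
        lines = [heading if first else heading + continued_suffix]
        if not first:
            lines.append(f"Brought forward\t{running}")
        for label, amount in rows[start:start + rows_per_page]:
            lines.append(f"{label}\t{amount}")
            running += amount
        lines.append(f"{'Total' if last else 'Carried forward'}\t{running}")
        parts.append(lines)
    return parts


if __name__ == "__main__":
    data = [("a", 10), ("b", 20), ("c", 30), ("d", 40), ("e", 50)]
    for part in break_table(data, "Expenses", rows_per_page=2):
        print("\n".join(part), end="\n\n")
```

The running total is available here only because the sketch knows which column holds amounts; a formatter that treats entries as uninterpreted text has no such knowledge.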
The continuation headings may be very complicated functions of the table entries:

``It would be asking rather a lot of a page make-up program to insert carried forward and brought-forward totals automatically at a table break, and indeed these were often omitted when tables were made-up by the hand compositor.'' [Phillips, Tabular Composition, p 23-11]

The continuation headings can be supplied in the table input as variants of the regular headings. When a table is broken, these variants can be used. Brought-forward totals could be supplied automatically when the table structure and content are recognized within the formatting program, for example, in financial spreadsheets. This is an instance of a particular table entry (a total) that might compute itself on behalf of the table formatter (for the current total of all formatted entries). An extensible table content structure, such as that described in Chapter 5, provides a general mechanism for incorporating self-totaling table entries and other continuation headings.

Wide Tables

A table that is wider than it is long may be made to fit the page by rotating the table and printing it broadside. A broadside table has the long dimension of the table along the long dimension of the page, that is, rotated 90° so the rows read up the page and the columns read from left to right. Right-hand pages are preferred for such tables since a turned book will present the broadside table closer to the reader [Williamson, Book Design, p 271]. Broadside tables (or illustrations) affect page composition, because these pages are typically designed with page numbers in a different position and without running heads (otherwise the page numbers would appear in a different orientation to the broadside table and detract from the readability of the facing page).

Instead of rotating the entire table to make a wide table fit the page, it may be sufficient to rotate the text of column headings to read vertically.
Especially when the column headings are much wider than the column entries, turning the text so that it reads upwards with successive heading lines to the right reduces the column width. If column headings in a broadside table are turned, they should instead have the descenders to the left, otherwise the text would appear upside down on the page [Williamson, Book Design, p 159].

Wide tables may be formatted as a two-page spread across two facing pages. A two-page upright table would appear with the box head spread across the binding gutter. A two-page broadside table is possible with the rows split across the gutter. Continuation headings may not be needed in a two-page broadside table, but would be if the table continued onto subsequent pages.

Extremely wide tables may be printed on a foldout plate. This requires special paper to be folded and inserted into the book at the binding stage. The extra manual handling makes this alternative very expensive and rarely used.

Otherwise, wide tables are broken into smaller table parts with continuation stub headings. Any spanning headings in the box head will have to be continued across the break. Some reference columns, such as sequence numbers, may be repeated to assist in finding information in the continued table parts.

4.4 Previous Approaches to Table Formatting

4.4.1 The Typewriter Tab Stop Model for Tables

The use of a fixed-width character model for table formatting is a key simplification available with typewriters or line printers. The fixed width of each character on these devices permits a coarse grid with complete specification of the character positions. Spreadsheet programs take advantage of this to provide regularly spaced grids and simple typographic features. There are fewer possibilities for positioning and aligning characters when using a fixed grid, making the formatting problem much simpler.
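How much the fixed grid simplifies the problem can be seen in a minimal sketch. It assumes that every character is one grid unit wide and that every entry is set flush left; the function name `layout_fixed_grid` and the `gutter` parameter are hypothetical. Under these assumptions the whole layout reduces to finding the longest entry in each column and padding the others to match.

```python
from typing import List, Sequence


def layout_fixed_grid(rows: Sequence[Sequence[str]], gutter: int = 2) -> List[str]:
    """Lay out `rows` on a character grid, flush left, `gutter` blanks apart."""
    ncols = max(len(row) for row in rows)
    widths = [0] * ncols
    for row in rows:
        for j, cell in enumerate(row):
            # On a fixed-width device, width is just the character count.
            widths[j] = max(widths[j], len(cell))
    lines = []
    for row in rows:
        padded = [cell.ljust(widths[j]) for j, cell in enumerate(row)]
        lines.append((" " * gutter).join(padded).rstrip())
    return lines


if __name__ == "__main__":
    for line in layout_fixed_grid([["Item", "Qty"], ["Widget", "12"], ["Nut", "7"]]):
        print(line)
```

With proportionally spaced type, `len(cell)` would have to be replaced by a measurement of each entry in the chosen font and size, and positions would no longer fall on whole character cells.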
The typewriter tab stop model is often provided in document composition systems as a rudimentary table formatting capability. Tab stops are based on a physical escapement mechanism in mechanical typewriters. The carriage in old typewriters is spring-loaded and advances to the next character position whenever a key is pressed. The tabulator key permits the carriage to fly to the right past several character positions until stopped by a mechanical `finger.' These fingers are the tab stops. Any number of them can be requested along the carriage. The measurement between stops is always in units of character positions. Furthermore, the stops indicate only the left margin of a tab column. Aligning numeric information or centering headings requires spacing the carriage manually.

Teletype devices also have tab stops, but they standardized on 8 characters between stops to ensure that the sending and receiving devices would place characters in the same positions. Early computer terminals also have 8-character tabs, while later ones have settable tab stop positions.

Document formatters extended the typewriter tab stop model to align numeric and centered information. Defining a tab stop requires specifying a formatting attribute for the position of the stop and for the alignment choice. Formatters interpret the tab stop in different ways: as determining a position (Runoff) or a column (Scribe). The entries in Figure 4-5 are aligned at tab stops according to the different interpretations made by the Runoff class of formatters and by Scribe.
====================
[Figure: the words `left', `right', and `center' set at tab stops marked by carets, once as positioned by Runoff/troff and once as positioned by Scribe.]
Figure 4-5. TAB STOPS are interpreted differently by various document composition systems. The first row treats a tab stop as an alignment position for text; the text is aligned at the tab stop. The second row treats a pair of tab stops (or a tab stop and the page margin) as defining a column within which text is aligned.
====================

Defining a column to be the space between two tab stops creates an inconsistent notion of a column. Two short pieces of text can be positioned in the same column if the first is left-aligned and the second right-aligned; two longer pieces of text will be positioned in different columns. The tab stop defining a column does not imply a boundary, only an alignment point. A long line of text is not folded when the text extends beyond the next tab stop. Thus editing the text entry may result in aligning it in a different column than before.
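The two interpretations of a tab stop can be contrasted in a short sketch on a fixed-width character grid. The function names `place_at_stop`, `place_in_column`, and `overruns` are hypothetical, and the code models only the behavior described above, not the implementation of Runoff, troff, or Scribe. Each placement function returns the character position at which the text would start.

```python
def place_at_stop(text: str, stop: int, align: str) -> int:
    """Position interpretation: the text is aligned at a single tab stop."""
    if align == "left":
        return stop                       # text begins at the stop
    if align == "right":
        return stop - len(text)           # text ends at the stop
    if align == "center":
        return stop - len(text) // 2      # text is centered on the stop
    raise ValueError(f"unknown alignment: {align}")


def place_in_column(text: str, left_stop: int, right_stop: int, align: str) -> int:
    """Column interpretation: the text is aligned between a pair of stops."""
    if align == "left":
        return left_stop
    if align == "right":
        return right_stop - len(text)
    if align == "center":
        return left_stop + (right_stop - left_stop - len(text)) // 2
    raise ValueError(f"unknown alignment: {align}")


def overruns(text: str, start: int, right_stop: int) -> bool:
    """True if unfolded text starting at `start` extends past `right_stop`."""
    return start + len(text) > right_stop


if __name__ == "__main__":
    print(place_at_stop("center", 20, "center"))
    print(place_in_column("center", 10, 20, "center"))
    start = place_in_column("a long entry of text", 10, 20, "left")
    print(overruns("a long entry of text", start, 20))
```

Because neither function folds the text, a long entry simply runs past the next stop, which is why the stops act only as alignment points and not as column boundaries.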
Tab stops provide only very limited table formatting functionality with few typographic features. They are not satisfactory for most tables, yet they are the only table formatting capabilities offered by several document composition systems. The tab stop model breaks down completely when table entries must be folded from one line to the next. The next section discusses the first real table formatter available with an electronic document composition system.

4.4.2 tbl Preprocessor

The tbl table formatter [Lesk, tbl] for troff is a preprocessor that accepts a table definition and generates formatter commands to render the table. The table definition is in two parts: the table arrangement part and the table content part. These parts may be intermingled to keep the arrangement definition close to the affected content. The last row definition is reused whenever more row contents are encountered than defined in the arrangement part.

The table arrangements may include spanned headings across arbitrary columns or rows. However, the specification of spanned column headings is asymmetric to spanned row headings. The spanned column heading is specified in the table arrangement part while the spanned row heading is specified in the table content part or the heading, as discussed in Chapter 2.

Folded table entries are possible with the column width stated explicitly. Whole paragraphs or complex formatted objects may be included as a table entry. Should the table contain objects formatted by another troff preprocessor, the order of processing must be carefully chosen to avoid interaction between the preprocessors.

Formatting attributes may be specified to apply to all entries in a given column. No similar capability is provided for rows, although troff commands may be inserted before and after rows to control some of the formatting attributes.

Rules may be specified within the table in a stylized fashion.
Only thin single and double rules of a single thickness and color may be specified; tbl computes the intersections between them. There is a shorthand specification to box all table entries. Rules are specified in an asymmetric fashion where vertical rules are included with the table topology and horizontal rules are included with the table content.

The position of table entries is computed by troff. tbl assigns the width and height of table entries to troff registers and uses troff commands to compute the position of table entries. Thus, tables are limited in complexity by the number of available registers, which is in turn limited by the two-character naming restriction.

Long tables with repeated rows can be formatted with tbl. Distinct continuation headings can be supplied explicitly or the original box head can be repeated for each table fragment on successive pages.

The tbl preprocessor has the generality to accommodate most table arrangements. The specification of spanned row and column headings permits general layouts. The asymmetric treatment of rows and columns often makes tables harder to specify. The content of tables is limited by troff resource restrictions and by the interaction with other troff preprocessors such as eqn and pic. Recursive content is not possible. For example, a troff document cannot contain both a table of mathematical equations (tbl includes eqn) and a display equation that includes a table of equation fragments (eqn includes tbl), since the troff preprocessors must be executed in a sequential pipeline order.

4.4.3 TeX

There is no table formatting preprocessor for TeX. Equivalent functionality is provided through extensive macros [Knuth, The TeXbook, Chapter 22]. The TeX halign (horizontal alignment) primitive defines a template (the preamble) for the table layout that specifies a separate formatting environment for each column. Successive rows of the table then match entries in the preamble.
Complications arise when introducing horizontal and vertical rules. Sophisticated knowledge of TeX is required to master them [Knuth, The TeXbook, Chapter 22].

The LaTeX macro package [Lamport, LaTeX] provides a specification language similar to tbl for defining tables. The table topology is defined as successive rows of column entries with vertical rule codes. The LaTeX scheme provides a more robust implementation than tbl since it is based on the TeX box and glue abstraction, whereas tbl simulates this abstraction with troff macros. The difference is revealed in the success of integrating mathematical notation in tables. LaTeX provides a more reliable and predictable table formatter than tbl and eqn, which may fail in unexpected ways when interactions between the two preprocessors occur.

However, the LaTeX table formatting macros do not ensure that aesthetic layout is easily accomplished. The LaTeX manual cautions authors of complex tables that some final hand tuning of the space around boxes will be required to achieve the best results [Lamport, LaTeX, p 105].

4.4.4 TABLE

None of these previous approaches provide interactive table design capabilities. They are all batch-oriented formatters. The TABLE editor [Biggerstaff, TABLE] is a prototype interactive graphics editor developed for editing complex structures. The prototype editor provides an interactive front end to the tbl formatter as part of an experiment in object-oriented programming. The objects implemented were tables and text. Operations on objects would be determined by the nature of the objects, such as deleting a row or column from the table. Editors were to provide WYSIWYG feedback so that the screen image would be identical to the printed form. Several software engineering concerns about object-oriented programming were tested in this experiment, such as the ease of building and modifying an editor, and the responsiveness of editing interactions.
The results favored the first two criteria but raised some concerns about the last.

The TABLE prototype is a WYSIWYG editor for table structures. The table layout is presented graphically. The layout of table objects is accomplished with the tbl preprocessor. This provides TABLE with sufficient layout generality but with the associated performance penalty of using batch programs. The paper recommends developing a custom post-processor for a production system [Biggerstaff, TABLE, p 343].

The user interacts with TABLE objects through a selection mechanism. A selection may be the entire table, a table element, or a selection within the object contained in the table element. Positioning commands allow the user to traverse the table structure along rows or columns and from the table to within the table element objects.

The TABLE prototype succeeded in providing a graphical interface to complex table structures. Table designs could be generated more quickly and more accurately with TABLE than by coding tbl commands directly. However, TABLE inherited from tbl the resource restrictions, the lack of object structure characteristic of troff preprocessors, and the sluggish performance of a large batch formatter. TABLE lacks operations suited to the logical structure of tables because of limitations in its internal table data structure and its selection mechanism. There is no style provision in TABLE, possibly because tbl did not provide one to be inherited.