<> <> ¶5 A New Framework for Tabular Composition by Computer ¶5 A NEW FRAMEWORK FOR TABULAR COMPOSITION BY COMPUTER CONTENTS ¶5.1 The Interactive Table Formatting Problem ¶5.1.1 What we need to do ¶5.1.2 How are we going to do it ¶5.2 Complexity of Table Formatting ¶5.3 Table Document Structure ¶5.3.1 Table Entry Representation ¶5.3.2 Table Arrangement ¶5.3.3 Style Attributes for Tables ¶5.4 Implementation data structure ¶5.4.1 Graphic Arts References to Grid Systems ¶5.4.2 Tables Described by Grids ¶5.4.3 Overlapping Planes ¶5.4.4 Grid Algorithms ¶5.5 Constraints ¶5.5.1 Constraint Systems ¶5.5.2 The Constraint Layout Problem ¶5.5.2 The Layout Algorithm ¶5.5.3 Handling Large Tables ¶5.6 Some Subproblems ¶5.1 The Interactive Table Formatting Problem This chapter presents a new framework for formatting tables suitable for interactive editors and displayers. The framework integrates an object-oriented approach to document content by extending the document structure model used in Chapter ¶3 to include arrangements of objects into tables. The new framework supports a wide range of typographic requirements for typesetting tables through extensions to the document style mechanisms. This framework specifies the table layout through grid designs familiar in the graphic arts and through a constraint solver that determines the placement of table entries. Several interesting new research problems arise from the use of this framework, and some of those are outlined for future work in Chapter ¶6. ¶5.1.1 What we need to do? To support interactive table formatting, one needs efficient algorithms for WYSIWYG table display and for direct manipulation of the table content and structure by pointing at and selecting components of the table. Tables in an electronic document may be represented by specifying the arrangement of table entries and the content for each entry. Each table entry can be an arbitrary document object, such as text, an illustration, mathematics notation or another table. Typographic rules, decorations, and shaded backgrounds are specified at the boundaries between table boxes and can be handled by a structure that represents these boundaries explicitly. Tables are rectangular arrangements of information that have both a row and column structure. Manipulating the table content requires dealing with either a row or a column at different times. Thus interactive operations should handle both row and column structures, making it equally easy to select a row or a column and to perform a movement or formatting operation on the selected table part. Selection hierarchies from a single table entry through the containing rows or columns to the entire table should work equally well for groups of rows as for groups of columns. Transposing the rows and columns of a table should be easy to accomplish with the table structure. Style attributes for tables should be provided in an analogous manner to graphical style, but the additional structure in tables poses some difficulties. Style attributes may be applied at several levels in a table, for instance, to the entire table, a single row, a single column, spanned rows or columns, or to the individual table entry. A method is needed to determine the style attributes that apply to each table entry as it is formatted. The table layout mechanism needs to handle general arrangements of rows and columns and to specify the many alignments possible within a table design. Spanning a column heading across several columns is a special case of the general problem of aligning one set of entries with another set of entries. The new framework for table formatting must be capable of expressing these arrangements and of determining the positions of the table boxes in any such arrangement. Another generalization of the table formatting problem is to provide for overlapping layers of information. Background tints, such as the coloured tints used in tables published in recent issues of the Communications of the ACM, are one class of overlapped information. The table formatting framework should be capable of layering background tints, scanned images, and illustrations within a table. ¶5.1.2 How are we going to do it? The new table framework has three parts: document structure extensions, grid layout specifications, and layout constraint satisfaction. The document structure and style machinery for tables is an extension of the model proposed in Chapter ¶3. Each table entry is a document object. Recall that objects can format themselves and are represented by a set of dimensions and two procedures for laying out and rendering the object. Additional style attributes must be created for the various typographic features in tables, such as rule thicknesses and colours, alignments, bearoff distances, and so on. These table formatting style attributes for a particular table entry may collected into formatting rules and dictionaries and attached as properties to the table entries. Since a table does not have the same simple hierarchy as the tree-structured document structure, a more elaborate binding algorithm is needed to associate style attributes with the table entry. The table arrangement, or table topology, is expressed using a grid design. Each grid line either surrounds the table or is the boundary between table entries. Each table entry is surrounded by four grid lines. Rules and decorations are superimposed on the grid boundaries. This implies that grid lines have a nonzero width equal to the widest rule on that grid line. The arrangement of table entries is expressed with respect to the grid coordinates for the table. Table entries that share grid lines are aligned together. Operations that change the table topology have to make corresponding changes in the grid system. The table layout, or table geometry, is computed from both the table topology and the dimensions of the table entries. A linear inequality constraint solver is used in this new framework to provide a general mechanism for computing the table geometry. Grid lines are given symbolic names. The table entry dimensions and grid variables are combined to express the table arrangement as a set of linear inequalities. The constraint solver then solves the system of constraints to produce the page coordinates of the grid line positions. Interactive use of the table formatter demands that changes to the table be handled efficiently and in an incremental fashion. A constraint solver which can handle interactive changes is used in this scheme. ¶5.2 Complexity of Table Formatting The table formatting problem to produce an optimal table size is intractable. The table formatting problem becomes tractable when sufficient structure is added to the table arrangement. As shown below, if the table entries are arranged in balanced columns, the problem is NP-complete. When only an approximation to the optimal size is needed and the table can be formatted in horizontal slices, the problem is O(nlogn). When a total grid structure is imposed, the table formatting problem is linear. The table formatting problem belongs to the class of two-dimensional packing problems first introduced by Baker et al. [Baker, Coffman, Rivest, 1980]. Random Pack The Random Pack problem is to layout rectangular table entries in a table bounded by the page width and with minimum height. THEOREM 1. The Random Pack problem is NP-complete. Proof. The proof reduces the table formatting problem to the PARTITION problem known to be NP-complete [Garey & Johnson]. Consider a table of rectangular pictures with picture widths less than one-half the page width and with random picture heights that sum to 2k. One possible table arrangement is in two balanced columns. This requires choosing two subsets of the pictures that result in equal column depths of length k. Thus we have a set of pictures, P, with heights h(pi) such that wish to find two subsets of pictures, S and S` = P  S such that PARTITION problem. This reduction does not produce a a strong NP-complete result, since if the sizes of the pictures are bounded, as they would be for any practical typeset table, then a polynomial time algorithm based on dynamic programming can easily be found to produce the optimal arrangement of pictures. However, one can also reduce the table arrangement problem to the BIN PACKING problem which is NP-complete in the strong sense [Garey & Johnson]. THEOREM 2. The Random Pack problem is NP-complete in the strong sense. Proof. Consider a table of rectangular pictures with random widths less than the page width and uniform heights. The table arrangement is to place as many pictures in a row and to minimize the number of rows. This requires partitioning the set of pictures P with widths w(pi) into rows R1, R2, ..., Rk which is an instance of the BIN PACKING problem. Stub Pack The Stub Pack problem introduces the notion of horizontal rules into the table arrangement. These rules subdivide the table space by placing a rule between the bottom of an entry and the top of the next entry. Once a rule is placed it extends all the way to the right. Sorting the set of rectangular boxes and using variants of the greedy algorithm provide polynomial time algorithms to approximate the optimal placement. This table formatting problem is exactly the level-oriented two-dimensional packing problem for which several approximation algorithms are known [Coffman, Garey, Johnson, Tarjan, SIAM J. Computing, Nov 1980] [Baker, Brown, Katsef, J.Algorithms, 1981]. Lattice Pack If the topology of the table arrangement is given as a lattice or grid structure, then the formatting algorithm can be solved in linear time. Essentially one verifies the arrangement by running through all the table entries incident on a lattice boundary and determining the maximum entry size. Since there are on average entries must be examined to verify the table layout. Constraint Solvers The Lattice Pack problem does not admit equal sized columns or spanned column entries. As shown later, one can describe the arrangement of spanned entries or equal sized grid lines with linear inequalities. Solving this system of linear inequalities is equivalent to the LINEAR PROGRAMMING problem that is known to be solvable in polynomial time [Khachian, 1978]. A table arrangement without centered table entries can be described by a linear inequality system using only two variables per inequality. This LINEAR INEQUALITY(2) problem can be solved in O(n3) worst case time [Aspvall and Shiloach]. Algorithms for solving more general systems of linear inequalities based on the Simplex method [Dantzig, 1963] have O(n3) expected time behaviour [Dantzig, 1980]. A variant of the Simplex method algorithm due to Nelson [Nelson, Program Verification] is used in ¶5.5 to compute the table geometry from linear inequalities. ¶5.3 Table Document Structure Recall the document structure extensions outlined in Chapter ¶3 for including graphical illustrations in documents. This tree-structured hierarchy of document objects will be used to represent tables. The classes of document content included only textual and graphical material. The document structure was described recursively with text documents containing illustrations that in turn could contain text labels. To represent tables a new class of document content is introduced. The recursive document structure still pertains, and text documents may contain tables that in turn contain text, illustrations, another table, and so on. A large example of such a table is shown in Figure 5-1. This recursive document structure permits a general and extensible treatment of information presentation within documents, illustrations, and tables. -- -------------------- Grid 5 Rows 3 Columns ByRowThenColumn RowConstraints RowConstraint 2*gy1 - 1*gy2 - 1*gy0 = 0 RowConstraint 2*gy2 - 1*gy3 - 1*gy1 = 0 RowConstraint 2*gy3 - 1*gy4 - 1*gy2 = 0 RowConstraint 2*gy4 - 1*gy5 - 1*gy3 = 0 Rule (0,0) (0,2) 2 bp Rule (0,0) (2,0) 2 bp Rule (0,2) (1,2) 2 bp Rule (1,1) (1,3) 2 bp Rule (1,1) (2,1) 2 bp Rule (1,3) (3,3) 2 bp Rule (2,0) (2,2) 2 bp Rule (2,0) (4,0) 2 bp Rule (2,2) (3,2) 2 bp Rule (3,1) (3,3) 2 bp Rule (3,1) (4,1) 2 bp Rule (3,3) (5,3) 2 bp Rule (4,0) (4,1) 2 bp Rule (4,1) (5,1) 2 bp Rule (5,1) (5,3) 2 bp Box (0,0) (2,1) FlushTop Center 12.0 bp 12.0 bp 12.0 bp 12.0 bp ///Beach/Thesis/Figure5-1-ChesterRotated.AIS width: 2 in Box (1,2) (3,3) FlushTop Center 12.0 bp 12.0 bp 12.0 bp 12.0 bp ///Beach/Thesis/Figure5-1-PressFile.Press leftMargin: 228 pt, topMargin: 172 pt, width: 123 pt, height: 145 pt Box (2,0) (4,1) FlushTop Center 12.0 bp 12.0 bp 12.0 bp 12.0 bp ///Beach/Thesis/Figure5-1-GriffinRose.Press leftMargin: 195 pt, topMargin: 143 pt, width: 102 pt, height: 130 pt Box (3,2) (5,3) FlushTop Center 12.0 bp 12.0 bp 12.0 bp 12.0 bp ///Beach/Thesis/Figure5-1-RecursiveTable.Press leftMargin: 121 pt, topMargin: 94 pt, width: 157 pt, height: 46 pt Box (0,1) (1,2) FlushTop FlushRight 12.0 bp 12.0 bp 12.0 bp 12.0 bp Chester Carlson Box (1,1) (2,2) FlushTop FlushLeft 12.0 bp 12.0 bp 12.0 bp 12.0 bp Formatted Document Box (2,1) (3,2) FlushTop FlushRight 12.0 bp 12.0 bp 12.0 bp 12.0 bp Griffin Rose Box (3,1) (4,2) FlushTop FlushLeft 12.0 bp 12.0 bp 12.0 bp 12.0 bp Table Figure 5-1. A WIDE RANGE OF CONTENT can be incorporated within tables using an object-oriented document structure. This table includes five kinds of content: text, scanned illustrations, synthetic line drawings, composed pages, and tables. The text captions in the centre column are positioned flush at the top of each row and alternate flush right and left. The picture of Chester Carlson, the inventor of xerography, was scanned from an original photograph and is 367 scan lines by 474 pixels with each pixel containing an 8-bit grey value. The formatted document is the output of other software that produces a compatible printer format used at Xerox PARC. The synthetic graphic image was created by Maureen Stone with the Griffin illustrator. The table was composed using the table formatting prototype described in this chapter. -------------------- -- ¶5.3.1 Table Entry Representation For formatting purposes, a table object in the extensible document structure can be represented as a box with four dimensions and two procedures for laying out and rendering the content. The layout procedure is given the arrangement and content of the table entries and determines their size and positions. The table arrangement or table topology is explicitly determined by the creator when the table entries are collected in the document. The rendering procedure is given the positions of all the table entries and produces the human-readable form of the table by recursively rendering the content of the table entries at the positions given by the layout procedure. The rendering procedure is implemented with a device-independent graphics package that can equally easily produce a screen display or a file-based printer description of the table. This device-independent capability is necessary for WYSIWYG interaction. Furthermore, the separation of the layout and rendering steps permits the rapid redisplay of the table without recomputing the layout when the view of the table is moved or scrolled. The table entry is abstractly represented as a box with four dimensions and an implied origin. This origin is the starting point for the rendering procedure for the table entry. The alignment point within the box, as shown in Figure 5-2, is the point to which other table entries relate to this entry, and may be different from the origin depending on its typographic style. Four dimensions are necessary to permit alignment in both the horizontal and vertical directions, unlike some other box dimensions schemes like TEX that only have one width dimension instead of two. -------------------- ///Beach/Thesis/Figure5-2-TableBox.Press leftMargin: 151 pt, topMargin: 112 pt, width: 111 pt, height: 71 pt Figure 5-2. A TABLE BOX is represented by four dimensions that are the left, right, up, and down distances from an alignment point. This example is typical of a table entry that is aligned on a decimal point in the text content. -------------------- The alignment relationships are specified by the typographic treatment of the table entry. Centred alignment divides the sum of the dimensions by two. Flush left, right, top or bottom alignment sets one of the dimensions to zero. Aligning on a character within the table entry sets the alignment point to be the origin of that particular characters. Figure 5-3 illustrates two columns of character-aligned table entries, the first aligned on decimal points including the implied decimal point at the end of a numerical string, and the second aligned on a multiplication sign. -------------------- Grid 3 Rows 2 Columns ByRowThenColumn Rule (0,0) (0,2) 1 bp Rule (1,0) (1,2) 1 bp Rule (2,0) (2,2) 1 bp Rule (3,0) (3,2) 1 bp Rule (0,0) (3,0) 1 bp Rule (0,1) (3,1) 1 bp Rule (0,2) (3,2) 1 bp Box (0,0) (1,1) Center CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp 0 Box (1,0) (2,1) Center CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp .625 Box (2,0) (3,1) Center CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp 1023.5 Box (0,1) (1,2) Center CharAlign 'X 3.0 bp 6.0 bp 3.0 bp 3.0 bp speed Box (1,1) (2,2) Center CharAlign 'X 3.0 bp 6.0 bp 3.0 bp 3.0 bp acceleration Box (2,1) (3,2) Center CharAlign 'X 3.0 bp 6.0 bp 3.0 bp 3.0 bp force Figure 5-3. ALIGNMENT within a table entry may be based on specific characters within lines of text, such as decimal points (actual or implied) in the first column, and multiplication signs in the second column. -------------------- Another alignment relationship is shown in Figure 5-4. In this example, the table entries in each column are aligned on decimal points. However, the last two columns need further specification: how to treat the excess whitespace induced by the very long column heading? This additional specification determines how the aligned set of column entries are positioned within the column, such as centred, flush left, or flush right. Currently, the table formatting prototype does not accept such a positioning specification, but a complete implementation would have to accept a row and column positioning attribute. -------------------- Grid 4 Rows 5 Columns ByRowThenColumn ColConstraints ColConstraint 2.0*gx4 - 1.0*gx3 - 1.0*gx5 = 0 Rule (0,0) (0,5) 1 bp Rule (1,0) (1,5) 1 bp Rule (4,0) (4,5) 1 bp Rule (0,0) (4,0) 1 bp Rule (1,1) (4,1) 1 bp Rule (1,2) (4,2) 1 bp Rule (0,3) (4,3) 1 bp Rule (1,4) (4,4) 1 bp Rule (0,5) (4,5) 1 bp Box (0,0) (1,3) TopBaseline Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Short Head Box (0,3) (1,5) TopBaseline Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Very Long Column Head Over Narrow Entries Box (1,0) (2,1) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xx.xxx Box (1,1) (2,2) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xx.xxx Box (1,2) (2,3) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xx.xxx Box (1,3) (2,4) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xx.xxx Box (1,4) (2,5) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xx.xxx Box (2,0) (3,1) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xxx.xx Box (2,1) (3,2) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xxx.xx Box (2,2) (3,3) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xxx.xx Box (2,3) (3,4) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xxx.xx Box (2,4) (3,5) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xxx.xx Box (3,0) (4,1) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xx.xxx Box (3,1) (4,2) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xx.xxx Box (3,2) (4,3) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xx.xxx Box (3,3) (4,4) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xx.xxx Box (3,4) (4,5) TopBaseline CharAlign '. 3.0 bp 6.0 bp 3.0 bp 3.0 bp xx.xxx Figure 5-4. SPECIFYING POSITION WITHIN A COLUMN as well as aligning table entries may be necessary when there is excess whitespace to disperse among the row or column entries. All the table entries are aligned on decimal points but the last two columns have excess whitespace due to the very long column head. -------------------- Textual table entries may be more complex than simple phrases or numbers. Some entries may be paragraphs with long lines folded within the column boundary. Yet we wish to determine the column boundaries from the content of the table entries. How does one break this circularity? The only technique available in existing table formatters like tbl is to supply an explicit column width attribute. In fact, this is a cheap way of specifying equal sized columns, since no matter what the content of the entry, all such column entries are the specified width. However, justifying the short line lengths typical in table columns results in aesthetically unacceptable results due to excessive hyphenation or large justifying spaces. Therefore these entries are often formatted ragged right or folded intentionally with the consequence of accumulating objectionable whitespace at the right of the entry. A better strategy is to accept the line length as only a hint and determine the actual bounding box of the composed text. Such a bounding box can be returned by the object layout procedure or determined by a scan of the formatted object. With the bounding box information, the folded table entry can be treated uniformly as any other entry. A feature of this strategy is that a column of folded entries is only as wide as necessary without the excess whitespace. There remains the problem of determining an aesthetically pleasing column width when table entries are of widely varying widths or when the sum of column widths exceeds the page measure. Some ideas on determining optimum column widths for folded table entries and for balancing whitespace are outlined as future work in Chapter ¶6. Vertical positioning of odd-sized table entries requires additional information, especially for entries that are folded into multiple lines or for ones with varying height like mathematical expressions, The top row in Figure 5-5 demonstrates the unacceptable result when baselines of folded entries do not align across the column. Simple vertical alignment attributes such as flush top, flush bottom, or centre do not take into consideration any internal structure of the table entry. Additional alignment choices are needed to align the baselines within a table entry: align on top or bottom baseline, centre on the top or bottom baseline. Note that there are two centred baseline choices for text boxes with an even number of lines. The bottom row of Figure 5-5 demonstrates the range of alignment choices necessary to align baselines within table entries. -- -------------------- Grid 2 Rows 5 Columns ByRowThenColumn Rule (0,0) (0,5) 1 bp Rule (0,0) (2,0) 1 bp Rule (0,1) (2,1) 1 bp Rule (0,2) (2,2) 1 bp Rule (0,3) (2,3) 1 bp Rule (0,4) (2,4) 1 bp Rule (0,5) (2,5) 1 bp Rule (1,0) (1,5) 1 bp Rule (2,0) (2,5) 1 bp Box (0,0) (1,1) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Optically Center, Ignores Baseline Box (0,1) (1,2) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Baseline Baseline Box (0,2) (1,3) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Baseline Baseline Baseline Box (0,3) (1,4) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Baseline Baseline Baseline Baseline Box (0,4) (1,5) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Baseline Baseline Baseline Baseline Baseline Box (1,0) (2,1) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Optically Center, Ignores Baseline Box (1,1) (2,2) TopBaseline Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Baseline at Top Box (1,2) (2,3) BottomBaseline Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Bottom Baseline Box (1,3) (2,4) CenterOnTopBaseline Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Center Baseline at Top of many Box (1,4) (2,5) CenterOnBottomBaseline Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Center on Bottom Baseline of Many Figure 5-5. OPTICAL CENTRING of lines within a table may not produce aesthetic results when table entries have several lines of text. All of the entries in the top row are optically centred; all of the entries in the bottom row have been centred on a selected baseline. -------------------- The alignment on baselines requires that the document object must understand all these alignment choices to establish the correct alignment point for the table entry. Some objects have little similarity to text objects yet the alignment attributes must apply to all objects on a uniform basis. The technique used in this prototype is to differentiate between alignment attributes that deal with content, such as decimal alignment, and those that deal with structure, such as centre on top baseline. The alignment on content must be understood by the object if they are to have any effect. Otherwise they can be ignored, such as a scanned illustration object ignoring decimal alignment. The alignment on structure can be accommodated by extending the document object representation to include a list of baselines with the dimensions. Then the table alignment algorithms can uniformly treat any object that returns this baseline information without requiring each object to implement the alignment algorithms. ¶5.3.2 Table Arrangement The document structure extensions for tables must capture the arrangement of the table entries as well as their content. Tables are two-dimensional, and in this respect are similar to mathematical notation in the microscopic sense, and to page layout in the macroscopic sense. However, tables are significantly unlike paragraphs of text wherein text flows from one line to the next and there are few positioning relationships between text elements. It is these positioning or alignment relationships between table entries that must be included in the table document model. Since the document model is hierarchical and tree-structured with only parent-child relationships, we must map the table entry arrangement onto this structure. One might begin with the obvious row and column structure of the table. Other document formatting systems have chosen one dominant structure and expressed the table structure in terms of the other. For example, rows are the dominant structural element for tables in tbl. This choice appears natural since all the column entries of a row can be entered on the same input line. However this row dominance makes column operations more difficult. Adding a new row is simple; one adds a new input line. Adding a new column is tedious since each input line must have a new column entry inserted in the appropriate place. Another asymmetry between rows and columns in tbl concerns how one specifies spanned headings. A spanned column heading is specified in the column layout part using a sequence of s layout codes, whereas a spanned row heading is specified in the table entry part using a entering a special formatting code, \^, as the table entry content. For interactive table formatting, any asymmetry should be avoided so an operation can apply equally well to a row or to a column. Another structure suggests itself from the subdivision of rows and columns of tables. Figure 5-6 illustrates a table organization with this subdivision structure and a hierarchical representation of the table. The column heading that spans several entries becomes a subtree root. The descendants of this root are the elements of several columns. Multiway branches are for spanned headings, single branches are for table entries. A dual structure can be created for the row structure and two independent hierarchies, one for rows and another for columns, could be maintained. -------------------- Grid 1 Rows 2 Columns ByRowThenColumn Box (0,0) (1,1) Center Center 12.0 bp 12.0 bp 12.0 bp 12.0 bp ///Beach/Thesis/Figure5-6-HierarchicalTable.Press leftMargin: 120 pt, topMargin: 94 pt, width: 190 pt, height: 54 pt Box (0,1) (1,2) Center Center 12.0 bp 12.0 bp 12.0 bp 12.0 bp ///Beach/Thesis/Figure5-6-TableHierarchy.Press leftMargin: 208 pt, topMargin: 156 pt, width: 102 pt, height: 82 pt Figure 5-6. HIERARCHICAL TABLES contain rows and columns that span other rows and columns like the parent nodes in a tree span descendant nodes. -------------------- The dual row and column hierarchies make interactive selections of table structures very powerful. This selection hierarchy notion is similar to text selection which enables one to select a character and extend the selection through various levels of structure from a character to a word, sentence, paragraph, section, and ultimately the entire document. Starting from a selected table entry, one can extend the selection to include all entries in the containing row or column by traversing upwards in the row or column hierarchy. Then successive extensions of the selection would include containing rows or columns until the entire table is selected. Such a dual hierarchy scheme was unsuccessfully attempted in a previous table formatting prototype [Beach, CS740 project]. The selection mechanism worked effectively as expected. However, the implementation was complex and the dual data structures difficult to coordinate. Much of the code had to be parameterized by row or column operations and some code had to be duplicated for each operation. The crucial realization was that not all table designs were hierarchical and a more general underlying arrangement structure was necessary. A more general representation of table arrangements must simultaneously satisfy several goals: conform to a hierarchical document structure, embed arbitrary table entry objects, and describe the arrangement and positioning relationships. The proposed scheme is to define a grid or lattice of boundaries between table entries. The grid may be nonuniformly spaced to permit both wide and narrow entries. Each table entry is surrounded by four grid lines on the left, right, top and bottom. A row is bounded by two horizontal grid lines, and a column by two vertical grid lines in a symmetric fashion. A spanned heading will cross several grid lines but still be bounded by four grid lines. Any arbitrarily complex lattice arrangement of table entries can be specified by listing the four grid lines for each table entry. The entries can be listed in any order if an area data structure is used to capture their arrangement. A planar table will not have any overlapping entries. A nonplanar table can be handled by maintaining an overlap order and using the Painter's algorithm to render the overlapped entries from back to front, however it still is expressed using a single set of grid lines. An important observation of this structure is that it makes the boundaries explicit. Rules between table entries can be expressed as running along a grid boundary. Horizontal and vertical rules are treated symmetrically. Boundaries with rules present have a nonzero width equal to the maximum width of any rule along that boundary. Describing the table arrangement as a grid permits a simpler document structure. Table entries now have grid coordinate positions, and perhaps an overlap number. They can be listed in any order in the document structure and mapped into the grid structure by a parser that maps grid coordinate positions into the data structure for the grid. Section ¶5.4 describes the grid data structure. The external representation of the coordinates can be explicit because these do not change. In the data structure they can be implicit through either symbolic references or relative pointers. Traversing the data structure will produce a new set of coordinates for the external representation. This scheme is analogous to text editors accepting carriage returns in the external representation but maintaining a convenient data structure as a list of text strings for each line. ¶5.3.3 Style Attributes for Tables The document style mechanism must be extended to include additional table formatting attributes and to apply these attributes to table entries. The additional table style attributes include various alignment specifications, such as decimal or character alignment, top or bottom baseline alignment, and centred top or bottom baseline alignment, various rule parameters, such as rule thickness and colour, and various bearoff distances that specify the whitespace between table entries and the grid lines. Extend style mechanism for tables. 1) provide additional style attributes -- easy through the style parameter mechanism. 2) apply styles to the document structure for tables -- harder because search rules are different than hierarchical documents. Applying style attributes to table entries is harder because the attributes may come from several nonhierarchical sources. Figure 5-7 illustrates some of the possible sources. A table style may determine general formatting rules for the whole table. Style rules attached to rows or columns may specify the formatting of entries within those rows or columns. Each table entry may have additional local formatting attributes applied that override the other specifications. Since the row and column structures are not a single hierarchy, the document hierarchy model used to apply style attributes is insufficient. -------------------- Grid 6 Rows 6 Columns ByRowThenColumn Rule (0,0) (0,6) 1 bp Rule (0,0) (6,0) 1 bp Rule (0,1) (6,1) 1 bp Rule (0,6) (6,6) 1 bp Rule (1,1) (1,6) 1 bp Rule (1,2) (6,2) 1 bp Rule (1,5) (6,5) 1 bp Rule (2,2) (2,5) 1 bp Rule (2,3) (6,3) 1 bp Rule (2,4) (6,4) 1 bp Rule (3,0) (3,6) 1 bp Rule (4,0) (4,6) 1 bp Rule (5,0) (5,6) 1 bp Rule (6,0) (6,6) 1 bp Box (0,0) (3,1) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp XxxXxxx Box (0,1) (1,6) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp XxxXxxx Xxx Xxxxx Box (1,2) (2,5) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp XxxXxxx Box (1,1) (3,2) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp XxxXxxx Box (1,5) (3,6) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp XxxXxxx Box (2,2) (3,3) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (2,3) (3,4) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (2,4) (3,5) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (3,0) (4,1) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (3,1) (4,2) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (3,2) (4,3) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (3,3) (4,4) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (3,4) (4,5) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (3,5) (4,6) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (4,0) (5,1) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (4,1) (5,2) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (4,2) (5,3) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (4,3) (5,4) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (4,4) (5,5) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (4,5) (5,6) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (5,0) (6,1) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (5,1) (6,2) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (5,2) (6,3) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (5,3) (6,4) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (5,4) (6,5) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Box (5,5) (6,6) Center Center 3.0 bp 6.0 bp 3.0 bp 3.0 bp Xxxx Figure 5-7. STYLE ATTRIBUTES for a table entry are determined by several style rules specified for the entire table, a row, a column or an individual table entry. This table table style rule specifies a Helvetica type family. One row has a style attribute for bold face. The spanned column has a style attribute for italic face. One of the three table entries in the intersection of the row and column has a Times Roman type family attribute. The style attributes for a particular entry are determined by accumulating all the style attributes according to a natural search order: table, row, column, then table entry. -------------------- Due to the row and column structure of tables, there may be several row or column style rules applicable for a given table entry. The nested hierarchy of the row/column containment of a table entry determines the appropriate search path. First apply the style attributes for the table, then the outermost row or column style rule, then the inner containing row or column style rule, and so on until the local formatting rule for the table entry is applied. Since styles permit unchanged style attributes to carry on in the search path, it is preferable to apply both the row and column style rules. One difficulty remains and that is how to disambiguate the row or column preference? The solution taken here is to provide a Boolean selector for row-before-column or column-before-row preferences. Such a selector is itself a style attribute that may be specified in a general style rule dictionary or as a local formatting property. ¶5.4 Grid Systems ¶5.4.1 Graphic Arts References to Grid Systems Grids have been used in the graphic arts since grid systems were developed in Switzerland after World War II. The principle behind a grid system is an objective attitude to the presentation of the subject material and uniformity in the layout of all pages. Grids institute a disciplined approach to design and layout, limiting the number of choices among a vast set to provide regularity and order in chaotic situations. Several graphic designers have written about grid systems including Muller-Brockman's Grid Systems in Graphic Design [Muller-Brockman], Hurburt's The Grid [Hurburt], and Williamson's Methods of Book Design [Williamson]. Some computer compositions systems have also incorporated grid systems into their design, and one early example was Tilbrook's NEWSWHOLE [Tilbrook] for interactive newspaper page layout. Hurburt [Hurburt] describes the modular design and sense of proportion induced by a grid design. Several proportions, such as the Golden Ratio, square, grid design. Unlike the orthodox grid formed by uniformly spaced horizontal and vertical lines to produce square modules, the typographic grid is more casual with lines introduced to specify margins on the page, column measures, and alignment guides. Common vertical alignment points are the headline, the first line of text on a page, the first line of a chapter title, the first line of text in a chapter, the last text line, and the footline. Figure 5-8 shows the grid design for the pages of this thesis. -------------------- ///Beach/Thesis/Figure5-8-GridDesign.Press leftMargin: 214 pt, topMargin: 142 pt, width: 126 pt, height: 200 pt Figure 5-8. GRID DESIGN for the pages of this thesis illustrates the traditional use of boundary lines to determine margins, column measures, gutter widths, alignment points, etc. -------------------- For formatting tables, the grid system gives us both the sense of proportion and the logical and disciplined arrangement of information. ``The fewer the differences in the size of the illustrations, the quieter the impression created by the design'' [Muller-Brockman, p11] The interactive newspaper pagination system, NEWSWHOLE [Tilbrook], is an example of the application of grid systems for layout and formatting. The grid lines in that system were active and could be stretched and moved across the page at the user's will. There was a simple constraint satisfaction to ensure that there were always an integral number of grids across the section of the page and that the grids were equal width. However there were only limited text composition capabilities in the NEWSWHOLE prototype and it did not attempt to present a WYSIWYG presentation. ¶5.4.2 Tables Described by Grids Grids promote the concept of treating table entries as areas. The grid lines act as the boundaries between table entries. Each table entry is in effect stapled to four grid lines, the top, bottom, left and right grid boundaries. Figure 5-9 presents a table with a grid overlay drawn as grey lines. Some of the table entries are contained by the smallest grid modules while others, especially the spanning column headings, occupy several modules. Any arbitrary arrangement of table entries can be described by such a rectangular grid. -------------------- Grid 4 Rows 5 Columns ByRowThenColumn GridOverlay 3 bp 0 0 0.7 Rule (0,1) (4,1) 1 bp Rule (3,0) (3,5) 1 bp Box (0,1) (1,5) TopBaseline Center 3.0 bp 3.0 bp 6.0 bp 6.0 bp XxxXxxxxXxx Box (1,1) (2,3) TopBaseline Center 3.0 bp 3.0 bp 6.0 bp 6.0 bp Xxxx Box (1,3) (2,5) TopBaseline Center 3.0 bp 3.0 bp 6.0 bp 6.0 bp Xxxx Box (2,1) (3,2) TopBaseline Center 3.0 bp 3.0 bp 6.0 bp 6.0 bp xxx Box (2,2) (3,3) TopBaseline Center 3.0 bp 3.0 bp 6.0 bp 6.0 bp xxx Box (2,3) (3,4) TopBaseline Center 3.0 bp 3.0 bp 6.0 bp 6.0 bp xxx Box (2,4) (3,5) TopBaseline Center 3.0 bp 3.0 bp 6.0 bp 6.0 bp xxx Box (3,0) (4,1) TopBaseline FlushLeft 3.0 bp 3.0 bp 6.0 bp 6.0 bp XxXxxXxx Box (3,1) (4,2) TopBaseline Center 3.0 bp 3.0 bp 6.0 bp 6.0 bp xxx Box (3,2) (4,3) TopBaseline Center 3.0 bp 3.0 bp 6.0 bp 6.0 bp xxx Box (3,3) (4,4) TopBaseline Center 3.0 bp 3.0 bp 6.0 bp 6.0 bp xxx Box (3,4) (4,5) TopBaseline Center 3.0 bp 3.0 bp 6.0 bp 6.0 bp xxx Figure 5-9. TABLE DESCRIBED BY A GRID in which the grid lines are drawn in light grey. Some table entries are contained within a single grid module while others occupy several modules. The two rules, one horizontal below the column headings and one vertical after the row stub, run along the grid boundaries. -------------------- The grid structure represents the table topology, the arrangement of table entries. Rows and columns are treated in a symmetric way: a pair of horizontal grid lines determines a row, and a pair of vertical grid lines determines a column. Table elements are assigned to grid coordinates according to the relationships between table entries. For example, those entries belonging to the same row share the same pair of grid lines, and an entry spanning several columns uses a pair of grid lines chosen from the left grid line of the leftmost spanned entry and the right grid line of the rightmost spanned entry. Rules within the table are assigned the grid line corresponding to the boundary that the rule should follow. The area data structure used for the grid system is the corner stitching technique developed by Ousterhout for VLSI layout tools [Ousterhout, VLSI data structure]. This data structure tesselates the plane with rectangular tiles. The tiles are connected at the corners with pointers to adjacent tiles. Figure 5-10 illustrates a prototypical tile with four pointers at the corners and a fragment of the tesselation showing how several tiles are connected together. Special border tiles surround the tesselation to simplify the search and manipulation algorithms. -------------------- ///Beach/Thesis/Figure5-10-CornerStitching.Press leftMargin: 134 pt, topMargin: 152 pt, width: 328 pt, height: 131 pt Figure 5-10. CORNER STITCHED DATA STRUCTURE uses tiles that are joined together by four pointers, two at the NorthEast corner going North and East, and two at the SouthWest corner going South and West. -------------------- The corner stitching algorithms are well suited for interactive use since they depend only on local information in the vicinity of the operation. The point finding algorithm takes O(the data structure. The creation and deletion algorithms both take O(h) average time, where h is the height of the tile, and O(n) worst case time. The area enumeration algorithm is O(s) where s is the number of tiles in the enumeration area. A corner stitched tile is created for each table entry and for each rule. To make the grid boundaries explicit, coordinates are chosen so that table entry tiles and rule tiles can coexist in the data structure. The grid data structure for the table in Figure 5-9 is shown in Figure5-30. The table entry tiles are grey and the rule tiles are black. -------------------- ///Beach/Thesis/Figure5-11-TableGrid.Press leftMargin: 142 pt, topMargin: 151 pt, width: 320 pt, height: 265 pt Figure 5-11. TABLE MAPPED ONTO GRID DATA STRUCTURE for Figure 5-9. The grey tiles represent table boxes and the black tiles represent rule elements. The coordinates are chosen to make the grid boundaries explicit. -------------------- Only topological information is stored in the grid data structure. Each tile however references both the content and positioning information for the table entry or rule. The actual positions of the grid lines and table boxes must be determined by the table layout algorithm by enumerating the tiles in the grid. For each table box and rule, the size information can be used to determine where the grid lines must be placed on the formatted page. Once the layout information is available, the table rendering algorithm can traverse the grid data structure and display the document object at the position referenced by each tile. ¶5.4.3 Overlapping Planes The corner stitching data structure is inherently planar whereas tables may involve overlapping table entries or rules. There are three things that may overlap: table entries that appear pasted over top of other entries, rules that intersect with other rules, and backgrounds that underlay part or all of the table. The positions of all these overlapping elements can be expressed in terms of grid boundaries so no additional layout information is necessary or generated by overlapped entries. Backgrounds can be dispensed with quickly since there are generally few of them, often only one per table, and since their size does not influence the layout of the table entries. It is sufficient to maintain a list of background rectangles to be shaded, and for the rendering algorithm to traverse that list before treating the rules or table entries. Overlapping table entries can be handled by maintaining an overlap order. The tiles in the grid data structure presently reference a single table entry. For overlapping entries, the tiles would have to maintain the list of overlapped table entries in overlap priority order. The rendering algorithm would then be modified to render the list of overlapped entries from back to front using the Painter's algorithm. These extensions for overlapping table entries have not been implemented in the prototype although background tiles are included. Overlapping rules within tables are more interesting and more common. Rules intersect when horizontal and vertical rules meet. Even if the rules are of equal thicknesses, the end points of the rules must be adjusted to ensure that they avoid a notched corner, such as the leftmost examples in Figure 5-12. Additional complications arise when two rules of different colours intersect. One of the rules should dominate and the other rule should have its endpoint adjusted to avoid penetrating the other, as shown by the middle examples in Figure 5-12. When rule intersections are captured this way, several opportunities arise to introduce new functionality. For example, new capabilities for rounded corners, as shown at the bottom right in Figure 5-12, or for placing special ornaments can be introduced at the intersections between rules. The intersections between double or multiple rules can be treated in various ways. This technique of capturing the overlapping rules permits a general mechanism for an implementation to detect the intersection and to provide appropriate treatment. -------------------- ///Beach/Thesis/Figure5-12-Intersections.Press leftMargin: 129 pt, topMargin: 141 pt, width: 331 pt, height: 198 pt Figure 5-12. INTERSECTING RULES in a table may require several special treatments. Lines of different thicknesses; lines of different colour; lines with repeating patterns (dashes, borders); lines with ornaments at the end; rounded corners where lines intersect with sufficient clearance are all examples of the treatments possible with this technique. -------------------- ¶5.4.4 Grid Algorithms This section outlines several of the algorithms used with the grid data structure. Enumerate Area Algorithm The grid data structure must be traversed to enumerate all the table entries. The corner stitching algorithm for directed enumeration [Ousterhout] can be used directly, providing an algorithm with time linear in the number of tiles visited. The area enumeration algorithm is used by the table layout, table rendering, deletion and insertion algorithms. The action performed at each tile is specified by a client-supplied call-back procedure, for example, to establish the grid constraints for a table entry or to delete a table entry. Table Layout Algorithm Given the grid data structure representing the table arrangement, this algorithm determines the (x,y) coordinates of the grid lines and all the table entries. Since the rules and background areas are coincident with the grid lines, their positions are determined by the rendering algorithm from the grid lines. The layout algorithm uses linear inequality constraints described in ¶5.5 to compute the grid positions. The time complexity of the layout algorithm is dominated by the algorithm for satisfying the linear inequality constraints. Table Rendering Algorithm Given the layout positions of all the grid lines and table boxes, this algorithm renders the content of the table. Using the object-oriented document structure, each table entry will be rendered by its own client-supplied procedures specific to the content, for example, for text, scanned images, synthetic illustrations, and tables by invoking this algorithm recursively. To handle the possible overlapping of data, the table content is rendered in back to front order: shaded backgrounds first, then rules, then the table entries. The rendering algorithm uses device independent routines that produce similar output for display or printing devices. This device independent imaging model [Warnock & Wyatt] provides the WYSIWYG capability for interactive editing of tables. R1: [Output Backgrounds] Traverse the list of shaded backgrounds and for each one, output a shaded rectangle with corners determined by the coordinates of this table entry's four grid line positions. R2: [Enumerate Rule Tiles] Traverse the table grid structure and render the table rules: R2.1 [Draw Rule] Output a rule with corners determined by the thickness of the rule and the grid line position. R3: [Enumerate Table Entries] Traverse the table grid structure and render the table entries: R3.1 [Output Table Content] Invoke the rendering procedure for this table object at the position (x,y) for the table entry and inside the rectangular region determined by the four grid line positions. The table rendering algorithm is linear since the area enumeration algorithms are linear in the number of tiles enumerated. One can extend the rendering algorithm to do minimal repainting. The technique is similar to that used by WYSIWYG text editors. By keeping state information about which table entries have changed, then only the changed entries need be repainted. By keeping track of which grid positions have changed, then performing a block-move instruction on parts of the bit-mapped screen permits opening up (or closing up) grid lines to update the table. The table editing algorithms would have to cooperate and maintain these state variables Pointing Algorithm The pointing algorithm is used to determine the table entry from a position pointed at by the user on the display screen. Given the screen (x,y) coordinates, the pointing algorithm returns the table box at those coordinates. The technique is to first convert the screen (x,y) coordinates into coordinates relative to the table origin by translating them to the table origin relative to the display window. One must find the appropriate row and column grid positions bounding the table coordinates and then return the table entry in the grid data structure at those grid coordinates. P1: [Convert Coordinates] Given (Sx,Sy), the screen coordinates, convert these to table coordinates (Tx,Ty) by subtracting the screen coordinates of the table origin, (Ox,Oy), in the display window. Raise an error if (Tx,Ty) lie outside the table. P2: [Find row and column grids] Search the two vectors of grid coordinates for row grid line Gr containing Tx and the column grid line Gc containing Ty. P3: [Find Table Entry] Locate the grid tile at the grid coordinates (Gr,Gc) and extract the reference pointer to the table entry. The pointing algorithm is O( Delete Algorithm There are two variations of the deletion algorithm. The first variant deletes table entries without changing the table topology, and the second deletes grid lines and hence changes the topology. In the first variant, the algorithm is given two pairs of grid lines that determine a rectangular region, and by enumerating the area within that region it deletes all table entries found. This algorithm is linear in the number of entries deleted due to the behaviour of the area enumeration algorithm. The second variant of delete algorithm deletes grid lines. The algorithm is given the table grid structure and one pair of grid lines. All table entries found between the grid lines are deleted. Any table entry that crosses the pair of grid lines is changed to reflect the new topology with one of the grid lines removed. D1: [Prepare New Grid Structure] Prepare a second grid data structure to receive the modified table topology. D2: [Enumerate the Old Grid] Use the area enumeration algorithm to traverse the old grid structure. For each table entry, perform the appropriate action: D2.1: [Table Entry Within Grid Pair] Ignore this table entry; it will not appear in the new structure. D2.2: [Table Entry Precedes Grid Pair] Create a new grid tile for this unchanged table entry. D2.4: [Table Entry Straddles Grid Pair] Adjust one of the grid coordinates of this table entry and create a new grid tile for this change table entry. D2.3: [Table Entry Follows Grid Pair] Adjust both grid coordinates of this table entry and create a new grid tile for this change table entry. D3: [Install New Grid Structure] Update the table data structure to use this new grid structure. This implementation of the deletion algorithm uses two copies of the table grid structure and ping pongs between them. While this costs twice the space for the table grid structure, there are several performance benefits and the manipulation of the grid structure is dominated by the time to refresh the display screen. The major benefit is avoiding garbage collection since the grid data structure implementation caches deleted grid tiles and reuses them without allocating new ones. A more space efficient deletion algorithm is possible by relying on the directed enumeration algorithm to traverse the grid structure in a way that tiles to be moved are enumerated only when it is safe to move them. The snowplow and compaction operations [Ousterhout] for the VLSI version of the corner stitching data structure are examples of these change in place operations. Insert Algorithm The insertion algorithm also comes in two variants, one which inserts a table entry without changing the topology and another which inserts a new grid line that changes the table topology. In the first variant, the table entry is created and inserted into the grid structure at the grid coordinates corresponding to an empty tile. In the second variant where the table topology is changed by adding a new grid line, the same technique of using two grid data structures is applied. A similar approach can be taken to achieve a space efficient insertion algorithm. ¶5.5 Constraints A linear constraint solver is used to determine the table geometry from the table topology. The grid data structure holds the topology of all the table entries. The layout algorithm enumerates all the table entries and generates appropriate constraints on their positions. Solving the resulting system of linear inequalities results in the computed positions of all the grid lines and table entries. The constraint solver permits the layout algorithm to describe arbitrary table designs as well as providing an extension mechanism for custom layouts. Allowing additional constraints to be included with the table enables features like creating rows or columns of equal width, or making one column twice the size of another, or making three columns equal in width to another two, and so on. ¶5.5.1 Constraint Systems Several constraint system provided the motivation and insight into this application. Sutherland's Sketchpad [Sutherland] was a seminal interactive graphics application that relied on constraint satisfaction to position the graphical elements of his simulations. Borning's Thinglab [Borning, Thinglab] demonstrated the use of constraints in several different problem domains. Of particular interest were the very brief document layout and document content examples. The layout example was a paragraph surrounded by a rectangle precisely big enough to hold the paragraph. When the width of the rectangle or content of the paragraph was changed, the rectangle's height was adjusted accordingly. Work on program verification [Nelson, Program Verification] has required linear inequality constraint solvers to reduce and propagate constraints on program variables. Some of these solvers found their way into the JUNO interactive illustrator [Nelson, Juno, to appear at SIGGRAPH]. The ability to position parts of an illustration relative to one another was very attractive and analogous to the layout of table entries relative to one another. For the table layout problem, linear inequalities are sufficient. All the relationships within a table can be expressed as rectilinear (horizontal or vertical) distances. Thus addition, subtraction and scalar multiplication are the only operations needed to express the table positioning relationships. Using inequalities introduces slack variables that capture a useful notion for interactive use. Centering a column of table entries requires equal whitespace on both sides of each table entry. It is this whitespace that is represented by the slack variable. When an entry is edited and reformatted, if the new size does not exceed the old size plus the slack, then the table design has not changed. This fact helps to avoid recomputing the table geometry for each interactive update. Another interactive feature of table layout can be handled by the constraint solver design from the program verification work. When changes to the table topology result in a table design that exceeds the page dimensions, one would like to undo the changes and return to the original layout. The theorem prover constraint solvers also need this incremental property where the state of the constraint tableau is marked and if provisional changes to the constraints result in an infeasible tableau, the changes can be backed out. ¶5.5.2 The Constraint Layout Problem The table layout algorithm given earlier depends on a linear inequality constraint solver. Given the table arrangement in the grid data structure and the sizes of all the table entries, the layout algorithm can generate linear inequalities to describe the relationships between all the table boxes. It computes the solution to the constraint system to produce coordinate values for the grid lines and the alignment points of all the table entries. Independence of Horizontal and Vertical Constraints A simplifying assumption made in the table layout algorithm is that the row and column constraints can be solved separately. By treating the two sets of constraints independently, the number of constraints is reduced and the size of the corresponding constraint tableau is reduced. It worth assessing the necessity of this assumption and the situation when it is violated. The assumption does not hold when one wants to create grid designs with square modules. Such a square module results from making the row height equal to the column width for a grid module. Obviously in this situation, the sets of row and column constraints are not independent. The assumption is not true when trying to balance whitespace within a table by folding long horizontally oriented table entries to trade off more vertical depth. In this situation, decreasing the column width increases the row height and the constraints are not independent. What is the cost of solving interdependent row and column constraints? The two sets would have to be combined into the same constraint system, thereby doubling the problem size. Since the constraint solver takes O(n3) time on average, the combined constraint layout would require eight times more computations. For most table layouts, the assumption of independent row and column layout does hold and the simpler constraint problem can be handled. However, in more esoteric or optimized situations, the combined problem can be posed and solved. Linear Inequality Constraints The table layout can be described by linear inequalities. Each table entry is treated as a rectangular box attached to four grid lines: left, right, top, bottom. The table entry dimensions are designated by the four box dimensions: Bleft, Bright, Btop and Bbottom. The grid positions for each table entry are designated Gleft, Gright, Gtop, Gbottom. The alignment position for all table entries in the column between grid lines left and right is Aleft,right and in the row between top and bottom is Atop,bottom. All variables are restricted to have positive values. The linear inequalities describe the table layout including the order of the grid lines, the alignment of boxes within grid lines, and any special or custom constraints. The column constraints are used as examples in what follows, but similar row inequalities are also needed. The order of the grid lines imposes the following inequalities for the column grid lines that surround a table entry: Gright  Gleft Aleft,right  Gleft Gright  Aleft,right Positioning the grid lines far enough apart to contain each table entry generates this constraint: Gright  Gleft Aligning a table box flush left in the column generates these constraints: Aleft,right  Gleft = 0 Gright  Aleft,right Similarly, aligning a table box flush right in the column generates these constraints: Aleft,right  Gleft Gright  Aleft,right = 0 Centring a table box within the column generates this constraint: 2Aleft,right  (Gleft + Gright) Additional These constraints can be generated automatically from the information kept in the grid data structure. A small expression language for constraint inequalities is provided to describe additional constraints for special effects. For instance, to create two columns of equal width, the following constraint is needed: (Gleft` + Gright`)  (Gleft + Gright) = 0 It is possible that right` { left if the two columns are adjacent. The expression language accepts the standard form of the variable names. A variety of user interfaces to this facility are possible, however, only the simplest interface of actually typing the constraint equations is supported by the prototype. Other options include keeping lists of equal width columns or rows, or allowing interactive specification of the constraints. ¶5.5.2 The Layout Algorithm The complete table layout algorithm including the constraint solver can now be given. The algorithm is given the grid data structure of the table arrangement and all the table entries. The first step in the algorithm is to ensure that all the table entries are dimensioned by invoking the layout algorithm for the table entry object. This dimension information is remembered since the box dimensions are used twice in the row and column constraint satisfaction. L1: [Dimension the Boxes] Determine the dimensions of the table entry by invoking the layout procedure for that class of object. Remember the dimensions since they are used for both row and column layout. L2: [Layout the Rows] L2.1: [Establish Row Constraints] Enumerate the table grid structure and generate the appropriate constraints for each table entry. L2.2: [Solve] Invoke the constraint solver on the row constraints, and record the grid coordinates. L2.3: [Position Table Entries] Enumerate the table grid structure and compute the layout coordinates of the alignment position for the each table entry. L3: [Layout the Columns] L3.1: [Establish Column Constraints] Enumerate the table grid structure and generate the appropriate constraints for each table entry. L3.2: [Solve] Invoke the constraint solver on the column constraints, and record the grid coordinates. L3.3: [Position Table Entries] Enumerate the table grid structure and compute the layout coordinates of the alignment position for the each table entry. ¶5.5.3 Handling Large Tables The constraint scheme just outlined suffers from rapid growth in the problem size as the tables grow larger with more and more rows and columns. A couple of strategies can greatly reduce the number of constraints generated. Observe the structure of these constraint inequalities that determine the grid positions: Gright  Gleft Gright  Aleft,right Aleft,right  Gleft 2Aleft,right  (Gleft + Gright) Each of these inequalities is dominated by the maximum value of the right hand side. By searching for the maximum right hand side value of all the boxes within a column, the set of inequalities can be replaced by the following four inequalities, thereby vastly reducing the number of constraints. Gright  Gleft Gright  Aleft,right Aleft,right  Gleft 2Aleft,right  (Gleft + Gright) Most of the constraint inequalities have only two variable terms, leading to the observation that sparse matrix techniques would vastly reduce the space requirements. ¶5.6 Conclusions The new framework for table formatting provides a mechanism for including tables in electronic documents and formatting for WYSIWYG interactive editing. The document model was extended to include arbitrary arrangements of table entries and the document style mechanism extended to provide style attributes suitable for the table formatting algorithms. The grid data structure represents the table explicitly as an area layout problem and provides simple linear algorithms for working with tables interactively. Rows and columns within tables can be manipulated with equal ease. The use of a linear inequality solver to compute the table geometry provides an extensible method for capturing the arrangement of table information.