ThesisProposal.Tioga
Rick Beach, May 25, 1984 3:56:32 pm PDT
Lauer's Computing article on CS Thesis Proposals:
1. Statement of the problem and why it should be solved
2. Reference to and comments upon relevant work by others on the same or similar problems
3. Candidate's ideas and insights for solving the problem, and any preliminary results
4. Statement or characterization of what kind of solution is being sought
5. Plan of action for the remainder of the research
6. Rough outline of the thesis
Evolutionary Thinking vs The Great Leap Forward
1. Statement of the problem
Interactive Document Composition for Tables, Illustrations
Build an interactive WYSIWYG table formatter.
Use grid systems to define the layout of tables and pages.
Define a framework for specifying both layout and style rules.
Present some layout algorithms:
Determine a grid layout from the table content and its alignment relationships.
Accept a previously defined grid layout and present the table.
Adjust a grid layout to achieve a feasible solution when alignment or size constraints (on column width, say) are unsatisfied.
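A minimal sketch of the first algorithm, determining a grid from the table content. It assumes each entry's natural width is already known and that a column takes the width of its widest entry plus bearoff on both sides. The names grid_from_content and fits are hypothetical, chosen only for illustration.

```python
from itertools import accumulate

def grid_from_content(entry_widths, bearoff=0.0):
    """entry_widths: one list of natural entry widths per row.
    Returns the x positions of the vertical grid lines, left to right."""
    n_cols = max(len(row) for row in entry_widths)
    col_widths = [0.0] * n_cols
    for row in entry_widths:
        for j, w in enumerate(row):
            # Assumption: bearoff is added on both sides of every entry.
            col_widths[j] = max(col_widths[j], w + 2 * bearoff)
    return [0.0] + list(accumulate(col_widths))

def fits(grid_lines, page_width):
    """The size constraint: the last grid line must lie within the page width."""
    return grid_lines[-1] <= page_width

widths = [[30, 12, 40], [22, 18, 35], [28, 10]]   # last row has a nil third entry
lines = grid_from_content(widths, bearoff=2)
```

When fits fails, the third algorithm (adjusting the grid) would take over; this sketch only detects the failure.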
Why should the problem be solved?
Tables are hard!
Tables provide a compact representation for large amounts of data.
Table layouts are page layouts in microcosm and math notation in the large.
Why are tables hard?
Tables have two-dimensional alignment relationships (rows and columns).
In galleys, text words become lines and lines become pages. The constraint is line length, and H&J (hyphenation and justification) algorithms break lines appropriately.
In tables, text words become table entries, entries are aligned simultaneously into rows and columns. Constraints are page width and entry alignment both horizontally and vertically.
Table entries may flow from one to another (especially when treating free form grid designs as a table, where pictures and captions are placed on a grid and text flowed around those elements).
Typeset tables require fine resolution in placement.
Spreadsheets have it easy with a matrix format.
Typewriters have it easy with fixed width characters and fixed escapements for tab stops.
Typesetters have variable width fonts on various fine resolutions (e.g. 1/10th of a point or 1/720 of an inch).
block of type model: treat table entry as an area, furniture in layout
[Seybold, Fundamentals of Photocomposition, p 14]
Row and column alignment:
Align each table entry within its row or column.
Headings may span several rows or columns.
Equally spaced rows or columns independent of content.
Foldable columns (or sets of columns) continued in adjacent columns and balanced.
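A rough sketch of folding a long column into balanced pieces. It assumes every entry has the same depth, so balancing reduces to splitting the entry count as evenly as possible; real entries of unequal depth would need the page breaking machinery. The name fold_column is hypothetical.

```python
def fold_column(entries, n_folds):
    """Split one long column into n_folds adjacent columns whose
    lengths differ by at most one entry (assumes equal-depth entries)."""
    base, extra = divmod(len(entries), n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)   # first `extra` folds get one more
        folds.append(entries[start:start + size])
        start += size
    return folds

folds = fold_column(list(range(10)), 3)
```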
Alignment choices:
Horizontal alignment: flush left, flush right, center, align on character (for example, decimal point)
Vertical alignment: flush top, flush bottom, top baseline, bottom baseline, center, center on top baseline, center on bottom baseline
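The horizontal choices can be sketched as offsets of an entry's left edge from its column's left edge. This is only an illustration: it assumes fixed-width characters so that a character index stands in for a measured position, and the names h_offset, char_anchor and align_on_char are hypothetical.

```python
def h_offset(mode, col_width, entry_width):
    """Offset of the entry's left edge from the column's left edge."""
    if mode == "left":
        return 0.0
    if mode == "right":
        return col_width - entry_width
    if mode == "center":
        return (col_width - entry_width) / 2
    raise ValueError(mode)

def char_anchor(text, char=".", char_width=1.0):
    """Distance from the entry's left edge to its alignment character.
    Assumption: an entry without the character aligns at its right end."""
    i = text.find(char)
    if i < 0:
        i = len(text)
    return i * char_width

def align_on_char(entries, char="."):
    """Offsets that bring every entry's alignment character to the same x."""
    anchors = [char_anchor(e, char) for e in entries]
    col_anchor = max(anchors)
    return [col_anchor - a for a in anchors]

offsets = align_on_char(["3.14", "12.5", "100"])
```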
Style treatment of tables may be grouped:
Table may have different type attributes than surrounding text.
A row or column may be distinguished with different type attributes.
A table entry may have different attributes.
A table entry might contain any text or illustration permissible by the formatter.
Whitespace allowances:
Bearoff distances above, below, to left, to right of table entry.
Intrusions permitted for footnote marks or glosses.
Rules and decorations:
Rules along row or column boundaries
Rule patterns, such as double rules, thick-thin pairs, etc.
Various weights and patterns of rules or borders
Background tints for table entries or whole rows or columns.
Rules within boxes, for example, for total of a column of entries.
Braces to group entries in a column
Leaders (the dots that lead your eye):
Leaders may be replicated characters or rules.
Congruence arranges that the replication pattern aligns from one row to the next. Congruence is also wanted among leaders of several different sizes.
Leaders run from one column into another and possibly across several columns.
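One way to get congruence, sketched under the assumption that leader dots sit on a lattice shared by the whole table: every dot position is a multiple of the leader period measured from a common origin, so dots in different rows line up no matter where each leader starts. The name leader_positions is hypothetical.

```python
import math

def leader_positions(start, end, period, origin=0.0):
    """x positions of leader dots between start and end (inclusive),
    snapped to the lattice origin + k * period."""
    first = math.ceil((start - origin) / period)
    last = math.floor((end - origin) / period)
    return [origin + k * period for k in range(first, last + 1)]

long_leader = leader_positions(7, 20, 4)
short_leader = leader_positions(13, 20, 4)
```

Because both leaders use the same origin and period, the dots of the shorter one fall exactly under dots of the longer one.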
Footnotes within tables:
Footnote may be included with table entry, if the entry is large.
Footnotes may be collected at the bottom of the table, outside the table layout but within the space allocated in the page layout.
Footnotes may go at the bottom of the same page (?) as the table. That is, the footnotes are continued in the stream of footnotes within the text. (Sounds a bit hokey to me.)
Special treatments to make a table fit.
Rotate headings to typeset vertically to reduce column width. Possibly set text vertically with characters horizontally (vertical stack arrangement).
Reduce size of type within table.
Reduce whitespace bearoff to make table fit.
Tables may be larger than a single page:
Wide tables may be printed broadside, rotated 90 degrees (either way), so the long table dimension is along the long paper dimension.
Tables may be laid out as facing pages in a two-page spread.
Tables may be printed on fold-out pages for special circumstances.
Tables may be continued on several subsequent pages.
Boxheads (set of column headings) or Stubs (set of row headings) may need to be repeated if table is continued. Continued headings (add the text "continued") may be necessary for such tables.
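A small sketch of continuing a table over several pages with a repeated boxhead. It assumes rows of equal depth and a fixed number of rows per page, and treats the boxhead as a single string; paginate is a hypothetical name.

```python
def paginate(boxhead, rows, rows_per_page):
    """Break rows into pages, repeating the boxhead on each page and
    marking every page after the first as continued."""
    pages = []
    for start in range(0, len(rows), rows_per_page):
        head = boxhead if start == 0 else boxhead + " (continued)"
        pages.append([head] + rows[start:start + rows_per_page])
    return pages

pages = paginate("Year Sales", ["a", "b", "c", "d", "e"], 2)
```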
Readability concerns are well known to those who make mathematical tables.
Grouping rows or columns by adding whitespace or rules every so many entries.
Provide eye guides, for example, thin rules between rows, thick rules every fifth row, background tints every so many rows.
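The eye guide rule can be written down as a tiny style function. This sketch assumes a rule is drawn below every row except the last; row_rules is a hypothetical name and the group size of five comes from the example above.

```python
def row_rules(n_rows, group=5):
    """Weight of the rule below each row except the last:
    thick after every group-th row, thin otherwise."""
    return ["thick" if (i + 1) % group == 0 else "thin"
            for i in range(n_rows - 1)]

rules = row_rules(7)
```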
Various sources of tabular material:
Financial spreadsheets from calculator program.
Financial reports.
Program generates voluminous data.
Extracts from a database.
Author composes a simple table.
Unfortunately, almost anything can be a table!
2. Relevant work
tbl (troff)
Batch (noninteractive) preprocessor for troff.
Layout can be determined from content.
Except when a table entry folds onto a second line, then a column width specification is required.
Table specification is interleaved with table content implying no easy indirection for table design.
Not WYSIWYG.
Asymmetric treatment of rows and columns. You can easily interchange rows with a line editor, but it is harder to interchange columns, especially if T{...T} constructs are used to group troff commands for a table entry.
Lined rules around table, and along major row and column boundaries are easily specified.
Simple rule patterns (single and double rules).
TEX
Batch (noninteractive) document compiler.
Table specification is interleaved with table content implying no easy indirection for table design.
Not WYSIWYG.
Layout determined by lists of boxes. Lists have single horizontal or vertical bias.
Complex alignment within horizontal or vertical bias possible. Similar power to tbl.
TEX Masters are those who can handle vertical rules in tables because of the very difficult method of interposing vertical rules into the list structures.
Scribe
Not much written in Brian Reid's thesis. Have not yet looked through the Scribe User's Guide.
Type (Waterloo)
Arcane batch document processor.
Table entries are formatted as diversions with styles associated with each column.
Style indirection supported for table design this way.
No convenient alignment control between table boxes, for example, no top baseline alignment.
Tabular Composition (Seybold report)
Arthur Phillips's article [Seybold, Tabular Composition] is a gold mine of ideas.
His boxhead skeleton is very similar to my grid system layout scheme (sigh!!)
Summary of points in Phillips's article:
— need to align horizontally on characters as well as center within a column (content specific choice of character, his example used a multiplication sign!)
— specifying nil entry contents (editing operation)
— interactive skip between column entries (table property: enumerate entries by row or column major order)
— narrower than normal set for text in wide columns (adjusting the type attributes to fit the size constraints)
— tables of text are generally simple; but can become difficult if braced items included (otherwise he never mentions braced items)
— box skeleton scheme (his coordinate system was relative to boxhead nesting level structure)
— horizontal panning of wide tables when editing them interactively (horizontal scroll bar suffices for this)
— table transpose by computer (exchange rows and columns to see if the resulting table fits the width constraint)
— 90 degrees rotation of text entries to save column width
— boxhead repeated across several pages for long continued tables
— landscape or portrait tables (these are the broadside tables above)
— double-page spread without splitting a single column across two pages (requires adjusting column widths to permit gutter break at the right place).
— breakpoints in the stub, automatic continued headings generated (is this appropriate for a style rule or is it editing content?)
— folded columns to save overall table depth (need to do page breaking within tables!)
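The table transpose item in the summary above is easy to try out. This sketch assumes fixed-width characters, so a table's natural width is the sum of its widest entry per column; transpose and natural_width are hypothetical names and the sample data is made up.

```python
def transpose(table):
    """Exchange rows and columns of a rectangular table."""
    return [list(col) for col in zip(*table)]

def natural_width(table):
    """Sum over columns of the widest entry, counted in characters
    (assumption: fixed-width characters, no bearoff)."""
    return sum(max(len(entry) for entry in col) for col in zip(*table))

wide = [["Month", "Jan", "Feb", "Mar", "Apr", "May", "Jun"],
        ["Sales", "100", "120", "90", "140", "130", "110"]]
tall = transpose(wide)
```

Comparing natural_width(wide) with natural_width(tall) against the width constraint says whether transposing helps.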
US Government Printing Office
Style manual for tabular material.
Some computer program support for this (batch only I believe). I read a later assessment of GPO composition tools but have lost the reference when Paxton moved.
Commercial composition systems (? Penta, forms?)
I need to search through the Seybold report for more reports on forms and tabular composition tools. This clearly is seen as only a part of the general solution to interactive composition systems. Any such system treats tables as further development after they have delivered their book composition system.
Graphic Arts tradition
I contacted Michael Rogodino, a book designer in Palo Alto, and have arranged to visit and chat when I return from Graphics Interface '84.
3. Ideas and Insights
Table Formatting as Microcosm of Complex Composition
If we can do a good job of formatting complex tables, then these techniques will apply to complex page composition. Some of the same problems come up within tables, such as balancing folded columns, placing footnotes, or flowing text from one table entry to another (the page break problem).
Table entries can flow from one box to another like columns of text on a page.
Pages have constant information (headers and footers) like the table boxhead and stub information.
Pages can have boundaries determined by the content on the page, e.g. footnotes.
Style attributes apply to the geographical as well as logical parts of a table/page/document.
Table formatting problem scales up to pages, down to notation
Math notation is two-dimensional like tables
Alignment of notation is similar to table alignment (vertically centered limits on summations, several summations within an equation each with limits horizontally aligned). Of course, matrices might be viewed as tables in miniature.
Grid Systems & Constrained Placement
Grid systems are a way of specifying positions & relationships of areas.
Boundaries of the area are associated grid lines.
Area contents must be formatted to fit within the grid boundaries.
Moving a grid line affects all the associated areas.
Areas may span several grid lines, and lines may be physically coincident without being logically coincident. Hmmm, how to say THAT any better without a picture!
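Lacking a picture, a data structure sketch may say it. The assumption here is that areas hold references to shared grid line objects rather than copies of coordinates; GridLine and Area are hypothetical names, not the area data structure planned for the formatter.

```python
from dataclasses import dataclass

@dataclass
class GridLine:
    pos: float          # physical position; identity is the logical line

@dataclass
class Area:
    left: GridLine
    right: GridLine
    top: GridLine
    bottom: GridLine

    @property
    def width(self):
        return self.right.pos - self.left.pos

    @property
    def height(self):
        return self.bottom.pos - self.top.pos

v0, v1, v2 = GridLine(0), GridLine(50), GridLine(100)
h0, h1, h2 = GridLine(0), GridLine(12), GridLine(24)
heading = Area(v0, v2, h0, h1)      # spans two columns
left = Area(v0, v1, h1, h2)
right = Area(v1, v2, h1, h2)

v1.pos = 60                          # moving one line affects both neighbours
gutter = GridLine(60)                # physically coincident with v1, logically separate
```

After the move, left and right change width together while heading is untouched, and gutter shares a position with v1 without being the same line.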
Constraints describe a way to solve the layout (alignment, sizing constraints)
General solution by converting relationships and positions into linear inequalities
Some ad hoc and general simplifications possible to reduce the problem size
sets of inequalities on same variables become one inequality on the maximum
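A sketch of the simplification and a simple solver, under assumptions of my own: every constraint is a minimum separation x[j] - x[i] >= d between two grid lines with i to the left of j, and the solution wanted is the smallest set of positions with x[0] = 0. General linear inequalities would need a real LP solver; merge_constraints and solve_positions are hypothetical names.

```python
def merge_constraints(constraints):
    """Collapse (i, j, d) triples meaning x[j] - x[i] >= d.
    Several inequalities on the same pair become one on the maximum d."""
    merged = {}
    for i, j, d in constraints:
        merged[(i, j)] = max(d, merged.get((i, j), d))
    return merged

def solve_positions(n_lines, constraints):
    """Smallest positions satisfying all constraints, with x[0] = 0.
    Assumes i < j in every constraint, so processing pairs in sorted
    order finalizes x[i] before any constraint that starts at i."""
    merged = merge_constraints(constraints)
    x = [0.0] * n_lines
    for (i, j), d in sorted(merged.items()):
        x[j] = max(x[j], x[i] + d)
    return x

constraints = [
    (0, 1, 30), (0, 1, 22),   # two entries in column 0
    (1, 2, 12), (1, 2, 18),   # two entries in column 1
    (2, 3, 40),               # one entry in column 2
    (0, 2, 60),               # a heading spanning columns 0 and 1
]
merged = merge_constraints(constraints)
x = solve_positions(4, constraints)
```

The spanning heading forces the second column wider than its own entries need, which is the kind of interaction the general formulation has to capture.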
Algorithm to search for feasible solutions to unsatisfied constraints
Could choose to require content to be changed
Might accept hints for different formatting parameters for table entries (the try table approach)
Use heuristics to change dimensions of table entries
white space balancing
Monte Carlo techniques
worst case adjustment
Search strategies for new dimensions
Monotonic paragraph property: paragraph depth grows as width shrinks
Monotonic Paragraph Property
Conjecture: Paragraph depth increases monotonically with decreased width.
TEX experiment seems to indicate the likelihood of this.
Should be able to prove something given the boxes arrangement and the hyphenation kerf points.
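A toy check of the conjecture, not a proof and not the TEX experiment. It assumes fixed-width characters, no hyphenation, and greedy first-fit line breaking rather than TEX's line breaking; line_count is a hypothetical name.

```python
def line_count(words, width):
    """Depth of a paragraph in lines under greedy first-fit breaking.
    A word wider than the measure is set on a line of its own."""
    lines, used = 0, 0
    for w in words:
        n = len(w)
        if lines == 0 or used + 1 + n > width:
            lines += 1          # start a new line with this word
            used = n
        else:
            used += 1 + n       # interword space plus the word
    return lines

text = "tables provide a compact representation for large amounts of data".split()
# Depth at successively narrower widths: 60, 55, ..., 10 characters.
depths = [line_count(text, w) for w in range(60, 9, -5)]
monotone = all(a <= b for a, b in zip(depths, depths[1:]))
```

Hyphenation points and an optimizing line breaker are exactly where a proof would have to be careful, so this only shows the shape of the experiment.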
Whitespace balancing
Table rows or columns can contain unequal amounts of text, resulting in unequal amounts of whitespace.
Equally spaced rows or columns that align to grid boundaries are an obvious driving force.
Statistical summary of the rectangular areas gives standard deviations. Goal is to minimize the standard deviation. What does the standard deviation look like?
A badness function seems to be lurking here, with each arrangement of the grid positions yielding a different badness value.
Using the Monotonic Paragraph Property, we can postulate that reducing the width will increase the depth, thereby improving the badness value.
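One possible shape for the badness function, purely as an assumption: the population standard deviation of the whitespace left in each area of a row. The sketch holds the row depth fixed, which sidesteps the width and depth interaction described above; whitespace_badness is a hypothetical name and the numbers are made up.

```python
from statistics import pstdev

def whitespace_badness(col_widths, row_depth, content_areas):
    """Standard deviation of leftover whitespace per area in one row.
    Assumption: depth is fixed, so whitespace = width * depth - content."""
    leftover = [w * row_depth - c for w, c in zip(col_widths, content_areas)]
    return pstdev(leftover)

content = [240, 480]                                   # content area per entry
equal_cols = whitespace_badness([50, 50], 12, content)
balanced_cols = whitespace_badness([40, 60], 12, content)
```

Moving the shared grid line from 50 to 40 equalizes the whitespace in the two areas, so the second arrangement scores lower.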
Interactive Composition Techniques
Pointing selection techniques adapt from Tioga documents to table structures
Demonstrated selecting table boxes, rows, or columns
Extend selections to containing rows and columns until whole table selected
Selection may contain range of boxes, rows, or columns for grouped operations
Specifying rules and decorations in tables/layouts is based on grids rather than boxes
TEX has real problem inserting rules within tables, especially vertical rules
tbl
Layout Style
Style of a document can describe looks, attributes, layout, illustrations, design rules, etc.
Problem in describing all the attributes and relationships
Design rules are represented (rather abstractly) by "penalties" and "glue" specifications
Interscript
"pouring" model of text layout
regular expressions describe the "boxes" into which text is poured
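A toy version of pouring, to fix the idea. It is not Interscript's mechanism and ignores the regular expression description of the boxes: it assumes the boxes are already an ordered list of capacities counted in lines, and pour is a hypothetical name.

```python
def pour(lines, box_capacities):
    """Fill boxes in order with as many lines as each holds.
    Returns (contents of each box, lines left over)."""
    boxes, start = [], 0
    for cap in box_capacities:
        boxes.append(lines[start:start + cap])
        start += cap
    return boxes, lines[start:]

boxes, leftover = pour(list("abcdefg"), [3, 2])
```

The leftover lines are what would flow on to the next table entry or the next page.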
4. Solution being sought
framework to describe tables, math notation, gridded pages
algorithm to satisfy layout relationships
5. Plan of action
Hierarchical table data structure for CS740 project
May 28 present proposal to Booth & Beatty
Area data structure for interactive table formatter.
June 18 consult with Booth at PARC
Summer spent on algorithms and framework specification
6. Thesis outline
1. Introduction
2. Computer Composition
3. Graphical Style
4. Tabular Composition
5. New Framework for Tabular Composition
6. Future Directions
7. References