Internal Memo

To: whomever
From: Maureen Stone, CSL/Imaging
Subject: Fit
Date: February 19, 1985

The curve-fitting project
This memo is to give some structure to the curve-fitting project. My goal is to turn this long-term, low-activity project into a tangible result.
This project began back in 1980 as a result of my experiences with Griffin users. We are committed to the idea of having our images represented by outlines rather than pixels. For outlines we use lines and cubic splines. It has long been observed that many users prefer to draw curves rather than construct them, yet current techniques for spline curves demand a construction algorithm. The goal was to provide a method that allows user interfaces which present a sketching paradigm but produce an analytical result. A second goal is to input shapes which already exist on paper and represent them as outlines. When these images are scanned, they are reduced to an array of samples, or pixels. Converting these pixels to an outline is another example of the problem we have come to call "sampled to synthetic".
I was not the first in Xerox to get interested in this problem. Flegal and Baudelaire built an algorithm that was implemented in the program Draw in 1974. The algorithm is overly simple and the implementation was flawed, so it cannot be considered a truly successful attempt in this area. Bill Reeves attacked this problem as part of his thesis and submitted a paper to TOG in 1982 based on his work. I saw a draft of that paper, but to my knowledge he never finished it. Greenberg and Barsky also have a simple algorithm (Siggraph 79?).
The problem of converting a sketch to a sequence of parametric cubics and lines is a complex one. Michael Plass has come up with a very interesting iterative technique for finding the best single parametric cubic through a sequence of points (Plass and Stone, Siggraph 83). The problem of finding the best way to break a complex curve into pieces is still unsolved, though Michael has also contributed a dynamic programming algorithm that gives interesting results.
We're back to the problem of "best fit." I find, as I try to make the program useful, that I keep wanting it to fit "what I mean." How do you do research in this area? If you look in general at the semantics of, say, a font, you can imagine capturing what an expert user does to select knots and lines. How hard is rule-based programming? What primitives would we need in this system if we took that approach?
Tools for manipulating samples
We have developed a number of different tools for analyzing and manipulating samples. This is an attempt to write them down and categorize them. A sample is a point on the x,y plane that is part of an ordered list. It has two tangents associated with it, TanIn and TanOut, and a property that indicates whether it is a potential joint, a forced joint, or neither.
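For concreteness, here is a minimal sketch of the sample record in Python-style pseudocode. The names (Sample, Joint, tan_in, tan_out) are illustrative only, not the actual record in our code:

    from dataclasses import dataclass
    from enum import Enum

    class Joint(Enum):
        NONE = 0        # neither
        POTENTIAL = 1   # potential joint
        FORCED = 2      # forced joint

    @dataclass
    class Sample:
        x: float
        y: float
        tan_in: tuple[float, float] = (0.0, 0.0)   # TanIn: incoming tangent
        tan_out: tuple[float, float] = (0.0, 0.0)  # TanOut: outgoing tangent
        joint: Joint = Joint.NONE

    # An outline is an ordered list of samples.
    Outline = list[Sample]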
Samples and image processing
Contouring for grayscale; bilevel contouring that keeps the corners. Contouring produces a set of ordered samples that describe the outline of dark regions.
Centerline finder: still looking for the perfect algorithm. Have one that sort of works; the current rewrite is untested. Would produce a set of ordered samples that describe the center of dark lines.
FFT filter (never used it but it's in the set)
Mik's TRC tool (change the black to gray profile)
JaM sketch and Michael's point editor
Thought of: link to DIP, improved point editor, research in centerline finders
Related work (sort of): Dorian Kermisch at WRC
Font utilities
AC to contouring to samples
SD to samples
Analyze samples
Find a closely fitting polygon using dynamic programming (Michael's dynfit). Two extra parameters define a line length and an angle: if two lines longer than the specified length meet with at least that angle of turning, force a corner (see the sketch after this list).
Horizontal and vertical lines
Horizontal and vertical extremes
Find 90 degree corners
Averaging filter: a 2-d three-point average. Moves the points, with an option to not move a point based on its property (also sketched below).
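Sketches of dynfit's corner-forcing rule and of the averaging filter, assuming the Sample record above. The exact tests are my reading of the behavior, not Michael's code; min_len and min_angle stand for the two extra dynfit parameters:

    import math

    def turn_angle(p0, p1, p2):
        # angle of turning between segments p0->p1 and p1->p2, in degrees
        a1 = math.atan2(p1.y - p0.y, p1.x - p0.x)
        a2 = math.atan2(p2.y - p1.y, p2.x - p1.x)
        d = abs(a2 - a1)
        return math.degrees(min(d, 2 * math.pi - d))

    def force_corner(p0, p1, p2, min_len, min_angle):
        # if two lines longer than min_len meet at p1 with at least
        # min_angle of turning, force a corner there
        if (math.dist((p0.x, p0.y), (p1.x, p1.y)) > min_len and
            math.dist((p1.x, p1.y), (p2.x, p2.y)) > min_len and
            turn_angle(p0, p1, p2) >= min_angle):
            p1.joint = Joint.FORCED

    def smooth(samples):
        # 2-d three-point average; the option to not move a point based
        # on its property is modeled here as "forced joints stay put"
        out = [Sample(s.x, s.y, s.tan_in, s.tan_out, s.joint) for s in samples]
        for i in range(1, len(samples) - 1):
            if samples[i].joint is Joint.FORCED:
                continue
            out[i].x = (samples[i-1].x + samples[i].x + samples[i+1].x) / 3
            out[i].y = (samples[i-1].y + samples[i].y + samples[i+1].y) / 3
        return out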
Set Tangents
Quick tangents: computes tangents at node points by differencing neighbors
Square tangents: computes tangents at node points by differencing the squares of the neighbors
Circle tangents: computes tangents at node points by finding a circle through 3 adjacent nodes
Cubic tangents: computes tangents at node points by fitting a cubic between neighboring nodes
Adjust Tangents: Looks at existing tangents and adjusts them
Nearly equal: if TanIn and TanOut are nearly equal, make them equal
HV Tangents: If nearly horizontal or vertical, make it so
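Two of these are easy to show concretely, again assuming the Sample record above. These sketches give the geometry only; endpoints and the TanIn/TanOut split are left out:

    def quick_tangents(samples):
        # quick tangents: difference the neighbors of each interior node
        for i in range(1, len(samples) - 1):
            dx = samples[i+1].x - samples[i-1].x
            dy = samples[i+1].y - samples[i-1].y
            n = math.hypot(dx, dy) or 1.0
            samples[i].tan_in = samples[i].tan_out = (dx / n, dy / n)

    def circle_tangent(a, b, c):
        # circle tangents: tangent at b of the circle through a, b, c,
        # i.e. perpendicular to the radius from the circumcenter to b,
        # oriented to agree with the chord a -> c
        d = 2 * (a.x*(b.y - c.y) + b.x*(c.y - a.y) + c.x*(a.y - b.y))
        if abs(d) < 1e-12:  # collinear: fall back to the chord
            n = math.hypot(c.x - a.x, c.y - a.y) or 1.0
            return ((c.x - a.x) / n, (c.y - a.y) / n)
        ux = ((a.x**2 + a.y**2)*(b.y - c.y) + (b.x**2 + b.y**2)*(c.y - a.y)
              + (c.x**2 + c.y**2)*(a.y - b.y)) / d
        uy = ((a.x**2 + a.y**2)*(c.x - b.x) + (b.x**2 + b.y**2)*(a.x - c.x)
              + (c.x**2 + c.y**2)*(b.x - a.x)) / d
        tx, ty = b.y - uy, ux - b.x   # perpendicular to the radius
        if tx*(c.x - a.x) + ty*(c.y - a.y) < 0:
            tx, ty = -tx, -ty
        n = math.hypot(tx, ty) or 1.0
        return (tx / n, ty / n)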
Force breaks in the curve
Force Corners: if the internal angle is sharp enough, force a corner. A similar feature is available on most of the tangent setters. Can also force breaks at extremes and at the ends of horizontal lines.
Selecting final knots from potential knots
Fit between each potential knot
Grow until error is too large, then back up
Dynamic programming (dynspline)
Dynamic programming for finding correct knots
The crux of our approach is to find some quantity of potential knots, then use an algorithm to select the best set of knots from among them. The information on the knots is the local tangents and a property that declares whether or not there is a forced joint in the curve. I rewrote Michael's original routine to do the dynamic programming and set the error metric.
Dynamic programming demands the ability to formulate the problem as a sequence of subproblems: the badness function must add, as B(j) = B(j-k) + B(k). It is possible to extend this by keeping multiple badness functions at each point, B(j, n) = B(j-k, n) + MIN[B(k, n)], where there are n badness functions at each point. To date we have "tinkered" with the badness functions. Michael did not see a significant pattern in his tinkering, though he did produce functions that gave good results. How do I decide whether I can make any progress here? What is the best way to include the different definitions of a "good fit"?
The current set of badness functions is based entirely on maximum deviation; this works because the error is not monotonically increasing. The metrics define the error for the inner loop. Notice that this inner loop does not usually work very hard. The point is to try to force the fit to stay within the allowable maximum deviation. I'm not sure that works yet, but I try.
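A sketch of the basic recurrence with a single badness function, assuming the Sample record above. Here badness(k, j) stands for the inner-loop fit of one cubic between candidate knots k and j, e.g. its maximum deviation from the samples; this is the shape of the computation, not Michael's actual routine:

    def select_knots(samples, candidates, badness):
        # candidates: ordered indices of potential knots into samples.
        # best[j] is the minimum total badness of a spline from
        # candidates[0] to candidates[j], i.e. B(j) = MIN over k of
        # B(k) + badness(k, j).
        n = len(candidates)
        best = [math.inf] * n
        back = [0] * n
        best[0] = 0.0
        for j in range(1, n):
            for k in range(j - 1, -1, -1):
                b = best[k] + badness(k, j)
                if b < best[j]:
                    best[j], back[j] = b, k
                if samples[candidates[k]].joint is Joint.FORCED:
                    break  # a single cubic may not jump over a forced joint
        # trace the chosen knots back from the end
        chosen = [n - 1]
        while chosen[-1] != 0:
            chosen.append(back[chosen[-1]])
        return [candidates[i] for i in reversed(chosen)]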
Stepwise procedures for selecting potential knots
The current structure has all the JaM modules operating on a default set of samples. I'm finding that the different techniques for finding interesting features interfere with one another. A possibility, given that JaM now allows REFs anywhere on the stack, is to change these modules so that they take their samples off the stack. I could then work on different copies of the samples independently and "merge" the results back.
Merging can be tricky if the samples have moved. I could use the index as a starting point and then find the "closest" sample. Or I could add a point. One issue is that the current error metric in dynspline doesn't work well if two nodes are close together. This seems like a shame and should be fixed. The problem seems more subtle than that....
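A sketch of the "index as a starting point, then closest sample" merge, assuming the Sample record above; the search window of three samples either side is a guess:

    def merge_knots(original, moved):
        # copy joint marks found on a moved copy of the samples back
        # onto the originals, matching each marked sample to the
        # closest original near the same index
        for i, m in enumerate(moved):
            if m.joint is Joint.NONE:
                continue
            lo, hi = max(0, i - 3), min(len(original), i + 4)
            j = min(range(lo, hi),
                    key=lambda k: math.dist((original[k].x, original[k].y),
                                            (m.x, m.y)))
            original[j].joint = m.joint
        return original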
Assuming the merging and multiple-potential-knot problems can be solved, a JaM script that is a set of "rules" and priorities for finding features could be constructed. For example: find horizontal and vertical lines exactly (good for bitmaps); find serifs; find maxima and minima; find a general polygon fit for the rest of the curve; set corners. This is similar to the current scheme, except that so far I've been trying to construct each potential-knot procedure so that it doesn't affect the results of the previous one. It might be possible to get that to work, but it feels less general than the solution above.
General purpose feature finder
To date we have concentrated on global techniques for finding potential knots, such as dynpoly. Especially in the case where we are trying to fit at too low a resolution (and this seems popular), an option to specify a sample pattern and define its potential knots is attractive. My idea is to specify the curve in terms of its curvature profile, k = f(s). Slide the profile for the feature along the profile for the entire curve, computing g = curveprofile - featureprofile. I should be able to take the sum and compare it to zero. I may need a maximum deviation as well. The whole trick is coming up with something general enough: I really want a similar feature finder, not an identical one.
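A sketch of the sliding comparison, with both profiles sampled at equal arc-length steps. The two tolerances are my guess at how to make the match "similar, not identical":

    def find_feature(curve_k, feature_k, sum_tol, max_tol):
        # slide the feature's curvature profile along the curve's,
        # computing g = curveprofile - featureprofile over the window;
        # accept offsets where the sum is near zero and the maximum
        # deviation is small
        m = len(feature_k)
        hits = []
        for s in range(len(curve_k) - m + 1):
            g = [curve_k[s + i] - feature_k[i] for i in range(m)]
            if abs(sum(g)) <= sum_tol and max(abs(v) for v in g) <= max_tol:
                hits.append(s)
        return hits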
Fit as a hot topic
There have been several calls, especially from people wanting to convert bitmap fonts to outlines.
Fitting the TEX Fonts
We would like to use the AMR fonts on the 1200 spi Platemaker without running Metafont to get each face, point size, and magnification. There are 54 fonts with an average of 3 magnifications each on Clover. What is the best and most interesting way to get this directory onto Platemaker?

I would like to use Fit to make spline outlines of these fonts. The problem splits into finding font masters and defining "a good fit". For masters we can get: 1) Metafont-generated bitmaps from Stanford; this route involves asking a favor of Dave Fuchs, so we want to minimize it. 2) Bitmaps from Clover; while 384 spi seems too low a resolution for 10 point and smaller, the different magnifications increase the effective resolutions as follows:
m0 = 384, mh = 420.65, m1 = 460.8, m2 = 552.96, m3 = 663.55, m4 = 796.26, m5 = 955.51
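(These appear to follow the TeX magstep convention, each magstep scaling the resolution by 1.2: mk = 384 x 1.2^k, with mh = 384 x sqrt(1.2) = 420.65. For example, m5 = 384 x 1.2^5 = 955.51.)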
To define "good fit" I am first going to use the bitmaps produced by Metafont as the "truth" and compare bitmaps using the imager's scan converter. Then I will use the press scan converter to see if there is a difference. I have received from David Fuchs:
AMBX10, AMEX10, AMR10, AMSL10, AMSY10, AMTI10, AMTT10 at 1200 spi
AMR10 at 880 and 1760 spi
I have now fit all seven of these fonts using the same parameters. There are still some problems. The circumflex in AMEX10 has a rounded inner corner; adjusting the parameters on dynspline might fix that. The HV line finder currently sets its tangents using quicktangents, which doesn't always force an H/V tangent where I want it.
The current direction seems to call for an actual line semantic. The issue seems to be that lines need to be fit with a tighter tolerance than curves. Rumor has it that this is true; the computer vision people must know something about this.