*start*
00926 00024 USt
Date: 1 June 1981 9:15 am EDT (Monday)
From: Lampson.PA
Subject: Re: Some Doc Issues
In-reply-to: Mitchell's message of 31 May 1981 12:27 pm PDT (Sunday)
To: Mitchell
cc: InterDoc↑

I think the question you are asking is: what happens to an Interdoc of level i
when it is edited by an Interdoc editor of level j<i.  That is certainly a question
which doesn't arise with Press.  I agree that the proper answer is unclear.  

It might be worth noticing that the simplest answer is that the Interdoc is
restricted to level j first.  This presumably means that anything not expressible at
level j is discarded, or mapped in some unimaginative and irreversible fashion
into something expressible at level j.  This solution seems too draconian, but I
have a feeling that it will not be easy to find a less draconian solution which is
still reasonably well defined.  Still, it is certainly worth some effort.

*start*
00944 00024 USt
Date: 1 June 1981 9:15 am PDT (Monday)
From: McGregor.PA
Subject: Re: Some Doc Issues
In-reply-to: Lampson's message of 1 June 1981 9:15 am EDT (Monday)
To: Lampson
cc: Mitchell, InterDoc↑

I recall a number of proposals in SDD where a level j (like a Cub or something)
editor would find itself editing a Star document.  The goal was to preserve all of
the information, yet simply be unable to display/edit the higher level objects.

Question:  Do we believe that any k-level document object that is
incomprehensible to a level j editor can be represented and displayed as a "black
box" that can be at least deleted and perhaps moved or copied?

I'm unhappy with Butler's "draconian" solution of discarding all the k level
objects.  At least in the product organisations, the ability to do touch-up editing
using a lower class of editor than the one that originally composed the document
seems quite important.

Scott.

*start*
01637 00024 USt
Date: 1 June 1981 10:44 am PDT (Monday)
From: Mitchell.PA
Subject: Re: Some Doc Issues
In-reply-to: McGregor's message of 1 June 1981 9:15 am PDT (Monday)
To: McGregor
cc: Mitchell, InterDoc↑


I am sending this message again because Laurel threw in a reply-to field that I
didn't want on the previous transmission.

I also find Butler's solution a little too Draconian, and I think it is worthwhile
trying to find some way to preserve structure in an InterDoc script even when it
is edited by a low-level editor.

Think of an InterDoc script, S, as a database containing document pieces with
attributes, no matter how represented.  If level(S)>level(E) for some editor E, this
could be handled by E accessing S through a level filter (i.e., a database view),
which would hide the intricacies of pieces of higher level than E's and might
well present a different interface, X, to E for the pieces it does understand than
a higher level editor might use.  E's updates would then not disturb structures
that it couldn't touch, and its changes would be reflected back into S via X.  If
changes were reflected back to S in a batch or only in relatively large chunks,
this could even be made relatively efficient.

There are certainly difficulties with this approach; e.g., if E doesn't understand
paragraph structure but S contains paragraphs, how would X map alterations back
to S (some of which cross paragraph boundaries)?  

A Note:
Even if E doesn't understand illustrations, for instance, it might still be able to
delete or move entire illustrations in S as long as E does understand about
embedded objects. 
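
A minimal sketch of the level-filter idea, assuming a script is simply a list of
pieces, each tagged with the lowest editor level that understands it.  The names
Piece, BlackBox and LevelView are hypothetical; nothing here is a proposed
interface.  Pieces above the editor's level show up only as opaque handles that
can be moved or deleted, and are restored untouched on write-back.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    kind: str        # e.g. "text" or "illustration" (hypothetical kinds)
    level: int       # lowest editor level that understands this piece
    payload: object


@dataclass(frozen=True)
class BlackBox:
    index: int       # position of the hidden piece in the underlying script


class LevelView:
    """View of a script S as seen by an editor E of a given level."""

    def __init__(self, script, editor_level):
        self.script = script
        self.editor_level = editor_level

    def pieces(self):
        # Pieces above the editor's level appear only as opaque handles.
        return [p if p.level <= self.editor_level else BlackBox(i)
                for i, p in enumerate(self.script)]

    def write_back(self, edited):
        # Every black box that survived editing is replaced by the original,
        # unmodified piece; a deleted black box simply does not come back.
        return [self.script[p.index] if isinstance(p, BlackBox) else p
                for p in edited]


script = [Piece("text", 0, "Hello"),
          Piece("illustration", 2, b"\x00\x01"),
          Piece("text", 0, "world")]
view = LevelView(script, editor_level=0)
visible = view.pieces()
visible[0] = Piece("text", 0, "Goodbye")          # edit a piece E understands
visible[1], visible[2] = visible[2], visible[1]   # move the illustration
result = view.write_back(visible)
```

The sketch sidesteps the hard case raised above: edits that cross the boundary
of a structure E does not understand.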

JGM

*start*
00536 00024 USt
Date: 1 June 1981 3:01 pm EDT (Monday)
From: Lampson.PA
Subject: Re: Some Doc Issues
In-reply-to: Mitchell's message of 1 June 1981 10:44 am PDT (Monday)
To: Mitchell
cc: InterDoc↑

This idea has promise, but it has some problems.  The most common situation
may be one in which the code for the editor and the machine running the editor
don't understand about higher levels at all.  This means that the filter must be
quite simple and quite generic, so that it can be included in the editor from the
beginning.

*start*
01665 00024 USt
Date: 1 June 1981 12:10 pm PDT (Monday)
From: Horning.pa
Subject: Re: Some Doc Issues
In-reply-to: Mitchell's message of 31 May 1981 12:27 pm PDT (Sunday)
To: Mitchell
cc: InterDoc↑

I think that several issues got confused in this conversation.

To Leo's original question: "Do we really expect that when a document is
transformed from a representation private to an <editor-formatter> to InterDoc
format and back, that there will be no information loss?" I hope that the answer
is an unequivocal "Yes."
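
A minimal sketch of what "no information loss" could mean as a check, assuming a
private representation that is a list of (text, attributes) pairs and a toy
script form.  The functions to_script, to_script_lossy, from_script and
is_lossless are hypothetical stand-ins, not part of any proposal.

```python
def to_script(private):
    # Toy "private -> script" transformation that keeps every attribute.
    return [{"text": t, "props": sorted(a.items())} for t, a in private]


def to_script_lossy(private):
    # A transformation that silently drops attributes, for contrast.
    return [{"text": t, "props": []} for t, _ in private]


def from_script(script):
    # Toy "script -> private" transformation.
    return [(node["text"], dict(node["props"])) for node in script]


def is_lossless(private, to_s, from_s):
    # The round trip must reproduce the private document exactly.
    return from_s(to_s(private)) == private


doc = [("Hello", {"font": "TimesRoman", "size": 10})]
print(is_lossless(doc, to_script, from_script))
print(is_lossless(doc, to_script_lossy, from_script))
```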

Most of the discussion focussed on what we expect to happen when an Interdoc
script of some level is edited by an editor of lower level.  I had thought that one
of the "saving graces" of this exercise was the recognition that we were
specifying a standard for something STATIC (an editable document), not the
dynamics of what various editors do to documents.  I doubt that a standard can
enforce many guarantees in this area.

I agree that, as a matter of good design, if an editor is applied to a script of
higher level, it should generally preserve unchanged elements and structure that
it is not prepared to deal with.  It would even be a good idea if the standard
made that relatively easy to do (as the Bravo format, for example, does not).

For example, I could imagine a simple editor that was prepared to edit any string
in a script that had uniform attributes (producing a new script in which the
edited string had the same attributes), but forced the user to edit separately each
string delimited by attribute changes.  Similarly, the user might be restricted to
moving entire hierarchical units within the script.
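
A minimal sketch of such a simple editor, assuming the script has already been
split into runs of uniform attributes, each a (text, attributes) pair.  The
function name edit_run is hypothetical.  The editor may change the text of one
run at a time and never touches the attributes.

```python
def edit_run(runs, index, new_text):
    """Return a new run list in which run `index` has new text
    but exactly the attributes it had before."""
    _, attrs = runs[index]
    return runs[:index] + [(new_text, attrs)] + runs[index + 1:]


runs = [("Dear ", {"font": "TimesRoman"}),
        ("Jim", {"font": "TimesRoman", "bold": True}),
        (",", {"font": "TimesRoman"})]
edited = edit_run(runs, 1, "Butler")
```

The original run list is left unchanged, so the result is "a new script" in the
sense used above.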

Jim H.

*start*
01549 00024 USt
Mail-from: Arpanet host SU-SCORE rcvd at 1-JUN-81 1514-PDT
Date:  1 Jun 1981 1512-PDT
From: Brian K. Reid <CSL.BKR at SU-SCORE>
Subject: issues
To: InterDoc↑ at PARC-MAXC

Either I am missing out on a big piece of the design conversations here or
you guys' minds have been rotted by JaM and Interpress. Possibly both.
In Interpress/JaM semantics is everything, and the language 
design is primarily concerned with execution semantics and imaging 
primitives. A language for the representation of editable documents
should have almost diametrically opposite properties. Syntax is
everything. If your mental model of the alleged InterDoc (I don't like
this name either) is that it is "something like Interpress", then I can
see how you are worried about being able to ignore pieces that you don't
understand. But since the essence of InterDoc should be syntactic and
not semantic, it should be a complete cake walk to parse and ignore some
piece of it that you don't understand. 

At the moment I believe that a useful property of InterDoc would be that
it had absolutely no inherent semantics. Something like an S-expression
language, for example, although I think that the pure nesting structure
is far too restrictive. I think that our initial design energy should be
going into determining the kind of structural information that must be
representable syntactically and the kind of structural information that
it is ok to punt to the (arbitrary) semantics.

I look forward to our meeting tomorrow afternoon.

Brian
-------
*start*
00890 00024 USt
Date: 2 June 1981 11:36 am PDT (Tuesday)
From: Ayers.PA
Subject: Re: Editing at level "j"
In-reply-to: McGregor's message of 1 June 1981 9:15 am PDT (Monday)
To: McGregor
cc: InterDoc↑
Reply-To: Ayers

"I recall a number of proposals in SDD where a level j (like a Cub or something)
editor would find itself editing a Star document.  The goal was to preserve all of
the information, yet simply be unable to display/edit the higher level objects."

I proposed an arrangement that would, I claimed, allow a Star document's text to
be edited at an 860, even though the 860 cannot display graphics, fonts, etc. 
The proposal required ancillary smarts -- the 860 was given a construct, edited
it, and then the returned construct was carefully compared with the original.

Not really InterDoc, tho I will be happy to circulate the claims at an appropriate
time ...

Bob

*start*
01956 00024 USt
Date: 3 June 1981 2:43 pm PDT (Wednesday)
From: Mitchell.PA
Subject: Meeting #1, 2 June 1981
To: Interdoc↑.pa

This message is intended to capture some of the points discussed and questions
raised (there are more of the latter) at the first meeting to discuss an interchange
standard for editable documents.


The next meeting of the InterDoc group will take place Friday, 5 June at 13:00 in
the CSL Commons


Attendees: John Warnock (JW), Robert Ayers (RA), Scott McGregor (SMcG),
Alan Perlis (AJP), Brian Reid (BR), Jim Horning (JJH), and Jim Mitchell (JGM)

The meeting began by our trying to determine what we are doing and what an
editable document is, anyway.

BR: Sees a document standard as a syntactic structuring mechanism.
     Having only hierarchical structuring is insufficient (e.g., linear order for
pagination vs. hierarchical chapter, section, paragraph structure)

JGM: Even structure that is not understood by a given editor should be
preserved (e.g., if a neanderthal editor was  used to change text in a Star
document, the illustrations and structuring information should not thereby be
deleted).

JW: Defaults are important.  How are properties inherited?  Perhaps a property
should say how it is to be inherited.

RA:  Inheritance is a property of the editor, not of the document pieces (e.g., Star
does it both ways: typing inherits from the left, but copying text does not cause
it to inherit the properties of contiguous text)

SMcG:  Properties should have properties.

JGM: Where does it end?  Is there some base set of meta-properties to which a
certain amount of semantics can be attached or is everything syntactic?

BR: Some properties must not be inherited; e.g., "the paragraph must begin a new
page" should not be inherited by an illustration embedded in that same
paragraph.

BR: What is the set of meta-properties of properties?

General: Do we need to handle inter-document references? 

JGM
*start*
03993 00024 USt
Date: 8 June 1981 2:19 pm PDT (Monday)
From: Mitchell.PA
Subject: Meeting #2, 5 June 1981
To: Interdoc↑.pa

Attendees: Bill Paxton (WHP), Robert Ayers (RA), Scott McGregor (SMcG),
Alan Perlis (AJP), Brian Reid (BR), Jim Horning (JJH), and Jim Mitchell (JGM)

AJP: If we do this job right, it ought to be possible to build a parametric editor
that could accept any kind of document.

BR: Some such editors are one by Nievergelt at ETH Zurich, Zog at CMU, and
Steve Woods' thesis at Yale.  BR will distribute copies of papers by Nievergelt
and Woods.

JGM: Tioga is also an example of such an editor (based on CedarDocs)

SMcG:  We ought to be concerned with the efficiency of transforming an
InterDoc script into an internal encoding and back again.

BR: If I were building an editor and had the InterDoc standard, I would make the
internal format be well matched to the standard.

There was some general discussion about wanting to be able to represent changes
(deltas) to documents in the standard.

RA: (wrt SMcG's comment above)  In general, we can't slurp up an entire
document, but need to do updates to it.

JGM: (in an attempt to side step the issue of deltas and history) View a script as
a set of nodes with properties and then store history or deltas as extra properties.

BR: We should use the InterPress notion of lossless transformations a la Lampson,
viz., require that f(g(S))=g(f(S)), where f and g are transformations in the
appropriate directions.

RA: We should stay away from determining which set of basic editing primitives
should be used as a model (e.g., cut/paste vs. append/delete)

JGM:  InterDoc scripts must be able to represent audio documents as well
(general agreement)

BR: We need a strong formal idea of what an editor is.

AJP: We need to understand either documents or editors, but we can't let both
float free.

RA: I only understand a lot of individual editors.

WHP: Tioga uses properties (as suggested by JGM above)

BR: Is there some kind of distinguished property of each node?

JJH: The only thing that every node needs to have is its own identity, as in the
PIE system (e.g., a unique ID)

JGM:  This might help with external document references also: an external
reference would be a (file name, node UID) pair, which is location independent
within a document.
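
A minimal sketch of a (file name, node UID) reference, assuming each document is
a tree of nodes and that files are held in a dictionary keyed by name.  Node,
find and resolve are hypothetical names.  Because the lookup goes by UID, moving
the node inside its document does not disturb the reference.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    uid: str
    content: str
    children: list = field(default_factory=list)


def find(node, uid):
    # Depth-first search for the node carrying this UID.
    if node.uid == uid:
        return node
    for child in node.children:
        hit = find(child, uid)
        if hit is not None:
            return hit
    return None


def resolve(ref, files):
    # ref is a (file name, node UID) pair; returns None if it dangles.
    file_name, uid = ref
    root = files.get(file_name)
    return None if root is None else find(root, uid)


para = Node("n7", "A cited paragraph")
root = Node("n1", "", [Node("n2", "Intro"), Node("n3", "Body", [para])])
files = {"report.doc": root}
ref = ("report.doc", "n7")
```

The sketch does not address the question that follows, namely a node moved to a
different file; there the pair would dangle.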

WHP: What happens if you have an external reference to a paragraph and then
you move it from one document to another? Does the reference follow it?

JJH: If we used a write-once storage model, the problem might be more tractable.

BR:  We need to be able to handle documents like collected bibliographies by
reference.

JGM:  Perhaps we should distinguish different binding strengths; some external
references follow the referents, others (the majority) do not.

JJH:  Unique IDs only make sense in immutable files (generally not agreed with)

SMcG:  The InterDoc standard should be pretty basic, almost a syntactic
convention.  We also need a set of recommendations for text documents, graphics
documents, etc.

JJH (quoting Tony Hoare):  We should only standardize on those things that
don't matter.

There followed some general discussion on the fact that document nodes must
have types.  Two models were proposed: (1) each node has a list of types (i.e.,
the ways it can be viewed); and (2) how you view a node depends on how you
got there.  It seems we need both.

RA:  As SMcG proposed, we really need two standards, one basic, almost
meta-standard, and a second set of specific standards.

For the next meeting in two weeks, various people will try to produce initial
base-level proposals (BR and JGM said they would).  It would be nice if these
proposals were distributed at least two days before the next meeting.

Proposed next meeting: CSL Commons, Friday, June 19, 13:00 (let me know if you
have a conflict).  Written proposals to participants by Wednesday, June 17,
therefore.

JGM
------------------------------------------------------------

*start*
03713 00024 USt
Date: 8 June 1981 2:24 pm PDT (Monday)
From: Mitchell.PA
Subject: Some Meta issues about any InterDoc standard
To: InterDoc↑.pa

In thinking about the form that an InterDoc standard might take, it occurred to
me that

(1) If, as was discussed in meeting #2 (81.06.05) we view an InterDoc script as a
set of nodes with properties, then, following InterPress, we need to find a "source
language" to represent them.  Whatever the form of this source language, it must
be efficiently processable by programs, which processing must include at least
generation (from some private encoding), parsing (into some private encoding),
and probably some kind of random access, depending on how many smarts an
editor has (e.g., it might form a private encoding by recording that information
in the InterDoc script to which it needs frequent access in a private structure
while still accessing some of the original information in the script - somewhat as
Bravo does for text pieces)

(2) Many of the syntactic decisions made in the InterPress standard fit well here,
too, and we should just adopt them where there is no obvious advantage in
inventing our own.  For example, an InterPress master is readable by both
humans and programs, which, besides the obvious debugging/reading benefits,
also has the virtue of thereby staying away from many machine-dependent
issues, e.g., word size.

Given that almost all InterPress masters and InterDoc scripts will be produced and
consumed by software, their encoded forms should be as efficiently generatable
and parseable as possible.  In addition, InterDoc scripts can be expected to have
rather long lifetimes, so minimizing the storage taken by them is a strong
secondary priority.

I propose the following mechanism to keep us honest about efficiency of script
processing and storage:  We should build both a generator and a parser for any
language we come up with.  The generator will take a Bravo file and produce an
equivalent InterDoc script for it; the parser will take an InterDoc script and
produce an equivalent Bravo file for it.  This gives us a large test set (all the
Bravo files in the world!) for measuring both processing speed and storage
requirements.

In particular, the generator and parser times give some idea of the overhead in
transforming between private and InterDoc representations, and the ratio of size
of InterDoc script to size of Bravo file can give us some idea of the space
efficiency. Perhaps just as importantly, it will allow us to take any Bravo file
and make it suitable for editing with another editor (e.g., Tioga when Scott's idea
of having it be able to ingest InterDoc scripts is realized).  If we parsed InterDoc
scripts into a parse-tree form as part of generating a Bravo file, the size of the
parse tree would represent a loose upper bound on the space that a private
encoding ought to take, assuming the parse-tree form would not take advantage
of any special knowledge about scripts.  These would be good numbers to have
in hand and would greatly increase my confidence in our work.
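
A minimal sketch of the measurement harness, assuming the generator and parser
are supplied as ordinary functions over byte strings.  The stubs generate_stub
and parse_stub below are placeholders that merely wrap and unwrap the input;
they are not a Bravo reader or an InterDoc encoding.

```python
import time


def measure(bravo_bytes, generate, parse):
    """Time one generate/parse round trip and report the size ratio.
    Assumes bravo_bytes is non-empty."""
    t0 = time.perf_counter()
    script = generate(bravo_bytes)
    t1 = time.perf_counter()
    restored = parse(script)
    t2 = time.perf_counter()
    return {
        "generate_s": t1 - t0,
        "parse_s": t2 - t1,
        "size_ratio": len(script) / len(bravo_bytes),
        "round_trip_ok": restored == bravo_bytes,
    }


def generate_stub(data):
    # Placeholder generator: wraps the file in a fixed 7-byte envelope.
    return b"Doc[<" + data + b">]"


def parse_stub(script):
    # Placeholder parser: strips the envelope added by generate_stub.
    return script[5:-2]


stats = measure(b"hello world", generate_stub, parse_stub)
```

Run over a large sample of files, the same dictionary could be accumulated into
averages for speed and space.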

(If we had the manpower, we might also consider writing a translator to map
InterDoc scripts into InterPress masters to provide a further test of the standard(s)
as well as a transition mechanism for printing old but valuable Bravo files on
InterPress print servers.) 

Being able to sample a large number of Bravo files (or any other for which an
InterDoc generator/parser pair can be made) makes lowering the average entropy
of an InterDoc script more of an engineering and less a by-guess-and-by-golly
task (it has worked exceedingly well for Mesa byte codes, why not here?).

Your comments are earnestly solicited.

JGM
*start*
00832 00024 USt
Date: 8 June 1981 5:09 pm PDT (Monday)
From: Ayers.PA
Subject: Re: Some Meta issues about any InterDoc standard
In-reply-to: Mitchell's message of 8 June 1981 2:24 pm PDT (Monday)
To: Mitchell
cc: InterDoc↑
Reply-To: Ayers

Your proposed mechanism (parser/generator) is an interesting thought.  

One quickie keep-the-ball-rolling comment:

You are postulating, in your discussion on efficiency and size, that the Interdoc
standard (some straightforward encoding thereof) will be the format that
documents are normally stored in.  This is not yet clear to me.  I could believe,
for example, that Star, in 1984, might file documents in its internal format
(perhaps formally equivalent to the Interdoc standard, but in no sense an
"encoding") and be willing to supply an interchange version on demand. 

Bob


*start*
01218 00024 USt
Date: 8 June 1981 5:41 pm PDT (Monday)
From: Mitchell.PA
Subject: Re: Some Meta issues about any InterDoc standard
In-reply-to: Your message of 8 June 1981 5:09 pm PDT (Monday)
To: InterDoc↑.pa

If documents are not stored as InterDoc scripts, my arguments on efficiency and
size are certainly less important, and your arguments about Star documents
sound right.  What I also envision in 1984, however, is a cloud of
Ethernet-compatible text-editing systems able to send messages.  I don't expect
that cloud to be in the shape of a big X (for Xerox): it will have Wangs, IBMs,
and who knows what else in it.  Thus, many messages will travel as InterDoc
scripts and the time you spend waiting for one to be ingested so you can read it
will be important.  The same arguments apply to sending messages.

Assume there are N such product lines using InterDoc.  By analogy with the
argument that the value of a communications system goes up as the square of the
number of its subscribers, the value of an InterDoc standard goes up as N↑2.  I
expect that we will (after much blood and screaming) have a good standard.  So
efficiency matters because of the possibility of an N↑2 success disaster.

JGM
*start*
00233 00024 USt
Date: 9 June 1981 2:00 pm PDT (Tuesday)
From: Ayers.PA
Subject: Re: Some Meta issues about any InterDoc standard
In-reply-to: Mitchell's message of 8 June 1981 5:41 pm PDT (Monday)
To: Mitchell

Sounds fair.

*start*
01023 00024 USt
Date: 9 June 1981 5:10 pm EDT (Tuesday)
From: Lampson.PA
Subject: Re: Some Meta issues about any InterDoc standard
In-reply-to: Mitchell's message of 8 June 1981 2:24 pm PDT (Monday)
To: Mitchell
cc: InterDoc↑

You should be aware that compactness was not a major consideration in the
design of the Interpress interchange encoding.  This encoding probably costs at
least a factor of two in space over what could be easily achieved, and much
more in some cases.

I think your ideas for translating existing documents into a proposed
representation are excellent.  Sil and Draw files (especially Sil files) might also be
candidates.

A document design with styles (which I strongly favor) raises a problem for
translation from existing documents which do not have styles.  This problem can
be solved surprisingly well by deducing the styles in some simple-minded way
from observed repetitions of the same properties.  This was in fact done about
three years ago by Bill Maybury for Bravo documents.
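
A minimal sketch of one simple-minded way to deduce styles from repetition,
assuming a document is a list of (text, properties) runs with hashable property
values.  This is an illustration only and is not a description of the earlier
Bravo work; deduce_styles and the generated style names are hypothetical.

```python
from collections import Counter


def deduce_styles(runs, min_repeats=2):
    """Name every property set that occurs at least min_repeats times.
    Returns (style definitions, runs rewritten to refer to styles)."""
    keys = [tuple(sorted(props.items())) for _, props in runs]
    counts = Counter(keys)
    names = {}
    for key in keys:                      # first-seen order gives stable names
        if counts[key] >= min_repeats and key not in names:
            names[key] = f"style{len(names) + 1}"
    # Repeated property sets become style names; one-off sets stay literal.
    restyled = [(text, names.get(key, dict(key)))
                for (text, _), key in zip(runs, keys)]
    styles = {name: dict(key) for key, name in names.items()}
    return styles, restyled


runs = [("Title", {"font": "Helvetica", "size": 14}),
        ("para 1", {"font": "TimesRoman", "size": 10}),
        ("para 2", {"font": "TimesRoman", "size": 10})]
styles, restyled = deduce_styles(runs)
```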

*start*
00453 00024 USt
Date: 9 June 1981 3:09 pm PDT (Tuesday)
From: Mitchell.PA
Subject: Re: Some Meta issues about any InterDoc standard
In-reply-to: Lampson's message of 9 June 1981 5:10 pm EDT (Tuesday)
To: Lampson
cc: Mitchell

Roger on the issue of styles, Sil, and Draw.  I am currently building a first draft
of a syntax for documents and intend it to include styles.  It should be ready in
about a week (he says with eternal optimism!).

Jim

*start*
02907 00024 USt
Date: 16 June 1981 3:21 pm PDT (Tuesday)
From: Mitchell.PA
Subject: Mitchell's whiteboard as of Friday, June 12
To: Horning
cc: Mitchell

Basically, the interdoc specifications can be viewed as having three levels. At
level 0, the lowest and most general level, there is a very general, 1980-style
S-expression syntax with no semantics.  At level 1, some simple semantics are
associated with various of the syntactic constructs (e.g., definitions,
meta-properties, internal references, node identity, and node types).  Most of the
specifications actually lie at level 2, where properties are specified in groups and
given semantics (e.g., Star documents, Star text documents, Star graphics, 860 text,
Tioga nodes, Tioga paragraphs, etc.).

Most of our discussion centered around the parse-tree representation of an
InterDoc script as a model.

Level 0:

In the basic document-expression (D-expr) syntax, the fundamental literals are
numbers, strings, type names, records, and sequences.  Numbers and strings are
taken directly from InterPress. 

A type name is simply an identifier.  A record has the form 
   record ::= [identifier] "[" literal* "]"

A sequence has the form
   sequence ::= [identifier] "{" literal* "}"
Conceptually, a sequence is supposed to be homogeneous, although there is no
way to specify this in the basic grammar.

We think of the components of a record or sequence as an ordered set of
properties associated with a node whose type is given by the (optional) identifier
prefixing the record or sequence.
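
A minimal sketch of a parser for the two productions above, to show that the
level 0 syntax can be parsed with no knowledge of what any node means.  It makes
several assumptions not fixed by the text: strings are written <...> as in the
example below, numbers are plain decimals, identifiers are alphanumeric, and
notations such as "~" are not handled.  The tuple shapes it returns are
hypothetical.

```python
import re

TOKEN = re.compile(
    r"\s*(?:(<[^>]*>)|(-?\d+(?:\.\d+)?)|([A-Za-z][A-Za-z0-9]*)|([\[\]{}]))")
KINDS = [None, "string", "number", "ident", "punct"]


def tokenize(src):
    src, pos, out = src.rstrip(), 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        out.append((KINDS[m.lastindex], m.group(m.lastindex)))
        pos = m.end()
    return out


def parse_literal(toks, i):
    if i >= len(toks):
        raise SyntaxError("unexpected end of input")
    kind, text = toks[i]
    if kind == "string":
        return text[1:-1], i + 1
    if kind == "number":
        return (float(text) if "." in text else int(text)), i + 1
    name = None
    if kind == "ident":
        name, i = text, i + 1
        if i >= len(toks) or toks[i][1] not in ("[", "{"):
            return ("name", name), i        # a bare type name
        kind, text = toks[i]
    if text not in ("[", "{"):
        raise SyntaxError(f"unexpected {text!r}")
    close = "]" if text == "[" else "}"
    tag = "record" if text == "[" else "sequence"
    i, items = i + 1, []
    while i < len(toks) and toks[i][1] != close:
        item, i = parse_literal(toks, i)
        items.append(item)
    if i >= len(toks):
        raise SyntaxError(f"missing {close!r}")
    return (tag, name, items), i + 1


def parse(src):
    toks = tokenize(src)
    tree, i = parse_literal(toks, 0)
    if i != len(toks):
        raise SyntaxError("trailing input after literal")
    return tree


tree = parse("TextDoc[origin[Laurel] margins{85 530} note[<hi there>]]")
```

As in the grammar, nothing here checks that a sequence is homogeneous.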

Level 1:

We tried to identify a set of meta-properties of properties.  This set must
ultimately converge to a small, constant set.  Our first candidates are

inherited | overriding | local	-- no default; pick one
ordered | unordered			-- default is ordered
separate | concatenable		-- default is separate
integral | deletable | ignorable	-- default is integral; a deletable property is
one which caches information - it should be deleted if the node to which it is
attached is altered; an ignorable property is like a hint - an editor can safely
ignore it if it doesn't understand it, but if it does understand it, it must check
the hint's validity.
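
A minimal sketch of how an editor might act on the last group of
meta-properties, assuming a node's properties are a name-to-value dictionary and
the meta-properties are a separate name-to-kind dictionary defaulting to
"integral".  The function names and the sample property names are hypothetical.
How an editor should treat an integral property it does not understand is left
open here, as it is above.

```python
def props_after_alteration(props, meta):
    # A deletable property caches information about the node, so it is
    # dropped once the node has been altered; everything else is kept.
    return {name: value for name, value in props.items()
            if meta.get(name, "integral") != "deletable"}


def hints_to_validate(props, meta, understood):
    # An ignorable property may be skipped by an editor that does not
    # understand it; one that does understand it must check it.
    return sorted(name for name in props
                  if meta.get(name, "integral") == "ignorable"
                  and name in understood)


props = {"text": "abc", "lineBreaks": [3], "spellChecked": True}
meta = {"lineBreaks": "deletable", "spellChecked": "ignorable"}
kept = props_after_alteration(props, meta)
to_check = hints_to_validate(kept, meta, understood={"spellChecked"})
```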

We noticed that one could use the level 0 syntax to describe D-exprs in which
the basic editable information was in a fringe of the parse tree or one in which
it was buried in the tree and not just in the fringe.

Example (a Laurel message):

TextDoc[
  origin[Laurel]
  margins{85 530}
  msgFields[from[<Mitchell>] to[<Horning>] date{1981 6 16}
     cc{<Lampson|Mitchell|InterDoc↑.pa>}
     Subject[<Mitchell's whiteboard as of Friday, June 12>]
     body[<Basically, the interdoc ...Tioga paragraphs, etc.).> <...> ... <...>]
     authenticated
     ]
  ~justified
  font[TimesRoman 10]
  ]

Let me know what I have forgotten, and let's get together tomorrow a.m. to
discuss this some more.

Jim

*start*
01882 00024 USt
Date: 16 June 1981 4:30 pm PDT (Tuesday)
From: Horning.pa
Subject: Re: Mitchell's whiteboard as of Friday, June 12
In-reply-to: Your message of 16 June 1981 3:21 pm PDT (Tuesday)
To: Mitchell
cc: Horning

Jim,

That summarizes most things pretty well.  Let me just mention a couple of points
where our models may not yet have matched:

-I tend to think of a script as containing four sorts of information:
	"content" (e.g., the character string itself)
	"structure" (e.g., hierarchical + internal links)
	"properties" (e.g., fonts, constraints)
	"definitions" (e.g., styles)

It feels like you are trying to subsume everything under either "structure" or
"properties," primarily by making content a property like anything else.  I was
trying to make "content" syntactically distinct by calling it the "fringe."  We
should at least discuss the relative merits of the two forms.

-Since Friday I have come to wonder how many of the meta-properties can be
handled by "definitions"/styles.  In fact, how many properties are to be treated
may well be a function of the environment (target editor and purpose for
editing), rather than the property, per se.  (E.g., for some purposes, it is fine to
display footnotes "inline" while editing; for others, they might be suppressed
entirely.  The choice is surely not intrinsic in the script or the property.)  If a
Style is a Property -> Property* mapping, then we might imagine for each editor
a small library of styles of dealing with foreign documents, or prefixing a
document with such a style.  This focusses attention on whether there are
properties that we expect to be widely understood (e.g., this node contains script
text that remains unparsed), rather than on meta-properties.

There is a Cedar coordinators meeting at 11, so we should get together enough
before that to have time for discussion.

Jim H.

*start*
05894 00024 USt
Date: 17 June 1981 5:18 pm PDT (Wednesday)
Sender: Horning.pa
Subject: Interdoc Proposal, 1 - Metaconsiderations
From: Horning, Mitchell
To: Interdoc↑

[Recall that at our last meeting Mitchell/Horning and Reid agreed to produce
definite proposals by today, so there would be a tangible basis for discussion and
determining areas of agreement and disagreement at our meeting Friday. 
Mitchell and I have decided to present separately our conclusions about
requirements and constraints on Interdoc, and a concrete proposal illustrating that
it is feasible to satisfy them.  This message presents the former.  You can expect a
companion message from Mitchell with the latter.  -- Jim H.]

Although we do not need (thank God) to define a "Standard Editor" nor
standards for editors, Interdoc will require us to take a fairly strong position on
what constitutes "an editable document."  This note tries to explore that question
at a fairly abstract level.

 If we restrict scripts to ASCII characters (or anything similar) then any editor
prepared to deal with arbitrary strings can be used to edit a script (much like ↑Z
Unformatted Get in Bravo), but it will not be easy to use an editor in this
fashion to transform valid scripts into valid scripts that correspond to intended
changes.  However, where a script is "mostly" understood by a particular editor,
dealing with the remaining parts as an uninterpreted string/D-expression may
not be a bad escape hatch.

 In line with the requirement that, for each editor, there be an "information
lossless" transformation to Interdoc and back, there must be provisions in Interdoc
for the representation of all kinds of information maintained by "reasonable"
editors.  We don't think it is reasonable to be very constraining about what that
information will be; at best, we can hope to provide a general and open-ended
framework in which it can be placed.  In surveying the editors we know and
love/hate, some or all of them maintain information in the following four broad
categories, which we will discuss in turn:
	"content" (e.g., a character string, a set of spline coefficients)
	"structure" (e.g., word/sentence/paragraph ... , line/page/signature ...)
	"properties" (e.g., fonts, constraints)
	"definitions" (e.g., styles)
Interdoc should not attempt to replace these examples with complete
enumerations.  It must be prepared to accept scripts with, for example, completely
different hierarchies (e.g., macro/cell/chip/board ...).

 Content: Clearly text and graphics are common special cases, and we should
cater to them as well as Interpress does.  Indeed, in this area the intersection of
requirements with those of Interpress is so high that we should strive for
consistency.  (It's presumably not too late for small additions to Interpress, if we
discover a need for them.)  It may be sensible to restrict content to literals
(although not conversely!).

 Structure: A general structure, of which all the editors we know use special
cases, is the labelled directed graph.  (We can quibble about whether the content
should be at the nodes or on the arc labels.)  However, there are two
specializations of general graphs that will be so common in practice that they
should be treated specially:
	trees--most editors that support any structure at all have a "dominant"
		hierarchy that maps well into trees (although they use these trees
		to represent different hierarchies).  We need a good linear notation
		for these trees (D-expressions?).
		In the case of multiple hierarchies, the "dominant" one will
		certainly be the one used to control the scopes of properties and
		definitions (i.e., we consider some form of block structure to be a
		practical necessity).
	sequences--the most important, and most frequent, relationship between
		pieces of content is logical adjacency, which should be
		representable by textual juxtaposition in the script.  This implies
		a less compact notation for sets, where order is insignificant, but
		they are less heavily used.
Structure beyond that contained in the "dominant" hierarchy will need to be
represented explicitly.  My going-in position is that explicit "labels" in the script
will suffice for this purpose.  We are undecided whether such labels belong at
the syntactic level, or whether they can be pushed off onto "properties."

 Properties: Not much can be said in general.  It must be possible to
unambiguously parse scripts without knowing the meanings of any of the
properties it refers to.  There should be standard syntax for referring to
properties.  To first approximation, property names seem very much like "free
variables" in logic and lambda calculus.

There will be a small set of properties that every conforming parser/interpreter
can be expected to interpret; we may also need to define some standard
meta-properties that make it possible for an editor to deal reasonably with
properties it does not understand.

 Definitions: Either in the basic syntax, or at a very low level, we need to
include a mechanism for introducing definitions with restricted scope.  This
should have approximately the semantics (if not the syntax) of lambda
expressions, i.e., it should bind a property name to a property expression.  We
expect that scope will be (primarily) controlled by the tree structure of the script;
it is still a matter for debate whether standard Algol/lambda calculus rules (inner
bound variables are not affected by an outer binding) are adequate for all the
important cases.
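
A minimal sketch of definitions scoped by the tree structure of a script,
assuming the standard Algol rule that an inner binding hides an outer one.  The
class Scope and the sample style names are hypothetical, and a binding here maps
a property name to an opaque value rather than a real property expression.

```python
class Scope:
    """One block of definitions, nested inside an optional parent block."""

    def __init__(self, bindings=None, parent=None):
        self.bindings = dict(bindings or {})
        self.parent = parent

    def lookup(self, name):
        # Search from the innermost block outward; the first binding wins.
        scope = self
        while scope is not None:
            if name in scope.bindings:
                return scope.bindings[name]
            scope = scope.parent
        raise KeyError(name)


document = Scope({"body": "TimesRoman 10", "heading": "Helvetica 14"})
section = Scope({"body": "TimesRoman 12"}, parent=document)

print(section.lookup("body"))      # the inner definition is used
print(section.lookup("heading"))   # falls through to the document's
```

A document-wide prefix of non-standard definitions would correspond to the
outermost Scope in this picture.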

We expect this mechanism to be used by sophisticated editors for such things as
styles.  Some scripts will come with a prefix containing non-standard property
definitions that are global to the document.  There may be standard libraries
containing definitions that allow complex properties to be edited in terms of
properties understood by simpler editors.

*start*
05311 00024 USt
Date: 18 June 1981 4:19 pm PDT (Thursday)
From: Mitchell.PA
Subject: A first cut at a document syntax - for your comment before I send it
To: Horning.PA
cc: Mitchell

I envision an InterDoc standard as having three levels.  Level 0, the lowest and
most general level, provides a general, 1980s-style S-expression syntax.  Level 1
associates some simple semantics with various of the syntactic constructs (e.g.,
definitions, scope, node identity, node types).  Most of the specifications actually
lie at level 2, where document "types" are developed and given semantics (e.g.,
Star documents, Star text documents, Star graphics, 860 text, Tioga nodes, Laurel
messages, etc.).

Level 0:

In the basic document-expression (D-expr) syntax, the fundamental literals are
numbers, strings, identifiers, labels (for internal references), records, and
sequences.  Numbers and strings are taken directly from InterPress. 

Rather than work out a set of syntax productions, I have produced an example of
an (imaginary) Laurel message embellished with paragraphs, fonts, etc.  I hope
the commentary to the right of the example will aid understanding.  The first
version of the document uses a number of abbreviating mechanisms for common
cases; these are listed in the notes following the example, and, for comparison, a
fully expanded version follows those notes:

TextDoc | Message			-- can be viewed as TextDoc or Message
   [
   justified←F				-- "←" means overridable default
   font←[face=TimesRoman size=10 style=n]	-- keyword notation
   margins←[2540 19050]			-- positional notation
   leading[x=1 y=1]
   MsgInfo
      [					-- keyword notation, "="s elided
      pieces[hdr=#1 body=#2]		-- internal references to labelled parts
      date{1981 6 18}
      from=<Mitchell.PA>			-- collected by Laurel; could use refs?
      subject=<A Sample Document Syntax>
      to=<Horning.PA>
      cc=<Mitchell,InterDoc↑.pa>
      authenticated				-- authenticated=T
      ]
   CONTENTS=
      [
      1: Section				-- "1" is the label of this section
        [
        CONTENTS=Paragraph[font[style=bold]]	-- distributed over sequence  
          {				-- a sequence of paragraphs all in boldface
          <Date: 18 June 1981 9:18 am PDT (Thursday)>
          <From: Mitchell.PA>
          <Subject: A Sample Document Syntax>
          <cc: Mitchell, InterDoc↑.pa>
          }
        ]
      2: Section
        [
        leading[y=6]				-- override outer y leading
        CONTENTS=Paragraph
          {
          <text of paragraph>
          <text of paragraph>
          <text of paragraph>
          <text of paragraph>
          }
        ]
      ]
   ]

Notes:

x←value means that x has that value in the absence of any surrounding
assignment of the form x=value

"=" is the default binding and means local value wins (Algol scope).

Can elide "=" when followed by "[" or "{".

x=T can be replaced by x; x=F can be replaced by ~x

The contents field of a node can be stated explicitly as "CONTENTS=" or can be
given as the last literal of a node (if last, it makes semantics, inheritance, etc.
easier for an editor ingesting a script)

When each field of a structured literal has a distinct type, the type names can
be used to identify the fields.  This has been used heavily in the example.


Here is the fully expanded version of the above example:

DOCUMENT=TextDoc | Message
   [
   justified←F,
   font←font[face=TimesRoman,size=10,style=n],
   margins←margins[left=2540,right=19050],
   MsgInfo=MsgInfo
      [
      pieces=pieces[hdr=#1,body=#2],
      date=date[yy=1981,mm=6,dd=18],
      from=from<Mitchell.PA>,
      subject=subject<A Sample Document Syntax>,
      to=to<Horning.PA>,
      cc=cc<Mitchell,InterDoc↑.pa>,
      authenticated=T
      ]
   CONTENTS=
      [
      leading=leading[x=1],
      1: Section=Section
        [
        leading=leading[y=1],
        CONTENTS=Paragraph[font.style=bold] 
          {
          <Date: 18 June 1981 9:18 am PDT (Thursday)>,
          <From: Mitchell.PA>,
          <Subject: A Sample Document Syntax>,
          <cc: Mitchell, InterDoc↑.pa>
          }
        ]
      2: Section=Section
        [
        leading=leading[y=6],
        CONTENTS=Paragraph
          {
          <text of paragraph>,
          <text of paragraph>,
          <text of paragraph>,
          <text of paragraph>
          }
        ]
      ]
   ]

Level 1:

We tried to identify a set of meta-properties of properties.  This set must
ultimately converge to a small, constant set.  Our first candidates are

inherited | overriding | local	-- no default; pick one
ordered | unordered			-- default is ordered
separate | concatenable		-- default is separate
integral | deletable | ignorable	-- default is integral

An integral property is one which cannot be thrown away without losing
important information (e.g., the textual content of a node).  A deletable property
is one which caches information - it should be deleted if the node to which it is
attached is altered; an ignorable property is like a hint - an editor can safely
ignore it if it doesn't understand it, but if it does understand it, it must check
the hint's validity.

See Jim Horning's message about InterDoc meta-thoughts for a discussion of scope
of properties, dominant structure, and definitions.

JGM
*start*
06333 00024 USt
Date: 19 June 1981 9:37 am PDT (Friday)
From: Mitchell.PA
Subject: A first cut at a document syntax
To: InterDoc↑.PA

I envision an InterDoc standard as having three levels.  Level 0, the lowest and
most general level, provides a general, 1980s-style S-expression syntax.  Level 1
associates some simple semantics with various of the syntactic constructs (e.g.,
definitions, scope, node identity, node types).  Most of the specifications actually
lie at level 2, where document "types" are developed and given semantics (e.g.,
Star documents, Star text documents, Star graphics, 860 text, Tioga nodes, Laurel
messages, etc.).

Level 0:

In the basic document-expression (D-expr) syntax, the fundamental literals are
numbers, strings, identifiers, labels (for internal references), records, and
sequences.  Numbers and strings are taken directly from InterPress. 

Rather than work out a set of syntax productions, I have produced an example of
an (imaginary) Laurel message embellished with paragraphs, fonts, etc.  I hope
the commentary to the right of the example will aid understanding.  The first
version of the document uses a number of abbreviating mechanisms for common
cases; these are listed in the notes following the example, and, for comparison, a
fully expanded version follows those notes:

TextDoc | Message			-- can be viewed as TextDoc or Message
   [
   justified←F				-- "←" means overridable default
   font←[face=TimesRoman size=10 style=n]	-- keyword notation
   margins←[2540 19050]			-- positional notation
   leading[x=1 y=1]
   MsgInfo
      [					-- keyword notation, "="s elided
      pieces[hdr=#1 body=#2]		-- internal references to labelled parts
      date{1981 6 18}
      from=<Mitchell.PA>			-- collected by Laurel; could use refs?
      subject=<A Sample Document Syntax>
      to=<Horning.PA>
      cc=<Mitchell,InterDoc↑.pa>
      authenticated				-- authenticated=T
      ]
   CONTENTS=
      [
      1: Section				-- "1" is the label of this section
        [
        CONTENTS=Paragraph[font[style=bold]]	-- distributed over sequence  
          {				-- a sequence of paragraphs all in boldface
          <Date: 18 June 1981 9:18 am PDT (Thursday)>
          <From: Mitchell.PA>
          <Subject: A Sample Document Syntax>
          <cc: Mitchell, InterDoc↑.pa>
          }
        ]
      2: Section
        [
        leading[y=6]				-- override outer y leading
        CONTENTS=Paragraph
          {
          <text of paragraph>
          <text of paragraph>
          <text of paragraph>
          <text of paragraph>
          }
        ]
      ]
   ]

Notes:

x←value means that x has that value in the absence of any surrounding
assignment of the form x=value

"=" is the default binding and means local value wins (Algol scope).

Can elide "=" when followed by "[" or "{".

x=T can be replaced by x; x=F can be replaced by ~x

The contents field of a node can be stated explicitly as "CONTENTS=" or can be
given as the last literal of a node (if last, it makes semantics, inheritance, etc.
easier for an editor ingesting a script)

When each field of a structured literal has a distinct type, the type names can
be used to identify the fields.  This has been used heavily in the example.


Here is the fully expanded version of the above example:

DOCUMENT=TextDoc | Message
   [
   justified←F 
   font←font[face=TimesRoman size=10 style=n] 
   margins←margins[left=2540 right=19050] 
   MsgInfo=MsgInfo
      [
      pieces=pieces[hdr=#1 body=#2] 
      date=date[yy=1981 mm=6 dd=18] 
      from=from<Mitchell.PA> 
      subject=subject<A Sample Document Syntax> 
      to=to<Horning.PA> 
      cc=cc<Mitchell InterDoc↑.pa> 
      authenticated=T
      ]
   CONTENTS=
      [
      leading=leading[x=1] 
      1: Section=Section
        [
        leading=leading[y=1] 
        CONTENTS=Paragraph[font.style=bold] 
          {
          <Date: 18 June 1981 9:18 am PDT (Thursday)> 
          <From: Mitchell.PA> 
          <Subject: A Sample Document Syntax> 
          <cc: Mitchell  InterDoc↑.pa>
          }
        ]
      2: Section=Section
        [
        leading=leading[y=6] 
        CONTENTS=Paragraph
          {
          <text of paragraph> 
          <text of paragraph> 
          <text of paragraph> 
          <text of paragraph>
          }
        ]
      ]
   ]

Level 1:

At this level we must talk about how and when properties are inherited.  In
general, this is a merging operation since many properties will be acquired from
"styles" by a level of indirection.  In fact, a style has the same syntax as any
node (except that it needs to be defined).

One might ask whether styles should be lambda expressions so that a document
node could particularize a style to better match constraints of its own.  I think
the answer is "no", for the following reason:  In general, a node will have a
number of styles, and we are interested more in combining the sets of properties
derived from the styles than in parametrizing each one, because after
parametrization and expansion we still have to somehow combine the properties. 
Let's just have one mechanism, the combining rule, rather than two.

We tried to identify a set of meta-properties of properties.  This set must
ultimately converge to a small, constant set.  Our first candidates are

inherited | overriding | local	-- default is inherited
integral | deletable | ignorable	-- default is integral

Normally, Algol scope rules hold for inheritance, and a local definition of some
property overrides a less local one.  However, it seems desirable also to allow
some properties in an outer scope to override a local definition.  Finally, some
properties are meant to be local only and not to be imported into subnodes in the
dominant hierarchy.

An integral property is one which cannot be thrown away without losing
important information (e.g., the textual content of a node).  A deletable property
is one which caches information - it should be deleted if the node to which it is
attached is altered; an ignorable property is like a hint - an editor can safely
ignore it if it doesn't understand it, but if it does understand it, it must check
the hint's validity.

See Jim Horning's message about InterDoc meta-thoughts for a discussion of scope
of properties, dominant structure, and definitions.

JGM
*start*
07051 00024 USt
Date: 19 June 1981 2:52 pm PDT (Friday)
From: Mitchell.PA
Subject: A first cut at a document syntax - slightly revised
To: InterDoc↑.PA

I envision an InterDoc standard as having three levels.  Level 0, the lowest and
most general level, provides a general, 1980s-style S-expression syntax.  Level 1
associates some simple semantics with various of the syntactic constructs (e.g.,
definitions, scope, node identity, node types).  Most of the specifications actually
lie at level 2, where document "types" are developed and given semantics (e.g.,
Star documents, Star text documents, Star graphics, 860 text, Tioga nodes, Laurel
messages, etc.).

Level 0:

In the basic document-expression (D-expr) syntax, the fundamental literals are
numbers, strings, identifiers, labels (for internal references), records, and
sequences.  Numbers and strings are taken directly from InterPress. 

Rather than work out a set of syntax productions, I have produced an example of
an (imaginary) Laurel message embellished with paragraphs, fonts, etc.  I hope
the commentary to the right of the example will aid understanding.  The first
version of the document uses a number of abbreviating mechanisms for common
cases; these are listed in the notes following the example, and, for comparison, a
fully expanded version follows those notes:

TextDoc | Message			-- can be viewed as TextDoc or Message
   (
   justified←F				-- "←" means overridable default
   font←(face=TimesRoman size=10 style=n)	-- keyword notation
   margins←(2540 19050)			-- positional notation
   leading(x=1 y=1)
   MsgInfo
      (					-- keyword notation, "="s elided
      pieces(hdr=#1 body=#2)		-- internal references to labelled parts
      date{1981 6 18}
      from=<Mitchell.PA>			-- collected by Laurel; could use refs?
      subject=<A Sample Document Syntax>
      to=<Horning.PA>
      cc=<Mitchell,InterDoc↑.pa>
      authenticated				-- authenticated=T
      )
   CONTENTS=
      (
      1: Section				-- "1" is the label of this section
        (
        CONTENTS=Paragraph(font(style=bold))	-- distributed over sequence  
          {				-- a sequence of paragraphs all in boldface
          <Date: 18 June 1981 9:18 am PDT (Thursday)>
          <From: Mitchell.PA>
          <Subject: A Sample Document Syntax>
          <cc: Mitchell, InterDoc↑.pa>
          }
        )
      2: Section
        (
        leading(y=6)				-- override outer y leading
        CONTENTS=Paragraph
          {
          <text of paragraph>
          <text of paragraph>
          <text of paragraph>
          <text of paragraph>
          }
        )
      )
   )

Notes:

I assume that an editor that ingests such a script understands all of the syntax
and low-level semantics and has definitions available to it for the rest (e.g.,
TextDoc; see discussion under Level 2, below).

x←value means that x has that value in the absence of any surrounding
assignment of the form x=value.

"=" is the default binding and means local value wins (Algol scope).

Can elide "=" when followed by "(" or "{".

x=T can be replaced by x; x=F can be replaced by ~x.

The contents field of a node can be stated explicitly as "CONTENTS=" or can be
given as the last literal of a node (if last, it makes semantics, inheritance, etc.
easier for an editor ingesting a script).

When each field of a structured literal has a distinct type, the type names can
be used to identify the fields.  This has been used heavily in the example.

#n is an internal reference to the node labelled with the UID n (i.e., preceded
by "n:").  This is the only way to link things within the document other than
the linking implicit in the hierarchy.

Here is the fully expanded version of the above example:

DOCUMENT=TextDoc | Message
   (
   justified←F 
   font←font(face=TimesRoman size=10 style=n) 
   margins←margins(left=2540 right=19050) 
   MsgInfo=MsgInfo
      (
      pieces=pieces(hdr=#1 body=#2) 
      date=date(yy=1981 mm=6 dd=18) 
      from=from<Mitchell.PA> 
      subject=subject<A Sample Document Syntax> 
      to=to<Horning.PA> 
      cc=cc<Mitchell InterDoc↑.pa> 
      authenticated=T
      )
   CONTENTS=
      (
      leading=leading(x=1) 
      1: Section=Section
        (
        leading=leading(y=1) 
        CONTENTS=Paragraph(font.style=bold) 
          {
          <Date: 18 June 1981 9:18 am PDT (Thursday)> 
          <From: Mitchell.PA> 
          <Subject: A Sample Document Syntax> 
          <cc: Mitchell  InterDoc↑.pa>
          }
        )
      2: Section=Section
        (
        leading=leading(y=6) 
        CONTENTS=Paragraph
          {
          <text of paragraph> 
          <text of paragraph> 
          <text of paragraph> 
          <text of paragraph>
          }
        )
      )
   )

Level 1:

At this level we must talk about how and when properties are inherited.  In
general, this is a merging operation since many properties will be acquired from
"styles" by a level of indirection.  In fact, a style has the same syntax as any
node (except that it needs to be defined).

One might ask whether styles should be lambda expressions so that a document
node could particularize a style to better match constraints of its own.  I think
the answer is "no", for the following reason:  In general, a node will have a
number of styles, and we are interested more in combining the sets of properties
derived from the styles than in parametrizing each one, because after
parametrization and expansion we still have to somehow combine the properties. 
Let's just have one mechanism, the combining rule, rather than two.

We tried to identify a set of meta-properties of properties.  This set must
ultimately converge to a small, constant set.  Our first candidates are

inherited | overriding | local	-- default is inherited
integral | deletable | ignorable	-- default is integral

Normally, Algol scope rules hold for inheritance, and a local definition of some
property overrides a less local one.  However, it seems desirable also to allow
some properties in an outer scope to override a local definition.  Finally, some
properties are meant to be local only and not to be imported into subnodes in the
dominant hierarchy.

An integral property is one which cannot be thrown away without losing
important information (e.g., the textual content of a node).  A deletable property
is one which caches information - it should be deleted if the node to which it is
attached is altered; an ignorable property is like a hint - an editor can safely
ignore it if it doesn't understand it, but if it does understand it, it must check
the hint's validity.
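
A minimal sketch, in Python, of how an editor might apply the integral /
deletable / ignorable distinction after altering a node.  It is not part of
the proposal: the function name, the (value, utility) pairing, and the choice
to drop an understood hint that fails its validity check are all assumptions
made for illustration.

```python
def properties_after_edit(props, understood, still_valid):
    """Return the properties a node keeps after it has been altered.

    props       -- dict mapping property name -> (value, utility), where
                   utility is "integral", "deletable", or "ignorable"
                   (this pairing is a hypothetical representation)
    understood  -- set of property names this editor knows how to interpret
    still_valid -- callable(name, value) -> bool, the editor's hint check
    """
    kept = {}
    for name, (value, utility) in props.items():
        if utility == "deletable":
            # Cached information: stale once the node is altered, so drop it.
            continue
        if utility == "ignorable" and name in understood:
            # A hint the editor understands must be checked.  Dropping an
            # invalid hint is an assumption; the note does not say what to do.
            if not still_valid(name, value):
                continue
        # Integral properties, and hints the editor does not understand,
        # are carried along untouched.
        kept[name] = (value, utility)
    return kept
```

An editor that understands only some hints would pass just those names in
"understood"; everything else that is not deletable survives the edit.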

Level 2:

This is where it gets hard.  The bulk of the interchange standard will lie in the
definitions of node types, i.e., what types there are, what their properties are,
and what the meta-properties of those properties are.

See Jim Horning's message about InterDoc meta-thoughts for a discussion of scope
of properties, dominant structure, and definitions.

JGM
------------------------------------------------------------

*start*
15895 00024 USt
Mail-from: Arpanet host SU-SCORE rcvd at 22-JUN-81 0740-PDT
Mail-from: ARPANET site SU-AI rcvd at 21-Jun-81 2335-PDT
Date: 21 Jun 1981 23:35:13-PDT
From: reid at Shasta
To: InterDoc↑@Parc-Maxc
Subject: thoughts on InterDoc
Cc: reid, reid@Score
Remailed-date: 22 Jun 1981 0739-PDT
Remailed-from: Brian K. Reid <CSL.BKR at SU-SCORE>
Remailed-to: InterDoc↑ at PARC-MAXC

Here is the current draft of my thoughts on InterDoc. I am sorry to 
tell you that it has been done in Scribe rather than in Laurel or something
(I had to work from home) so that you cannot read it easily on your screens.
In case you would like to read it uneasily on your screens, I have included
here a draft Scribed in such a way as to remove all font information. You
can get the real thing by printing the file [Maxc]<Reid>IPthoughts.press,
making sure that you use Maxc's PRESS command to do it, unless you are
an FTP wizard. 

I might be a few minutes late for tomorrow's meeting, as I am going to
be driving back from Sunnyvale with a load of wire for Stanford and I
don't know how to predict the rush-hour traffic.

Brian


                             THOUGHTS ON INTERDOC
                                 BRIAN K. REID
                                 21 JUNE 1981

                               Table of Contents
1. Principles
2. The lowest Syntactic level
     2.1. The basic data types
     2.2. Basic structure
     2.3. Syntactic representation
     2.4. Links
3. Some Semantics
     3.1. Default properties and inheritance
     3.2. Object utility properties
     3.3. Object origination properties
4. Definitions and Macros
5. Higher-level constructs


1. Principles
  The  Interdoc  format  is a language for the representation of documents. The
language itself is an extremely simple syntax for the representation of certain
concepts along with some simple rules for attaching semantics.   This  document
summarizes  my  current thinking about what Interdoc should look like. Remember
as you read this that Interdoc is not supposed to be intelligible to humans; it
is the job of a text editor (or prettyprinter) to interpret for you.

  At this stage in the design of Interdoc, we are thinking only in terms of the
interchange representation, which is to be entirely in the ISO  character  set.
ISO  looks  like ASCII, but has no "[", "]", "↑", or " " in it. Xerox character
sets also look like ASCII, but bungle  the  "←",  and  "`"  characters;  it  is
therefore not a good idea to use them either.

  This  outline represents my mental picture of what we have all been designing
in committee. Various pieces of  that  picture  were  painted  by  Horning  and
Mitchell,  and  I  have  fitted  them into my syntax but left them more or less
alone.

  This syntax looks a lot like some dialect of Lisp if  you  squint  your  eyes
right.  Please  don't  shy  away from it on that account. Regardless of how you
feel about Lisp as a programming tool, you as a computer scientist  must  admit
that  its expression language is unparalleled for the combination of simplicity
and precision that it provides.

2. The lowest Syntactic level
  At the lowest syntactic level we need to represent text characters, integers,
reals, and the relationships among them. Although we can  represent  characters
with  integers,  it  will  make  a more useful vestigial case for simple-minded
editors if a character-code representation is used, so we pick ISO.



2.1. The basic data types
  Let a scalar character be represented by an ISO alphanumeric, or by an equals
sign preceding an integer. Thus, "a" and "A" are scalar characters, as are
"=97 " and "=65 ". The specific set of characters that can be directly
represented by their ISO graphic should be fairly limited, so that quoting and
nesting problems are avoided.{The "=" escape convention is borrowed from
Interpress, but I don't like the concatenation rule in Interpress because it is
not context-free; in the Interpress string <Alpha =24 25 25=>, which can be
abbreviated as <Alpha =24 25 25>, it is not possible to jump in to a random
place in the middle of the string and break it into two strings without
searching all the way back to the beginning of the string. So I propose that
this string be represented in Interdoc as <Alpha =24 =25 =25 >.} There is
exactly one space after the escaped number.

  Let  a  scalar  integer  be  represented  the same way it is in your favorite
programming language: a sequence of decimal digits  optionally  prefixed  by  a
sign.  Let  a  scalar  real  be represented the same way it is in your favorite
programming language. I won't give a  rigorous  definition  here.  We  add  the
restriction  not  found  in  your  favorite  programming language that a number
(integer or real) must be followed by a space. This is so we can have a  simple
rule for constructing vectors out of scalars by ordinary concatenation.

  Let  a vector of characters, also called a string, be a group of zero or more
adjacent scalar characters surrounded by <pointy brackets>.  The pointy bracket
character can of course be represented with its numeric code, so  there  is  no
problem  putting that character (or any other) in a string. Thus <Hi mom!> is a
string, and so is <Hi mom=33 >.
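
  The escape rule can be sketched in a few lines of Python. This is only an
illustration, not part of the proposal: the function name is made up, the
escape codes are taken to be decimal (as "=97 " for "a" suggests), and every
character other than "=" is assumed to stand for itself.

```python
def decode_string(token: str) -> str:
    """Decode a string such as '<Hi mom=33 >' into its characters."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError("string must be wrapped in pointy brackets")
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "=":
            # Escape: '=' then decimal digits then exactly one space.
            j = i + 1
            while j < len(body) and body[j] in "0123456789":
                j += 1
            if j == i + 1 or j >= len(body) or body[j] != " ":
                raise ValueError("escape must be '=' digits then one space")
            out.append(chr(int(body[i + 1:j])))
            i = j + 1  # consume the single terminating space
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Because each escape is self-delimiting, a string can be split at any point
outside an escape without looking back to its beginning, which is the
property argued for above.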

  Let a vector of integers be a group of zero or more adjacent scalar  integers
surrounded  by  {braces}.  Thus  {1  0 3333 67 } is a vector of integers. Let a
vector of numbers be a group of zero or more adjacent scalar numbers surrounded
by {braces}. Thus {1 0.0 3333 67 } is a vector of reals. (We might be  able  to
get  away  with the single construct "vector of numbers" and rely on properties
to denote integer-ness; must think further.)



2.2. Basic structure
  The Interdoc format must be able to represent data and structure.  The  kinds
of structure that we need to be able to represent are:

   - Regions. A region is a (possibly empty) piece of the document. All of
     the  document  data must be in some region. Regions are the mechanism
     for  representing   containment;   therefore   regions   nest.   Non-
     hierarchical nesting of regions is not permitted; e.g. something like
     the  Scribe  construct  @b[foo@i(baz]frozz)  is not permitted. (note:
     this restriction might be too limiting, but it is certainly a  useful
     one. Must discuss.)

   - Points.  A  point  is  a  marker  that  has  no intrinsic size; it is
     attached to some datum. Although a point could be  represented  as  a
     null  region  with  a certain kind of property, I believe that points
     are important enough to deserve special treatment.

   - Links. A link is a specification that some pair of {points or regions
     or links} has some directed relationship. I don't think we need to be
     able to link properties, q.v.  Although links  are  directed,  it  is
     possible  to build nondirectional links out of directional links, but
     not vice versa.

   - Properties. All regions,  points,  links,  and  properties  can  have
     properties.  A basic datum cannot have a property; but all basic data
     are in some region, and regions can have properties, so that is not a
     limitation.  A property is actually an  attribute/value  pair,  which
     makes  it  something  like  a  LISP property list that is one element
     long.



2.3. Syntactic representation
  Regions, points, and links must be represented in such a way  that  they  are
trivial  to  parse, and that properties can be unambiguously attached.  For the
purposes  of  this  representation,  we  must  introduce  the  concept  of   an
identifier.  No  semantics  at  all  attached;  an identifier is just a list of
alphanumeric characters (not necessarily even beginning with a  letter,  though
it  won't  hurt to add this restriction and it might save our ass later if some
sort of syntactic extension becomes needed. Avoid unnecessary generality.)  Let
the representation of a region be  a  parenthesized  list  beginning  with  the
identifier "region", and followed by anything at all:

    (region collection of regions, points, or basic data)

Thus, for example:

    (region <Hi mom!>)
    (region <Hi mom!> (region <nested region>) <back to =120 101= outer>)

  Points are represented by marks, which use exactly the same syntax as
regions.  We can put a mark into a region by splitting the data to be marked
into two pieces, and placing the mark between them:

    (region <Hi > (mark zotchmark) <mom!>)

We can attach a property to a region by placing  a  property  mark  before  the
first datum in the region:

    (region (property color green) <Hi mom!>)

and in fact we can do the same thing with marks:

    (region (property spin up) <Hi > (mark (property color 7) zark) <mom!>)

We can attach a property to a property in the same way:

    (region (property (property 287 9) spin up) <Hi mom!>)

Syntactically  we could represent (property color green) as just (color green),
and let the definition of "color" as a property attribute be the  key,  but  we
don't  want  to force the processing program to do a symbol-table lookup on the
word "color" to discover that (color green) is just a property.
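
  To show how little machinery the parenthesized form needs, here is a rough
Python sketch of a reader that turns one form into nested lists. It is an
illustration only: the names TOKEN and parse are invented, strings and
vectors are kept as undecoded tokens, and the tokenizer assumes (as stated
later in this note) that parentheses never occur inside strings.

```python
import re

# One token is a <string>, a {vector}, a parenthesis, or a bare atom.
TOKEN = re.compile(r"<[^>]*>|\{[^}]*\}|[()]|[^\s()<>{}]+")


def parse(text):
    """Parse a single parenthesized form into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while True:
                if pos >= len(tokens):
                    raise ValueError("missing ')'")
                if tokens[pos] == ")":
                    pos += 1
                    return items
                items.append(read())
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok  # atom, string, or vector, left as text

    form = read()
    if pos != len(tokens):
        raise ValueError("trailing input after form")
    return form
```

The head of each list ("region", "mark", "property") tells the reader what
kind of object it has found without any symbol-table lookup.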

  In the interests of brevity, we should probably shorthand  this  notation  as
follows,

    (r (p (p 287 9) spin up) <Hi mom!>)

though  it  will help our discussions on these matters if longer words are used
in examples.

  Even the dumbest sort of editor can  gracefully  skip  over  pieces  of  this
format  that  it does not understand. When the editor's import routine finds an
open parenthesis and an identifier that it does not understand, it drops into a
mode where it ignores all characters except open  and  close  parentheses,  and
stops  when  it  reaches  the  closing  paren  that  matches  the  one it found
initially. It is free to do this because the "(" and  ")"  characters  are  not
permitted in identifiers, numbers, or character strings.
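
  The skipping rule amounts to counting parentheses. A minimal Python sketch
(the function name is made up, and it relies on the rule above that "(" and
")" never appear inside identifiers, numbers, or strings):

```python
def skip_form(text: str, start: int) -> int:
    """Return the index just past the form whose '(' is at text[start]."""
    if text[start] != "(":
        raise ValueError("expected '(' at start")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i + 1  # one past the matching close paren
    raise ValueError("unbalanced parentheses")
```

An import routine that meets an unknown identifier would call this with the
position of the open parenthesis and resume reading at the returned index.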



2.4. Links
  Links  are  more  complicated than regions, marks, or properties because they
make a non-local reference. Part of the lowest-level  specification  for  links
must  therefore  be  some scheme for satisfying those references, in order that
there be no ambiguity about what they  link  to.  On  the  other  hand,  it  is
sometimes vital to be able to include an unbound link in a piece of a document,
so  that  when  it  is moved to a particular context its links will bind to the
appropriate places in its new context.  Unambiguousness  is  not  the  same  as
rigidity.

  Superficially,  though,  links  are  very  much like you'd expect them to be.
Ignoring for now all of the hard problems of non-local referencing, links  look
like  this:  A link has an origin and a destination. The origin of a link looks
like this:

    (region (link dest-name) <Hi mom!>)

which links the region containing the character string <Hi mom!> to a  symbolic
destination named dest-name.

  The destination part of a link is very similar, and looks like this:

    (region (hook dest-name) <Hi there, junior.>)

In essence, we provide a hook that matches the name referenced in some link.
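
  As a rough illustration of how links might be bound to hooks, here is a
Python sketch that works on regions already read into nested lists (head
word first). Everything in it is an assumption for the sake of the example:
the list representation, the function names, and the choice to report an
unmatched link as None so that it stays unbound rather than being an error.

```python
def collect(form, hooks, links):
    """Walk a nested-list document, recording hooks and link origins."""
    if not isinstance(form, list):
        return
    for child in form[1:]:
        is_pair = isinstance(child, list) and len(child) == 2
        if is_pair and child[0] == "hook":
            hooks[child[1]] = form            # destination region
        elif is_pair and child[0] == "link":
            links.append((form, child[1]))    # origin region, wanted name
        else:
            collect(child, hooks, links)


def bind_links(doc):
    """Return (origin, name, destination-or-None) for every link in doc."""
    hooks, links = {}, []
    collect(doc, hooks, links)
    return [(origin, name, hooks.get(name)) for origin, name in links]
```

A link whose name matches no hook comes back with None, so a fragment can be
moved into a new context and bound there later.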

3. Some Semantics
  The  semantics that are wired in to Interdoc have mostly to do with the rules
for handling and inheriting properties, for binding links,  and  for  providing
simple  rules  by  which  an  editor  can know which parts of a document it can
ignore and which parts it must be able to process.

  I am assuming the existence of a definition facility, though I haven't worked
out its details yet. I am imagining something of the form:

    (define color (property type property) (property inheritance local))
    (define subheading (property type region) (property style SH2))

These  define  a  property  named "color" and a certain kind of region called a
"subheading".



3.1. Default properties and inheritance
  (This  and  the  following  section  are  my  attempts  to  expand  upon  the
metaproperty  notion  in  the Horning/Mitchell note.)  We don't want to have to
re-specify the  complete  list  of  properties  attached  to  everything  in  a
document.  We  get  around  this  with  a set of defaults and a set of property
inheritance properties.

  When a property is  defined,  it  is  given  an  inheritance  property  named
"inherit",  whose  values  are  in  {inherited,  overriding, local}. Normally a
property would be inherited. An overriding  property  is  one  that  cannot  be
redeclared  in  anything  contained  inside the object given that property; for
example, a region might be defined as a figure region, and given  the  property
"keep  this  all  on  one  page".  That  property  would  be given the property
"overriding" so that nothing  nested  inside  the  figure  could  break  it  up
accidentally.  A  value  of  "local" means that this property value temporarily
overrides any inherited values, but that objects nested further inside this one
do not inherit that value. In California we spend a  lot  of  time  and  energy
worrying about property values.
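
  One possible reading of these three rules, sketched in Python. This is an
interpretation, not a definition: the function name is invented, the mode is
recorded with each binding as a (value, mode) pair purely for convenience,
and an outer "overriding" binding is assumed to win even over an inner
"local" one.

```python
def resolve(path, name):
    """Find the value of property `name` at the innermost object of `path`.

    path -- list of dicts, outermost object first; each dict maps a
            property name to (value, mode), where mode is "inherited",
            "overriding", or "local" (a hypothetical representation).
    """
    value = None
    locked = False
    last = len(path) - 1
    for depth, props in enumerate(path):
        if name not in props or locked:
            continue  # nothing declared here, or an outer override holds
        v, mode = props[name]
        if mode == "local":
            if depth == last:
                return v  # applies to this object only
            continue      # not passed down to nested objects
        value = v         # inner declarations replace outer ones
        if mode == "overriding":
            locked = True  # cannot be redeclared further inside
    return value
```

For a figure region marked "keep on one page" with mode "overriding", any
redeclaration nested inside it is simply skipped by the loop.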



3.2. Object utility properties
  The  property  named  "utility" can take on values from {integral, deletable,
ignorable}. An integral object is one that contains the  master  copy  of  some
data.  If  it  is deleted the document will be damaged and if it is ignored the
document will be incomplete. A deletable object is one that is a cached copy of
some derivative of other data. Deletable objects can always be regenerated, but
not  necessarily  without  expending  non-trivial  computation  resources.   An
ignorable object is one that you are free to ignore if you don't understand it,
but  that  you  are not permitted to delete.  Private henscratches left by some
editing program for its later re-use  would  be  ignorable  objects;  they  had
better  be  there, but they are useful only to certain programs. Ignorable non-
region objects include properties for color  of  text  (ignored  by  monochrome
printers)  and  marks  indicating  line  breaks  determined  by some particular
formatting program.
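
  One way an editor might act on the "utility" property when it meets an
object it does not understand is sketched below. The action names and the
choice to refuse outright on an unknown integral object are assumptions of
this sketch; the text above only says what deleting or ignoring each kind of
object would cost.

```python
from enum import Enum

class Utility(Enum):
    INTEGRAL = "integral"
    DELETABLE = "deletable"
    IGNORABLE = "ignorable"

def plan(understood: bool, utility: Utility) -> str:
    """Decide what an editor does with one object (hypothetical policy)."""
    if understood:
        return "process"
    if utility is Utility.IGNORABLE:
        return "preserve"   # may be skipped over, but must not be deleted
    if utility is Utility.DELETABLE:
        return "drop"       # a cached derivative; it can be regenerated
    return "refuse"         # master data the editor cannot handle safely
```

An editor for a monochrome device would thus preserve an ignorable color
property it cannot interpret, and could drop another program's line-break
cache if that cache was marked deletable.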



3.3. Object origination properties
  Some objects are placed in a document  by  a  specific  program,  and  marked
"ignorable" so that all other programs can safely bypass them. Other objects are
placed in a document with the intention that they be looked at by  all  comers.
We  want  to  provide  a  mechanism  whereby  some editing agent can attach its
special stamp to an object, so that it and the world can know  how  the  object
came  to  be  there.  This is done with the "creator" property.  Values for the
"creator" property are arbitrary names, intended to be the name of the  program
(or  person,  in  the  case  of a reviewer adding annotations to something) who
claims responsibility. No attempt at validation or enforcement is to  be  made,
of course.

4. Definitions and Macros
  All  names  except the small set of built-in names (which we are not ready to
disclose to you) must be defined  either  by  a  (define...)  or  a  (macro...)
object.  It may be that the two end up being equivalent, except that I think it
will be safe for a macro object to have parameters (since we are so restrictive
w.r.t. delimiter syntax there will  be  none  of  the  usual  quoting  problems
associated  with  macros).  I'm not ready to provide details, but the rules for
expanding macros must be rigid and trivial, in order that macros be processable
by all readers. I'm pretty sure the MacLisp DefMacro function provides  exactly
the right qualities, but I can't find my MacLisp manual.

  A (define...) object works exactly like a Scribe @Define command, which is to
say  that it defines an aggregate property by combining other properties (which
can also be aggregated if you like). I can elaborate if people don't know  what
that means.
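
  As a rough illustration of what "aggregate property" could mean, the sketch
below expands a defined name into a flat set of properties, following
references to other definitions. The table format, the rule that later entries
win over earlier ones, and the name "loud-subheading" are assumptions made for
the example; they are not taken from Scribe or from any agreed Interdoc design.

```python
def expand(name, definitions, _active=()):
    """Flatten a defined name into a dict of primitive properties."""
    if name in _active:
        raise ValueError("circular definition: " + " -> ".join(_active + (name,)))
    result = {}
    for item in definitions[name]:
        if isinstance(item, str):
            # A reference to another definition: expand it in place.
            result.update(expand(item, definitions, _active + (name,)))
        else:
            prop, value = item
            result[prop] = value   # assumed rule: later entries win
    return result

definitions = {
    "subheading": [("type", "region"), ("style", "SH2")],
    "emphatic": [("face", "bold")],
    # Hypothetical aggregate built from the two definitions above.
    "loud-subheading": ["subheading", "emphatic", ("style", "SH1")],
}
flat = expand("loud-subheading", definitions)
```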

5. Higher-level constructs
  (watch this space)

*start*
01315 00024 USt
Date: 24 June 1981 4:07 pm PDT (Wednesday)
From: Mitchell.PA
Subject: Meetings 3a and 3b, 19 June 1981 and 22 June 1981
To: Interdoc.pa

Attendees: Bill Paxton (WHP), Robert Ayers (RA), Scott McGregor (SMcG),
Alan Perlis (AJP), Brian Reid (BR), Jim Horning (JJH), and Jim Mitchell (JGM)

Discussion at this (extended) meeting centered around proposals from Horning,
Mitchell, and Reid.

WHP:  Perhaps an Interdoc script should package all the text, bitmaps, etc. in one
area and all the structural information in a separate place in a script so that an
editor would never have to scan the contents (text, scanned images, etc.) in a
script on input or output, but only the structural information.  This suggestion
was not accepted because even the contents have to be in some ISO-compatible
form and have to be "scanned" anyway.

There was much discussion of parametrized macros versus combining styles.  The
only resolution of this was that we all agreed to call them "definitions".

JGM: It would be nice if one could have a general script-to-master converter
which would work so long as it had access to all the right InterDoc definitions
and InterPress dictionaries. (slight worry about creeping grandiosity).

The next meeting is Friday, 26 June 1981, at 13:30 in the CSL Commons.

JGM
*start*
02828 00024 USt
Date: 22 Jun 1981 15:42 PDT
From: Horning at PARC-MAXC
Subject: Re: thoughts on InterDoc
In-reply-to: Reid's message of 21 Jun 1981 23:35:13-PDT
To: Reid
cc: Interdoc, Reid@Score

Brian,

I find myself largely in agreement with both your new material, and the
material you have incorporated from my earlier draft.  Let me just record the
points about which I have questions, while I still remember them.

I don't understand the business about requiring a blank after each number. 
Surely it is sufficient to require that the next character be non-alphanumeric,
and to allow a blank where this lexical requirement would otherwise be violated?

I have no strong preference between "point" and "mark"; both have some
undesirable graphic connotations.  But we should not switch haphazardly
between them.

I understood your link/hook example better than your abstract explanation as a
pair, which suggested to me that links would somehow exist outside the tree,
rather than within it.

I agree that we must distinguish between the name of a property and its value. 
Definitions are the mechanism for establishing connections between the two. 
Generally, we will be associating properties with regions "by name," within the
scope of some definition that supplies the value.  E.g.,
	(region emphatic <Boo!>)
rather than
	(region (property face bold) <Boo!>) .

I would rather not allow unbound links in Interdoc scripts.  Let us claim that the
semantics of merging two scripts is a problem for the text editor, not for
Interdoc.  E.g., some editors may interpret some properties as references to file
names, and perform some further linking on this basis, but each script should, at
a low level, be parsable on its own.  I think we could enjoin a many-one
link-hook association as part of our low-level standard, and require that hook
names be unique over entire scripts.
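
A check of this kind could be quite small. The sketch below assumes a script
has already been reduced to a list of hook names and a list of the names that
links refer to; the function name and that representation are hypothetical.

```python
from collections import Counter

def check_links(hook_names, link_targets):
    """Return (duplicate hook names, link targets with no hook)."""
    counts = Counter(hook_names)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    unbound = sorted(set(link_targets) - set(counts))
    return duplicates, unbound

# Many links may name one hook, but each hook name may appear only once.
ok = check_links(["fig1", "sec2"], ["fig1", "fig1", "sec2"])
bad = check_links(["fig1", "sec2", "fig1"], ["fig1", "tbl9"])
```

A script would be acceptable at the low level only when both lists come back
empty.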

I'm still of two minds about inheritance.  It and various other "metaproperties" are
so context-dependent that I sometimes wonder whether they shouldn't simply be
handled by definitions, too.

[In California, property is seldom inherited, it is almost always sold at an inflated
value.]

Definitions will probably require parameters (sigh!).  As noted this morning,
some names will be "passed through" to Interpress, and interpreted as operators
(primitive or composed).  This is our ultimate escape hatch that allows editors to
traffic in unanticipated properties, as long as the printers have been told how
they are to be treated.

I think that the notion that a major chunk of the Interdoc semantics (inference
rules?) can be given by means of a uniform translation from scripts into
Interpress Masters (in the context of a given set of definitions) is a good one,
and may convert the 2N problem into a C + N epsilon problem.

Jim H.

*start*
01139 00024 USt
Date: 25 June 1981 10:49 am PDT (Thursday)
From: Ayers.PA
Subject: On "caches"
To: Interdoc
Reply-To: Ayers

Using the nomenclature of the last meeting, a "cache" is (semi-)private
information, placed in the interdoc source by a particular editor for its own
purposes; it is "accelerator" information that can be re-created if necessary, but
which is "true" if present.  A cache might contain computed line breaks.

What restrictions should be placed on caches?  One requirement is that an editor
has to delete the cache if he "invalidates" it, even though he does not know
what the data in the cache is based on.

  Suggestion one: a cache can only reflect local data -- that is data immediately
  within the region containing the cache.  This lets an editor delete the cache
  iff he alters the region.

  Suggestion two: a cache can only reflect data "here and below" -- data
  within the subtree rooted at the region containing the cache.  Now an editor
  has to delete all caches "above" a change.
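
  Under suggestion two, the editor's duty could be sketched as a walk up the
  tree from the changed region. The Region class and its fields are
  hypothetical, and the sketch assumes the cache on the changed region itself
  is also dropped, since it reflects data "here and below".

```python
class Region:
    def __init__(self, name, parent=None, cache=None):
        self.name = name
        self.parent = parent
        self.cache = cache   # opaque to editors that did not write it

def invalidate_above(changed):
    """Delete caches on the changed region and on every region above it."""
    dropped = []
    node = changed
    while node is not None:
        if node.cache is not None:
            dropped.append(node.name)
            node.cache = None
        node = node.parent
    return dropped

doc = Region("doc", cache="page count")
chapter = Region("chapter", parent=doc)
para = Region("para", parent=chapter, cache="line breaks")
sibling = Region("sibling", parent=chapter, cache="line breaks")
dropped = invalidate_above(para)
```

  The sibling's cache survives because nothing in its subtree changed. Under
  suggestion one, only the cache in the altered region would need deleting.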

Another thought: in any case, we should prohibit caches from containing, in
any guise, logical pointers.

Bob

*start*
00505 00024 USt
Date: 26 June 1981 10:48 am PDT (Friday)
From: Horning.pa
Subject: Re: On "caches"
In-reply-to: Your message of 25 June 1981 10:49 am PDT (Thursday)
To: Ayers
cc: Interdoc

Bob,

I think I am in favor of your Suggestion two, modified to prohibit all "non-local"
references (i.e., those outside the subtree).  This seems safe.  Global reference
caches will have to be attached to the global node, and invalidated on ANY
change by a non-comprehending editor.  Life is hard.

Jim H.

*start*
00260 00024 USt
Date: 26 June 1981 10:55 am PDT (Friday)
From: Reid.PA
Subject: Re: On "caches"
In-reply-to: Horning's message of 26 June 1981 10:48 am PDT (Friday)
To: Interdoc

I believe that it will be too restrictive to ban non-local references.


*start*
01387 00024 USt
Date: 26 June 1981 10:56 am PDT (Friday)
From: Mitchell.PA
Subject: Re: On "caches"
In-reply-to: Your message of 25 June 1981 10:49 am PDT (Thursday)
To: Interdoc

It seems to me that caching information will almost certainly involve logical
pointers, which I don't believe is an insuperable problem because they must stick
out syntactically anyway.

However, I am beginning to think that even considering a general mechanism
for caching information in an Interdoc script is a bad idea.  Here are my
reasons:  If the cached information is only understood by a few editors, it would
probably be better off being held in whatever private representations they use (I
intend here to distinguish between private encodings of the Interdoc standard,
which may have to have some familial resemblance to the standard, and private
representations of scripts, which have nothing to do with the standard).  If the
cached information is widely understood among different editing programs, then
it seems much less important that one worry about general invalidation schemes,
just as it is unimportant to find general ways of handling the side effects of
altering ANY widely understood information.

This opinion is uttered as a strawman for you to knock down.  I would like to
hear why we should bother about caches at all.

Yours in the interest of creeping simplicity,

JGM

*start*
00703 00024 USt
Date: 26 June 1981 11:11 am PDT (Friday)
From: Ayers.PA
Subject: Re: On "caches"
In-reply-to: Mitchell's message of 26 June 1981 10:56 am PDT (Friday)
To: Interdoc

"It seems to me that caching information will almost certainly involve logical
pointers, which I don't believe is an insuperable problem because they must stick
out syntactically anyway"

My point about hiding logical pointers (logical pointers; not "real" syntactic
ones) in the cached info was that one could, conceptually, HIDE pointers there.
For example, one could cache an integer n whose semantics is

  Of this region's siblings, exactly n are (rendered) taller than it.

That's what I wanted to ban.

Bob

*start*
01246 00024 USt
Date: 26 June 1981 11:15 am PDT (Friday)
From: Guibas.PA
Subject: "Caching" along
To: Interdoc

I agree with much of what JGM just said. The case for allowing private caches to
creep into InterDoc documents can only be made if it is in fact plausible that
several (at least more than one) InterDoc document editors will be able to use the
same cached information. With my current understanding of what might go into
such a cache, I believe this sharing to be very unlikely. Furthermore, in my
view, InterDoc is there mostly to facilitate interchange of documents among
different editors. I still expect that almost all documents in the world will be kept
around in private representations, and perhaps some in private InterDoc
encodings. As long as we are careful to make translations to/from the standard
relatively efficient, I see no great drawbacks to this position. Another way to say
this is that documents can be in different states of "fluidity". A document in
InterPress form is pretty rigid, a document in a private representation of some
editor is pretty fluid, and one in the InterDoc format is someplace in between.
JGM should appreciate this "degrees of binding" view.

Anyone else feel the same way?

	LJG

*start*
00607 00024 USt
Mail-from: Arpanet host CMU-10A rcvd at 29-JUN-81 1113-PDT
Date: 29 June 1981 1410-EDT (Monday)
From: Mary.Shaw at CMU-10A
To: Horning at PARC-MAXC, Mitchell at PARC-MAXC
Subject:  IDL and Diana documents
CC: David Lamb at CMU-10A
Message-Id: <29Jun81 141005 MS10@CMU-10A>

Jim and Jim,
I have asked David Lamb to send you the IDL and Diana documents.
IDL won't be ready for another month (or so), but Diana is available
now, and he'll send that soon.  Although the Diana document is
directed at Ada, it should provide enough of a flavor of IDL for
you to see what's going on.
Mary