C2CLineNumbersDoc.tioga
Format of the line number stream
Stream is a header followed by a sequence of entries, followed by a trailer.
An entry is either a startentry or a stopentry
startentry := c-number startpos nextpos
c-number is line number in C file; c-number IN [1..65530]
Note that 0 is reserved for the trailer, and some further numbers at the end of the range are reserved.
startpos is corresponding start position in Mesa file; >0, must fit in 22 bits.
nextpos is corresponding start position + number of characters in Mesa file; must fit in 22 bits.
startentry is 64 bits
Encoded as
16 bit c-number ! 1 bit 0 ! 1 bit reserved ! 22 bit startpos ! 2 bit reserved ! 22 bit nextpos
stopentry := c-number startpos nextpos
Encoded as
16 bit c-number ! 1 bit 1 ! 1 bit reserved ! 22 bit startpos ! 2 bit reserved ! 22 bit nextpos
(The first reserved bit is set aside for future use: denoting the file when encoding positions in inlines...)
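The bit layout above can be sketched as follows. This is only an illustration: the function names pack_entry and unpack_entry are hypothetical, and writing the reserved bits as zero is an assumption, since their value is not specified here.

```python
# Sketch of the 64-bit entry layout, most significant bit first:
#   16 c-number | 1 kind (0 = start, 1 = stop) | 1 reserved
#   | 22 startpos | 2 reserved | 22 nextpos
# Names are hypothetical; reserved bits are written as 0 (an assumption).

MASK22 = (1 << 22) - 1


def pack_entry(c_number, is_stop, startpos, nextpos):
    """Pack one startentry or stopentry into a 64-bit integer."""
    if not 1 <= c_number <= 65530:
        raise ValueError("c-number must be in [1..65530]")
    if not 0 < startpos <= MASK22:
        raise ValueError("startpos must be > 0 and fit in 22 bits")
    if not 0 <= nextpos <= MASK22:
        raise ValueError("nextpos must fit in 22 bits")
    return (c_number << 48) | (int(is_stop) << 47) | (startpos << 24) | nextpos


def unpack_entry(word):
    """Split a 64-bit entry into (c_number, is_stop, startpos, nextpos)."""
    c_number = (word >> 48) & 0xFFFF
    is_stop = bool((word >> 47) & 1)
    startpos = (word >> 24) & MASK22
    nextpos = word & MASK22
    return c_number, is_stop, startpos, nextpos


# Big endian byte image of a startentry for C line 10, Mesa span 100..120.
start_bytes = pack_entry(10, False, 100, 120).to_bytes(8, "big")
```

With this layout the 8 bytes fall into three groups: 2 bytes of c-number, 3 bytes holding the kind bit, one reserved bit and startpos, and 3 bytes holding two reserved bits and nextpos.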
The trailer is 8 bytes, all zero.
The header is
16 bytes, "Positions 001 \000" (The 001 is a version number)
followed by module-name followed by version-stamp
module-name is number {info-bytes} {padding-bytes}
version-stamp is number {info-bytes} {padding}
The number is the number of bytes in info-bytes excluding padding.
padding is (number + 7) / 8 bytes, of undefined value.
For big endian target machines
A 22 bit number is encoded in 3 bytes: the first byte uses only its 6 low order bits. Big endian byte order.
A 32 bit number is encoded in big endian byte order.
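A minimal sketch of the 3-byte encoding just described. The helper names are hypothetical, and the high_bits parameter (the two top bits of the first byte, which the 22 bit number does not use) is an assumption about how the kind and reserved bits of an entry would share that byte.

```python
def encode_u22(value, high_bits=0):
    """Encode a 22 bit number in 3 bytes, big endian.

    Only the 6 low order bits of the first byte carry the number;
    high_bits (0..3) fills the two remaining top bits (an assumption).
    """
    if not 0 <= value < (1 << 22):
        raise ValueError("value must fit in 22 bits")
    if not 0 <= high_bits < 4:
        raise ValueError("high_bits must fit in 2 bits")
    first = (high_bits << 6) | (value >> 16)
    return bytes([first, (value >> 8) & 0xFF, value & 0xFF])


def decode_u22(data):
    """Decode 3 big endian bytes, ignoring the two top bits of the first byte."""
    if len(data) != 3:
        raise ValueError("expected exactly 3 bytes")
    return ((data[0] & 0x3F) << 16) | (data[1] << 8) | data[2]


def encode_u32(value):
    """Encode a 32 bit number in big endian byte order."""
    return value.to_bytes(4, "big")
```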
For little endian target machines
It is not yet defined; the experimental implementation will output bytes in the same order as for big endian target machines.
Numbers describing C file lines grow monotonically (they never shrink).
Numbers describing Mesa file positions are not necessarily monotonic.
An occurring position means that C2C put the code implementing the named position onto that C line. If several Mesa statements start at the same spot, the line number stream will report several Mesa positions with the same C line number.
C2C reserves the right to move simple initializations ahead and not issue positions for them.
C2C reserves the right to issue positions in the order of occurrence in the tree traversal.
C2C reserves the right to reorder code where necessary. Such reordering might be necessary to guarantee identical semantics of the generated C code and the Mesa source. However, such reordering is not necessary between separate non-inlined statements and therefore might not occur if the front end generates position nodes only for safe points.
The first line in the C file is numbered as line 1.
C2C generates startentries and stopentries correctly bracketed.
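The guarantees above (non-shrinking C line numbers, correctly bracketed start and stop entries) could be checked by a consumer roughly as follows. This is a sketch under assumptions: entries are taken as already decoded tuples, the name check_stream is hypothetical, and "correctly bracketed" is read as "each stopentry closes the most recently opened startentry and repeats its startpos and nextpos", which the text does not state explicitly.

```python
def check_stream(entries):
    """Check decoded entries; return a list of problem descriptions.

    entries: iterable of (c_number, is_stop, startpos, nextpos) tuples.
    An empty result means both checks passed.
    """
    problems = []
    open_spans = []  # stack of (startpos, nextpos) for unclosed startentries
    last_line = 0
    for index, (c_number, is_stop, startpos, nextpos) in enumerate(entries):
        # C line numbers must never shrink.
        if c_number < last_line:
            problems.append(f"entry {index}: C line {c_number} after {last_line}")
        last_line = max(last_line, c_number)
        if not is_stop:
            open_spans.append((startpos, nextpos))
        elif not open_spans:
            problems.append(f"entry {index}: stopentry without startentry")
        else:
            opened = open_spans.pop()
            # Assumed matching rule: same Mesa span as the open startentry.
            if opened != (startpos, nextpos):
                problems.append(f"entry {index}: stopentry does not match {opened}")
    for span in open_spans:
        problems.append(f"startentry {span} never closed")
    return problems
```

Note that Mesa positions are deliberately not checked for order, since they are not necessarily monotonic.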
----------------------------------------------------------------
Date: Tue, 09 Oct 90 14:57:19 PDT
From: Christian P Jacobi:PARC:Xerox
Subject: C2C, line numbers, and, debugging proposal
To: Foote, Spreitzer, Theimer
Cc: Jacobi, Mimosa
Feedback, please.
Proposed Format for a line number output stream for C2C. The purpose of this line number stream is to represent a relatively dense data structure which could be put directly into the object files, removing the need for debuggers to inspect the C file. [Increase robustness and performance of building the debugger table, and reduce the number of files actually accessed for debugging (after a temporary increase); first step toward throwing C files away to save disk space...]
--------------------
Format of the line number stream
Stream is a header followed by a sequence of entries, followed by a trailer.
An entry starts with a number
If first number is > 0 and < 65535
then entry is pair of numbers
first number is line number in C file, second number (if >= 0) is corresponding position in Mesa file.
If first number is <= 0 or > 65535
Special case. Not yet defined, except for the trailer.
The trailer is 8 bytes, all zero.
The header is
16 bytes, "Positions 001 \000" (The 001 is a version number)
followed by module-name followed by version-stamp
module-name is number {info-bytes} {padding-bytes}
version-stamp is number {info-bytes} {padding}
The number is the number of bytes in info-bytes excluding padding.
padding is (number + 7) / 8 bytes, of undefined value.
Numbers are encoded in 4 bytes, signed, using the endian-ness of target.
Numbers describing C file lines grow monotonically (they never shrink).
Numbers describing Mesa file positions are not necessarily monotonic.
An occurring position means that C2C put the code implementing the named position onto that C line. If several Mesa statements start at the same spot, the line number stream will report several Mesa positions with the same C line number.
C2C reserves the right to move simple initializations ahead and not issue positions for them.
C2C reserves the right to issue positions in the order of occurrence in the tree traversal.
C2C reserves the right to reorder code where necessary. Such reordering might be necessary to guarantee identical semantics of the generated C code and the Mesa source. However, such reordering is not necessary between separate non-inlined statements and therefore might not occur if the front end generates position nodes only for safe points.
The first line in the C file is numbered as line 1.
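For comparison with the later 64-bit layout, the entry format of this proposal (pairs of 4-byte signed numbers, ended by an all-zero trailer) might be read as sketched below. The function name read_pairs is hypothetical, the header is assumed to have been skipped already, and entries outside the ordinary range are simply rejected because the proposal leaves them undefined.

```python
import struct


def read_pairs(body, byteorder="big"):
    """Read (c_line, mesa_pos) pairs from the bytes following the header.

    byteorder is the endian-ness of the target ("big" or "little").
    Stops at the 8-byte all-zero trailer.
    """
    fmt = (">" if byteorder == "big" else "<") + "ii"  # two signed 32 bit numbers
    pairs = []
    for offset in range(0, len(body) - 7, 8):
        c_line, mesa_pos = struct.unpack_from(fmt, body, offset)
        if c_line == 0 and mesa_pos == 0:
            return pairs  # trailer reached
        if not 0 < c_line < 65535:
            # Special case: not defined by the proposal, so refuse it here.
            raise ValueError(f"undefined special entry at offset {offset}")
        pairs.append((c_line, mesa_pos))
    raise ValueError("trailer missing")


# Two Mesa positions reported for the same C line, then the trailer.
example = struct.pack(">ii", 3, 120) + struct.pack(">ii", 3, 140) + bytes(8)
```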
--------------------
Questions
1) Is it time to think about source positions in inline files?
2) Should we include information where statements end?
[2a) If so use entry with negative Mesa position (one's complement) ?]
3) Is anybody upset about the non perfect encoding of entries using 8 bytes? I thought simplicity to be more important than density.
Christian
--------
Date: Wed, 10 Oct 90 08:49:16 PDT
From: Mike Spreitzer:PARC:Xerox
Subject: Re: C2C, line numbers, and, debugging proposal
In-reply-to: "Christian P Jacobi's message of Tue, 09 Oct 90 14:57:19 PDT"
To: Christian P Jacobi
Cc: Foote, Spreitzer, Theimer, Mimosa
(1) Yes, since we are defining a new format, it is certainly time to think about ways that we might want it extended. I'm glad to see a format version number in the proposal, as this will make graceful extension easy and clean. About inlines: the stack the programmer wants to think about has one frame per procedure (and catch phrase) activation --- regardless of whether a procedure is declared inline or not. Thus, a given C statement might be implementing a subsequence of that sequence of stack frames. Therefore, we'll want to map a given C line number into a sequence of Mesa character positions.
(2) Several times I've wished I could tell where statements ended. But I'm not sure that's the object file's job. More important is just getting a parser that reports this information.
(3) Each entry will receive significant fondling by Cirio: each entry is put in two tables (each one indexed for fast access in one of the two possible directions). A few more instructions to decode a denser encoding would be a relatively minor execution time cost. Executing these extra instructions for a denser encoding would clearly be faster than waiting for the lengthier I/O of the mediocre encoding in your proposal, right?
Mike
--------
Date: 10 Oct 90 11:29:48 PDT
From: Mark Weiser:PARC:xerox
Subject: Re: C2C, line numbers, and, debugging proposal
In-reply-to: "Christian P Jacobi's message of Tue, 09 Oct 90 14:57:19 PDT"
To: Christian P Jacobi
cc: Foote, Spreitzer, Theimer, Jacobi, Mimosa, pavel, norman
I think encoding the numbers in the endianness of the target is not a good spec. Someday C2C might generate C code that runs on both endians, then we could use the same file for both--except for this spec. Another possibility: specify the order as always a particular kind, pay the conversion cost for the other type of machine. Proposed order: network byte order, same as a sparc (I think). Another possibility: use ascii representation. This wins for small files, where 1 or 2 or 3 bytes and a space win, or are no worse.
I like ascii representations in general--they extend more easily. For instance, if we define the format now to be <digits><space><digits>[(<optional additional info>)] then we can add ending positions later without changing the format, as long as all the initial parsers are ready to ignore things in parens.
Suppose we did not use C as an intermediate form, or we wanted to find object code numbers from a file format of this type? It would be nice if this same format could work in general, and perhaps that header could say what was a map to/from. I think probably this proposal works fine for these other uses, but I just mention these generalizations as food for thought.
Does this format work also for M3 and Scheme? For Scheme at least one will probably want byte position (because it wants to point to expressions). Is ending important there? Pavel?
-mark
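The ascii entry format suggested in this message, <digits><space><digits>[(<optional additional info>)], could be parsed along these lines. This is only a sketch: one entry per text line is an assumption, the name parse_ascii_entry is hypothetical, and the parenthesized part is recognized and then ignored, as the message asks of initial parsers.

```python
import re

# <digits><space><digits> optionally followed by (<additional info>)
ENTRY = re.compile(r"(\d+) (\d+)(?:\((.*)\))?")


def parse_ascii_entry(line):
    """Return (c_line, mesa_pos); anything in parens is accepted but ignored."""
    match = ENTRY.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not an entry: {line!r}")
    return int(match.group(1)), int(match.group(2))
```

A parser written this way keeps working if a later version appends, say, an ending position inside the parentheses.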
--------
Date: Wed, 10 Oct 90 16:37:19 PDT
From: Alan J. Demers:PARC:xerox
Subject: Re: C2C, line numbers, and, debugging proposal
In-reply-to: "Mark Weiser's message of 10 Oct 90 11:29:48 PDT"
To: Mark Weiser
Cc: Christian P Jacobi, Foote, Spreitzer, Theimer, Jacobi, Mimosa, pavel, norman
Responses to Mark and Mike's responses.
1. Are we keeping the C files around or not? If so, there's an argument that the line number file should be endian-neutral if the C code is. On the other hand, if we're not keeping C files around, then the remaining information is going to be in target-dependent format anyway (in .o format or one of its descendants). So respecting the endian-ness of the target may be acceptable.
I don't believe we can assume generated C code will be target-independent, so in general we'd have to keep around more than one C file and associated information. I'd prefer not to have to deal with several groups of target machines sharing C (and source line and other) files. It seems to me all we save by doing this is a little disk space, and we add another level of complexity to our version control problem. After all, we can't avoid keeping object (and other) files per target ...
2. The charm of an ascii encoding is (a) it is human readable, and (b) it completely avoids endianness problems. I dispute Mark's claim that ascii representations are inherently more extensible than non-ascii ones. The issues are orthogonal: It is easy to design an extensible non-ascii representation, and it is easy to design a non-extensible ascii representation.
3. I don't believe compactness of encoding is much of an issue: Christian's representation is 8 bytes per line entry, which means that he gets 1000 entries per SunOS file system disk read.
4. If the representation is going to be endian, clearly the target endianness is a better choice than the compiling machine's endianness. But what a drag! An endian-neutral encoding would be better.
5. This information should eventually be in an extra subsection of the a.out (or coff or elf) file anyway. (note that's per target, but it's not that big anyway ...).
Al