[Ivy]<Sweet>Papers>SASPLOS>abstract.bravo!1

Mesa is a high level systems implementation language that has been used to develop large computer systems for several types of single user computers, all implementing the Mesa processor architecture. There has been a continued emphasis on compact object programs, made possible by a byte coded instruction set whose exact complement of instructions is well tuned to the typical Mesa program.

The Mesa architecture has undergone several evolutionary changes since its original implementation in 1974. The analysis descibed in this paper was made on a large collection of programs compiled for a version of the architecture dating back to 1978, which included virtual memory (32 bit pointers) for the first time. The goal of our analysis was to see how the statistical properties of current language usage could be exploited to generate more compact programs.

A large collection of programs was normalized to a canonical form, where each instruction was replaced by the most general instruction or set of instructions that had the same semantics. This reduced the set of opcodes from 240 to approximately 100. The total collection of normalized data consisted of 2.5 million bytes of instruction sequences. These sequences were analyzed and converted to a new instruction set by a collection of pattern matching and peephole optimization techniques.

We include statistics from the original normalized data that allow the reader to answer many interesting questions about language usage. There is a detailed description of the reasoning process used in deciding the exact set of "Jump Not Equal" instructions. This reasoning was representative of the entire analysis effort, and showed an improvement if 13% in the encoding of not-equal testing as compared to the 1978 instruction set, even though the new instruction set has four fewer opcodes dedicated to not-equal testing.

We also include statistics for the most frequent opcodes of the new instruction set; the analysis predicted a code size improvement of 12% for the new set, and subsequent compiling of a large sample of programs by a compiler generating this set bore out the prediction.

We conclude with advice to those who might undertake a similar analysis of some other instruction set, giving examples of pitfalls and implementation tradeoffs that we encountered.