MOCKINGBIRD: A COMPOSER’S AMANUENSIS
by
John Turner Maxwell III and Severo M. Ornstein*
ABSTRACT
Mockingbird is a computer-based, display-oriented, music notation editor. It is especially focussed on helping a composer to capture his ideas. It can play scores on the synthesizer as well as displaying and printing them in standard notational form. It can accept both graphical input and input played on a synthesizer keyboard attached to the computer. In the latter case the user must edit the music to turn it into standard notational form, and much of Mockingbird’s interest lies in the methods by which this conversion is accomplished. The editor is highly interactive, presenting the illusion that the user can reach in and move elements of the score around as desired. This illusion is supported by the fact that the detail of the score is always shown exactly as it might be printed.
* The authors are both members of the Xerox PARC Computer Science Laboratory. At the time the Mockingbird editor was constructed, Mr. Maxwell was completing his studies at the Massachusetts Institute of Technology. Mockingbird is the subject of his MIT Master’s thesis. Although the system design was a joint project, Mr. Maxwell did all the programming while Mr. Ornstein acted primarily in a supervisory/advisory capacity and designed the hardware interface to the synthesizer. The existence of this editor should not be construed as indicating any special interest on the part of Xerox PARC in music editors; specifically there is no "music laboratory" or "music project" within PARC.
INTRODUCTION
Mockingbird is a composer’s amanuensis, a computer program designed to aid a composer with the capture, editing, and printing of musical ideas. The purpose of Mockingbird is not to invent new music or suggest variations to the composer but simply to aid him in recording his own ideas by speeding up the notating process. Mockingbird is not a publisher’s aid, although it does print music; nor is it a performer’s aid, although it can play; it is strictly focussed on the composer’s need for a powerful scribe.
Mockingbird is an interactive music notation editor. It knows nothing about the rhythmic, harmonic, or melodic aspects of music except as they are represented in common music notation. To narrow the problem, we have concentrated on handling piano music; Mockingbird cannot presently handle orchestral scores or music for instruments that require their own notational devices.
It is somewhat surprising that no one has previously built such an obviously interesting system. We believe that there are two principal reasons: first, we had at our disposal, for the first time, an unusually powerful set of hardware and software facilities with excellent graphics capabilities, and second, we made a number of key decisions, discussed below, which allowed us to by-pass some extremely difficult problems.
OVERVIEW
Mockingbird is a software package written in Mesa [1,2], an experimental language developed at Xerox PARC. Mockingbird runs on a general purpose computer called the Dorado [3], which is an extremely powerful, experimental, single-user machine also developed at PARC. (The Dorado is now officially known as the Xerox 1132). It has a 60 nanosecond instruction cycle, a large memory (typically two to eight megabytes of RAM), and an 80 megabyte disk. It also includes a large, high-resolution bitmap display, a keyboard, and a special graphical pointing device called a mouse. As the user moves the mouse around on the table beside the keyboard, a cursor (pointer) moves around correspondingly on the display to indicate the mouse position. When Mockingbird is run, a score appears on the display with staffs, notes, beams, and so on. The mouse is used to point at particular elements of the score. It has three program-readable pushbuttons on its top which are used for issuing commands to Mockingbird - sometimes in conjunction with keyboard keys.
The graphics facilities, high speed, and large memory make the Dorado a particularly suitable tool for music editing. The only special hardware we provided was an interface to a Yamaha CP-30 electronic synthesizer. The Dorado can sense key positions and simulate key strokes on the synthesizer. Thus it can be used to "play in" music and to allow the computer to "play-back" music without having to synthesize the sound waveforms by program. The set-up is shown in Figure 1. Not shown is an experimental high-resolution, computer driven, raster-scan laser printer to which pictures can be sent over an Ethernet [4] connection from the Dorado.
In addition to these hardware resources, Mockingbird relies heavily on a general purpose graphics software package [5] which provides simple commands for displaying characters and drawing both lines and curves. In addition it provides a common interface for both displaying material on the screen and printing high-resolution hardcopy.
Mockingbird is designed to handle classical piano music notation. It knows how to display and play such things as notes, rests, accidentals, beams, chords, and ties. It knows how to display objects such as measure lines, time signatures, key signatures, and clef indications. It understands about the key of a piece and will propagate and suppress accidentals appropriately. It can also handle, both in the graphics and in the playing, some of the more esoteric devices such as n-tuplets, ottava, trills, grace notes, and mordents.
Despite its sophistication, the reader should remember that the Mockingbird editor is only a "research prototype". Many features are still missing and in general we did only enough to demonstrate feasibility. For example there are numerous notational devices which we never got around to incorporating such as rolled chords, staccato markings, fermata, and so on. Furthermore, although Mockingbird can handle ties, it cannot handle the more general slurs. Nor can it display text such as lyrics or tempo markings. We do not feel that the addition of further features would present any insurmountable problems or violate the premises upon which we proceeded.
IMPORTANT DECISIONS
We feel that Mockingbird’s success is largely dependent on the decisions we made in the following four critical areas.
Amanuensis vs. Automatic Transcriber We decided not to try to write a program that converted synthesizer keystrokes directly into a score. We made the decision for a number of reasons. First, we weren’t sure that it could be done for the class of music we were interested in. Rather than pursue that question, we wanted to produce a tool that worked. Second, we knew that an editor would be needed anyway, both to correct mistakes and to satisfy the composer who did not want to use the synthesizer keyboard to enter material. So instead of a recognizer, we built an amanuensis or scribe, which provides a human transcriber with powerful editing tools. Our strategy was to build the editing tools first and then work on automatic heuristics to augment the editing process. The editing tools can assist either in performing the conversion from played-input to score, or in entering scores graphically.
Data Structure We believe that one of the most important decisions we made was the choice of data structures. Mockingbird treats music simply as a sequence of events. This allows us to handle simultaneously raw ("played in") material and more finished ("structured") material. It is furthermore convenient for presenting the material in its various external manifestations - displaying, printing, and playing.
User Interface In Mockingbird, rather than doing a lot of typing on the computer’s keyboard, the user operates directly on the picture of the music that appears on the screen. To do this, he makes heavy use of the mouse. Furthermore, there is a strong correlation between the internal representation of music, its visual display, and how it is played. All of the elements of the data structure are displayed, and everything shown on the screen corresponds to some part of the data structure. If the user moves something on the screen, the data structure is immediately updated to reflect it; if the data structure is changed, the screen is immediately repainted. Not only is the picture faithful to the data structure, but so is the synthesizer "performance". For example, if the user puts a trill marking on a note, Mockingbird will trill when playing it.
Voices Music is broken "vertically" into separate parts or voices. Such partitioning is obvious in multi-instrument music but it is also present in some single-instrument (e.g. piano) scores as an essential structural feature. The recognition of this fact, and its explicit representation in Mockingbird, greatly facilitates the editing and formatting of scores. This topic is discussed further below.
THE EDITOR
Mockingbird consists of a number of functionally distinct parts integrated into one editor. The editor allows the user to record, edit, play, and print a single piece of music. Commands are issued by making selections and typing characters. Some commands are invoked by pointing the mouse at an object on the screen and clicking a button. When the user has finished with the piece of music, he can name it and file it away in the Dorado’s filing system. Later it can be retrieved by name.
A score appears on the display that looks much like a piece of sheet music (see Figure 2). There are usually four to six staff sets (lines), each composed of two to four staffs. At the left of each staff there is a clef sign and an appropriate key signature. Scattered over the staffs are notes, chords, beams, measure lines and other symbols commonly found in music.
Only about a page of the score can appear on the screen at a time, so there are commands that allow the user to look at different sections. "Scrolling" will cause the current section of the score to be moved up or down so that neighboring lines can appear. The user can scroll from a single line to an entire page. "Thumbing" allows the user to jump to an arbitrary point in the score. To thumb, the user specifies approximately how far into the score he would like to be, and the program moves the display to that point. Both thumbing and scrolling are accomplished by moving the cursor into a special "scroll bar" at the left of the score and clicking one of the mouse buttons.
The score can be edited with Mockingbird just as documents are edited with word processors. The usual paradigm for making an edit is to "select" some portion of the music and then issue a command. The command will apply only to the selected portion. The display is updated immediately to reflect the changes.
SELECTION
Two primary types of selection are available to the user: he may select either a contiguous section of the score, or an arbitrary collection of individual notes. Both types of selections are made by moving the mouse over the desired objects while holding down a mouse button.
Section selection is typically used for gross editing operations that apply to a section of the score. It is indicated by video reversing the section (black to white and white to black). The section selected may be as small as a portion of a measure or as large as the entire score. It encompasses all of the notation that appears on all of the staffs.
Note selections are typically used for operations that apply only to notes. They are indicated by painting the selected note heads grey. An individual note may be selected by pointing at it with the mouse and clicking the left mouse button. Notes may be collected into a selected set for combined action either by a series of individual mouse clicks or by sweeping the mouse over the note heads while holding down the mouse button. Notes accidentally included in a selection may be deselected by using the middle mouse button.
Since selection persists after execution of a command, it is possible, with a single selection, to issue a succession of commands all of which apply to the same material.
VOICES
Figures 3a-3d illustrate what we mean by the term "voice". Figure 3a shows a section of a full score whereas Figures 3b-3d show only the notes belonging to single voices. Note that a voice is not restricted to one staff and that it may not always completely fill the rhythmic structure of a measure; there may be gaps. The essential point is that at any point within a measure, each voice which is present at that point will be represented by some element (note, chord, or rest) having a particular time value (quarter, dotted-eighth, etc.). Voices thus form parallel sequences of time-valued elements that uniquely fill sections of the rhythmic structure. Normally voices will be assigned in such a way that the material in successive measures of a voice will be related in some thematic way, but such measure-to-measure consistency is not essential.
Voicing in a score is normally indicated by such things as beaming, stem direction and staffing. The human reader uses these clues in determining how to play the piece (i.e. when to play the notes). Although Mockingbird could try to infer voicing, just as a human performer does, we decided that it was better for the composer to have explicit control over its definition. The composer is therefore provided with suitable commands for assigning notes to voices.
Parsing of music into voices is important for a number of reasons. First is the fact that in order to know when to play a note, it is frequently necessary to know which voice it belongs to; the time a note is to be played is normally determined by the time of occurrence and duration of the prior note in its voice. Horizontal positioning on the page is generally non-linear with rhythmic time and is therefore not a reliable guide for when a note should be played. Vertical alignment with other notes, although a fair hint, is used only in the absence of other information (as, for example, when a voice starts up in mid-measure). The second way in which voicing is useful is that it permits checking - to verify that the rhythmic "time" of every measure is precisely filled by the notes and rests it contains (with due allowance for parallelism of voices). Finally, proper voicing provides essential information for the algorithm (mentioned below in the section on converting piano rolls) which guesses the time values of the notes in multi-voice music.
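The timing rule described above, in which a note begins when the prior element of its voice ends, can be illustrated with a short sketch. This is a minimal Python illustration and not Mockingbird's Mesa code; the function name `voice_onsets` and the choice to express time values as fractions of a whole note are assumptions made for the example.

```python
from fractions import Fraction

def voice_onsets(durations, start=Fraction(0)):
    """Return the onset time of each element of one voice.

    Each element (note, chord, or rest) begins when the prior element
    of the same voice ends, so the onsets are a running sum of durations.
    """
    onsets = []
    t = start
    for d in durations:
        onsets.append(t)
        t += d
    return onsets

# Two voices in one 3/4 measure; durations are fractions of a whole note
# (this encoding is an assumption of the sketch).
upper = [Fraction(1, 4), Fraction(1, 8), Fraction(1, 8), Fraction(1, 4)]
lower = [Fraction(1, 2), Fraction(1, 4)]
upper_onsets = voice_onsets(upper)
lower_onsets = voice_onsets(lower)
```

Because each voice is timed independently, the third note of the upper voice falls at 3/8 regardless of where it happens to be drawn relative to the lower voice.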
On the display screen, if a particular voice is designated for viewing, the notes of that voice appear in black while the rest of the score is displayed in a light grey for reference. The user can only access items within that voice; i.e. area selection will only select the notes in that voice and individual notes in other voices cannot be selected. Any editing commands issued when viewing a single voice will thus affect only that voice; the part shown in grey is not affected.
EDITING COMMANDS
Typical commands for editing the score include assigning note values, assigning notes to voices, transposing notes, changing their stem direction, changing their spelling, or changing the staff on which they appear. In addition various elements of the score (such as notes, measure lines, time, key, and clef signatures) can be individually deleted or picked up and moved to a new location with the mouse. There are also commands that group notes together with beams, chords, or ties. Many of these commands work with either note or section selection. For instance, the user may transpose a single note down an octave or an entire voice up a fifth. The first action would be accomplished by selecting the note and issuing the transpose command. The second would be accomplished by first changing the view to look at just the voice desired, then selecting the entire score, and finally issuing the command.
The user may also rearrange large sections of the score with ease. (This is sometimes called "cutting and pasting"). To replace one section of the score with another, the user first selects the section to be replaced (called the primary selection), then the section to be copied (called the secondary selection), and finally issues the replace command. The primary selection does not have to be the same size as the secondary selection. If there is no secondary selection, the resulting operation is deletion. If the primary selection merely points at "empty space" in the score, the resulting operation amounts to insertion.
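The three cases of the replace command (replacement, deletion, and insertion) fall out of a single operation on a sequence. The following Python sketch shows this under simplifying assumptions: the score is a plain list, selections are half-open index ranges, and the function name `replace` is hypothetical rather than taken from Mockingbird.

```python
def replace(score, primary, secondary=None):
    """Replace score[primary] with a copy of score[secondary].

    Selections are (start, end) index pairs, end exclusive. With no
    secondary selection the primary is simply deleted; with an empty
    primary (start == end) the copied material is inserted.
    """
    p_start, p_end = primary
    if secondary is None:
        material = []
    else:
        material = list(score[secondary[0]:secondary[1]])
    return score[:p_start] + material + score[p_end:]

score = ["A", "B", "C", "D", "E"]
replaced = replace(score, (1, 3), (3, 5))   # B C replaced by a copy of D E
deleted = replace(score, (1, 3))            # no secondary: deletion
inserted = replace(score, (2, 2), (0, 1))   # empty primary: insertion
```

Note that the two selections need not be the same length, which matches the behaviour described above.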
In addition to changing the structure of the music, the user can change the way it appears on the sheet. For instance, the number of staffs for each line can be changed on a line by line basis. The user can switch a staff’s clef in the middle of a measure, or designate a section to be displayed in ottava notation. Changes in key signature and time signature can also be inserted within the score wherever necessary.
The user can add new material to the score by picking up items from a pop-up menu. The menu appears under the cursor whenever a particular button of the mouse is held down (see Figure 4). The menu includes a note icon, a rest icon, clef icons, measure lines, and various markings such as trills, accidentals, etc. As the cursor is moved over the menu, its shape changes to correspond to the icon immediately underneath it. When the mouse button is released, the cursor retains the last shape. The user can then insert instances of that icon by pointing to a place in the score and clicking another mouse button. (Mockingbird automatically selects the inserted note or rest for the user’s convenience. This allows the user to immediately issue commands that affect the note).
In addition to the fundamental commands that allow the user to rearrange the score and change the structure of the music, there are also a number of fairly sophisticated commands that help the user produce a complete score. For instance, Mockingbird can check that all the measures add up to the correct metrical values. So if, for example, one of the voices were missing a rest, Mockingbird would complain by marking that measure with a stipple pattern on the display. Other such commands are discussed in the following sections.
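A metrical check of this kind can be sketched as follows. This is an illustrative Python fragment, not the editor's own algorithm: it assumes, for simplicity, that every voice present in a measure spans the whole measure (it does not model voices that start up in mid-measure), and the names `measure_errors` and the dictionary layout are hypothetical.

```python
from fractions import Fraction

def measure_errors(measures, time_signature):
    """Return the indices of measures in which some voice does not add up.

    measures: list of dicts mapping a voice name to its list of durations
              (fractions of a whole note)
    time_signature: (beats, unit), e.g. (3, 4)
    """
    beats, unit = time_signature
    expected = Fraction(beats, unit)
    bad = []
    for i, voices in enumerate(measures):
        if any(sum(durs, Fraction(0)) != expected for durs in voices.values()):
            bad.append(i)
    return bad

q = Fraction(1, 4)
measures = [
    {"soprano": [q, q, q], "bass": [Fraction(1, 2), q]},
    {"soprano": [q, q, q], "bass": [Fraction(1, 2)]},   # bass lacks a quarter rest
]
flagged = measure_errors(measures, (3, 4))
```

The indices returned are the measures an editor would mark, for instance with a stipple pattern.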
THE SYNTHESIZER
The synthesizer is both an input and an output device. As an output device it can be used to listen to music stored by Mockingbird. This is especially helpful in proofreading scores. Mockingbird "reads" the score and plays it by simulating key strokes on the synthesizer. As the synthesizer plays the notes, a pointer tracks the performance on the displayed score. Mockingbird’s rendition will handle polyphonic music correctly, taking into account such things as grace notes, n-tuplets, trills, ottava, and metronome markings. Thus it gives the composer the ability to listen to what he has written. Although the performance sounds a little mechanical, it is sufficient to catch erroneous note values and pitches. Moreover the music can be played at double speed for rapid scanning or half speed for careful listening.
As an input device the synthesizer is used to capture music played by the composer. As the user plays, Mockingbird "watches" the keys and records when every note was struck. Music in this form is displayed as a notehead time plot which we call a "piano roll", illustrated in Figure 5a. Mockingbird chooses default staffing and spelling. Figure 5k shows the final score for this same section of music.
In playing, the composer isn’t restricted to a single melodic line, nor must he follow a metronome; he may play whatever he wants as freely as he wishes. Mockingbird captures his idea in rough form which, although a far cry from a standard score, nonetheless contains enough information to reconstruct his original intent. At this point, the composer could go on to capture more music, or he could start transforming the piano roll into a score by editing it as discussed below.
Raw piano roll can be mixed in freely with standard music notation, both on a measure-by-measure basis and within a single measure as seen in Figure 5e. All of the commands that apply to standard music notation can also be applied to the piano roll or to mixed sections. Thus the composer can rearrange material, put notes into different voices, specify the durations of the notes, and add structural elements such as beams and chords. The ability to mix piano roll and standard music notation gives the user a lot of freedom; he can work on the score in whatever order pleases him. Mockingbird will even play correctly across the boundaries of mixed sections of piano roll and standard music notation.
Another feature of Mockingbird is that it is possible to "play against" previously entered material. While Mockingbird plays, the user can play along with it on the synthesizer. The combined product will be heard, and Mockingbird will record the new notes, simultaneously merging them in with what it is playing. This allows the composer to build a piece one voice at a time, or to lay in new material over an existing score. It also allows him to construct music that cannot be played by a person on a standard instrument.
CONVERTING PIANO ROLLS
Figure 5a shows some source material directly from the synthesizer. Such raw piano rolls are hard for humans to read; there is no key signature or time signature, there are no measures, chords, or beams to group things, nor have the notes been separated into voices. Because part of the process of composition includes specifying such syntactic structuring, it was necessary for Mockingbird to go beyond piano roll notation. For reasons mentioned earlier and discussed later on, the process of converting from piano roll to standard music notation is not handled automatically but involves the user. However, Mockingbird does provide a number of heuristics that assist him with the transformation.
Figure 5 shows a typical succession of steps in turning some piano roll input into a final section of score. As we discuss this process it is important to remember three things: first, that the particular sequence of steps shown in the example represents only one order in which the job can be accomplished; second, at any point the composer can stop to enter further material that may occur to him; and third, that playing on the synthesizer is not the only method of entering music into the editor. At any point the composer can use the mouse and menu to add material since Mockingbird allows piano roll and standard music notation to co-exist.
The first step is one of alignment. In order to remove the inevitable imprecision in playing notes that should be simultaneous, the user can apply a heuristic that runs through the piano roll and aligns notes that occur very close to the same time. Figure 5b shows the results of that step.
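One plausible form of such an alignment heuristic is sketched below in Python. The text does not specify how Mockingbird decides that two notes are "very close", so the grouping rule, the `tolerance` value in seconds, and the function name `align_onsets` are all assumptions of this sketch.

```python
def align_onsets(onsets, tolerance=0.05):
    """Snap nearly simultaneous onsets to a common time.

    Onsets are processed in time order. Any onset within `tolerance`
    seconds of the first onset of the current group is moved to that
    first onset; otherwise it starts a new group. The result is
    returned in time order.
    """
    aligned = []
    anchor = None
    for t in sorted(onsets):
        if anchor is None or t - anchor > tolerance:
            anchor = t          # start a new group at this onset
        aligned.append(anchor)
    return aligned

# Three notes of a slightly "rolled" chord, a two-note chord, then a single note.
played = [0.00, 0.02, 0.03, 0.50, 0.52, 1.01]
aligned = align_onsets(played)
```

Anchoring each group at its first onset keeps a long run of closely spaced notes from collapsing into a single chord, since the comparison is always against the start of the group rather than against the previous note.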
Typically, the next step is for the user to provide key and time signatures, as shown in Figure 5c. Assignment of key signatures not only allows for proper display of the signature itself, but also enables Mockingbird to spell notes properly, following the usual rules for propagating the effects of accidentals within measures. Mockingbird’s spelling algorithm for accidentals does not follow the rules correctly; instead we use a simple algorithm that is roughly correct and provide means for the user to modify any spelling he wishes. Assignment of time signatures causes them to appear in the score and enables Mockingbird to check for certain timing violations (e.g., the assignment of four quarters to a voice in a three-quarter measure).
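The conventional rule for propagating and suppressing accidentals within a measure can be sketched as follows. This Python fragment illustrates the textbook convention only; it is not the simplified algorithm used in Mockingbird, whose details are not given here. The note encoding (letter, octave, alteration) and the function name `visible_accidentals` are assumptions of the sketch.

```python
def visible_accidentals(measure, key_signature):
    """For each note in one measure, return the accidental to print, or None.

    measure: list of (letter, octave, alteration) with alteration in
             {-1, 0, +1} for flat, natural, sharp
    key_signature: dict mapping a letter to its alteration,
                   e.g. {"F": 1} for G major
    An accidental is printed only when the note's alteration differs from
    what is already in force for that staff position, either from the key
    signature or from an earlier accidental in the same measure.
    """
    names = {-1: "flat", 0: "natural", 1: "sharp"}
    in_force = {}   # (letter, octave) -> alteration set earlier in this measure
    shown = []
    for letter, octave, alt in measure:
        current = in_force.get((letter, octave), key_signature.get(letter, 0))
        if alt == current:
            shown.append(None)              # suppressed
        else:
            shown.append(names[alt])
            in_force[(letter, octave)] = alt
    return shown

g_major = {"F": 1}
measure = [("F", 4, 1), ("C", 5, 1), ("C", 5, 1), ("C", 5, 0), ("F", 4, 0)]
shown = visible_accidentals(measure, g_major)
```

Calling the function afresh for each measure models the fact that accidentals lapse at the measure line.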
Next the user might enter measure lines. He can do this by picking up a measure line icon from the pop-up menu and depositing copies at suitable places into the piano roll. Alternatively, the user can tell Mockingbird to play the piano roll back on the synthesizer, and as it does so, he can "beat in" measure lines simply by striking the keyboard’s space-bar on the first beat of each measure. Errors can easily be corrected by moving, deleting, or inserting measure lines as appropriate. Figure 5d shows the result.
Next the user would normally assign notes to different voices. An individual note is assigned to a voice by first selecting the note and then indicating which voice it belongs to. More typically, collections of notes are simultaneously voiced by selecting them together and then issuing a single voicing command. Similarly, if desired, notes can be moved to a staff different from the default assignment - as illustrated by the bottom voice in Figure 5d.
At this point, the user can go through the score, manually assigning time values to the notes and designating chords and beams. Figure 5e shows this process partially completed. However, Mockingbird has a number of heuristics to help him with these tasks. After the user assigns notes to their proper voices and gives a time signature, he may ask Mockingbird to guess the time values of the notes, to group them into chords, and to assign beams. Although the heuristics used are only about 80% accurate, they save the user a lot of work. In addition, what remains is easier to deal with since it is in a more familiar form. Figure 5f shows the result of the heuristics applied to Figure 5d. (If we had started with Figure 5e, the heuristics would have taken advantage of the information that the user had already provided.) Mistakes made by the heuristics can be found by inspecting the score or listening to it through the synthesizer. The user then fixes the mistakes and adds more structure. Figure 5g shows the resulting score. The combination of simple heuristics and easy editing is at the heart of the notion of an amanuensis.
By the time these steps are completed, the piano roll has become a score containing all the basic information. The only thing that remains to be done is some tidying up as discussed below.
JUSTIFICATION
A particularly powerful command is the one that "justifies" a sequence of measures within the score. (The user selects some area of the score and the justifier locates the nearest measure lines). Justification involves several things: making the voices consistent relative to one another; laying out the graphical elements of the score in an aesthetic arrangement; and making sure that each line of the score contains an integral number of measures. Justification is concerned only with the horizontal placement of objects; things like the height and tilt of the beams are outside of its domain. Furthermore, it doesn’t perturb structural elements such as stem directions and staffing.
The justifier starts by going through each measure and making sure that all of the voices are consistently ordered relative to one another. Two voices are inconsistent if separately they add up to the time signature but together they add up to more than the time signature. Figure 6a shows an example of such a situation. The justifier moves notes around to correct matters, as shown in Figure 6b.
Next, the justifier re-determines the horizontal placement of the graphical elements of the score. The horizontal spacing is based on the types of elements (measure line, note, clef sign, accidental, etc.), the voicing, and the need to keep things from overlapping. The user can also give a parameter that determines about how "dense" the justification will be. The justifier then squeezes things together as close as possible based on these constraints. Figure 5h shows the results of this step as applied to the section of score from Figure 5g.
Finally, the justifier stretches out the spacing in the material to make an integral number of measures fit on each line. The user can justify various sections of a piece with different densities as appropriate. Figure 5i shows the consequence of using a low density parameter on this section. Figure 5j shows a more reasonable density. The last step is to tilt the beams and add clef switches as shown in Figure 5k.
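The last two steps, squeezing measures to a minimum width and then stretching each line so that it holds a whole number of measures, can be illustrated with a simple greedy sketch in Python. The greedy packing strategy, the uniform stretch factor, and the name `break_into_lines` are assumptions made for illustration; the text does not say how the justifier chooses its line breaks.

```python
def break_into_lines(min_widths, line_width):
    """Pack whole measures onto lines, then stretch each line to fill it.

    min_widths: the squeezed (minimum) width of each measure
    line_width: the available width of one line of the score
    Returns a list of (measure_indices, stretch_factor), one per line.
    A measure wider than the line is placed alone with a factor below 1.
    """
    lines, current, used = [], [], 0.0
    for i, w in enumerate(min_widths):
        if current and used + w > line_width:
            # The next measure does not fit: close this line and stretch it.
            lines.append((current, line_width / used))
            current, used = [], 0.0
        current.append(i)
        used += w
    if current:
        lines.append((current, line_width / used))
    return lines

layout = break_into_lines([30, 40, 50, 60, 20], line_width=100)
```

In this sketch a denser setting would correspond to smaller minimum widths, so that more measures fit on each line before the stretch is applied.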
If at any point the user is dissatisfied with the results of a justification, he can manually move items around in the score to improve the appearance. However, the justifier produces a surprisingly good layout, so that usually the only things left for the user to do are the grouping and tilting of beams, the addition of clef switches, etc. (i.e. adjustments that enhance readability). In fact, given the power of the justifier, one common style of use is to enter music a voice at a time, not worrying at all about the spacing between notes. Each time a line’s worth has been entered, the user justifies that and then goes on to the next line. The justification takes care of aligning the voices properly and producing a suitable layout.
The justifier is also helpful in determining page layout and page breaks. The user can indicate that a particular measure must fall at the end of the line. The justifier takes this into account when deciding how many measures to put on the line. With this feature the user can control how many pages the score will fill. He can also make sure that the end of the score falls at the end of a page.
DATA STRUCTURES
We felt that it was important to include a section on Data Structures because it took us such a long time to arrive at a final design. We will describe this design and explain why we think that it is the most suitable one for our purposes.
Our first design was a structured, hierarchical data structure which closely matched the formal structure we saw in music. But we ran into numerous problems with it because it didn’t match the needs of an editor. After much discussion, we settled on an unstructured, sequential data structure. This surprised us, because in the beginning we had thought that the hierarchical design was the obvious choice. However, experience has convinced us that the sequential design is vastly superior.
There are three considerations in designing a data structure: representational power, programming convenience, and performance. Representational power has to do with how much of the domain is covered by the data structure, and how easy it is to represent different aspects of the domain in the structure. Programming convenience has to do with how easy it is to write algorithms that deal with the data structure. This depends a great deal on what the algorithms do. In Mockingbird we are mostly concerned with playing, editing, and displaying the score (as opposed to structural analysis or automatic composition). Performance also has to do with the algorithms chosen. Even if a data structure is convenient, it may not be efficient. Sometimes there is a trade-off between structural complexity, memory-utilization, and speed. In Mockingbird, memory was plentiful and speed was critical; for an editor to be useful, the display must respond crisply to user actions.
There are many possible "hierarchical" data structures that might be used to represent music. We use the term loosely to describe a class of data structures that implicitly incorporate musical structure in the design. Thus one might imagine a data structure that had a separate part for each measure, or for each voice. A "sequential" data structure, on the other hand, is simply a sequence of undifferentiated entities. No attempt is made to incorporate musical structure into the design. Instead, it is up to the algorithms to determine the structure from the entities.
On the surface the hierarchical design seems obviously better. If the musical structure is built into the data structure, then one can guarantee a uniform interpretation over all of the algorithms. Not only that, but algorithms won’t have to derive the built in information.
Unfortunately, basing the data structure on the musical structure was too constraining. We wanted a uniform representation for both common music notation and piano rolls so that the user could mix both types of music freely. Although it might be possible to keep a separate data structure for piano rolls, it would make the algorithms for editing, displaying, playing, and justifying much more difficult. All of these algorithms needed to know what things were near one another. A simple example is redisplaying the score after a small edit has been made. For efficiency one would like to redisplay as little of the score as possible. But that requires knowing what is near the entity that was edited. In the hierarchical design an entity that is close physically may be logically far away. It might be in another measure, in another voice, in a different chord, or on a different staff. Enumerating all of the possibilities is inconvenient.
In addition to all this, we kept finding exceptions. Many of the rules of notation that are presumably inviolable turn out to be violated when the composer finds the notation too constraining. Figure 7 shows several examples of this. A design that had fixed rules about the structure of music built into it wouldn’t be able to handle such exceptions. Even if the exceptions were ruled out, one would still have problems with the inconsistent structures that arise temporarily during editing. We wanted our design to be tolerant of such exceptions and inconsistencies.
A sequential design doesn’t have these problems. It allows piano rolls to be mixed with standard music notation because both are represented as ordered sequences. Finding things that are nearby is easy, because things that are near one another on the screen are near one another in the data structure. And finally, since the data structure is so unstructured, it is flexible enough to handle a wide range of exceptions.
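The "finding things that are nearby" property can be illustrated with a minimal sketch. It assumes the events are held in a list sorted by position; the function name `events_near` and the use of binary search are illustrative choices, not a description of Mockingbird's actual code.

```python
from bisect import bisect_left, bisect_right

def events_near(times, center, radius):
    """Return the indices of events lying within `radius` of `center`.

    `times` is a sorted list of event positions (hypothetical units).
    Because the sequence is ordered, the neighbours of an edited event
    form one contiguous slice, found with two binary searches.
    """
    lo = bisect_left(times, center - radius)
    hi = bisect_right(times, center + radius)
    return list(range(lo, hi))

times = [0.0, 0.5, 1.0, 1.5, 2.0, 4.0]
near = events_near(times, 1.0, 0.5)  # candidates for redisplay after an edit at 1.0
```

In a hierarchical design the same query would have to visit every measure, voice, chord, and staff that might hold a physically close entity.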
Mockingbird’s "sequential" data structure is simply a sequence of events ordered by time. An event might be a measure line, a collection of notes, a time signature, a clef switch, or a switch in the number of staffs per line. The events that contain notes are called "syncs" because they synchronize all of the notes in the event. (That is, all of the notes in the sync are played or displayed together.) The notes may belong to different voices or chords, but they all have the same "time". The editor automatically syncs together notes that are very close to simultaneous whenever notes are entered or moved. Occasionally this will introduce an error, which can be fixed by the user.
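A rough sketch of such a sequence follows. The class names, the fields, and the 30-millisecond tolerance are assumptions made for illustration; the paper does not give the threshold or the record layout that Mockingbird uses.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    pitch: int        # key number (assumed encoding)
    time: float       # onset in seconds
    voice: int = 0

@dataclass
class Sync:
    time: float
    notes: list = field(default_factory=list)

@dataclass
class MeasureLine:
    time: float

SYNC_TOLERANCE = 0.03  # hypothetical: how close counts as "simultaneous"

def insert_note(score, note, tolerance=SYNC_TOLERANCE):
    """Add a note to the time-ordered score.

    The note joins an existing sync if one lies within `tolerance`;
    otherwise a new sync is created and the sequence is kept in order.
    """
    for event in score:
        if isinstance(event, Sync) and abs(event.time - note.time) <= tolerance:
            event.notes.append(note)
            return event
    sync = Sync(time=note.time, notes=[note])
    score.append(sync)
    score.sort(key=lambda e: e.time)  # stable sort keeps equal-time events in place
    return sync

score = [MeasureLine(0.0)]
insert_note(score, Note(60, 0.50))
insert_note(score, Note(64, 0.51, voice=1))  # close enough to join the first sync
insert_note(score, Note(67, 1.00))
```

Note that the two notes in the first sync belong to different voices; the sync records only that they share a time.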
Syncs are important because they keep simultaneous notes together while the score is being edited. Usually, if the composer plays several notes at the "same time", he wants them to stay together unless he explicitly says otherwise. Inserting a note before a sync shouldn’t break up the sync, even if one of the notes in the sync belongs to the same voice as the inserted note. If any of the notes in a sync move, they should all move. (The justifier sometimes violates this rule, but only when it is obvious that the notes have been incorrectly synced.)
There are three ways of measuring the "time" of an event: as seconds from the start of play, as beats from the start of the score, and as inches from the first measure line. Although these notions of time are very different, they co-exist nicely because the order of one is usually the order of the other. Thus if note A is displayed to the left of note B, it is most likely played before note B. In general we can therefore use the order of the notes as they appear on the display to determine the order in which they should be played. There are a few exceptions which must be handled properly; embellishments such as trills and grace notes are not always played in the order that they are displayed. Conversely, notes that are logically simultaneous may be separated slightly on the display in the interest of visual clarity.
Beams and chords are ancillary to the main data structure, since they are just horizontal and vertical parentheses which group notes together as visual aids for the human performer and aren’t otherwise crucial to the score. If all the beams and chords were removed from a score, it would affect its readability but not its playing. In Mockingbird, each beam and chord knows what notes belong to it, and each note knows what beam or chord it belongs to. In addition, chords have a stem direction and beams have a tilt and vertical position.
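The two-way links between notes and their groupings might be sketched as below. The field names (`stem_up`, `tilt`, `height`) and the helper functions are hypothetical stand-ins for the attributes described above.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class Note:
    pitch: int
    chord: Optional["Chord"] = None   # back-pointer to the owning chord
    beam: Optional["Beam"] = None     # back-pointer to the owning beam

@dataclass(eq=False)
class Chord:
    stem_up: bool = True              # stem direction
    notes: list = field(default_factory=list)

@dataclass(eq=False)
class Beam:
    tilt: float = 0.0
    height: float = 0.0               # vertical position
    notes: list = field(default_factory=list)

def add_to_chord(chord, note):
    """Link a note and a chord in both directions."""
    chord.notes.append(note)
    note.chord = chord

def strip_groupings(notes):
    """Remove all chord and beam links; pitches are untouched."""
    for n in notes:
        n.chord = None
        n.beam = None

c = Chord()
n1, n2 = Note(60), Note(64)
add_to_chord(c, n1)
add_to_chord(c, n2)
```

Calling `strip_groupings` changes only how the notes would be drawn, which mirrors the observation that removing beams and chords affects readability but not playing.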
THE SHEET
So far we have been assuming that a score is a long sequence of measures that appears on one line, but since music is printed on rectangular sheets of paper, this long line must be broken up into shorter lines. Rather than make our data structure more complex, we decided to use a separate data structure to map between the linear data structure and the two-dimensional piece of paper. This data structure keeps track of how long each line is, how many lines fit on each page, how much of the line must be devoted to a key signature, and which section of the score goes on which line. Only the displayer and the justifier need to make use of this representation of the sheet; all of the other algorithms manipulate the sequential data structure directly.
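One way to picture the sheet is as a table of line records laid over the linear score. The sketch below assumes that lines break only between measures and uses a simple greedy fill; the names `Line`, `break_into_lines`, and `page_of` are invented for illustration, and Mockingbird's justifier may work quite differently.

```python
from dataclasses import dataclass

@dataclass
class Line:
    first_measure: int           # index of the first measure on this line
    end_measure: int             # one past the last measure on this line
    key_signature_width: float   # inches reserved at the start of the line

def break_into_lines(measure_widths, line_width, key_signature_width):
    """Greedily assign measures (widths in inches) to lines of fixed width."""
    lines = []
    start = 0
    used = key_signature_width
    for i, width in enumerate(measure_widths):
        # Start a new line when this measure would overflow the current one.
        if i > start and used + width > line_width:
            lines.append(Line(start, i, key_signature_width))
            start = i
            used = key_signature_width
        used += width
    if start < len(measure_widths):
        lines.append(Line(start, len(measure_widths), key_signature_width))
    return lines

def page_of(line_index, lines_per_page):
    """Map a line number to the page it falls on."""
    return line_index // lines_per_page

lines = break_into_lines([3.0, 3.0, 2.0, 4.0, 1.0], line_width=8.0,
                         key_signature_width=1.0)
```

Keeping this mapping separate means that the playing and editing code never needs to know where the line breaks fall.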
AUTOMATIC RECOGNITION OF PIANO SCORES
Why do we feel that the automatic recognition of piano scores is so difficult? It has been done for some simple pieces; can it be done for the class of music that we are interested in? In particular, can it be done for polyphonic music where the voicing is not known in advance?
Producing a proper score involves determining the time value of every note. Although one traditionally thinks of "holding a note", say, for a quarter, one doesn’t mean literally holding the key down. What is meant is that the next note in that voice is to commence one quarter after the commencement of the note in question. Time values thus measure intervals rather than durations. (Although the two are often almost the same, this is not the case in staccato playing.) So in order to assign time values, it is first necessary to separate the music into its component voices. This requires a thematic and harmonic understanding of the structure of the piece. Sometimes the "same" note may belong in two different voices, possibly even with two different time values. Rests and ties further complicate the picture, as they are elements that appear in the score but are notable by their absence in the actual playing. Understanding the structure of the piece is a difficult and unbounded problem, not in line with our main interests.
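The distinction between intervals and durations can be made concrete with a small sketch. It assumes the voice of each note is already known, which is precisely the hard part described above; the function name and the (voice, onset) tuple layout are illustrative only.

```python
def inter_onset_intervals(notes):
    """Compute, per voice, the interval from each note to the next.

    `notes` is a list of (voice, onset_in_beats) pairs. How long each
    key was held down is deliberately ignored: the time value depends
    only on when the next note of the same voice begins.
    """
    onsets_by_voice = {}
    for voice, onset in sorted(notes, key=lambda n: n[1]):
        onsets_by_voice.setdefault(voice, []).append(onset)
    return {
        voice: [later - earlier for earlier, later in zip(onsets, onsets[1:])]
        for voice, onsets in onsets_by_voice.items()
    }

notes = [("soprano", 0.0), ("bass", 0.0), ("soprano", 1.0),
         ("soprano", 1.5), ("bass", 2.0)]
intervals = inter_onset_intervals(notes)
```

The last note of each voice gets no interval here, and rests and ties are not handled at all, which hints at why the full problem is so much harder than this sketch.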
A related problem involves determining which notes are to be chorded together. Often, though not always (see Figure 3), this is just a matter of separating the music between the two hands. But even for that simple case, there is no way, in general, to know which hand played a given note. Similarly, the assignment of notes to staffs is a complex function of voicing, fingering, division between hands, and taste. Further complications include identifying complex n-tuplets, distinguishing grace notes, determining rhythm and detecting rhythmic changes, identifying measure lines, determining how many staffs to use, deciding how to combine notes into beams and when to switch clefs or use ottava; in short, determining all of the complex structural notational devices that composers use to enable performers to read scores.
As we studied more and more scores, we found that complexity often gave way to ambiguity - that decisions about notation were often a matter of personal taste. The examples shown in Figure 7 illustrate some of the problems we encountered and indicate why we decided not to attempt automatic recognition.
CONCLUSION
Our intention in presenting this material is to encourage others to pursue similar endeavors. We hope that some of the things we have learned or demonstrated will be helpful in this. Music editing is already being done on home computers and while it will be some years before machines as powerful as a Dorado are in every living room, soon useful tools will become feasible - even on a home machine of modest cost. It would seem that a display of reasonable resolution and a mouse (or some similarly convenient pointing device) are prerequisites. But if one eschews the temptation to make "pretty" scores and sticks to providing a simple "cut and paste" editor of piano roll material, then a reasonable composing tool should soon become possible. The problem of storage and retrieval of snatches of material and full pieces would have to be addressed, but seems quite tractable.
ACKNOWLEDGEMENTS
Mockingbird was made possible by a fortuitous convergence of people, interests, and facilities. Xerox PARC in general, and the Computer Science Laboratory (CSL) in particular, provided a hospitable environment for this work. The existence of and access to a Dorado was absolutely essential. Robert W. Taylor, director of CSL, provided us with this and all other necessary facilities and support. Will Crowther worked with us closely on the initial design and helped us get started in the right direction. John Warnock and Doug Wyatt provided us with the Cedar Graphics software package which Mockingbird uses. Gene McDaniel wrote special Dorado microcode for handling the synthesizer and Mike Overton built the hardware interface. Last, and most gratifying of all, has been the enthusiastic support that we received from our colleagues, whose vicarious pleasure in seeing Mockingbird come to life cheered us along the way.
REFERENCES
1. "Early Experiences with Mesa", Geschke, C.M., Morris, J.H., and Satterthwaite, E.H. Xerox PARC Report CSL 76-6, Oct. 1976.
2. "Mesa Language Manual, Version 5.0", Xerox PARC Report CSL 79-3, Apr. 1979.
3. "The Dorado: A High Performance Personal Computer" (Three Papers), Xerox PARC Report CSL 81-1, Jan. 1981.
4. "The Ethernet Local Network" (Three Papers), Xerox PARC Report CSL 80-2, Feb. 1980.
5. "A Device Independent Graphics Imaging Model for Use with Raster Devices", Warnock, J. and Wyatt, D. Computer Graphics, Vol. 16, No. 3, July 1982, pp. 313-320. (SIGGRAPH ’82)