NoteCards 1.3 An Introduction to the Internal Organization of NoteFiles Randy Trigg Frank Halasz Intelligent Systems Laboratory Xerox Palo Alto Reserach Center First written: 9/3/85 Randy Trigg Last updated: 9/4/85 Randy Trigg Updated for 1.3: 7/16/86 Frank Halasz This document provides a brief explanation of the structure of a NoteFile. It also describes how checkpointing, aborting, and recovering a NoteFile after a crash work. Finally, it describes the use of Inspect and Repair to repair a NoteFile with inconsistent links. (For further information see the document entitled "Inspecting and Repairing Notefiles.") 1. Background Concepts 1.1 Unique identifiers: UIDs All objects in NoteCards (i.e., cards, links, NoteFiles) are assigned a unique identifier called a UID. Each UID is a 112-bit number that is guaranteed to be unique across all time and space. UIDs are used in many places in NoteCards as keys for indexing and retrieving cards, links, and NoteFiles. 1.2 Card parts For storage purposes a note card is decomposed into 4 independent parts: substance (i.e., its content), title, property list, and links. Each of these parts is stored separately in the data area of the NoteFile [See Section 2 below]. When a card is saved, only those card parts that have changed are rewritten in the NoteFile. The substance of the card is stored on the NoteFile in a manner appropriate to its type. Thus a Text card's substance is a text stream and is written on the NoteFile exactly the way TEdit writes out text streams (i.e., text followed by "looks" information). In contrast, all titles are stored as strings and all property lists as standard Lisp lists. Storage of Links is described in Section 4 below. 2. NoteFile Structure: index and data areas A NoteFile consists of three parts, a header, an index, and a data area. The header and the index are fixed in size for each NoteFile. The data area follows the index area and grows as cards are added to or modified in the NoteFile. The NoteFile header contains the following information about the NoteFile: its UID, a number identifying its format, the checkpoint pointer [See Section 3], the size of the index, and a pointer to the next available index entry. (For the exact format of the NoteFile header examine the function NC.GetNoteFileHeader.) If the NoteFile header is destroyed, it cannot be automatically reconstructed. Careful hand manipulation of the NoteFile by a NoteCards wizard is required to recover a NoteFile with a bad header. The structure of the index and data area is shown in Figure 1 . The index contains a fixed number of index entries. Each index entry that is in use, contains information for one of the cards in the NoteFile. Specifically, an index entry contains 5 fields: a status character, the card's UID, and 4 pointers. The status character specifies whether the index entry is free (not in use) or contains information for an active or a deleted card. If the index entry is not free, the UID field contains the UID of the card refered to by this index entry and the four pointer fields contain the location in the data area of the 4 parts of its card: substance, title, links and property list. The number of index entries is fixed at NoteFile creation time. The deafult is 2000 entries. The number of index entries is automatically doubled by the NoteFile compactor if 75% of the entries are used. The compactor also frees (i.e., makes unused) all of the index entries that refer to deleted cards. In normal operation, NoteCards prints a warning whenever more than 90% of the index entries in a NoteFile are used. At this point, the NoteFile should be compacted to increase the index size. The data area contains the actual information about the card. Whenever you change, say, the title of a card, the new title is written at the end of the data area. The title pointer in the card's index entry is then updated to point to this new location in the data area. Thus, in general, a NoteFile's data area grows every time any part of any card is changed. The old information, now somewhere in the middle of the data area, is not removed. However, it is no longer directly accessible because there is no index entry that points to it. Thus, for most purposes this old information can be considered "dead space" in the NoteFile. The NoteFile compactor [See NoteCards 1.2K Reference Manual] rewrites the NoteFile, eliminating all such dead space. ((SKETCH a% figure% from% a% document VERSION 3 PRIRANGE (67 . 0) SKETCHCONTEXT ((ROUND 1 BLACK) ( GACHA 10 (MEDIUM REGULAR REGULAR)) (CENTER BASELINE) (LINE 18.0 15) NIL LAST (CENTER CENTER) ( NIL NIL NIL) T NIL NIL 1.0 NIL)) ((.05 12.0 (PRI 62)) (TEXT (120.0 . 240.0) ( "Index Entry for Card A" "(Index Entry #i)") 1.0 (CENTER BASELINE) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((43.0 243.0 154.0 12.0) (64.0 231.0 112.0 12.0)) BLACK)) ((0.0 52.0 (PRI 53)) ( WIRE ((128.0 . 208.0) (128.0 . 160.0) (232.0 . 160.0)) (ROUND 1 BLACK) (NIL (LINE 18.0 13.0 )) NIL 1.0 NIL (NIL ((232.0 . 160.0) (219.6363 . 155.9828) (219.6363 . 164.0172))))) ((0.0 4.0 (PRI 55)) (WIRE ((128.0 . 208.0)) (ROUND 1 BLACK) (NIL (LINE 18.0 15.0) T) NIL 1.0 NIL (NIL NIL))) ((0.0 56.0 (PRI 56)) (WIRE ((152.0 . 208.0) (152.0 . 96.0) (232.0 . 96.0)) ( ROUND 1 BLACK) (NIL (LINE 18.0 13.0)) NIL 1.0 NIL (NIL ((232.0 . 96.0) (219.6363 . 91.98278) (219.6363 . 100.0172))))) ((0.0 40.0 (PRI 57)) (WIRE ((184.0 . 208.0) (184.0 . 128.0) ( 232.0 . 128.0)) (ROUND 1 BLACK) (NIL (LINE 18.0 13.0)) NIL 1.0 NIL (NIL ((232.0 . 128.0) ( 219.6363 . 123.9828) (219.6363 . 132.0172))))) ((0.0 88.0 (PRI 61)) (WIRE ((208.0 . 208.0) ( 208.0 . 32.0) (232.0 . 32.0)) (ROUND 1 BLACK) (NIL (LINE 18.0 13.0)) NIL 1.0 NIL (NIL (( 232.0 . 32.0) (219.6363 . 27.98278) (219.6363 . 36.01722))))) ((.024 96.0 (PRI 48)) (TEXTBOX (24.0 200.0 192.0 24.0) (" Active: UID: | | | | ") 1.0 (LEFT CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((26.0 206.0 203 12)) BLACK (ROUND 1 BLACK) NIL (NIL NIL NIL))) ((0.0 112.0 (PRI 49)) (WIRE ((240.0 . 176.0) (464.0 . 176.0)) (ROUND 1 BLACK) NIL NIL 1.0 NIL NIL)) ((.024 112.0 (PRI 48)) (TEXTBOX (240.0 176.0 224.0 24.0) ("Index Entry #N") 1.0 (CENTER CENTER) ( GACHA 10 (MEDIUM REGULAR REGULAR)) ((303.0 182.0 98 12)) BLACK (ROUND 1 BLACK) NIL (NIL NIL NIL)) ) ((0.0 112.0 (PRI 49)) (WIRE ((240.0 . 200.0) (464.0 . 200.0)) (ROUND 1 BLACK) NIL NIL 1.0 NIL NIL)) ((.024 112.0 (PRI 48)) (TEXTBOX (240.0 200.0 224.0 24.0) ("...") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((341.5 206.0 21 12)) BLACK (ROUND 1 BLACK) NIL (NIL NIL NIL))) ((.024 112.0 (PRI 48)) (TEXTBOX (240.0 224.0 224.0 24.0) (" Index Entry #3") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((299.5 230.0 105 12)) BLACK (ROUND 1 BLACK ) NIL (NIL NIL NIL))) ((0.0 112.0 (PRI 49)) (WIRE ((240.0 . 248.0) (464.0 . 248.0)) (ROUND 1 BLACK) NIL NIL 1.0 NIL NIL)) ((.024 112.0 (PRI 48)) (TEXTBOX (240.0 248.0 224.0 24.0) ( "Index Entry #2") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((303.0 254.0 98 12)) BLACK (ROUND 1 BLACK) NIL (NIL NIL NIL))) ((0.0 112.0 (PRI 49)) (WIRE ((240.0 . 272.0) (464.0 . 272.0)) (ROUND 1 BLACK) NIL NIL 1.0 NIL NIL)) ((.024 112.0 (PRI 48)) (TEXTBOX (240.0 272.0 224.0 24.0) ("Index Entry #1") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) (( 303.0 278.0 98 12)) BLACK (ROUND 1 BLACK) NIL (NIL NIL NIL))) ((.024 112.0 (PRI 8)) (TEXTBOX ( 240.0 296.0 224.0 24.0) ("NoteFile Header") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((299.5 302.0 105 12)) BLACK (ROUND 1 BLACK) NIL (NIL NIL NIL))) ((0.0 112.0 (PRI 49) ) (WIRE ((240.0 . 296.0) (464.0 . 296.0)) (ROUND 1 BLACK) NIL NIL 1.0 NIL NIL)) ((0.0 4.0 ( PRI 21)) (WIRE ((472.0 . 296.0) (480.0 . 296.0)) (ROUND 2.0 BLACK) NIL NIL 1.0 NIL NIL)) (( 0.0 60.0 (PRI 22)) (WIRE ((480.0 . 296.0) (480.0 . 176.0)) (ROUND 2.0 BLACK) NIL NIL 1.0 NIL NIL)) ((0.0 4.0 (PRI 24)) (WIRE ((480.0 . 240.0) (488.0 . 240.0)) (ROUND 2.0 BLACK) NIL NIL 1.0 NIL NIL)) ((.05 12.0 (PRI 25)) (TEXT (496.0 . 232.0) ("Index") 1.0 (LEFT BASELINE) ( GACHA 10 (MEDIUM REGULAR REGULAR)) ((496.0 229.0 35.0 12.0)) BLACK)) ((0.0 16.0 (PRI 32)) ( WIRE ((376.0 . 144.0) (376.0 . 176.0)) (ROUND 1 BLACK) NIL NIL 1.0 NIL NIL)) ((.032 68.0 (PRI 37)) (TEXTBOX (240.0 144.0 136.0 32.0) ("Card A Substance") 1.0 (CENTER CENTER) (GACHA 10 ( MEDIUM REGULAR REGULAR)) ((252.0 154.0 112 12)) BLACK (ROUND 1 BLACK) NIL (NIL NIL NIL))) ((0.0 4.0 (PRI 28)) (WIRE ((480.0 . 96.0) (488.0 . 96.0)) (ROUND 2.0 BLACK) NIL NIL 1.0 NIL NIL)) ((.05 12.0 (PRI 30)) (TEXT (496.0 . 96.0) ("Data " "Area") 1.0 (LEFT BASELINE) (GACHA 10 ( MEDIUM REGULAR REGULAR)) ((496.0 99.0 35.0 12.0) (496.0 87.0 28.0 12.0)) BLACK)) ((.032 56.0 (PRI 41)) (TEXTBOX (240.0 80.0 112.0 32.0) ("Card A Title") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((254.0 90.0 84 12)) BLACK (ROUND 1 BLACK) NIL (NIL NIL NIL))) ((0.0 84.0 (PRI 34)) (WIRE ((240.0 . 112.0) (408.0 . 112.0)) (ROUND 1 BLACK) NIL NIL 1.0 NIL NIL)) ((0.0 28.0 (PRI 35)) (WIRE ((408.0 . 112.0) (464.0 . 112.0)) (ROUND 1 BLACK) NIL NIL 1.0 NIL NIL)) ((.032 56.0 (PRI 42)) (TEXTBOX (352.0 80.0 112.0 32.0) ("Card B Links") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((366.0 90.0 84 12)) BLACK (ROUND 1 BLACK) NIL (NIL NIL NIL))) ((0.0 16.0 (PRI 36)) (WIRE ((312.0 . 144.0) (312.0 . 112.0)) (ROUND 1 BLACK) NIL NIL 1.0 NIL NIL)) ((.032 36.0 (PRI 39)) (TEXTBOX (240.0 112.0 72.0 32.0) ("Card A " "Links") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((251.5 128.0 49 12) (258.5 116.0 35 12)) BLACK (ROUND 1 BLACK) NIL (NIL NIL NIL))) ((0.0 68.0 (PRI 31)) (WIRE ((240.0 . 144.0) (376.0 . 144.0)) (ROUND 1 BLACK) NIL NIL 1.0 NIL NIL)) ((0.0 44.0 (PRI 33)) (WIRE (( 376.0 . 144.0) (464.0 . 144.0)) (ROUND 1 BLACK) NIL NIL 1.0 NIL NIL)) ((.032 76.0 (PRI 40)) ( TEXTBOX (312.0 112.0 152.0 32.0) ("Card B Substance") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((332.0 122.0 112 12)) BLACK (ROUND 1 BLACK) NIL (32800 NIL NIL))) ((.032 44.0 (PRI 38)) (TEXTBOX (376.0 144.0 88.0 32.0) ("Card A Title") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((378.0 154.0 84 12)) BLACK (ROUND 1 BLACK) NIL (32800 NIL NIL))) ((0.0 4.0 (PRI 23)) (WIRE ((472.0 . 176.0) (480.0 . 176.0)) (ROUND 2.0 BLACK) NIL NIL 1.0 NIL NIL )) ((0.0 80.0 (PRI 26)) (WIRE ((480.0 . 176.0) (480.0 . 16.0)) (ROUND 2.0 BLACK) NIL NIL 1.0 NIL NIL)) ((.224 152.0 (PRI 1)) (BOX (240.0 16.0 224.0 304.0) (ROUND 1 BLACK) NIL 1.0 ( NIL NIL NIL))) ((.032 60.0 (PRI 60)) (TEXTBOX (344.0 16.0 120.0 32.0) ("Card D Title") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((362.0 26.0 84 12)) BLACK (ROUND 1 BLACK) NIL (NIL NIL NIL))) ((.032 112.0 (PRI 58)) (TEXTBOX (240.0 48.0 224.0 32.0) ( "Card B Substance") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((296.0 58.0 112 12) ) BLACK (ROUND 1 BLACK) NIL (NIL NIL NIL))) ((.032 52.0 (PRI 59)) (TEXTBOX (240.0 16.0 104.0 32.0) ("Card A Prop List") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((236.0 26.0 112 12)) BLACK (ROUND 1 BLACK) NIL (NIL NIL NIL))) ((0.0 4.0 (PRI 27)) (WIRE ((472.0 . 16.0) (480.0 . 16.0)) (ROUND 2.0 BLACK) NIL NIL 1.0 NIL NIL)) ((.05 12.0 (PRI 63)) (TEXT ( 248.0 . -24.0) ("Figure 1: Structure of a NoteFile") 1.0 (CENTER BASELINE) (GACHA 10 (BOLD REGULAR REGULAR)) ((129.0 -27.0 238.0 12.0)) BLACK)) ((.05 12.0 (PRI 65)) (TEXT (72.0 . 40.0 ) ("Legend") 1.0 (CENTER BASELINE) (GACHA 10 (BOLD REGULAR REGULAR)) ((51.0 37.0 42.0 12.0)) BLACK)) ((.024 44.0 (PRI 67)) (TEXTBOX (32.0 8.0 88.0 24.0) ("Dead Space") 1.0 (CENTER CENTER) (GACHA 10 (MEDIUM REGULAR REGULAR)) ((41.0 14.0 70 12)) BLACK (ROUND 1 BLACK) NIL (32800 NIL NIL)))) (22.0 -30.0 513.0 350.0) 1.0 8.0 As long as the NoteFile has not been compacted, all the old information can be accessed (and made to be "current") via the NoteFile Inspect and Repair facility. Inspect and Repair does this by ignoring the index and parsing the entire data area to produce a listing of all the information (both current and old) about a card that is stored on the data area. See the Inspect and Repair manual for more information. 3. Notefile checkpointing. As long as a NoteFile is open, its index area is cached in memory. When a card part is saved, the card part's information is written to the end of the data area but the card's actual index entry on the NoteFile is not updated with this new location. Instead, the appropriate pointer in the card's in-memory index entry is updated. Thus, the index in the NoteFile continues to point to the old information in the data area, while the in-memory index points to the new information. Thus while a NoteFile is open, its current state is distributed between the actual NoteFile and information cached in memory. The current index is cached in memory. For cards open on the screen (or cached in memory from the programmer's interface), the "current" card part information is contained in an in-memory cache. For all other cards, the current information is contained in the data area of the NoteFile pointed to by the in-memory index entry. If a machine crashes while a NoteFile is open, the information cached in memory is lost. A crash not only discards changes made to cards on the screen, but its also leaves the information stored on the NoteFile in an inconsistent state. For example, the index on the NoteFile may point to old information in the data area. This occurs because the new information (e.g., a new title) is written to the data area but only the in-memory index pointers (which are lost in the crash) have been updated to point to the new information. Checkpointing forces all of the in-memory information to be written onto the NoteFile. Specifically, checkpointing causes all open cards to be saved and the in-memory index to be written to the NoteFile's index area. Thus, immediately after checkpointing, the NoteFile itself contains its current (and consistent) state. If the machine were to crash at this point, no information would be lost and the NoteFile would be consistent. Checkpointing also writes a checkpoint pointer onto the NoteFile header. The checkpoint pointer contains the location of the end of the data area (i.e., the end of the NoteFile) at the time the checkpoint is done. As the NoteFile is used after the checkpoint, information is written in the data area past the checkpoint pointer but only the in-memory index entries are updated to point to this information. The on-file index entries still point to the information in the data area before the location referenced by the checkpoint pointer. Thus, a consistent NoteFile can be constructed from the index area and the all of the information in the data area located before the checkpoint location. This is essentially the NoteFile as it was at the time of the last checkpoint. (Note: one small exception is that changes to a card's size on the screen are actually written in the middle of the data area rather than at the end. Thus, truncating a NoteFile to its checkpoint location cannot "undo" the reshaping of a card.) When opening a NoteFile after a crash, the system will insure that the NoteFile is in a consistent state. It does so by truncating the data area to the last checkpoint location, saving the truncated information if requested by the user. This leaves the NoteFile in the state it was during the last checkpoint before the crash. Aborting a NoteFile does the same thing. It truncates the data area to the last checkpoint location, thereby eliminating all changes made to the NoteFile since the last checkpoint. It also discards the in-memory index. Thus, the NoteFile is left in the exact state it was after the last checkpoint. Finally, note that NoteFile Close forces a checkpoint. Therefore, aborts and recovery after crashes actually restore the NoteFIle to its state as of the last user-initiated checkpoint or close. 4. Storing links and reparing NoteFiles with inconsistent links The Links card part is divided into three subcomponents: ToLinks, FromLinks, and GlobalLinks. The ToLinks is a list of all links whose Source is the given card (i.e., that point from the card to some other card). The FromLinks is a list of links whose Destination is the given card (i.e., that point from some other card to the given card). Finally, the GlobalLinks is a subset of the ToLinks that includes only the Global links originating in the given card. Given this scheme, every link is stored on the NoteFile in three different places. First, if the link is Local it is stored inside the link icon which in turn is inside the Substance part of the link's source card. If the link is Global, it is stored in the GlobalLinks subpart of its source card. Second, the link is stored in the ToLinks list of its source card. Third, the link is stored in the FromLinks list of its destination card. These three records of the same link occasionally get out of synch, resulting in an inconsistent NoteFile. There are a number of symptoms of such inconsistency. For example, the ShowLinks display for card may indicate that the card is a destination for a link from some source card X while the ShowLinks display for X does not include a ToLink to that destination card. Occasionally, inconsistent links will also result in link icons that conatin the words "DELETED" or "FREE" when displayed on the screen. This usually means that the card at one end of a link was deleted, but somehow the links of the card at the other end were not updated. Such link icons cause NoteCards to break when you try to follow them. One function of the NoteCards Inspect and Repair facility is to resynchronize the three records for all links in the NoteFile. The inspector's third phase rebuilds the links as follows. First it removes all To and FromLinks for every card. Then it reads the substances for each card and recreates ToLinks and FromLinks by looking at the links found inside the link icons in the card's substance and in its GlobalLinks list. In addition, the links rebuilder phase of the notefile inspector can rebuild filebox contents from cards pointed to by the filebox, and the set of all link types from the list of all links. (LIST ((PAGE NIL (PAPERSIZE NIL) (0 0 612 792) ((TEXT NIL NIL (72 72 468 648) NIL))) (PAGE NIL (PAPERSIZE Letter FOLIOINFO (ARABIC )) (0 0 612 792) ((FOLIO NIL (PARALOOKS (QUAD LEFT) CHARLOOKS (SUPERSCRIPT 0 INVISIBLE OFF SELECTPOINT OFF PROTECTED OFF SIZE 10 FAMILY HELVETICA OVERLINE OFF STRIKEOUT OFF UNDERLINE OFF EXPANSION REGULAR SLOPE REGULAR WEIGHT MEDIUM INVERTED OFF USERINFO NIL STYLE NIL) FORMATINFO (ARABIC )) (540 756 288 36) NIL) (TEXT NIL NIL (72 72 468 648) NIL))) (PAGE NIL (PAPERSIZE Letter FOLIOINFO (ARABIC )) (0 0 612 792) ((FOLIO NIL (PARALOOKS (QUAD LEFT) CHARLOOKS (SUPERSCRIPT 0 INVISIBLE OFF SELECTPOINT OFF PROTECTED OFF SIZE 10 FAMILY HELVETICA OVERLINE OFF STRIKEOUT OFF UNDERLINE OFF EXPANSION REGULAR SLOPE REGULAR WEIGHT MEDIUM INVERTED OFF USERINFO NIL STYLE NIL) FORMATINFO (ARABIC )) (540 756 288 36) NIL) (TEXT NIL NIL (72 72 468 648) NIL))))) (((((( ( ((( HELVETICA  HELVETICA  HELVETICA  HELVETICA  HELVETICA  TIMESROMAN :    "!' g .I”-í@p÷mŠ$ SKIO.GETFN.2 HELVETICA  ãȵ© J.¹A9  sºÐkJlzº