-- File: LogImplDoc.tioga

Stateless versus stream style log reading

We had hoped to have only one set of log reading routines, of a stateless style: given a log record ID, return the log record. This proved to be impractical for log reading during recovery. The reason is that the state of previous pages (in particular the version bit) changes the interpretation of the current page. Pushing this state up to a higher level, as the stateless style of reading requires, complicates things for everyone. So we provide stream access for recovery and stateless access for carrying out intentions.

Partial log records

Log records may span multiple log pages. After a crash, the log may end on any page boundary. This means either (1) we must be prepared to carefully rewrite the last log page during recovery, or (2) we must be prepared to see partial log records when reading the log. (Consider, for instance, the backup system, which reads the log long after it has been discarded for recovery purposes.) Fortunately, any partial log record can be limited to one page, since it is perfectly safe to overwrite the following pages during recovery. So one solution to the problem is to always perform one-page lookahead in the log when doing a sequential read.

This issue is closely related to the issue of part-full log pages. In the current design, each log force creates a part-full log page. This seems acceptable since the average loss is less than 128 words, while the typical transaction writes much more than this (and read-only transactions never force the log). Also, if transaction rates are high, we can make the forcing transaction wait for a short time, or until the page fills up, before forcing it; generally the page will fill up first.

Finally, this is also related to the use of low-latency stable storage. With such a device we never need to write partially full pages to the log, since we can write them to low-latency stable storage instead and write the log page only when it fills.
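
The one-page lookahead mentioned under "Partial log records" can be illustrated with a small sketch. It is not the actual log reader: the page layout, the `Fragment` type, its `record_id` and `last` fields, and the function `read_complete_records` are all hypothetical names chosen for illustration. The sketch assumes each page is a sequence of record fragments and that a record continuing onto the next page reappears there as that page's first fragment. When a record is still unfinished at the end of a page, the reader peeks at the next page and discards the record if it is not continued there.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Fragment:
    """Hypothetical piece of a log record stored on one page."""
    record_id: int
    data: str
    last: bool  # True if this fragment completes its record


Page = List[Fragment]


def read_complete_records(pages: List[Page]) -> List[Tuple[int, str]]:
    """Sequentially read the log, returning only records that are wholly present."""
    out: List[Tuple[int, str]] = []
    pending_id = None        # id of the record currently being assembled
    pending: List[str] = []  # fragments collected so far for that record
    for i, page in enumerate(pages):
        for frag in page:
            if pending_id is not None and frag.record_id != pending_id:
                # A different record starts before the pending one finished:
                # drop the unfinished record.
                pending = []
            pending_id = frag.record_id
            pending.append(frag.data)
            if frag.last:
                out.append((pending_id, "".join(pending)))
                pending_id, pending = None, []
        if pending_id is not None:
            # One-page lookahead: keep the unfinished record only if the
            # next page exists and begins with its continuation.
            nxt = pages[i + 1] if i + 1 < len(pages) else []
            if not nxt or nxt[0].record_id != pending_id:
                pending_id, pending = None, []
    return out
```

In this model a reader such as the backup system never has to report a truncated record: a record cut off by a crash, and later followed by pages that recovery overwrote, is dropped silently.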

Garbage past the end of log

Since we don't control the order in which pages are written, it is possible for them to be written in other than ascending order. This means that a recovery program that did not interpret the definition of "end of log" sufficiently strictly might see nonexistent records past the end of log. For instance, it could happen that a checkpoint record was written past the end of log at the instant of a crash. This record would be found by a naive scan of the log, but it is not actually part of the log, and it might be incorrect to base a recovery on the contents of this checkpoint. This explains the pains taken in interpreting the log in the case where the restart file is lost.
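
The difference between a naive scan and a strict interpretation of "end of log" can be shown with a toy model. This is only a sketch under stated assumptions: it supposes that the reader can tell whether a page was validly written in the current pass over the log (represented here simply as `None` for a page that was not), and the names `naive_scan` and `strict_scan` are hypothetical. How validity is actually determined is not modeled.

```python
from typing import List, Optional

# None models a page that was not validly written in the current pass
# (an assumption of this sketch; the real validity test is not shown).
Page = Optional[List[str]]


def naive_scan(pages: List[Page]) -> List[str]:
    """Collect records from every valid-looking page, ignoring gaps."""
    return [rec for page in pages if page is not None for rec in page]


def strict_scan(pages: List[Page]) -> List[str]:
    """Stop at the first page that was not validly written: the log ends there."""
    records: List[str] = []
    for page in pages:
        if page is None:
            break
        records.extend(page)
    return records


# Pages 0 and 1 reached the disk, page 2 did not, but page 3 (holding a
# checkpoint record) was written out of order just before the crash.
log = [["begin T1"], ["commit T1"], None, ["checkpoint"]]
```

With this `log`, `naive_scan` returns the checkpoint record even though it lies past the end of log, while `strict_scan` stops at the gap and never sees it.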