VMDesignDecisions.tioga
January 17, 1983 4:14 pm
Some Design Assumptions and Decisions for a New Virtual Memory System
A read/write file system model is to be used
We have some experience with the mapped file model and believe that it does not offer any significant advantages over the more traditional read/write model for file access. A mapped file approach may reduce the total amount of backing storage required for virtual memory, but the code to implement it seems to be more complex. A read/write model tends to require more backing storage but reduces or eliminates some sources of implementation complexity (e.g., residency requirements and strategies for backing storage address tables, swap buffers, and file size modification).
The entire virtual memory can be backed by a single disk file
The virtual memory is backed by a large, anonymous disk file. Our disks are large enough to accommodate a 24-bit address space (64K pages of backing storage; about 32 megabytes) on a Dorado and a 22-bit address space (16K pages; 8 megabytes) on a Dolphin. (22 bits should be acceptable on a Dandelion with an SA4000 or Quantum disk; perhaps even 23 bits would fit.)
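The page and byte counts above can be checked with a little arithmetic. The sketch below assumes 16-bit words and 256-word pages; the memo states neither figure, but they are the values that reproduce its numbers. The function name is hypothetical.

```python
# Assumptions (not stated in the memo): 16-bit words and 256-word pages,
# chosen because they reproduce the page and byte counts quoted above.
BYTES_PER_WORD = 2
WORDS_PER_PAGE = 256


def backing_store_size(address_bits):
    """Return (pages, bytes) needed to back a word-addressed space."""
    words = 1 << address_bits
    pages = words // WORDS_PER_PAGE
    return pages, words * BYTES_PER_WORD


for bits in (22, 23, 24):
    pages, nbytes = backing_store_size(bits)
    print(f"{bits}-bit: {pages // 1024}K pages, {nbytes // 2**20} MB")
# 22-bit: 16K pages, 8 MB
# 23-bit: 32K pages, 16 MB
# 24-bit: 64K pages, 32 MB
```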
In principle, the mapping from virtual page number to disk page number is a simple numerical calculation. In practice, one cannot assume that it will be possible to allocate a file of 16K-64K pages completely contiguously, since most disks have one or more bad spots. Thus, the mapping will be implemented with a (presumably short) run table.
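One way such a run table might look is sketched below. The class name, the (first disk page, page count) representation, and the use of binary search are all assumptions made for illustration; the memo specifies only that a short run table is used.

```python
from bisect import bisect_right


class RunTable:
    """Maps virtual page numbers to disk page numbers via contiguous runs."""

    def __init__(self, runs):
        # runs: list of (first_disk_page, page_count) in virtual-page order.
        self.starts = []  # first virtual page covered by each run
        self.disk = []    # first disk page of each run
        next_vpage = 0
        for first_disk_page, count in runs:
            self.starts.append(next_vpage)
            self.disk.append(first_disk_page)
            next_vpage += count
        self.total_pages = next_vpage

    def disk_page(self, vpage):
        if not 0 <= vpage < self.total_pages:
            raise IndexError(f"virtual page {vpage} is not backed")
        # Find the last run whose first virtual page is <= vpage.
        i = bisect_right(self.starts, vpage) - 1
        return self.disk[i] + (vpage - self.starts[i])


# Hypothetical layout: a 16K-page file split in two by a 10-page bad spot.
table = RunTable([(1000, 5000), (6010, 11384)])
print(table.disk_page(4999), table.disk_page(5000))  # 5999 6010
```

With a single run the lookup reduces to the simple numerical calculation; each bad spot adds one more entry.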
Coarse-grain synchronization of virtual memory operations is acceptable
A single monitor will synchronize all virtual memory manipulations. In particular, the monitor will be held during fault-handling. This means that during an I/O transfer initiated by either a page-fault or a swapout, no access to the virtual memory data structures will be permitted. This eliminates the need for selective locking of virtual address ranges at the expense of some potential parallelism. We believe the actual lost parallelism will be slight.
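The coarse-grain scheme can be sketched as one lock guarding every entry point, held across the I/O wait. This is an illustration in Python rather than a Cedar monitor; the class, its methods, and the blocking read_page callable are hypothetical.

```python
import threading


class VirtualMemory:
    """All operations take the same lock; no per-range locking exists."""

    def __init__(self, read_page):
        self._monitor = threading.Lock()
        self._resident = {}          # virtual page -> page contents
        self._read_page = read_page  # hypothetical blocking disk read

    def handle_fault(self, vpage):
        with self._monitor:
            if vpage not in self._resident:
                # The monitor stays held while the transfer is in progress,
                # so no other caller can touch the data structures meanwhile.
                self._resident[vpage] = self._read_page(vpage)
            return self._resident[vpage]

    def is_resident(self, vpage):
        with self._monitor:
            return vpage in self._resident
```

A caller of is_resident simply waits until any fault in progress completes; that wait is the lost parallelism the text refers to.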
Simple "seek-ahead" provides adequate I/O overlap with fault-handling
Page faults are handled in a simple, synchronous way. There is a single PageFault process that is awakened by the occurrence of a fault, calls into the virtual memory system to swap in the requested page, and waits for the I/O to complete. No queuing facilities are provided at the disk level. A simple "seek-ahead" scheme is employed: when a disk request is submitted, it may include a disk address to which a seek is to be performed at the completion of the requested operation. The idea is that, if the PageFault queue has multiple requests in it, the disk address for the second faultee's page is supplied when the first faultee's page is requested. Therefore, when the I/O transfer completes, the software can restart the first faultee and initiate the I/O for the second faultee while the disk is already seeking to the required address. Since all of virtual memory is backed by storage on a single disk, there is no opportunity for overlapping transfers with each other. "Seek-ahead" would seem to provide most of the concurrency necessary for adequate performance.
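The seek-ahead idea can be illustrated with a toy model. Everything here is hypothetical (the Disk class, its seek_after parameter, and the log format); a real disk would move the arm asynchronously, whereas this model only records the order of events.

```python
from collections import deque


class Disk:
    """Toy disk that logs each operation so the overlap is visible."""

    def __init__(self):
        self.log = []
        self.head = None  # address the arm is at, or is already seeking to

    def read(self, addr, seek_after=None):
        needs_seek = self.head != addr
        self.log.append(("read", addr, "seek" if needs_seek else "no-seek"))
        self.head = addr
        if seek_after is not None:
            # Start moving the arm as soon as the transfer completes.
            self.log.append(("seek-ahead", seek_after))
            self.head = seek_after


def page_fault_process(disk, faults):
    """Service faults one at a time, passing the next address as a hint."""
    queue = deque(faults)
    while queue:
        addr = queue.popleft()
        next_addr = queue[0] if queue else None
        disk.read(addr, seek_after=next_addr)
        # Here the faulting process for addr would be restarted while the
        # disk is already seeking toward next_addr.


disk = Disk()
page_fault_process(disk, [40, 900, 17])
for entry in disk.log:
    print(entry)
```

In this model only the first request pays for a seek at the time it is issued; the later ones find the arm already positioned.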
Swap units are not a fundamental requirement
Pilot's swap unit notion introduces substantial complexity at the level of the swapper. We propose that the lowest-level swapping facilities traffic in intervals of virtual memory, but that the PageFault process, at least initially, request only a single page at a time. Should it become useful to do so, additional data structures can be provided to permit the PageFault process to request appropriate swap units.
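A minimal sketch of this layering follows: the swapper accepts any interval, and the fault handler decides which interval to ask for. The Interval type, the function name, and the optional swap_units list are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    page: int   # first virtual page
    count: int  # number of pages

    def contains(self, vpage):
        return self.page <= vpage < self.page + self.count


def interval_for_fault(vpage, swap_units=None):
    """Choose the interval the PageFault process hands to the swapper.

    With no swap-unit data the answer is always a single page. If a
    (hypothetical) list of swap units is supplied later, the enclosing
    unit is requested instead; the swapper itself does not change.
    """
    for unit in swap_units or ():
        if unit.contains(vpage):
            return unit
    return Interval(vpage, 1)


print(interval_for_fault(7))                    # Interval(page=7, count=1)
print(interval_for_fault(7, [Interval(4, 8)]))  # Interval(page=4, count=8)
```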
Multiple drives must be supported
The disk facilities must be able to support overlapped transfers to multiple drives, since this will be an important performance consideration for Alpine. However, the virtual memory system has no need for this facility; it is provided at the "disk channel" level.
A "real memory only" version of the virtual memory system is useful
It is useful to be able to write utility programs that massage the disk (e.g., formatter, scavenger) using the same virtual memory system as one uses for regular programs. The only difference is that there is no presumption that backing storage is available, and consequently the virtual memory system is constrained to operate within the available real memory and without swapping. This is the UtilityPilot model. Such a facility naturally falls out of the proposed design.
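The sketch below suggests how the two modes might share one code path, differing only in whether backing storage is present. The class, the exception, and the FIFO replacement policy are assumptions made for illustration; the memo does not specify a replacement policy.

```python
class OutOfRealMemory(Exception):
    """Raised when no frame is free and there is nowhere to swap to."""


class PageFrames:
    def __init__(self, frame_count, backing=None):
        self.frame_count = frame_count
        self.backing = backing  # None selects the real-memory-only mode
        self.resident = []      # resident virtual pages, oldest first

    def make_resident(self, vpage):
        if vpage in self.resident:
            return
        if len(self.resident) == self.frame_count:
            if self.backing is None:
                # Real memory only: the request cannot be satisfied.
                raise OutOfRealMemory(f"no frame for page {vpage}")
            victim = self.resident.pop(0)  # FIFO, purely for illustration
            self.backing.add(victim)       # stand-in for writing the page out
        self.resident.append(vpage)
```

Constructing PageFrames(2) gives the utility configuration, while PageFrames(2, backing=set()) gives the swapping one.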
Cedar "safe" storage should be available at the lowest possible level
Adequate functionality should be provided in the virtual memory system to permit the file system to be a Cedar program. This simplifies the task of writing the file system and, in conjunction with the "real memory only" version of the system, permits utility programs to be Cedar programs as well, and hence more easily maintained.