The Alpine File System
Mark Brown, Karen Kolling, Ed Taft
XEROX PARC

Outline of talk
  Motivation for top-level system requirements
  Key design decisions to meet these requirements
  More detail on requirements and design
  Current status and future plans
  (Using Cedar)

The Alpine File System, June 1983

Motivation for top-level system requirements
  Cypress DBMS (SIGMOD '81):
    Data Model Implementation (Entities, Relationships)
    Access Methods (Records, B-Trees, Hashing)
    File System (Files, Pages)
  BeginTransaction: PROC [] -> [TransID]
  CreateFile: PROC [TransID, PageCount] -> [FileID]
  ReadPage: PROC [TransID, FileID, PageNumber] -> [PageValue]
  WritePage: PROC [TransID, FileID, PageNumber, PageValue]
  EndTransaction: PROC [TransID, {commit, abort}] -> [{committed, aborted}]
  Transaction: a unit of consistency
    concurrent access to shared files
      transaction executes as if no other transactions present
    soft failure (main store error)
    hard failure (disk head crash)
      committed transaction persists, aborted transaction vanishes

Evaluation of file-level transactions
  Potential advantages of file-level transactions
    Make use of powerful workstation (processing, data caching)
      configure three ways: all local, file server, database server
    Easy to write new access methods (no locking or recovery issues)
      support "nonstandard" database applications
    Easy to write applications that mix database and raw file access
  Potential disadvantages of file-level transactions
    Loss of concurrency due to physical locking
      (if a problem, fix by adding locking and recovery concerns to access methods -- underlying transactions may still help)
    Volume of communication is not minimal
      (if a problem, fix by configuring as database server)
    Fine-grained access control requires trusted workstation software
      (if a problem, fix by configuring as database server)

Xerox PARC File Servers (ca. 1981)
  IFS
    Alto (Bcpl)
    Alto OS volume format
    Pup FTP protocol
    Problems:
      No transactions, whole-file transfers only
      Performance mismatch with Dorado workstations
  XDFS (Juniper)
    Alto/Mesa
    Specialized volume format
      To support shadow pages, pair-redundant pages
    Specialized communication protocol
      To support page-level access
    Problems:
      Poor performance (BeginTransaction 500 ms, WritePage 110 ms)
      Frequent crashes, slow crash recovery (>1 hour for 600-Mbyte server)
      Very limited amount of random update under a single transaction (<64 pages for 5-Mbyte file)

Top-level Alpine requirements
  Support Cypress (replace XDFS)
  Support whole-file transfers (replace IFS)

Major design decisions
  Implement transactions using a log
  Use Cedar RPC (Andrew Birrell, Bruce Nelson) for communication

Shadow pages versus Logs
  Shadow pages: XDFS, Cambridge FS, Felix, Cerise, Lorie's FS (System R), ...
  Logs: IMS, System R (Surveys paper), ...
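
The decision to implement transactions with a log can be made concrete against the page-level interface Cypress needs (BeginTransaction, ReadPage, WritePage, EndTransaction). The following is a minimal Python sketch, not Alpine's implementation: it assumes an in-memory dictionary as the "disk", a Python list as the "log", and redo-only logging. WritePage only records an intention in the log and in a volatile log map; the commit record is appended (standing in for forcing the log to disk) before any page is changed, and recovery redoes the writes of committed transactions only. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of redo-only, log-based transactions over a page store.
# "pages" stands in for the disk and "log" for the stable log; neither is durable.

@dataclass
class PageStore:
    pages: dict = field(default_factory=dict)    # (file_id, page_no) -> bytes
    log: list = field(default_factory=list)      # append-only log records
    pending: dict = field(default_factory=dict)  # trans_id -> volatile log map
    next_trans: int = 1

    def begin_transaction(self) -> int:
        t = self.next_trans
        self.next_trans += 1
        self.pending[t] = {}
        return t

    def read_page(self, t: int, file_id, page_no: int) -> bytes:
        # A transaction sees its own uncommitted writes via its volatile log map.
        key = (file_id, page_no)
        if key in self.pending[t]:
            return self.pending[t][key]
        return self.pages.get(key, b"")

    def write_page(self, t: int, file_id, page_no: int, value: bytes) -> None:
        # Log "intend to write new value on page"; the page itself is untouched.
        self.log.append(("write", t, file_id, page_no, value))
        self.pending[t][(file_id, page_no)] = value

    def end_transaction(self, t: int, commit: bool) -> str:
        log_map = self.pending.pop(t)
        if not commit:
            self.log.append(("abort", t))
            return "aborted"
        self.log.append(("commit", t))  # a real system forces the log to disk here
        for key, value in log_map.items():
            self.pages[key] = value     # perform "write new value on page" actions
        self.log.append(("complete", t))
        return "committed"

def recover(log: list) -> dict:
    """Rebuild page contents by redoing the writes of committed transactions.

    Assumes locking prevents two uncommitted transactions from writing the
    same page, so log order of writes matches commit order per page.
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    pages = {}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, file_id, page_no, value = rec
            pages[(file_id, page_no)] = value
    return pages
```

In this sketch an aborted transaction leaves no trace in `pages`, and a transaction that never reaches its commit record is ignored by `recover`, which is the "committed transaction persists, aborted transaction vanishes" property in miniature.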

Example: Write 10 pages to existing file
  Shadow Page Processing:
    For each page,
      recoverably allocate new page
      write page data to buffer pool
    Before commit,
      force all buffered page writes to disk
      recoverably allocate and write shadow file map entries
      log "intend to replace file map with shadow" records
      log "commit" record, force log to disk
    After commit,
      perform "replace file map with shadow" actions (includes freeing old copies of updated pages)
      after updated pages reach disk, log "complete" record
  Log Processing:
    For each page,
      log "intend to write new value on page"
      update (volatile) log map
    Before commit,
      log "commit" record, force log to disk
    After commit,
      perform "write new value on page" actions (write page data to buffer pool)
      after updated pages reach disk, log "complete" record

Shadow pages versus Logs (continued)
  Advantages of Shadow pages
    Fewer I/O transfers
      Write data once
      Write updated file maps, but may get several data pages per file map page
    Support versions
    File map structure can be made more general

Shadow pages versus Logs (continued)
  Advantages of Logs
    Cheaper I/O transfers
      Log writes are sequential
      I/O bandwidth not limiting -- seek time is
    Less allocation overhead
      Log is preallocated, shadow pages are not
    Support backup
      Need two copies after commit
    Allows extent-based file structure
      File map is more compact, less I/O overhead
      Updates preserve file contiguity
      Reads are faster (many more reads than writes)
    Less work before commit
      Better response time
      Do work in background
    Can be layered on existing file implementation
      Avoid writing file map, allocation map, disk driver, formatter, scavenger, etc.
    Can improve performance by buying hardware
      Instead of changing algorithms

RPC versus streams, messages, ...
  Advantages of specialized protocols
    Reduce communication overhead by deferring or piggybacking acks, etc.
      Important for slow communication lines
    Provide parallelism
      Send message without waiting for response
  Advantages of standardized RPC
    Reduce communication overhead by optimizing standard case at all levels
    Clients want abstraction in terms of interfaces and procedures
    Obtain parallelism by forking lightweight processes
    Useful separation of concerns
      File system implementors concentrate on providing a clean file abstraction
        To modify system during development, they change interface and recompile stubs
      RPC implementors write stub compiler, remote binding, authentication, encryption, packet transport, ...

More detail on requirements and design
  Included in Alpine
    Access control
    Disk space accounting
    Reliability, availability
      configure to survive any single hard failure
      recover from soft failure: < 5 minutes
      recover from hard failure: < 2 hours
    Configure as workstation file system
  Separate activities, closely related to Alpine
    Directory system (path name -> [volume ID, file ID])
    Pup FTP access
    Archive system
  Longer-term follow-ons
    Location- and replication-transparent file access
  Excluded
    Continuous availability
    Guaranteed real-time response

More detail on requirements and design (continued)
  File
    Runs of pages
    Accepts random/sequential access hint at open time
    Concurrent operations within a transaction
    High water mark optimization
  Backup
    Nondisruptive, driven from log
  Access control
    Per-file access lists
    Uses Grapevine registration database
  Transaction, Lock, Log
    System R style
    Two-phase commit, optimizations for read-only and local-only cases
  FilePageMgr
    Implements buffer pool
    Different buffer strategy for random/sequential files
    Accepts read-ahead hint, notices chances for write-behind
    Isolates rest of Alpine from underlying file system (to be tested soon)

Current status and plans
  Current Status
    Server in operation since March
    About 10 regular users
    Used primarily for electronic mail databases
    Simple directory system adequate for current use
    Real backup not implemented -- copy important files to IFS daily
    Size of implementation
      program modules: [47 + 18 stubs + 8 workstation support]
      interface modules: [66 + 6 stubs + 7 workstation support]
  Plans
    Support more users
    Implement "real" backup
    Implement "real" directory system, FTP protocol
    Replace one IFS server with an Alpine server this year
    Implement archive system next year

What Alpine learned about Cedar
  Garbage collection allows cleaner interfaces
    Property: TYPE = {byteLength, createTime, ... };
    PropertyValuePair: TYPE = RECORD [
      SELECT property: Property FROM
        byteLength => [byteLength: ByteCount],
        createTime => [createTime: System.GreenwichMeanTime],
        ...
        ENDCASE];
    ReadProperties: PROC [...] -> [properties: LIST OF PropertyValuePair]
    WriteProperties: PROC [... , properties: LIST OF PropertyValuePair]
  Garbage collection allows cleaner implementations
    You can say what you mean, without worrying about storage allocation or intermediate results
    (But not when you care a lot about performance)
  Garbage collection requires object finalization
    Volatile data structures, tied together with REFs
    High concurrency; new REFs being handed out all the time
    When do you throw an object away?
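
The PropertyValuePair type is a variant (tagged-union) record, and ReadProperties can simply hand back a freshly allocated LIST OF such pairs because the collector, not the caller, reclaims it. A rough Python analogue follows, offered only as a sketch: the class names `ByteLength`, `CreateTime`, `FileProperties` and the helper `describe` are hypothetical, only the two variants named on the slide are modeled (the elided ones are left out), `datetime` stands in for System.GreenwichMeanTime, and Python's own garbage collector plays the role of Cedar's.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Union

# Hypothetical Python analogue of the Cedar variant record PropertyValuePair.
# Each variant is its own frozen dataclass; the class itself acts as the tag.

@dataclass(frozen=True)
class ByteLength:
    byte_length: int

@dataclass(frozen=True)
class CreateTime:
    create_time: datetime  # stands in for System.GreenwichMeanTime

PropertyValuePair = Union[ByteLength, CreateTime]

class FileProperties:
    """Toy per-file property holder; not Alpine's file implementation."""

    def __init__(self) -> None:
        self._props: dict[type, PropertyValuePair] = {}

    def write_properties(self, properties: list[PropertyValuePair]) -> None:
        # A later value for the same tag replaces the earlier one.
        for p in properties:
            self._props[type(p)] = p

    def read_properties(self, wanted: list[type]) -> list[PropertyValuePair]:
        # Returns a newly allocated list; the caller never frees it,
        # the garbage collector reclaims it when it is no longer referenced.
        return [self._props[w] for w in wanted if w in self._props]

def describe(p: PropertyValuePair) -> str:
    # Discriminate on the tag, much like SELECT ... FROM in Cedar.
    if isinstance(p, ByteLength):
        return f"byteLength={p.byte_length}"
    if isinstance(p, CreateTime):
        return f"createTime={p.create_time.isoformat()}"
    raise TypeError(f"unknown property: {p!r}")
```

Without garbage collection, an interface like this would typically need the caller to supply a buffer of the right size, or an explicit "free the result" call, which is the clutter the slide says Cedar avoids.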

What Alpine learned about Cedar (continued)
  Lightweight processes are useful
    Each in-call is a new process
    Many background processes to manage resources
  Remote interfaces are nice, but a file is an object
    Clients want a handy way to pass around a file
    Package up "conversation", interface, open file
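
The point that "a file is an object" can be sketched as a single value that bundles the RPC conversation, the remote interface, and the open-file identifier, so clients pass one handle instead of three loosely related arguments. This is a minimal Python illustration under stated assumptions: `FileServerInterface` is a local stand-in for a remote interface (no Cedar RPC is involved), and `Conversation`, `OpenFile`, and `copy_first_page` are hypothetical names, not Alpine's.

```python
from dataclasses import dataclass

# Hypothetical sketch: one handle packaging conversation, interface, and open file.

class FileServerInterface:
    """Local stand-in for a remote file interface; not Cedar RPC."""

    def __init__(self) -> None:
        self._pages: dict = {}  # (open_file_id, page_no) -> bytes

    def read_page(self, conversation, trans_id, open_file_id, page_no):
        return self._pages.get((open_file_id, page_no), b"")

    def write_page(self, conversation, trans_id, open_file_id, page_no, value):
        self._pages[(open_file_id, page_no)] = value

@dataclass(frozen=True)
class Conversation:
    # Stands in for an authenticated RPC conversation (caller identity, keys).
    caller: str

@dataclass(frozen=True)
class OpenFile:
    conversation: Conversation
    interface: FileServerInterface
    trans_id: int
    open_file_id: int

    def read_page(self, page_no: int) -> bytes:
        return self.interface.read_page(
            self.conversation, self.trans_id, self.open_file_id, page_no)

    def write_page(self, page_no: int, value: bytes) -> None:
        self.interface.write_page(
            self.conversation, self.trans_id, self.open_file_id, page_no, value)

def copy_first_page(src: OpenFile, dst: OpenFile) -> None:
    # Callers hand around OpenFile values; there are no separate conversation,
    # interface, and file-ID arguments to keep consistent with each other.
    dst.write_page(0, src.read_page(0))
```

Making the handle immutable (`frozen=True`) is one way to keep a file's conversation and transaction from being swapped out under code that holds a reference to it.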