The Alpine file system
Mark Brown, Karen Kolling, Ed Taft
XEROX PARC
Outline of talk
Motivation for top-level system requirements
Key design decisions to meet these requirements
More detail on requirements and design
Current status and future plans
(Using Cedar)
The Alpine File System June 1983
Motivation for top-level system requirements
Cypress DBMS (SIGMOD '81):
Data Model Implementation (Entities, Rel'ships)
Access Methods (Records, B-Trees, Hashing)
File System (Files, Pages)
BeginTransaction: PROC [] -> [TransID]
CreateFile: PROC [TransID, PageCount] -> [FileID]
ReadPage: PROC [TransID, FileID, PageNumber] -> [PageValue]
WritePage: PROC [TransID, FileID, PageNumber, PageValue]
EndTransaction: PROC [TransID, {commit, abort}] -> [{committed, aborted}]
Transaction: a unit of consistency
concurrent access to shared files
transaction executes as if no other transactions present
soft failure (main store error)
hard failure (disk head crash)
committed transaction persists, aborted transaction vanishes
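A minimal sketch of the five-procedure interface above, as a toy in-memory model in Python. It is not Alpine's implementation: the class name ToyFileSystem, the Outcome enum, and the dictionary-based storage are assumptions made for illustration, and locking between concurrent transactions is left out. It only shows the visible contract: a committed transaction persists, an aborted one vanishes.

```python
from enum import Enum


class Outcome(Enum):
    COMMITTED = "committed"
    ABORTED = "aborted"


class ToyFileSystem:
    """Hypothetical in-memory stand-in for the page-level interface."""

    def __init__(self):
        self._files = {}      # FileID -> list of committed page values
        self._pending = {}    # TransID -> uncommitted creates and writes
        self._next_trans = 1
        self._next_file = 1

    def begin_transaction(self):
        trans_id = self._next_trans
        self._next_trans += 1
        self._pending[trans_id] = {"created": {}, "writes": {}}
        return trans_id

    def create_file(self, trans_id, page_count):
        file_id = self._next_file
        self._next_file += 1
        self._pending[trans_id]["created"][file_id] = [b""] * page_count
        return file_id

    def read_page(self, trans_id, file_id, page_number):
        pending = self._pending[trans_id]
        # A transaction sees its own uncommitted writes first.
        if (file_id, page_number) in pending["writes"]:
            return pending["writes"][(file_id, page_number)]
        if file_id in pending["created"]:
            return pending["created"][file_id][page_number]
        return self._files[file_id][page_number]

    def write_page(self, trans_id, file_id, page_number, page_value):
        self._pending[trans_id]["writes"][(file_id, page_number)] = page_value

    def end_transaction(self, trans_id, commit):
        pending = self._pending.pop(trans_id)
        if not commit:
            return Outcome.ABORTED          # pending work is simply dropped
        self._files.update(pending["created"])
        for (file_id, page_number), value in pending["writes"].items():
            self._files[file_id][page_number] = value
        return Outcome.COMMITTED


fs = ToyFileSystem()
t1 = fs.begin_transaction()
f = fs.create_file(t1, 2)
fs.write_page(t1, f, 0, b"hello")
fs.end_transaction(t1, commit=True)

t2 = fs.begin_transaction()
fs.write_page(t2, f, 0, b"scratch")
fs.end_transaction(t2, commit=False)        # the aborted write vanishes
```

A later transaction reading page 0 of the file sees the value written by t1, not the one written by t2.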
Evaluation of file-level transactions
Potential advantages of file-level transactions
Make use of powerful workstation (processing, data caching)
configure three ways: all local, file server, database server
Easy to write new access methods (no locking or recovery issues)
support "nonstandard" database applications
Easy to write applications that mix database and raw file access
Potential disadvantages of file-level transactions
Loss of concurrency due to physical locking
(if a problem, fix by adding locking and recovery concerns to access methods -- underlying transactions may still help)
Volume of communication is not minimal
(if a problem, fix by configuring as database server)
Fine-grained access control requires trusted workstation software
(if a problem, fix by configuring as database server)
Xerox PARC File Servers
(ca. 1981)
IFS
Alto (Bcpl)
Alto OS volume format
Pup FTP protocol
Problems:
No transactions, whole-file transfers only
Performance mismatch with Dorado workstations
XDFS (Juniper)
Alto/Mesa
Specialized volume format
To support shadow pages, pair-redundant pages
Specialized communication protocol
To support page-level access
Problems:
Poor performance (BeginTransaction 500 ms, WritePage 110 ms)
Frequent crashes, slow crash recovery (>1 hour for 600 Mbyte server)
Very limited amount of random update under a single transaction (<64 pages for 5 Mbyte file)
Top-level Alpine requirements
Support Cypress (replace XDFS)
Support whole-file transfers (replace IFS)
Major design decisions
Implement transactions using a log
Use Cedar RPC (Andrew Birrell, Bruce Nelson) for communication
Shadow pages versus Logs
Shadow pages: XDFS, Cambridge FS, Felix, Cerise, Lorie's FS (System R), ...
Logs: IMS, System R (Surveys paper), ...
Example: Write 10 pages to existing file
Shadow Page Processing:
For each page,
recoverably allocate new page
write page data to buffer pool
Before commit,
force all buffered page writes to disk
recoverably allocate and write shadow file map entries
log "intend to replace file map with shadow" records
log "commit" record, force log to disk
After commit,
perform "replace file map with shadow" actions
(includes freeing old copies of updated pages)
after updated pages reach disk, log "complete" record
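The shadow-page steps above can be sketched as a toy Python model. Everything here is an illustrative assumption (the ShadowFile class, the dictionary standing in for the disk, the simple block allocator); recoverable allocation, the log records, and forced writes are omitted. The point it shows is that each updated page goes to a newly allocated block, and commit swaps file map entries and frees the old blocks.

```python
class ShadowFile:
    """Hypothetical single-transaction model of shadow-page updates."""

    def __init__(self, page_count):
        self.disk = {}            # block address -> page data
        self.next_block = 0
        self.free_blocks = []
        self.file_map = {}        # committed map: page number -> block address
        for page in range(page_count):
            block = self._allocate()
            self.disk[block] = b""
            self.file_map[page] = block
        self.shadow_map = {}      # pages updated by the open transaction

    def _allocate(self):
        if self.free_blocks:
            return self.free_blocks.pop()
        block = self.next_block
        self.next_block += 1
        return block

    def write_page(self, page, data):
        # The first update to a page allocates a fresh block; the old copy
        # stays untouched until commit.
        block = self.shadow_map.get(page)
        if block is None:
            block = self._allocate()
            self.shadow_map[page] = block
        self.disk[block] = data

    def read_committed(self, page):
        return self.disk[self.file_map[page]]

    def commit(self):
        # "Replace file map with shadow", then free old copies of updated pages.
        for page, new_block in self.shadow_map.items():
            self.free_blocks.append(self.file_map[page])
            self.file_map[page] = new_block
        self.shadow_map = {}

    def abort(self):
        # The committed map was never changed; just release the new blocks.
        self.free_blocks.extend(self.shadow_map.values())
        self.shadow_map = {}


shadow = ShadowFile(page_count=10)
for page in range(10):
    shadow.write_page(page, b"new-%d" % page)
shadow.commit()
```

After the commit every page of the file lives in a different block than before, so updates under this scheme do not keep a file at its original disk location.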
Log Processing:
For each page,
log "intend to write new value on page"
update (volatile) log map
Before commit,
log "commit" record, force log to disk
After commit,
perform "write new value on page" actions
(write page data to buffer pool)
after updated pages reach disk, log "complete" record
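The log-processing steps above can be sketched the same way. This is a toy Python model under stated assumptions, not Alpine's code: the LoggedFile class, the tuple-shaped log records, and the forced counter that stands in for forcing the log to disk are all hypothetical, and only one transaction is modeled.

```python
class LoggedFile:
    """Hypothetical single-transaction model of log-based updates."""

    def __init__(self, page_count):
        self.pages = [b""] * page_count   # pages in their home locations
        self.log = []                     # append-only, written sequentially
        self.forced = 0                   # count of log records "on disk"
        self.log_map = {}                 # volatile: page -> newest intent record

    def write_page(self, page, data):
        # Log the intent and remember where the new value lives.
        self.log.append(("write", page, data))
        self.log_map[page] = len(self.log) - 1

    def read_page(self, page):
        # The transaction sees its own updates through the log map.
        index = self.log_map.get(page)
        if index is not None:
            return self.log[index][2]
        return self.pages[page]

    def commit(self):
        self.log.append(("commit",))
        self.forced = len(self.log)       # stands in for forcing the log

    def apply_committed(self):
        # After commit, typically in the background: perform the writes in place.
        for page, index in self.log_map.items():
            self.pages[page] = self.log[index][2]
        self.log_map.clear()
        self.log.append(("complete",))


logged = LoggedFile(page_count=10)
for page in range(10):
    logged.write_page(page, b"new-%d" % page)
logged.commit()
logged.apply_committed()
```

Before commit the only work is appending to the log and updating the in-memory map; the in-place page writes happen afterward, and each page stays at its original position in the file.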
Shadow pages versus Logs
(continued)
Advantages of Shadow pages
Fewer I/O transfers
Write data once
Write updated file maps, but may get several data pages per file map page
Support versions
File map structure can be made more general
Shadow pages versus Logs
(continued)
Advantages of Logs
Cheaper I/O transfers
Log writes are sequential
I/O bandwidth not limiting -- seek time is
Less allocation overhead
Log is preallocated, shadow pages are not
Support backup
Need two copies after commit
Allows extent-based file structure
File map is more compact, less I/O overhead
Updates preserve file contiguity
Reads are faster (many more reads than writes)
Less work before commit
Better response time
Do work in background
Can be layered on existing file implementation
Avoid writing file map, allocation map, disk driver, formatter, scavenger, etc.
Can improve performance by buying hardware
Instead of changing algorithms
RPC versus streams, messages, ...
Advantages of specialized protocols
Reduce communication overhead by deferring or piggybacking acks, etc.
Important for slow communication lines
Provide parallelism
Send message without waiting for response
Advantages of standardized RPC
Reduce communication overhead by optimizing standard case at all levels
Clients want abstraction in terms of interfaces and procedures
Obtain parallelism by forking lightweight processes
Useful separation of concerns
File system implementors concentrate on providing a clean file abstraction. To modify system during development, they change interface and recompile stubs
RPC implementors write stub compiler, remote binding, authentication, encryption, packet transport, ...
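As an illustration of "obtain parallelism by forking lightweight processes", here is a small Python sketch that uses threads in place of Cedar processes. The function remote_read_page is a hypothetical blocking remote call (a sleep stands in for the network round trip); it is not part of any real RPC package.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_read_page(page_number):
    """Hypothetical blocking remote call; the sleep models a round trip."""
    time.sleep(0.05)
    return f"page-{page_number}"


def read_pages_in_parallel(page_numbers):
    # Each call is an ordinary blocking procedure call; the overlap comes
    # from running one call per thread rather than from a special protocol.
    workers = max(1, len(page_numbers))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(remote_read_page, page_numbers))


pages = read_pages_in_parallel([0, 1, 2, 3])
```

The caller keeps the plain procedure-call abstraction, and the results come back in the order of the requests.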
More detail on requirements and design
Included in Alpine
Access control
Disk space accounting
Reliability, availability
configure to survive any single hard failure
recover from soft failure: < 5 minutes
recover from hard failure: < 2 hours
Configure as workstation file system
Separate activities, closely related to Alpine
Directory system (path name -> [volume ID, file ID])
Pup FTP access
Archive system
Longer-term follow-ons
Location- and replication-transparent file access
Excluded
Continuous availability
Guaranteed real-time response
More detail on requirements and design
(continued)
File
Runs of pages
Accepts random/sequential access hint at open time
Concurrent operations within a transaction
High water mark optimization
Backup
Nondisruptive, driven from log
Access control
Per-file access lists
Uses Grapevine registration database
Transaction, Lock, Log
System R style
Two-phase commit, optimizations for read-only and local-only cases
FilePageMgr
Implements buffer pool
Different buffer strategy for random/sequential files
Accepts read-ahead hint, notices chances for write-behind
Isolates rest of Alpine from underlying file system (to be tested soon)
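The read-only optimization for two-phase commit listed under "Transaction, Lock, Log" can be sketched as follows. This shows the general technique, not Alpine's protocol: the Participant class, the vote strings, and the state names are assumptions made for the example. A participant that wrote nothing answers "read-only" in phase one, releases its resources, and receives no phase-two message.

```python
class Participant:
    """Hypothetical transaction participant (for example, one file server)."""

    def __init__(self, name, wrote_data, can_commit=True):
        self.name = name
        self.wrote_data = wrote_data
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        if not self.wrote_data:
            self.state = "released"       # nothing to commit or undo
            return "read-only"
        if self.can_commit:
            self.state = "prepared"
            return "yes"
        self.state = "aborted"
        return "no"

    def finish(self, outcome):
        self.state = outcome


def two_phase_commit(participants):
    # Phase one: collect votes.
    votes = [(p, p.prepare()) for p in participants]
    outcome = "aborted" if any(v == "no" for _, v in votes) else "committed"
    # Phase two: only participants that voted "yes" still need the outcome.
    phase_two = [p for p, v in votes if v == "yes"]
    for p in phase_two:
        p.finish(outcome)
    return outcome, [p.name for p in phase_two]


writer = Participant("writer", wrote_data=True)
reader = Participant("reader", wrote_data=False)
result = two_phase_commit([writer, reader])
```

In this run only the writer takes part in phase two; the reader is finished after a single message exchange.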
Current status and plans
Current Status
Server in operation since March
About 10 regular users
Used primarily for electronic mail databases
Simple directory system adequate for current use
Real backup not implemented -- copy important files to IFS daily
Size of implementation
program modules: [47 + 18 stubs + 8 workstation support]
interface modules: [66 + 6 stubs + 7 workstation support]
Plans
Support more users
Implement "real" backup
Implement "real" directory system, FTP protocol
Replace one IFS server with an Alpine server this year
Implement archive system next year
What Alpine learned about Cedar
Garbage collection allows cleaner interfaces
Property: TYPE = {byteLength, createTime, ... }
PropertyValuePair: TYPE = RECORD [
  SELECT property: Property FROM
    byteLength => [byteLength: ByteCount],
    createTime => [createTime: System.GreenwichMeanTime],
    ...
    ENDCASE];
ReadProperties: PROC [...] -> [properties: LIST OF PropertyValuePair]
WriteProperties: PROC [... , properties: LIST OF PropertyValuePair]
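For readers who do not know Cedar, a rough Python analogue of the interface above: a property is a small tagged value, and the read and write procedures pass lists of them that the caller never has to free. The names ToyFile, ByteLength, and CreateTime, and the optional wanted parameter, are assumptions made for this sketch; the arguments elided in the Cedar declarations are not reconstructed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Union


@dataclass(frozen=True)
class ByteLength:
    value: int


@dataclass(frozen=True)
class CreateTime:
    value: datetime


# Analogue of the variant record: one value, tagged by its kind.
PropertyValuePair = Union[ByteLength, CreateTime]


class ToyFile:
    """Hypothetical file object holding one value per property kind."""

    def __init__(self, byte_length, create_time):
        self._props = {
            ByteLength: ByteLength(byte_length),
            CreateTime: CreateTime(create_time),
        }

    def read_properties(self, wanted=None):
        # Returns a freshly built list; the garbage collector reclaims it.
        kinds = list(self._props) if wanted is None else wanted
        return [self._props[kind] for kind in kinds]

    def write_properties(self, properties):
        for pair in properties:
            self._props[type(pair)] = pair


toy = ToyFile(10, datetime(1983, 6, 1, tzinfo=timezone.utc))
toy.write_properties([ByteLength(20)])
length_only = toy.read_properties([ByteLength])
```

The variable-length result needs no caller-supplied buffer and no explicit release call, which is the interface simplification the slide attributes to garbage collection.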
Garbage collection allows cleaner implementations
You can say what you mean, without worrying about storage allocation or intermediate results
(But not when you care a lot about performance)
Garbage collection requires object finalization
Volatile data structures, tied together with REFs
High concurrency; new REFs being handed out all the time
When do you throw an object away?
What Alpine learned about Cedar
(continued)
Lightweight processes are useful
Each in-call is a new process
Many background processes to manage resources
Remote interfaces are nice, but a file is an object
Clients want a handy way to pass around a file
Package up "conversation", interface, open file