*start* 02295 00024 US Date: 28 Jan. 1981 4:35 pm PST (Wednesday) From: MBrown.PA Subject: file server meeting, Thursday 29 January, 1pm To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder Reply-To: MBrown cc: MBrown We'll meet in the ISL conference room (to the left as you pass the ISL coffee area). I want to use this message to indicate some topics that it might be useful to discuss at this meeting and its successors. Other suggestions are welcome. 1) A name for the system. The name "Sierra" was suggested, but not used, as the name for the Cedar language. I think it would be a nice name for a file server. It has the advantage that individual server machines could be named by Sierra peaks, lakes, rivers, or what-have-you. 2) The division of responsibility between the system and the Cedar universal file system. For instance, one point of view says that a file server is not concerned with problems of locating files; to access a file you must communicate with the right machine and specify the right volume, as well as supply the right unique ID. In this view, file location is handled via a higher-level mechanism, perhaps some combination of location hints (Gifford's file references) and a database to handle exceptions. I favor this view. Providing a directory system for mapping string filenames to unique IDs leads to a responsibility question with a similar flavor. The division of responsibility issue will inevitably lead to a specification of the file server's interface, which is a primary goal of these meetings. Offhand it seems that the three capabilities the file server must offer to the UFS are (1) given volume and file ID, transfer whole file, (2) given volume and file ID and page number, transfer page, (3) provide transactions and a technique for maintaining remote caches of files or pages located on the file server. There are also other, less UFS-specific requirements such as backup, archiving, etc. 3) The internal organization of the system. Discussion of this should probably be deferred until the external interface has been examined. A general design goal that seems quite important is the ability to build a server with special properties, e.g. one that also includes a directory system or a database system or whatever. --mark *start* 00230 00024 US Date: 29 Jan. 1981 3:07 pm PST (Thursday) From: Kolling.PA Subject: Butler's paper To: MBrown Reply-To: Kolling cc: Kolling Do you have a pointer to the latest copy of Butler's and Howard's paper? Karen *start* 00802 00024 US Date: 29 Jan. 1981 3:54 pm PST (Thursday) From: MBrown.PA Subject: Re: Butler's paper In-reply-to: Your message of 29 Jan. 1981 3:07 pm PST (Thursday) To: Kolling cc: MBrown no, as far as I know there is no paper that explains the way the current algorithm works. the root-leaf idea is explained in the may 29, 1979 version (i think a copy of this is on butler's directory, or you can look at mine), but the notion of "compatible actions" in that paper is completely wrong. i have been working on mesa-like code for the algorithm; this is located on [ivy]CRSSCode.bravo. If you like, you can have a go at this; I will be glad to spend some time discussing it with you after you've looked at it. My own ideas on the algorithm are not as sharp as I'd like! --mark *start* 00548 00024 US Date: 11 Feb.
1981 4:08 pm PST (Wednesday) From: MBrown.PA Subject: sierra meeting, Thursday 12 February, 2:30pm, ISL conference room To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder Reply-To: MBrown cc: MBrown (This meeting will actually be held whenever Popek's seminar is finished). Proposed agenda: (1) discussion of questions raised in the minutes of our last meeting, (2) (if time remains) I will try to present my current understanding of the algorithm for coordinating distributed transaction commit. --mark *start* 00461 00024 US Date: 16 Feb. 1981 5:27 pm PST (Monday) From: MBrown.PA Subject: memo on proposed BFS interface To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder Reply-To: MBrown cc: MBrown [Ivy]Doc>BFSDesign.bravo, .press is a proposal for the BFS interface. The design is incomplete in many ways, but the multiple file system transaction stuff is worked out in considerable detail. This might be a focus for Thursday's meeting. --mark *start* 00376 00024 US Date: 17 Feb. 1981 9:14 am PST (Tuesday) From: Levin.PA Subject: A bas "BFS" To: Boggs, Kolling, Taft, MBrown, Birrell, Schroeder cc: Levin Before things get much further advanced, can we find a different name than BFS? I dislike acronyms, and this one is already attached to a distinct but similar entity. That's too many black marks for me. Roy *start* 00361 00024 US Date: 17 Feb. 1981 9:32 am PST (Tuesday) From: MBrown.PA Subject: Re: A bas "BFS" In-reply-to: Levin's message of 17 Feb. 1981 9:14 am PST (Tuesday) To: Levin cc: Boggs, Kolling, Taft, MBrown, Birrell, Schroeder I actually thought it was nice to reuse the same name. But I can appreciate the danger. How about "FileStore"? --mark *start* 00363 00024 US Date: 19 Feb. 1981 9:43 am PST (Thursday) From: MBrown.PA Subject: sierra meeting, today, 1:00pm, ISL conference room To: Kolling, Taft, Birrell, Levin, Schroeder Reply-To: MBrown cc: MBrown I will be prepared to discuss the transaction facilities of the FileStore (formerly BFS) interface, if that turns out to be appropriate. --mark *start* 00961 00024 US Date: 19 Feb. 1981 9:07 pm PST (Thursday) From: MBrown.PA Subject: Version 1 of FileStore To: Kolling, Taft, Birrell, Levin, Schroeder Reply-To: MBrown cc: MBrown [ivy]Doc>FileStoreDesign1.bravo is a revised interface. It should be somewhat clearer: all of the handles are gone, all procedures take unique ids as parameters. I have put copies of this memo in mailboxes. I am working on an implementation of the transaction stuff; my current version is [ivy]Doc>FileStoreTransCode1.bravo. It begins with a 3-page introduction to logging, recoverable transaction states, and the recovery process. Its quality is uneven, but the memo may be of some help in puzzling out the interface. We concluded at today's meeting that increased understanding of RPC will be required sooner than we thought, since it may impact the design of FileStore. Our next meeting is scheduled for 1:30 Monday at the usual place. --mark *start* 00502 00024 US Date: 24 Feb. 1981 9:50 pm PST (Tuesday) From: MBrown.PA Subject: Lock manager "concepts" document To: Kolling, Taft, Birrell, Levin, Schroeder Reply-To: MBrown cc: MBrown [ivy]Doc>LockConcepts0.bravo is a document containing a distillation of a number of papers that I have read on the topic of locking. I have put copies of this memo in mailboxes. The memo promises a LockDesign0.bravo, containing a proposed interface; this may not show up until Friday. --mark *start* 00409 00024 US Date: 25 Feb. 
1981 3:23 pm PST (Wednesday) From: MBrown.PA Subject: Alpine To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder Reply-To: MBrown cc: MBrown Gloria Warner has approved the name change from Sierra to Alpine. I will distribute hardcopies of the draft lock manager interface later today. As soon as an <Alpine> directory is established it will be stored there. --mark *start* 00525 00024 US Date: 12 March 1981 11:31 am PST (Thursday) From: MBrown.PA Subject: New version of FileStoreDesign To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder Reply-To: MBrown cc: MBrown I have distributed hardcopies of FileStoreDesign2.bravo. This version makes a few minor improvements in the Transaction section, but mainly is intended to flesh out the Files section (though this is still far from complete.) I am working on another memo, on buffering, that with luck will also appear today. --mark *start* 01094 00024 US Date: 16 March 1981 9:17 am PST (Monday) From: MBrown.PA Subject: Alpine meeting today, 1:30pm, CSL alcove. To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder Reply-To: MBrown cc: MBrown Last week we had a preliminary discussion of RPC issues outside of the remote-binding area. We still need to answer questions like: (1) what is inside the box labeled "RPCImpl"? Do we expect to use stubs for most parameter and result munging in remote procedures? (2) how does a caller who expects a large, unstructured result (say a page) get access to the PupBuffer that holds it? (3) what are the restrictions on the size of argument and result records? must they fit into a single gateway-sized packet? If so, how to handle primitives that transfer runs of pages? (4) what is the relationship between RPC and a fast file-transfer protocol? We also discussed the topic of buffering, especially relating to fast file transfers. Since that meeting, two memos have appeared, dealing with internal file server buffering and with the "files" section of FileStore. --mark *start* 02662 00024 US Date: 17 March 1981 8:45 am PST (Tuesday) From: MBrown.PA Subject: McJones on the performance of disk commands on Dolphin To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder cc: MBrown --------------------------- Date: 16 March 1981 4:24 pm PST (Monday) From: MBrown.PA Subject: performance of disk commands on Dolphin To: McJones Reply-To: MBrown cc: MBrown We had a minor controversy in today's Alpine meeting about the motivation for runs of pages at the disk face (the motivation at higher levels is pretty clear.) I claimed that if transfers to consecutive sectors are expressed as separate disk commands on the Dolphin, a revolution of the disk is lost between the two transfers. Is this true? Or is the problem in the software overhead of setting up the second disk command, the storage for the disk command, or whatever? --mark ------------------------------------------------------------ Date: 16 March 1981 6:38 pm PST (Monday) From: McJones Subject: Re: performance of disk commands on Dolphin In-reply-to: Your message of 16 March 1981 4:24 pm PST (Monday) To: MBrown cc: McJones The problem is keeping up with the disk. There are something like 700 microseconds between sectors, and with the disk task taking up many of the memory cycles we found that with double buffering even stripped down programs couldn't compute the next IOCB fast enough to keep up with the disk. We could have computed a whole track's worth ahead of time, but that would have taken a lot of space, and by the way a fair amount of computing time.
(Of course this computing time would often occur during a seek, and we might or might not have another user process to run at the same time...) Since the necessary computing is simple and regular, it turns out to be fairly straightforward to do in microcode. /Paul ------------------------------------------------------------ I take this to say that the microcode CAN interpret a new command in the inter-sector time, but the software CANNOT construct a new command in a sector time. Thus lots of short commands are definitely a loss, but it is ok to issue two medium-sized commands for consecutive runs and let the disk controller run them one after the other. This eases the contiguity requirements for disk buffers somewhat. We will eventually need more details on this point, in order to estimate whether the standard run size should be two pages, four pages, or whatever. I am a little surprised at the comment about the disk's interference with the emulator; I thought that the Dolphin had plenty of memory bandwidth. I wonder what impact the display has on this. --mark *start* 00653 00024 US Date: 18 March 1981 5:13 pm PST (Wednesday) From: MBrown.PA Subject: Outline of a possible FileStore implementation To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder Reply-To: MBrown cc: MBrown I have distributed hardcopies of a memo entitled "Outline of a possible FileStore implementation." This may serve to clarify some of the constraints that I (unconsciously) placed on the FSBuffer interface design. Roy suggested that it may be helpful for me to give an oral presentation from this outline on Monday; it might then be useful to attempt to work out alternative design proposals to a similar degree of detail. --mark *start* 00416 00024 US Date: 23 March 1981 10:35 am PST (Monday) From: MBrown.PA Subject: Alpine meeting today, 1:30pm, CSL alcove. To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder Reply-To: MBrown cc: MBrown I can talk about the "Outline of a possible FileStore implementation." We might also revive the discussion of RPC issues left over from two weeks ago (as outlined in my message of 16 March.) --mark *start* 00626 00024 US Date: 30 March 1981 8:51 am PST (Monday) From: MBrown.PA Subject: Alpine meeting today, 1:30 (or after picnic), CSL alcove. To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder Reply-To: MBrown cc: MBrown Andrew and I were otherwise occupied last week, and no progress has been made on the universal file system (cache manager) interface. We might continue our discussion of RPC parameter functionality issues, such as the interface from stub procedures to RPC pup buffers (for large unstructured parameters or results). We might also discuss where to put in the provisions for encryption. --mark *start* 00739 00024 US Date: 31 March 1981 4:17 pm PST (Tuesday) From: MBrown.PA Subject: DBMS recovery memo To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder Reply-To: MBrown cc: MBrown I have distributed hardcopies of a draft memo entitled "Recovery in database management systems." It explores the possible relationships between a file system that provides transactions on files and a database system that provides transactions on tuples (or whatever.) Naturally I have a particular interest in seeing that Alpine is capable of participating as a useful partner in such a relationship. The memo is somewhat incomplete (I use < ... > to delimit unfinished sections) but I think that the essential points come across ok.
--mark *start* 00302 00024 US Date: 6 April 1981 10:01 am PST (Monday) From: MBrown.PA Subject: Alpine meeting today, 1:30, CSL alcove. To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder cc: MBrown We can discuss the memo "Recovery in database management systems" that was distributed last week. --mark *start* 00334 00024 US Date: 20 April 1981 8:46 am PST (Monday) From: MBrown.PA Subject: Alpine meeting today, 1:30, CSL alcove. To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder cc: MBrown Maybe we'll discuss "file properties" such as creation date. I would prefer to start on time since I must be at Stanford at 3pm. --mark *start* 00830 00024 US Date: 27 April 1981 10:51 am PDT (Monday) From: MBrown.PA Subject: Alpine meeting today, 1:30, CSL alcove. To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder cc: MBrown As an outgrowth of last week's discussion of file properties, Ed Taft agreed to describe how archive and backup work on our present file systems. (These processes seemed to be the only ones that cared about dates other than create.) This might lead to a discussion of where archiving fits into the Alpine design. Next Monday is in May, hence we should be starting to work on Alpine again in earnest. It is time for those who want to make specific design proposals to take responsibility for writing something down, and then do it. This meeting seems like a good time to see who is going to do what, and roughly by when. --mark *start* 00976 00024 US Date: 8 May 1981 6:44 pm PDT (Friday) From: kolling.PA Subject: question To: MBrown cc: Kolling There's a paragraph in your dbms memo that says: "If all updates to an object are undo-logged, and log record IDs are used as update IDs, then the combination of object plus log has a very simple structure: the object points to the most recent update, which points to the previous (committed) update, etc. The entire history of the object can be found by traversing the log in this way. Because of locking, only the most recent update is possibly uncommitted and hence candidate for undoing. Undo pops the top element from the stack of log records." Actually, if the owner of the lock has been doing multiple writes to the same object, the "previous" update of interest is not necessarily committed, yes? You have to search back to find the most recently committed one, I think.....or do I not understand when things get written into this log..... *start* 00338 00024 US Date: 9 May 1981 3:49 pm PDT (Saturday) From: MBrown.PA Subject: Re: question In-reply-to: Your message of 8 May 1981 6:44 pm PDT (Friday) To: kolling cc: MBrown yes, you are right. but it would not be too hard to do it the other way, by having some (volatile) record of the previous committed update. --mark *start* 00385 00024 US Date: 11 May 1981 9:11 am PDT (Monday) From: MBrown.PA Subject: No Alpine meeting today To: Boggs, Kolling, Taft, Birrell, Levin, Schroeder cc: MBrown The subgroup that is to work on file system (as opposed to RPC) issues in depth has not made any progress yet. If we make significant progress before Friday, we'll try to call a meeting for Friday. --mark *start* 00386 00024 US Date: 15 May 1981 11:52 am PDT (Friday) From: Kolling.PA Subject: name To: mbrown cc: Kolling How about calling Get/SetLength Get/SetByteLength? Otherwise I will be forever confusing them with the (differently functioning) Get/SetLengths from Juniper. Or maybe Get/SetSize should be called something that makes it easier to not confuse with Get/SetLength.....
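A sketch of the undo-log structure in the question-and-answer exchange of 8-9 May above, in C with invented types (the code under discussion was Mesa, so everything here is illustrative only). It shows the object-to-log chain from the memo plus the volatile previous-committed-update record that Mark's reply suggests:

    /* Hypothetical types; not Alpine code. */
    typedef struct LogRecord {
        struct LogRecord *prev;      /* previous update to the same object */
        /* ... enough old-value information to undo this update ... */
    } LogRecord;

    typedef struct Object {
        LogRecord *newest;           /* most recent update; possibly uncommitted */
        LogRecord *newestCommitted;  /* volatile record of the last committed update */
    } Object;

    /* Undo pops log records until the most recently committed update is on
       top of the stack.  The volatile hint avoids searching back through a
       transaction's multiple writes to the same object, which was exactly
       Karen's objection. */
    void UndoUncommitted(Object *o) {
        while (o->newest != o->newestCommitted) {
            /* apply the undo information recorded in *o->newest here */
            o->newest = o->newest->prev;
        }
    }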
*start* 00977 00024 US Date: 15 May 1981 12:02 pm PDT (Friday) From: Kolling.PA Subject: name To: mbrown cc: Kolling I ran across your memo "Pilot development that would benefit alpine" when browsing thru alpine>doc. About the error logging: (1) I discussed this with Ed Taft et al at length when I upgraded Juniper's error handling. The belief is that soft errors are due to transient glitches, dust, etc., and that they aren't an indication that the page is any more likely to go bad than any other page. (2) However, a high frequency of soft errors on many pages indicates that the drive and pack are not aligned; it is important to catch that and have maintenance adjust the drive. Misalignment can even cause what look like hard errors, but which aren't really. (3) Generally speaking the number of hard bad pages on a T300 is no more than 3 or so ever, barring a major catastrophe of some sort. Pages do not in general become hard bad during system use. *start* 00646 00024 US Date: 15 May 1981 12:10 pm PDT (Friday) From: Kolling.PA Subject: more info To: mbrown cc: Kolling What Juniper does is log the hard and soft errors. This is most obvious to Ron Weaver during the every other day Copydisk, as that sweeps each whole pack, and he watches for it. Juniper will replace a page that goes hard bad with a new page. However, it deliberately takes the system down if it has seen more than ten new hard bad pages since it came up, on the theory that maintenance should be called. This count can be overridden if necessary. There is also a similar limit on the total number of hard bad pages. *start* 00363 00024 US Date: 15 May 1981 12:21 pm PDT (Friday) From: MBrown.PA Subject: Re: name In-reply-to: Your message of 15 May 1981 11:52 am PDT (Friday) To: Kolling cc: mbrown well, size is standard usage from pilot, so i'd rather not change it. but saying "byte" as part of "setlength" seems fine. watch for it in FileStoreDesign3.bravo ... --mark *start* 00501 00024 US Date: 19 May 1981 6:04 pm PDT (Tuesday) From: kolling.PA Subject: ROUGH draft To: taft cc: mbrown, kolling Ed, there is an EXTREMELY ROUGH draft of what we talked about on [Juniper]Alpine>FilePageMgr.bravo. It is not fit to be taken seriously yet, but since I will be on vacation until Friday, I thought I would give you a pointer to it in case you wanted it. Feel free to either red pencil all over it or just ignore it and I will work on it again Friday. Karen *start* 01608 00024 US Date: 24 May 1981 7:39 pm PDT (Sunday) From: MBrown.PA Subject: FilePageMgr To: Kolling, Taft cc: MBrown I just realized a basic problem with using mapped files to implement FilePageMgr. The problem is that if files are mapped in "chunks" of 4 pages (say), a 1 page write at the FilePageMgr interface must result in a 4 page write to the underlying file. But this write is not guaranteed to be atomic: it may fail part way through. If it fails while writing the page that the client actually wrote, then no problem: the log contains the necessary information. But if it fails while writing a page that is being written redundantly, then the only recovery is from a backup copy of the file. This is not acceptable. Of course, the contents of the entire chunk could be logged.
This seems like a very poor idea, both due to the volume of wasted logging and due to the lack of separation between the log manager and the file page manager (log manager now needs to be concerned with chunk alignment when writing, needs to predict when the first page of a chunk is written whether the rest will be, etc.) I conclude that we must restrict ourselves to single page swap units in the implementation of FilePageMgr, if mapped files are used. Since reading/writing runs is important for good performance, this is not a long-term solution, and some notion of "chunk" is probably still necessary in the interface. In the long run we'll have to find some way of getting Pilot to be more clever about doing only the I/O that we request, and still optimizing runs when possible. --mark *start* 01349 00024 US Date: 26 May 1981 7:21 pm PDT (Tuesday) From: Taft.PA Subject: Pilot implementation notes To: MBrown cc: Kolling, Taft Your notes are consistent with everything I know (which isn't much). I have the following amplifications and additions: 1. Yes, SimpleSpace.AllocateVM is used only during initialization, to set up the permanent spaces (Space.virtualMemory, Space.mds, etc.) The "real" VM allocator is buried inside Space.Create (in SpaceImplA). 2. Pinning a region definitely prevents it from being written out. Note that the unit of pinning is an entire region, not just a swap unit. (The semantics of this are rather curious. Calling Apply with operation=pin swaps in and pins a single specified swap unit, and also pins any other swap units in the region that happen already to be in, but does nothing to swap units that are out. However, SpecialSpace.MakeResident operates on an entire space and performs the pin operation on all swap units in the space, so the extra precision available from Apply is apparently inaccessible.) 3. If any page of a swap unit is dirty, the entire swap unit is written out. 4. A space that has swap units is never dead! I interpret this to mean that Space.Kill is a no-op when applied to a space with swap units. (See the comments at the beginning of CachedRegion.mesa.) Ed *start* 01411 00024 US Date: 27 May 1981 10:09 am PDT (Wednesday) From: Taft.PA Subject: Pilot implementation notes To: MBrown cc: Kolling, Taft The funny staggering of swap units you suggested for dealing with the existence of leader pages is not necessary. It's perfectly ok to map a one-page space into page zero of a file, and another (larger) space starting at page one. Suppose the second space has uniform swap units of 4; then the swap units will map to pages 1-4, 5-8, etc. (That is, there are no alignment restrictions.) The restriction on changing the length of a mapped file is real: there is simply no communication between the FileMgr and the VMMgr for maintaining consistency in this case. But this is not as onerous as it might seem, at least for extending a file. The restriction is that there may not be a mapped space extending beyond the old or new end of file. If we always create spaces of the right size, so as not to extend beyond end of file, then we are not violating this restriction. (After extending a file, we can of course create new space(s) to map onto the extension.) Shortening a file is somewhat more problematical. But I think we can simply outlaw the illegal case. That is, shortening a file causes all spaces mapped onto (or overlapping) that part of the file to be deleted; and if there are any outstanding references to those spaces, an error results.
Ed *start* 01901 00024 US Date: 27 May 1981 2:31 pm PDT (Wednesday) From: Taft.PA Subject: Pilot disk scheduler To: MBrown, Kolling --------------------------- Date: 27 May 1981 1:01 pm PDT (Wednesday) From: Taft.PA Subject: Disk scheduler removal To: Luniewski cc: Taft I see that the disk scheduler was removed from DiskDriverSharedImpl and replaced by a FIFO regime; your name is attached to this change. Was the scheduler removed because it didn't work, or was it simply to save resident memory on the Dandelion? (I'm working on the design of a high-performance Pilot-based file server, and I'm very concerned that absence of a disk scheduler will ruin performance.) Ed --------------------------- Date: 27 May 1981 1:49 pm PDT (Wednesday) From: Luniewski.PA Subject: Re: Disk scheduler removal In-reply-to: Your message of 27 May 1981 1:01 pm PDT (Wednesday) To: Taft cc: Luniewski The old elevator algorithm was removed for two reasons. First, removing it saved a noticeable amount of memory which was very important for meeting our Rubicon Pilot working set goals. Second, Bob Ladner's measurements had indicated that the algorithm probably was not buying us anything - the speed of the disk and the compute time involved in handling a page fault were such that there was little idle time while a page was transferring. That is, if a client process ran while a disk transfer was in progress, Pilot would not complete the computation involved in handling the page fault until after the first transfer was complete. Thus, there tended to be only one disk request outstanding at any one time. Whether this argument applies on the Dorado, which I presume your file server will run on, is an open question as far as I know. (To the best of my knowledge, the code that was there did work before I commented it out.) /Allen ------------------------------------------------------------ *start* 00757 00024 US Date: 28 May 1981 12:15 pm PDT (Thursday) From: Kolling.PA Subject: comments on memo To: Kolling These are some notes I gave to mark: Do we want to mention the high water mark in the implementation section? About the pin/unpin calls, if the whole chunk is pinned/unpinned, is there any guarantee that the writes of the pages in that chunk happen in the correct order after unpinning? What about the size of chunks (i.e., fixed or variable), is that decided or punted for now? It should perhaps be mentioned as an issue. About the one page USU causing demand paging to happen one page at a time: are both ReadAhead and ReadPages going to do an Activate? So demand paging will only happen if Pilot has swapped some pages out? *start* 00731 00024 US Date: 11 June 1981 5:41 pm PDT (Thursday) From: kolling.PA Subject: two things To: MBrown cc: Kolling I've forgotten how locks work if the same transaction requests more than one lock on the same object. Will a write lock block a read lock? Will it block other write locks? Is there some other type of lock (exclusive?) that will block all other requests, including those from the same transaction? Otherwise I have to do a few special case checks to prevent a given transaction from making life difficult (allocation tries after remove, etc.) This is not hard, I just want to know if I should note it in the memo. Don't forget: your section of the Activity Report is due in about 5 days.
Karen *start* 01046 00024 US Date: 12 June 1981 9:06 am PDT (Friday) From: MBrown.PA Subject: Re: two things In-reply-to: Your message of 11 June 1981 5:41 pm PDT (Thursday) To: kolling cc: MBrown If t requests a Read lock on e, is granted, then requests a Write lock on e, the Read lock is converted to a Write lock. If t then requests a Write lock on e, it will be granted immediately (since it already has it.) If this isn't what you want, you'll have to special-case. As you are discovering, the lock manager is not the solution to all synchronization problems; it is not unusual for clients of the lock manager to record the locks they hold in their own data structures, too, both to handle the sort of situation you just mentioned and to avoid lots of re-requests. E.g. on reading the first page of a file, we'll probably have to get some sort of lock on the whole file; thereafter we can just check in some data structure associated with the file to make sure we have a strong enough lock for the page access we are about to do. --mark *start* 00481 00024 US Date: 30 June 1981 2:58 pm PDT (Tuesday) From: MBrown.PA Subject: Alpine / RPC meeting To: Birrell, Boggs, Levin, Kolling, Nelson, Schroeder, Taft cc: MBrown Let's have a meeting tomorrow, after Dealer, to share status on RPC and Alpine. I will sketch the Alpine design choices that Karen, Ed, and I have made since we last held a large-group meeting. Andrew/David/Bruce will describe their current thinking on RPC aspiration level and design. --mark *start* 05956 00024 US Date: 30 June 1981 4:10 pm PDT (Tuesday) From: Nelson.PA Subject: RPC advance info To: Taft, Kolling cc: Birrell Ed and Karen, To speed up the meeting tomorrow, I suggest you read through the following memo on how we propose to handle parameters. Mark has suggested that Ed, in particular, may have some objections. If he feels up to it, Andrew will also send you two a companion memo on how a typical remote call is handled. This should save even more hand waving tomorrow. Bruce -- Messages from file: [PARC-MAXC]RPC.;2 -- TUESDAY, JUNE 30, 1981 16:09:12-PDT -- Date: 29 June 1981 4:08 pm PDT (Monday) Sender: Nelson.PA From: Nelson, Birrell.pa Subject: Parameter Passing in Cedar RPC To: Satterthwaite, Morris, Horning, Rovner, Levin, Mitchell, MBrown cc: Nelson, Birrell, Boggs This memo describes a language change designed to allow Cedar RPC to support reasonably flexible parameter passing. The details of the modification may, of course, change when reviewed by the Cedar language committee. This memo is circulated now because, lacking a better idea, the absence of the modification will be a bottleneck in 4-6 weeks. Fairly prompt action on this proposal is necessary. We have been designing a remote procedure call mechanism for Cedar. Our emphasis is not on parameter functionality, but even the bare minimum we find acceptable requires a slight change to Mesa. The area that causes problems is when a client wants to pass large amounts of data. The typical client we're thinking about is Alpine, passing the contents of runs of disk pages. The two procedures that are interesting to consider are "ReadPages" and "WritePages". Both deal in runs of pages. Mark (reasonably) wants to have the data delivered by ReadPages end up in storage that Alpine provided (so that he can arrange contiguity, etc).
That, and consideration of the efficiency of large argument/result records, makes it unreasonable for his data to be passed by value in the "obvious" way, such as having the declaration: ReadPages: PROC[file: FileID, startPage: PageID] RETURNS[buffer: DESCRIPTOR FOR ARRAY OF WORD] If that was the only way he could pass the data, Mark would edit the automatically generated stubs to do something different. The same is true if we allowed him to return a pointer/reference (and we allocated collectable storage in the user stub). We believe that it's not reasonable to build a large system if you need to hand-edit the stubs, and in any case it seems unreasonable that our first client should find the RPC facilities inadequate. In a local interface, the "natural" way for ReadPages to be declared would be: ReadPages: PROC [file: File, startPage: PageID, buffer: DESCRIPTOR FOR ARRAY OF WORD] In the RPC world, we could implement the effect wanted by allocating storage for "buffer" in the server's RPC stub, calling the server's ReadPages procedure, then copying the data back into the client machine's "buffer" storage. Analogously in the WritePages case, we'd copy the data from the user's storage into the server at call time. VAR parameters have some of the properties we'd want, but they would force us to copy at call time and at return time, whether or not it's needed. We can achieve the control we want by having qualifiers "VALUE" and "RESULT" for address-containing argument types. (We might need different keywords, though.) For example, in ReadPages, the buffer might be declared buffer: RESULT DESCRIPTOR FOR ARRAY OF WORD; for WritePages, it would be buffer: VALUE DESCRIPTOR FOR ARRAY OF WORD. For ReadPages, the RESULT specification forces call-by-result (as in AlgolW). Thus the client stub sends only the length of buffer as an argument, NOT the array itself. The server stub allocates this much storage, calls the real ReadPages to fill it, and sends back the entire buffer contents as a result. The client stub copies this result buffer into the client caller's buffer. Thus RESULT means do very little on the call, but copy back on the return. For WritePages, the VALUE specification forces call-by-value of the array as well as the descriptor. The client stub sends the entire buffer and the server stub allocates space for it and receives the entire buffer. This buffer is passed directly to the real WritePages. There are no results in this case. If a parameter says "VALUE RESULT", we copy the data in both directions. The fourth combination, when we copy the data in neither direction, corresponds to the use of the address as a "handle" (which is outside the safe language, but still useful). You can also achieve the effect of "handles" by using exported types, which is perfectly safe and doesn't cause RPC any problems. We need to consider the meaning when we have neither VALUE nor RESULT. We propose that that case means "VALUE RESULT". The argument for this default is mainly because that's in the safe language, whereas the "handle" semantics are unsafe. The exact details of syntax do not concern us. What we need, as RPC implementors, is a symbol table indication, available through RTTypes, of whether a parameter record component is VALUE, RESULT, or both, or neither. Right now, we are quite willing to restrict this info to the top level, as in AlgolW. We intend that VALUE and RESULT can be used to qualify any address-containing argument, including REFs, (LONG) POINTERs, (LONG) DESCRIPTORs, and (LONG) STRINGs.
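To make the data flow concrete, here is a sketch in C of what stubs generated from these qualifiers might do. SendCall, SendData, and AwaitResult are invented stand-ins for the RPC runtime, and the real stubs would be machine-generated Mesa, so treat this as illustration only:

    #include <stddef.h>

    /* Hypothetical runtime hooks, not a real interface. */
    void SendCall(const void *args, size_t len);   /* marshal into the call packet */
    void SendData(const void *data, size_t len);   /* append bulk data to the call */
    void AwaitResult(void *results, size_t len);   /* wait, then copy results back */

    /* buffer is RESULT: the call carries only the length; the server stub
       allocates that much storage, and the contents travel back with the
       return, to be copied into the caller's own storage. */
    void ReadPagesStub(int file, int startPage, char *buffer, size_t len) {
        struct { int file, startPage; size_t len; } args = { file, startPage, len };
        SendCall(&args, sizeof args);   /* buffer contents NOT sent */
        AwaitResult(buffer, len);       /* copy back on the return */
    }

    /* buffer is VALUE: the entire buffer travels with the call, and nothing
       is copied back on the return. */
    void WritePagesStub(int file, int startPage, const char *buffer, size_t len) {
        struct { int file, startPage; size_t len; } args = { file, startPage, len };
        SendCall(&args, sizeof args);
        SendData(buffer, len);          /* the whole buffer goes out with the call */
        AwaitResult(NULL, 0);           /* nothing comes back but the return itself */
    }

A VALUE RESULT parameter would do both copies; under the proposed default, an unqualified parameter behaves the same way.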
We imagine that the compiler would ignore VALUE and RESULT after pass three, at least for the time being. They would serve only as pragmats for RPC. Your comments and suggestions are welcomed. Please bear in mind that Cedar RPC is a modest project with emphasis on performance, rather than functionality, goals. The very pragmatic decision to use stubs is our opening declaration to this effect. No realistic comments will go unconsidered. Bruce and Andrew *start* 08146 00024 US Date: 1 July 1981 11:19 am PDT (Wednesday) From: Birrell.pa Subject: Background reading To: Levin, Kolling, Schroeder, Taft, MBrown cc: Nelson, Boggs This is a highly informal description of how we think simple RPC's go. It might help if you read it before tomorrow. Andrew --------------------------- Date: 29 June 1981 2:03 pm PDT (Monday) From: Birrell.pa Subject: Simple RPC's To: Boggs, Nelson cc: Birrell This is intended to describe our present beliefs about how we should implement the fast cases of RPC. I think I haven't introduced anything we haven't discussed, but I might have. It's a long message. The simplest and fastest special case of RPC is where the call and result will each fit into a single packet, and where the user machine is on the same net as the server machine. I believe that everyone is now convinced that the correct way to build RPC is with "stubs". We believe the stubs should be automatically generated from the DEF's file, producing Mesa source which is compiled into a user-stub BCD and a server-stub BCD. The stubs are responsible primarily for packing and unpacking arguments and results into packet buffers. Separate software, the RPC-runtime package, takes these packets and implements the network protocol. The network protocol provides once-only call semantics in the absence of machine crashes. The sequence of events in the normal case of the simple case is as follows. The user calls a procedure imported from the appropriate interface. The interface was exported by the user-stub module, so this calls the appropriate procedure in the stub. The user-stub procedure allocates a packet buffer (in its local frame), and marshals the arguments into it. At bind-time, the user-stub obtained the network address and MDS base of the corresponding server-stub and an "index" hint (used by the server to validate procedure descriptors); [network-address, MDS, procedure descriptor, index] represents the remote procedure we want to call, and the user-stub places that in the packet. The user-stub then calls the RPC-User-runtime to transmit the packet. The runtime allocates a new serial number and places it in the packet, and places the PSB of the current process in the packet, and encapsulates the packet as an ethernet PUP packet, with PUP checksum. This procedure of the runtime has a condition variable in its local frame, which it registers in an array of condition variables (indexed by PSB) known to the microcode (see the handling of results). It then passes the packet to the microcode for transmission. This process then sits in the RPC-User-runtime, WAITing on that condition variable, expecting the result packet. If the result doesn't come, it retransmits the packet (marked to say that an ACK is wanted); eventually the call can be timed out. If an ACK comes (with correct serial number, of course), this process moves to a state where it's waiting on the condition variable expecting the result, but not retransmitting (still with possible timeout).
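In outline, the caller side just described is a retransmit-until-acknowledged loop. A C sketch, with invented names standing in for the condition-variable and packet machinery (the timeout and retry limits are illustrative, not from the memo):

    #define RETRY_MILLIS 100
    #define MAX_TRIES    20

    typedef enum { NOTHING, GOT_ACK, GOT_RESULT } Arrival;
    Arrival WaitOnCondition(int millis);   /* stand-in for WAITing on the CV */
    void Retransmit(int ackWanted);        /* resend the call packet */

    int AwaitCallCompletion(void) {
        int acked = 0, tries = 0;
        for (;;) {
            switch (WaitOnCondition(RETRY_MILLIS)) {
            case GOT_RESULT:
                return 1;                      /* the result packet arrived */
            case GOT_ACK:
                acked = 1;                     /* server has the call; keep */
                break;                         /* waiting, stop retransmitting */
            case NOTHING:
                if (++tries > MAX_TRIES) return 0;   /* time the call out */
                if (!acked) Retransmit(1 /* mark: ACK wanted */);
                break;
            }
        }
    }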
When an RPC-call packet comes in to a server machine, the microcode recognizes it and handles it specially. In each MDS, there is a fixed condition variable, "rpcCond", known to the microcode. There is a stock of RPC-server processes, whose idle state is waiting on rpcCond. The microcode places the packet in a known place and notifies rpcCond (in the appropriate MDS). One process waiting on rpcCond wakes up and takes the packet and checks the checksum. It takes the source host address and serial number, and looks them up in a table. This table contains the last serial number received from that [host,PSB], so the RPC-server process can eliminate duplicate packets. If the packet requests an Ack, it sends one. (The packet may also ack an earlier result packet - see later). The RPC-server process takes the [procedure descriptor, index] from the packet and indexes a table to verify that this procedure descriptor is reasonable to be called from other machines. (Otherwise we never have any hope of offering secure servers.) Finally the RPC-server process calls the given procedure descriptor, giving it the packet. This call gets us into the appropriate server-stub procedure. This is in the server-stub module, and imports the appropriate server-procedure. The server-stub unmarshals the parameters (into a state vector, probably, doing the right things with long argument records), frees the packet, then calls the server-procedure. The server procedure does its thing, then returns to the server-stub. The server-stub allocates a result packet buffer (in its local frame) and marshals the results into it. It then transfers to its caller (the RPC-server idle-loop procedure) without freeing its own frame (using TRANSFER-WITH-STATE). The RPC-server takes the packet, puts the caller's serial number and PSB into it, encapsulates it, and gets the microcode to transmit it. Before doing so, it registers a local condition variable in a "Waiting-for-ack" array, together with the calling [host,PSB]. After sending the result packet, the RPC-server process WAITs on its condition variable, hoping for an acknowledgement. After a timeout, the RPC-server retransmits the result (marked to say that an ACK is wanted). An acknowledgement can happen in two ways: after retransmission of the result, the caller may send an explicit ACK; alternatively, the caller may start a new call, which counts as an implicit ack. To notice these, the microcode treats a result-ACK the same as a call packet: if there is a process waiting for an ack from the originating [host,PSB], the microcode notifies that process instead of the idle ones on rpcCond. When the rpc-server process receives its acknowledgement, it frees the server-stub's frame (and therefore the packet buffer), de-registers its condition variable, then goes idle by waiting on rpcCond (but if the ack was implicit by an incoming call packet, it takes that packet and works on the new call). When the result packet comes in to the caller's machine, the microcode looks at the array of waiting caller PSB's. If one of them is waiting for this result (as indicated by the [PSB] in the result packet), it notifies that process. If no one is waiting, it notifies a fixed process (this happens if result packets get duplicated or retransmitted, or if result acks get lost). If the fixed process gets a packet, it sends an ack if requested, then frees the packet.
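The duplicate elimination above needs only the last serial number accepted per [host,PSB]. A C sketch of that connection table, with invented sizes, no eviction policy, and serial-number wraparound ignored:

    #include <stdint.h>

    #define CONNS 64   /* illustrative size only */

    typedef struct {
        int inUse;
        uint16_t host, psb;
        uint32_t lastSerial;   /* last serial number accepted from [host,psb] */
    } ConnEntry;
    static ConnEntry conns[CONNS];

    /* Returns nonzero if this call packet is new and should be run, zero if
       it is a duplicate (or stale) and should be dropped. */
    int IsNewCall(uint16_t host, uint16_t psb, uint32_t serial) {
        int freeSlot = -1;
        for (int i = 0; i < CONNS; i++) {
            if (conns[i].inUse && conns[i].host == host && conns[i].psb == psb) {
                if (serial <= conns[i].lastSerial) return 0;   /* seen it before */
                conns[i].lastSerial = serial;
                return 1;
            }
            if (!conns[i].inUse && freeSlot < 0) freeSlot = i;
        }
        if (freeSlot >= 0)   /* first call from this [host,PSB] */
            conns[freeSlot] = (ConnEntry){ 1, host, psb, serial };
        return 1;
    }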
When the caller process (which was WAITing in rpc-user-runtime) gets a packet, it checks the checksum, sends an ACK if requested, and checks the serial number (for old duplicate packets). When it gets the correct result packet, it returns that packet to the user-stub. The user-stub unmarshals the results, frees the packet, and returns to the user! For the "normal" case, this uses four procedure calls and two process switches. There is one packet in each direction. There is an additional packet and an additional process switch if we need explicit acking. Buffers are allocated from local frames. The data structures used are as follows.
user-stub: [host, PSB, proc.desc., index] for each procedure
user-runtime:
  last-serial-number
  ARRAY PSB OF CONDITION (for outstanding calls)
  process for handling stray packets
server-runtime:
  array of [host, PSB, serial-number] (= connections)
  rpcCond, and queue of idle rpc-server processes
  array of acceptable procedure descriptors
  array of CONDITION, for un-acked results
server-stub: none.
The only extra complexity for the non-local network case is getting the routing info into outgoing packets. The multi-packet case needs more protocol design. The above can be used for single-machine inter-MDS calls, by short-circuiting at packet transmit/receive time. In that case, a packet can always be assumed to have been acked. Chuck Thacker has gently suggested that I give a Dealer on the progress of RPC. Next week sounds like a good idea, and this week is possible if we agree about the above. Andrew ------------------------------------------------------------ *start* 00266 00024 US Date: 7 July 1981 4:46 pm PDT (Tuesday) From: kolling.PA Subject: My kingdom for a JFFO. To: Taft cc: kolling Do you have any idea what the maximum number of groups is that anyone (rational) has specified for an access field on Ivy? Karen *start* 00198 00024 US Date: 7 July 1981 5:06 pm PDT (Tuesday) From: kolling.PA Subject: Also, To: Taft cc: kolling Are 62 definable groups enough for Ivy, or do you really need more? Karen *start* 01133 00024 US Date: 13 July 1981 4:45 pm PDT (Monday) From: kolling.PA Subject: defaults To: MBrown, Taft cc: Kolling How important do you think it is that each Owner have associated with it a (user-settable) set of defaults to use for the file access lists on file creation, as opposed to just defaulting to:
file read: owner and Alpine wheels
file modify: owner
file read and set these file access lists: owner and Alpine wheels
Presumably this owner-customized default would only be of use to ftp/chat-type clients, since the UFS and db managers would not use Alpine defaults (I guess) but rather maintain their own. The reason for omitting it is that doing so removes all connection between the owner database and the file properties database, and if it is included then two places (the file properties and also the owner data routines) have to know about stuff like what the irreducible minimum is that a file access list can be set to (ex: for read and set access lists always the owner and Alpine wheels, for modify no one, etc.). I am inclined to omit the owner-defaults in the name of simplicity. Karen *start* 00264 00024 US Date: 13 July 1981 5:45 pm PDT (Monday) From: MBrown.PA Subject: Re: defaults In-reply-to: kolling's message of 13 July 1981 4:45 pm PDT (Monday) To: kolling cc: MBrown, Taft Yes, but I would default the file read list to World.
--mark *start* 00952 00024 US Date: 13 July 1981 7:28 pm PDT (Monday) From: Taft.PA Subject: Re: defaults In-reply-to: kolling's message of 13 July 1981 4:45 pm PDT (Monday) To: kolling cc: MBrown, Taft I think it's essential that defaults be settable on a per-owner (or per-something) basis. Using the "owner" = "directory" analogy, I should point out that the ability to set defaults on a per-directory basis in IFS and Tenex is heavily used and extremely important. (Indeed, proper setting of the defaults is the reason that most users never need to directly manipulate protections at all -- a good thing, since most users don't understand how protections work anyway.) In any event, I don't see why there has to be any explicit connection between the Owner and FileProperties data bases, other than the ability for the Owner data base to be able to store uninterpreted entries whose contents are defined by the code implementing FileProperties. Ed *start* 01544 00024 US Date: 14 July 1981 9:18 am PDT (Tuesday) From: MBrown.PA Subject: Re: defaults In-reply-to: Taft's message of 13 July 1981 7:28 pm PDT (Monday) To: Taft cc: kolling, MBrown Your point about defaults seems correct, but may not imply that Alpine should implement the client-settable defaults itself. For me, a telling consideration is that an owner-based access control default is not logically associated with a particular FileStore, but is logically associated with the owner and hence updates to it should probably be coordinated across all FileStores. It is only a matter of time until the distributed directory system, file replication machinery, and other pieces of a universal file system come along and want to implement their own access control policies, and Alpine files will all have access control lists granting the "UFS", and no real individuals, access. I would like to avoid putting much effort into something that we hope to make obsolete so quickly. I am in favor of low aspirations for Alpine's internal databases in the near term. This means the simplest possible owner database (getting control of space allocation is essential) plus storing essential file properties (including some small number of RNames for access control) in the leader page(s). In the longer term, we should have (1) a more viable Cedar database facility, and (2) support from Pilot for a variable number of "file property" pages; either or both of these could change the cost of doing something more ambitious. --mark *start* 00295 00024 US Date: 15 July 1981 3:24 pm PDT (Wednesday) From: kolling.PA Subject: Access control To: MBrown, Taft cc: kolling I'm about to leave a new version of the access control memo in your mailboxes. This is what I will be coding from, unless I hear screeches of protest. KK *start* 04074 00024 US Date: 16 July 1981 5:33 pm PDT (Thursday) From: kolling.PA Subject: comments please To: taft, mbrown cc: kolling Two topics for your comments, please, or maybe we should have a short discussion: (1) My model of "FileStore", FileProperties, and AccessControl has been basically as follows, where by "FileStore" I mean some upper layer of the FileStore implementation. Two factors I considered here were isolating the FileProperties handling so it can easily be changed when Pilot file properties really exist, and isolating the knowledge of the structure of "access lists" in AccessControl itself: FileProperties: this simple module's function in life is to read and write file properties.
I don't think it knows anything about the internal structure of a file's access lists. That is, it knows xxxx is a file property access list, but it doesn't know how to decode it into RNames or anything. It should be written with an eye to probably replacing it completely when Pilot file properties really exist. AccessControl: (In addition to keeping track of disk space), this module expects questions of the form: can client foo do bar to such and such a file? Therefore it will be making read requests to the FileProperties module to acquire access lists, which it will then peruse since it knows their internals. FileStore: this is a high level fellow. It must ask AccessControl questions about permissions, as it shouldn't be munging around decoding access lists, etc. itself. It will stash file permissions it has received in its open file table. Upon client requests to read or write file access lists, FileStore asks AccessControl `okay for this client to set this file's access lists?' and then calls FileProperties to do the actual read or write. If I understand correctly, Mark, in contrast, thinks that the internal structure of a file access list should be known to FileStore, and its calls to AccessControl should be of the form, is client foo in RName bar (or is the client the owner, etc.). AccessControl never needs to call FileProperties directly. Code that was in AccessControl in my proposed implementation would live in FileStore in this one. Note that, in either case, AccessControl will be decoding access lists for owner data. I would like to reserve the right to have an owner access list be different in form from a file access list, but they will certainly be very similar. (2) Permissions: The permissions I suggested were as follows:
owner lists:
  permission to allocate and deallocate space for this owner name.
  permission to read and set these access lists.
file lists:
  permission to read the file.
  permission to modify (write over) the file.
  permission to read and set these access lists.
Mark's suggested ones are:
owner lists:
  permission to create or delete files for this owner name.
  permission to read and set these access lists.
file lists:
  permission to read the file.
  permission to modify (including extend and truncate but not delete) the file.
  permission to read and set these access lists.
Under the old permissions, the following checks would take place:
  open: file access of appropriate type.
  create: owner allocate.
  delete: file modify and owner allocate.
  set size: (cached) owner create and check space available.
Under the new permissions, the following checks would take place:
  open: file access of appropriate type.
  create: owner create.
  delete: file modify and owner create.
  set size: just check space available.
The old permissions have the disadvantage of a slight amount of extra code to keep a tiny cache of creators; the new permissions mean that giving someone write access to a file lets them use up your space. On the other hand, you may want people to be able to append to a file but not create new files......It is unclear to me which of these sets is better. With either set of permissions, we think the default for "file modify" permission would not be owner, as I suggested, but rather creator (or creator plus owner?).
Karen *start* 02104 00024 US Date: 17 July 1981 9:49 am PDT (Friday) From: MBrown.PA Subject: Re: comments please In-reply-to: kolling's message of 16 July 1981 5:33 pm PDT (Thursday) To: kolling cc: taft, mbrown (1) The "FileStore" box already deals with RName values (we have been calling them ownerIDs and clientIDs.) All that I was suggesting is that we define another type, perhaps called AccessList, whose implementation might be a record containing three bits for the three special groups "world", "owner", and "alpineWheels", plus a sequence of RNames. Only AccessControl knows the internal structure of this list. Assume that FileProperties is capable of associating such AccessLists with a file (with restrictions on the length of the RName sequences), and retrieving the AccessLists associated with a file. Assume that AccessControl is capable of answering a query of the form "is this RName contained in this AccessList?" Then in the FileStore implementation of OpenFile, say, we first call FileProperties to get the proper AccessList, then call AccessControl to check containment. In CreateFile, we first call the Owner database to get the AccessList, then call AccessControl to make the check. There is a small flaw in the above: the "owner" bit of a file access list can only be interpreted in the context of a file. Maybe this argues for the other organization, in which AccessControl calls FileProperties. I don't see a strong motive for making an owner access list differ from a file access list, and in the absence of one I would incline to making them the same. (2) I don't feel very strongly either way on this issue. My prejudices for our environment generally favor greater freedom, hence I prefer a broader interpretation of "file modify" (to include the ability to allocate space). I am not sure whether file delete should require owner create, file modify, or both. My suggested "file modify" default is the "owner create" list for the owner of the file (at the time the file is created.) This would always include the creator, but might include others. --mark *start* 02287 00024 US Date: 17 July 1981 12:43 pm PDT (Friday) From: Kolling.PA Subject: Re: comments/ Ed, note esp. point 4. In-reply-to: MBrown's message of 17 July 1981 9:49 am PDT (Friday) To: MBrown cc: kolling, taft (1) Oh, I thought you were planning to have FileStore pull the access lists apart into their group RNames, owner fields, etc. before presenting them to AccessControl. The residual question is, should it be FileStore or AccessControl who asks FileProperties for the access lists. Since I don't know what you plan for the internals of FileStore I have no opinion on this, so let me know what you decide, so I can start coding AccessControl. As you pointed out, if FileStore will be doing the work, AccessControl does need a way to ask it "who is the owner for this thing?" (2) The problem with using the "owner create" list for the "file modify" default is that I am trying to keep the owner stuff separate from the file stuff. That's why I proposed not storing defaults for file access lists in the owner database. If we have to go snarfing into the owner database for file defaults, we might as well put in real defaults, which we can do by increasing the size of the owner records to two pages. An alternative is to interpret the owner bit to mean both the ownerName and then the ownerCreate list.
This does mean that if you change the owner create lists, files with owner bits set will now be accessible to different people than they were before. This also means that AccessControl's "who is the owner for this thing?" call to FileStore needs to be prepared to supply all this data. (3) My hesitation about having the access lists have the same form for files as for owners is that clients can call procedures to change these access lists. For files, they can change owner, world, etc. but for the owner data they cannot. World is never set for owner lists, and owner and Alpine wheels always are set. So we give them an argument which they can ask us to change but then we tell them they can never change it.....We could cart the same form of list around internally, of course, in case we later want to change our minds about setting world, etc. (4) Ed, what's your choice about the two proposed protection schemes in yesterday's message? KK *start* 00229 00024 US Date: 20 July 1981 4:54 pm PDT (Monday) From: Kolling.PA Subject: directory names? To: taft cc: Kolling Somewhere is there a list of the directory names on Ivy (other than by editing an ftp log)? Karen *start* 00272 00024 US Date: 20 July 1981 5:01 pm PDT (Monday) From: Taft.PA Subject: Re: directory names? In-reply-to: Your message of 20 July 1981 4:54 pm PDT (Monday) To: Kolling "List <*>!1" will give you the closest approximation to a list of directory names. Ed *start* 00710 00024 US Date: 22 July 1981 8:14 pm PDT (Wednesday) From: Kolling.PA Subject: Alpine legal archiving To: Boggs, Taft, Birrell, Levin, Schroeder, MBrown cc: kolling Any files put in the [Ivy]<Alpine> directory will be automatically put into the legal archives by elves. If from time to time you have info about Alpine which should be archived but which you would not normally preserve for posterity by storing it on <Alpine> (such as a mail file, for example), either store it on <Alpine>JustForArchiving> or send me a pointer to the file and I will cause it to be archived. Files on <Alpine>JustForArchiving> may be deleted after archiving. Any file format is acceptable. Thanks, Karen *start* 00739 00024 US Date: 22 July 1981 12:47 pm PDT (Wednesday) From: kolling.PA Subject: decision revisited To: MBrown cc: Kolling We decided a day or so ago that since we needed to go into the owner data base to get the owner create-access-list (to use to set the default who-can-modify-this-file-list on file creation), we might as well keep a real set of default access lists for each owner in the owner data base. How about instead forcing the CreateFile call to supply the access lists for the file? This is consistent with expecting a future distributed client to be keeping the defaults and in line with keeping things simple. It is already sort of specified that way in the most recent version of the FileStore memo. KK *start* 02672 00024 US Date: 23 July 1981 2:35 pm PDT (Thursday) From: Nelson.PA Subject: FYI--RPC interfaces, version 2 To: MBrown, Kolling, Taft cc: Birrell, Nelson, Boggs Andrew and I have settled on this for the (short) time being. Most of the private RPC interface is omitted; the client interface that you will use is given second. The import, export, and encryption routines you have to invoke explicitly, of course, with the values you want. But once this is done, calls through the SOURCE interface are remote, and transparent (except for RPCErrors). Comments welcome, if you have any.
Andrew has backed off considerably from his position that multiple interface instances be supported by records of procedures (instead of interface records). Instead, we now favor the original handle approach--there are a static number of interface instances, multiplexed (if desired) by client-specified handles. See Andrew for more info on this. The dust has not settled. -- Components of the private RPC runtime interface, probably to be called "RPC". -- TYPES. RemoteInterface: TYPE = RECORD [ type: STRING _ NIL, --Default is interface name, e.g., "FTP". instance: STRING _ NIL, -- Default is a UID; a client e.g. is "Ivy". version: VersionRange _ AnyVersion ]; VersionRange: TYPE = RECORD [ -- A closed interval, client specified. firstAcceptable, lastAcceptable: CARDINAL]; AnyVersion: VersionRange = [0,0]; -- Distinguished value that means use any. -- These next items are all a part of the authentication package, not RPC. ConversationID: TYPE = []; --On a per-principal-pair basis. EncryptionKey: TYPE = []; --Not yet specified. RPCError: ERROR [type: Problem]; Problem: TYPE = {[], spare1, spare2, spare3}; -- Components of the user-visible RPC interface. For each user interface SOURCE, the stub generator creates three files: SourceRPC The types and control procs are shown below; SourceRPCClient Client stubs; exports both Source and SourceRPC; SourceRPCServer Server stubs; exports part of SourceRPC. -- This is a SourceRPC interface. -- TYPES. RemoteInterface: TYPE = RPC.RemoteInterface; VersionRange: TYPE = RPC.VersionRange; ConversationID: TYPE = RPC.ConversationID; EncryptionKey: TYPE = RPC.EncryptionKey _ RPC.ClearKey; -- PROCEDURES. RPCError: ERROR = RPC.RPCError; Problem: TYPE = RPC.Problem; ExportInterface: PROC [interface: RemoteInterface _ [] ]; UnExportInterface: PROC; ImportInterface: PROC [interface: RemoteInterface _ [] ]; UnImportInterface: PROC; SetEncryptionKey: PROC [conversation: ConversationID, key: EncryptionKey]; ClearEncryptionKey: PROC [conversation: ConversationID]; *start* 00656 00024 US Date: 23 July 1981 7:04 pm PDT (Thursday) From: Kolling.PA Subject: hashing To: mbrown cc: kolling I'm going to get the hash code by using your suggestion of packing the low 5 bits of each character three to a word and xoring them, with a file 4*the data size. This gives an exceedingly low (1.15) ratio of names per occurring codes and more importantly the pathological cases are only: maximum of names/codes = 3 and only a handful of them showed up in the 500 names. Making the file 2* the data size increased the maximum to 4, but also the number of cases shot up quite a lot. Disk space is cheap compared to disk reads. KK *start* 00368 00024 US Date: 23 July 1981 12:37 pm PDT (Thursday) From: Kolling.PA Subject: legal archiving To: Anderson cc: Kolling Be sure to read Martin's memo about legal archiving. Sometime in the next week or so, we will be dumping on you a whole nest of memos from Alpine to be put into the archives (there's no hurry about getting it done, though). Karen *start* 00410 00024 US Date: 23 July 1981 2:54 pm PDT (Thursday) From: Kolling.PA Subject: CRSSCode.bravo To: MBrown cc: Kolling There's a file called CRSSCode.bravo on Alpine, containing transaction-oriented code, with your name on it and Butler's. Perhaps an implementation of the Howard-Butler stuff? I couldn't tell from glancing briefly at it if it should be archived for Alpine. Should it be? KK
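Referring back to Nelson's SourceRPC listing above: a hypothetical sketch of the client-side calling sequence. The procedure and type names come from that listing; the sequence itself, the DoIt call, and the conversation/key values are assumptions, not confirmed usage:

-- Import a remote instance of Source; version defaults to AnyVersion.
SourceRPC.ImportInterface[interface: [type: "Source", instance: "Ivy"]];
-- Encryption is explicit; myConversation and myKey are assumed to come from the
-- authentication package.
SourceRPC.SetEncryptionKey[conversation: myConversation, key: myKey];
-- Calls through SOURCE are now remote and transparent, except that RPCError may be raised:
Source.DoIt[ ! SourceRPC.RPCError => CONTINUE];  -- DoIt is a hypothetical procedure of SOURCE
SourceRPC.ClearEncryptionKey[conversation: myConversation];
SourceRPC.UnImportInterface[];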
*start* 01232 00024 US Date: 24 July 1981 2:25 pm PDT (Friday) From: Nelson.PA To: Kolling Sub: Forgot you Subject: Multiple RPC exporters To: (Alpine:) MBrown, Taft To: (Other:) Rovner, Satterthwaite, Schmidt, Levin cc: Birrell, Nelson Mesa permits the implementation of an interface to come from multiple exporters, each of which supplies components of the interface. Andrew (and I) have considered this situation in our proposed RPC world. Our conclusion is that restricting interfaces to be exported entirely from one host is a significant reduction in complexity, primarily for the RPC binder, but also for the stubs themselves. While we believe we can do multiple exporters at the cost of storage in stubs and hairer binding, we don't want to and are hereby sounding you out on this decision. There are basically three, not two, possible responses: 1. Single exporter is fine. No problem. 2. Single exporter OK for now, but multiple are probably needed down the road. Do single now, but leave the hooks needed for easy impl. of multiple (say, .5-1.0 man month of extra work). 3. Multiple exporters wanted from the start, even if it takes a lot longer. Other comments welcome too. Bruce and Andrew *start* 00440 00024 US Date: 24 July 1981 10:37 am PDT (Friday) From: MBrown.PA Subject: Re: hashing In-reply-to: Your message of 23 July 1981 7:04 pm PDT (Thursday) To: Kolling cc: mbrown sounds good. did you use a prime table size? did you compute the average number of probes per entry? I have a few more suggestions about hashing, e.g. how to do collision resolution; let's try to talk today before I leave (around 1pm). --mark *start* 00706 00024 US Date: 24 July 1981 11:49 am PDT (Friday) From: Kolling.PA Subject: Re: hashing In-reply-to: Your message of 24 July 1981 10:37 am PDT (Friday) To: MBrown cc: Kolling prime table size: yes. average number of probes per entry: I didn't quite do this. The average of the names that hashed to the same code was 1.15. The max was 3, and this only happened for (as I recall) something like 5 codes out of the possible 1999. Since the table is 4*data and the cases didn't cluster these numbers seemed like good handles on the average and worst cases. collision resolution: I was just going to search linearly. I realize this will slowly degrade the table. Is there a better way?
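A sketch of the hash computation discussed in the exchange above: pack the low 5 bits of each character three to a 16-bit word, XOR the words, and reduce mod a prime table size (1999 in Karen's experiment). The Mesa rendering and the Xor primitive are assumptions based on the messages, not code from them:

Xor: PROCEDURE [a, b: CARDINAL] RETURNS [CARDINAL];  -- assumed bitwise exclusive-or, supplied elsewhere

HashName: PROCEDURE [name: STRING, tableSize: CARDINAL]  -- tableSize prime, table ~4x the data size
  RETURNS [code: CARDINAL] =
  BEGIN
  word: CARDINAL _ 0;
  code _ 0;
  FOR i: CARDINAL IN [0..name.length) DO
    word _ word*32 + LOOPHOLE[name[i], CARDINAL] MOD 32;  -- low 5 bits of each character
    IF i MOD 3 = 2 THEN BEGIN code _ Xor[code, word]; word _ 0; END;  -- three characters per word
    ENDLOOP;
  code _ Xor[code, word];  -- fold in any leftover characters
  RETURN[code MOD tableSize];
  END;
-- Collisions are resolved by linear search from the hashed slot, per Karen's message.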
*start* 00686 00024 US Date: 7 Aug. 1981 4:32 pm PDT (Friday) From: kolling.PA Subject: space manager transaction To: mbrown cc: kolling When the system starts up, and FileStore calls either InitializeAccessControl or RecoverAccessControl, who should request the creation of the Space Manager transaction? Is it okay for AccessControl to do this, or would FileStore perhaps want to hand the transID to AccessControl in these calls for reasons of its own? Does FileStore ever need to know the transID for SpaceManager, like perhaps might it somewhere want to wait for all current trans to complete or something? (It would have a long wait for the SpaceManager one....) Karen *start* 02381 00024 US Date: 7 Aug. 1981 5:41 pm PDT (Friday) From: MBrown.PA Subject: Re: space manager transaction In-reply-to: Your message of 7 Aug. 1981 4:32 pm PDT (Friday) To: kolling cc: mbrown "Managers" who write log records will have to be specially registered with FileStore somehow. The form of this registration is not yet designed, but I'm getting to it. One of the reasons for knowing the identity of managers is for handling "major" checkpoints, such as warm shutdowns, in a clean way. When the system is shut down, we'd like to give all of the managers a chance to clean out their buffer pools and do a final commit, and notify the system that they have no more interest in the contents of the log. This way the archive log does not have to contain any of the client-written log records, only the basic FileStore log records. Doing the shutdown can be delicate, since presumably we want to start no more transactions, but might have to do so for the purposes of cleaning up one of the managers ... Getting this right should be interesting. A somewhat related question is whether or not managers' exclusive locks on the files they manage have any special status. Having the lock manager choose one of these locks to break in order to break a deadlock cycle is probably wrong. In this case the lock manager must distinguish a manager's transaction from garden-variety ones. Offhand I see no reason why the manager can't create its own transaction. I am glad that you are forcing me to get into issues like these. But you should remember that in the interests of simplicity in the initial implementation, it may turn out to be desirable to perform all of the owner-DB writes under the transaction that requested the space change, not a shared one (even though this may reduce the potential concurrency somewhat.) I am not sure precisely how much of your design would be impacted by this change (except that some parts, like supplying recovery procedures and interfacing to the system as a "manager", would just go away entirely.) My guess is that much of the complexity in your implementation goes into the volatile data structures required to support the deferred update, which should be nearly the same either way. Also, the file organization is not a function of the style of transaction usage, as long as you stick to one-page records. --mark *start* 00187 00024 US Date: 10 Aug. 1981 2:15 pm PDT (Monday) From: Kolling.PA Subject: [Ivy]Doc>AccessControlDesign5.bravo..... To: MBrown, Taft cc: Kolling .....now exists. *start* 00280 00024 US Date: 10 Aug. 1981 4:18 pm PDT (Monday) From: kolling.PA Subject: Re: [Ivy]Doc>AccessControlDesign5.bravo..... In-reply-to: Kolling's message of 10 Aug. 1981 2:15 pm PDT (Monday) To: Kolling cc: MBrown, Taft And also AccessControlReorganize.sil. *start* 02252 00024 US Date: 10 Aug.
1981 12:19 pm PDT (Monday) From: kolling.PA Subject: Re: space manager transaction In-reply-to: Your message of 7 Aug. 1981 5:41 pm PDT (Friday) To: kolling We ought to talk when you have some time..... (1) The space manager requests locks on "strange objects" sometimes in the name of itself and sometimes in the name of the "real" transactions it sees (remember, for example, that's how owner operations and space operations are mutually exclusive). Although the sm goes to the trouble to request locks in order, etc. it would be possible for a deadlock to occur involving these locks, for example, as follows:
client one does an allocation for owner foo (read lock on volatile owner foo)
client two does an allocation for owner bar (read lock on volatile owner bar)
client one tries to do a space operation for owner bar (requests write lock for owner bar)
client two tries to do a space operation for owner foo (requests write lock for owner foo)
Now, if the lock manager detects a deadlock and wants to break it, I don't see how the sm can survive a broken lock; I think the only acceptable thing for the lock manager to do is not just break the lock but abort the unlucky transaction. Otherwise, what is the sm supposed to do? Does it even know the lock has been broken, since the lock was gotten for the real transaction? If it doesn't know, and someone else gets into the structure, the best it can do is detect a "this should never happen" situation. If it does know, should it pretend the requested operation never happened? Abort the transaction itself? etc. Perhaps the lock manager needs a way to be told "this is a system internal lock" and breaking those should be trans aborts. (2) I have a highly concurrent design 98% fleshed out in gory detail. However, it does have a lot of locks and monitors. Perhaps all this concurrency is not worth the complexity. On the other hand, perhaps locks are cheap and it is. The way it is designed, people will almost never have to wait for disk io, just when they read in the owner record. When you have ten or 15 minutes, I'll run it past your nose. How many simultaneous transactions do we expect to see anyway? Karen *start* 00400 00024 US Date: 10 Aug. 1981 5:05 pm PDT (Monday) From: kolling.PA Subject: tomorrow To: mbrown cc: kolling I thought I'd spend tomorrow sketching out a "non-concurrent" algorithm. But, if the idea is that each transaction updates the owner data file, doesn't that mean that write access on that file has to be "world" ? (i.e., no access control on our accesscontrol....). Karen *start* 02080 00024 US Date: 10 Aug. 1981 6:47 pm PDT (Monday) From: MBrown.PA Subject: Re: tomorrow In-reply-to: Your message of 10 Aug. 1981 5:05 pm PDT (Monday) To: kolling cc: mbrown transactions do not have authentication associated with them; that is done at the time a file is opened. hence there is no inherent conflict here. if there were, we would be in trouble because it would be hard to write a directory system, or any sort of database manager, without using the shared write transaction idea. one reason for building transactions into the server is so that not all DBMSs have to implement them themselves! i haven't had a chance to go through your memo in detail yet, but for one thing, I think that your aspiration level for the owner database reorganization procedure is way too high. My idea was that this database might require reorganization once every couple of years. Hence there is no problem in taking the server down to do the reorg.
Furthermore, I thought that reorg would simply consist of printing out the current state of the database (in ascii, using EnumerateOwners), setting the new hash table length (magic), formatting the new hash table (magic), and inserting the database elements one by one (by using CreateOwner.) Anything fancier than this seems like overkill to me. The measurement stuff that you include in your code will tell us whether reorg is required. i am also willing to tolerate a very rough and ready design for updates to the owner access lists, since these almost never happen; the updater is willing to wait for quite a long time. Another possibility that I am willing to live with is that the access lists are not maintained in a fully consistent way; e.g. client a might create a file for owner 1 using transaction t1, then t2 might change the owner access lists in such a way as to deny client a the right to create files. if t2 commits before t1 commits then t1 should abort, but in this case i don't care what happens. i am sure that we can simplify this. i'll talk to you about it on wednesday. --mark *start* 02918 00024 US Date: 11 Aug. 1981 12:24 pm PDT (Tuesday) From: kolling.PA Subject: Re: tomorrow In-reply-to: Your message of 10 Aug. 1981 6:47 pm PDT (Monday) To: MBrown cc: kolling "transactions do not have authentication associated with them; that is done at the time a file is opened." I thought the purpose of making "each transaction do the updates on the owner database file" in the simple case was that these updates are then tied to the commit for that transaction. The "openFileID" for a file maps to [client, trans, fileID, accessrights], according to Ed's sil drawing. If I just hand off the openFileID that some other transaction got for the owner file to trans t1, how does the system know that those updates are associated with t1? reorganizing the database: (1) At one point I thought you told me that reorganizing the database should not require taking the system down.....Since we need stops for the purposes of warmshutdown, deferredupdate, etc. stopping for reorganizing "on-line" might be "free", maybe or maybe not. (2) "I thought that reorg would simply consist of printing out the current state of the database (in ascii, using EnumerateOwners), setting the new hash table length (magic), formatting the new hash table (magic), and inserting the database elements one by one (by using CreateOwner.)": printing out the current state? You don't mean that some poor operator has to then hand insert 500 owners, right? I could abandon my one-file reorg scheme and instead do this as you suggest by using a second file as a buffer, and either replacing the old id in Pilot's root directory (courtesy of FileStore perhaps) or doing a copy over at the end. Fine by me. "i am also willing to tolerate a very rough and ready design for updates to the owner access lists": I'm not sure of the implications of this. I have to be careful of updates to the owner space quotas anyway, so all(?) of the mechanism would be there anyway. (As a side comment, I didn't plan to worry about FILE access lists changing out from under a client who already had obtained a permission-- that's FileStore's problem, if it cares, which it probably shouldn't, or perhaps locking prevents this from even happening.) 
As for OWNER access lists changing, the trans t1 that had obtained create permission would automatically block the trans t2 that wanted to change the owner access lists; otherwise I need extra code to let change owner access list thru while create in progress but block change space quota while create in progress. I think we need to settle in for a relatively long talk. It may be that a simple solution is best if it can provide enough features. What worries me somewhat about the scheme I propose in the memo is not the amount of code, but the chance of dropping a lock on the floor. Let's talk (after you've had a chance to go thru the memo?). Maybe after Dealer? Karen *start* 00817 00024 US Date: 12 Aug. 1981 10:34 am PDT (Wednesday) From: MBrown.PA Subject: Re: tomorrow In-reply-to: Your message of 11 Aug. 1981 12:24 pm PDT (Tuesday) To: kolling cc: MBrown you would have to open the owner database file with each transaction that needs to update that file. i assumed that we would have a program read the text file produced by dumping the owner database. i favor a text dump because you need a way of dumping it in text form for debugging anyway. every data structure should have an associated print routine. ignoring the online reorganization part of your design, i don't think that the rest is inherently too complicated, but i think that we need to think about it some more to develop the right abstractions. yes, maybe i'll have some good ideas after dealer. --mark *start* 01985 00024 US Date: 12 Aug. 1981 12:37 pm PDT (Wednesday) From: Kolling.PA Subject: volume groups To: kolling About one FileStore per volume group and about volumes going offline: 1. Who handles attempts to read and write files not on an online volume, both for the case when "it is known" beforehand that the volume is offline and also when ooops it disappears out from under us as we are doing io? Should AccessControl be prepared to have its calls on FileProperties to read a file access list get volumehasgoneoffline or something? 2. Do we want a way for FileStore to tell AccessControl that a volume has gone offline so AccessControl will reject space change requests for that volume, and that it has come back online? (VolumeStateChange: PROCEDURE[volumeID: VolumeID, state: {online, offline}])? Or does FileStore never ask AccessControl to do stuff for offline volumes? 3. Some proposed changes to Access Control's interface for volume groups: (Some of this depends on who knows a volume is offline; also, who does various types of error handling, such as volumeNotInGroup(horrors).) (a) CheckAccessOwnerCreate: the volume is specified as a parameter (some default that means any in the group), if default, CheckAccessOwnerCreate will now determine the volume (the online one with the most free space as far as it knows). new error returns: volumeNotInGroup. (b) ChangeSpaceViaOwner and ChangeSpaceViaOpenFileID now expect a VolumeID. new error returns: notEnoughSpaceInOwnerQuota (was notEnoughSpace), notEnoughSpaceOnVolume, volumeOffline (assuming FileStore told us this?), volumeNotInGroup. (c) InitializeAccessControlDataBase: new parameters -- ownerFileVolumeID and volumeGroup: LIST OF VolumeIDs. (d) BeginRecoveryAccessControl new parameters -- ownerFileVolumeID and old parameter openRootFile has a new parameter ownerFileVolumeID. 4. I assume we don't add or remove volumes for a group. Let me know if this is incorrect. Karen
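A small Mesa sketch of point 2's idea from the message above. The VolumeStateChange signature is Karen's; the volatile table behind it, and the behavior in the comments, are assumptions:

VolumeState: TYPE = {online, offline};
VolumeStateChange: PROCEDURE [volumeID: VolumeID, state: VolumeState];

-- Assumed volatile structure inside AccessControl, one entry per volume in the group:
GroupVolume: TYPE = RECORD [volume: VolumeID, state: VolumeState _ online];
-- On VolumeStateChange[v, offline], mark v's entry; ChangeSpaceViaOwner and friends would
-- then answer volumeOffline for that volume, and volumeNotInGroup for an ID not in the
-- table. CheckAccessOwnerCreate's defaulting rule would pick the online entry with the
-- most free space as far as it knows.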
*start* 00650 00024 US Date: 13 Aug. 1981 2:30 pm PDT (Thursday) From: kolling.PA Subject: notes on locks To: mbrown cc: kolling (1) In both AC designs, I ran into cases where I would like to have an exclusive lock which I could release. In the Simple Design this would be nice, for example, if someone is going after an owner who turns out not to be in the "correct" page, it could then harmlessly release that exclusive lock so as not to block someone who might want that page. (2) In the complex design, I also kept running into cases where I wanted to lock out my own transaction, although I got around those by redesign or hand checking. *start* 01777 00024 US Date: 23 Aug. 1981 3:09 pm PDT (Sunday) From: Taft.PA Subject: Multi-pack Alpine volumes To: MBrown cc: Kolling, Taft I finally got around to reading your memo. I agree with most of what you say. An advantage of multiple-pack volumes that you left out is that they allow you to store a file that is larger than the remaining free space on any single pack. In a system with many packs that is close to being full (example: Ivy), this is an important advantage. IFS requires each file to be entirely contained on one pack, and that is a real problem: the largest file you can store right now is about 4000 pages. This is a large file, but not unreasonably so. Of course, this problem can be dealt with by some sort of disk permuting program, but that's a lot of work. Ultimately, I think the right approach is a flexible one. A file server's "primary" Alpine volume should encompass as many packs as is appropriate, since it doesn't make much sense to operate the server with only part of the file store on-line. On the other hand, dynamically mountable Alpine volumes should ordinarily consist of a single pack so as to make them easily portable. (After all, not many file servers are likely to have more than one extra drive.) In the short term, I see little point in trying to support multiple-pack logical volumes at the Alpine level when Pilot doesn't provide that capability underneath. And I'm certainly not in favor of screwing around with Pilot in order to finish the implementation. So I think the simplest statement of your proposal is that Alpine volumes will be one-to-one with Pilot logical volumes; and in the short term this means that an Alpine volume is no larger than a disk pack, due to temporary Pilot limitations. Ed *start* 02532 00024 US Date: 25 Aug. 1981 4:56 pm PDT (Tuesday) From: MBrown.PA Subject: File Store document, version 5 To: Kolling, Taft cc: MBrown I've placed a hardcopy of this memo in your mailboxes. The changes are pretty major; see the change log in the memo. The document is getting rather large and it may be time to create an "Alpine concepts and facilities" document separate from the interface description. Such a document would be suitable for outside review, which we should get (I was thinking of having Mitchell, Schroeder, and Lampson read it) at some point before the implementation really gets rolling. I am now trying to flesh out the treatment of errors in the interface. I am in general agreement with the stylistic conventions suggested in Roy's memo "Guidelines for Signalling in Cedar" ([ivy]Levin>Cedar>SignallingGuidelines.bravo.) But we need to figure out how to apply these ideas to our system. One general principle that seems useful: if an error detected by a procedure is likely to be the result of a sequence of calls outside the immediate scope of the caller, the error should be raised as an ERROR and not by a return code. Failure of credentials (except on Login itself) and transaction aborts are examples of this type of error. Abstraction failures represent a fairly well-delimited class. For instance, a create owner failure due to the owner database being full is an abstraction failure, as is a hard disk error. Is a SetLength or CreateFile failure due to out of disk quota an abstraction failure? I guess so. I am assuming that we shall not require SIGNALs in our interface, only ERRORs. Remote RESUME is too much to ask for right now. Andrew seems willing to provide remote ERROR in a timely fashion. A remote ERROR will unwind the remote call before we see the error locally. This means that what Roy calls "Client programming errors" (e.g. RName too long or containing an invalid character) must return enough information to diagnose the problem that the remote machine found. The type of error for which the correct interface style is unclear is "read page off end of file", "owner already exists in call to create owner", etc. Offhand I am inclined to report these as ERRORs although they do not fit neatly into any of Roy's categories (they come closest to being calling errors, but they clearly should be documented in the interface, which calling errors are not.) Maybe this means that I am wrong and these should really be reported by return codes. --mark
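A hedged Mesa sketch of how the principle in Mark's message above might look in FileStore's interface; the declarations are illustrative, not from the memo:

-- Errors arising outside the immediate caller's scope are raised as ERRORs:
TransactionAborted: ERROR [trans: TransID];
CredentialsInvalid: ERROR;
OwnerDatabaseFull: ERROR;  -- an "abstraction failure"

-- Errors tied to the call at hand might instead come back as return codes, e.g. the
-- "read page off end of file" case whose style is called unclear above:
ReadPage: PROCEDURE [openFileID: OpenFileID, page: CARDINAL]
  RETURNS [outcome: {ok, pastEndOfFile}];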
*start* 00295 00024 US Date: 27 Aug. 1981 1:18 pm PDT (Thursday) From: kolling.PA Subject: AccessControl To: mbrown, taft cc: kolling The latest version of AccessControl, AccessControlDesign6.bravo, is in your offices. Its internals are drastically different from the previous versions. KK *start* 00173 00024 US Date: 28 Aug. 1981 12:42 pm PDT (Friday) From: kolling.PA Subject: [Ivy]Cedar>Grapevine> To: kolling is where the new rname defs etc. are. *start* 00487 00024 US Date: 28 Aug. 1981 1:29 pm PDT (Friday) From: kolling.PA Subject: owner file access To: mbrown cc: kolling As I recall, the owner database file is supposed to be writeable by AlpineWheels only for obvious reasons, and I was supposed to put an "AlpineWheels" clientname into the OpenFile call on it for the client transactions I want to have read and write pages of it. I don't see a way to do this in the FileStore interface, as Open just takes a transID..... *start* 00771 00024 US Date: 29 Aug. 1981 12:16 am PDT (Saturday) From: MBrown.PA Subject: Alpine overview To: Kolling, Taft cc: MBrown The time is approaching that we shall have to tell the rest of CSL what we think Alpine is. For most of the lab I think this can be accomplished with a short overview document and a Dealer. We should also have a few reviewers look at what we are doing in more detail; the overview document can be their introduction to the subject. I have drafted an overview document and put hardcopies in your mailboxes. I expect that I have some blind spots and that I have drawn some conclusions that may not represent the group's feeling, so I will refrain from circulating the document until each of you has suggested amendments. --mark *start* 03697 00024 US Date: 31 Aug. 1981 4:05 pm PDT (Monday) From: kolling.PA Subject: Re: Alpine overview In-reply-to: MBrown's message of 29 Aug. 1981 12:16 am PDT (Saturday) To: MBrown cc: Kolling 1. page 2 typo: "These access control facilites" 2. page 2: I'm not clear about what you're trying to say in the area starting "Alpine must support server configurations that survive any single failure of the disk storage medium ....
This degree of redundancy should not be a requirement for all servers, however." I thought we could survive a single failure by playing back the log if we went back far enough. Or are you talking about some other mechanism? Or about having it live on a workstation so there is only one copy of the log? 3. page 3: "Archiving" My standard kvetch about geographically separated backup/archiving in case the building burns down. 4. page 3. "Our environment does not demand continuous availability of a file server; we can tolerate scheduled downtime during off hours....." When Juniper needed to find a down time for backup, no time was available in the 24hr slot that didn't impact somebody. Of course, replicated files solves a lot of this. 5. page 4. "Alpine uses Grapevine to implement file access control." And disk space control, too. 6. page 5: "The archive log **actually** includes only the log records that are **actually** relevant to media recovery." The archive log actually includes only the log records that are relevant to media recovery. 7. page 5: "A volume can be moved from one server to another. This operation requires a "volume quiesce"...This operation is supported by Juniper...." Is this the belief that a Juniper pack can disappear and the system keeps running? That's what I used to think too, until I saw the directory system fall apart when a pack went off line, and the system refuse to come up when pack 0 was not online, and.... I think Howard used to believe that too, but Jay didn't, and so in the implementation..... 8. page 5. "Two of a file's properties are its read list and its modify list. Each of these contains (at most two) RNames, such as (CSL^.pa ISL^.pa). An Alpine client may read a file if he is contained in (the closure of) one element of its read list. An Alpine client may read or modify a file (including its read and modify lists) if he is contained in (the closure of) one element of its modify list. Another of a file's properties is its owner. An owner is an entity identified by an RName, such as "McCreight.pa" or "CedarImplementors^.pa". The disk space occupied by a file is charged against its owner's space quota." I suggest: One of a file's properties is its owner. An owner is an entity identified by an RName, such as "McCreight.pa" or "CedarImplementors^.pa". The disk space occupied by a file is charged against its owner's space quota. "Two other file properties are its read list and its modify list. Each of these contains (at most two) RNames, such as (CSL^.pa Taft.pa), as well as fields indicating access to the owner and world. An Alpine client may read a file if he is contained in (the closure of) one element of its read list. An Alpine client may read or modify a file (including its read and modify lists) if he is contained in (the closure of) one element of its modify list. (I changed the example to contain an individual rname and included owner and world.) 9. page 5: typo: "Disk space quotas for owners are maintained on for volume groups rather than individual volumes." 10. page 5: awkward: "A client may request to create a file on a volume group". 11. page 6: "The memos [Ivy]Doc>FileStoreDesign*.bravo" *start* 01018 00024 US Date: 31 Aug. 1981 5:55 pm PDT (Monday) From: kolling.PA Subject: two algorithms To: mbrown cc: kolling In the cold light of day, I began to think again about the difference between your algorithm with the buffers on one list and a use bit, and mine with buffers in different states separated on different lists.
I think I picked the latter because I found it clearer to think in terms of as well as because I was dealing with smaller lists, so I compared rough sketches of the code and, for the most likely cases, ignoring duplicate stuff and small things, this is the difference between the two: (I don't guarantee I haven't dropped something on the floor here, especially since my D0 booted while I was working the more detailed version of this out.)

             Mine                                Yours
             -----                               -----
space:       1 extra word per owner buffer.
GetPage:     unlink.
ReleasePage: unlink.
Phase One:   unlink                              link

Since (un)linking is relatively expensive, I'm going ahead with your scheme. KK *start* 00540 00024 US Date: 31 Aug. 1981 6:27 pm PDT (Monday) From: kolling.PA Subject: filestoredesign5 To: mbrown cc: kolling I haven't had a chance to read this whole memo yet, but the reason there are separate calls to change the access lists and the space quota is not because the internals of managing the access to these fields is different, but because these tend not to be things that will naturally happen at the same time, so it avoids the user specifying fields he/she doesn't care about or changing them inadvertently. KK *start* 02262 00024 US Date: 2 Sept. 1981 11:05 am PDT (Wednesday) From: MBrown.PA Subject: Re: Alpine overview In-reply-to: Your message of 31 Aug. 1981 4:05 pm PDT (Monday) To: kolling cc: MBrown 2. Some servers may not choose to replicate the log (use mirrored disks for log.) In this case, a crash that occurs after commit but before the intentions are carried out, and that destroys the single copy of the log, wipes out committed information in the system. Also, a workstation-based system will surely not take regular backups or save its log (though one can imagine fixing this if the transaction rate is low.) 3. Noted. This applies even more emphatically to backup than to archive, since many files are never archived (I'm not sure how to archive a database.) 4. I find it hard to believe that Ron can't take the server down for 15 minutes at 6am. If pack copy is implemented carefully, this is about how long it should take (the disk can be read in 5 minutes at full speed, but you need a check pass, ...) Not all packs are copied every time backup is taken. There are alternatives that provide continuous availability, and someday we should implement one just for the experience, but I don't believe we'll do it next year. 5. Grapevine's contribution to disk space control is minimal, and I felt that it would add confusion to mention it here. 7. Right. What I meant was simply that you can spin down a Juniper pack at any instant, move it to another machine, spin it up, and it can recover to a transaction consistent state. It is another question whether you can find the files, etc. 8. I want to avoid confusion, and make the scheme sound as simple as possible. Now, "world" can be thought of as an RName, since if you allow "world", other RNames are not needed. (Grapevine uses "*" or something.) "owner" can also be thought of as an RName, but one that does not take up one of the two slots. We can introduce a notation for owner ("+"?), or just have the caller supply the owner RName and check for a match when setting up the access lists. The question would get a little more complex if we had a "change owner" call. We have not provided one yet, and I suggest we don't until someone explains why it is required. --mark
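A sketch of the check under Mark's reading of point 8 above, reusing the AccessList shape sketched earlier. The treatment of world and owner follows his description; the procedure itself is an assumption, and IsMemberClosure is the Grapevine membership test discussed later in this file:

-- A client may read if he is in the closure of some element of the read list;
-- "world" admits everyone, and "owner" does not occupy one of the two RName slots.
MayRead: PROCEDURE [client: RName, read: AccessList, owner: RName] RETURNS [BOOLEAN] =
  BEGIN
  IF read.world THEN RETURN[TRUE];
  IF read.owner AND IsMemberClosure[owner, client] THEN RETURN[TRUE];
  FOR i: CARDINAL IN [0..maxListNames) DO
    IF IsMemberClosure[read.names[i], client] THEN RETURN[TRUE];
    ENDLOOP;
  RETURN[FALSE];
  END;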
*start* 00328 00024 US Date: 2 Sept. 1981 12:17 pm PDT (Wednesday) From: Kolling.PA Subject: Re: Alpine overview In-reply-to: Your message of 2 Sept. 1981 11:05 am PDT (Wednesday) To: MBrown cc: kolling Yes, "Ron can take the server down for 15 minutes at 6am;" it was two hours at 6 am that had people gnashing their teeth. *start* 00373 00024 US Date: 2 Sept. 1981 12:26 pm PDT (Wednesday) From: MBrown.PA Subject: Re: Alpine overview In-reply-to: Your message of 2 Sept. 1981 12:17 pm PDT (Wednesday) To: Kolling cc: MBrown Right. We'll have to do better than two hours. Eventually disks may get so large that online backup is a necessity, but that day will not arrive for awhile. --mark *start* 01666 00024 US Date: 3 Sept. 1981 12:02 pm PDT (Thursday) From: Taft.PA Subject: Re: Alpine overview In-reply-to: MBrown's message of 29 Aug. 1981 12:16 am PDT (Saturday) To: MBrown cc: Kolling, Taft Some random comments: 2. Why a new file server? a) Juniper should at least be mentioned: why are we designing a new file server rather than just pushing Juniper into the Pilot/D-machine world? b) "Nearly all CSL application programs will deal primarily with database systems instead of file systems." I think this is overstating the case. I view a database as a way of organizing information, but a file as a way of storing the data itself (particularly if there is a lot of it). 3. Alpine's scope a) "(Note that an archive system would reduce the problem of disk space quotas for users.)" I don't understand this statement. b) Archiving: I agree with your conclusion. But I should point out that the Alpine file organization (low-level naming by unique IDs, and high-level naming by a location-independent UFS) eliminates a lot of the problems that would make it difficult to implement a satisfactory archive system for IFS. 4. Alpine's implementation strategies a) "The time required to restore lost information from log tapes is bounded by periodically copying entire volumes to backup packs..." Are the volumes taken off-line during this operation? If so, I don't like it! Can't we do some sort of on-line replication? 5. Alpine objects a) Having one log per file server (rather than per volume) seems to conflict with the notion that volumes are self-contained and (re)movable. Maybe one log per volume group would be better ?? Ed *start* 04841 00024 US Date: 3 Sept. 1981 4:41 pm PDT (Thursday) From: Taft.PA Subject: Access control design To: Kolling cc: MBrown, Taft Here are my comments on Alpine AccessControl design, version 6. "Some comments" 2. Having a separate access control list for privileged access to each Alpine server is a good idea. This is how we control gateways now (that is, there are groups called Twinkle.internet, SoEasy.internet, etc.), and it seems to work well. It nicely decentralizes administration of multiple servers. 5. There is a small argument to be made for separating modifications to a file's access control lists from all other file modifications. This permits an owner to write-protect a file from himself (to prevent blunders), or to give other clients write access without control over the access control lists themselves. I assume you don't want to associate any new access control lists with each file. So I propose the following changes: (a) only the owner or the file's creator is allowed to change access control lists; (b) it is permissible to exclude the owner from the modify list. (By the way, I assume that AlpineWheels is on all these lists implicitly and does not actually consume one of the two RName slots!) 12.
"The owner data base file will be entered in the root directory." Meaning a Pilot Directory package root directory? I didn't think we were planning to use the Directory package on Alpine volumes. "What AccessControl expects from FileStore" "FileStore is assumed to cache the permissions ... in its open file table." I agree with this statement with regard to open files. But I believe that AccessControl should also do a certain amount of caching of [ClientID, FileID, access] triples (or something), so that successive openings of a popular file don't have to consult Grapevine every time. Not allowing the sum of all quotas to exceed the available space is nice in principle, but impractical. It leads to underutilization of available space (since many users don't use their entire quota); and in any event it's difficult to administer, especially in an environment where users don't pay for storage. (I do agree, though, that AccessControl should maintain a running total of quotas, so an administrator can determine how badly the available space is overcommitted.) "What AccessControl expects from Locks" (2) "... it cannot just break the lock, instead it must abort the unlucky transaction." I thought that in Alpine deadlocked transactions were always aborted. Am I mistaken? "Interface to AccessControl" IsClientValidForThisFileStore: I don't think this is structured quite right. I think we need a separate Authenticate operation that simply authenticates an RName (i.e., that verifies that the RName is valid, and that the client is who he says he is). Furthermore, this should probably be put in a separate interface, since the mechanisms for authentication are likely to change over time, as RPC and our facilities for authentication and encryption mature. The FileStore and/or RPC should do the authentication; and AccessControl should deal only with access control, not authentication. CheckAccessFile: there is a potentially nasty recursion here. If we read the access control list properties using the general mechanisms, then we have to open the file first; but opening the file requires access to be checked ... We can get around this either by some sort of trap-door in the implementation for reading properties, or by having AccessControl open the file while assuming the identity of some privileged client who can always open files without having to read their access control lists. EnumerateAllOwners: let's get together and settle on a convention for doing enumerations in Alpine. If enumerating a data structure locks it up, then passing in an enumeration procedure that is called for each item is equivalent to using your "continuationKey", and seems a lot cleaner. But I don't think that in general we can afford to lock up data structures while enumerating them, and the Pilot-style "stateless enumeration" may be preferable. One thing that's not clear to me from the implementation notes is whether the internal data structures are modified under the caller's transaction, under a private transaction, or under no transaction at all (i.e., manipulating raw FilePageMgr storage). Similarly, are the locks for manipulating file properties acquired under the caller's transaction or under a private transaction? General comment: there aren't any SIGNALs in the interface, though there are a number of places where it would be natural to use them (instead of return codes that the caller must test). 
Since AccessControl is an internal interface (not exported via RPC), I don't see any reason not to use SIGNALs in the conventional ways. Ed *start* 07753 00024 US Date: 3 Sept. 1981 6:22 pm PDT (Thursday) From: kolling.PA Subject: Re: Access control design In-reply-to: Taft's message of 3 Sept. 1981 4:41 pm PDT (Thursday) To: Taft cc: Kolling, MBrown There is yet another version of AC (design 7), which addresses some of your points. I wasn't planning to release it until I got further into the implementation, but it's on [Juniper]dumpaltodisks>accesscontroldesign7.bravo (in a state of flux) if you want to peek. 1. "Having a separate access control list for privileged access to each Alpine server is a good idea." I propose to do this by having FS tell me the list name when it does InitializeVolatileStructure. 2. "There is a small argument to be made for separating modifications to a file's access control lists from all other file modifications. This permits an owner to write-protect a file from himself (to prevent blunders), or to give other clients write access without control over the access control lists themselves." Yes, I left this out because I was tight on space for AccessLists. 3. I assume you don't want to associate any new access control lists with each file. So I propose the following changes: (a) only the owner or the file's creator is allowed to change access control lists; (b) it is permissible to exclude the owner from the modify list. Generally, I don't know the file's creator unless we keep this as a file property; there is probably room for this in the leader page. Just the owner seems restrictive.... I don't know what to do about this. Since we are just trying to put a vestigial implementation in, I vote to leave it as it is -- what do you both think? 4. "(By the way, I assume that AlpineWheels is on all these lists implicitly and does not actually consume one of the two RName slots!)" Yes, AlpineWheels, Owner, World: BOOLEAN. (See, Mark, I told you the overview memo should say that explicitly.) 5. "The owner data base file will be entered in the root directory." Meaning a Pilot Directory package root directory? I didn't think we were planning to use the Directory package on Alpine volumes. I thought the root directory avoided the use of the Directory package; am I wrong? 6. "FileStore is assumed to cache the permissions ... in its open file table." I agree with this statement with regard to open files. But I believe that AccessControl should also do a certain amount of caching of [ClientID, FileID, access] triples (or something), so that successive openings of a popular file don't have to consult Grapevine every time. Maybe. But an RName is a fat thing.... How likely are we to see open requests for the same file frequently enough so that we would catch it in the small cache we could afford?) 7. Not allowing the sum of all quotas to exceed the available space is nice in principle, but impractical. It leads to underutilization of available space (since many users don't use their entire quota); and in any event it's difficult to administer, especially in an environment where users don't pay for storage. (I do agree, though, that AccessControl should maintain a running total of quotas, so an administrator can determine how badly the available space is overcommitted.) Also, I really don't quite know the space used up since Pilot is snarfing some up as well. 
Most of the work is in keeping track of the sum of the quotas, rather than checking against the space, so it's close to immaterial in terms of code which decision we make, especially as I would like to keep hooks in in case we do do this later. Let's make a policy decision: shall we allow overcommitting the disk space or not? 8. "... it cannot just break the lock, instead it must abort the unlucky transaction." I thought that in Alpine deadlocked transactions were always aborted. Am I mistaken? Right, I meant I couldn't survive Juniper-type broken read locks. 9. IsClientValidForThisFileStore: I don't think this is structured quite right. I think we need a separate Authenticate operation that simply authenticates an RName (i.e., that verifies that the RName is valid, and that the client is who he says he is). Furthermore, this should probably be put in a separate interface, since the mechanisms for authentication are likely to change over time, as RPC and our facilities for authentication and encryption mature. The FileStore and/or RPC should do the authentication; and AccessControl should deal only with access control, not authentication. Currently all the authentication calls the system makes are localized in AC, except for this one from FS. Since authentication is so closely connected with AC, I prefer to keep my hands on it (i.e., make FS ask me instead of having a little hunk of code out in FS), but separating it out into a separate interface is probably okay, let me see what the code looks like when I get further along. Of course, it already is sort of in a separate interface, since I'll be calling that interface to do the authentication. 10. CheckAccessFile: there is a potentially nasty recursion here. If we read the access control list properties using the general mechanisms, then we have to open the file first; but opening the file requires access to be checked ... We can get around this either by some sort of trap-door in the implementation for reading properties, or by having AccessControl open the file while assuming the identity of some privileged client who can always open files without having to read their access control lists. No recursion, AC does stuff in its "own highly privileged name" where necessary. 11. EnumerateAllOwners: let's get together and settle on a convention for doing enumerations in Alpine. If enumerating a data structure locks it up, then passing in an enumeration procedure that is called for each item is equivalent to using your "continuationKey", and seems a lot cleaner. But I don't think that in general we can afford to lock up data structures while enumerating them, and the Pilot-style "stateless enumeration" may be preferable. Addressed in new design, client gets to choose. 12. One thing that's not clear to me from the implementation notes is whether the internal data structures are modified under the caller's transaction, under a private transaction, or under no transaction at all (i.e., manipulating raw FilePageMgr storage). Similarly, are the locks for manipulating file properties acquired under the caller's transaction or under a private transaction? We threw out the separate transaction idea because it got too complex and people still had to block on writing to the log. I should have noted this is client transaction oriented in the memo, but I got snow blind. AC calls FS, not FPM. AC only reads file properties, in its own privileged name, under the client's transaction. 13.
General comment: there aren't any SIGNALs in the interface, though there are a number of places where it would be natural to use them (instead of return codes that the caller must test). Since AccessControl is an internal interface (not exported via RPC), I don't see any reason not to use SIGNALs in the conventional ways. Error handling is re-addressed in latest design, a la Roy's memo; I ERROR on my "programming" errors and on FS "programming" errors, but I still have "client-of-FS" things ("owner already exists", etc.) coming back as return codes since I guessed that would be easier for FS to decide how to send them back to the user, but that's certainly open to change if FS would prefer SIGNALS (or ERRORs). Time fleets: as I have started the implementation, it would be a good idea to wrap up the design changes..... Karen *start* 01375 00024 US Date: 4 Sept. 1981 8:42 am PDT (Friday) From: MBrown.PA Subject: Re: Access control design In-reply-to: kolling's message of 3 Sept. 1981 6:22 pm PDT (Thursday) To: kolling cc: Taft, MBrown I haven't had a chance to more than glance at this exchange of messages, but I note the following: 3. You could substitute "a user who is currently enabled to create files for this owner" for "creator", and I think the effect would not be far wrong. The extra degree of redundancy suggested by Ed seems like a good thing, and should probably extend to file deletion (i.e. it should be possible to make it impossible to delete a file without first setting some property to the correct state.) 5. By "root directory" I mean "root file of logical volume" in the current release. We can impose any structure we want on this file; it should be simple enough not to require the ComSoft directory package. For Trinity, there will be a primitive "root directory" facility that allows multiple "root files" without recourse to the ComSoft package. 7. I think that overcommitting quotas should be an administrative decision, not imposed by our implementation. We should make it easy for the administrator to tell what he is doing. 8. We don't support Juniper style broken locks; we'll use another mechanism to validate remote caches. More to come ... --mark *start* 01199 00024 US Date: 4 Sept. 1981 1:04 pm PDT (Friday) From: Kolling.PA Subject: Re: Access control design In-reply-to: MBrown's message of 4 Sept. 1981 8:42 am PDT (Friday) To: MBrown cc: kolling, Taft How about this for the file access problem: two access lists:

          default                                        minimum
          -------                                        -------
read      world                                          owner, AW
modify    owner, OwnerCreateListWhenFileWasCreated       null

These control access to the file itself. Access to the access lists is also controlled by these lists, but in addition, we always also permit access to the owner, AW, and the current ownerCreateList members. I'm not sure what you mean by "The extra degree of redundancy .... should probably extend to file deletion (i.e. it should be possible to make it impossible to delete a file without first setting some property to the correct state.)" Currently file deletion, like truncation and extending, requires file modify privileges, so this minimum null setting would protect against it. Is that what you want? There isn't much difference between deleting a file and scribbling all over it.... I'll put in some way to make overcommitting quotas an option. Karen
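A guess at the mechanics behind the table above, consistent with the later note in this file that minimums are ORed in without complaint; the procedure and the StoreAccessLists helper are hypothetical:

WriteFileAccessLists: PROCEDURE [openFileID: OpenFileID, read, modify: AccessList] =
  BEGIN
  read.owner _ TRUE; read.alpineWheels _ TRUE;  -- OR in the read minimum: owner, AW
  -- the modify minimum is null, so a client may write-protect a file even against its owner
  StoreAccessLists[openFileID, read, modify];  -- hypothetical FileProperties call
  END;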
*start* 00730 00024 US Date: 4 Sept. 1981 1:58 pm PDT (Friday) From: kolling.PA Subject: Re: Access control design In-reply-to: MBrown's message of 4 Sept. 1981 8:42 am PDT (Friday) To: MBrown cc: kolling, Taft Shoot. I just realized the following: In the old scheme, all FS had to do to read or write any file property was check its open file table, just like it handles read or write page requests. In the new scheme, if the open file table check says no, FS needs to recognize a file property which is an access list, so it can call AC to see if it can do it anyway. Do we want this extra stuff when all this is going to disappear anyway? This null minimum setting sounds nice, but does it ever really get used? Karen *start* 01586 00024 US Date: 4 Sept. 1981 3:21 pm PDT (Friday) From: Taft.PA Subject: Re: Access control design In-reply-to: kolling's message of 3 Sept. 1981 6:22 pm PDT (Thursday) To: kolling cc: Taft, MBrown 1. Caching access permissions: I didn't think this through or explain it very carefully. The case I'm really concerned about is a single client accessing many different files that have the same access control lists. If I retrieve 100 files that are all readable by CSL^.pa, I don't want Alpine asking Grapevine each time. IsMemberClosure is quite slow for large lists (I think testing for membership in CSL^.pa takes several seconds), and in any case we don't want to swamp Grapevine. I think the right thing to do is to implement a simple cache in front of IsMemberClosure. When IsMemberClosure[name, member] returns "yes", put the pair [name, member] in the cache. Before calling IsMemberClosure[name, member], check the cache for presence of that pair and just return "yes" if so. The cache can be of fixed size and relatively small (say, 25 or 50 entries); it should be managed LRU, and there should also be a timer that flushes the cache periodically (say, once per hour) so that the cache doesn't get too far out of sync with Grapevine. 2. I don't see any difficulty with FileStore giving special treatment to access control properties and calling AccessControl before modifying them (rather than just looking in the OpenFileMap). I think that being able to write-protect a file against yourself is sufficiently useful that we should do this initially. Ed *start* 01154 00024 US Date: 4 Sept. 1981 3:38 pm PDT (Friday) From: kolling.PA Subject: Re: Access control design In-reply-to: Taft's message of 4 Sept. 1981 3:21 pm PDT (Friday) To: Taft cc: kolling, MBrown 1. I thought you meant stashing the [ClientID, FileID, access] away. Yes, stashing [ClientID, accesslistname] is a lot more useful and quite desirable. I'll put it in. (By the way, IsMember is a pig particularly for lists of more than one level; Andrew explained why to me at one point but I forget. I think they btree down for each entry as opposed to a direct search thru the one level list. IsMember[CSLOnly^] + IsMember[CSLTemp^] goes like the wind compared to the equivalent IsMember[CSL^].) 2. Mark isn't around, so I don't know how he feels about the access stuff. 3. In rereading your memo, I see that I didn't properly answer your question about CheckAccessFile. In my head I had an arrow from AC to FP directly, but this is wrong, I think, or at least incomplete. I don't think I know the answer yet, because I don't know as much as I thought I did about how the various boxes interact, so I'm now looking at it. Karen
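A minimal Mesa sketch of the cache Ed describes above, in front of IsMemberClosure. The sizes and the LRU-plus-timer policy are his suggestions; the structure and the helper names are assumptions:

cacheSize: CARDINAL = 50;
Entry: TYPE = RECORD [name, member: RName, valid: BOOLEAN _ FALSE];
cache: ARRAY [0..cacheSize) OF Entry;  -- managed LRU; a timer flushes it, say once per hour

CachedIsMemberClosure: PROCEDURE [name, member: RName] RETURNS [yes: BOOLEAN] =
  BEGIN
  IF InCache[name, member] THEN RETURN[TRUE];  -- only "yes" answers are ever cached
  yes _ IsMemberClosure[name, member];
  IF yes THEN Enter[name, member];  -- InCache and Enter (LRU victim choice) are assumed helpers
  RETURN[yes];
  END;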
*start* 00620 00024 US Date: 4 Sept. 1981 4:12 pm PDT (Friday) From: kolling.PA Subject: Here's Mark's solution for CheckAccessFile. To: Taft cc: mbrown, kolling FS.OpenFile makes an OpenFileID at the beginning of OpenFile. FS calls AC.CheckAccessFile[openFileID blah blah]. AC calls FileProperties.something to read the access list it wants[openFileID]. FP calls into FS to read the leader page, which makes the locking and looking in the log happen right. I'll go ahead with the extra stuff for the access lists as you suggest, unless Mark says differently. I think this resolves all the questions raised? Karen *start* 05668 00024 US Date: 4 Sept. 1981 5:15 pm PDT (Friday) From: MBrown.PA Subject: Re: OpenFileMap and LogMap, version 0 To: Taft cc: Kolling, MBrown I like the style, organization and general content of this memo very much. As usual, I have my own opinions on some of the details: OpenFileMap: 1. I think that an OpenFileMap.Handle should be longer than 2 words. Reason: we want it to be very difficult for a client to "guess" the Handle that someone may be assigned on opening a file; a Handle is a "capability" in the technical sense (not quite since a Handle only works if someone has actually opened a file; we don't open files implicitly when an unknown Handle is presented!) As you suggest, we can use one word of the Handle to index into the internal data structure. The other words can hold a random number generated at Open time. 2. The current FileStore interface includes a type called Handle, representing one client's session with one server. The design given there binds a Handle into the OpenFileHandle. This seems to imply that a FileStore.Handle, not a ClientID, should be stored in the OpenFileMap entry. I am assuming that the ClientID can be gotten by following the FileStore.Handle. I was also assuming that any Handle that corresponded to an entry in the OpenFileMap was valid; ValidateHandle would not take ClientID as a parameter. Your way of structuring things, which seems viable also, seems to imply that calls that take a FileStore.OpenFileHandle also take a FileStore.Handle, and that on each call we first validate the Handle, then check consistency between the ClientID in the Handle and the ClientID in the OpenFileHandle. My point (1) is an alternative solution: make the OpenFileHandle hard to forge, and just take it on faith that if you are passed an OpenFileHandle that looks valid, it is valid for the client who passed it to you. 3. A hash table does seem to be the preferred data structure. Since the entries are relatively large compared to the size of a pointer, hashing with separate chains for resolving collisions is a good method. Since you have freedom to choose hash codes, you can arrange for no collisions as long as the number of open files is less than the number of hash buckets. LogMap: 1. The log map is specific to files; a data manager that writes log records and defers updates will have its own version of the log map. 2. I don't understand why you assume that the log map represents only one state of a FileStore. I was assuming all along that for any page of a file there were two possible views: the view seen by all transactions that have done only reads, and the view seen by the single transaction (if any) that has done a write. I don't see that it is any harder to implement this view than the simpler one you suggest. I don't feel strongly that allowing reads to go "underneath" uncommitted writes is necessary, so if you can explain why it is harder, we can flush it. I thought it came for free with the deferred update philosophy we are using. My idea was: in response to the "prepare to commit" command, a FileStore would upgrade all update locks to write locks. Then the transaction commits. At the instant it commits (signified, in the no crash case, by a change of state in the volatile transaction representation), the log map has in effect changed: all readers will see the new data recorded in the log. Before the commit, only the writing transaction saw the new data. The write lock before commit assures consistency. After commit, locks on the updated pages are released (or downgraded to read in case of checkpoint.) After commit, reading one of the just written pages from ANY transaction will "hit" in the log map and will read the new data from the log. As a side effect, it will write the new data to FilePageMgr and then erase the log map entry. In no case will we allow two updates to the same page from different transactions to exist in the log map. One of the two must be committed, and we will force it through FilePageMgr before making the second entry. (This is a VERY unlikely case!) 3. Update before commit is a separate issue. If update is done before commit, then a log map entry might be made but it would be only for the purposes of undo or recovery. It would not "hit" on a log map lookup, since the truth is in the file already (as well as in the log.) 4. I feel that the log map should be implemented in virtual memory. The log map should have a reasonable size. It is a sign of TROUBLE if the server is deferring huge quantities of work (has a huge log map.) The optimization of writing "new" pages without deferring (the "file high water mark" idea) will allow us to avoid generating a huge log map for a file transfer. Database transactions are short and update a few pages (I would say 5 pages typical, 100 pages ok but unusual, we would never expect to process such transactions at a high rate.) The log map is consulted on each read operation. BTree lookups are slow (even if the pages are all in main store), so I feel we don't want to do one on every read. In particular, the readers of a file that has not been updated recently should not pay for lengthy log map searches. I have no concrete design proposal. I think a hybrid structure might be best: a hash table keyed on FileID, where the entries are balanced trees (red-black or whatever). Then a file with little update activity has a tiny tree to search, and a file with lots of update activity has a big tree to search, which only seems fair. I have a balanced tree package that could be adapted to the purpose with little work. --mark
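A rough Mesa shape for the hybrid log map Mark suggests above (hash on FileID, one balanced tree per file); every name here is illustrative, and Tree stands for the type of his adapted balanced tree package:

nBuckets: CARDINAL = 128;  -- size assumed
FileEntry: TYPE = POINTER TO RECORD [
  file: FileID,
  next: FileEntry,  -- separate chain for the (rare) hash collision
  pages: Tree];  -- balanced tree keyed on page number, holding the deferred updates' log locations
logMap: ARRAY [0..nBuckets) OF FileEntry;  -- hash on FileID
-- A lookup on every read first hashes the FileID; a file with no recent update activity has
-- an empty or tiny tree, so its readers pay almost nothing, while a heavily updated file
-- searches a bigger tree.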
LogMap: 1. The log map is specific to files; a data manager that writes log records and defers updates will have its own version of the log map. 2. I don't understand why you assume that the log map represents only one state of a FileStore. I was assuming all along that for any page of a file there were two possible views: the view seen by all transactions that have done only reads, and the view seen by the single transaction (if any) that has done a write. I don't see that it is any harder to implement this view than the simpler one you suggest. I don't feel strongly that allowing reads to go "underneath" uncommitted writes is necessary, so if you can explain why it is harder, we can flush it. I thought it came for free with the deferred update philosophy we are using. My idea was: in response to the "prepare to commit" command, a FileStore would upgrade all update locks to write locks. Then the transaction commits. At the instant it commits (signified, in the no-crash case, by a change of state in the volatile transaction representation), the log map has in effect changed: all readers will see the new data recorded in the log. Before the commit, only the writing transaction saw the new data. The write lock before commit assures consistency. After commit, locks on the updated pages are released (or downgraded to read in case of checkpoint.) After commit, reading one of the just-written pages from ANY transaction will "hit" in the log map and will read the new data from the log. As a side effect, it will write the new data to FilePageMgr and then erase the log map entry. In no case will we allow two updates to the same page from different transactions to exist in the log map. One of the two must be committed, and we will force it through FilePageMgr before making the second entry. (This is a VERY unlikely case!) 3. Update before commit is a separate issue. If update is done before commit, then a log map entry might be made but it would be only for the purposes of undo or recovery. It would not "hit" on a log map lookup, since the truth is in the file already (as well as in the log.) 4. I feel that the log map should be implemented in virtual memory. The log map should have a reasonable size. It is a sign of TROUBLE if the server is deferring huge quantities of work (has a huge log map.) The optimization of writing "new" pages without deferring (the "file high water mark" idea) will allow us to avoid generating a huge log map for a file transfer. Database transactions are short and update a few pages (I would say 5 pages typical, 100 pages ok but unusual; we would never expect to process such transactions at a high rate.) The log map is consulted on each read operation. BTree lookups are slow (even if the pages are all in main store), so I feel we don't want to do one on every read. In particular, the readers of a file that has not been updated recently should not pay for lengthy log map searches. I have no concrete design proposal. I think a hybrid structure might be best: a hash table keyed on FileID, where the entries are balanced trees (red-black or whatever). Then a file with little update activity has a tiny tree to search, and a file with lots of update activity has a big tree to search, which only seems fair. I have a balanced tree package that could be adapted to the purpose with little work.
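Very roughly, and purely by way of illustration (BalancedTree and the rest are stand-ins, not real interfaces):

    -- Hybrid log map: hash on FileID, then a small balanced tree per file.
    FileEntry: TYPE = RECORD [
      file: FileID,
      pages: BalancedTree,  -- key: page number; value: log address of new data
      next: FileEntryPtr];  -- separate chaining within a hash bucket
    logMap: ARRAY [0..nBuckets) OF FileEntryPtr;
    -- A read hashes on FileID first; a file with no recent updates has an
    -- absent (or tiny) tree, so its readers pay almost nothing.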
--mark *start* 01120 00024 US Date: 8 Sept. 1981 1:33 pm PDT (Tuesday) From: MBrown.PA Subject: FS version 5 To: Kolling --------------------------- Date: 4 Sept. 1981 6:12 pm PDT (Friday) From: kolling.PA Subject: FS version 5 To: mbrown cc: kolling As you requested, I put off reading your latest FileStore memo to get started implementing AccessControl. Tonight I scanned the FS memo and found: 1. It believes DeleteFile needs owner "create/delete" permission. I think this is the way we had stuff set up originally and then you asked me to change it so Delete only needed file modify..... 2. ReadNextOwner needs the new param to specify locking the whole file or walking thru reading and releasing read locks (the latter mostly avoids the noted problem of conflicts). 3. Note on WriteOwnerAccessLists that we OR in minimums without complaint and don't write "empty" lists. 4. Defaults are missing in places on this page. 5. I have some debugging routines (EnumerateOwnerDatabase, ReportAccessControlStatistics) but maybe they shouldn't be noted here. ------------------------------------------------------------ *start* 01828 00024 US Date: 8 Sept. 1981 2:14 pm PDT (Tuesday) From: MBrown.PA Subject: Re: Access control design To: Kolling cc: Taft, MBrown Here are more thoughts on some of the points discussed in last week's exchange of messages. "What AccessControl expects from FileStore" "AccessControl should maintain a running total of quotas, so an administrator can determine how badly the available space is overcommitted." Such a running total is a resource that is difficult to maintain in a transaction-consistent way without serializing all transactions that update it. We have gone to some trouble in the design to avoid such bottlenecks. I suggest that the administrator should run a little program that enumerates the owners and gathers whatever statistics it desires. Since the server is not keeping this total, it cannot enforce the "do not overcommit quotas" restriction, but we seem to agree that overcommitting quotas is ok anyway. "Some comments" Capability to delete files: the design that I have advocated is that file lengthening/shortening should be treated as equivalent to any other file modification, and require only file modify permission (even though the length-changing operations debit or credit the owner database.) But file deletion should require the same capability that the creator had. This is impractical, since we cannot store away the contents of the owner create list at the time a file is created, so we compromise and simply require the client doing the deleting to be on the owner create list at the time of the deletion. Ability to protect file against owner modify: as Ed says, this seems worth the trouble to do from the start. The solution of allowing the modify minimum to be null, and allowing access list modify only to members of owner create, seems fine to me. --mark *start* 00793 00024 US Date: 8 Sept. 1981 2:34 pm PDT (Tuesday) From: kolling.PA Subject: Re: Access control design In-reply-to: MBrown's message of 8 Sept. 1981 2:14 pm PDT (Tuesday) To: MBrown cc: Kolling, Taft Sigh, am I tired of reorganizing this memo on a daily basis and then having to start over on the code from scratch..... The way it stands now, null permission for file modify can be set, and foo = {read/modify} permission for the access lists themselves is file foo + (owner, AlpineWheels, ownerCreateListNow). Unless there is an uproar, that's how I plan to leave it. I will reflect file deletion requiring owner create permission, but why a client needs special permission just to throw away the leader page after having previously thrown away the entire file..... Karen *start* 01569 00024 US Date: 8 Sept. 1981 2:53 pm PDT (Tuesday) From: Taft.PA Subject: Handles, authentication, etc. To: MBrown, Kolling cc: Birrell, Boggs, Nelson, Taft I've talked with Andrew about how an RPC implementor can determine who called it. The problem, briefly stated, is that if RPC is doing the connection management and the secure communication (if any), then the implementor (the guy called via RPC) will have trouble identifying the client who called it -- it knows only that the call originated from SOME authentic client! One way to deal with this is for the implementor, at authentication time, to manufacture and pass to the client some sort of capability (it must be more than a simple handle, since it must be difficult to forge).
However, since RPC is doing connection management and is already maintaining ConnectionIDs for its own purposes, it seems straightforward to make this mechanism available for client/implementor identification as well. A simple (though slightly inelegant) implementation is this: if a procedure in a remotely exported interface is declared as taking an argument of type RPC.ConnectionID, then when that procedure is called, the caller's ConnectionID argument is ignored. Instead, the RPC on the callee's side substitutes the ConnectionID corresponding to the (secure) connection over which the call came.
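In other words, hypothetically (the types other than RPC.ConnectionID are made up):

    -- A remotely exported procedure using the convention:
    ReadPage: PROCEDURE [conn: RPC.ConnectionID, file: FileID, page: PageNumber]
      RETURNS [data: PageBuffer];
    -- Whatever the caller passes for conn is replaced on the callee's side
    -- by the ConnectionID of the secure connection the call actually
    -- arrived on, so the implementor can trust it.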
It's an open issue whether the ConnectionID should be generated by the implementor or the RPC. Speaking as an implementor, I would rather generate it myself; but it's not really that important. Ed *start* 00738 00024 US Date: 8 Sept. 1981 3:16 pm PDT (Tuesday) From: kolling.PA Subject: Re: Access control design In-reply-to: MBrown's message of 8 Sept. 1981 2:14 pm PDT (Tuesday) To: MBrown cc: Kolling, Taft Quotas: the only transactions that need be serialized to maintain a consistent handle on the quota space left are the ones that do AddOwner or ChangeOwnerSpaceQuota. These transactions are rare birds. If the quota-space-left is kept in the first page of the owner database file or somesuch and such a transaction locks that page exclusively, I think this serialization will happen properly with no other effort, and this will give the system administrator more flexibility than our not supplying this feature. Karen *start* 00447 00024 US Date: 8 Sept. 1981 3:19 pm PDT (Tuesday) From: Taft.PA Subject: Re: Access control design In-reply-to: kolling's message of 8 Sept. 1981 2:34 pm PDT (Tuesday) To: kolling cc: MBrown, Taft I'm with Karen on this one: why distinguish between the capability to delete a file and the capability to destroy all the information within it? I think having modify permission also control deleting the file is quite adequate. Ed *start* 00249 00024 US Date: 8 Sept. 1981 3:34 pm PDT (Tuesday) From: kolling.PA Subject: authentication To: mbrown cc: kolling I'm confused about rpc authentication and how this affects clientIDs, etc. Sometime when you have a few minutes..... *start* 00574 00024 US Date: 8 Sept. 1981 4:06 pm PDT (Tuesday) From: kolling.PA Subject: AccessLists To: mbrown cc: kolling I noticed that in FS version 5, CreateFile etc. take an AccessList, which is a LIST of RNames. Is that what you plan to hand off to me or will this be changed to allow the client of FS to say something like: [owner, AlpineWheels, world: BOOLEAN, xxxxx: LIST OF RNames] or somesuch? If you plan to leave it like this, how shall AC recognize "world", etc.? Let me know what you decide about quotas and DeleteFile before you leave. Thanks. *start* 00536 00024 US Date: 8 Sept. 1981 4:14 pm PDT (Tuesday) From: MBrown.PA Subject: Re: Access control design In-reply-to: kolling's message of 8 Sept. 1981 3:16 pm PDT (Tuesday) To: kolling cc: MBrown, Taft blush, I overlooked that. yes, the contention is not too bad over the allocation totals. i don't feel strongly about the delete protection thing. since Ed likes your design, go with it. i wasn't trying to throw a monkey wrench into your design, it just seems that way! now i understand it all much better. --mark *start* 00639 00024 US Date: 8 Sept. 1981 4:19 pm PDT (Tuesday) From: MBrown.PA Subject: Re: AccessLists In-reply-to: Your message of 8 Sept. 1981 4:06 pm PDT (Tuesday) To: kolling cc: mbrown I was hoping that there would be some conventional way for a client to say owner and world, just as there is in Grapevine (the "*" RName, etc.) This would avoid having to learn some tiresome record constructor ... I don't think that a client gets any say about AlpineWheels, does he? If so, I would favor having him say it in a way that is compatible with the way for owner and world. See what Ed thinks, then do what seems right. --mark *start* 03110 00024 US Date: 8 Sept. 1981 4:39 pm PDT (Tuesday) From: Birrell.pa Subject: Re: Handles, authentication, etc. In-reply-to: Taft's message of 8 Sept. 1981 2:53 pm PDT (Tuesday) To: Taft cc: MBrown, Kolling, Birrell, Boggs, Nelson Some slight modifications of your description.... For "ConnectionID", Bruce and I have been using the less over-used term "ConversationID". It isn't really a connectionID anyway, because there can be multiple independent calls simultaneously in a single conversation (through a single interface). Our analog of "connectionID" is really [calling-host, calling-process]. Since the conversationID appears as an argument of the procedures in the DEFs file, the caller will need to provide one. I propose to use this value as the conversationID for the call (which implies to both hosts which encryption key to use), and then to pass the conversationID as argument to the call to the implementor. So the argument given by the user isn't "ignored". Some extra advantages of this scheme are that implementors can use the conversationID instead of a handle to locate instance data (particularly if we can allow the implementor to allocate the conversationID), and that within a particular interface not all procedures need to be encrypted. It also means that a single conversation can use multiple DEFs files. The problem with allowing the implementor to allocate the conversationID is that in the RPC runtime I need to map conversationID to encryption key (in the caller and the callee host). This requires that the conversationID be unique within a single host (mds?). If the implementor wants 32 bits, I'd need an extra 16 for uniqueness (on the wire, not as client-perceived arguments), which extends our already over-large per-packet length overhead. The advantage of allowing the implementor to allocate them is that it gives the implementor much freedom: he can use pointers, array indexes, or hash keys depending on what is best for the particular application. I think the advantages outweigh the disadvantages, but bear in mind that there is very little chance these days that we'll be able to squeeze 256 words of argument (or result) into a single packet. Packet length may also become the dominant performance factor on a Dorado using the 3MB Ethernet (but not the 10MB Ethernet). Bruce is still considering whether this scheme causes any undue problems for the stub generator; otherwise, it seems to be the best we have. On an independent topic, Bruce and I now think that providing (almost) the full semantics of SIGNALs and ERRORs is not difficult, so it should appear in an early second version (or possibly first version). The "almost" is because, if a signal/error is raised by the implementor in the server and that signal/error isn't declared on the DEFs file, we have no reasonable way to represent the signal on the calling host. Therefore we propose to translate any such signals/errors into a single ERROR declared publicly in the RPC runtime.
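Concretely, the callee-side stub would then contain something on the order of (names invented):

    results _ Implementor.Proc[args !
      DeclaredError => REJECT;                  -- declared in the DEFs file: marshaled back to the caller elsewhere in the stub
      ANY => ERROR RPCSignals.UndefinedError];  -- anything undeclared gets translated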
This agrees with the exception methodology advocated by Roy and me, anyway. Andrew *start* 04796 00024 US Date: 8 Sept. 1981 5:17 pm PDT (Tuesday) From: Taft.PA Subject: Re: Handles, authentication, etc. In-reply-to: Birrell's message of 8 Sept. 1981 4:39 pm PDT (Tuesday) To: Birrell cc: Taft, MBrown, Kolling, Boggs, Nelson Thank you for correcting my misrepresentations. I've changed my mind about wanting to be able to assign ConversationIDs myself. I now realize that all the state information kept by the Alpine server is associated with either a transaction or an open file, identified by an ID or handle passed as a separate argument. So the ConversationID will be used only for validation and not for lookup; its only required property is local uniqueness. Therefore, I'm happy to let RPC be the source of ConversationIDs. With regard to SIGNALs and ERRORs, my present feelings on the subject are summarized in the attached message (which is a reply to a message from Mark which, for fairness' sake, I've excerpted also). If I understand your position correctly, I think we are in violent agreement. In any case, I certainly agree that the only SIGNALs that cross machine boundaries should be ones declared in the remote interface. ------------------- Date: 25 Aug. 1981 4:56 pm PDT (Tuesday) From: MBrown.PA Subject: File Store document, version 5 To: Kolling, Taft cc: MBrown I am now trying to flesh out the treatment of errors in the interface. I am in general agreement with the stylistic conventions suggested in Roy's memo "Guidelines for Signalling in Cedar" ([ivy]Levin>Cedar>SignallingGuidelines.bravo.) But we need to figure out how to apply these ideas to our system. One general principle that seems useful: if an error detected by a procedure is likely to be the result of a sequence of calls outside the immediate scope of the caller, the error should be raised as an ERROR and not by a return code. Failure of credentials (except on Login itself) and transaction aborts are examples of this type of error. Abstraction failures represent a fairly well-delimited class. For instance, a create owner failure due to the owner database being full is an abstraction failure, as is a hard disk error. Is a SetLength or CreateFile failure due to out of disk quota an abstraction failure? I guess so. I am assuming that we shall not require SIGNALs in our interface, only ERRORs. Remote RESUME is too much to ask for right now. Andrew seems willing to provide remote ERROR in a timely fashion. A remote ERROR will unwind the remote call before we see the error locally. This means that what Roy calls "Client programming errors" (e.g. RName too long or containing an invalid character) must return enough information to diagnose the problem that the remote machine found. The type of error for which the correct interface style is unclear is "read page off end of file", "owner already exists in call to create owner", etc. Offhand I am inclined to report these as ERRORs, although they do not fit neatly into any of Roy's categories (they come closest to being calling errors, but they clearly should be documented in the interface, which calling errors are not.) Maybe this means that I am wrong and these should really be reported by return codes. --mark ------------------- Date: 26 Aug. 1981 5:18 pm PDT (Wednesday) From: Taft.PA Subject: SIGNAL, etc. In-reply-to: MBrown's message of 25 Aug.
1981 4:56 pm PDT (Tuesday) To: MBrown cc: Kolling, Taft Your suggested treatment of SIGNALs across a remote interface seems like the only reasonable one from the point of view of robustness of the server -- independent of whether or not RPC is eventually able to support SIGNALs in all their generality. I think you are effectively proposing that RETURN WITH ERROR be the only allowed use of SIGNALs across a remote interface. It seems like the standard Mesa semantics of RETURN WITH ERROR are exactly what we want. That is, it unwinds the callee's stack; but catching the SIGNAL, passing parameters, etc., are done in the conventional way from the client's point of view. Can we get the RPC guys to give us these semantics, perhaps even with the same syntax? As you say, for debugging purposes this treatment may be somewhat of a nuisance. But I think there is an easy solution. The top-level procedures in the server (i.e., the ones exporting the remote interface) will have to catch all client programming errors and abstraction failures coming up from below and turn them into RETURN WITH ERROR on remotely exported SIGNALs. The trick is to have the top-level procedures catch these errors conditionally, based on some global debugging switch.
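Schematically, something like this (all names invented):

    debugging: BOOLEAN _ FALSE;  -- the global switch

    -- Hypothetical top-level procedure exporting the remote interface:
    WriteProperties: PUBLIC PROCEDURE [...] =
      BEGIN ENABLE ANY =>  -- internal errors coming up from below
        IF debugging THEN REJECT  -- don't catch them at all
        ELSE RETURN WITH ERROR FileStore.Error[...];  -- remotely exported
      ...body...
      END;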
If you turn on the debugging switch, these errors will not be caught, and the server will land in CoPilot with the server's stack still intact. Ed ------------------- *start* 00420 00024 US Date: 8 Sept. 1981 5:44 pm PDT (Tuesday) From: Taft.PA Subject: OpenFileMap and LogMap, version 1 To: MBrown, Kolling cc: Taft This version addresses all Mark's comments and also incorporates the latest thinking about RPC client validation. There are no other important changes. I propose next to produce a complete interface, and also to think about a possible TransactionMap interface. Ed *start* 01455 00024 US Date: 8 Sept. 1981 5:49 pm PDT (Tuesday) From: Birrell.pa Subject: Re: Handles, authentication, etc. In-reply-to: Taft's message of 8 Sept. 1981 5:17 pm PDT (Tuesday) To: Taft cc: Birrell, MBrown, Kolling, Boggs, Nelson But you presumably still need to be able to map ConversationID's into user names? I may not have been clear about exceptions. What I think is not difficult to provide is precisely the local semantics of SIGNALs and ERRORs, including use of RESUME and the order of events for unwinding. (I propose to do this by a call-back style of arrangement.) The only variation will be the handling of signals/errors coming out of an interface but not defined in that interface, which will all be converted into RPCSignals.UndefinedError. If the caller ends up in the debugger with an uncaught signal, the server call stack will still exist (subject to the extent to which you employ RETURN WITH ERROR). Of course, the server may apply a timeout to the call quite rapidly . . . . It's possible that I'm wrong about the easiness of this, so as a fall-back position the SDD work provides an existence proof of supporting ERRORs only, with the slightly strange unwind sequence. One thing I'm not sure about is whether RPCSignals.UndefinedError should be raised in the server (where it will presumably be uncaught) or dutifully passed back to the caller and raised there. Maybe we need a debugging switch. Andrew *start* 01260 00024 US Date: 9 Sept. 1981 3:18 pm PDT (Wednesday) From: kolling.PA Subject: FS interface To: Taft cc: kolling There's one thing we need to clear up about the FS interface. AC carries around AccessLists that look something like: owner, world, AlpineWheels: BOOLEAN, plus two RNames. FS defines an AccessList as a LIST of RNames. So we need some way for the FS client to say (a) world, (b) owner, and (c) although we may not document this in the interface, AlpineWheels. Mark pointed me to the Grapevine conventions and asked me to run them past you. What do you think of allowing the FS LIST to contain the "distinguished" names Owner, World (also *), and AlpineWheels as well as the two other RNames. Caps don't matter, of course. Note that we do need the AlpineWheels field because it is typically turned off for file modify to prevent wipe outs. The FS stuff should probably note that the following minimums are ORed in, and the defaults are as follows (disregarding AWs):

File access:    default  minimum
  read          world    owner
  modify        owner    null

File access list access is the setting of the appropriate file access, plus owner and the current owner-create list.

Owner access:   default  minimum
  create        owner    owner
  read/set      owner    owner

Karen *start* 01694 00024 US Date: 9 Sept. 1981 4:30 pm PDT (Wednesday) From: Taft.PA Subject: Re: FS interface In-reply-to: Your message of 9 Sept. 1981 3:18 pm PDT (Wednesday) To: kolling cc: MBrown, Taft I agree with the notion that clients of FileStore (and perhaps of AccessControl also) should traffic in lists of RNames uniformly, and that the optimized internal representation of certain RNames should be hidden. I also have no problem with introducing distinguished "RNames" that are interpreted by Alpine rather than being passed on to Grapevine -- though we should limit the extent to which we do this, since such "RNames" can't be members of Grapevine groups. I think the "uniform" approach can be pushed a bit further, as follows: instead of having a distinguished RName "Owner", how about letting the FileAccessList.owner bit stand for the actual owner's name. For example, if the owner of file Foo is "Taft.pa", and I ask to change the read access list of Foo to ["Taft.pa", "Kolling.pa"], then "Taft.pa" should be represented internally by setting the owner bit, and only "Kolling.pa" should actually be stored as a string. Obviously, the inverse thing should happen when converting a FileAccessList to a list of RNames. A similar argument can be made for AlpineWheels. I think we've agreed that each server will have an independently-settable RName for local wheels. Suppose we have a server named "S" whose wheel RName is "S.Alpine". Then a reference to "S.Alpine" in a client-supplied list of RNames should operate on the FileAccessList.alpineWheels bit. There is no similar argument with respect to "World" or "*", since they don't stand for any particular RName.
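In code terms, the conversion I'm imagining is roughly the following (the record layout and the helper names are purely illustrative):

    FileAccessList: TYPE = RECORD [
      owner, world, alpineWheels: BOOLEAN _ FALSE,
      others: LIST OF RName _ NIL];

    ToInternal: PROCEDURE [file: FileID, names: LIST OF RName]
      RETURNS [list: FileAccessList] =
      BEGIN
      FOR l: LIST OF RName _ names, l.rest UNTIL l = NIL DO
        SELECT TRUE FROM
          Equiv[l.first, OwnerOf[file]] => list.owner _ TRUE;
          Equiv[l.first, wheelRName] => list.alpineWheels _ TRUE;  -- e.g. "S.Alpine"
          Equiv[l.first, "World"], Equiv[l.first, "*"] => list.world _ TRUE;
          ENDCASE => list.others _ CONS[l.first, list.others];
        ENDLOOP;
      END;
    -- Reading back, the inverse substitutes the actual owner name for the
    -- owner bit, the server's wheel RName for alpineWheels, and so on.
    -- Equiv is a case-insensitive compare, since caps don't matter.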
Ed *start* 00467 00024 US Date: 11 Sept. 1981 7:27 pm PDT (Friday) From: kolling.PA Subject: AccessControlDesign7 To: Taft cc: MBrown, kolling AccessControlDesign7.bravo incorporates the latest changes and lives on Alpine>doc. It has solutions for all the problems I know about (I think), except two I am currently ruminating on (a slight problem with releasing locks, and all the handshaking that goes on during read/write properties that are access lists). Karen *start* 01092 00024 US Date: 14 Sept. 1981 9:04 am PDT (Monday) From: Taft.PA Subject: AccessControl To: Kolling cc: MBrown, Taft It seems somewhat redundant for AddOwner to take the space quota and access lists as arguments given that there also exist ChangeOwnerSpaceQuota and ChangeOwnerAccessList procedures. A cleaner factoring of function would be for AddOwner simply to create the new owner with zero quota and empty access lists. The client may then call the ChangeXXX procedures as necessary. (I don't think there are any access permission problems here: the client must be an AlpineWheel to call AddOwner and therefore will certainly be able to call ChangeXXX.) Also, is there enough flexibility in the AccessControl design for us to consider associating additional information with each owner entry? I'm thinking that it might be worthwhile to have some default file properties. That is, when a file is created for an owner, the file's properties (e.g., access control lists) are initialized to values specified in the owner entry rather than to system-wide defaults. Ed *start* 00753 00024 US Date: 14 Sept. 1981 10:50 am PDT (Monday) From: Kolling.PA Subject: Re: AccessControl In-reply-to: Taft's message of 14 Sept. 1981 9:04 am PDT (Monday) To: Taft cc: Kolling, MBrown I think it would be a bother to have to call two different procedures to do one thing, i.e., create a new owner. Presumably AddOwner and ChangeOwnerSpaceQuota will not duplicate code unnecessarily. We kicked around tucking file defaults in the owner entries at one point but decided against it, as I recall for two reasons: (1) not tying the owner and file stuff together any more than is necessary (already unfortunately we have crept away from this objection somewhat), and (2) it would bump the owner record space over one page size. Karen *start* 00309 00024 US Date: 14 Sept. 1981 11:19 am PDT (Monday) From: Kolling.PA Subject: Re: AccessControl In-reply-to: Kolling's message of 14 Sept. 1981 10:50 am PDT (Monday) To: Kolling cc: Taft, MBrown Also, having two calls is less optimal internally, since I have to snarf up the owner page twice. *start* 02261 00024 US Date: 14 Sept. 1981 12:04 pm PDT (Monday) From: Kolling.PA Subject: some access control problems To: mbrown, taft cc: kolling 1. A while back I asked for the ability to release read locks on pages of the owner database file, for use by enumerations in the unfrozen mode. There is a problem, I think, with that however. Since locks are in the name of the transaction, suppose an AlpineWheel (in AlpineWheel mode) tries to do something directly with the owner database file and calls an unfrozen enumeration at the same time. Theoretically he/she could be depending on data that gets written over. What to do? (See related question #3, which might give us a way out.) 2. When AC goes after a page of the owner file, it first write locks it, then gets it in some fashion (from buffer or read), then looks to see if it is really the page it wants (it might not be because of collisions, etc.). If it isn't the desired page, AC steps to the "next" page, leaving a write-locked page behind it, etc. This is normally fine, as the owner file is set up to be sparse. But if the owner file gets really full, searches for non-existent owners or searches for empty pages will lock many pages. (a) We could say, tough bunny, the system administrator should call ReorganizeOwnerDatabase. (b) Alternatively, I could simply have a call on Locks to release the write lock because I promised I didn't dirty the page. I could avoid getting zapped by an intervening reorganize (no other writes can cause me any trouble) by not releasing the lock on page n until I have page n+1. (I don't want to get a read lock initially and then upgrade it, because of efficiency; interactions become more complex.)
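A sketch of (b), with an imaginary Locks operation for the promised-clean release:

    page, next: PageNumber;
    page _ HashPage[owner];
    Locks.WriteLock[trans, ownerFile, page];
    UNTIL IsDesiredPage[page, owner] DO
      next _ NextChainPage[page];
      Locks.WriteLock[trans, ownerFile, next];  -- hold n+1 before letting go of n
      Locks.ReleaseCleanWriteLock[trans, ownerFile, page];  -- "I promised I didn't dirty it"
      page _ next;
      ENDLOOP;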
If it weren't for the problem in #1, I would vote for (b), because it only takes a few lines of code and two years from now system performance will degrade and no one will realize why it is happening. As it is, I'm not sure which is preferable. 3. Also, what will OpenFile do if it sees an open request with all the same parameters (if the clientName is the same, is the FS.Handle returned by FS.Create the same?)? I'm just about to read Ed's recent memo and your comments on it, so perhaps you have already answered this..... Karen *start* 02215 00024 US Date: 14 Sept. 1981 12:21 pm PDT (Monday) From: Taft.PA Subject: Re: AccessControl In-reply-to: Kolling's message of 14 Sept. 1981 11:19 am PDT (Monday) To: Kolling cc: Taft, MBrown I think it's silly for us to consider efficiency in this case -- after all, how often do you change the owner data base? This particular issue is unimportant enough that I'm not going to push it any more. But in general, I think the interfaces exported by Alpine should be lean and compact: provide one way to do every required function, and let the client implement combinations of functions as he sees fit. On another topic, I've discovered a problem with the way EnumerateAllOwners is specified. I presume you intended that the FileStore remote interface would include ContinuationKey as an exported type and initialContinuationKey as an exported variable. (Actually exporting the definitions from AccessControl would introduce an uncomfortable compilation dependency.) Unfortunately, RPC doesn't support exported variables, so there's no way for the client to get his hands on initialContinuationKey except by some sort of LOOPHOLE. It seems to me that in this case the ContinuationKey can be done away with altogether; instead, use the owner name itself as the key. That is, replace the current EnumerateAllOwners with something like: GetNextOwner: PROCEDURE [ ... previousOwner: OwnerName _ NIL ... ] RETURNS [ ... nextOwner: OwnerName, ... & its properties ... ]; To start the enumeration, the client passes previousOwner=NIL; and GetNextOwner returns nextOwner=NIL when passed the "last" owner in the data base. Of course, this means that the implementation has to effectively look up previousOwner in order to find the right place in the enumeration. I can't imagine that efficiency will be an issue when enumerating the owner data base; but if it turns out to be, there are some obvious optimizations (e.g., the implementation can remember the name of the owner that it gave out most recently, along with its physical position in the data base).
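So the client enumerates the whole data base with a loop like this (sketch only):

    owner: OwnerName _ NIL;  -- NIL starts the enumeration
    DO
      [nextOwner: owner, ...] _ GetNextOwner[previousOwner: owner, ...];
      IF owner = NIL THEN EXIT;  -- NIL back: the previous call gave out the last owner
      ...process this owner and its properties...
      ENDLOOP;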
This is the Pilot-style "stateless enumeration", which I have come around to for various reasons that I will be happy to explain. Ed *start* 00389 00024 US Date: 14 Sept. 1981 12:30 pm PDT (Monday) From: Kolling.PA Subject: Re: AccessControl In-reply-to: Taft's message of 14 Sept. 1981 12:21 pm PDT (Monday) To: Taft cc: Kolling, MBrown I'll change over to your form of Enumerate; I had it set up with the continuationKey to avoid the double read, but caching the last thing will almost always avoid that extra read. *start* 00375 00024 US Date: 14 Sept. 1981 12:32 pm PDT (Monday) From: Kolling.PA Subject: Re: AccessControl In-reply-to: Taft's message of 14 Sept. 1981 12:21 pm PDT (Monday) To: Taft cc: Kolling, MBrown My objection to splitting add owner is not so much the efficiency, as the fact that I think the current calls directly correspond to the things a user will want to do. *start* 00920 00024 US Date: 14 Sept. 1981 12:43 pm PDT (Monday) From: Taft.PA Subject: Re: some access control problems In-reply-to: Kolling's message of 14 Sept. 1981 12:04 pm PDT (Monday) To: Kolling cc: mbrown, taft 1. "Unfrozen" enumeration seems like a bad idea, for the reason you state. Perhaps it should be flushed. If the client REALLY doesn't care about the consistency of the enumeration with respect to anything else (as may sometimes be the case), he can do it under the "nil" transaction. 2. I concur with your strategy (b). Releasing locks based on internal knowledge about the state of the abstraction is perfectly ok -- I just don't want such a capability exported to the client. 3. Each OpenFile creates a distinct handle. I don't think I'd want it any other way -- otherwise there would be horrible confusion if a single client ran multiple programs that happened to open the same file. Ed *start* 00375 00024 US Date: 14 Sept. 1981 12:49 pm PDT (Monday) From: Kolling.PA Subject: Re: some access control problems In-reply-to: Taft's message of 14 Sept. 1981 12:43 pm PDT (Monday) To: Taft cc: Kolling, mbrown Yes, but releasing locks for 2 has the same problem as releasing locks for unfrozen enumeration; in fact, it is worse, because it releases write locks. *start* 02265 00024 US Date: 14 Sept. 1981 1:02 pm PDT (Monday) From: Nelson.PA Subject: ConversationIDs To: Birrell, MBrown, Taft, Kolling, Boggs cc: Nelson I have now read through the msgs from last week. Andrew's final one stated our position accurately: From the RpcRuntime's point of view, (resumeable) signals and callback procs are already handled. From Lupine's view, the hair has not settled, but it looks promising. Only transparent syntax and semantics are planned, not anything funny. The speed of signals, however, is another matter. On conversationIDs, as I understand it, Lupine's job is simply to look for one in every client argument list. If there is one, in addition to being marshaled in the normal way, it will also be passed to the client's RpcRuntime at the start of the call. I propose the following syntactic arrangements: Assume that the ID is called Conversation.ID (separate interface). Lupine will allow any number of TYPE defs such as SecureChannel: TYPE = Conversation.ID; RpcChannel: TYPE = SecureChannel; If an interface proc has an arg list of the form GetPage: PROC [ ..., channel: RpcChannel, ...] RETURNS [...]; then the 'channel' arg will get passed to the RpcRuntime as the ID. Note that it need not be the first arg, altho this leaves open what happens if there is more than one--either error, or use the first (prob. error). Or we could decide that it must be first. Also note that it is not allowed to be embedded in anything; it must be a top-level argument. If desired, we could get more mileage from IDs by declaring them like Conversation.ID: TYPE = RECORD [ id: PRIVATE ..., handle: REF (or LONG POINTER) ]; where 'handle' is for the client. Not clear exactly what this buys; Lupine can't send handle^ because the referent type is unknown. Perhaps this is a bad suggestion. On RpcSignals.UndefinedError, I believe it should pass through by default, but there should be an RPC.ExportInterface optional argument that can override this. Go for local and remote transparency when in doubt, but leave the needed hooks for robust server operation. I would also rename this ERROR to RpcSignals.UncaughtRemoteSignal. Less confusing to passersby (and me) who stare at the debugger. Bruce *start* 00433 00024 US Date: 14 Sept.
1981 1:06 pm PDT (Monday) From: Kolling.PA Subject: mind slip To: taft, mbrown cc: kolling "of course" (now that I think of it, sigh), the transaction where an AlpineWheel munges the owner file will get screwed up regardless of whether or not AC releases the locks, since the locks don't protect the client and AC from each other anyway, so I don't suppose it matters from that point of view. *start* 00442 00024 US Date: 14 Sept. 1981 1:10 pm PDT (Monday) From: Kolling.PA Subject: no transaction To: taft, mbrown cc: kolling I haven't been able to find the latest documentation on the "nil" transaction. Are we actually going to support this? If so, I assume AC should be prepared for calls with some distinguishing NIL trans, and what it should do with these is carry them out immediately (i.e., action + phase 1 and 2)? Karen *start* 00710 00024 US Date: 14 Sept. 1981 1:35 pm PDT (Monday) From: Taft.PA Subject: Re: some access control problems In-reply-to: Kolling's message of 14 Sept. 1981 12:49 pm PDT (Monday) To: Kolling cc: Taft, mbrown I claim that allowing AccessControl to release locks internally (yes, even write locks) is qualitatively different from providing such a capability to the clients. Maintaining the consistency of the AccessControl abstraction is the only thing that matters. The hash probe path by which the implementation locates an owner record is not part of the abstraction, so it doesn't need to be locked. You need to lock only those records that you actually read out and give to the client. Ed *start* 00472 00024 US Date: 14 Sept. 1981 3:16 pm PDT (Monday) From: kolling.PA Subject: releasing write locks is a no-no. To: taft, mbrown cc: kolling Here's a reason AC can't release write locks whilst searching down a hash chain: suppose the page it is releasing the lock for (because it thinks it isn't interesting) is actually a page it wrote previously in this transaction. Gorp. So I guess we'll just have to degrade as the owner database files become full. *start* 01390 00024 US Date: 14 Sept. 1981 5:34 pm PDT (Monday) From: Taft.PA Subject: FileStore memo(s) and defs files To: MBrown, Kolling cc: Taft I've revised the FileStoreDesign memo and split it into two parts: "FileStore public interfaces", intended for clients, and "FileStore interface internals", intended for implementors. These documents are stored on [Ivy]doc>FileStore6.bravo and FileStoreInternal0.bravo. I've also split up the interface into several pieces (for reasons described in "interface internals"). The current partitioning is quite lopsided and may not be quite right, but we can shuffle things around easily enough. Also, I have produced Mesa defs files, and stored them on [Ivy]defs>. Included are two semi-hypothetical interfaces, Authenticate and RPC, intended to be implemented by Grapevine and RPC respectively. (Semi-hypothetical in that I have talked with Andrew about them, but they haven't actually been adopted by the RPC design group, and no doubt will change.) The set of defs files is described by AlpineDefs.df; we should endeavor to keep this DF file up-to-date so we can quickly update our individual disks with the latest stuff. These defs files are intended to be compilable now, but I haven't yet set up a Cedar world on my machine, and the Cedar compiler won't run on top of vanilla CoPilot. That's my next task. Ed *start* 01831 00024 US Date: 14 Sept.
1981 5:40 pm PDT (Monday) From: Taft.PA Subject: FYI: Alpine assumptions about RPC and Grapevine To: Birrell, Nelson, Boggs cc: MBrown, Kolling, Schroeder, Taft --------------------------- Date: 14 Sept. 1981 5:34 pm PDT (Monday) From: Taft.PA Subject: FileStore memo(s) and defs files To: MBrown, Kolling cc: Taft I've revised the FileStoreDesign memo and split it into two parts: "FileStore public interfaces", intended for clients, and "FileStore interface internals", intended for implementors. These documents are stored on [Ivy]doc>FileStore6.bravo and FileStoreInternal0.bravo. I've also split up the interface into several pieces (for reasons described in "interface internals"). The current partitioning is quite lopsided and may not be quite right, but we can shuffle things around easily enough. Also, I have produced Mesa defs files, and stored them on [Ivy]defs>. Included are two semi-hypothetical interfaces, Authenticate and RPC, intended to be implemented by Grapevine and RPC respectively. (Semi-hypothetical in that I have talked with Andrew about them, but they haven't actually been adopted by the RPC design group, and no doubt will change.) The set of defs files is described by AlpineDefs.df; we should endeavor to keep this DF file up-to-date so we can quickly update our individual disks with the latest stuff. These defs files are intended to be compilable now, but I haven't yet set up a Cedar world on my machine, and the Cedar compiler won't run on top of vanilla CoPilot. That's my next task. Ed ------------------------------------------------------------ There is a section at the beginning of each of the two memos describing how the client and server (respectively) intend to make use of the Authenticate and RPC facilities. *start* 00261 00024 US Date: 15 Sept. 1981 10:36 am PDT (Tuesday) From: Taft.PA Subject: FileStore defs files To: MBrown, Kolling cc: Taft ... have all been compiled successfully. The errors were all minor, so I have not bothered to print new listings. Ed *start* 00796 00024 US Date: 15 Sept. 1981 11:37 am PDT (Tuesday) From: Kolling.PA Subject: FileStore6.bravo To: taft cc: mbrown, kolling I've only skimmed this, so here are a few top of the head comments: Access control: 1. Are we going to allow the "distinguished names" Owner and World, in addition to *? 2. "A client who is the owner of a file is permitted to change the file's access control lists". So can anyone in the file modify access list and in the owner create list (I know this is awkward, but it is what fell out of our discussion...) Owner data base: 3. Clients may be interested in setting the owner create lists. Also, there are more than two pieces of info associated with the owner name. Also owner RNames do not have to be Grapevine recognizable (i.e., FooDocs). *start* 00299 00024 US Date: 15 Sept. 1981 12:11 pm PDT (Tuesday) From: kolling.PA Subject: AlpineEnvironment: RNames? Ropes? To: taft cc: mbrown, kolling Are we actually going to define owner names and client names as RNames? I think Mark wanted me to use Ropes; I'm a little fuzzy about this. *start* 01659 00024 US Date: 15 Sept. 1981 2:45 pm PDT (Tuesday) From: Taft.PA Subject: Re: FileStore6.bravo In-reply-to: Kolling's message of 15 Sept. 1981 11:37 am PDT (Tuesday) To: Kolling cc: taft, mbrown AccessControl: 1. I don't have any objection to having "Owner" be a distinguished RName equivalent to the owner's name itself. Remember, though, that when the client reads the property list back, "Owner" will come back as "Taft.PA" or whatever.
This seems perfectly ok to me. 2. I guess we have a misunderstanding. The whole point of the more complicated semantics for changes to access lists is so that an owner can give other clients modify access to the file without giving them modify access to the file's access lists. So I can give you modify access to one of my files, but that doesn't enable you to give Mark modify access to that file. You are right about anyone in the file's owner's create list being able to modify the access lists. This isn't in the statement you quoted (under "access control" in section 4) because the owner data base hasn't yet been described. However, the complete semantics (as I understand them) are given with the definition of WriteProperties, under "file properties" in section 5. (I'll number the subsections in the next revision of the memo!) Owner data base: I'll make OwnerName be a distinct type to emphasize that it is not necessarily a Grapevine RName. Ropes vs. strings vs. CedarXString vs. REF TEXT vs.....? I'm fuzzy about this too. Wouldn't it be nice if we could just say STRING and let the compiler produce whatever underlying representation seems to be in favor this week. Ed *start* 01726 00024 US Date: 15 Sept. 1981 4:21 pm PDT (Tuesday) From: kolling.PA Subject: Re: FileStore6.bravo In-reply-to: Taft's message of 15 Sept. 1981 2:45 pm PDT (Tuesday) To: Taft cc: Kolling, mbrown 1. Ref the definition of WriteProperties, under "file properties" in section 5: no way do we allow write owner. Owner is set once and for all at CreateFile. 2. "I guess we have a misunderstanding. The whole point of the more complicated semantics for changes to access lists is so that an owner can give other clients modify access to the file without giving them modify access to the file's access lists. So I can give you modify access to one of my files, but that doesn't enable you to give Mark modify access to that file." I thought the whole point was to prevent inadvertent overwriting of "one's own files" by being able to set the file modify list to Null, and therefore we needed to allow the access lists themselves to be modified by more than the people in the modify list. I have no problem with someone in the modify list handing that privilege off to someone else. On the other hand, your way is easier to implement. I'll change over to your way, unless Mark objects. So my present belief is that FS will do the following: on read access list request: look in its openfiletable for permission. on write access list request: call AC.CheckWriteAccessFileAccessLists. Naturally, it doesn't have to do the latter if it is in CreateFile and has createfile permission. 3. I also see ClientNames, maybe these should be = whatever OwnerNames are. 4. Do we want elements of AccessLists to be RNames? (I use them in my cache [ClientName, element of access list].) KK *start* 00286 00024 US Date: 15 Sept. 1981 4:43 pm PDT (Tuesday) From: kolling.PA Subject: by the way.. To: taft, mbrown cc: kolling I'm keeping an in flux copy of AccessControlDesign8.bravo [Ivy]Doc>DRAFT> and will move it to [Ivy]Doc> if it ever stabilizes. Karen *start* 00942 00024 US Date: 15 Sept. 1981 4:54 pm PDT (Tuesday) From: Taft.PA Subject: Re: FileStore6.bravo In-reply-to: kolling's message of 15 Sept. 1981 4:21 pm PDT (Tuesday) To: kolling cc: Taft, mbrown 1. Actually, I think moving a file from one owner to another (without destroying and recreating the file) is a useful operation.
In any event, I don't think AccessControl needs to make any special provision for this. Can't FileStore simply call ChangeSpaceViaOpenFileID[ ..., -nPages], then change the owner property, then call ChangeSpaceViaOpenFileID[ ..., +nPages] ? 2. You're right: I've invented a new justification for treating access control lists specially. 3, 4. I'll change AlpineEnvironment as follows: ClientName: TYPE = RName; OwnerName: TYPE = STRING; -- whatever flavor of string we adopt Elements of AccessLists are always Grapevine RNames, and I don't see any point in defining a new type for these. Ed *start* 00377 00024 US Date: 15 Sept. 1981 4:55 pm PDT (Tuesday) From: kolling.PA Subject: transID? To: mbrown cc: kolling Mark, in the buffer pool I have a field for the transID for each buffer. You said I maybe shouldn't use the real TransID here, but some index into something instead that was shorter and equivalent. What exactly is it that you want me to use? Karen *start* 00504 00024 US Date: 15 Sept. 1981 5:06 pm PDT (Tuesday) From: kolling.PA Subject: Re: FileStore6.bravo/changeowner In-reply-to: Taft's message of 15 Sept. 1981 4:54 pm PDT (Tuesday) To: Taft cc: kolling, mbrown IF FS does the work, I have no objection. It also has to check the create file permission for the new owner. The order should be:
  CheckAccessOwnerCreateThenAllocate[ ..., +nPages]
  Change the owner property.
  ChangeSpaceViaOpenFileID[ ..., -nPages]
This catches no-nos first. *start* 00286 00024 US Date: 15 Sept. 1981 5:06 pm PDT (Tuesday) From: MBrown.PA Subject: Re: transID? In-reply-to: Your message of 15 Sept. 1981 4:55 pm PDT (Tuesday) To: kolling cc: mbrown Call it a TransactionMap.Handle, I'll get around to defining it later in the week. --mark *start* 00739 00024 US Date: 15 Sept. 1981 5:15 pm PDT (Tuesday) From: MBrown.PA Subject: Re: FileStore6.bravo In-reply-to: Taft's message of 15 Sept. 1981 4:54 pm PDT (Tuesday) To: Taft cc: kolling, mbrown 3, 4. Maybe we should define AlpineEnvironment.String = Rope.Ref, and then define ClientName = OwnerName = AlpineEnvironment.String. The point is, we want to traffic in whatever Cedar standardizes on, I think. A Cedar veneer over GrapevineUser is already available. Alternatively, we could stick with STRING (hopefully LONG STRING), but I don't see the advantage in this. 1. So to change owner, what credentials are required? Owner create to both old and new owner? Or just modify file plus create for new owner? --mark *start* 00437 00024 US Date: 15 Sept. 1981 5:20 pm PDT (Tuesday) From: kolling.PA Subject: Re: FileStore6.bravo/changeowner again In-reply-to: MBrown's message of 15 Sept. 1981 5:15 pm PDT (Tuesday) To: MBrown cc: Taft, kolling right, I forgot, the client doing change owner also needs file modify permission for the original owner (change owner is the equivalent of delete). (Of course, he needs file modify to do the write owner.) *start* 00325 00024 US Date: 15 Sept. 1981 5:30 pm PDT (Tuesday) From: kolling.PA Subject: errors vs. return codes To: taft, mbrown cc: kolling Who's working in the FS box? Do you want AC to ERROR or use return codes for client-of-FileStore errors such as ownerAlreadyExists on AddOwner, insufficientPrivilege, etc.? KK *start* 01570 00024 US Date: 15 Sept. 1981 6:58 pm PDT (Tuesday) From: Taft.PA Subject: Re: errors vs. return codes In-reply-to: kolling's message of 15 Sept. 1981 5:30 pm PDT (Tuesday) To: kolling cc: taft, mbrown Well, I seem to be working in the FileStore box right now.
My preference is to have ERRORs for unexpected events, particularly if they are uncommon, but to have return codes for everything else. So, for example, a procedure like CheckAccessFile, whose main purpose is to check an access control list and return a yes/no answer, should return "yes" or "no", but should raise ERRORs for unexpected events such as no such file, can't talk to Grapevine, etc. It's often difficult to distinguish between caller errors and client-of-caller errors. However, in the case of FileStore, it seems to me that this distinction is almost irrelevant. FileStore can't allow any ERRORs from internal interfaces to escape to the client, since the client has no way to catch them. So it has to catch all exceptions arising from internal interfaces anyway, whether they are reported by ERRORs or by return codes. FileStore then deals with the exceptions in its own way (raising its own ERRORs when appropriate, etc.) The advantage of ERRORs, of course, is that FileStore can ENABLE just once to catch them anywhere in a particular context, rather than having to check the result of each call. The disadvantage is that ERRORs are quite slow when they occur. These considerations are what led me to the guidelines I suggested at the beginning of this message.
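To make the convention concrete, a sketch (everything here is illustrative, not actual interface text):

    -- The main-line answer is a return value:
    CheckAccessFile: PROCEDURE [open: OpenFileID, ...] RETURNS [ok: BOOLEAN];
    -- Unexpected events are ERRORs:
    Error: ERROR [why: ErrorCode] = CODE;  -- noSuchFile, grapevineUnreachable, ...
    -- so FileStore can ENABLE once for a whole context instead of testing
    -- the outcome of every call:
    BEGIN ENABLE AccessControl.Error => GOTO failed;
    ...many calls, each just testing its BOOLEAN result...
    EXITS failed => ...report in FileStore's own way...;
    END;

Ed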