*start* 00432 00024 US Date: 4 Jan. 1982 5:52 pm PST (Monday) From: Kolling.PA Subject: recovery To: mbrown, taft cc: kolling If we can lose the data on a pack permanently, then recovery has to operate on a per volume basis; otherwise losing a pack permanently will also permanently block access to all the other volumes (just on that server? or in the world?) that were involved with transactions touching that pack..... Karen *start* 02253 00024 US Date: 6-Jan-82 16:04:13 PST (Wednesday) From: MBrown.PA Subject: Recovery in the face of volume offline To: Kolling, Taft cc: MBrown I've been giving this some thought, and I feel that the approach I suggested in our meeting is basically ok (taking multiple passes over the log until each volume has been online at least once during a successful recovery.) From the log's point of view, the REAL goal of recovery is to allow log space to be re-used. So we come to the question, how does the log tell that all the volumes have been recovered, if no single recovery finds all volumes online? Recall that our current approach is to write a "completed transaction T" record to the log at some time after all updates of T have made it to disk (or conversely, all premature updates of an aborted transaction T have been undone.) In order to fully automate the recovery process, this record would have to be expanded to include a list of the volumes for which T is complete (or not complete.) (In an extreme design, we write such a record for each disk write under a transaction; breaking it down by volume is a step in this direction.) I am not convinced that even this level of additional complexity is warranted. Instead, the system can be provided, at restart, with a list of volumes that have already been recovered. This list might even be typed in by the operator. Any recovery action that references a volume in this list is considered to succeed without attempting to execute the action. When the system fails in a recovery action due to a volume being offline, it adds that volume to an internal list and proceeds; it types the name of the volume at the end of recovery, but does not write any "completed transaction T" records as a result of recovery. An additional complication derives from the possibility that a transaction in the "ready" state has made updates to both online and offline volumes. In this case, I see no alternative but for the operator to manually force the transaction outcome. In summary: having too few drives is a bad deal. For a real server, we should always have a drive for doing backup; perhaps that drive may be appropriated for the time required to make a clean recovery. --mark *start* 00542 00024 US Date: 6 Jan. 1982 4:57 pm PST (Wednesday) From: kolling.PA Subject: Re: Recovery in the face of volume offline In-reply-to: MBrown's message of 6-Jan-82 16:04:13 PST (Wednesday) To: MBrown cc: Kolling, Taft Are we guaranteed (essentially) that we will never permanently lose a pack? (Are we writing the log in two places?) Having the operator type in the list of volumes that have already been recovered makes me extremely nervous. One slip of the hand and the whole server's data is potentially invalid. Karen *start* 01062 00024 US Date: 6 Jan. 1982 5:39 pm PST (Wednesday) From: kolling.PA Subject: Re: Recovery in the face of volume offline In-reply-to: kolling's message of 6 Jan. 1982 4:57 pm PST (Wednesday) To: kolling cc: MBrown, Taft I'm a little confused here. Does the following work: The system starts recovery.
Whenever it can't access a volume, it adds that volume to its missing volume list (kept in volatile storage). When it reaches the end of the log, if there are any volumes in the missing volume list it prompts the operator to mount as many of those volumes as s/he can, and then sweeps the log again, only writing to the missing volumes. If any are still missing at the end of the log, it prompts and cycles again, etc. until recovery really completes. Then the operator never has to specify anything and can't mess up by mounting the wrong volume, since presumably the volumeID is obtainable from the volume by the software. (The completed transaction T records can get written if there are no entries in the missing volume list.) Karen *start* 02276 00024 US Date: 7-Jan-82 10:07:58 PST (Thursday) From: MBrown.PA Subject: Re: Recovery in the face of volume offline In-reply-to: kolling's message of 6 Jan. 1982 5:39 pm PST (Wednesday) To: kolling cc: MBrown, Taft You are not confused; something in the spirit of this proposal is certainly possible. Whatever we do, I think it will pay to minimize the distinction between the first pass of recovery (in your proposal, the first pass tries to access all volumes that are mentioned anywhere in the log) and later passes (in your proposal, later passes try to access only volumes that are part of an exception list), since this distinction may be difficult to isolate in the recovery code (unless we add a distinguished field to each log record that tells what volume it references, so that the recovery manager can decode this much of every log record just as it can presently decode the type and transaction fields.) My prejudice is also against a design that assumes that the server stays up continuously during the multiple retries, since implementing this sounds like added complexity and I think the case we are discussing is not very common. That is why I proposed a solution in which when recovery fails, it really fails (the server types a message, perhaps into a file, and dies), and when it finally succeeds the server just comes up and stays up. Questions at this level are probably best deferred until nearer to implementation. If we can agree now NOT to log individual transaction-volume completions as they occur in recovery, then I am happy for now. I think that would be an unjustified additional complexity. I think the discussion of media recovery ("do we ever lose a pack?") is separate from this issue ("do we ever have too few drives?"). The short answer to the media recovery question has to be: if the log is intact, we use it (along with the backup system, whatever it may be); otherwise we go to the backup system alone, and lose the guarantee of transaction-consistency. This is true whether the log is doubly-recorded or not: doubly-recording only makes it less likely that backup needs to be consulted for updates that were protected by the log. It is time to do some more design of the backup system, and I am working on it. --mark *start* 00352 00024 US Date: 8 Jan. 1982 12:49 pm PST (Friday) From: Kolling.PA Subject: pilot question To: levin cc: kolling The manual says "Any attempt to Kill a uniform swap unit is ignored." This seems to imply that there is no way to implement UsePages for a space that is divided into uniform swap units. Is that right? If so, why? Karen *start* 00571 00024 US Date: 8 Jan. 1982 1:32 pm PST (Friday) From: Levin.PA Subject: Re: pilot question In-reply-to: Your message of 8 Jan.
1982 12:49 pm PST (Friday) To: Kolling The implementation of uniform swap units is a kludge: there is NO information stored on a per-swap-unit basis. Thus, Pilot can't remember that one swap unit of a space is dead while another is alive (dead = disk contents uninteresting). I don't know what "UsePages" refers to, so I can't tell you whether it can be implemented for a space that is divided into uniform swap units. Roy *start* 00524 00024 US Date: 11 Jan. 1982 1:42 pm PST (Monday) From: Kolling.PA Subject: one page stuff in FPM To: mbrown, taft cc: kolling About the pinning problem regarding the log file (unpinning a space not guaranteeing that the pages in that space are written out in order): since we are toying with the idea of special one page vm spaces for the leader pages of files, maybe we could use these spaces for log writes? I.e., something like ReadOnePage/UseOnePage which would guarantee a one page vm space? Karen *start* 03047 00024 US Date: 11 Jan. 1982 6:25 pm PST (Monday) From: MBrown.PA Subject: Pilot mapped files To: Levin cc: Kolling, Taft, MBrown Roy, We are now trying to build Alpine's internal low level file system, FilePageMgr, using Pilot. We are more concerned than most Pilot clients, I suspect, with the precise semantics of mapped files: under what conditions can Pilot decide to write a page? Does Pilot ever write out a swap unit that is not dirty? If a space is divided into uniform swap units and only some of them are dirty, are only the dirty ones written? Consider the following approach to managing the log, which is designed to place as few demands on Pilot as possible. Each log page contains a bit that is reserved to say whether the page is "valid" or not. The first store into a page (really, into a page of VM that is mapped to the log page) marks the page invalid. Subsequent stores into the page write log records, without altering the state of the "valid" bit. After all log records have been written, the page is marked valid. Then the log manager proceeds in the same manner on subsequent pages. When the time comes to do a synchronous log write, the log manager performs a Space.ForceOut on each space whose written/unwritten status is unknown. The pages go out in no particular order, but a sequence number on each page (incremented each time the log wraps around) ensures that this does not matter. One attraction of this design is that it avoids our need to pin things in real memory. The only potential problem that I can see in this scheme comes if Pilot decides for some reason to rewrite an already-written page, and the redundant write fails in the middle. (This is a rather low probability event in any case, but we'd like to avoid it if we can.) I can see no reason for Pilot to do this, since the page will never be dirty after it is first written in a valid state. But uniform swap units have always been something of a mystery to me, so I'd like to check on this. If necessary we can just avoid uniform swap units: we aren't creating/destroying spaces on the fly, mainly just changing their mappings, forcing them out, etc. A slightly related question concerns the note in the Pilot Programmer's manual, page 130, on Space.Deactivate. The note seems to imply that Deactivate triggers disk write activity if the space is presently in real memory and dirty, but does not cause the real memory to be freed.
This is just what you want for doing a forced log write that spans multiple swap units: first Deactivate each one (generate the disk commands), then ForceOut each one (wait for the commands to finish.) (This behavior of Deactivate may not be what the rest of Alpine wants, however.) Another question concerns the implementation of dual logging. I wonder whether it is better to use mapped I/O and two sets of buffers, or only one set of buffers with CopyOut. The main problem I see with CopyOut is that there is no asynchronous version, analogous to Space.Deactivate. --mark *start* 01063 00024 US Date: 12 Jan. 1982 9:10 am PST (Tuesday) From: Levin.PA Subject: Re: Pilot mapped files In-reply-to: MBrown's message of 11 Jan. 1982 6:25 pm PST (Monday) To: MBrown cc: Kolling, Taft
1) I believe that Pilot will never write a clean swap unit to disk. A swap unit is clean if every page in it is clean, otherwise it is dirty. Uniform swap units are the same as non-uniform ones; if any page is dirty, the entire swap unit is considered to be dirty and consequently the whole thing is written out when the time comes.
2) The note on page 130 of the Pilot Programmer's manual says "Deactivate causes the space to be asynchronously swapped out after rewriting any dirty pages." Thus the real memory IS freed after the write completes. However, the cost of a ForceOut on a space that is not in real memory (e.g., was just deactivated) is minimal.
3) The difference between CopyOut and ForceOut (i.e., read/write vs. mapped I/O for logging) is minimal. As for asynchrony, you can always FORK CopyOut. Am I missing something? Roy *start* 01295 00024 US Date: 12 Jan. 1982 9:38 am PST (Tuesday) From: MBrown.PA Subject: Re: Pilot mapped files In-reply-to: Levin's message of 12 Jan. 1982 9:10 am PST (Tuesday) To: Levin cc: MBrown, Kolling, Taft
1) Good. I also seem to recall that Pilot won't write out a pinned swap unit, even if it is dirty and ForcedOut.
2) It sounds as though we would have to FORK parallel ForceOut calls to get the behavior that I wanted from Deactivate. The point is that once the log has been written we typically turn right around and read it (carry out intentions); hence we don't want the space to be swapped out, just cleaned up.
3) It is true that for writing the log CopyOut would work fine, since the log is written in a very systematic way. This would certainly eliminate all of the paranoia about Pilot's possibly doing the wrong thing with the log. But we are trying to implement both the log and normal files through the FilePageMgr interface. Using Pilot's mapped interface has the advantage that the dirty bits of the map can be used to identify the swap units of a space that have been dirtied, rather than having the client (who knows, but can make mistakes) supply this information. Maybe we'll have to bite the bullet and add a separate set of calls for the log. --mark *start* 00543 00024 US Date: 12 Jan. 1982 10:22 am PST (Tuesday) From: Levin.PA Subject: Re: Pilot mapped files In-reply-to: MBrown's message of 12 Jan. 1982 9:38 am PST (Tuesday) To: MBrown cc: Kolling, Taft
1) Yes, a ForceOut of a pinned space is a no-op.
2) As you might expect, there is an internal interface that cleans up a space without releasing the memory behind it. If it matters enough to you, I could easily add a Space.Clean operation. However, this sounds like a performance optimization that can wait indefinitely. Roy *start* 00709 00024 US Date: 12 Jan.
1982 12:24 pm PST (Tuesday) From: kolling.PA Subject: more Pilot questions To: levin cc: kolling Assuming an n-page parent space divided entirely into one-page uniform swap units:
1. Does Activate on the parent space gen one disk command or n?
2. Assume the middle m pages are dirty. Is there any dance I can do to cause a write of the contiguous dirty pages in one disk command, without writing non-dirty pages? I.e., we need to avoid writing non-dirty pages to avoid touching unlogged pages, but sequential writes will be very slow if it takes m revolutions to write m pages (or is Pilot fast enough so that it wouldn't need a rev between each command?) Karen *start* 00208 00024 US Date: 12 Jan. 1982 12:41 pm PST (Tuesday) From: Levin.PA Subject: Re: more Pilot questions In-reply-to: Your message of 12 Jan. 1982 12:24 pm PST (Tuesday) To: kolling 1) n. 2) No. *start* 00497 00024 US Date: 12-Jan-82 12:56:23 PST (Tuesday) From: Kolling.PA Subject: Activate To: MBrown cc: Kolling Roy says Activate doesn't do one command either, but that it's not worth sweating this stuff because in Klamath it will all be different (work right) anyway. Also, whether it manages to get contiguous pages without a rev in between is a function of the machine. I'm going to fiddle with getting some timings a bit anyway, partly to learn more about running in Pilot. Karen *start* 01653 00024 US Date: 21 Jan. 1982 6:49 pm PST (Thursday) From: Kolling.PA Subject: I don't understand why the numbers are coming out like this. To: mbrown, taft cc: kolling Here's my basic loop:
FOR index: CARD IN [0..1001) DO
  random: CARD ← RandomCard.Random[]/fudge;
  IF dirtyPage0 THEN pntr↑ ← 0;
  IF dirtyPage1 THEN pntr1↑ ← 0;
  IF dirtyPage2 THEN pntr2↑ ← 0;
  IF dirtyPage3 THEN pntr3↑ ← 0;
  IF dirtyPage4 THEN pntr4↑ ← 0;
  IF dirtyPage5 THEN pntr5↑ ← 0;
  UNTIL random = 0 DO random ← random - 1; ENDLOOP;
  IF index # skipIndex THEN PerfStats.Start[timer];
  Space.ForceOut[space];
  IF index # skipIndex THEN PerfStats.Stop[timer];
ENDLOOP;
NOTES: 0..1001 and skipIndex are to control skipping the first time thru the loop. fudge = LAST[CARD]/3000 gives an average random countdown of 24 ms. (a rev = about 20 ms). I know from measurements that the time to do RandomCard.Random[]/fudge and dirtying the pages is epsilon. All the measurements discussed here are for a 2 page space with the first page dirtied each time thru the loop, no uniform swap units underneath.
RESULTS: With the random stuff out of the loop, aver = 24, max = 143, min = 16. With the random stuff in and fudge = LAST[CARD]/1500 (n.b., half a rev), aver = 29, max = 144, min = 16.
QUESTIONS: What kind of synchronization can explain the average time with no randomization SLIGHTLY GREATER than a rev? (Remember that when no page dirtying was done, the aver time was 3 or 4 ms.) Why does the average time go up when I put in an average delay of half a rev?? Why do I have the hard minimum of 16ms (instead of 4 ms) over several runs? *start* 01232 00024 US Date: 22 Jan. 1982 10:31 am PST (Friday) From: MBrown.PA Subject: Re: I don't understand why the numbers are coming out like this. In-reply-to: Kolling's message of 21 Jan. 1982 6:49 pm PST (Thursday) To: Kolling cc: mbrown, taft The random delay should be uniformly distributed with a range equal to some multiple of the disk revolution time. By choosing a large multiple (100 revolutions, say) you can make the experiment less sensitive to how well your busy-wait loop matches the actual rotation time.
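(To make the measurement discipline being suggested here concrete, a rough Python sketch of the same loop structure follows: a delay drawn uniformly over many revolutions, a timed flush, and the first iteration discarded. The 20 ms revolution time, the force_out stand-in, and the iteration count are placeholders, not part of the original Pilot experiment.)

```python
import random
import time

REV_TIME = 0.020               # assumed disk revolution time: 20 ms, from the discussion above
DELAY_SPAN = 100 * REV_TIME    # spread the random delay over ~100 revolutions, as suggested

def force_out():
    """Placeholder for the Space.ForceOut being timed; here it just sleeps one revolution."""
    time.sleep(REV_TIME)

def measure(iterations=20):
    """Time force_out() repeatedly, desynchronizing each trial from the disk rotation."""
    samples = []
    for i in range(iterations + 1):
        # Uniform delay over many revolutions, so the flush starts at a random rotational phase.
        time.sleep(random.uniform(0.0, DELAY_SPAN))
        start = time.perf_counter()
        force_out()
        samples.append(time.perf_counter() - start)
    samples = samples[1:]      # skip the first iteration, as the Mesa loop does with skipIndex
    return min(samples), sum(samples) / len(samples), max(samples)

if __name__ == "__main__":
    print("min/avg/max seconds:", measure())
```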
You may want to use the random long integer generator to get these long delays. In your test, the delay was uniformly distributed between 0 and 24 ms, but the disk rotates in 20 ms. So 1/6 of the time the random delay is just slightly greater than one revolution; that would tend to bias the results toward higher values. Unfortunately, this theory does not explain the large minimum value that you observe. I think we'll have to call in Maxwell to learn how to use some of the performance tools; it is possible to get a complete trace of I/O activity with something called "Ben", I think. I think you should also try putting in larger countdowns, and running the same test on Dorado. --mark *start* 01319 00024 US Date: 22 Jan. 1982 12:20 pm PST (Friday) From: kolling.PA Subject: Re: I don't understand why the numbers are coming out like this. In-reply-to: MBrown's message of 22 Jan. 1982 10:31 am PST (Friday) To: MBrown cc: Kolling, taft I now believe that over a large range of delay times the average should be slightly greater than one revolution time, because: After initial stabilization occurs, if there is no delay, the computation will start "right after" the write completes and it will finish (remember it is "small") in time to get that page on that revolution, so the total time = exactly, more or less, one rev. Start adding a delay after the write completion that is a small time compared to a rev, and the total measured time will decrease as the delay increases, until the delay gets large enough to make the computation start late enough to just miss being able to write the page in that rev. So as the delay increases, the measured time for the forceout will vary from the worst case (comp time + a little less than 1/28 rev + one rev) to the best case (comp time + 1/28 rev), for an average of comp time + 1/28 rev + 1/2 rev, approx. This still doesn't explain the large minimum value or the isolated case where the measured time is very long, both of which I am still looking at. *start* 00380 00024 US Date: 22 Jan. 1982 12:36 pm PST (Friday) From: kolling.PA Subject: Re: I don't understand why the numbers are coming out like this. In-reply-to: kolling's message of 22 Jan. 1982 12:20 pm PST (Friday) To: kolling cc: MBrown, taft oops, make that "over a large range of delay times the average should be slightly greater than one HALF a revolution time". *start* 00947 00024 US Date: 25 Jan. 1982 4:58 pm PST (Monday) From: kolling.PA Subject: How fast can Pilot talk? To: fiala, lauer cc: mbrown, kolling, taft Anyone wish to claim enough knowledge of Pilot to answer the following (I've already talked to Roy and Ed Taft): I have a file which is contiguous on the Dorado disk, a space 4 pages long, and a loop as follows:
dirty all pages: 0, 1, 2, 3
ForceOut
and I measure the average, max, and min times (over 1000 iterations) it takes to do the ForceOut.
If there are one-space uniform swap units under the space the times are: average: 62 ms, max 96 ms, min 54 ms
If there are NO one-space uniform swap units under the space the times are: average: 13 ms, max 22 ms, min 5 ms
There is a random delay in the loop to avoid synchronization. Question: Even though the uniform swap units cause 4 disk commands to be generated, should it really take 4 revs to get these four pages? Karen *start* 00568 00024 US Date: 26 Jan. 1982 9:19 am PST (Tuesday) From: MBrown.PA Subject: Re: How fast can Pilot talk? In-reply-to: kolling's message of 25 Jan.
1982 4:58 pm PST (Monday) To: kolling cc: mbrown I have been reading the implementation of ForceOut, and the top level of it looks reasonable (it initiates all the I/O, then waits for it all to complete.) I think that by stopping the world at crucial instants and poking around (with CoPilot) we should be able to learn more about this phenomenon. Let's run your test on my machine and do that. --mark *start* 01017 00024 US Date: 27 Jan. 1982 8:30 am PST (Wednesday) From: Ladner.PA Subject: Re: How fast can Pilot talk? In-reply-to: Kolling's message of 25 Jan. 1982 4:58 pm PST (Monday) To: Kolling cc: MBrown, Taft, SDD-Pilot↑ I don't know that much about the Dorado and the speeds of its components, and I'm not sure I understand the question, but see if this helps ... On a Dolphin or Dandelion, if 4 disk commands are issued for consecutive sectors on the disk, 4 revs would be required to do the I/O. The time to process a disk command exceeds the inter-sector gap time by orders of magnitude. Pilot provides no disk command chaining for these two machines, and neither does the microcode. (Does this answer your question?) Uniform swap units seem to add an onerous amount of overhead more or less errr... uhhhh... uniformly across all space operations. To understand why that is, and in particular, why 4 disk commands were issued, I will have to let the resident expert on uniform swap units answer. *start* 01797 00024 US Date: 27 Jan. 1982 9:00 am PST (Wednesday) From: MBrown.PA Subject: Re: How fast can Pilot talk? In-reply-to: Ladner's message of 27 Jan. 1982 8:30 am PST (Wednesday) To: Ladner cc: Kolling, Taft, Levin, SDD-Pilot↑, MBrown We understand this phenomenon now, through a combination of inspecting counts kept in the disk channel, typing control-swat at an opportune moment, and reading code. In response to your suggestions: the Dorado chains disk commands in microcode, and is capable of transferring consecutive sectors with separate commands. This is demonstrated in the Dorado world-swap code, which runs the disk at full speed. We expected Pilot to generate the separate disk commands in the case we were testing; we tested the case because we were curious about the performance of this, compared to swapping the entire space as a unit. Now, the answer: the implementation of ForceOut is synchronous on a per-swap-unit basis when uniform swap units are involved. Forcing out a space that has been divided into uniform swap units causes the uniform swap units to be enumerated, and for each dirty swap unit a write is initiated. Unfortunately, this write must complete before the next write on a swap unit of that space can be initiated. The reason is that the entire enclosing space (really swap unit) is checked-out (in CachedRegion terms) when initiating the write, and not checked-in until the write completes. This holds up the next command that is trying to do its check-out. In the course of thinking about this problem, we also noticed that storage for disk commands is statically allocated; in our Pilot, storage for two disk commands is allocated. So we would have missed at least one revolution anyway, waiting to re-use the disk commands. --mark *start* 01060 00024 US Date: 27-Jan-82 9:15:13 PST (Wednesday) From: Knutsen.PA Subject: Re: How fast can Pilot talk? In-reply-to: Ladner's message of 27 Jan. 1982 8:30 am PST (Wednesday) To: Ladner cc: Kolling, MBrown, Taft, SDD-Pilot↑ Reply-To: Knutsen.PA To continue Ladner's response... Why were 4 disk commands issued?
Besides the fact that the microcode does not typically support command chaining, there is: "Spaces are the principal units with which virtual memory is swapped" -- Pilot Programmer's Manual. If you want it to swap as one unit, don't subdivide it into smaller swap units. Why is there "an onerous amount of [compute] overhead for uniform swap units"? In the current version of Pilot, uniform swap units are a wart on the swapper. They were added as an "efficiency improvement" long after Pilot was designed and do not live comfortably within the data structures that Pilot uses to manage swapping. This will be fixed in the Pilot redesign -- in fact, uniform swap units will then be the most efficient kind of swap unit. Dale *start* 00840 00024 US Date: 27-Jan-82 16:08:21 PST (Wednesday) From: kolling.PA Subject: this is what I needed to know: To: mbrown cc: kolling
1. Will any Alpine files (including the log file) be leader-page-less? (If so, FPM has to think before it rejects page 0 in PageRun requests to the "normal" ReadAhead/Read/UsePages procedures.)
2. What is it that you are proposing for log files? If they use FPM to get a VMPageRun, and they intend to do the CopyOut themselves before calling FPM.ReleaseVMPageRun, they have to know how to convert the VMPageRun to a space. Or are they to call FPM.CopyOut? Or do all their operations themselves without using FPM at all? (I think I prefer calls to FPM.CopyOut[vm: VMPageRun, fileID: FileID, pageNumber: PageNumber] so if something goes wrong, we can trap all the I/O in one place.) Karen *start* 00422 00024 US Date: 27 Jan. 1982 4:34 pm PST (Wednesday) From: Kolling.PA Subject: By the way To: mbrown cc: kolling What is it about "the swap unit decision" that "impacts the log implementation"? Are you worried about it wiping out a page after the last page you've dirtied, so there would be a problem in recovery? How were you planning on handling the mirroring anyway, by two sequences of calls to FPM? *start* 01469 00024 US Date: 26 Jan. 1982 3:48 pm PST (Tuesday) From: Taft.PA Subject: FilePageMgr, version 4 To: Kolling cc: MBrown, Taft Basically it looks all right to me. However, I think certain details should be specified before implementation begins -- in particular, signals and errors. The exceptions signalled by each operation should be specified in the style of the Alpine public interfaces. For example, in implementing the FileStore operations, I need to know what happens if I read a nonexistent file page. Will I get notified in a clean way when this happens? Or do I have to first obtain the file's size and check each incoming request in order to prevent an uncatchable AddressFault from happening? I assume that types that aren't defined in FilePageMgr, such as PageRun, are obtained from AlpineEnvironment. I'm slightly nervous about having main-line operations such as ReadPages having to allocate collectable storage on every call (in order to return a LIST OF VMPageSet). Perhaps my nervousness is unwarranted; I'd welcome being reassured on this point. Why is ShareVMPageRun useful? Delete, DeleteImmutable, and SetSize raise an "error if any of the file is currently mapped". I assume that by "mapped" you mean referred to by one or more VMPageRuns with nonzero share counts. The FilePageMgr should take care of flushing VMPageRuns whose share counts have gone to zero, synchronizing with any deferred writing activity, etc.
Ed *start* 01177 00024 US Date: 26-Jan-82 17:20:43 PST (Tuesday) From: kolling.PA Subject: Re: FilePageMgr, version 4 In-reply-to: Taft's message of 26 Jan. 1982 3:48 pm PST (Tuesday) To: Taft cc: Kolling, MBrown
1. Yes, signals and errors will be in, but I probably won't know what all of them are until the implementation is underway. Neither do the "Alpine public interfaces" at this point; if they did, I could have finished the error handling in AccessControl already, sigh.
2. Yes to AlpineEnvironment.
3. Likely I should keep a pool of storage for the VMPageRuns; it will be on my list.
4. I don't know about ShareVMPageRun, that's something Mark put in at one point.
5. "Delete, DeleteImmutable, and SetSize raise an error if any of the file is currently mapped. I assume that by 'mapped' you mean referred to by one or more VMPageRuns with nonzero share counts." Yes.
6. "The FilePageMgr should take care of flushing VMPageRuns whose share counts have gone to zero, synchronizing with any deferred writing activity, etc." Yes, this is what happens; I thought the stuff at the end of the memo and at ReleaseVMPageRun implied this, but maybe not. Karen *start* 02478 00024 US Date: 28 Jan. 1982 10:25 am PST (Thursday) From: MBrown.PA Subject: Re: FilePageMgr, version 4 In-reply-to: kolling's message of 26-Jan-82 17:20:43 PST (Tuesday) To: kolling cc: Taft, MBrown I think it is ok for ReadPages et al to perform allocations. The units being allocated are fixed-length, meaning that the implementation of FilePageMgr can create its own quantum zone to make allocations more efficient for objects of this type. I think it is worth deferring any other optimizations in this area in the interest of keeping the interface clean. I am not sure I have a specific need for ShareVMPageRun; I think one will arise eventually. It is a trivial procedure to implement (PageSets already need to have share counts because two calls on ReadPages can reference the same PageSet.) It is ok by me to eliminate this procedure now, and add it back in if and when the need arises. For my part, I don't see the immediate need for ForceOutFile, but am willing to take it on faith that we'll find an application later. Pilot will never implement ReplicateImmutable. Pilot has changed its entire philosophy of file naming; file IDs are only guaranteed unique when qualified by a volume ID. This means that some higher-level directory structure is needed to keep track of the fact that two immutable files have the same contents; a unique ID comparison says nothing. All of this implies that (1) there is no need for ReplicateImmutable in FilePageMgr, and (2) all FileIDs in FilePageMgr should become [volume ID, file ID] pairs. (Since this is getting rather bulky to pass around and re-represent everywhere, we may wish to have a centralized data structure, shared by the OpenFileMap and the FilePageMgr, that turns long file names into "atoms".) Pilot's handling of multiple logical volumes right now is very poor (you can't point it to a specific volume when you call Map), but Trinity will give a moderate improvement in this, and Klamath should do it right. We should think a bit harder about how page I/Os and file size changes will be synchronized. If transaction A updates pages [10 .. 20) and commits, then transaction B sets the file size to 15, how does SetSize interact with the deferred updates?
This is not mainly a FilePageMgr question, since at its level there are only two options: report an error or wait for the error condition (the writer) to go away. But it would be nice to have a handle on this problem. --mark *start* 00578 00024 US Date: 3-Feb-82 18:39:29 PST (Wednesday) From: Kolling.PA Subject: ReleaseVMPageRun questions To: mbrown cc: kolling
1. We've been saying things like "sequential access clients will release pages with the write behind/deactivate option, random access clients will release pages without the deactivate option", etc. but I don't see any way for people to do this through AlpineFile.....
2. What type of clients (of FPM) do you expect to select what combinations of write = {writeAndWait, writeButDontWait, writeBehind} and deactivate: BOOLEAN? Karen *start* 00650 00024 US Date: 5 Feb. 1982 4:03 pm PST (Friday) From: MBrown.PA Subject: Re: FileHandles In-reply-to: Kolling's message of 5-Feb-82 13:52:40 PST (Friday) To: Kolling, Taft cc: MBrown I wonder whether it might turn out to be less work overall to give AccessControl a side door for reading and writing pages of the owner database file without opening it (i.e. give it access at a level where it just passes in a TransHandle and FileHandle to perform read and write on a file, bypassing the open file map and client map and transaction map.) A lot depends on how neatly this fits in with Ed's implementation of these actions. --mark *start* 00390 00024 US Date: 5 Feb. 1982 4:11 pm PST (Friday) From: kolling.PA Subject: Re: FileHandles In-reply-to: MBrown's message of 5 Feb. 1982 4:03 pm PST (Friday) To: MBrown cc: Kolling, Taft As long as AC IO goes thru the locking mechanism, I don't care what door I use. How much locking I need depends on how much concurrency is removed, which we haven't decided yet, I think. *start* 00410 00024 US Date: 4 Feb. 1982 12:06 pm PST (Thursday) From: Kolling.PA Subject: Re: Pilot Redesign In-reply-to: MBrown's message of 4 Feb. 1982 10:28 am PST (Thursday) To: MBrown cc: Taft, Kolling I'm wondering if I should remove the references to immutable files from the FPM interface now. If we support immutability ourselves, wouldn't it be done at a higher level as a file property? Karen *start* 00811 00024 US Date: 5-Feb-82 15:37:01 PST (Friday) From: Taft.PA Subject: Log.RecordType To: MBrown cc: Kolling, Taft I suggest that the Log.RecordType enumeration be decentralized, along the lines of Pilot's FileType. That is, in Log.mesa you say something like:
RecordType: TYPE = RECORD [CARDINAL];
TransactionRecordType: TYPE = RecordType [0..99];
FileRecordType: TYPE = RecordType [100..199];
AccessControlRecordType: TYPE = RecordType [200..299];
... etc.
Then in individual, more private defs files, you say things like:
workerBegin: TransactionRecordType = [7];
workerReady: TransactionRecordType = [8];
... etc.
It would be nice if this could be done with subranges of enumerated types rather than CARDINALs, but I don't think that's possible; I'll ask Satterthwaite to make sure. Ed *start* 01912 00024 US Date: 5 Feb. 1982 4:51 pm PST (Friday) From: Taft.PA Subject: Satterthwaite's answers To: MBrown, Kolling From these I conclude that (1) the FileMap.Object record will have to be declared in the interface, and (2) we will have to use the Pilot FileType approach for decentralizing Log.RecordType (if we do it at all). --------------------------- Date: 5 Feb. 1982 3:57 pm PST (Friday) From: Satterthwaite.PA Subject: Re: Inline entry procedures In-reply-to: Your message of 5 Feb.
1982 10:45 am PST (Friday) To: Taft cc: Satterthwaite The inline has to be able to see the declaration of the monitor lock. So the best you can do is something like
Opaque: TYPE [n];
SemiOpaque: TYPE = MONITORED RECORD [guts: Opaque]
but this introduces more problems than it solves, so I don't recommend it. Ed --------------------------- Date: 5 Feb. 1982 4:08 pm PST (Friday) From: Satterthwaite.PA Subject: Re: Decentralized enumerated types In-reply-to: Your message of 5 Feb. 1982 3:46 pm PST (Friday) To: Taft You can use an enumerated type, but again the cure is usually worse than the disease:
RecordType: TYPE = MACHINE DEPENDENT {
  firstTransaction (0), firstFile (100), firstAccessControl (200), (1023)}  -- guarantee enough bits
....
TransactionType: TYPE = RecordType[firstTransaction..firstFile);
workerBegin: TransactionType = LOOPHOLE[7];
workerRead: TransactionType = NEXT[workerBegin];
Unfortunately, this approach does not give workerBegin the nice scoping properties of identifiers declared in the original enumeration. About the only advantage is that there are likely to be fewer values of the enumerated type floating around, and the type doesn't have quite so many operations as CARDINAL. The Pilot folks get most (but not all) of this by using RECORD [CARDINAL]. Ed ------------------------------------------------------------ *start* 02433 00024 US Date: 8 Feb. 1982 2:03 pm PST (Monday) From: Taft.PA Subject: Why LogMap interlocks are not needed To: MBrown, Kolling cc: Taft I've completed a first cut at implementing ReadPages/WritePages and at specifying some of the internal interfaces they depend on. I'm now convinced that no locking is required in the LogMap (except the LogMap monitor itself, for maintaining internal consistency), at least for page-level operations. This depends on doing things in the right order, however. When beginning a new operation, after setting locks, ReadPages and WritePages consult the LogMap. For each overlapping committed intention, it carries out the intention and then deletes the entry from the LogMap. Only then does it actually do the requested operation, perhaps adding new intentions to the LogMap. What happens if two concurrent clients in the same transaction both see the same uncommitted intention in the LogMap? That's simple: they both carry it out. There's nothing logically wrong with that. The only complication arises from the following sequence of events:
Client A: finds uncommitted intention in LogMap.
A: starts to carry out intention.
B: finds uncommitted intention in LogMap.
B: starts to carry out intention.
B: finishes carrying out intention.
B: deletes intention from LogMap.
B: puts a new intention for its own write into the LogMap.
A: finishes carrying out intention (the old one).
A: deletes intention from LogMap, thereby wiping out B's new intention.
There is an easy way to prevent this. The LogMap Delete operation requires the client to uniquely identify the LogMap entry to be deleted; that is, in addition to the EntityKey (FileID and PageNumber in this case), the client provides some unique identification (either the RecordID or the TransID will do; I prefer the RecordID, since the client presumably knows it already). The Delete operation will do nothing if the unique identification doesn't match. This eliminates bad interactions between committed and uncommitted intentions in the LogMap.
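(A small Python sketch of the guarded Delete just described: each entry is keyed by the EntityKey and remembers the RecordID that created it, and Delete is a no-op unless the caller's identification matches. The class and method names are invented for the illustration; only the idea comes from the message.)

```python
class LogMap:
    """Toy intention map keyed by (fileID, pageNumber); each entry carries the
    RecordID of the log record that created it."""

    def __init__(self):
        self.entries = {}          # (fileID, pageNumber) -> (recordID, intention)

    def insert(self, key, record_id, intention):
        self.entries[key] = (record_id, intention)

    def lookup(self, key):
        return self.entries.get(key)

    def delete(self, key, record_id):
        # Guarded delete: do nothing unless the entry is still the one the
        # caller carried out (identified by its RecordID).
        current = self.entries.get(key)
        if current is not None and current[0] == record_id:
            del self.entries[key]

# The race from the message, replayed:
log_map = LogMap()
key = ("fileA", 17)                      # (FileID, PageNumber)
log_map.insert(key, record_id=1, intention="old intention")

a_seen = log_map.lookup(key)             # A finds the uncommitted intention ...
b_seen = log_map.lookup(key)             # ... and so does B
# B carries it out, deletes it, and registers its own write:
log_map.delete(key, record_id=b_seen[0])
log_map.insert(key, record_id=2, intention="B's new intention")
# A finishes carrying out the *old* intention and tries to delete it;
# the guard keeps A from wiping out B's new entry:
log_map.delete(key, record_id=a_seen[0])
assert log_map.lookup(key) == (2, "B's new intention")
```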
There can't possibly be interactions between two uncommitted intentions belonging to different transactions, since a Lock must be obtained before registering such intentions. If two uncommitted intentions for the same transaction are registered, the later one overrules the earlier one; we've agreed that this is ok. There are no other cases I can think of. Ed *start* 00984 00024 US Date: 8 Feb. 1982 2:21 pm PST (Monday) From: Taft.PA Subject: Lock To: MBrown cc: Kolling, Taft
1. The Lock interface needs an operation for finding out about existing locks. Otherwise there is no way to implement AlpineFile.UnlockPages, which is specified to remove any read locks set by the current transaction but not to disturb any other kinds of locks.
2. I don't understand the purpose of the reference counts described in the comments in Lock.mesa.
3. LockNoWait should be abolished, and its function replaced by a "wait: BOOLEAN" argument to Lock, which, if TRUE, causes Lock to WAIT in the Transaction monitor. If the caller needs to back out of monitors before waiting, it can first call Lock with wait=FALSE; if this fails, back out of its monitors and then call Lock with wait=TRUE (if appropriate). [So far I have not needed to call Lock from within any monitors -- though I am calling it with Work in progress, which I believe is ok.] Ed *start* 00841 00024 US Date: 3-Feb-82 12:12:44 PST (Wednesday) From: MBrown.PA Subject: ReadPages outline To: Taft cc: Kolling, MBrown
ReadPages:
  OpenFileID -> OpenFileHandle (Enter/Exit OFM monitor)
  OpenFileHandle -> FileHandle, TransHandle, Conversation (Enter/Exit OFM)
  Check caller's Conversation
  StartWorking[t] (Enter/Exit TransObject)
  Acquire locks if necessary (Enter/Exit Locks, perhaps TransObject)
  FileHandle -> LogMap (Enter/Exit LogMap)
  data in log or base? do deferred updates
  SELECT
    log =>
      do read from log
      update if necessary (Enter/Exit FileMap)
      return value read
    base => return value from FPM (Enter/Exit FileMap)
  StopWorking[t] (Enter/Exit TransObject)
OpenFileMap -> FileObject
FileObject contains VolumeID, FileID, nOpenFileHandles (incoming), -> LogMap, -> FPM map
*start* 00968 00024 US Date: 8 Feb. 1982 4:55 pm PST (Monday) From: Taft.PA Subject: Why LogMap interlocks may be needed after all To: MBrown, Kolling cc: Taft Mark correctly points out that if my scenario is carried a bit further it will cause trouble (new actions are marked by =>):
Client A: finds uncommitted intention in LogMap.
A: starts to carry out intention.
B: finds uncommitted intention in LogMap.
B: starts to carry out intention.
B: finishes carrying out intention.
B: deletes intention from LogMap.
B: puts a new intention for its own write into the LogMap.
=> B: commits its transaction.
=> C: carries out B's intention (C is a third client or a background process).
A: finishes carrying out intention (the old one).
Result: the later intention is clobbered by the earlier one, which is clearly wrong. I'd like to find a way out of this, short of locking individual LogMap entries (as Mark proposed originally); but I haven't found one yet. Ed *start* 01473 00024 US Date: 10 Feb. 1982 9:32 am PST (Wednesday) From: Taft.PA Subject: Locking file properties To: MBrown, Kolling cc: Taft I currently see no difficulty in locking file properties individually, as opposed to locking the entire leader page when any property is accessed. Besides setting individual locks, all that's required is to log changes to properties individually and to serialize the actual leader page modifications with some monitor.
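(A minimal Python sketch of the arrangement proposed here: locks are taken on individual (file, property) pairs, each property change is logged individually, and a single monitor serializes the physical leader-page update. The lock-table and log representations are invented for illustration; in Alpine the property lock would be held until the transaction commits, not released at the end of the call as it is in this toy version.)

```python
import threading

class PropertyLocks:
    """Toy lock table keyed by (fileID, property) -- one lock per file property,
    rather than one lock covering the whole leader page."""
    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def acquire(self, file_id, prop):
        with self._guard:
            lock = self._locks.setdefault((file_id, prop), threading.Lock())
        lock.acquire()

    def release(self, file_id, prop):
        self._locks[(file_id, prop)].release()

locks = PropertyLocks()
leader_page_monitor = threading.Lock()   # serializes the actual leader-page modification
log = []                                 # stand-in for logging each property change individually

def set_property(file_id, prop, value, leader_page):
    locks.acquire(file_id, prop)             # lock just this property of this file
    try:
        log.append((file_id, prop, value))   # log the change to this one property
        with leader_page_monitor:            # serialize the physical update of the leader page
            leader_page[prop] = value
    finally:
        locks.release(file_id, prop)         # released early only to keep the sketch short

page = {}
set_property("fileA", "highWaterMark", 42, page)
set_property("fileA", "versionNumber", 7, page)
print(page, log)
```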
Unless you see something wrong with this, I propose to go ahead and do this. What led me to think about file properties was the realization that advancing the high water mark (when writing on newly-allocated pages of a file) of course requires setting an update lock on the high water mark property. I think locking the entire leader page in this case could cause serious loss of concurrency. Also, during sequential writes, the high water mark is advanced during each write. So it seems undesirable to log each update to the high water mark. Instead, I should keep the uncommitted high water mark in volatile storage associated with the file, and only log the new high water mark during phase one of commit. This seems like a good application for a File x Transaction map, which we discussed briefly last week and decided to drop for the time being. Of course, I can also record this information in the LogMap, though doing so does stretch the semantics of the LogMap a bit. What do you think? Ed *start* 01166 00024 US Date: 10 Feb. 1982 3:37 pm PST (Wednesday) From: MBrown.PA Subject: Re: Locking file properties In-reply-to: Taft's message of 10 Feb. 1982 9:32 am PST (Wednesday) To: Taft cc: MBrown, Kolling The other possible high-contention property of a file is the version number. I could be convinced that databases can bypass the high water mark stuff altogether (or even that high water marks are not worth the trouble), but I am pretty sure that version numbers are worth having. So I think your design is a good one. By the way, Gifford says that for the purposes of his algorithms, it is important to define a file's version number to be the number of committed update actions performed on the file -- not the number of committed TRANSactions that updated the file. Also, a transaction in progress should see the version number as the stable version number of the file, plus the number of update actions already executed on the file for this transaction. (This has to do with transactions in which a write quorum is established several times independently.) The file x transaction map seems necessary for this information, too. --mark *start* 02396 00024 US Date: 11 Feb. 1982 4:06 pm PST (Thursday) From: kolling.PA Subject: FPM To: mbrown, taft cc: kolling I believe this is the functionality we arrived at in today's meeting:
1. For Release (ignoring the modifications necessary if there is another client): options: {writeIndividually, writeBatched, clean}, waitForWrite, reuse: BOOL.
clean => IF reuse THEN lru ELSE mru.
writeBatched, wait => ForceOut after the completion of which {IF reuse THEN lru ELSE mru}.
writeBatched, dontWait => A demon will cause the writes at some appropriate time, after the completion of which writes {IF reuse THEN lru ELSE mru}.
writeIndividually, wait => ForceOut, after the completion of which mru.
writeIndividually, dontWait => Mru. (leave write to the swapper).
I propose to ignore waitForWrite if clean. I also propose to ignore reuse if writeIndividually since it requires implementation effort if dontWait is set. We don't expect any client to be requesting these. Question: should "writeBatched, wait" start and wait for completion of the writes for other pages of that file that are marked for writeBatched, or does it just apply to the pageRun being released? Below is the association we expect between clients and options. (I'm a little fuzzy about the log stuff).
                   write option:      wait option:  reuse option:
sequential read:   clean              FALSE         TRUE
sequential write:  writeBatched       FALSE         TRUE
random read:       clean              FALSE         FALSE
random write:      writeIndividually  FALSE         FALSE
log:               writeIndividually  TRUE          FALSE
                   writeBatched       TRUE          FALSE
                   clean              FALSE         FALSE
2. Because FPM will never take a chunk out of its fpm FileObject map until the chunk has been Space.Unmapped, we can't be messed up by the client giving us incorrect hints such as clean when not clean, except for performance problems.
3. Any request for a page > eof will cause an Error and no data returned.
4. Yes, page 0 will be interpreted as the first data page everywhere in FPM, at Ed's suggestion. Karen *start* 01353 00024 US Date: 12-Feb-82 10:40:14 PST (Friday) From: MBrown.PA Subject: Alpine Volatile Maps To: Taft cc: Kolling, MBrown This looks pretty good. Some detail observations, which may or may not be valid:
1. Open file object "File lock" means "an element of {fileLocks, pageLocks}". The FileInstanceObject holds the lock mode (R, W, IR, etc) in which the whole file is held by the transaction. The mode held there is allowed to be weaker than the true mode.
2. Log map: the TransHandle of the updating transaction must be bundled together with the committed size and uncommitted size. If the leader page is updated as a unit, then it needs a special slot in the log map, but the rest of the log map is just the intentions tree.
3. Now I remember the other application for releasing locks: releasing a read lock on a file version number. Actually, this is no different than the problem of releasing the read lock on the leader page that is obtained implicitly by Open (at least in my naive implementation of Open) in order to read the access control lists. We need to deal with this.
4. If the tree of intentions is represented as a balanced binary tree, then an additional 5 words per node are required (two REFs and a balance bit.) I think it is too early to be preoccupied with the space required for this data structure. --mark *start* 00833 00024 US Date: 12-Feb-82 16:27:04 PST (Friday) From: Kolling.PA Subject: question To: mbrown cc: kolling In RedBlackTreeRefImpl, it says "it is now unsafe for multiple processes to be inside of an instance of this module, even if they operate on distinct tables (it was always unsafe to perform concurrent operations on a single table)." I think the FPM routines will generally look like this:
mumble get object monitor protecting the fpm area of this FileObject, where the fpm area = map + other stuff.
mumble OrderedSymbolTableRef.Lookup etc.
mumble release object monitor protecting the fpm area.
mumble.
This seems to imply that: the object monitor stuff should come out of RedBlackTreeRefImpl, and there should be a module monitor for it, yes? *start* 00636 00024 US Date: 8 Feb. 1982 4:14 pm PST (Monday) From: Kolling.PA Subject: Re: Pilot mapped files In-reply-to: Levin's message of 12 Jan. 1982 10:22 am PST (Tuesday) To: Levin cc: kolling A while back, you mentioned: "there is an internal interface that cleans up a space without releasing the memory behind it. If it matters enough to you, I could easily add a Space.Clean operation. However, this sounds like a performance optimization that can wait indefinitely." I would really like Space.Clean to implement a writeAndDontWait option. Any reason I can't just call that internal interface (which is it?)?
Karen *start* 00755 00024 US Date: 29 May 1981 3:35 pm PDT (Friday) From: Taft.PA Subject: FilePageMgr To: MBrown, Kolling cc: Taft It seems to me that the FileStoreID argument can be removed throughout. The Pilot interfaces identify files entirely by FileID, and the FilePageMgr should do likewise. I can't think of any way in which passing the FileStoreID will help the FilePageMgr do its work. By the way, Pilot seems to require that there be only one instance of a file with a given FileID on any machine. If we want to replicate a file with the same ID on multiple volumes, this will cause trouble. The highest level at which you get to specify a Volume as well as a FileID is SubVolume.StartIO, which I don't think we want to be messing with! Ed *start* 02791 00024 US Date: 16 Feb. 1982 10:35 am PST (Tuesday) From: MBrown.PA Subject: Alpine style conventions To: Kolling, Taft cc: MBrown Questions of Cedar/Mesa coding style are bound to arise as we start to write Alpine. There is no way that we'll achieve absolute uniformity in programming style across three individuals, but we should try for as much commonality as is practical. Let me propose the following point of style: Minimize the use of OPEN. Attitudes about OPEN vary widely, but it seems pretty clear that wide use of OPEN can make code difficult to read. Ed Satterthwaite, for instance, rarely uses OPEN except in the following situation: a program module manipulates a set of record types so widely that a programmer has no hope of understanding the program without understanding the record types. Then Ed may OPEN the definitions module containing the record types. Ed almost never OPENs a definitions module in order to get procedures. If he needs many procedures from an interface with a long name, he will import the interface with an explicit short name (IMPORTS I: Inline, S: Space, ... .) Sometimes he gets record types without OPEN by defining the same names in the local scope (TransID: TYPE = AlpineEnvironment.TransID; ... .) Another way to minimize the need for OPEN is to define clusters of operations associated with types, invoking the operations with the "object" notation (handle.Operation[args].) Careful design of defs modules may be required in order to define useful clusters. With clusters, it is possible to achieve very uniform and terse naming conventions. For instance, suppose that a TransID can be derived from a Foo.Handle. Then in interface Foo, define TransID: PROC [self: Handle] RETURNS [AlpineEnvironment.TransID]; and if f is a Foo.Handle, write f.TransID to call the TransID procedure in Foo. In my own programming I prefer to avoid OPEN altogether. I find that by: 1) defining commonly used types in the local scope using "TYPE =", 2) using object notation wherever possible, 3) where that is inappropriate, defining short names (generally one or two characters and all caps) for instances of frequently-used long interface names (IMPORTS S: Space), I never feel the need for OPEN. This also removes the need for USING in my DIRECTORY. USING lists are informative, but in my experience they add too much overhead to the program development process (compilation errors, with generally uninformative messages) to make them productive when actively developing a system. There are tools for adding USING lists to modules after the fact, so once a section of Alpine becomes more static the lists can easily be inserted. We can expect the tools for this to improve as Cedar really gets going. --mark *start* 01551 00024 US Date: 26 Aug.
1981 5:18 pm PDT (Wednesday) From: Taft.PA Subject: SIGNAL, etc. In-reply-to: MBrown's message of 25 Aug. 1981 4:56 pm PDT (Tuesday) To: MBrown cc: Kolling, Taft Your suggested treatment of SIGNALs across a remote interface seems like the only reasonable one from the point of view of robustness of the server -- independent of whether or not RPC is eventually able to support SIGNALs in all their generality. I think you are effectively proposing that RETURN WITH ERROR be the only allowed use of SIGNALs across a remote interface. It seems like the standard Mesa semantics of RETURN WITH ERROR are exactly what we want. That is, it unwinds the callee's stack; but catching the SIGNAL, passing parameters, etc., are done in the conventional way from the client's point of view. Can we get the RPC guys to give us these semantics, perhaps even with the same syntax? As you say, for debugging purposes this treatment may be somewhat of a nuisance. But I think there is an easy solution. The top-level procedures in the server (i.e., the ones exporting the remote interface) will have to catch all client programming errors and abstraction failures coming up from below and turn them into RETURN WITH ERROR on remotely exported SIGNALs. The trick is to have the top-level procedures catch these errors conditionally, based on some global debugging switch. If you turn on the debugging switch, these errors will not be caught, and the server will land in CoPilot with the server's stack still intact. Ed *start* 00563 00024 US Date: 22-Feb-82 12:36:10 PST (Monday) From: Kolling.PA Subject: There's one thing I noticed To: mbrown cc: taft, kolling in the new FilePageManager algorithm. Remember that it now doesn't do anything explicit about writes until the last user releases the chunk. This means that the chunk is seen as being dirty only if the last user was a dirtier. Consequently, for seq. access dirty chunks may be dumped in the lru list as lru, making subsequent people wait. Shall I ignore this possibility, or keep a dirty bit to handle it? Karen *start* 00474 00024 US Date: 22 Feb. 1982 1:14 pm PST (Monday) From: Taft.PA Subject: Re: There's one thing I noticed In-reply-to: Kolling's message of 22-Feb-82 12:36:10 PST (Monday) To: Kolling cc: mbrown, taft This doesn't seem like a very likely problem. But if it turns out to be, you can always get the truth about whether a page is dirty by looking at the hardware dirty bit. There is a "friends" interface called PageMap that will get you this information. Ed *start* 02056 00024 US Date: 16 Feb. 1982 7:08 pm PST (Tuesday) From: Taft.PA Subject: Handles and objects once again To: MBrown, Kolling cc: Taft I've converted AlpineClient, File, Lock, Log, Transaction, and VolatileMaps to use the latest conventions for defining handles and objects. After some oscillation, we seem to have settled on the following: When "public internal" defs files (i.e., those exported from one section of Alpine to another) need to refer to each other's definitions, they instead obtain the definitions from a common place, namely AlpineInternal (or AlpineEnvironment, if also seen by Alpine clients). This is straightforward for all definitions besides objects. To enable object notation to work, a RECORD [...] must be interposed. 
For example, we have the following in AlpineInternal:
TransHandle: TYPE = REF TransObject;
TransObject: TYPE;
and in TransactionMap:
Handle: TYPE = RECORD[AlpineInternal.TransHandle];
Any other "public internal" defs file that needs to refer to a TransactionMap.Handle instead refers to an AlpineInternal.TransHandle; for example, in FileMap:
GetTransHandle: PROC [FileHandle] RETURNS [AlpineInternal.TransHandle];
SetTransHandle: PROC [FileHandle, AlpineInternal.TransHandle];
However, in ALL other contexts (private defs files and ALL implementation modules), one should refer to the "home" defs file rather than to AlpineInternal; e.g., trans: TransactionMap.Handle; This enables you to use object notation for operations on the handles; e.g., trans.StartWork[...]; The "home" and AlpineInternal types are inter-assignable where necessary, but in the "home ← AlpineInternal" direction you must write extra brackets (a temporary restriction until some future improvements are made in the Cedar compiler's handling of type clustering). I think this happens only when calling procedures that return handles as results. That is, you need brackets in:
trans: TransactionMap.Handle ← [fileHandle.GetTransHandle[]];
but not in:
fileHandle.SetTransHandle[trans];
*start* 03073 00024 US Date: 1 Feb. 1982 5:29 pm PST (Monday) From: MBrown.PA Subject: Concurrency within a transaction To: Kolling, Taft cc: MBrown We had a short discussion of this topic at our last meeting; I want to capture this in written form, and amplify it somewhat if I can. I would like us to work on this issue since it serves as a useful forcing function: if we understand the design well enough to believe that concurrency within a transaction really works, then we have made progress independent of whether or not we finally choose to implement this feature.
1) Access control impl is programmed as much as possible like a client of Alpine. It makes use of Alpine's page locking service to serialize actions involving a page. If two processes working for the same transaction can possibly call access control impl in a way that triggers an owner database access, then the page locks will not serialize them. (There may be other examples of this problem within access control impl.)
2) When the lock manager selects a victim in a deadlock cycle, then in the presence of multiple processes it cannot "break" the lock immediately. Instead it must be content to mark the transaction "dying", and to prod any processes in lock wait for the transaction. A prodded process must then inspect its transaction state, and if it is dying must back out of the action that requested the lock. No lock may be broken at this point because other processes may be executing under the assumption that they hold the lock. The way to get the locks released is to abort the transaction. We can arrange this by (1) not allowing new client actions to start (with the exception of FinishTransaction), (2) making sure that all actions complete in a finite amount of time, and (3) forking a call to FinishTransaction[abort] on the offending transaction. (The client may also call this when he notices the problem; the implementation is prepared for the parallel calls.)
3) We would like to document the semantics of concurrent actions within a transaction. In our discussions we have taken the position that the effect of two parallel writes to a page should be to give the page one value or the other, not a mixture of the two (and similarly for any mixture of reads and writes.)
This seems to imply that all accesses to normal file pages (obtained from FilePageMgr) must be protected by the transaction monitor. By using inline entry procedures, the overhead of this can be made rather small (about 36 Dorado cycles, or 2 microseconds, compared to 50 microseconds to BLT a page.) Notice the following: if the monitor you acquire to do a page access is the same as the monitor you acquire to do other serialization within a transaction (e.g. to avoid having concurrent calls to access control impl) then there is a danger of deadlock. This means that either two monitors per transaction are needed, or that all synchronization of nontrivial operations must be done by manipulating semaphores within the transaction state, rather than by holding the monitor. --mark *start* 00247 00024 US Date: 4-Mar-82 16:11:46 PST (Thursday) From: Kolling.PA Subject: suggestion To: mbrown cc: kolling Maybe the thing to say is that "No change in the mapping state of a chunk is allowed while the chunk is on the lru list." *start* 00550 00024 US Date: 8-Mar-82 13:09:20 PST (Monday) From: Kolling.PA Subject: Re: Read/WritePages limit? In-reply-to: Taft's message of 8 March 1982 12:53 pm PST (Monday) To: Taft, mbrown cc: Kolling If FPM can be asked for a gigantic number of pages in one run the vm allocated to FPM will run out. Do we actually want to tolerate requests for 256 pages at a shot? What size do you envision for FPM's vm pool? (Note that this is different from the previously discussed cache problem where a client declares a seq. access file random.) *start* 01236 00024 US Date: 8 March 1982 6:02 pm PST (Monday) From: Taft.PA Subject: Re: Read/WritePages limit? In-reply-to: Kolling's message of 8-Mar-82 13:09:20 PST (Monday) To: Kolling cc: Taft, mbrown If you want to enforce a smaller limit on the length of a run, then export it through the FilePageMgr interface and I will abide by it (by cutting up my client's request into smaller runs when necessary). Perhaps there is a better way to arrange the interface to ReadPages (and friends). Instead of returning a LIST OF VMPageSet, it should return a single VMPageSet describing some initial interval of the requested PageRun. The caller makes use of that, releases it, and calls ReadPages again on the remainder of the PageRun if necessary. This arrangement gives FilePageMgr complete freedom over how to break up the requested PageRun, and permits it to deal uniformly with all considerations of inconveniently-mapped files, maximum run length, and maximum number of pages mapped simultaneously. It also eliminates the need to cons up a LIST during every call; for good performance, it seems desirable to eliminate unnecessary allocations. If you do make this change, the corresponding change in my code is trivial. Ed *start* 00917 00024 US Date: 8-Mar-82 18:49:21 PST (Monday) From: Kolling.PA Subject: Re: Read/WritePages limit? In-reply-to: Taft's message of 8 March 1982 6:02 pm PST (Monday) To: mbrown cc: Kolling, Taft Ed and I discussed his message. How do you feel about FPM returning just the chunk containing the beginning of the requested PageRun? He says that since he would have to deal with the run length limit anyway, we might as well do it this way as it would save the LIST stuff at the expense of more procedure calls. This would solve the run length problem, but not the cache flooding (since that happens with useCount = 0), but that's okay as there is already a simple count catch in there for that. 
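(A Python sketch of the client loop implied by this arrangement: ReadPages hands back only a chunk covering some initial interval of the requested run, and the caller uses it, releases it, and asks again for the remainder. The interface shown, including the four-page chunk size, is invented for the illustration and is not the real FilePageMgr.)

```python
from typing import NamedTuple

class VMPageSet(NamedTuple):
    first_page: int      # first file page covered by this chunk
    count: int           # number of pages actually mapped
    data: bytes          # stand-in for the mapped VM

def read_pages(file_id, first_page, count):
    """Toy FilePageMgr.ReadPages: returns one chunk covering some initial
    interval of the request (here at most 4 pages), never the whole run."""
    got = min(count, 4)
    return VMPageSet(first_page, got, bytes(512 * got))

def release(page_set):
    pass                 # stand-in for FilePageMgr.ReleaseVMPageRun

def read_run(file_id, first_page, count):
    """Client loop: keep asking for the remainder until the whole run is covered."""
    data = b""
    page = first_page
    remaining = count
    while remaining > 0:
        chunk = read_pages(file_id, page, remaining)
        data += chunk.data          # use the chunk ...
        release(chunk)              # ... then release it before asking again
        page += chunk.count
        remaining -= chunk.count
    return data

print(len(read_run("fileA", first_page=0, count=10)))   # 10 pages * 512 bytes each
```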
Also, by "inconveniently-mapped", Ed didn't mean the SetLength when eof is mapped with useCount # 0 problem, but rather was postulating that I was melding chunks together for clients (which I don't). Karen