Number: 1475

Date: 19-Jun-84 17':49':54

Submitter: Sannella.PA

Source: Peter Karp <KARP@SUMEX-AIM.ARPA>

Subject: Provide facilities for dealing with distributed files

Lisp Version: 

Description: '
Return-Path': <KARP@SUMEX-AIM.ARPA>'
Redistributed': Xerox1100UsersGroup↑.PA'
Received': from SUMEX-AIM.ARPA by Xerox.ARPA ; 11 JUN 84 15':01':37 PDT'
Date': Mon, 11 Jun 84 14':50':36 PDT'
From': Peter Karp <KARP@SUMEX-AIM.ARPA>'
Subject': Interlisp-D Directions'
To': 1100users@SUMEX-AIM.ARPA'
cc': KARP@SUMEX-AIM.ARPA'
'
I would be very interested in hearing Xerox''s thoughts/plans for how'
Interlisp is going to change to facilitate dealing with files in a'
distributed environment.'
'
For example, consider the environment in Stanford''s HPP.  I do most'
of my work on a Dolphin.  Source files for the system I work on are'
kept on our Vax 750 file server.  I also have access to a Dorado, which'
I occasionally use for computationally intensive tasks like building'
sysouts.  Soon I will probably spend some time working on a Dandelion'
as well.'
'
Interlisp basically has no way of dealing with multiple copies of a'
given file.  I.E., when I make multiple copies of a file I am forced'
to keep track of where all the different copies are and of what copy'
is the newest version.  This is annoying for two reasons.  First, it'
makes it difficult for me to deal with files for which I MUST make'
several copies.  Second, it inhibits me from making copies of files '
which I would LIKE to make several copies of.  In fact I see the latter'
category as being more important.  Leaf service to the Vax is so slow'
that I feel it is crucial to be able to cache source files on the local'
disk.'
'
Are there plans to address this issue?  If so what kind of solutions are'
being contemplated?'
'
Peter'
'
[ I have just begun reading this bboard so if this topic has been'
  discussed before please refer me to that discussion... ]'
'
-------'
'
Return-Path': <SCHOEN@SUMEX-AIM.ARPA>'
Redistributed': Xerox1100UsersGroup↑.PA'
Received': from SUMEX-AIM.ARPA by Xerox.ARPA ; 11 JUN 84 22':20':59 PDT'
Date': Mon, 11 Jun 84 22':14':03 PDT'
From': Eric Schoen <Schoen@SUMEX-AIM.ARPA>'
Subject': Interlisp-D Filing'
To': 1100users@SUMEX-AIM.ARPA'
'
I haven''t encountered a good system for handling multiple versions of source'
files in any language.  However, rather than light a flame on that topic, I''d'
like to address the issue of file server performance.'
'
Early versions of Interlisp-D (Allegro through Chorus [Feb, 1983]) used an'
implementation of the Leaf protocol which used an outstanding request window'
of exactly 1 packet; that is, no lookahead was performed on the connection,'
reducing a competent virtual circuit protocol to an expensive'
request-response model.  I developed the Tenex/Tops-20 server under these'
Lisp versions.'
'
Later versions of Interlisp support multiple-packet lookahead for situations'
in which the system detects sequential reading or writing (e.g. making or'
loading files).  The Tops-20 server hasn''t been updated to support multiple'
packet lookahead, however (well, the code''s in there, but not too well'
debugged under certain circumstances).  The result is dead time between '
a response and the next request.'
'
We recently implemented a brand-new Leaf server, written in Bliss to run'
under our own Pup environment on VAX/VMS.  The server supports up to 5 packet'
lookahead (Lisp uses only 1 packet, though).  Thus, there''s no dead time'
between response and next request; the next request has already been made,'
and the response is (hopefully) already back at the D-machine.  I''ve measured'
200KB throughput between a timeshared VAX-11/780 and our Dorado (loading'
.DCOM files), with a Dolphin-based 3-10 MB Interlisp gateway in the path.  On'
the other hand, the Tops-20 Leaf server, on a 2020 running on the same 3 MB'
net as the Dorado, with no users, can only push about 30 KB to the Dorado.'
'
So the point of all this is that Leaf server efficiency is paramount to good'
file I/O in Interlisp.  I haven''t seen the latest Unix Leaf server, but the'
version we had at SDR did no lookahead.  We had patched that version to run'
under our VMS environment for a time, and saw roughly the same performance as'
the 2020.  Note that in a (multiple) gateway environment, packet lookahead'
can do quite a bit to mask gateway delays (so long as your networks are'
reasonably reliable).'
'
Eric'
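
The effect of the request window described above can be estimated with a small model. This is only a sketch: the function name, the packet size, the round-trip time, and the link rate are illustrative assumptions, not measurements from this thread or details of the Leaf protocol itself.

```python
def leaf_throughput(window_packets, packet_bytes, round_trip_s, link_bytes_per_s):
    """Estimate sustained bytes/second for a windowed request-response transfer.

    All parameters are hypothetical inputs to a simple model, not Leaf constants.
    """
    if window_packets < 1:
        raise ValueError("window must be at least 1 packet")
    if round_trip_s <= 0:
        raise ValueError("round-trip time must be positive")
    # At most `window_packets` packets can be outstanding per round trip.
    windowed_rate = window_packets * packet_bytes / round_trip_s
    # The link caps the rate no matter how large the window is.
    return min(windowed_rate, link_bytes_per_s)


if __name__ == "__main__":
    # Assumed figures: 512-byte packets, 20 ms round trip, 3 Mbit/s link.
    for window in (1, 5, 20):
        rate = leaf_throughput(window, 512, 0.02, 375_000)
        print(window, "packet window:", round(rate), "bytes/s")
```

With a window of 1 the sender sits idle for a full round trip per packet, which is the dead time described above. Widening the window raises throughput proportionally until the link rate becomes the limit, and the same reasoning explains why lookahead masks added gateway delay.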
'
-------'
'
Return-Path': <@SUMEX-AIM.ARPA':tgd@csnet-relay.csnet>'
Redistributed': Xerox1100UsersGroup↑.PA'
Received': from SUMEX-AIM.ARPA by Xerox.ARPA ; 13 JUN 84 01':05':17 PDT'
Received': from csnet-relay by SUMEX-AIM.ARPA with TCP; Wed 13 Jun 84 00':57':57-PDT'
Received': From oregon-state.csnet by csnet-relay;  13 Jun 84 3':40 EDT'
Date': 12 Jun 84 21':52':09 PDT'
From': tgd%oregon-state.csnet@CSNET-RELAY.ARPA'
Full-Name': Thomas G. Dietterich'
Subject':   RE': multiple copies of distributed files'
To': 1100users@SUMEX-AIM.ARPA, karp@SUMEX-AIM.ARPA'
Cc': Dietterich@SUMEX-AIM.ARPA'
'
'
'
Peter,'
'
What kind of solution do you propose?  '
'
I take a lot of issue with your message, especially its argumentative'
tone.  Most programming languages make no attempt whatsoever to keep'
track of files at all.  Interlisp-D has no difficulty dealing with'
"distributed" files and a distributed file system.  The problem, as you'
note, is that you want to maintain multiple copies of the same file in'
different locations.  Interlisp-D even provides some support for that'
now, in that you can supply a list of directories (on variable'
DIRECTORIES) for the system to search through looking for your file.'
I''ve noticed that this is very slow, though, and it appears to be'
happening inside DWIM.  The result is that the file name has to be'
DWIMified every time you use it (e.g., when doing a LOADFNS to edit a'
function, or when using MASTERSCOPE).'
'
The problem seems to be that Interlisp''s idea of file identity is the'
name of the file (host, directory, name, extension, and version number).'
This needs to be changed, so that there is some sort of file'
abstraction.  Each file needs to be written with a unique identification'
number that encodes the abstract file identity and the specific version.'
'
There needs to be a relocation table that tells Interlisp all of the'
possible locations where each abstract file can be found.  When you'
cleanup, Interlisp should write this relocation table out to disk.'
Then, when you restart your system, Interlisp could make a pass through'
each of the places on DIRECTORIES and update the relocation table (This'
could be optimized in some way.  Perhaps DWIM should update the relocation'
table?)  Interlisp could then choose the most convenient copy of each'
file as the copy to load.  Load commands would not refer to specific'
disk files, but to the abstract files that are then mapped to specific'
disk files by this relocation table.'
'
One subtlety of this approach is that Interlisp needs to distinguish'
between {DSK} devices on different D machines, so that, if I save a file'
on {DSK} on my Dlion and then move over to my Dorado, it knows that it'
probably can''t access my Dlion files (although, if that were the only'
extant version, it would have to ask the user to set up an FTP link'
somehow.)'
'
Also, there would need to be some way to copy and move files from inside'
lisp so that the relocation table gets updated.  There should also be'
some way to rebuild the relocation table if it gets smashed or'
destroyed.'
'
There should also be some way to roll back the system to a previous set'
of files.'
'
--Tom'
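
The relocation table proposed above could look roughly like the following. This is a minimal sketch in Python rather than Interlisp, and every name in it (`FileId`, `RelocationTable`, `best_copy`, the location strings) is a hypothetical illustration, not an existing Interlisp facility.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileId:
    """Abstract file identity plus a specific version (hypothetical)."""
    abstract_name: str
    version: int


@dataclass
class RelocationTable:
    """Maps each abstract file version to every place a copy is known to be."""
    locations: dict = field(default_factory=dict)

    def record(self, file_id, location):
        # Called whenever a copy is written, copied, or found while scanning.
        known = self.locations.setdefault(file_id, [])
        if location not in known:
            known.append(location)

    def newest_version(self, abstract_name):
        versions = [fid.version for fid in self.locations
                    if fid.abstract_name == abstract_name]
        return max(versions) if versions else None

    def best_copy(self, abstract_name, preference):
        """Return the newest version's copy on the most preferred device."""
        version = self.newest_version(abstract_name)
        if version is None:
            return None
        candidates = self.locations[FileId(abstract_name, version)]

        def rank(location):
            # Earlier entries in `preference` are more convenient devices.
            for position, prefix in enumerate(preference):
                if location.startswith(prefix):
                    return position
            return len(preference)

        return min(candidates, key=rank)
```

For example, after recording version 2 of FOO on both a file server and the local disk, `best_copy("FOO", ["{DSK}", "{VAX}"])` would pick the local copy. Distinguishing {DSK} on different machines would need the location string to carry a machine name as well, which this sketch leaves out.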
'
-----'
'
Return-Path': <@SUMEX-AIM.ARPA':VANBUER@USC-ECL.ARPA>'
Redistributed': Xerox1100UsersGroup↑.PA'
Received': from SUMEX-AIM.ARPA by Xerox.ARPA ; 13 JUN 84 09':08':40 PDT'
Received': from USC-ECL.ARPA by SUMEX-AIM.ARPA with TCP; Wed 13 Jun 84 07':52':08-PDT'
Date': 13 Jun 84 07':46 PDT'
From': VANBUER@USC-ECL.ARPA'
Subject': RE': multiple copies of distributed files'
To': 1100users@SUMEX-AIM.ARPA'
'
Interlisp (and not just D) deals with half of the problem with distributed files,'
relocating the same version of a file in the face of being moved between directories.'
The full name and creation date are recorded in the FILECREATED expression at the'
beginning of each file created by the FILE package (and compiled files also'
contain the data for their source file).  When a file is LOADED (either source'
or compiled), the full name as loaded and the full name as created are noted.'
If you later refer to the file (e.g. LOADFNS or masterscope request), it looks for the'
files in several places': in particular--where it came from, where it was made,'
the current connected directory, and all the ones in the variable DIRECTORIES.'
(Much of this is done because an entry on ERRORTYPELIST says call SPELLFILE on file-not-found.  On Interlisp-10, spellfile even deals with misspelled file'
names, but D generally can''t get directory enumerations, so this is disabled).'
The file is accepted as the same only if the filecreated expression agrees.'
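
The search order just described can be sketched as follows. This is an illustration in Python, not the file package's actual code: `locate_file`, its parameters, and the `read_filecreated` callback are hypothetical names, and the real lookup goes through SPELLFILE and the error machinery rather than a single function.

```python
def locate_file(name, loaded_from, created_at, connected_dir,
                directories, read_filecreated, expected):
    """Return the first candidate whose FILECREATED data matches, else None.

    read_filecreated(path) is an assumed callback returning the
    (full name, creation date) recorded in the file, or None if the
    file cannot be read at that path.
    """
    # Order from the description: where it was loaded from, where it was
    # made, the connected directory, then each entry on DIRECTORIES.
    candidates = [loaded_from, created_at, connected_dir + name]
    candidates += [directory + name for directory in directories]

    seen = set()
    for path in candidates:
        if path in seen:
            continue  # do not probe the same place twice
        seen.add(path)
        info = read_filecreated(path)
        # A copy counts as the same file only if its recorded data agrees.
        if info is not None and info == expected:
            return path
    return None
```

Each probe may cost a round trip to a file server, so a long candidate list with unreachable hosts near the front would be slow under this scheme.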
'
Two things that Interlisp does not deal with (but neither does much else) are'
updating all copies of a file [Unix only handles this if several names are hard links to the same object, and then only if they''re on the same filesystem'
(roughly=drive) and writer of file does fancy footwork to overwrite the file'
in place]; and preventing simultaneous updating of a file by two processes.'
Neither problem is new, though local nets have increased the trend toward'
multiple copies of a file.  File locks don''t really deal with the problem of'
two people working on the same file anyway.  It prevents writing a corrupt file'
where two processes alternate blocks of the file, but a lock really needs'
to last for days, not seconds, and apply to the base name, not a particular'
full name.'
'
Maybe what we need is a librarian server (which provides exclusive control of a'
name [e.g. file name]) and a package which modifies the file package interface'
[maybe edit and remake] to "check out" the file name when you begin working on'
a file, and "check back in" when you are done.  This still doesn''t deal with'
multiple copies of a file, however.'
	Darrel Van Buer, SDC'
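
The check-out and check-in idea above can be sketched as a small name registry. This is an in-memory illustration only: the `Librarian` class and its methods are hypothetical, and a real librarian server would need to be reachable over the network and keep its state across restarts.

```python
class Librarian:
    """Grants exclusive, long-lived control of a base file name (sketch)."""

    def __init__(self):
        self._holders = {}  # base name -> user currently holding it

    def check_out(self, base_name, user):
        """Return True if `user` now holds `base_name`, False if someone else does."""
        holder = self._holders.get(base_name)
        if holder is not None and holder != user:
            return False
        self._holders[base_name] = user
        return True

    def check_in(self, base_name, user):
        """Release `base_name`; only the current holder may do so."""
        if self._holders.get(base_name) != user:
            raise PermissionError(f"{user} does not hold {base_name}")
        del self._holders[base_name]

    def holder(self, base_name):
        return self._holders.get(base_name)
```

The lock is keyed on the base name rather than a full name with host, directory, and version, and it persists until an explicit check-in, which matches the requirement that it last for days rather than seconds. As noted above, it does nothing about keeping multiple copies in step.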
'
-------'
'
Return-Path': <RICHER@SUMEX-AIM.ARPA>'
Received': from SUMEX-AIM.ARPA by Xerox.ARPA ; 14 JUN 84 15':23':20 PDT'
Date': Thu, 14 Jun 84 15':22':27 PDT'
From': Mark Richer <RICHER@SUMEX-AIM.ARPA>'
Subject': with regard to distributed files'
To': lispsupport.PA'
cc': RICHER@SUMEX-AIM.ARPA'
'
'
I think Karp''s question about the problems of keeping track of'
files when you work on several d-machines and use 1 or more file'
servers is a valid concern.  But currently, I have some mundane'
concerns about using {DSK} , {SAFE} our VAX with 4.2 unix file server,'
and {sumex}tops-20 with the same files.  First of all fugue6 seems'
to write out a different end-of-line character than older lisps.'
I haven''t thought about this much, but it seems it now is supposed'
to write out a CR and LF?  There are several problems I have now'
experienced ':'
'
(1) see and ty no longer pretty print the files I save so it''s a'
mess to read.'
'
(2) I usually make my files to SAFE, but sometimes it''s down, and'
I save to DSK or SUMEX. Well, it seems that I have had the '
experience that files don''t always get printed correctly. For'
instance, today I did a makefile to dsk with a copy of the old'
version on dsk (though the filedates may have pointed to safe) and'
I didn''t use the NEW option and when I picked up the raven'
copy the body of the definitions were not there because they ran off'
the line as if it was one big line.  This has happened on other'
occasions and Christopher Schmidt has not been able to completely'
explain what is happening.  '
'
(3) I get confused about where Interlisp is going to look for'
information about files.  When does it look at the filedates'
property or some other property, when does it look at directories,'
etc. Is this documented somewhere?  For instance, if a file was'
made and compiled on lassen, but you moved those files to safe'
and closed your account on lassen (so lassen is not on directories)'
which we did last summer, Interlisp still went off to lassen to'
look for the source code.  '
'
(4) we have had various problems transferring files around different'
systems using ftp, pupftp, copyfile, etc in that files have not'
transferred properly or couldn''t be loaded into lisp, and so on.'
I can''t give details because I barely understood what was going'
wrong at the time. Many of these problems have nothing to do with'
Interlisp necessarily, but I think there are some real problems'
with compatibility between systems.  All the stuff should be'
transparent to the Interlisp application programmer, but I have'
found that details about leafs, and unix 4.1, 4.2, pseudo version'
numbers, copyfile, \FTPAVAILABLEFLG, pups and so on are all things'
I have had to deal with, but have no real source of information'
I could refer to except for Christopher Schmidt who has helped'
me solve numerous problems.  '
'
(5) Lastly, our leaf server constantly hangs up while I''m on the'
dorado (the unix leaf) and I get this so you want to abort msg.'
and nobody seems to know what''s wrong. It seems to be lisp''s'
fault. I have told Schmidt and Bill Croft (our file server keeper)'
about the problem. You probably know about it because I heard '
there was an undocumented bug in the dorado microcode that causes'
this.'
'
-------'
'
Return-Path': <@SUMEX-AIM.ARPA':JonL.pa@Xerox.ARPA>'
Redistributed': Xerox1100UsersGroup↑.PA'
Received': from SUMEX-AIM.ARPA by Xerox.ARPA ; 14 JUN 84 12':28':42 PDT'
Received': from Xerox.ARPA by SUMEX-AIM.ARPA with TCP; Thu 14 Jun 84 11':09':42-PDT'
Received': from Semillon.ms by ArpaGateway.ms ; 14 JUN 84 11':10':08 PDT'
Date': 14 Jun 84 11':09 PDT'
From': JonL.pa'
Subject': RE': multiple copies of distributed files'
In-reply-to': VANBUER@USC-ECL.ARPA''s message of 13 Jun 84 07':46 PDT'
To': VANBUER@USC-ECL.ARPA'
cc': 1100users@SUMEX-AIM.ARPA'
'
I seem to remember Prof. Ferrari at Berkeley working on a Unix'
distributed computation environment, and addressing (somewhat) the'
problems of a distributed file system.  As you mention, the problem of'
locks on a file isn''t new with distributed computing, but it becomes'
extraordinarily more acute when several independent "hosts" try to'
access a file, copies of which may be stored in physically remote places.'
'
I also like your idea of a "librarian" server.'
'
-- JonL White --'
'
-----'
'
Return-Path': <@SUMEX-AIM.ARPA':goldberg@rand-unix>'
Redistributed': Xerox1100UsersGroup↑.PA'
Received': from SUMEX-AIM.ARPA by Xerox.ARPA ; 14 JUN 84 16':42':07 PDT'
Received': from rand-unix.ARPA by SUMEX-AIM.ARPA with TCP; Thu 14 Jun 84 16':32':57-PDT'
Received': by rand-unix.ARPA (4.12/4.7)'
	id AA03763; Thu, 14 Jun 84 16':34':26 pdt'
From': Arthur Goldberg <goldberg@RAND-UNIX.ARPA>'
Message-Id': <8406142334.AA03763@rand-unix.ARPA>'
Date': 14 Jun 84 16':34':18 PDT (Thu)'
To': Peter Karp <KARP@SUMEX-AIM.ARPA>'
Cc': 1100users@SUMEX-AIM.ARPA, goldberg@RAND-UNIX.ARPA'
Subject': Operating System Support for Distributed Files'
In-Reply-To': Your message of Mon 11 Jun 84 14':50':36-PDT.'
	     <8406112211.AA25004@rand-unix.ARPA>'
'
To my knowledge the best operating system support for distributed'
replicated files is provided by Locus, a distributed Unix developed'
by Popek et al. at UCLA.  It is currently running on more than'
10 Vax 11/750s connected by an ethernet at UCLA.  A file can be stored at any'
subset of sites in the distributed system.  Replication is'
available on a file system basis.  Modifications to a file are automatically'
propagated to other copies of the file.  Simultaneous modification '
is prevented by a simple locking policy.  Simultaneous modification of'
two copies of a file in two separate partitions of a network'
is prevented by a primary site update policy.  For further information'
see Popek''s "Locus Distributed File System" paper in the 1983 '
Symposium on Operating Systems Principles.'
Arthur Goldberg'
'
-----'
'
Return-Path': <@SUMEX-AIM.ARPA':masinter.pa@Xerox.ARPA>'
Redistributed': Xerox1100UsersGroup↑.PA'
Received': from SUMEX-AIM.ARPA by Xerox.ARPA ; 24 JUN 84 18':19':34 PDT'
Received': from Xerox.ARPA by SUMEX-AIM.ARPA with TCP; Sun 24 Jun 84 18':11':12-PDT'
Received': from Salvador.ms by ArpaGateway.ms ; 24 JUN 84 18':11':47 PDT'
From': masinter.pa'
Date': 24 Jun 84 18':11':35 PDT'
Subject': Re': multiple copies of distributed files'
In-reply-to': VANBUER@USC-ECL.ARPA''s message of 13 Jun 84 07':46 PDT'
To': VANBUER@USC-ECL.ARPA'
cc': 1100users@SUMEX-AIM.ARPA'
'
There are some assumptions built into the Interlisp file package that'
are not so appropriate in environments where files frequently move from'
one location to another. (For example, it doesn''t make much sense to pay'
attention to the location that the file was originally put, as in the'
FILECREATED expression, if it has moved.)'
'
I hope to reorganize this section of the file package and rationalize'
the data structures and their usage to straighten out some of the'
misfeatures, although other priorities may interfere with getting it'
into the next release.'
'
Larry'
'
-----'
'
Return-Path': <@SUMEX-AIM.ARPA':DDYER@USC-ISIB.ARPA>'
Redistributed': Xerox1100UsersGroup↑.PA'
Received': from SUMEX-AIM.ARPA by Xerox.ARPA ; 24 JUN 84 23':26':13 PDT'
Received': from USC-ISIB.ARPA by SUMEX-AIM.ARPA with TCP; Sun 24 Jun 84 23':17':50-PDT'
Date': 24 Jun 84 23':13':54 PDT'
Subject': Re': multiple copies of distributed files'
From': Dave Dyer <DDYER@USC-ISIB.ARPA>'
To': masinter.pa, VANBUER@USC-ECL.ARPA'
cc': 1100users@SUMEX-AIM.ARPA'
In-Reply-To': (Message from "masinter.pa@XEROX.ARPA" of 24 Jun 84 18':11':35 PDT)'
'
'
 I wrote some code that makes Interlisp use the actual location and name'
a file is read under, in preference to all the internal information'
related to the name it was written under.   This code is installed in'
Interlisp-vax, and was provided to the Interlisp-D folk.  As far as I know,'
they never got around to installing it, and so Interlisp-D still loses'
badly when you move files around.'
-------'


Workaround: 

Test Case: 

Edit-By: Sannella.PA

Edit-Date: 25-Jun-84  9':48':35

Attn: Kaplan

Assigned To: 

In/By: 

Disposition: 

System: Programming Environment

Subsystem: File Package

Machine: 

Disk: 

Microcode Version: 

Memory Size: 

File Server: 

Server Software Version: 

Difficulty: Very Hard

Frequency: Everytime

Impact: Moderate

Priority: Unlikely

Status: Open

Problem Type: Design - Impl

Source Files: