TarDoc.tioga
Bill Jackson (bj) June 15, 1989 2:12:55 pm PDT
Willie-Sue, August 10, 1989 1:00:22 pm PDT
Tar
PCEDAR 2.0 % FOR INTERNAL XEROX USE ONLY
Tar
the tape archive facility
Bill Jackson
© Copyright 1989 Xerox Corporation. All rights reserved.
Abstract: "tar, (the tape archive command) dumps several files into one, in a medium suitable for transportation" -- from section 5 of the Unix manual
Created by: Bill Jackson
Maintained by: Bill Jackson <BJackson>
Keywords: tar, unix, tape, archive
Xerox Corporation
Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, California 94304

For Internal Xerox Use Only
1. Details
A Sample
Here is what a sample test case (Sample.tar) looks like, listed first by the Unix tar command and then read through TarImpl:
csh> tar tfv Sample.tar
rw-r--r--3064/3000 945 Nov 13 18:42 1988 fstab.sd
rw-r--r--3064/3000 942 Nov 13 18:47 1988 fstab.sd.X
rw-r--r--3064/100 1 May 12 19:38 1989 fstab.swap.slice
rw-r--r--3064/3000 1030 Nov 13 18:42 1988 fstab.xy
% ← &s ← FS.StreamOpen[fileName: "Sample.tar", extendFileProc: NIL]
^[streamProcs: 3347062B^, streamData: 21274362B^, propList: 7642204B^, backingStream: NIL]
% ← &h ← TarImpl.GetHeader[&s]
[name: (100)['f, 's, 't, 'a, 'b, '., 's, 'd, '\000, ...],
mode: (8)[' , ' , ' , '6, '4, '4, ' , '\000],
uid: (8)[' , ' , '5, '7, '7, '0, ' , '\000],
gid: (8)[' , ' , '5, '6, '7, '0, ' , '\000],
size: (12)[' , ' , ' , ' , ' , ' , ' , '1, '6, '6, '1, ' ],
mtime: (12)[' , '4, '3, '3, '7, '4, '4, '4, '0, '2, '6, ' ],
chksum: (8)[' , ' , '5, '6, '5, '3, '\000, ' ],
linkflag: ??,
linkname: (100)['\000, ...]
]
% ← &s ← FS.StreamOpen[fileName: "Sample.tar", extendFileProc: NIL]
^[streamProcs: 3347062B^, streamData: 21274362B^, propList: 7642204B^, backingStream: NIL]
% ← TarImpl.stdout ← &stdout[]
% ← TarImpl.Scan[&s, TarImpl.PrintInfo]
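The octal fields printed by TarImpl.GetHeader above can be cross-checked against the first line of the tar tfv listing. The following Python sketch decodes them; the helper name octal_field is hypothetical, and reading mtime as seconds since the Unix epoch is an assumption based on the usual tar convention rather than something this document states.

```python
from datetime import datetime, timezone

def octal_field(text):
    """Decode a space- or NUL-padded octal ASCII field into an integer."""
    return int(text.strip(" \0"), 8)

# Field contents copied from the header dump of fstab.sd.
mode = octal_field("   644 \0")
uid = octal_field("  5770 \0")
gid = octal_field("  5670 \0")
size = octal_field("       1661 ")
mtime = octal_field(" 4337444026 ")

print(oct(mode), uid, gid, size)   # 0o644 3064 3000 945
# Assumption: mtime counts seconds since 1970-01-01 00:00:00 UTC.
print(datetime.fromtimestamp(mtime, tz=timezone.utc))
```

The decoded values 3064/3000 and 945 agree with the owner and size columns of the listing, mode 644 corresponds to rw-r--r--, and the decoded time falls at 02:42 UTC on November 14, 1988, which is 18:42 on November 13 in Pacific Standard Time.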
File Format
A ``tar tape'' or file is a series of blocks. Each block is of size TBLOCK. A file on the tape is represented by a header block which describes the file, followed by zero or more blocks which give the contents of the file. At the end of the tape are two blocks filled with binary zeros, as an EOF indicator.
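The block structure just described suggests a simple scanning loop. The following Python sketch is one way to walk an archive; it is not the TarImpl.Scan implementation. The function name scan is hypothetical, and the byte offsets of the name and size fields (0 and 124) are derived from the header declaration given later in this section.

```python
TBLOCK = 512

def scan(stream):
    """Yield (name, size) for each archived file; stop after two zero blocks."""
    zero_blocks = 0
    while zero_blocks < 2:
        header = stream.read(TBLOCK)
        if len(header) < TBLOCK:
            break                      # truncated archive: stop rather than guess
        if header == bytes(TBLOCK):
            zero_blocks += 1           # possible end-of-archive marker
            continue
        zero_blocks = 0
        name = header[0:100].split(b"\0", 1)[0].decode("ascii")
        size = int(header[124:136].strip(b" \0") or b"0", 8)
        yield name, size
        # The contents occupy ceil(size / TBLOCK) whole blocks; skip them.
        data_blocks = (size + TBLOCK - 1) // TBLOCK
        stream.read(data_blocks * TBLOCK)
```

Any binary stream with a read method will do, for example a file opened with open("Sample.tar", "rb"). Skipping is done with read rather than seek so that the sketch also works on non-seekable streams.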
The encoding of the header is designed to be portable across machines. The header block looks like:
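#define TBLOCK 512    /* size of every block, in bytes */
#define NAMSIZ 100    /* width of the name and linkname fields */

union hblock {
    char dummy[TBLOCK];          /* pads the header out to a full block */
    struct header {
        char name[NAMSIZ];
        char mode[8];
        char uid[8];
        char gid[8];
        char size[12];
        char mtime[12];
        char chksum[8];
        char linkflag;
        char linkname[NAMSIZ];
    } dbuf;
};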
name is a NUL-terminated string giving the name of the file. Files dumped because they were in a directory have the directory name as prefix and /filename as suffix.
The other fields are zero-filled octal numbers in ASCII. Each field (of width w) contains w-2 digits, a SPACE, and a NUL, except size and mtime, which do not contain the trailing NUL.
mode is the file mode, with the top bit masked off. uid and gid are the numeric IDs of the user and group that own the file. size is the size of the file in bytes; links and symbolic links are dumped with this field specified as zero. mtime is the modification time of the file at the time it was dumped.
chksum is an octal ASCII value which represents the sum of all the bytes in the header block. When calculating the checksum, the chksum field is treated as if it were all blanks.
linkflag is ASCII `0' if the file is ``normal'' or a special file, ASCII `1' if it is a hard link, and ASCII `2' if it is a symbolic link. The name linked-to, if any, is in linkname, with a trailing NUL.
Unused fields of the header are binary zeros (and are included in the checksum).
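The field layout can be unpacked mechanically. The following Python sketch, with the hypothetical names HEADER, Header, and parse_header, uses the standard struct module with the field widths from the declaration above. It strips both spaces and zero bytes before converting numbers, because the sample header earlier in this document pads its numeric fields with leading spaces rather than zeros.

```python
import struct
from typing import NamedTuple

# Widths from the hblock declaration: 100+8+8+8+12+12+8+1+100 = 257 bytes.
HEADER = struct.Struct("=100s8s8s8s12s12s8sc100s")

class Header(NamedTuple):
    name: str
    mode: int
    uid: int
    gid: int
    size: int
    mtime: int
    chksum: int
    linkflag: bytes   # b"0", b"1", b"2", or whatever byte the writer left there
    linkname: str

def _text(raw):
    """Return the bytes before the first NUL, decoded as ASCII."""
    return raw.split(b"\0", 1)[0].decode("ascii")

def _octal(raw):
    """Decode an octal ASCII field; an empty field is taken to be zero."""
    digits = raw.strip(b" \0")
    return int(digits, 8) if digits else 0

def parse_header(block):
    """Split the first 257 bytes of a header block into typed fields."""
    name, mode, uid, gid, size, mtime, chksum, flag, linkname = HEADER.unpack_from(block)
    return Header(_text(name), _octal(mode), _octal(uid), _octal(gid),
                  _octal(size), _octal(mtime), _octal(chksum), flag, _text(linkname))
```

Treating an empty numeric field as zero is a choice made for this sketch so that sparsely filled headers still parse; a stricter reader might reject such a header instead.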
Checksum calculation
chksum is an octal ASCII value which represents the sum of all the bytes in the header block. When calculating the checksum, the chksum field is treated as if it were all blanks, that is, as eight ASCII SPACE characters.
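As a worked example, the rule can be applied to the sample header of fstab.sd shown earlier. In this Python sketch the function names are hypothetical, bytes are summed as unsigned values (an assumption), and linkflag, which the dump prints as ??, is assumed to be a zero byte.

```python
TBLOCK = 512
CHKSUM = slice(148, 156)   # offset 100+8+8+8+12+12, width 8

def header_checksum(block):
    """Sum the header bytes, counting the chksum field as eight blanks."""
    header = bytearray(block[:TBLOCK])
    header[CHKSUM] = b" " * 8
    return sum(header)

def checksum_matches(block):
    """Compare the stored octal chksum field with the computed sum."""
    stored = int(bytes(block[CHKSUM]).strip(b" \0"), 8)
    return stored == header_checksum(block)

# The sample header, rebuilt field by field from the dump.
# Assumption: linkflag and all bytes after linkname are zero.
sample = b"".join([
    b"fstab.sd".ljust(100, b"\0"),
    b"   644 \0", b"  5770 \0", b"  5670 \0",
    b"       1661 ", b" 4337444026 ", b"  5653\0 ",
    b"\0", bytes(100),
]).ljust(TBLOCK, b"\0")

print(oct(header_checksum(sample)))   # 0o5653
print(checksum_matches(sample))       # True
```

Under those assumptions the computed sum is 2987, which is 5653 in octal and agrees with the chksum field in the dump.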
File Format (on physical media)
The blocks are grouped for physical I/O operations. Each group of n blocks (where n is an explicit parameter, defaulting to 20) is written in a single operation; on nine-track tapes, the result of this write is a single tape record. With TBLOCK at 512 and the default n, each record is 10240 bytes. The last group is always written at the full size, so blocks after the two zero blocks contain random data. On reading, the specified or default group size is used for the first read; if that read returns less than a full tape record, the reduced size should be used for all further reads.
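The grouping rules can be sketched in Python as follows. Both function names are hypothetical, and read_record stands for any callable that returns the next tape record of at most the requested number of bytes (and an empty result at end of tape). The padding here uses zero bytes for reproducibility, whereas a real writer may leave arbitrary data after the two zero blocks.

```python
TBLOCK = 512

def pad_to_record(archive, nblocks=20):
    """Extend archive so its length is a whole number of n-block records."""
    record = nblocks * TBLOCK
    shortfall = -len(archive) % record
    return archive + bytes(shortfall)

def read_records(read_record, nblocks=20):
    """Yield records, shrinking the request size if the first read is short."""
    size = nblocks * TBLOCK
    first = True
    while True:
        record = read_record(size)
        if not record:
            return
        if first and len(record) < size:
            size = len(record)      # tape was written with a smaller group
        first = False
        yield record
```

For example, a three-block archive of 1536 bytes is padded to one full record of 10240 bytes when n is 20.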
Pragmatics (operational conventions)
The first time a given file (inode number) is dumped, it is dumped as a regular file. The second and subsequent times, it is dumped as a link instead. Upon retrieval, if a link entry is retrieved but the file it was linked to is not, an error message should be printed; the tape must then be manually re-scanned to retrieve the linked-to file.
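The convention above amounts to remembering the first name seen for each inode. The following Python sketch illustrates both the dump-side decision and the retrieval-side check. The function names and the (path, linkflag, linkname) tuple shape are hypothetical, and keying on the inode number alone is a simplification taken from the wording above.

```python
def plan_dump(entries):
    """Map (path, inode) pairs to (path, linkflag, linkname) header choices."""
    first_seen = {}
    plan = []
    for path, inode in entries:
        if inode in first_seen:
            # Second or later name for this inode: dump a hard link.
            plan.append((path, "1", first_seen[inode]))
        else:
            first_seen[inode] = path
            plan.append((path, "0", ""))
    return plan

def missing_link_targets(retrieved):
    """List (link, target) pairs whose target was not itself retrieved."""
    names = {path for path, flag, _ in retrieved if flag != "1"}
    return [(path, target) for path, flag, target in retrieved
            if flag == "1" and target not in names]
```

Each pair reported by missing_link_targets corresponds to a case where the error message should be printed and the tape re-scanned for the linked-to file.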