tarfile

Reading and writing .tar and .tar.gz files



Abstract

The tarfile module enables read and write access to both plain and gzipped TAR files.
TAR is a widely used archive format, mostly in the *NIX world. It does not compress data itself, so in most cases a TAR archive is filtered through gzip to reduce its size. TAR files have the suffix .tar; gzipped TAR files end in .tar.gz or .tgz.
An earlier Python module for TAR files, written by Jason Petrone and also called tarfile, offered read access only.

This module was modeled after the zipfile standard module: all basic methods and functions are compatible with zipfile. This makes it easy to add TAR support to existing projects, or to offer both archive formats in one program.
However, the TAR format is much more sophisticated than ZIP in handling different file types, file permissions and file ownership, so tarfile offers some additional methods. For detailed information on the TAR format, consult the section "The Standard Format" in the GNU tar manual.

Please note that the tarfile module is not intended to be a replacement for a full-blown command-line tar program!

Module contents

class TarFile(...)
The class for reading and writing TAR archives. See TarFile Objects for more details.

class TarInfo(...)
The class that stores information about TAR archive members. See TarInfo Objects.

exception error
Exception raised for bad TAR files.
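For comparison, the tarfile module in today's Python standard library, which grew out of this module, reports an unreadable archive with tarfile.ReadError, a subclass of its base exception tarfile.TarError. The sketch below uses that modern API (Python 3), not the error class documented here; try_open is a hypothetical helper.

```python
import io
import tarfile


def try_open(data: bytes) -> str:
    """Hypothetical helper: report whether data opens as a (possibly compressed) TAR archive."""
    try:
        # "r:*" lets tarfile try plain TAR and each supported compression in turn.
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
            return f"ok, {len(tar.getmembers())} member(s)"
    except tarfile.TarError as exc:
        # Non-TAR input surfaces as tarfile.ReadError, caught here via its base class.
        return f"not a TAR archive ({type(exc).__name__})"


print(try_open(b"this is certainly not a tar archive" * 20))
```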

is_tarfile(file)
Return true if file is a valid TAR archive, otherwise false. The check looks for the magic string in the first block. file may be a filename or a file-like object. If file points to an empty file, is_tarfile also returns true.
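A magic-string test of this kind can be sketched with plain file I/O. This is a minimal sketch, not this module's implementation: it assumes the POSIX ustar header layout, in which the 512-byte header block carries the magic "ustar" at byte offset 257, and looks_like_tar is a hypothetical name. Old V7-style archives have no magic field and would be rejected by this sketch.

```python
import io
import os
from typing import BinaryIO, Union

BLOCKSIZE = 512      # size of one TAR header block
MAGIC_OFFSET = 257   # position of the magic field in a ustar header
MAGIC = b"ustar"     # POSIX and GNU headers both begin the field with these bytes


def looks_like_tar(file: Union[str, os.PathLike, BinaryIO]) -> bool:
    """Hypothetical helper: check the first block for the ustar magic string."""
    if hasattr(file, "read"):
        block = file.read(BLOCKSIZE)
    else:
        with open(file, "rb") as f:
            block = f.read(BLOCKSIZE)
    if not block:
        # Mirror the documented behavior: an empty file is accepted.
        return True
    return block[MAGIC_OFFSET:MAGIC_OFFSET + len(MAGIC)] == MAGIC


# Usage: an in-memory header block that carries the magic string.
header = bytearray(BLOCKSIZE)
header[MAGIC_OFFSET:MAGIC_OFFSET + len(MAGIC)] = MAGIC
print(looks_like_tar(io.BytesIO(bytes(header))))
print(looks_like_tar(io.BytesIO(b"plain text" * 60)))
```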

gzip(file[, gzfile])
Compress file using gzip. file must be the name of an existing file. If gzfile is given, file is compressed to a file named gzfile; otherwise '.gz' is appended to file to form the target filename.
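The naming rule above can be sketched with the standard gzip and shutil modules. gzip_file is a hypothetical name and the sketch is only an assumption about how such a helper could work, not this module's code.

```python
import gzip
import os
import shutil
import tempfile
from typing import Optional


def gzip_file(file: str, gzfile: Optional[str] = None) -> str:
    """Hypothetical sketch of gzip(file[, gzfile]); returns the target filename."""
    if gzfile is None:
        gzfile = file + ".gz"  # default: append '.gz' to the source name
    # Stream the data so large archives are not read into memory at once.
    with open(file, "rb") as src, gzip.open(gzfile, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gzfile


# Usage: compress a small file inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "sources.tar")
    with open(source, "wb") as f:
        f.write(b"example payload")
    target = gzip_file(source)
    target_name = os.path.basename(target)
    with gzip.open(target, "rb") as f:
        restored = f.read()
```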

gunzip(gzfile[, file])
Inverse of gzip(): decompress gzfile. If file is not given, the target filename is derived from gzfile by converting the extension '.gz' or '.tgz' to '.tar'.
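A possible reading of the default-name rule is sketched below with the standard gzip module. Assumptions, all hypothetical: gunzip_file is an illustrative name, "x.tar.gz" becomes "x.tar" by stripping '.gz', "x.tgz" becomes "x.tar", and any other name raises ValueError.

```python
import gzip
import os
import shutil
import tempfile
from typing import Optional


def gunzip_file(gzfile: str, file: Optional[str] = None) -> str:
    """Hypothetical sketch of gunzip(gzfile[, file]); returns the target filename."""
    if file is None:
        if gzfile.endswith(".tgz"):
            file = gzfile[:-4] + ".tar"  # sources.tgz -> sources.tar
        elif gzfile.endswith(".gz"):
            file = gzfile[:-3]           # sources.tar.gz -> sources.tar
        else:
            # Assumption: refuse to guess a name for other extensions.
            raise ValueError(f"cannot derive a target name from {gzfile!r}")
    with gzip.open(gzfile, "rb") as src, open(file, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return file


# Usage: decompress a .tgz file inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "sources.tgz")
    with gzip.open(archive, "wb") as f:
        f.write(b"tar bytes would go here")
    target = gunzip_file(archive)
    target_name = os.path.basename(target)
    with open(target, "rb") as f:
        restored = f.read()
```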

class stdout(filename)
This is a small wrapper class for sys.stdout. You need it if you want to write a gzipped TarFile to sys.stdout. filename is the desired name of the TAR file inside the gzip file (e.g. "sources.tar"). It is needed because gzip files store the filename of the original file, and sys.stdout has no proper name. Please note that TarFile.debug should be set to 0, as in the example below.
For win32 users: sys.stdout is set to binary mode implicitly.
Example:
from tarfile import TarFile, stdout, TAR_GZIPPED
tar = TarFile(stdout("sources.tar"), "w", TAR_GZIPPED)
tar.debug = 0  # suppress debug messages
tar.close()
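Why a name has to be supplied can be seen with the gzip module in today's Python standard library: gzip.GzipFile accepts an explicit filename for its header even when it writes to a nameless stream. This sketch uses the modern tarfile and gzip modules, not the stdout class above; write_tgz is a hypothetical helper, and an io.BytesIO stands in for sys.stdout.buffer so the output can be inspected.

```python
import gzip
import io
import tarfile
from typing import BinaryIO, Dict


def write_tgz(stream: BinaryIO, inner_name: str, members: Dict[str, bytes]) -> None:
    """Hypothetical helper: write a gzipped TAR archive to an unnamed binary stream.

    inner_name is stored in the gzip header, because the stream itself has no name.
    """
    with gzip.GzipFile(filename=inner_name, mode="wb", fileobj=stream) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name, data in members.items():
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    # Neither TarFile nor GzipFile closes the caller's stream here.


# In a real pipeline you would pass sys.stdout.buffer; a BytesIO stands in for it.
out = io.BytesIO()
write_tgz(out, "sources.tar", {"hello.txt": b"hello\n"})
raw = out.getvalue()

# The gzip header has 10 fixed bytes, followed by the NUL-terminated original name.
stored_name = raw[10:raw.index(b"\0", 10)]
```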

The module additionally defines some constants:

TAR_PLAIN or ZIP_STORED
Numeric constant for an uncompressed TAR archive.

TAR_GZIPPED or ZIP_DEFLATED
Numeric constant for a gzip-compressed TAR archive. This uses the gzip standard module.
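A gzipped TAR archive is the plain TAR stream passed through gzip. The sketch below demonstrates this layering with the tarfile module in today's Python standard library, whose mode strings "w" and "w:gz" play roughly the role of TAR_PLAIN and TAR_GZIPPED; it does not use the TarFile constructor documented here, and build_tar is a hypothetical helper.

```python
import gzip
import io
import tarfile
from typing import Dict


def build_tar(members: Dict[str, bytes], mode: str = "w") -> bytes:
    """Build an archive in memory; mode "w" writes plain TAR, "w:gz" gzipped TAR."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp so both builds produce identical TAR data
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


members = {"a.txt": b"a" * 1000, "b.txt": b"b" * 1000}
plain = build_tar(members)            # comparable to TAR_PLAIN
gzipped = build_tar(members, "w:gz")  # comparable to TAR_GZIPPED

# Decompressing the gzipped archive yields exactly the plain TAR bytes.
print(gzip.decompress(gzipped) == plain)
print(len(plain), "bytes plain,", len(gzipped), "bytes gzipped")
```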

TAR-specific type constants:

REGTYPE
Numeric constant for a regular file entry in a TAR archive.

AREGTYPE
Another numeric constant for a regular file entry, used by older tar implementations.

LNKTYPE
Numeric constant for a hard link entry.

SYMTYPE
Numeric constant for a symbolic link entry.

CHRTYPE
Numeric constant for a character special device entry.

BLKTYPE
Numeric constant for a block special device entry.

DIRTYPE
Numeric constant for a directory entry.

FIFOTYPE
Numeric constant for a FIFO special device entry.

CONTTYPE
Numeric constant for a contiguous file entry.
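Type constants like these are typically compared against a member's type field to decide how to handle it. The sketch uses the tarfile module in today's Python standard library, where the constants are single bytes such as b"0" rather than numbers; TYPE_NAMES, describe_members and build_sample are hypothetical names.

```python
import io
import tarfile
from typing import List, Tuple

# Map the modern module's type constants to readable labels.
TYPE_NAMES = {
    tarfile.REGTYPE: "regular file",
    tarfile.AREGTYPE: "regular file (old-style)",
    tarfile.LNKTYPE: "hard link",
    tarfile.SYMTYPE: "symbolic link",
    tarfile.CHRTYPE: "character device",
    tarfile.BLKTYPE: "block device",
    tarfile.DIRTYPE: "directory",
    tarfile.FIFOTYPE: "FIFO",
    tarfile.CONTTYPE: "contiguous file",
}


def describe_members(data: bytes) -> List[Tuple[str, str]]:
    """Return (name, type label) for every member of an in-memory archive."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        return [(m.name, TYPE_NAMES.get(m.type, "other")) for m in tar.getmembers()]


def build_sample() -> bytes:
    """Build a small archive holding a directory, a regular file and a symlink."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        directory = tarfile.TarInfo("docs")
        directory.type = tarfile.DIRTYPE
        directory.mode = 0o755
        tar.addfile(directory)

        payload = b"hello\n"
        regular = tarfile.TarInfo("docs/readme.txt")  # REGTYPE is the default
        regular.size = len(payload)
        tar.addfile(regular, io.BytesIO(payload))

        link = tarfile.TarInfo("latest")
        link.type = tarfile.SYMTYPE
        link.linkname = "docs/readme.txt"
        tar.addfile(link)
    return buf.getvalue()


for name, label in describe_members(build_sample()):
    print(f"{name}: {label}")
```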

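GNU tar-specific type constants:

GNUTYPE_LONGNAME
Numeric constant for a GNU longname entry, which holds a member name too long for the standard header.

GNUTYPE_LONGLINK
Numeric constant for a GNU longlink entry, which holds a link target name too long for the standard header.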
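The effect of these entries can be observed with the tarfile module in today's Python standard library: in GNU format, a name or link target longer than the 100-byte header field is written as an extra entry named "././@LongLink" of type GNUTYPE_LONGNAME (b"L") or GNUTYPE_LONGLINK (b"K") in front of the real header. This is a sketch with the modern API, not this module's; build_gnu_archive is a hypothetical helper.

```python
import io
import tarfile


def build_gnu_archive(long_name: str, link_target: str) -> bytes:
    """Write a file with a long name and a symlink with a long target, GNU format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        payload = b"data"
        info = tarfile.TarInfo(long_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

        link = tarfile.TarInfo("short-link")
        link.type = tarfile.SYMTYPE
        link.linkname = link_target
        tar.addfile(link)
    return buf.getvalue()


long_name = "deep/" * 30 + "file.txt"       # 158 characters, over the 100-byte name field
long_target = "target/" * 20 + "file.txt"   # 148 characters, over the 100-byte link field
raw = build_gnu_archive(long_name, long_target)

# The type flag of a header sits at byte offset 156; the first header is the longname entry.
first_type = raw[156:157]

# Reading the archive back reassembles the full names transparently.
with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
    names = tar.getnames()
    restored_target = tar.getmember("short-link").linkname
```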

Copyright © 2002 Lars Gustäbel lars@gustaebel.de