[Patches] [ python-Patches-918101 ] tarfile.py enhancements

SourceForge.net noreply at sourceforge.net
Sat Mar 5 13:48:29 CET 2005


Patches item #918101, was opened at 2004-03-17 16:59
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=918101&group_id=5470

Category: Library (Lib)
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Lars Gustäbel (gustaebel)
Assigned to: Martin v. Löwis (loewis)
Summary: tarfile.py enhancements

Initial Comment:
I still develop tarfile.py sporadically on a separate
branch (http://www.gustaebel.de/lars/tarfile/), and so
there are two features from this branch that I'd like
to propose for inclusion in Python's tarfile.py:

1. Overcoming the 8GB file size limit (8GB-limit.patch)

At the moment it is not possible to add files that exceed
8GB in size to a tar archive. Although this is POSIX
compliant, GNU tar offers an extension header for
large files that encodes file sizes in an 88-bit number
instead of the common 11-digit octal number. Like all
other GNU extensions in tarfile.py, this feature is
turned on and off using the TarFile.posix attribute.
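
As a rough illustration of the size encoding described above,
here is a minimal sketch. The names encode_size and decode_size
are hypothetical and are not taken from the patch. It assumes a
12-byte size field holding either 11 octal digits plus a NUL,
or a 0x80 marker byte followed by an 88-bit big-endian number;
the details in GNU tar and tarfile.py may differ.

```python
POSIX_LIMIT = 8 ** 11  # 11 octal digits cover sizes up to 8 GiB - 1


def encode_size(size, gnu=True):
    """Return a 12-byte size field (hypothetical helper, not from the patch)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if size < POSIX_LIMIT:
        # Common case: 11 octal digits followed by a NUL byte.
        return ("%011o" % size).encode("ascii") + b"\0"
    if not gnu:
        raise ValueError("size does not fit the POSIX octal field")
    # Assumed GNU-style encoding: marker byte 0x80, then 11 bytes
    # (88 bits) holding the size as a big-endian binary number.
    return b"\x80" + size.to_bytes(11, "big")


def decode_size(field):
    """Inverse of encode_size for the two layouts sketched above."""
    if field[0] == 0x80:
        return int.from_bytes(field[1:], "big")
    return int(field.rstrip(b"\0 ").decode("ascii"), 8)
```

With gnu=False the sketch refuses sizes of 8GB and above, which
loosely mirrors the role the TarFile.posix attribute plays in the
description above.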

2. Automatic compression detection for the stream
interface (stream-detect-compr.patch)

tarfile.py's stream interface (which can be used to
access tape devices or simply read a tar from stdin) is
a bit difficult to use because it's not able to detect
whether an archive is compressed or not. Compression
has to be explicitly specified using mode ("r|",
"r|gz", "r|bz2"). The patch introduces a fourth mode
"r|*" that makes automatic detection possible.
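
A short sketch of how the "r|*" mode might be used, assuming a
Python version whose tarfile module accepts it. The helpers
make_archive and list_stream are hypothetical names for this
example, and an in-memory buffer stands in for stdin or a tape
device.

```python
import io
import tarfile


def make_archive(comp):
    """Build a tiny in-memory archive; comp is "", "gz" or "bz2"."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:" + comp) as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_stream(raw):
    """Read an archive sequentially without naming its compression."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r|*") as tar:
        return [member.name for member in tar]
```

The same list_stream call handles the uncompressed, gzip and
bzip2 variants, so the caller never has to choose between "r|",
"r|gz" and "r|bz2".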


Neither patch is vitally important, but the 8GB patch
in particular is useful IMO.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2005-03-05 13:48

Message:
Logged In: YES 
user_id=21627

Thanks for the patch and the explanation; committed as

libtarfile.tex 1.9
tarfile.py 1.27
test_tarfile.py 1.18
NEWS 1.1268


----------------------------------------------------------------------

Comment By: Lars Gustäbel (gustaebel)
Date: 2005-03-05 12:37

Message:
Logged In: YES 
user_id=642936

The asterisk notation is necessary only for the stream
interface. There are the three possible modes "r|", "r|gz"
and "r|bz2", and "r|*" is a placeholder for all of them
combined.
For symmetry reasons I thought I'd add the same thing to the
file interface as well. It also has these three modes "r:",
"r:gz" and "r:bz2", for which "r:*" could act as a wildcard.
Let's say "r:*" is the explicit version of "r".

I thought about something like the following example as a
use case:

def open_tar(filename, stream=False):
    mode = "r" + [":", "|"][stream] + "*"
    [...]

I have attached an updated patch including the test.
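
The open_tar use case above could be filled in roughly as
follows. This is only a sketch: the body is an assumption, since
the original example elides it, and it relies on a tarfile
module that accepts both "r:*" and "r|*".

```python
import tarfile


def open_tar(filename, stream=False):
    # Pick the separator for the file (":") or stream ("|")
    # interface and let "*" select the compression automatically.
    mode = "r" + ("|" if stream else ":") + "*"
    # Assumed body: hand the mode straight to tarfile.open().
    return tarfile.open(filename, mode)
```

Calling open_tar(path) gives a seekable TarFile, while
open_tar(path, stream=True) reads the same archive strictly
sequentially; in both cases the caller does not spell out the
compression.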

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2005-03-04 20:58

Message:
Logged In: YES 
user_id=21627

Lars, the streaming patch is outdated. If you still think it
is necessary, please update the patch.

While I can understand what the feature "automatic
detection" does, I fail to see why you need a new syntax for
open. AFAICT, "r" is equivalent to the newly-proposed "r:*".
Why is it necessary to have two ways to spell the same thing?

----------------------------------------------------------------------

Comment By: Lars Gustäbel (gustaebel)
Date: 2004-07-21 14:55

Message:
Logged In: YES 
user_id=642936

I just created tests for the stream-detect-compr.patch,
attached as test.patch.

BTW, I examined bug #949052, and opened a patch (#995126).

----------------------------------------------------------------------

Comment By: Lars Gustäbel (gustaebel)
Date: 2004-07-21 09:54

Message:
Logged In: YES 
user_id=642936

tarfile.py's stream interface must be used if the user wants
to read an archive that is not a seekable file, e.g. stdin
or a tape device. ATM, it is the user's job to find out
whether the stream is compressed (mode="r|gz" or "r|bz2") or
uncompressed (mode="r|"), which makes the stream interface
kind of awkward and unusable for many users. The patch
introduces an additional mode "r|*" which does this job. I
admit it's just a convenience thing but I think the stream
interface is somehow too complicated without it.
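
To illustrate the kind of work "r|*" takes off the user's hands,
here is a minimal sketch of compression sniffing based on the
leading magic bytes of gzip and bzip2 data. The function name
sniff_compression is hypothetical, and this is not necessarily
how the patch implements the detection.

```python
def sniff_compression(head):
    """Guess the compression type from the first bytes of a stream."""
    if head.startswith(b"\x1f\x8b"):   # gzip magic number
        return "gz"
    if head.startswith(b"BZh"):        # bzip2 magic number
        return "bz2"
    return "tar"                       # assume an uncompressed archive
```

On a non-seekable stream the bytes inspected this way cannot be
re-read, so a real implementation would also have to buffer them
and hand them on to the decompressor.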

The reason why I changed the "type" argument to "comptype"
was just that the TarFile class uses "comptype" and the
_Stream class uses "type" for the same thing. It doesn't
need to be changed.

You're absolutely right about the testcase. I had enough time
to write one ;-)

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2004-07-21 00:31

Message:
Logged In: YES 
user_id=33168

Lars, could you look at bug 949052 and provide any guidance?
 Thanks.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2004-07-21 00:28

Message:
Logged In: YES 
user_id=33168

I checked in the 8GB limit patch. Lib/tarfile.py 1.14.

I didn't check in the stream patch for 2 reasons:
1) I don't know the need.  Is this common?  I've never heard
of it.
2) The type parameter name was changed to comptype.  I wasn't
sure if this was necessary.  It potentially (albeit
unlikely) could break a program.  I'm not concerned about
changing the name of the attribute.

Lars, can you provide a good reason to add this part of the
patch?  If it's not likely to be used, I don't think it
should be added.  If it is added, there should also be a test.

Thanks.

----------------------------------------------------------------------
