[Patches] [ python-Patches-918101 ] tarfile.py enhancements
SourceForge.net
noreply at sourceforge.net
Sat Mar 5 13:48:29 CET 2005
Patches item #918101, was opened at 2004-03-17 16:59
Message generated for change (Comment added) made by loewis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=918101&group_id=5470
Category: Library (Lib)
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Lars Gustäbel (gustaebel)
Assigned to: Martin v. Löwis (loewis)
Summary: tarfile.py enhancements
Initial Comment:
I still develop tarfile.py sporadically on a separate
branch (http://www.gustaebel.de/lars/tarfile/), and so
there are two features from this branch that I'd like
to propose for inclusion in Python's tarfile.py:
1. Overcoming the 8GB file size limit (8GB-limit.patch)
At the moment it is not possible to add files to a tar
archive that exceed 8GB in size. Although this is POSIX
compliant, GNU tar offers an extension header for
large files that encodes file sizes in an 88-bit number
instead of the common 11-digit octal number. Like all
other GNU extensions in tarfile.py, this feature is
turned on and off using the TarFile.posix attribute.
2. Automatic compression detection for the stream
interface (stream-detect-compr.patch)
tarfile.py's stream interface (which can be used to
access tape devices or simply read a tar from stdin) is
a bit difficult to use because it's not able to detect
whether an archive is compressed or not. Compression
has to be explicitly specified using mode ("r|",
"r|gz", "r|bz2"). The patch introduces a fourth mode
"r|*" that makes automatic detection possible.
Neither patch is vitally important, but the 8GB patch
in particular is useful IMO.
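To illustrate where the 8GB limit in item 1 comes from, here is a
minimal sketch of the two encodings of the 12-byte size field. The
function name encode_size and its posix parameter are hypothetical and
only mirror the TarFile.posix switch; this is not the code from
8GB-limit.patch.

```python
def encode_size(size, posix=True):
    """Return a 12-byte tar size field (hypothetical helper, for illustration)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if size < 8 ** 11:
        # Common form: 11 octal digits followed by a NUL byte.
        return ("%011o" % size).encode("ascii") + b"\x00"
    if posix:
        # Without the GNU extension the octal field cannot hold the value.
        raise ValueError("file size exceeds the 8GB limit of the octal field")
    # GNU form: marker byte 0x80, then the size as an 88-bit big-endian number.
    return b"\x80" + size.to_bytes(11, "big")


print(8 ** 11)                          # first size that no longer fits
print(encode_size(8 ** 11 - 1))         # largest octal-encodable size
print(encode_size(8 ** 11, posix=False))
```

Eleven octal digits hold at most 8**11 - 1 bytes, which is one byte
short of 8GB; the GNU form spends one byte on a marker and uses the
remaining 11 bytes (88 bits) for the number.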
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2005-03-05 13:48
Message:
Logged In: YES
user_id=21627
Thanks for the patch and the explanation; committed as
libtarfile.tex 1.9
tarfile.py 1.27
test_tarfile.py 1.18
NEWS 1.1268
----------------------------------------------------------------------
Comment By: Lars Gustäbel (gustaebel)
Date: 2005-03-05 12:37
Message:
Logged In: YES
user_id=642936
The asterisk notation is necessary only for the stream
interface. There are three possible modes, "r|", "r|gz"
and "r|bz2", and "r|*" is a placeholder for all of them
combined.
For symmetry reasons I thought I'd add the same thing to the
file interface as well. It also has these three modes "r:",
"r:gz" and "r:bz2", for which "r:*" could act as a wildcard.
Let's say "r:*" is the explicit version of "r".
I thought about something like the following example as a
use case:
    def open_tar(filename, stream=False):
        mode = "r" + [":", "|"][stream] + "*"
        [...]
I have attached an updated patch including the test.
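A runnable sketch of that use case, assuming a tarfile module that
already accepts the "r:*" and "r|*" modes. To stay self-contained it
takes a file object instead of a filename, and make_archive is a
hypothetical helper that only builds test data in memory.

```python
import io
import tarfile


def open_tar(fileobj, stream=False):
    # "r:*" for seekable files, "r|*" for streams; both detect compression.
    mode = "r" + (":", "|")[stream] + "*"
    return tarfile.open(fileobj=fileobj, mode=mode)


def make_archive(comptype):
    """Build a one-member archive in memory (hypothetical test helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:" + comptype) as tar:
        data = b"hello"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


results = []
for comptype in ("", "gz", "bz2"):
    for stream in (False, True):
        with open_tar(io.BytesIO(make_archive(comptype)), stream) as tar:
            results.append(tar.getnames())
print(results)
```

The caller never has to name the compression type; only the choice
between the file interface and the stream interface remains.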
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2005-03-04 20:58
Message:
Logged In: YES
user_id=21627
Lars, the streaming patch is outdated. If you still think it
is necessary, please update the patch.
While I can understand what the feature "automatic
detection" does, I fail to see why you need a new syntax for
open. AFAICT, "r" is equivalent to the newly-proposed "r:*".
Why is it necessary to have two ways to spell the same thing?
----------------------------------------------------------------------
Comment By: Lars Gustäbel (gustaebel)
Date: 2004-07-21 14:55
Message:
Logged In: YES
user_id=642936
I just created tests for the stream-detect-compr.patch,
attached as test.patch.
BTW, I examined bug #949052, and opened a patch (#995126).
----------------------------------------------------------------------
Comment By: Lars Gustäbel (gustaebel)
Date: 2004-07-21 09:54
Message:
Logged In: YES
user_id=642936
tarfile.py's stream interface must be used if the user wants
to read an archive that is not a seekable file, e.g. stdin
or a tape device. ATM, it is the user's job to find out
whether the stream is compressed (mode="r|gz" or "r|bz2") or
uncompressed (mode="r|"), which makes the stream interface
kind of awkward and unusable for many users. The patch
introduces an additional mode "r|*" which does this job. I
admit it's just a convenience thing but I think the stream
interface is somewhat too complicated without it.
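One way such detection can work is to look at the first bytes of the
stream before any tar header is parsed. The sketch below is only an
illustration of that idea, not necessarily what
stream-detect-compr.patch does; detect_compression and its return
labels are hypothetical.

```python
import bz2
import gzip


def detect_compression(head):
    """Guess the compression type from the leading bytes of a stream."""
    if head.startswith(b"\x1f\x8b"):
        return "gz"    # gzip magic number
    if head.startswith(b"BZh"):
        return "bz2"   # bzip2 magic number
    return "tar"       # assume an uncompressed archive otherwise


print(detect_compression(gzip.compress(b"data")))
print(detect_compression(bz2.compress(b"data")))
print(detect_compression(b"plain bytes"))
```

Because only the leading bytes are inspected, this works on
non-seekable input as long as those bytes are buffered and handed on
to the decompressor afterwards.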
The reason why I changed the "type" argument to "comptype"
was just that the TarFile class uses "comptype" and the
_Stream class uses "type" for the same thing. It doesn't
need to be changed.
You're absolutely right about the test case. I had enough time
to write one ;-)
----------------------------------------------------------------------
Comment By: Neal Norwitz (nnorwitz)
Date: 2004-07-21 00:31
Message:
Logged In: YES
user_id=33168
Lars, could you look at bug 949052 and provide any guidance?
Thanks.
----------------------------------------------------------------------
Comment By: Neal Norwitz (nnorwitz)
Date: 2004-07-21 00:28
Message:
Logged In: YES
user_id=33168
I checked in the 8GB limit patch. Lib/tarfile.py 1.14.
I didn't check in the stream patch for 2 reasons:
1) I don't know the need. Is this common? I've never heard
of it.
2) The type parameter name was changed to comptype. I wasn't
sure if this was necessary. It potentially (albeit
unlikely) could break a program. I'm not concerned about
changing the name of an attribute.
Lars, can you provide a good reason to add this part of the
patch? If it's not likely to be used, I don't think it
should be added. If it is added, there should also be a test.
Thanks.
----------------------------------------------------------------------