[ python-Bugs-1074261 ] gzip dies on gz files with many appended headers
SourceForge.net
noreply at sourceforge.net
Thu Dec 2 17:43:03 CET 2004
Bugs item #1074261, was opened at 2004-11-27 12:29
Message generated for change (Settings changed) made by akuchling
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1074261&group_id=5470
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Eichin (eichin)
>Assigned to: A.M. Kuchling (akuchling)
Summary: gzip dies on gz files with many appended headers
Initial Comment:
One of the values of the gzip format is that one can reopen for
append and the file is, as a whole, still valid. This is accomplished
by adding new headers on reopen. gzip.py (as tested on 2.1, 2.3,
and 2.4rc1 freshly built) doesn't deal well with more than a certain
number of appended headers.
The included test case generates (using gzip.py) such a file, runs
gzip -tv on it to show that it is valid, and then tries to read it with
gzip.py -- and it blows out with
OverflowError: long int too large to convert to int
in earlier releases, and MemoryError in 2.4rc1. What's going on is that
gzip.GzipFile.read keeps doubling readsize and calling _read again;
_read does call _read_gzip_header, but consumes only *one* header per
call. Because readsize doubles on every iteration, older Pythons blow
out by not autopromoting past 2**32, and 2.4 blows out trying to call
file.read with a huge value; in practice, more than 30 or so headers
and it fails.
The test case below is based on a real-world queueing case that
generates over 200 appended headers - and isn't bounded in any
useful way. I'll think about ways to make GzipFile more clever, but
I don't have a patch yet.
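The test case itself isn't reproduced in this archive; a minimal sketch of the reproduction described above would look like the following (file name and member count are illustrative; modern Pythons read such files fine, precisely because of the fix discussed here):

```python
import gzip
import os
import tempfile

# Reopening a .gz file for append writes a fresh gzip header each
# time, yielding a valid multi-member file -- the situation the
# report describes.
path = os.path.join(tempfile.mkdtemp(), "many_headers.gz")
for i in range(300):
    with gzip.open(path, "ab") as f:
        f.write(("chunk %d\n" % i).encode("ascii"))

# Reading it back walks all 300 members; on the affected versions
# this is where readsize doubled past any sane bound.
with gzip.open(path, "rb") as f:
    data = f.read()
```

`gzip -tv` on the resulting file reports it as valid, since concatenated gzip members are explicitly allowed by the format.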
----------------------------------------------------------------------
Comment By: Mark Eichin (eichin)
Date: 2004-11-27 18:28
Message:
Patch sent to patch-tracker as 1074381.
----------------------------------------------------------------------
Comment By: Mark Eichin (eichin)
Date: 2004-11-27 12:48
Message:
Oh, this is actually easy to fix: just clamp readsize. After all, you don't
*actually* want to try to read gigabyte chunks most of the time. (The
supplied patch allows one to override gzip.GzipFile.max_read_chunk if
one really does.) Tested on 2.4rc1, and a version backported to 2.1
works there too.
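A minimal sketch of the clamping idea (the class and method names here are illustrative, not the patch as submitted; the 10 MiB figure is an assumed default):

```python
class ChunkedReader:
    # Illustrative cap; per the comment above, the patch exposes this
    # as gzip.GzipFile.max_read_chunk so callers can override it.
    max_read_chunk = 10 * 1024 * 1024  # 10 MiB

    def next_readsize(self, readsize):
        # Double as before, but never beyond the cap, so file.read is
        # never asked for a multi-gigabyte chunk no matter how many
        # appended headers remain to be consumed.
        return min(readsize * 2, self.max_read_chunk)
```

With the cap in place the read loop still converges quickly for ordinary files, but a file with hundreds of appended headers just costs proportionally more bounded reads instead of overflowing.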
----------------------------------------------------------------------
More information about the Python-bugs-list
mailing list