[Patches] [ python-Patches-403753 ] zlib decompress; uncontrollable memory usage

noreply@sourceforge.net
Mon, 04 Jun 2001 13:55:33 -0700


Patches item #403753 was updated on 2001-02-12 08:10
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=403753&group_id=5470

Category: Modules
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Toby Dickenson (htrd)
Assigned to: Jeremy Hylton (jhylton)
Summary: zlib decompress; uncontrollable memory usage

Initial Comment:
zlib's decompress method will allocate as much memory as is
needed to hold the decompressed output. The length of the output
buffer may be very much larger than the length of the input buffer,
and the Python code calling the decompress method has no way
to control how much memory is allocated.

In experimentation, I have seen decompress generate output that is
1000 times larger than its input.

These characteristics may make the decompress method unsuitable for
handling data obtained from untrusted sources (for example,
in an HTTP proxy which implements gzip encoding), since it may be
vulnerable to a denial-of-service attack. A malicious user could
construct a moderately sized input which forces 'decompress' to
try to allocate too much memory.
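
The asymmetry is easy to demonstrate with the standard zlib module
(a hedged sketch; the exact compressed size varies with zlib version
and compression level):

```python
import zlib

# A megabyte of zeros deflates to a payload on the order of a kilobyte.
payload = zlib.compress(b"\x00" * 1000000)

# This single call must allocate the full megabyte to hold the output,
# even though the input handed to it is tiny.
original = zlib.decompress(payload)
```

A hostile peer can hand such a payload to any service that calls
zlib.decompress on received data.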

This patch adds a new method, decompress_incremental, which allows
the caller to specify the maximum size of the output. This method
returns the excess input, in addition to the decompressed output.

It is possible to work around this problem without a patch:
if input is fed to the decompressor a few tens of bytes
at a time, memory usage will surge by (at most)
a few tens of kilobytes. Such a process is a kludge, and much
less efficient than the approach used in this patch.
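
A minimal sketch of that kludge, using only the existing decompressor
object API; the helper name and the output-size check between calls are
illustrative, not part of the patch:

```python
import zlib

def decompress_limited(data, max_output, chunk_size=64):
    """Decompress 'data', refusing to let the output exceed
    'max_output' bytes.  Illustrative workaround only."""
    d = zlib.decompressobj()
    pieces = []
    total = 0
    for i in range(0, len(data), chunk_size):
        # A few tens of input bytes can expand to at most a few tens
        # of kilobytes, so memory only surges by that much between
        # the size checks below.
        piece = d.decompress(data[i:i + chunk_size])
        total += len(piece)
        if total > max_output:
            raise ValueError("output exceeds %d bytes" % max_output)
        pieces.append(piece)
    tail = d.flush()
    total += len(tail)
    if total > max_output:
        raise ValueError("output exceeds %d bytes" % max_output)
    pieces.append(tail)
    return b"".join(pieces)
```

Note that every input byte is passed through a Python-level loop, which
is the inefficiency the patch avoids.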

(I've not been able to test the documentation patch; I hope it's OK.)

(This patch also includes the change from Patch #103748)


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2001-06-04 13:55

Message:
Logged In: YES 
user_id=21627

The patch looks good to me; I recommend approving it.



----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2001-04-10 10:52

Message:
Logged In: YES 
user_id=11375

I've let this patch gather dust long enough.  Unassigning so
that someone else can review it.

----------------------------------------------------------------------

Comment By: Gregory P. Smith (greg)
Date: 2001-04-07 00:20

Message:
Logged In: YES 
user_id=413

As a side note, I believe I implemented a Python workaround
for this problem by just decompressing data in small chunks
(4k at a time) using a decompressor object.

See the mojonation project on SourceForge if you're
curious (specifically, in the mojonation evil module, look
at common/mojoutil.py for the function named
safe_zlib_decompress).

Regardless, I like the idea of this patch. It would be
good to have that in the main API and documentation for
simplicity (and because there are too many programmers out
there who don't realize potential denial-of-service issues
on their own...).


----------------------------------------------------------------------

Comment By: Toby Dickenson (htrd)
Date: 2001-02-22 04:50

Message:
New patch implementing a new optional parameter to .decompress, and a new attribute .unconsumed_tail
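
A sketch of how the revised interface might be used, assuming the new
optional argument caps the number of output bytes returned and
.unconsumed_tail then holds whatever input was not processed within
that limit (the helper name and error handling are illustrative):

```python
import zlib

def decompress_capped(data, max_output):
    # Ask the decompressor for at most max_output bytes of output.
    # Input that could not be processed within the limit is left in
    # .unconsumed_tail rather than silently inflated.
    d = zlib.decompressobj()
    out = d.decompress(data, max_output)
    if d.unconsumed_tail:
        raise ValueError("output would exceed %d bytes" % max_output)
    return out
```

Unlike the chunked workaround, this bounds memory in a single call with
no Python-level loop over the input.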



----------------------------------------------------------------------

Comment By: Toby Dickenson (htrd)
Date: 2001-02-22 03:42

Message:
Waaah - that last comment should be 'can't', not 'can'

----------------------------------------------------------------------

Comment By: Toby Dickenson (htrd)
Date: 2001-02-22 03:40

Message:
We can reuse .unused_data without introducing an ambiguity. I will prepare a patch that uses a new attribute .unconsumed_tail



----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2001-02-21 11:32

Message:
Doesn't .unused_data serve much the same purpose, though?
That way, even with a maximum size, .decompress() would always return a string, and .unused_data would contain the unprocessed data.

----------------------------------------------------------------------

Comment By: Toby Dickenson (htrd)
Date: 2001-02-21 06:00

Message:
I did consider that....

An extra change that you didn't mention is the need for a different return value. Currently .decompress() always returns a string; the new method in my patch returns a tuple containing the same string and an integer specifying how many bytes were consumed from the input.

Overloading return values based on an optional parameter seems a little hairy to me, but I would be happy to change the patch if that is your preferred option.

I also considered (and rejected) the possibility of adding an optional max-size argument to .decompress() as you suggest, but raising an exception if this limit is exceeded. This avoids the need for an extra return value, but loses out on flexibility.


----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2001-02-20 18:48

Message:
Rather than introducing a new method, why not just add an optional maxlen argument to .decompress()?  I think the changes would be:

* add 'int maxlen=-1;'
* add "...|i" ... ,&maxlen to the argument parsing
* if maxlen != -1, length = maxlen else length = DEFAULTALLOC;
* Add '&& maxlen==-1' to the while loop.  (Use the current CVS; I just checked in a patch rearranging the zlib module a bit.)

Do you want to make those changes and resubmit the patch?


----------------------------------------------------------------------
