[ python-Bugs-1112549 ] cgi.FieldStorage memory usage can spike in line-oriented ops

Thu Aug 10 20:12:15 CEST 2006

Bugs item #1112549, was opened at 2005-01-30 05:40
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1112549&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.3
Status: Closed
Resolution: Accepted
Priority: 8
Submitted By: Chris McDonough (chrism)
Assigned to: Guido van Rossum (gvanrossum)
Summary: cgi.FieldStorage memory usage can spike in line-oriented ops

Initial Comment:
Various parts of cgi.FieldStorage call its
"read_lines_to_outerboundary", "read_lines" and
"skip_lines" methods.    These methods use the
"readline" method of the file object that represents an
input stream.  The input stream is typically data
supplied by an untrusted source (such as a user
uploading a file from a web browser).  The input data
is not required by the RFC 822/1521/1522/1867
specifications to contain any newline characters.  For
example, it is within the bounds of the specification
to supply a a multipart/form-data input stream with a
"file-data" part that consists of a 2GB string composed
entirely of "x" characters (which happens to be
something I did that led me to noticing this bug).

The simplest fix is to make use of the "size" argument
of the readline method of the file object where it is
used within all parts of FieldStorage that make use of
it.  A patch against the Python 2.3.4 cgi.py module
that does this is attached.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2006-08-10 11:12

Message:
Logged In: YES 
user_id=33168

Yes, I believe so.

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2006-08-10 10:46

Message:
Logged In: YES 
user_id=849994

Is this a backportable fix?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2006-08-10 10:44

Message:
Logged In: YES 
user_id=6380

Checked in as r51190 (Chris's patch plus a warning added to
an XXX comment for parse_multipart()) and r51191 (Misc/NEWS).

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2006-08-10 07:24

Message:
Logged In: YES 
user_id=6380

OK, let's forget about parse_multipart(). Feel free to add a
doc patch and a comment that deprecate it (actual warnings
are not a good idea IMO).

----------------------------------------------------------------------

Comment By: Chris McDonough (chrism)
Date: 2006-08-10 04:08

Message:
Logged In: YES 
user_id=32974

wrt parse_multipart: this function just turns around and puts the output from 
readline() into a list, as opposed to FieldStorage, which writes its chunked 
output to a tempfile, so just adding an argument to readline within it would 
not be useful.  We'd only be able to fix the memory consumption issue in 
parse_multipart if we were to make it also use a tempfile, but it's not clear 
that *not* writing a tempfile isn't its expected behavior as the docs for 
parse_multipart state:

  Returns a dictionary just like parse_qs() keys are the field names, each 
  value is a list of values for that field. This is easy to use but not much
  good if you are expecting megabytes to be uploaded -- in that case, use 
  the FieldStorage class instead which is much more flexible.

Is it OK to write a tempfile in this function (e.g. does that make it not useful 
on stuff like embedded systems)?

If not, maybe we should just deprecate parse_multipart?  I do find things that 
use it if I google hard enough but there are only 187 hits when I google for 
"cgi.parse_multipart" and 53,600 hits when I google for "cgi.FieldStorage".

I'm uploading another file with your style change suggestions.  It bundles all 
fixes to cgi.py, test_cgi.py, and test_cgi and includes the style changes, but 
does nothing about parse_multipart.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2006-08-09 15:23

Message:
Logged In: YES 
user_id=6380

BTW it would be better if all patches were in a single file
-- then you can delete the older patches (if SF lets you do
that).

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2006-08-09 14:58

Message:
Logged In: YES 
user_id=6380

+1.

minor nits:

in the main patch: instead of

+            if line.endswith('\n'):
+                last_line_lfend = True
+            else:
+                last_line_lfend = False

you can just use

  last_line_lfend = line.endswith('\n')

in the unit test: instead of

  if type(a) != type(0):

use

  if not isinstance(a, int):

so that if some future release changes file.closed to return
a bool (as it should :-) this test won't break.

Is tehre a reason why you're not patching the fp.readline()
call in parse_multipart()?  It would seem to have the same
issue (even if it isn't used in Zope :-).

----------------------------------------------------------------------

Comment By: Chris McDonough (chrism)
Date: 2006-08-07 10:51

Message:
Logged In: YES 
user_id=32974

Yup, test/output/test_cgi did need fixing.  Apologies, I did not understand 
the test regime.  A new patch file named test_output_test_cgi-
svn-50879.patch has been uploaded with the required change.  regrtest.py 
now passes.

As far as verify vs. vereq, the test_cgi module uses verify all over the place.  
I'm apt to not change all of those places, so to change it in just that one place 
is probably ineffective.  Same for type comparison vs. isinstance.  I'm trying to 
make the smallest change possible as opposed to refactoring the test 
module.

I've uploaded a patch which contains a) the fix to cgi.py, b) the fix to 
test_cgi.py, and the fix to output/test_cgi.

The stylistic change wrt to last_line_lfend is fine with me, but I'll leave that 
jugment to someone else.

I'm not sure how to ensure the fix doesn't create other problems other than 
what has already been done by proving it still passes the tests it was 
subjected to in the "old" test suite and adds new tests that prove it no longer 
has the denial of service problem.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-08-06 22:50

Message:
Logged In: YES 
user_id=33168

Doesn't this require a change to test/output/test_cgi or
something like that?  Does this test pass when run under
regrtest.py ?

The verify(x == y) should be vereq(x, y).
type(a) != type(0), should be not isinstance(a, int).

The last chunk of the patch in cgi.py should be:
    last_line_lfend = line.endswith('\n')

rather than the 4 lines of if/else.

I don't know if this patch really addresses the problem or
creates other problems.  However, if someone more
knowledgable is confident about this patch, I'm fine with
this going in.  At this point, it might be better to wait
for 2.5.1 though.

----------------------------------------------------------------------

Comment By: Chris McDonough (chrism)
Date: 2006-07-27 14:42

Message:
Logged In: YES 
user_id=32974

The files I've just uploaded are revisions to the cgi and test_cgi modules for the 
current state of the SVN trunk.  If someone could apply these, it would be 
appreciated, or give me access and I'll be happy to.

FTR, this is a bug which exposes systems which use the cgi.FieldStorage class 
(most Python web frameworks do) to a denial of service potential.

----------------------------------------------------------------------

Comment By: Chris McDonough (chrism)
Date: 2005-04-02 20:00

Message:
Logged In: YES 
user_id=32974

FYI, I'd be happy to do the merging here if you wanted to
give me checkin access.

----------------------------------------------------------------------

Comment By: Chris McDonough (chrism)
Date: 2005-04-02 19:42

Message:
Logged In: YES 
user_id=32974

An updated test_cgi.py is attached.  I test both the
readline behavior and add a test for basic multipart parsing. 

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2005-03-31 20:48

Message:
Logged In: YES 
user_id=6380

Can I tweak you into uploading a unit test?

----------------------------------------------------------------------

Comment By: Chris McDonough (chrism)
Date: 2005-03-31 18:56

Message:
Logged In: YES 
user_id=32974

Re: parse_multipart..  yes, it looks like there's no use
fixing that as it just turns around and puts the line into a
list.. it is vulnerable but just by virtue of its non-use of
a tempfile, it appears doomed anyway for large requests.  I
don't know of anything that uses it.

Good catch wrt boundary recognition bug, I'm uploading
another patch.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2005-03-31 14:13

Message:
Logged In: YES 
user_id=6380

Methinks that the fix isn't quite right: it would
incorrectly recognize as a boundary a very long line
starting with "--" followed by the appropriate random string
at offset 2**16. This could probably be taken care of by
adding a flag that is true initially and after that keeps
track of whether the previous line ended in \n.

Also, there's a call to fp.readline() in parse_multipart()
that you didn't patch -- it wouldn't help because that code
is saving the lines in a list anyway, but isn't that code
vulnerable as well? Or is it not used?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1112549&group_id=5470