[New-bugs-announce] [issue10879] cgi memory usage
report at bugs.python.org
Mon Jan 10 09:34:50 CET 2011
New submission from Glenn Linderman <v+python at g.nevcal.com>:
In attempting to review issue 4953, I discovered a conundrum in the handling of multipart/form-data.
cgi.py has claimed for some time (at least since 2.4) that it "handles" file storage for uploading large files. I looked at the 2.6 code that does this: it uses rfc822.Message, which parses headers from any object supporting readline(). In particular, rfc822.Message doesn't attempt to read message bodies; separate code in cgi.py does that.
There is still code in 3.2 cgi.py to read message bodies, but... rfc822 has gone away, and been replaced with the email package. Theoretically this is good, but the cgi FieldStorage read_multi method now parses the whole CGI input and then iteration parcels out items to FieldStorage instances. There is a significant difference here: email reads everything into memory (if I understand it correctly). That will never work to upload large or many files when combined with a Web server that launches CGI programs with memory limits.
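To make the concern concrete, here is a small sketch of what whole-message parsing with the email package produces for an upload. The boundary "XYZ", the field name, and the file name are made up for illustration, and this is not the code path cgi.py itself uses; it only shows that the body of an uploaded part comes back as an in-memory object rather than as a file.

```python
from email import message_from_bytes

# Hypothetical upload body; a real upload could be far larger.
body = b"x" * 100000

# A hand-built multipart/form-data message (boundary "XYZ" is an assumption).
raw = (
    b'Content-Type: multipart/form-data; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Disposition: form-data; name="upload"; filename="big.bin"\r\n'
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n" + body + b"\r\n"
    b"--XYZ--\r\n"
)

# The whole input has to be handed to the parser before any part is available.
msg = message_from_bytes(raw)
part = msg.get_payload()[0]

# The uploaded data is returned as one bytes object held in memory.
payload = part.get_payload(decode=True)
```

With the input above, the uploaded bytes exist at least twice in memory (once in the raw input, once in the parsed part), which is the pattern that conflicts with per-process memory limits.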
I see several possible actions that could be taken:
1) Documentation. While it is doubtful that anyone is using 3.x CGI, and this makes it more doubtful, the present code does not match the documentation: the documentation claims to handle file uploads as files rather than as in-memory blobs, but the current code does not do that.
2) If there is a method in the email package that corresponds to rfc822.Message, parsing only headers, I couldn't find it. Perhaps it is possible to feed just headers to BytesFeedParser, and stop, and get the same sort of effect. However, this is not the way the cgi.py presently is coded. And if there is a better API, for parsing only headers, that is or could be exposed by email, that might be handy.
3) The 2.6 cgi.py does not claim to support nested multipart/ stuff, only one level. I'm not sure if any present or planned web browsers use nested multipart/ stuff... I guess it would require a nested <form> tag, which is illegal HTML last I checked. So perhaps the general logic flow of 2.6 cgi.py could be reinstated, with a technique to feed only headers to BytesFeedParser, together with reinstating the MIME body parsing in cgi.py, and this could make a solution that works.
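The idea in options 2 and 3 of feeding only the headers to BytesFeedParser might look roughly like the sketch below. The helper name read_headers and the sample data are hypothetical, and this is not what cgi.py currently does; it only illustrates that the parser can be closed after the blank line, leaving the body unread on the stream for the caller to handle.

```python
import io
from email.parser import BytesFeedParser


def read_headers(stream):
    """Parse one header block from `stream`, stopping at the blank line.

    `stream` is any binary object with readline(). The body that follows
    the blank line is left unread. (Hypothetical helper, not a cgi.py API.)
    """
    parser = BytesFeedParser()
    while True:
        line = stream.readline()
        if not line:
            break  # end of input before a blank line was seen
        parser.feed(line)
        if line in (b"\r\n", b"\n"):
            break  # blank line: end of the header block
    return parser.close()


# Usage with made-up part headers followed by a body.
stream = io.BytesIO(
    b'Content-Disposition: form-data; name="upload"; filename="big.bin"\r\n'
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    b"BODY BYTES"
)
headers = read_headers(stream)
filename = headers.get_filename()
content_type = headers.get_content_type()
remaining = stream.read()  # the body is still on the stream, untouched
```

Because the body stays on the stream, code such as the existing body-reading logic in cgi.py could then copy it to a file in chunks instead of holding it in memory.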
I discovered this because I couldn't figure out where a bunch of the methods in cgi.py were called from, particularly read_lines_to_outerboundary and make_file. They seemed to be called much too late in the process. It wasn't until I looked back at the 2.6 code that I could see that there had been a transition from using rfc822 only for headers to using email for parsing the whole data stream, and that this was why the documentation did not seem to match the code logic. I have no idea if this problem is in 2.7, as I don't have it installed here for easy reference, and I'm personally much more interested in 3.2.
components: Library (Lib)
nosy: r.david.murray, v+python
title: cgi memory usage
versions: Python 3.1, Python 3.2, Python 3.3
Python tracker <report at bugs.python.org>