[issue4953] cgi module cannot handle POST with multipart/form-data in 3.0
report at bugs.python.org
Thu Jan 13 01:11:07 CET 2011
Glenn Linderman <v+python at g.nevcal.com> added the comment:
The encoding used by the browser is defined in the Content-Type meta tag, or the content-type header; if not, the default seems to vary for different browsers. So it's definitely better to define it.
The argument stream_encoding used in FieldStorage *must* be this encoding.
I agree it is better to define it. I think you just said the same thing that the page I linked to said; I might not have conveyed that correctly in my paraphrasing. I assume you are talking about the charset of the Content-Type of the form page itself, as served to the browser, as the browser, sadly, doesn't send that charset back with the form data.
But this raises another problem, when the CGI script has to print the data received. The built-in print() function encodes the string with sys.stdout.encoding, and this will fail if the string can't be encoded with it. It is the case on my PC, where sys.stdout.encoding is cp1252: it can't handle Arabic or Chinese characters.
I don't think there is any need to override print, especially not builtins.print. It is still true that the HTTP data stream is and should be treated as a binary stream. So the script author is responsible for creating such a binary stream.
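A minimal sketch of the two points above, for illustration only: encoding with cp1252 fails for Arabic text, while a script that treats the response as a binary stream can choose its own charset. The io.BytesIO object is a stand-in for sys.stdout.buffer, and the header line and UTF-8 choice are assumptions, not something taken from this issue.

```python
import io

text = "\u0645\u0631\u062d\u0628\u0627"  # an Arabic word; cp1252 has no Arabic letters

# Roughly what happens when stdout's encoding is cp1252 and print() is used.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Treating the response as binary: the script encodes with the charset it declares.
body = io.BytesIO()  # stand-in for sys.stdout.buffer (assumption for this sketch)
body.write(b"Content-Type: text/html; charset=UTF-8\r\n\r\n")
body.write(text.encode("utf-8"))
```
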
The FieldStorage class does not use the print method, so it seems inappropriate to add a parameter to its constructor to create a print method that it doesn't use.
For the convenience of CGI script authors, it would be nice if CGI provided access to the output stream in a useful way... and I agree that, because the generation of an output page comes complete with its own encoding, the output stream encoding parameter should be separate from the stream_encoding parameter required for FieldStorage.
A separate, new function or class for doing that seems appropriate, possibly included in cgi.py, but not in FieldStorage. Message 125100 in this issue describes a class IOMix that I wrote and use for such; codifying it by including it in cgi.py would be fine by me... I've been using it quite successfully for some months now.
The last line of Message 125100 may be true; perhaps a few more methods should be added. However, print is not one of them. I think you'll be pleasantly surprised to discover (as I was, after writing that line) that builtins.print converts its parameters to str, and writes to stdout, assuming that stdout will do the appropriate encoding. The class IOMix will, in fact, do that appropriate encoding (given an appropriate parameter to its initialization). Perhaps for CGI, a convenience function could be added to IOMix to include the last two code lines after IOMix in the prior message:
def setup( encoding="UTF-8"):
    sys.stdout = IOMix( sys.stdout, encoding )
    sys.stderr = IOMix( sys.stderr, encoding )
Note that IOMix allows the user's choice of output stream encoding, applies it to both stdout and stderr, which both need it, and also allows the user to generate binary directly (if sending back a file, for example), as both bytes and str are accepted.
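The idea of a stream that accepts both str and bytes can be sketched as follows. This is not the IOMix class from message 125100: MixedWriter is a hypothetical, simplified stand-in written for illustration, and io.BytesIO stands in for the binary stream underneath stdout.

```python
import io


class MixedWriter:
    """Hypothetical stand-in for the idea behind IOMix (not the original class)."""

    def __init__(self, binary_stream, encoding="UTF-8"):
        self._out = binary_stream
        self.encoding = encoding

    def write(self, data):
        # str is encoded with the chosen encoding; bytes pass through untouched.
        if isinstance(data, str):
            data = data.encode(self.encoding)
        self._out.write(data)
        return len(data)

    def flush(self):
        self._out.flush()


buf = io.BytesIO()
out = MixedWriter(buf, "UTF-8")
print("caf\u00e9", file=out)  # print() only needs an object with write()
out.write(b"\x00\x01")        # raw bytes are written as-is
```

Because the unmodified builtin print() only calls write() on its target, no replacement for print is needed in this sketch.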
print can be used with a file= parameter in 3.x, which your implementation doesn't permit and which a CGI script could use to write to other files, so I really, really don't think we want to override builtins.print with a version that lacks the file= parameter and is tied specifically to stdout.
My message 126075 still needs to be included in your next patch.
Python tracker <report at bugs.python.org>