Change in cgi module's handling of POST requests

Bob Kline bkline at rksystems.com
Thu Feb 12 23:43:48 EST 2009


Joshua Kugler wrote:
>> We just upgraded Python to 2.6 on some of our servers and a number of our
>> CGI scripts broke because the cgi module has changed the way it handles
>> POST requests.  When the 'action' attribute was not present in the form 
>> element on an HTML page the module behaved as if the value of the
>> attribute was the URL which brought the user to the page with the form,
>> but without the query (?x=y...) part.
>
> This does not make sense.  Can you give an example?

Sure.  Here's a tiny repro script:

#!/usr/bin/python
import cgi, xml.sax.saxutils
def quote(me): return me and xml.sax.saxutils.quoteattr(str(me)) or ''
print """\
Content-type: text/html

<html><body><form method='post'><input name='x' value=%s>
<input type='submit'>
</form></body></html>""" % quote(cgi.FieldStorage().getvalue('x'))

####################  end of repro script  ########################

Try it out on this pre-2.6 Python page:

http://www.rksystems.com/cgi-bin/cgi-repro.py?x=y

When the page comes up, click Submit.  Click it several times.  The 
content of the text field does not change: it is populated from the 
GET request's URL when the page first comes up, and from the POST 
request's parameters on every submission after that.

For comparison, here's the equivalent Perl page, which behaves the same way:

http://www.rksystems.com/cgi-bin/cgi-repro.pl?x=y

Or PHP; again, same behavior, no matter how many times you click the 
Submit button:

http://www.rksystems.com/cgi-repro.php?x=y

Now try the Python script above from a server where Python has been 
upgraded to version 2.6:

http://mahler.nci.nih.gov/cgi-bin/cgi-repro.py?x=y

Notice that when you click on the Submit button, the field is populated 
with the string representation of the list which FieldStorage.getvalue() 
returns.  Each time you click the submit button you'll see the effect 
recursively snowballing.  This is exactly the same script as the one 
behind the first URL above, byte for byte.
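
The snowballing can be reproduced offline.  What follows is only a 
minimal sketch under stated assumptions: it uses Python 3's 
urllib.parse.parse_qs as a stand-in for the cgi module's parsing, and it 
assumes the 2.6 module appends the URL's query string after the POST 
body.  The names merged_value and simulate_click are hypothetical 
helpers, not part of any library.

```python
from urllib.parse import parse_qs, urlencode

QUERY_STRING = "x=y"  # from the URL that first brought up the form


def merged_value(post_body, query_string, name):
    """Mimic getvalue() when body and query string are parsed together.

    Assumption: the body fields come first, then the query-string fields.
    """
    values = parse_qs(post_body + "&" + query_string).get(name, [])
    if len(values) == 1:
        return values[0]
    return values or None  # a list when the field occurs more than once


def simulate_click(field_text):
    """Submit the form once and return the text the next page would show."""
    body = urlencode({"x": field_text})
    return str(merged_value(body, QUERY_STRING, "x"))


text = "y"  # initial field content after the GET request
history = []
for _ in range(3):
    text = simulate_click(text)
    history.append(text)

for entry in history:
    print(entry)
```

Each pass wraps the previous field content in another list alongside 
the 'y' carried over from the URL, so the text grows on every click.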

>> Now FieldStorage.getvalue () is 
>> giving the script a list of two copies of the value for some of the
>> parameters (folding in the parameters from the previous request) instead
>> of the single string it used to return for each.
>
> There is a function call to get only one value.  I think it's get_first() or
> some such.

That's true, but risky.  I have no guarantee that the value entered by 
the user on the form will be first on the list.  I might instead get the 
initial value carried over from the URL which brought up the form to 
begin with.  We're working around the problem by modifying the broken 
scripts to explicitly set the action attributes.
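
Another possible workaround, shown here only as a hedged sketch and not 
as what we actually deployed, is to parse the POST body directly and 
ignore QUERY_STRING for POST requests.  The sketch assumes Python 3, 
handles only application/x-www-form-urlencoded bodies (not multipart 
uploads), and post_only_fields is a hypothetical helper name.

```python
import io
from urllib.parse import parse_qs


def post_only_fields(environ, stream):
    """Return form fields from the request body only for POST requests.

    For any other method, fall back to the URL's query string.
    """
    if environ.get("REQUEST_METHOD", "GET").upper() != "POST":
        return parse_qs(environ.get("QUERY_STRING", ""))
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stream.read(length).decode("ascii")
    return parse_qs(body)


# Simulated CGI environment for a POST to /cgi-bin/cgi-repro.py?x=y
environ = {
    "REQUEST_METHOD": "POST",
    "QUERY_STRING": "x=y",
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
    "CONTENT_LENGTH": "7",
}
fields = post_only_fields(environ, io.BytesIO(b"x=typed"))
print(fields["x"])
```

In a real CGI script the environment would come from os.environ and the 
stream from sys.stdin.buffer; the dictionary and BytesIO object above 
are stand-ins so the sketch can run anywhere.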

>
> Well, the CGI module hasn't had many changes.  There was this bug fix a few
> months back:
>
> http://svn.python.org/view?rev=64447&view=rev

Looks like that was where it happened.

> It is possible that by fixing a bug, they brought the behavior in line with
> what it *should* be.

That's certainly possible.  I'm not contending that Perl and PHP and the 
previous versions of Python all got it right and the new Python is 
wrong.  It could very well be the other way around.[1]  But my 
expectation, based on what I've seen happen over the years with other 
proposed changes to the language and the libraries, was that there would 
have been some (possibly extended) discussion of the risks of breaking 
existing code, and the best way to phase in the change with as little 
sudden breakage as possible.  I haven't been able to find that 
discussion, and I was hoping some kind soul would point me in the right 
direction.

>   Or maybe the browser behavior changed?

Clearly not, as you will see by using the same browser to try out the 
URLs above.  If you look at the HTML source when the page first comes up 
for each of the scripts, you'll see it's the same.  It's the behavior on 
the server (that is, in the Python library module) which changes.

>   The server
> does not care about an "action" attribute.  That only tells the browser
> where to send the data.

Well that's a pretty good formulation of the conclusion you would come 
to based on the behavior of all of Perl, PHP, and (pre-2.6) Python.  And 
intuitively, that's how one (or at least I) would expect things to 
work.  The parameters in the original URL are appropriately used to seed 
initial values in the form when the form is invoked with a GET request, 
but after that point it's hard to see them as anything but history.  But 
that's not how the new version of the cgi module is behaving.  It's 
folding the parameters it finds in the original URL, which it gets 
from the environment's QUERY_STRING variable, in with the fields it 
parses from the POST request's body.

>   It is possible the browser did not properly format
> a request when there was no "action" attribute.

When the 'action' attribute is not present in the form element, the 
browser implicitly assigns it the value of the original URL which first 
brought up the page with the form.  This browser behavior has not 
changed.  It's doing the same thing no matter which version of which 
language and libraries are used to implement the CGI script (it has no 
idea what those are).  Nor, as far as I have been able to determine, is 
this behavior dependent on which (version of which) browser you're using.
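
To make the consequence concrete, here is a small sketch (Python 3, 
standard library only) that treats the missing action as an empty 
relative reference resolved against the page's URL.  That treatment is 
an assumption made for illustration; real browsers follow their own 
form-submission rules, which have the same effect in this case.

```python
from urllib.parse import urljoin, urlsplit

page_url = "http://www.rksystems.com/cgi-bin/cgi-repro.py?x=y"

# An absent action is treated here like an empty relative reference,
# which resolves to the page URL itself (query part included).
post_target = urljoin(page_url, "")
query_string = urlsplit(post_target).query

print(post_target)
print(query_string)
```

Because the POST goes to a URL that still carries ?x=y, the server 
sets QUERY_STRING to x=y for the POST request as well, which is where 
the new cgi module picks up the extra value.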

>  Can you provide more details?

I think we should have enough specifics with what I've provided above to 
make it clear what's happening, but if you can think of anything I've 
left out which you think would be useful, let me know and I'll try to 
supply it.

Cheers,
Bob

[1] I haven't yet finished my attempts to parse the relevant RFCs; I 
assumed that the original authors and maintainers of this module (who 
include the BDFL himself) would have been more adept at that than I 
am, which is one of the reasons I was hoping to find, in the mailing 
list archives, some discussion of the proposed change in the module's 
behavior.



