Found urllib strangeness with redirects - is this really a problem?
Akai Majo Sizer
sizer at nospam.com
Thu Feb 27 01:23:30 CET 2003
First of all, this is very weird and obscure.
My cgi script needs to do a POST to a script on another server. So I do a
    remote = urllib.urlopen(url, postdata)
and what I see is that whatever I'm doing gets repeated 10 times. When I
do this action directly with a browser instead of my script I see a POST
to the url followed by a GET to a slightly different url. When my script
runs with urllib, I see 10 POSTs!
What's happening is that the CGI on the other end takes my POST data,
performs the operation, then immediately returns a 302 (redirect) pointing
back at itself, at a URL where the results can be fetched. This is very
bizarre, and I don't want to defend it, but that's what it does, and I
can't muck with that.
FancyURLopener sees the 302 and immediately does a POST to the new URL
(because it carries my postdata along, like a good citizen). This freaks
out the remote CGI because it thinks I'm trying to do the operation
again... wash, rinse, repeat.
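To make the loop concrete, here is a toy model of the exchange in Python. It
is only a sketch: no real HTTP is involved, remote_cgi is a hypothetical
stand-in for the remote script, the URLs are made up, and the cap of 10 tries
is an assumption chosen to match the 10 POSTs I saw.

```python
# Toy model of the exchange: no real HTTP, just the sequence of requests.
MAX_TRIES = 10  # assumed cap on redirects, chosen to match the 10 POSTs observed


def remote_cgi(method, url):
    """Hypothetical stand-in for the remote script.

    A POST performs the operation and answers 302 with a results URL;
    a GET simply returns the results.
    """
    if method == "POST":
        return 302, "/cgi-bin/op?results=1"
    return 200, None


def fetch(url, postdata, keep_post_on_redirect):
    """Follow redirects and return the list of (method, url) requests sent."""
    sent = []
    method = "POST" if postdata is not None else "GET"
    for _ in range(MAX_TRIES):
        sent.append((method, url))
        status, location = remote_cgi(method, url)
        if status != 302:
            break
        url = location
        if not keep_post_on_redirect:
            method = "GET"  # what the browsers did
    return sent


strict = fetch("/cgi-bin/op", "a=1", keep_post_on_redirect=True)
browser = fetch("/cgi-bin/op", "a=1", keep_post_on_redirect=False)
print(len(strict), "requests when the POST is carried along")
print(browser)
```

With the POST carried along, every request triggers the operation and another
302, so the client only stops when it runs out of tries. Switching to GET ends
the exchange after one POST and one GET.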
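I 'fixed' this by creating my own FancyURLopener subclass and overriding
http_error_302 (the class name below is a placeholder):

    class MyOpener(urllib.FancyURLopener):
        def __init__(self, *args):
            # Body missing from the archived post; calling the parent is assumed.
            urllib.FancyURLopener.__init__(self, *args)

        def http_error_302(self, url, fp, errcode, errmsg, headers,
                           data=None):
            # Pass None for the data so the redirected request is a GET.
            return urllib.FancyURLopener.http_error_302(self, url, fp, errcode,
                                                        errmsg, headers, None)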
As you can see, it just forces the POST to become a GET, instead of
conscientiously carrying the data along like FancyURLopener does.
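For anyone reading this on Python 3, where this code would go through
urllib.request instead, one quick way to see how the stock redirect handler
treats the same situation is to ask it directly. This is a sketch under
assumptions: the URLs are hypothetical, nothing goes over the network, and
None and {} are placeholders for the fp and headers arguments, which I believe
are not touched when the redirect is accepted.

```python
import urllib.request

# Hypothetical URLs; nothing is sent over the network here.
original = urllib.request.Request(
    "http://example.invalid/cgi-bin/op", data=b"a=1"
)
handler = urllib.request.HTTPRedirectHandler()

# Ask the handler which request it would issue for a 302 answer to the POST.
# fp (None) and headers ({}) are placeholders for this illustration.
follow_up = handler.redirect_request(
    original,
    None,
    302,
    "Found",
    {},
    "http://example.invalid/cgi-bin/op?results=1",
)

print(original.get_method())   # method of the original request
print(follow_up.get_method())  # method of the redirected request
print(follow_up.data)          # body carried along, if any
```

If the second line printed is GET and the last is None, the handler is
dropping the POST data on a 302, which is the same behaviour the override
above forces.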
My question is, should that be what urllib does normally, or an option?
It makes sense, theoretically, that if you get a 302 redirect on a POST
you would do a POST to wherever you are redirected. On the other hand,
no browser I tried (Mozilla 1.2, Mozilla 1.3b, IE6, Netscape 4) actually
did that. They all turned it into a GET. RFC 2616 has a note that matches
what I saw, suggesting that urllib is technically correct but doesn't do
what any browser does:

    Note: RFC 1945 and RFC 2068 specify that the client is not allowed
    to change the method on the redirected request. However, most
    existing user agent implementations treat 302 as if it were a 303
    response, performing a GET on the Location field-value regardless
    of the original request method. The status codes 303 and 307 have
    been added for servers that wish to make unambiguously clear which
    kind of reaction is expected of the client.

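Read as a decision rule, the note comes down to something like the following
sketch. The function name and the like_browsers flag are made up for
illustration, and only the three status codes the note mentions are handled.

```python
def method_after_redirect(status, method, like_browsers=True):
    """Pick the method for the follow-up request, following the RFC 2616 note.

    Hypothetical helper: like_browsers=True models what user agents did in
    practice with a 302, False models the letter of RFC 1945 / RFC 2068.
    """
    if status == 303:
        return "GET"    # 303: fetch the new location with GET
    if status == 307:
        return method   # 307: repeat the request with the original method
    if status == 302:
        # The ambiguous case: the spec says keep the method,
        # most user agents treat it like a 303.
        return "GET" if like_browsers else method
    raise ValueError("status not covered by the note: %r" % (status,))


for status in (302, 303, 307):
    print(status, method_after_redirect(status, "POST"))
print(302, method_after_redirect(302, "POST", like_browsers=False))
```

The only branch where the two readings disagree is the 302 one, which is
exactly the case my script ran into.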
Is this even worth caring about? Maybe at least a comment in the code?
Personally, my problem is solved, but it was a bear to figure out what
was going on, so I thought I'd mention it.