Found urllib strangeness with redirects - is this really a problem?

Wed Feb 26 19:23:30 EST 2003

First of all, this is very weird and obscure.

My cgi script needs to do a POST to a script on another server. So I do a

    remote = urllib.urlopen(url, postdata)

and what I see is that whatever I'm doing gets repeated 10 times. When I 
do this action directly with a browser instead of my script I see a POST 
to the url followed by a GET to a slightly different url.  When my script 
runs with urllib, I see 10 POSTs!

What's happening is that the cgi on the other end is taking my POST data, 
performing the operation, then immediately returning a 302 (redirect) that 
points back at itself, at a URL where the results can be fetched. This is 
very bizarre, and I don't want to defend that, but that's what it does, 
and I can't muck with that cgi.

FancyURLopener sees the 302 and immediately does a POST to the new URL 
(because it carries my postdata along, like a good citizen). This freaks 
out the remote CGI because it thinks I'm trying to do the operation 
again, so it answers with another 302... wash, rinse, repeat, until 
FancyURLopener hits its redirect limit (maxtries, which is 10) and gives up.
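
To make the loop concrete, here is a minimal, network-free sketch of what
seems to be going on. Everything in it is made up for illustration:
fake_cgi stands in for the remote script, follow stands in for the
redirect handling, and MAX_TRIES is simply set to match the 10 POSTs
observed. None of this is urllib's actual code.

```python
MAX_TRIES = 10  # assumed redirect limit, chosen to match the 10 POSTs observed


def fake_cgi(method, url, data=None):
    """Hypothetical stand-in for the remote CGI.

    Every POST performs the operation and answers 302 with a results URL;
    a GET just returns the results page with a 200.
    """
    if method == "POST":
        return 302, url + "?results"
    return 200, None


def follow(url, data, carry_data):
    """Send the request and follow 302s; return the list of methods sent."""
    log = []
    method = "POST" if data is not None else "GET"
    for _ in range(MAX_TRIES):
        log.append(method)
        status, location = fake_cgi(method, url, data)
        if status != 302:
            break
        url = location
        if not carry_data:
            # Drop the body, so the follow-up request becomes a GET.
            data = None
            method = "GET"
    return log


# Carrying the data along repeats the POST until the limit is reached.
print(follow("http://example.invalid/op.cgi", "x=1", carry_data=True))
# Dropping the data gives the POST-then-GET pattern a browser produces.
print(follow("http://example.invalid/op.cgi", "x=1", carry_data=False))
```

The only difference between the two runs is whether the POST data
survives the redirect, which is the same thing the workaround of
overriding http_error_302 changes.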

I 'fixed' this by creating my own FancyURLopener and overriding 
http_error_302:

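  import urllib

  class MyURLopener(urllib.FancyURLopener):
    def __init__(self, *args):
      # pass any constructor arguments (e.g. proxies) on to the base class
      urllib.FancyURLopener.__init__(self, *args)
    def http_error_302(self, url, fp, errcode, errmsg, headers, data=None):
      # follow the redirect, but drop the POST data so the retry is a GET
      return urllib.FancyURLopener.http_error_302(
          self, url, fp, errcode, errmsg, headers, None)

  # used in place of urllib.urlopen(url, postdata):
  #   remote = MyURLopener().open(url, postdata)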

As you can see, it just forces the redirected request to be a GET, 
instead of conscientiously carrying the data along like FancyURLopener does.

My question is, should that be what urllib does normally, or an option? 
It makes sense, theoretically, that if you get a 302 redirect on a POST 
you would do a POST to wherever you are redirected.  On the other hand, 
no browser I tried (Mozilla 1.2, Mozilla 1.3b, IE6, Netscape 4) actually 
did that. They all turned it into a GET. I found this note in RFC 2616 
(section 10.3.3, "302 Found"), which matches what I saw and suggests 
that urllib is technically correct, but doesn't do what any browser does:

  Note: RFC 1945 and RFC 2068 specify that the client is not allowed
  to change the method on the redirected request.  However, most
  existing user agent implementations treat 302 as if it were a 303
  response, performing a GET on the Location field-value regardless
  of the original request method. The status codes 303 and 307 have
  been added for servers that wish to make unambiguously clear which
  kind of reaction is expected of the client.
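
Read as code, the note boils down to something like the sketch below.
The function name method_after_redirect and the browser_like flag are
invented here for illustration; this is one reading of the RFC text, not
anything urllib provides.

```python
def method_after_redirect(status, method, browser_like=True):
    """Return the HTTP method to use when following a redirect.

    browser_like=True mimics the behavior the note attributes to most
    user agents; browser_like=False follows RFC 1945 / RFC 2068 strictly.
    """
    if status == 303:
        # 303: fetch the new location with GET.
        return "GET"
    if status == 307:
        # 307: repeat the request with the original method.
        return method
    if status == 302:
        # 302: the older RFCs say keep the method; browsers switch to GET.
        return "GET" if browser_like else method
    raise ValueError("status not covered by this sketch: %r" % (status,))


print(method_after_redirect(302, "POST"))                      # browser-style
print(method_after_redirect(302, "POST", browser_like=False))  # strict reading
```

With browser_like=False the 302 case is what FancyURLopener does today,
and with browser_like=True it is what the browsers listed above do.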

Is this even worth caring about? Maybe at least a comment in the code? 
Personally, my problem is solved, but it was a bear to figure out what 
was going on, so I thought I'd mention it.