[Python-bugs-list] [ python-Bugs-549151 ] urllib2 POSTs on redirect

noreply@sourceforge.net noreply@sourceforge.net
Wed, 19 Jun 2002 14:57:00 -0700


Bugs item #549151, was opened at 2002-04-26 17:04
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=549151&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: John J Lee (jjlee)
Assigned to: Jeremy Hylton (jhylton)
Summary: urllib2 POSTs on redirect

Initial Comment:

urllib2 (I'm using 1.13.22 with Python 2.0, but I assume the 2.2 branch does the same) uses the 
POST method on redirect, contrary to RFC 1945 section 9.3:

> 9.3  Redirection 3xx
> 
>    This class of status code indicates that further action needs to be
>    taken by the user agent in order to fulfill the request. The action
>    required may be carried out by the user agent without interaction
>    with the user if and only if the method used in the subsequent
>    request is GET or HEAD. A user agent should never automatically
>    redirect a request more than 5 times, since such redirections usually
>    indicate an infinite loop.

Can be fixed in HTTPRedirectHandler.http_error_302 by replacing
        new = Request(newurl, req.get_data())

with
        new = Request(newurl)

so that GET is done on redirect instead of POST.
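
A minimal sketch of why dropping the data argument makes the redirected
request a GET. It assumes Python 3's urllib.request (the successor to
urllib2) rather than the 2.x module discussed here; the helper name
build_redirect_request and the example.com URLs are hypothetical.

```python
from urllib.request import Request


def build_redirect_request(original, newurl):
    """Build the follow-up request for a redirect without a body.

    Hypothetical helper: a Request created without data reports GET as
    its method, so the original POST body is not re-sent.
    """
    return Request(newurl)


# A request that carries data is treated as a POST.
original = Request("http://example.com/form", data=b"name=value")
redirected = build_redirect_request(original, "http://example.com/result")

print(original.get_method())
print(redirected.get_method())
```

Nothing is sent over the network here; the sketch only inspects the
method each Request object would use.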

I suppose the limit of 10 in the same function should be changed to 5, also.


----------------------------------------------------------------------

>Comment By: John J Lee (jjlee)
Date: 2002-06-19 22:56

Message:
Logged In: YES 
user_id=261020

I've attached a patch -- was simpler than I thought.

I think this is in compliance with RFC 2616.  It's also
simple for clients to inherit from HTTPRedirectHandler and
override redirect_request if, for example, they know they
want all 302 POSTs to be redirected as a GET, which removes
the need for mixing redirection code with normal client code
even when the server doesn't follow the RFC (if the server
expects a 302 POST to be redirected as a GET, for example).
Note that I had to add a method, named 'method', to the
Request class, for maintainability (it's possible that
urllib2 may be able to do methods other than GET and POST in
future).
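
The override described above might look roughly like the following. This
is a sketch under assumptions, not the attached patch: it uses Python 3's
urllib.request, where redirect_request takes (req, fp, code, msg, headers,
newurl), and the class name GetOn302Handler is hypothetical. The stock
handler in current Python 3 releases already turns a redirected POST into
a GET for 301, 302 and 303, so the override is shown only to illustrate
the hook.

```python
import urllib.request


class GetOn302Handler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: always follow a 302 on POST with a GET."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if code == 302 and req.get_method() == "POST":
            # Keep the caller's headers, but drop those describing the
            # POST body, since the new request has no body.
            kept = {
                key: value
                for key, value in req.header_items()
                if key.lower() not in ("content-length", "content-type")
            }
            return urllib.request.Request(
                newurl,
                headers=kept,
                origin_req_host=req.origin_req_host,
                unverifiable=True,
            )
        # Everything else goes through the default policy.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# Client code stays free of redirect logic; it just uses the opener.
opener = urllib.request.build_opener(GetOn302Handler)
```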

Possibly redirect_request should take fewer parameters.

John


----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2002-06-06 22:25

Message:
Logged In: YES 
user_id=261020

Oh, I hadn't realised httplib uses HTTP/1.1 now -- and now I
check, I see that even RFC 1945 (HTTP/1.0) has a whole set of
restrictions on 30x for particular values of x which I missed
completely.

I guess it should do what Guido suggested -- report an error if a
POST is redirected -- until somebody gets time to do it properly.
I won't have time for a couple of months, but if nobody has done
it by then I'll upload a patch.
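
For comparison, the stopgap of reporting an error when a POST is
redirected could be sketched as below. Again this assumes Python 3's
urllib.request and urllib.error rather than the 2.x urllib2 under
discussion; the class name NoPostRedirectHandler and the error message
are hypothetical.

```python
import urllib.error
import urllib.request


class NoPostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: only GET and HEAD are redirected automatically."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        method = req.get_method()
        if method not in ("GET", "HEAD"):
            # Surface the redirect to the caller instead of following it.
            raise urllib.error.HTTPError(
                req.full_url,
                code,
                "refusing to redirect a %s request automatically" % method,
                headers,
                fp,
            )
        return super().redirect_request(req, fp, code, msg, headers, newurl)


opener = urllib.request.build_opener(NoPostRedirectHandler)
```

A caller that wants to follow the redirect anyway can catch the HTTPError
and decide what to do with the new location itself.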


John


----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-06-06 16:09

Message:
Logged In: YES 
user_id=31392

I think you need to review the current HTTP spec -- RFC 2616
-- and look at the section on redirection (10.3).  I think
urllib2 could improve its handling of redirection, but the
behavior you describe from lynx sounds incorrect.  I'd be
happy to review a patch that implemented the current spec.

Also, RFC 2616 removes the recommendation of a 5 redirect limit.




----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-04-29 19:10

Message:
Logged In: YES 
user_id=6380

Fair enough. I'll leave it to Jeremy to review the proposed
fix. (Note that he's busy though.)

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2002-04-29 17:29

Message:
Logged In: YES 
user_id=261020

I don't see why it shouldn't substitute a GET.  That certainly seems to be
the standard practice (well, at least that's what lynx does), and in the case
of the only site where I've encountered redirects on POST, the redirect URL
contains urlencoded stuff, so it clearly expects the user-agent to do a GET.
The site would break if this didn't happen, so I guess Netscape and IE must
do the same thing.

Clearly the RFC *does* allow this, though it doesn't require it (or specify
what should happen here in any way other than to say that a POST is not
allowed, in fact).  Since it's standard practice and allowed by the RFC, I
don't think it should be an error.

John


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-04-29 01:53

Message:
Logged In: YES 
user_id=6380

Hm, the way I interpret the text you quote, if the original
request is a POST, it should probably not substitute a GET
but report the error.

Assigning to Jeremy since it's his module.

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2002-04-28 17:21

Message:
Logged In: YES 
user_id=261020

1. Bug is also in 2.2 branch

2. The fix (in 2.1 and 2.2) should reflect the earlier bug fix in the 2.2 branch to add the old headers:

new = Request(newurl, headers=req.headers)

3. I guess 10 should be replaced with 4, not 5.


John


----------------------------------------------------------------------
