[Python-bugs-list] [ python-Bugs-549151 ] urllib2 POSTs on redirect

SourceForge.net noreply@sourceforge.net
Sun, 20 Apr 2003 13:44:24 -0700


Bugs item #549151, was opened at 2002-04-26 17:04
Message generated for change (Comment added) made by jjlee
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=549151&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: Accepted
Priority: 7
Submitted By: John J Lee (jjlee)
Assigned to: Raymond Hettinger (rhettinger)
Summary: urllib2 POSTs on redirect

Initial Comment:

urllib2 (I'm using 1.13.22 with Python 2.0, but I assume the 2.2 branch does the same) uses the 
POST method on redirect, contrary to RFC1945 section 9.3:

> 9.3  Redirection 3xx
> 
>    This class of status code indicates that further action needs to be
>    taken by the user agent in order to fulfill the request. The action
>    required may be carried out by the user agent without interaction
>    with the user if and only if the method used in the subsequent
>    request is GET or HEAD. A user agent should never automatically
>    redirect a request more than 5 times, since such redirections usually
>    indicate an infinite loop.

Can be fixed in HTTPRedirectHandler.http_error_302 by replacing
        new = Request(newurl, req.get_data())

with
        new = Request(newurl)

so that GET is done on redirect instead of POST.

I suppose the limit of 10 in the same function should be changed to 5, also.


----------------------------------------------------------------------

>Comment By: John J Lee (jjlee)
Date: 2003-04-20 21:44

Message:
Logged In: YES 
user_id=261020

Just noticed that I forgot to upload one of the doc patches 
on 2003-03-05.  Here it is. 
 
The patches that need to be applied are the three I 
uploaded on 2003-03-05 (urllib.py.patch, urllib2.py.patch 
and liburllib2.tex.patch), and the liburllib.tex.patch I just 
uploaded. 
 

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-04-16 21:35

Message:
Logged In: YES 
user_id=6380

I think this is the right thing to do. Can you check it in?
 (There are several files -- urllib, urllib2, and its docs).

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-04-16 21:27

Message:
Logged In: YES 
user_id=80475

Do you have time to take this one back?  My head starts to 
explode when researching whether this is the right thing 
to do.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-03-07 20:57

Message:
Logged In: YES 
user_id=6380

I'll try to work on this.

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2003-03-05 21:03

Message:
Logged In: YES 
user_id=261020

The fix for this was set to go into 2.3, but nothing has  
happened yet. 
 
There were a couple of minor details to be fixed in the 
patches, so I've done that, and uploaded new patches for  
urllib2.py, urllib.py, and their documentation files (against  
current CVS). Well, I'm just about to upload them... 
 
Jeremy, can you apply the patches and finally close this 
one now?  Let me know if anything else needs doing. 
 
This should actually close another bug on SF, too (if it's still 
open), which deals with the urllib.py case.  I can't find it,  
though.  I did attach a comment to it (before I lost it) that 
references this bug. 
 

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2002-10-15 16:22

Message:
Logged In: YES 
user_id=261020

Well, when I carefully read the RFC I came to the conclusion
that 301 should be redirected to GET, not POST.  I also came
to the conclusion that the RFC was slightly ambiguous on
this point, so I guess that's why A. Flavell came to the
conclusion that 301 should redirect to POST rather than GET.
Anyway, clearly the established practice is to redirect 301
as GET, so this is all academic.

Assuming he's right about Netscape / IE etc., I suppose I'm
happy for 301 to redirect without an exception, since that's
what we've agreed to do for 302.  Obviously, the docs should
say this is contrary to the RFC (as my doc patch says for
302 already).

As for urllib.py, I see the problem.  303 should still be
added, though, since that poses no problem at all.


John


----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-10-11 16:40

Message:
Logged In: YES 
user_id=31392

I'm still uncertain about what to do on 301 responses.  As
you say, there are two issues to sort out: 1) what method
should a POST be redicted to and 2) whether the user should
be asked to confirm.

There's discussion of this at:
http://ppewww.ph.gla.ac.uk/~flavell/www/post-redirect.html
The page is a bit old, but I don't know if it is out of date.
It says that IE5, Netscape 4, and Mozilla 1.0 all redirect a
301 POST to a GET without user confirmation.  That seems
like a widespread disregard for the spec that we ought to
emulate.

I agree with your interpretation that urllib2 raise an
HTTPError to signal "request confirm" because an HTTPError
is also a valid response that the user could interpret.  But
of urllib, an HTTP error doesn't contain a valid response. 
The change would make it impossible for the client to do
anything if a 301 response is returned from a POST.  That
seems worse than doing the wrong.



----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-10-10 02:27

Message:
Logged In: YES 
user_id=6380

OK, over to jeremy.

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2002-10-09 22:16

Message:
Logged In: YES 
user_id=261020

Ah.  Here it is.

John


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-10-08 19:23

Message:
Logged In: YES 
user_id=6380

Um, I don't see the patch for urllib.py.  Perhaps you didn't
check the box?

Jeremy, let's do this in 2.3, OK?

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2002-10-08 17:53

Message:
Logged In: YES 
user_id=261020

Guido mentioned that urllib.py should be fixed
too.  I've attached another patch.


John


----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2002-10-05 18:44

Message:
Logged In: YES 
user_id=261020

Here is a patch for the minor documentation changes required
for the patch I uploaded earlier to fix this bug.  I've also
uploaded a slightly revised patch:

I renamed the `method' method to `get_method' for
consistency with the other methods of the Request class.
The reason this new method is there at all is because the
logic that decides whether or not a redirection should take
place is defined in terms of the method type
(GET/HEAD/POST).  urllib2 doesn't support HEAD ATM, but it
might in future.  OTOH, the Request class is supposed to be
generic -- not specific to HTTP -- so perhaps get_method
isn't really appropriate.  OTOOH, add_data is already
documented to only be meaningful for HTTP Requests.  I
suppose there could be a subclass HTTPRequest, but that's
less simple.  What do you think is best?

I also changed redirect_request to raise an exception
instead of returning None (because no other Handler should
handle the redirect), and changed its documented return
value / exception interface to be consistent with the rest
of urllib2.


John


----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2002-06-23 20:23

Message:
Logged In: YES 
user_id=261020

First, can I underline the fact that there are two issues
here:

  1. What method should POST be redirected as for a
     particular 30x response?

  2. Should the user be asked to confirm the redirection
     (which presumably maps onto raising an HTTPError in
     urllib2)?

I think current client implementations go against the RFCs
on both these points for 302.  Contrastingly, RFC 2616's
note on 301 responses (section 10.3.2) implies that only a
subset of HTTP/1.0 (not 1.1, note) clients violate the RFCs
(1945, 2068, 2616) on point 1 for 301 responses, and I know
of no clients that violate the RFCs on point 2 for 301
responses (but I haven't tested).  Also, I guess the
original intent (for both 301 and 302) was that POSTs
represent an action, so it's dangerous in general to
redirect them without asking the user: I presume this is why
307 was brought in: 307 POSTs on redirected POST, so the
user must be asked:

10.3 Redirection 3xx
[...]
   The action required MAY be carried out by the user agent
   without interaction with the user if and only if the
   method used in the second request is GET or HEAD.

(and 303 is just meant to have the 'erroneous' 302
behaviour: GET on redirected POST, don't ask the user)

So, I agree with Guido that 302 should probably now be
treated the way most clients treat it: GET on redirected
POST, don't ask the user.  As for 301, if it *is* true that
only a few HTTP/1.0 clients have misinterpreted the RFCs on
301, we should stick to the safe behaviour indicated by the
RFC.

As for Guido's first question: A 307 should indeed be
redirected as a POST, but it should ask the user first (see
10.3 quote, above).  That's what my patch does (the 'return
None' in the else clause in redirect_request causes an
exception to be raised eventually -- maybe it should just
raise it itself).

I've attached a revised patch that deals with 302 (and 303)
in the 'erroneous' way (again: GET on redirected POST, don't
ask the user).

I suppose the comment about HTTPRedirectHandler also needs
to state that 301 and 307 POST requests are not
automatically redirected -- HTTPError is raised in that
case.


John


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-06-20 01:57

Message:
Logged In: YES 
user_id=6380

Hm, I thought that 307 should be treated differently than
303. In particular, a 307 response to POST should cause the
POST to be redirected. And I'm not sure about what a 301
response to POST should do.

(In the mean time I have been convinced that changing POST
to GET on 302 is the best thing to do given that most
browsers do that. The old urllib.py should be fixed too.)

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2002-06-19 22:57

Message:
Logged In: YES 
user_id=261020

I've attached a patch -- was simpler than I thought.

I think this is in compliance with RFC 2616.  It's also
simple for clients to inherit from HTTPRedirectHandler and
override redirect_request if, for example, they know they
want all 302 POSTs to be redirected as a GET, which removes
the need for mixing redirection code with normal client code
even when the server doesn't follow the RFC (if the server
expects a 302 POST to be redirected as a GET, for example).
Note that I had to add a method, named 'method', to the
Request class, for maintainability (it's possible that
urllib2 may be able to do methods other than GET and POST in
future).

Possibly redirect_request should take fewer parameters.

John


----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2002-06-19 22:56

Message:
Logged In: YES 
user_id=261020

I've attached a patch -- was simpler than I thought.

I think this is in compliance with RFC 2616.  It's also
simple for clients to inherit from HTTPRedirectHandler and
override redirect_request if, for example, they know they
want all 302 POSTs to be redirected as a GET, which removes
the need for mixing redirection code with normal client code
even when the server doesn't follow the RFC (if the server
expects a 302 POST to be redirected as a GET, for example).
Note that I had to add a method, named 'method', to the
Request class, for maintainability (it's possible that
urllib2 may be able to do methods other than GET and POST in
future).

Possibly redirect_request should take fewer parameters.

John


----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2002-06-06 22:25

Message:
Logged In: YES 
user_id=261020

Oh, I hadn't realised httplib uses HTTP/1.1 now -- and now I
check, I see that even RFC 1945 (HTTP/1.0) has a whole set of
restrictions on 30x for particular values of x which I missed
completely.

I guess it should do what Guido suggested -- report an error if a
POST is redirected -- until somebody gets time to do it properly.
I won't have time for a couple of months, but if nobody has done
it by then I'll upload a patch.


John


----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2002-06-06 16:09

Message:
Logged In: YES 
user_id=31392

I think you need to review the current HTTP spec -- RFC 2616
-- and look at the section on redirection (10.3).  I think
urllib2 could improve its handling of redirection, but the
behavior you describe from lynx sounds incorrect.  I'd be
happy to review a patch the implemented the current spec.

Also, RFC 2616 removes the recommendation of a 5 redirect limit.




----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-04-29 19:10

Message:
Logged In: YES 
user_id=6380

Fair enough. I'll leve it to Jeremy to review the proposed
fix. (Note that he's busy though.)

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2002-04-29 17:29

Message:
Logged In: YES 
user_id=261020

I don't see why it shouldn't substitue a GET.  That certainly seems to be
the standard practice (well, at least that's what lynx does), and in the case
of the only site where I've encountered redirects on POST, the redirect URL
contains urlencoded stuff, so it clearly expects the user-agent to do a GET.
The site would break if this didn't happen, so I guess Netscape and IE must
do the same thing.

Clearly the RFC *does* allow this, though it doesn't require it (or specify
what should happen here in any way other than to say that a POST is not
allowed, in fact).  Since it's standard practice and allowed by the RFC, I
don't think it should be an error.

John


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-04-29 01:53

Message:
Logged In: YES 
user_id=6380

Hm, the way I interpret the text you quote, if the original
request is  a POST, it should probably not substitute a GET
but report the error.

Assigning to Jeremy since it's his module.

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2002-04-28 17:21

Message:
Logged In: YES 
user_id=261020

1. Bug is also in 2.2 branch

2. The fix (in 2.1 and 2.2) should reflect the earlier bug fix in the 2.2 branch to add the old headers:

new = Request(newurl, headers=req.headers)

3. I guess 10 should be replaced with 4, not 5.


John


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=549151&group_id=5470