[ python-Bugs-918368 ] urllib doesn't correct server returned urls
SourceForge.net
noreply at sourceforge.net
Tue Mar 30 23:45:02 EST 2004
Bugs item #918368, was opened at 2004-03-17 15:41
Message generated for change (Comment added) made by mike_j_brown
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=918368&group_id=5470
Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Rob Probin (robzed)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib doesn't correct server returned urls
Initial Comment:
When a server responds to a request with a URL that contains spaces,
urllib doesn't correct the URL before requesting the new page.
I think this is technically a server error, however, it does work
from web browsers (Mozilla, Safari) but not from Python urllib.
I would suggest that when urllib is following "moved temporarily"
links (or similar) from a server it translates spaces to %20.
See the example program file for more detail, including the full
server/client transaction text.
----------------------------------------------------------------------
Comment By: Mike Brown (mike_j_brown)
Date: 2004-03-30 21:45
Message:
I agree that it is a server error to put something that doesn't
meet the syntactic definition of a URI in the Location header
of a response. I don't see any harm in correcting obvious
errors, though, in the interest of usability.
As for your proposed fix, instead of just correcting spaces, I
would do
newurl = quote(newurl, safe="/:=&?#+!$,;'@()*[]")
quote() is a urllib function that does percent-encoding. It is
way out of date and does poorly with Unicode strings, but if
called with the above arguments, it should safely clean up
most mistakes. The set of additional "safe" characters I am
passing in is the complete set of "reserved" characters
according to the latest draft of RFC 2396bis; these are
characters that definitely do or might possibly have special
meaning in a URL and thus should not be percent-encoded
blindly.
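
As a rough illustration of the quote() call suggested above, here is a minimal sketch. It assumes a modern Python, where the function lives at urllib.parse.quote rather than urllib.quote as in Python 2.3; the helper name clean_redirect_url and the example.com URL are made up for the example.

```python
from urllib.parse import quote

# The "reserved" characters listed in the comment above; they are left
# untouched because they may carry special meaning in a URL.
RESERVED = "/:=&?#+!$,;'@()*[]"


def clean_redirect_url(newurl):
    """Percent-encode anything that is neither reserved nor unreserved.

    Hypothetical helper, not part of urllib.
    """
    return quote(newurl, safe=RESERVED)


# A space in the path is encoded; the delimiters are preserved.
print(clean_redirect_url("http://example.com/some page/index.html?a=1&b=2"))
# http://example.com/some%20page/index.html?a=1&b=2
```

One thing to watch with this exact safe set: "%" is not in it, so a Location value that is already percent-encoded would be encoded a second time ("a%20b" becomes "a%2520b"). Whether "%" should be added to the safe set is a judgment call this sketch leaves open.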
I am currently working on a urllib.quote() replacement for
4Suite's Ft.Lib.Uri library, and will then see about getting the
improvements folded back into urllib.
----------------------------------------------------------------------
Comment By: Rob Probin (robzed)
Date: 2004-03-18 15:38
Message:
I've tested a change to "redirect_internal(self, url, fp, errcode, errmsg,
headers, data)" in "urllib.py" that adds a single line
newurl = newurl.replace(" ","%20")
after the basejoin() function call, and it appears to fix the problem.
This information is placed in the public domain.
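
The one-line change described above might look roughly like the following when pulled out into a standalone function. This is only a sketch under assumptions: it uses urllib.parse.urljoin from modern Python in place of the basejoin() call in the Python 2.3 urllib.py, and the function name resolve_redirect and the example.com URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_redirect(base_url, location):
    """Join a Location header value onto the request URL, then escape spaces.

    Hypothetical stand-in for the relevant lines of redirect_internal().
    """
    newurl = urljoin(base_url, location)
    # The proposed single-line fix: translate literal spaces to %20.
    newurl = newurl.replace(" ", "%20")
    return newurl


print(resolve_redirect("http://example.com/dir/page.html", "new page.html"))
# http://example.com/dir/new%20page.html
```

This handles only spaces; other characters that are invalid in a URL pass through unchanged.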
----------------------------------------------------------------------