Possible bug in urllib.urljoin
Dear all,

We've found a problem using urllib.urljoin when upgrading from Python 2.3 to 2.4. It no longer joins a particular corner case of URLs correctly (we think!).

The code appears to follow the algorithm (from http://www.ietf.org/rfc/rfc1808.txt) for resolving URLs almost exactly...

I believe the problem occurs when reaching "step 5" (approx. line 160), which happens if the embedded URL has no scheme, netloc or path (and is nonempty). Following the algorithm, the resulting URL should now be returned using the base URL's scheme, netloc and path, but the embedded URL's params / query (if present, else set to the base ones). This is what 2.3 does:

    if not path:
        if not params:
            params = bparams
            if not query:
                query = bquery
        return urlunparse((scheme, netloc, bpath, params, query, fragment))

However in 2.4, even if the embedded URL's path is empty, flow passes to step 6 unless the params and query segments are empty too:

    if not (path or params or query):
        return urlunparse((scheme, netloc, bpath, bparams, bquery, fragment))

Thus the last segment of the base path is removed in order to append the embedded URL's path, but that path is empty! So the resulting path is returned incorrectly.

Can you tell me if this was a deliberate decision to move away from following the algorithm? If so then we'll work around it.

-- 
Andrew Edmondson
PGP Key: http://search.keyserver.net:11371/pks/lookup?op=get&search=0xCEE814DC
PGP Fingerprint: 7B32 4D1E AC4F 29E2 9EAA 9550 1A3D BBA4 CEE8 14DC
On Fri, 23 Sep 2005, Andrew Edmondson wrote:
We've found a problem using urllib.urljoin when upgrading from python 2.3 to 2.4. It no longer joins a particular corner case of URLs correctly (we think!).
The code appears to follow the algorithm (from http://www.ietf.org/rfc/rfc1808.txt) for resolving URLs almost exactly... [...] Can you tell me if this was a deliberate decision to move away from following the algorithm? If so then we'll work around it.
I don't know if it was done right, but this came in at revision 1.41 of urlparse.py -- the commit comment is actually in 1.42:

| Make urlparse RFC 2396 compliant.
| Closes bug #450225 (thanks Michael Stone).

So I guess the answer to your question is "yes".

http://python.org/sf/450225
http://www.ietf.org/rfc/rfc2396.txt

John
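For anyone wanting to see the difference concretely, here is a rough sketch of the two versions of "step 5" quoted in the original message, written for Python 3's urllib.parse rather than the old urlparse module. It is not the stdlib code: the function names join_like_23 and join_like_24 are made up for illustration, the nesting of the params/query fallback is assumed from RFC 1808 step 5, and the step 6 merge is reduced to the single operation described above (drop the last base path segment, append the embedded path) with no "." or ".." handling.

```python
from urllib.parse import urlparse, urlunparse


def _parse_pair(base, url):
    """Parse the base and the embedded URL, defaulting to the base scheme."""
    b = urlparse(base)
    return b, urlparse(url, b.scheme)


def join_like_23(base, url):
    """Hypothetical helper: step 5 as quoted from 2.3 (RFC 1808 style)."""
    b, u = _parse_pair(base, url)
    scheme, netloc, path, params, query, fragment = u
    # This sketch only covers the corner case under discussion.
    assert not netloc and not path, "embedded URL must have no netloc or path"
    if not params:
        params = b.params
        if not query:  # nesting assumed from RFC 1808 step 5
            query = b.query
    return urlunparse((scheme, b.netloc, b.path, params, query, fragment))


def join_like_24(base, url):
    """Hypothetical helper: step 5 as quoted from 2.4 (RFC 2396 style)."""
    b, u = _parse_pair(base, url)
    scheme, netloc, path, params, query, fragment = u
    assert not netloc and not path, "embedded URL must have no netloc or path"
    if not (path or params or query):
        return urlunparse((scheme, b.netloc, b.path,
                           b.params, b.query, fragment))
    # Simplified step 6: drop the last base segment, append the (empty) path.
    segments = b.path.split("/")[:-1] + path.split("/")
    return urlunparse((scheme, b.netloc, "/".join(segments),
                       params, query, fragment))


if __name__ == "__main__":
    base = "http://a/b/c/d;p?q"
    print(join_like_23(base, "?y"))  # http://a/b/c/d;p?y
    print(join_like_24(base, "?y"))  # http://a/b/c/?y
```

The base URL and "?y" are the ones used in the example tables of both RFCs: RFC 1808 lists http://a/b/c/d;p?y for this case and RFC 2396 lists http://a/b/c/?y, which matches the 2.3 and 2.4 behaviour respectively.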
Participants (2):
- Andrew Edmondson
- John J Lee