[Python-Dev] bug in urlparse

Mike Brown mike at skew.org
Thu Sep 8 20:41:39 CEST 2005


jepler at unpythonic.net wrote:
> According to RFC 2396[1] section 5.2:

RFC 2396 is obsolete. It was superseded by RFC 3986 / STD 66 early this year.

In particular, the procedure for removing dot-segments from the path component
of a URI reference has received a significant overhaul. That procedure is only
supposed to be applied when 'resolving' a reference to absolute form, i.e.,
merging it with a base URI; the base URI, being a URI rather than a URI
reference, is not allowed to contain dot-segments.
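A minimal Python sketch of the overhauled procedure, following the
buffer-based remove_dot_segments steps (A through E) in RFC 3986 section 5.2.4.
The function name mirrors the RFC's pseudocode; keeping the output buffer as a
list of "/segment" pieces is an illustrative choice of mine, not anything
urlparse provides.

```python
def remove_dot_segments(path: str) -> str:
    """Sketch of RFC 3986 section 5.2.4: strip "." and ".." segments."""
    inp = path
    out = []  # output buffer: pieces like "/b" or "mid", each one segment
    while inp:
        # Step A: drop a leading "../" or "./"
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        # Step B: "/./" or a final "/." collapses to "/"
        elif inp.startswith("/./"):
            inp = inp[2:]
        elif inp == "/.":
            inp = "/"
        # Step C: "/../" or a final "/.." collapses to "/" and removes the
        # last output segment; surplus ".." segments are simply discarded
        elif inp.startswith("/../"):
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":
            inp = "/"
            if out:
                out.pop()
        # Step D: a lone "." or ".." is removed
        elif inp in (".", ".."):
            inp = ""
        # Step E: move the first segment (with its leading "/", if any)
        # from the input buffer to the output buffer
        else:
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


print(remove_dot_segments("/a/b/c/./../../g"))  # → /a/g
print(remove_dot_segments("/b/c/../../../g"))   # → /g
```

The second call shows an extra ".." being dropped at the root rather than
raising an error or being left in place.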

The implementation guidance you quoted from RFC 2396 is no longer relevant. 
Technically, it never was relevant, since urlparse only claims to implement 
RFC 1808 (2396's predecessor, now ten years old).

The new procedure says:

  "...dot-segments are intended for use in URI references to
   express an identifier relative to the hierarchy of names in the base
   URI.  The remove_dot_segments algorithm respects that hierarchy by
   removing extra dot-segments rather than treat them as an error or
   leaving them to be misinterpreted by dereference implementations."
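For comparison, a quick check of that "remove extra dot-segments" behaviour
from the library side. This assumes a much later Python 3 (3.5 or newer),
where urlparse lives on as urllib.parse and urljoin's documented behaviour was
updated to match RFC 3986; the base URI is the one used in the RFC's own
examples.

```python
from urllib.parse import urljoin

# Base URI from the RFC 3986 examples
base = "http://a/b/c/d;p?q"

# More ".." segments than the base path has levels: the surplus
# ones are removed instead of surviving in the result
print(urljoin(base, "../../../g"))     # → http://a/g
print(urljoin(base, "../../../../g"))  # → http://a/g
```

Older, RFC 1808-era urljoin implementations may instead leave the surplus
".." segments in the resolved URI.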

-Mike

